Forward Maximum Matching (Natural Language Processing) (Machine Learning)

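This article introduces forward maximum matching (MM), a dictionary-based word segmentation method for Chinese text. Starting at the left end of the sentence, it repeatedly takes the longest dictionary word that begins at the current position, then continues from the end of that word.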

The code is as follows:

# -*- coding:utf-8 -*-
"""
author: 15025
time: 2021/8/4 9:05
software: PyCharm
Description: Forward maximum matching: Maximum Match Method (MM)
"""


class MM:
    def __init__(self, dict_path):
        # set of all words in the dictionary
        self.dictionary = set()
        # length of the longest word in the dictionary
        self.maximum = 0
        # read the dictionary file, one word per line
        with open(dict_path, 'r', encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                # skip blank lines in the IMM_Dict file
                if not line:
                    continue
                # add the word to the dictionary
                self.dictionary.add(line)
                # track the length of the longest word in the dictionary
                if len(line) > self.maximum:
                    self.maximum = len(line)

    def cut(self, text):
        # list that collects the segmented words
        result = []
        # current scanning position in text
        index = 0
        # scan left to right until the whole text has been consumed
        while index < len(text):
            word = None
            # try the longest candidate first, then shrink it one character at a time
            for size in range(self.maximum, 0, -1):
                # skip candidate lengths that would run past the end of text
                if index + size > len(text):
                    continue
                piece = text[index:index + size]
                if piece in self.dictionary:
                    word = piece
                    result.append(word)
                    index += size
                    break
            # no dictionary word starts here: keep the single character as a word
            # (otherwise out-of-vocabulary characters would be silently dropped)
            if word is None:
                result.append(text[index])
                index += 1
        return result


if __name__ == '__main__':
    text_ = "西安市大雁塔"
    # path to the dictionary file (one word per line)
    file_path = r"C:/Users/15025/Desktop/NLP/IMM_Dict.txt"
    NLP = MM(file_path)
    print(NLP.cut(text_))
"""
['西安市', '大雁塔']
"""