
Research On Chinese Word Segmentation Method Based On Multi-model

Posted on: 2020-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: D H Li
Full Text: PDF
GTID: 2428330578969609
Subject: Engineering
Abstract/Summary:
Character-based tagging, supported by strong learning algorithms, has become an effective approach to Chinese word segmentation. However, Chinese characters differ in linguistic function and meaning, so the word formation rules differ from character to character. Multi-model word segmentation, which builds a separate model for each character, has therefore become a segmentation strategy. Two phenomena can be observed in existing word segmentation methods: (1) Per-character modeling can learn the special word formation rules of each character but ignores the commonality among these rules, which causes redundancy among the models; (2) The rise of neural-network-based representation methods makes automatic feature learning possible, and using neural networks for automatic representation learning has become a natural choice in word segmentation. This paper proposes a method for each phenomenon. For the first, this paper proposes a multi-model Chinese word segmentation method based on character clusters. The method uses a clustering algorithm to explore the distribution structure of word formation rules, takes that structure as the basis for segmentation modeling, and re-examines feature extraction and model training accordingly. The experimental results show that the proposed method reduces the number of models and avoids model redundancy while improving segmentation performance. For the second, this paper proposes a Chinese word segmentation method based on multiple Bi-LSTM models. The method builds a segmentation model for each character and uses the Bi-LSTM network structure to learn features automatically, which avoids the influence of feature engineering on segmentation performance. In addition, to avoid the model redundancy problem, the clustering idea is introduced, and a Chinese word segmentation method based on multiple Bi-LSTM models over character clusters is proposed. The experimental results show that using Bi-LSTM neural networks greatly improves segmentation performance. Finally, a multi-model Chinese word segmentation system is designed and implemented. The system segments input sentences or texts and supports functions such as displaying segmentation details.
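The idea of routing each character to the model of its cluster can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the thesis's implementation: the B/M/E/S tag set, the hand-written `cluster_of` mapping (the thesis derives clusters with a clustering algorithm), and the frequency-count `ClusterTagger`, which stands in for a real per-cluster sequence model such as a Bi-LSTM. The sketch only shows how characters in one cluster can share a model, including characters never seen in training.

```python
from collections import Counter, defaultdict


def words_to_tags(words):
    """Encode a segmented sentence as per-character B/M/E/S tags (assumed tag set)."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Decode per-character tags back into a list of words."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # flush a word left open by a trailing B or M
        words.append(current)
    return words


class ClusterTagger:
    """Toy stand-in for one per-cluster model (hypothetical, frequency counts only)."""

    def __init__(self):
        self.by_char = defaultdict(Counter)  # tag counts for each character
        self.by_cluster = Counter()          # tag counts pooled over the cluster

    def update(self, ch, tag):
        self.by_char[ch][tag] += 1
        self.by_cluster[tag] += 1

    def predict(self, ch):
        if ch in self.by_char:
            return self.by_char[ch].most_common(1)[0][0]
        if self.by_cluster:  # unseen character: fall back to the shared cluster statistics
            return self.by_cluster.most_common(1)[0][0]
        return "S"


def train(corpus, cluster_of, default_cluster=0):
    """Build one tagger per cluster from a corpus of pre-segmented sentences."""
    models = defaultdict(ClusterTagger)
    for words in corpus:
        for ch, tag in zip("".join(words), words_to_tags(words)):
            models[cluster_of.get(ch, default_cluster)].update(ch, tag)
    return models


def segment(sentence, models, cluster_of, default_cluster=0):
    """Tag each character with the model of its cluster, then decode the tags."""
    tags = [models[cluster_of.get(ch, default_cluster)].predict(ch) for ch in sentence]
    return tags_to_words(sentence, tags)


# Toy data: the corpus and the cluster assignment are invented for this example.
corpus = [["我", "喜欢", "北京"], ["他", "喜欢", "上海"]]
cluster_of = {"我": 0, "他": 0, "她": 0,
              "喜": 1, "欢": 1, "北": 1, "京": 1, "上": 1, "海": 1}

models = train(corpus, cluster_of)
print(segment("她喜欢北京", models, cluster_of))
```

In this toy setup two models cover nine characters, and the character 她, which never appears in the training corpus, is still tagged using the statistics pooled over its cluster. A real system would replace the count table with a trained sequence model and take the previous and following characters into account.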
Keywords/Search Tags: Chinese Word Segmentation, Multi-model, Clustering Algorithm, Neural Networks