
Research on a Chinese Word Segmentation Method Based on the Bi-directional Long-Short Term Memory Model

Posted on: 2020-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: N Zheng
Full Text: PDF
GTID: 2438330599955744
Subject: Computer application technology
Abstract/Summary:
Chinese word segmentation is a key step in semantic understanding and a bottleneck in Chinese information processing. Because of the distinctive Chinese writing system and the inherent complexity of the language, it is also one of the harder problems in word segmentation research. Four main families of methods are currently used: character matching, rule-based, statistical, and deep learning methods. Deep learning learns atomic features and context representations by optimizing the end objective directly, which avoids tedious feature engineering and captures long-distance sentence information more effectively. General-purpose word segmentation tools perform poorly in specialized fields such as metallurgy, and domain-specific Chinese word segmentation has rarely been studied. A specific domain carries its own knowledge, concepts, and terms, so existing tools cannot segment its text well. In recent years neural networks have proved effective for Chinese word segmentation, but this performance relies on large-scale training data; networks with conventional architectures fall short on low-resource datasets because labelled training data is scarce.
For small-scale specific domains such as metallurgy, this thesis proposes a domain-specific Chinese word segmentation method based on the Bi-directional Long-Short Term Memory (Bi-directional LSTM) model, which combines the advantages of the Bi-directional RNN and LSTM models and can capture long-distance information. First, the Bi-directional LSTM segmentation method is applied to a common data set and compared with previous work; the experimental results show that it segments better than the earlier methods. Then the model is applied to the metallurgical field, and a domain-specific method is proposed that uses ensemble learning and combines the weighted label probabilities with the optimal transition probabilities to obtain the segmentation result. The experimental results show that the proposed method achieves better segmentation results. Finally, a corresponding word segmentation system is designed. By selecting a specific field, the method can be applied to segmentation tasks in that field, which gives it a degree of domain applicability.
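The abstract does not spell out how the label probabilities and transition probabilities are combined. As a rough illustration only, the Python sketch below assumes a BMES tag set (Begin, Middle, End, Single), per-character tag probabilities such as a Bi-directional LSTM might emit, and a Viterbi search that adds weighted log label probabilities to log transition probabilities. The names `viterbi`, `tags_to_words`, `TRANS`, and `weight`, and all numeric values, are hypothetical and not taken from the thesis; the ensemble learning step is not modelled here.

```python
import math

TAGS = ("B", "M", "E", "S")   # assumed tag set: Begin, Middle, End, Single
START = ("B", "S")            # tags allowed on the first character
END = ("E", "S")              # tags allowed on the last character

# Hypothetical transition probabilities; absent pairs (e.g. B -> B) are impossible.
TRANS = {
    ("B", "M"): 0.3, ("B", "E"): 0.7,
    ("M", "M"): 0.3, ("M", "E"): 0.7,
    ("E", "B"): 0.5, ("E", "S"): 0.5,
    ("S", "B"): 0.5, ("S", "S"): 0.5,
}
NEG_INF = float("-inf")


def _log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(label_probs, weight=1.0):
    """Return the best tag sequence for a sentence.

    label_probs: one dict per character mapping tag -> probability.
    weight: positive scale applied to the log label probabilities (assumption).
    """
    n = len(label_probs)
    if n == 0:
        return []
    score = [dict() for _ in range(n)]
    back = [dict() for _ in range(n)]
    for t in TAGS:
        emit = weight * _log(label_probs[0].get(t, 0.0))
        score[0][t] = emit if t in START else NEG_INF
    for i in range(1, n):
        for t in TAGS:
            best_prev, best = None, NEG_INF
            for p in TAGS:
                s = score[i - 1][p] + _log(TRANS.get((p, t), 0.0))
                if s > best:
                    best_prev, best = p, s
            score[i][t] = best + weight * _log(label_probs[i].get(t, 0.0))
            back[i][t] = best_prev
    last = max(END, key=lambda t: score[n - 1][t])
    if score[n - 1][last] == NEG_INF:
        raise ValueError("no valid tag sequence for these probabilities")
    tags = [last]
    for i in range(n - 1, 0, -1):   # follow back-pointers to the first character
        tags.append(back[i][tags[-1]])
    return tags[::-1]


def tags_to_words(chars, tags):
    """Group characters into words according to their BMES tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:                     # tolerate a sequence that ends mid-word
        words.append(current)
    return words


# Made-up probabilities for the four characters of "冶金工业".
example_probs = [
    {"B": 0.80, "M": 0.05, "E": 0.05, "S": 0.10},
    {"B": 0.05, "M": 0.10, "E": 0.80, "S": 0.05},
    {"B": 0.70, "M": 0.10, "E": 0.10, "S": 0.10},
    {"B": 0.05, "M": 0.05, "E": 0.80, "S": 0.10},
]
example_tags = viterbi(example_probs)
example_words = tags_to_words("冶金工业", example_tags)
```

With these made-up inputs, `example_tags` is `["B", "E", "B", "E"]` and `example_words` is `["冶金", "工业"]`. Raising `weight` makes the search trust the per-character labels more relative to the transition table; how the thesis actually sets that balance is not stated in the abstract.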
Keywords/Search Tags: Bi-directional Long-Short Term Memory (Bi-directional LSTM) model, Domain-specific Chinese word segmentation, Ensemble learning, Weight combination