Research On Chinese Word Segmentation Based On Machine Translation Technology

Posted on: 2020-08-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y K Wei
Full Text: PDF
GTID: 2428330590958381
Subject: Computer software and theory
Abstract/Summary:
Chinese word segmentation (CWS) is a fundamental step in Chinese natural language processing, and the quality of segmentation strongly affects the performance of downstream tasks. Most current research applies deep learning to CWS, but most of that work uses only local context information within a sentence. In recent years, more attention has been paid to the idea of treating CWS as a machine translation (MT) problem. Segmentation methods based on this idea apply an MT model to the whole sentence directly, which lets them make effective use of global context information. However, incorrect translations produced by the MT model during translation reduce segmentation accuracy. By studying the differences between MT and CWS, this thesis proposes a new translation-based segmentation method built on a cyclic correction strategy to address translation errors. During translation, the method uses the original, unsegmented sentence directly to correct wrong translations and so improve segmentation accuracy. CWSTransformer, a Chinese word segmentation model that integrates a translator and an error corrector, is designed and implemented. The translator performs preliminary word segmentation and is built on a machine translation model; the corrector fixes the translation results using the translation-based segmentation method with the cyclic correction strategy. By improving the output part of the CWSTransformer translator, a faster Chinese word segmentation model, CWSTransformer-S, is obtained. The experiments compare different translation-based segmentation methods on the standard PKU and MSR data sets. The results show that, under the same experimental environment, the CWSTransformer-based method can exceed the other translation-based segmentation methods by 4.2 percentage points, and that CWSTransformer-S can increase segmentation speed by up to 5.5 times compared with CWSTransformer.
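The idea of correcting a translated segmentation against the original sentence can be illustrated with a minimal Python sketch. This is not the thesis's actual cyclic correction algorithm, whose details are not given in the abstract. It assumes the translator emits one token per character plus a word-boundary token; the separator `SEP` and the functions `correct_translation` and `to_words` are hypothetical names introduced only for illustration. The sketch keeps the boundaries proposed by the translator but overwrites every character token with the corresponding source character, so the corrected output always spells out the original sentence.

```python
SEP = "</w>"  # hypothetical word-boundary token, not taken from the thesis


def correct_translation(source, translated, sep=SEP):
    """Force `translated` to reproduce `source`, keeping only its boundaries."""
    corrected = []
    i = 0  # index of the next source character to emit
    for tok in translated:
        if tok == sep:
            # Keep a boundary only between two characters, never doubled.
            if 0 < i < len(source) and corrected[-1] != sep:
                corrected.append(sep)
        elif i < len(source):
            # Overwrite a possibly wrong character with the source character.
            corrected.append(source[i])
            i += 1
        # Extra character tokens beyond the end of the source are dropped.
    # If the translator stopped early, copy the remaining source characters.
    corrected.extend(source[i:])
    return corrected


def to_words(tokens, sep=SEP):
    """Group a corrected token sequence into a list of words."""
    words, current = [], []
    for tok in tokens:
        if tok == sep:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(tok)
    if current:
        words.append("".join(current))
    return words


source = "我爱北京"
# A mistranslated character ("南" instead of "北") in the translator output.
translated = ["我", SEP, "爱", SEP, "南", "京"]
corrected = correct_translation(source, translated)
words = to_words(corrected)
```

In this example the wrong character is replaced from the source while both proposed boundaries are kept, so `words` contains three words covering the original four characters. A real corrector would presumably intervene at each decoding step rather than after the fact; that part is left out here.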
Keywords/Search Tags: Deep learning, Chinese word segmentation, Machine translation, Translation correction