Research And Implementation Of Chinese Abbreviations Reduction Methods Based On Statistics

Posted on: 2017-01-19  Degree: Master  Type: Thesis
Country: China  Candidate: Y Zhang  Full Text: PDF
GTID: 2358330485495687  Subject: Computer technology
Abstract/Summary:
As the pace of life quickens, people use more and more abbreviations to save time and effort. However, while abbreviations are convenient for people, they cause problems in many fields. For example, in information extraction, abbreviations act as noise and degrade extraction results; in machine translation, they reduce translation accuracy. Accurately expanding Chinese abbreviations has therefore become an important challenge for natural language processing. In this thesis, based on an in-depth analysis of the features of abbreviations in news text, we use an N-gram language model, statistical machine translation models, and a semantic similarity method to explore the Chinese abbreviation expansion problem. Specifically, the research proceeds along the following two lines:

(1) Chinese abbreviation expansion based on language models and machine translation models. For the language-model approach, we construct an abbreviation expansion knowledge base to generate expansion candidates, then decode the candidates within the N-gram framework and choose the best one. For the machine-translation approach, we build word-based and phrase-based machine translation models and use Moses to produce target-language output that contains the expanded abbreviation. The experimental results show that the phrase-based machine translation model achieves better expansion performance.

(2) Chinese abbreviation expansion based on semantic similarity. From a semantic point of view, we use semantic features to carry out the expansion task. First, we train word vectors in an unsupervised manner with a word embedding model on a large amount of unannotated news text. Second, we obtain the word vectors of the expansion candidates and their contexts. Finally, we select the best expansion candidate by computing the semantic distance. We also combine the language model with semantic similarity to expand Chinese abbreviations. The experimental results show that the method combining N-grams with word-embedding semantic similarity is effective for Chinese abbreviation expansion.
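
The combined approach described in the abstract (N-gram scoring of knowledge-base candidates plus word-embedding similarity to the context) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the thesis: the toy corpus, the hand-made 2-dimensional vectors, the add-one smoothed bigram model, the averaging of word vectors, the linear interpolation with `weight`, and all function names. The thesis does not state how the two scores are combined.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def avg_logprob(tokens, unigrams, bigrams):
    """Per-bigram log-probability with add-one smoothing (an assumption)."""
    vocab = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    pairs = list(zip(padded, padded[1:]))
    total = sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                for a, b in pairs)
    return total / len(pairs)

def mean_vector(words, vectors):
    """Average the vectors of the known words; None if none are known."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    return [sum(v[i] for v in known) / len(known) for i in range(len(known[0]))]

def cosine(u, v):
    """Cosine similarity; 0.0 when a vector is missing or has zero norm."""
    if u is None or v is None:
        return 0.0
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def expand_abbreviation(abbrev, left, right, knowledge_base,
                        unigrams, bigrams, vectors, weight=0.5):
    """Pick the candidate maximizing an interpolated LM + similarity score."""
    context_vec = mean_vector(left + right, vectors)

    def score(candidate):
        lm = avg_logprob(left + candidate + right, unigrams, bigrams)
        sim = cosine(mean_vector(candidate, vectors), context_vec)
        return weight * lm + (1 - weight) * sim

    return max(knowledge_base[abbrev], key=score)

# Toy data (hypothetical): a tiny segmented corpus and made-up word vectors.
corpus = [
    ["他", "在", "北京", "大学", "读书"],
    ["北京", "大学", "位于", "北京"],
    ["北海道", "大学", "位于", "日本"],
]
vectors = {
    "北京": [1.0, 0.1], "北海道": [0.1, 1.0], "大学": [0.7, 0.7],
    "他": [0.8, 0.2], "在": [0.9, 0.2], "读书": [0.9, 0.3],
}
knowledge_base = {"北大": [["北京", "大学"], ["北海道", "大学"]]}

unigrams, bigrams = train_bigram(corpus)
best = expand_abbreviation("北大", ["他", "在"], ["读书"],
                           knowledge_base, unigrams, bigrams, vectors)
print("".join(best))  # prints 北京大学
```

In this toy setting both signals agree, so the result does not depend on `weight`. In practice the word vectors would come from an embedding model trained on unannotated news text, as the abstract describes, and the interpolation weight would need tuning on held-out data.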
Keywords/Search Tags: Chinese abbreviation expansion, language models, machine translation models, semantic similarity, word embeddings