The Research In The Improvement And Optimization Of Chinese Automatic Word Segmentation

Posted on: 2014-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: J L Zhang
Full Text: PDF
GTID: 2268330422467265
Subject: Computer software and theory
Abstract/Summary:
Chinese automatic word segmentation is a fundamental problem in Chinese information processing. It underlies many applications, such as information extraction, information retrieval, data mining, machine translation, and question answering systems. This thesis makes a comprehensive study of the major techniques in the field of Chinese automatic word segmentation, including the structure of the segmentation dictionary and the segmentation algorithm. In addition, we investigate in some depth the difficulties of Chinese automatic word segmentation. Finally, drawing on current search engine technology, we describe an application of Chinese automatic word segmentation.

The main contributions of this thesis are as follows:

First, we conducted an extensive study of dictionary structures for Chinese automatic word segmentation. Based on an analysis of the existing classic dictionary structures and of the many improved dictionary structures, we present a new dictionary structure based on a Multi-Hash AVL-Tree.

Second, we achieved good results in research on Named Entity Recognition. Drawing on existing research results, we designed a new method for identifying Chinese personal names and give a concrete implementation process.

Third, we achieved good results in research on Chinese organization name recognition. Starting from the CRF statistical model and integrating rules and knowledge from linguistics, we established a method for recognizing Chinese organization names based on both statistical analysis and linguistic rules. The experimental results show that, in the closed test, precision and recall reached 91.68% and 95.21%, respectively. The method offers a practical new approach to organization name recognition.

Finally, in the context of search engine applications in the current "Big Bang" era, we give a more detailed description of Chinese automatic word segmentation, which promotes Chinese automatic segmentation techniques and contributes to the optimization and development of new search engines.
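
As a rough illustration of how a hashed dictionary can drive word segmentation, the sketch below hashes on a word's first character and then binary-searches a sorted bucket, combined with forward maximum matching. This is not the thesis's Multi-Hash AVL-Tree structure or its segmentation algorithm, neither of which is detailed in the abstract. The sorted list only stands in for a balanced tree, and the function names, the sample lexicon, and the `max_len` parameter are all assumptions made for the example.

```python
from bisect import bisect_left
from collections import defaultdict


def build_dictionary(words):
    """Group words by first character; each bucket is a sorted list.

    The sorted list is a stand-in for a balanced search tree (assumption).
    """
    buckets = defaultdict(list)
    for word in set(words):
        buckets[word[0]].append(word)
    for bucket in buckets.values():
        bucket.sort()
    return dict(buckets)


def contains(dictionary, word):
    """Hash on the first character, then binary-search that bucket."""
    bucket = dictionary.get(word[0], [])
    i = bisect_left(bucket, word)
    return i < len(bucket) and bucket[i] == word


def segment(text, dictionary, max_len=4):
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if size == 1 or contains(dictionary, candidate):
                tokens.append(candidate)
                pos += size
                break
    return tokens


# Hypothetical toy lexicon, for illustration only.
lexicon = ["中文", "自动", "分词", "自动分词", "搜索", "引擎", "搜索引擎"]
dictionary = build_dictionary(lexicon)
tokens = segment("中文自动分词", dictionary)
print(tokens)  # ['中文', '自动分词']
```

Forward maximum matching is used here only because it is the simplest dictionary-driven strategy; it does not resolve the segmentation ambiguities or recognize the unlisted names that the thesis addresses separately.
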
Keywords/Search Tags: Chinese automatic segmentation, Organization Name Recognition, Named Entity Recognition, Search Engine, Dictionary Structure