Research And Improvement Of ICTCLAS Chinese Lexical Analysis System

Posted on: 2013-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Dai
Full Text: PDF
GTID: 2348330518989049
Subject: Computer technology
Abstract/Summary:
Chinese word segmentation is the process of recombining a continuous sequence of Chinese characters into a sequence of words according to certain rules. It is an important part of Chinese information processing and one of the basic steps in applications such as Chinese document retrieval, search engines, machine translation (MT), and speech synthesis. The accuracy and speed of segmentation directly affect every downstream stage of these systems. Research on and development of high-performance Chinese word segmentation systems, which raise accuracy without sacrificing speed, has therefore become an active topic in recent years.

The Chinese lexical analysis system ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) is regarded as one of the best open-source Chinese lexical analyzers in the world. This thesis studies ICTCLAS and, drawing on existing Chinese word segmentation research, improves its dictionary mechanism, its rules for recognizing numbers and time words, its organization name recognition, and its implementation of the Hidden Markov Model (HMM) based segmentation algorithm. The improved system, newICTCLAS, uses a double-array trie as its dictionary structure; refines the matching rules for person names, place names, and number words; adds recognition of time words and of organization names as unknown words; and implements a class-based Hidden Markov segmentation algorithm.

Experimental results show that, compared with the original ICTCLAS system, newICTCLAS improves segmentation accuracy, processing speed, recall, and precision, demonstrating the effectiveness of the improvements.
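A double-array trie stores a dictionary in two integer arrays, base and check, so that following one character is a constant-time array lookup: from state s, character c leads to state t = base[s] + code(c) only if check[t] == s. The following is a minimal Python sketch of that idea, not the newICTCLAS dictionary code. It assumes a simple first-fit placement of child nodes, a separate terminal-flag array instead of an end-of-word transition, and hypothetical class and method names (DoubleArrayTrie, contains, prefixes).

```python
from collections import deque


class DoubleArrayTrie:
    """Minimal double-array trie (hypothetical sketch): state s has a transition
    on character c to t = base[s] + code[c] exactly when check[t] == s."""

    def __init__(self, words):
        # Give every character a positive integer code (0 is left unused).
        chars = sorted({c for w in words for c in w})
        self.code = {c: i + 1 for i, c in enumerate(chars)}
        # Index 0 is the root; check[0] = 0 marks it occupied, -1 means free.
        self.base, self.check, self.term = [0], [0], [False]

        # Build a temporary nested-dict trie; the "" key marks end of word.
        root = {}
        for w in words:
            node = root
            for c in w:
                node = node.setdefault(c, {})
            node[""] = {}

        # Breadth-first: give each node the smallest base whose child slots are all free.
        queue = deque([(0, root)])
        while queue:
            s, node = queue.popleft()
            self.term[s] = "" in node
            kids = sorted(((self.code[c], child) for c, child in node.items() if c),
                          key=lambda kv: kv[0])
            if not kids:
                continue
            b = 1
            while True:
                self._grow(b + kids[-1][0])
                if all(self.check[b + k] == -1 for k, _ in kids):
                    break
                b += 1
            self.base[s] = b
            for k, child in kids:
                self.check[b + k] = s
                queue.append((b + k, child))

    def _grow(self, n):
        # Extend all three arrays so that index n is valid.
        while len(self.check) <= n:
            self.base.append(0)
            self.check.append(-1)
            self.term.append(False)

    def _next(self, s, c):
        # One transition: O(1) arithmetic plus a single check comparison.
        k = self.code.get(c)
        if k is None:
            return -1
        t = self.base[s] + k
        return t if t < len(self.check) and self.check[t] == s else -1

    def contains(self, word):
        s = 0
        for c in word:
            s = self._next(s, c)
            if s < 0:
                return False
        return self.term[s]

    def prefixes(self, text, start=0):
        """All dictionary words that begin at text[start]."""
        found, s = [], 0
        for i in range(start, len(text)):
            s = self._next(s, text[i])
            if s < 0:
                break
            if self.term[s]:
                found.append(text[start:i + 1])
        return found


dat = DoubleArrayTrie(["中国", "中国人", "国人", "人民", "中", "国", "人", "民"])
print(dat.prefixes("中国人民"))                    # ['中', '中国', '中国人']
print(dat.contains("人民"), dat.contains("国民"))  # True False
```

The prefixes method returns every dictionary word starting at one position in a single left-to-right walk, which is the kind of lookup a dictionary-based segmenter needs when it builds its lattice of candidate words.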
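A class-based Hidden Markov segmentation chooses, among all ways of cutting a sentence into candidate words, the one that maximizes the product over i of P(c_i | c_{i-1}) * P(w_i | c_i), where w_i is a word and c_i its class. The sketch below runs Viterbi over that word lattice in log space. Everything in it is an assumption for illustration: the class labels (LOC, ORG, W, UNK), the toy probabilities, the 1e-6 floor for unseen transitions, and the single-character fallback for unknown characters. It is not the model used by ICTCLAS or newICTCLAS.

```python
import math

# Hypothetical toy model: P(word | class) and P(class | previous class).
TOY_LEXICON = {
    "北京": {"LOC": 0.5},
    "北京大学": {"ORG": 0.3},
    "大学": {"W": 0.05},
    "大学生": {"W": 0.05},
    "学生": {"W": 0.05},
    "大": {"W": 0.02},
    "生": {"W": 0.01},
}
TOY_TRANS = {
    ("BEG", "LOC"): 0.2, ("BEG", "ORG"): 0.2, ("BEG", "W"): 0.5,
    ("LOC", "W"): 0.5, ("ORG", "W"): 0.3, ("W", "W"): 0.5,
    ("LOC", "END"): 0.1, ("ORG", "END"): 0.1, ("W", "END"): 0.3,
}
FLOOR = 1e-6  # hypothetical smoothing for unseen transitions / unknown characters


def class_hmm_segment(text, lexicon, trans, max_len=4):
    """Viterbi over the word lattice: maximize
    sum_i log P(c_i | c_{i-1}) + log P(w_i | c_i)."""
    n = len(text)
    # best[pos][cls] = (log score, previous class, last word) for paths ending at pos
    best = [{} for _ in range(n + 1)]
    best[0]["BEG"] = (0.0, None, None)
    for pos in range(n):
        for prev_cls, (score, _, _) in best[pos].items():
            for end in range(pos + 1, min(n, pos + max_len) + 1):
                word = text[pos:end]
                emissions = lexicon.get(word)
                if emissions is None:
                    if end > pos + 1:
                        continue
                    emissions = {"UNK": FLOOR}  # unseen single character
                for cls, p_emit in emissions.items():
                    cand = score + math.log(trans.get((prev_cls, cls), FLOOR)) + math.log(p_emit)
                    if cls not in best[end] or cand > best[end][cls][0]:
                        best[end][cls] = (cand, prev_cls, word)
    # Close every complete path with a transition into END, then backtrack.
    cls = max(best[n], key=lambda c: best[n][c][0] + math.log(trans.get((c, "END"), FLOOR)))
    words, pos = [], n
    while pos > 0:
        _, prev_cls, word = best[pos][cls]
        words.append((word, cls))
        pos, cls = pos - len(word), prev_cls
    return words[::-1]


print(class_hmm_segment("北京大学生", TOY_LEXICON, TOY_TRANS))
# [('北京', 'LOC'), ('大学生', 'W')]
```

With these toy numbers the path 北京/LOC + 大学生/W scores 0.2 * 0.5 * 0.5 * 0.05 * 0.3 = 0.00075, which beats 北京大学/ORG + 生/W at 0.000054, so the ambiguity is resolved in favor of the first reading. Keeping one best entry per (position, class) pair is what makes the search linear in sentence length rather than exponential in the number of segmentations.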
Keywords/Search Tags: Chinese word segmentation, ICTCLAS, hidden Markov model, double array Trie tree algorithm