
Comparative Research On Open-Source Chinese Word Segmentation Machines

Posted on: 2014-01-07  Degree: Master  Type: Thesis
Country: China  Candidate: Y B Huang  Full Text: PDF
GTID: 2248330398477666  Subject: Information Science
Abstract/Summary:
Chinese word segmentation is the foundation of Chinese information processing. As research on Chinese word segmentation technology has deepened, more and more Chinese word segmentation machines (word segmenters) have emerged. These open-source programs make research on and exchange about Chinese word segmentation much more convenient, but they also raise a problem: faced with so many free segmenters, how should one choose among them for a specific application? To address this problem, the author selected eight of the more representative segmenters from 26 open-source Chinese word segmenters as research objects, compared and analyzed their performance against several evaluation criteria, and finally ranked the eight segmenters by performance to give users a reference for selecting a suitable one.

Word segmentation accuracy, segmentation speed, unknown word recognition, and resource overhead are the evaluation criteria used to measure the strengths and weaknesses of the segmenters' performance. To test the segmenters against these criteria, the paper designs five tests: a segmentation effect test, a word segmentation accuracy test, a name recognition test, a segmentation speed test, and a resource overhead test. Through comparative analysis of the experimental results, the selected segmenters are evaluated, providing a reference for choosing a segmenter for a given application.
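As a rough illustration of how two of these criteria can be measured, the sketch below scores a segmenter's output against a reference segmentation and times its throughput. It is not the thesis's test procedure: the abstract does not say how accuracy was computed, so span-based precision, recall, and F1 are assumed here as commonly used measures, and the function names (`to_spans`, `segmentation_scores`, `chars_per_second`) and the sample sentence are hypothetical.

```python
import time


def to_spans(words):
    """Convert a word sequence into a set of (start, end) character offsets."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_scores(gold_words, predicted_words):
    """Return (precision, recall, F1) of a predicted segmentation.

    A predicted word counts as correct only if both of its boundaries
    match a word in the reference (gold) segmentation.
    """
    gold = to_spans(gold_words)
    predicted = to_spans(predicted_words)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


def chars_per_second(segment, texts):
    """Time a segmenter (any callable str -> list of words) over some texts."""
    total_chars = sum(len(text) for text in texts)
    started = time.perf_counter()
    for text in texts:
        segment(text)
    elapsed = max(time.perf_counter() - started, 1e-9)  # avoid division by zero
    return total_chars / elapsed


# Hypothetical example: the prediction wrongly splits "分词" into two words.
gold = ["中文", "分词", "是", "基础"]
predicted = ["中文", "分", "词", "是", "基础"]
precision, recall, f1 = segmentation_scores(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")

# `list` splits a string into single characters; it stands in for a real segmenter.
print(f"{chars_per_second(list, ['中文分词是基础'] * 1000):.0f} chars/s")
```

Comparing spans rather than word strings keeps a repeated word from being counted as correct when it appears at the wrong position. Unknown word recognition could be scored the same way by restricting the reference spans to words that are absent from the segmenter's dictionary.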
Keywords/Search Tags: Word Segmentation Machine, Chinese Word Segmentation, Corpus, Segmentation Speed, Unknown Word Recognition