
New Words Identification Base On Feature Filter

Posted on: 2013-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhu
Full Text: PDF
GTID: 2235330374455128
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
Automatic identification of new words is an important link in language monitoring and an important means of studying new words. Further development of this technology can also effectively promote Chinese information processing and dictionary compilation. Automatic new-word identification essentially contrasts new strings with old ones, which requires extracting strings from a corpus, but extraction produces a large number of junk strings. Whether the method is rule-based or statistical, these junk strings seriously degrade the results. Therefore, based on an analysis of the different characteristics of new words, this thesis proposes a method based on feature filtering. Before strings are extracted, the method deletes language components with poor word-forming ability, which effectively reduces the number of strings generated. In the junk-string filtering stage, a filtering method based on bi-gram structure is proposed according to the structural characteristics of new words; it reduces the junk strings that are composed of three or more fragments. The candidates are then examined in terms of several statistical features, for instance the probability of combination, word-formation power, and average mutual information. Without the statistical model, the precision is 0.15% and the recall is 86.22%; in contrast, with it the precision is 43.86% and the recall is 49.92...
Keywords/Search Tags: new words, feature filter, bi-gram structure, average mutual information
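
The abstract names average mutual information as one of the statistical features used to score candidate strings, but it does not give the formula. As a rough illustration only, the sketch below assumes one common reading: the pointwise mutual information of a candidate is computed at every binary split point and then averaged. The function names `ngram_counts` and `average_mutual_information`, the probability estimate (substring count divided by total characters), and the toy corpus are all assumptions made for this example, not the thesis's actual definitions.

```python
import math
from collections import Counter


def ngram_counts(corpus, max_n):
    """Count every character n-gram of length 1..max_n in the corpus lines."""
    counts = Counter()
    for line in corpus:
        for n in range(1, max_n + 1):
            for i in range(len(line) - n + 1):
                counts[line[i:i + n]] += 1
    return counts


def average_mutual_information(candidate, counts, total_chars):
    """Mean pointwise mutual information over all binary splits of candidate.

    Assumed estimate: P(s) = count(s) / total_chars. This is a simplification
    chosen for the sketch, not a definition taken from the thesis.
    """
    if len(candidate) < 2:
        raise ValueError("candidate must have at least two characters")
    p_whole = counts[candidate] / total_chars
    if p_whole == 0:
        # An unseen candidate has no evidence of cohesion.
        return float("-inf")
    scores = []
    for k in range(1, len(candidate)):
        p_left = counts[candidate[:k]] / total_chars
        p_right = counts[candidate[k:]] / total_chars
        scores.append(math.log2(p_whole / (p_left * p_right)))
    return sum(scores) / len(scores)
```

With counts built from a corpus via `ngram_counts(corpus, max_n)` and `total_chars = sum(len(line) for line in corpus)`, a higher score suggests that the parts of a candidate co-occur more often than chance, so low-scoring candidates could be treated as likely junk strings. `max_n` should be at least the length of the longest candidate so that every substring has a count.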