Research on the Application of WAF in Text Processing

Posted on: 2014-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
GTID: 2248330398971030
Subject: Signal and Information Processing
Abstract/Summary:
Word segmentation and new word identification are the most basic and most important tasks in Chinese text processing and natural language processing, and their results affect all subsequent processing steps. Existing methods have several shortcomings: they rely on dictionaries, they rely on labeled corpora, and they identify low-frequency words with low efficiency. This paper amends the Word Activation Force (WAF) model on the basis of the bigram language model and proposes a WAF-based, statistical, unsupervised machine learning approach that handles word segmentation and new word identification at the same time without relying on a dictionary or a labeled corpus. For both tasks, the paper tests the maximum matching method, the in-link and out-link comparison method, and the sorting method, and finally proposes a method that combines dynamic programming with iteration. This method improves the efficiency of low-frequency word identification by using the relationships among words, resolves segmentation ambiguity through dynamic programming, and filters garbage strings by using the segmentation result. The experiments use 1,000,000 messages collected from a microblog. The results show that the WAF-based methods can effectively solve these problems and that the WAF model performs well in text processing.
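The abstract does not give the amended WAF formula or the exact dynamic programming objective, so the following is only a minimal sketch under stated assumptions. It assumes the commonly cited WAF form, waf(a, b) = (f_ab / f_a) * (f_ab / f_b) / d^2, restricted to adjacent characters so that the distance d is 1, and it assumes a simple segmentation score in which each adjacent pair kept inside a word contributes its WAF minus a threshold. The names `bigram_waf` and `segment`, the `threshold` and `max_len` parameters, and the toy corpus are all hypothetical and are not taken from the thesis.

```python
from collections import Counter
import math


def bigram_waf(corpus):
    """Score adjacent character pairs with a WAF-style statistic.

    Assumption: only adjacent characters are considered (distance 1),
    so the score reduces to (f_ab / f_a) * (f_ab / f_b).
    """
    unigram = Counter()
    bigram = Counter()
    for line in corpus:
        unigram.update(line)                # character frequencies
        bigram.update(zip(line, line[1:]))  # adjacent-pair frequencies
    return {
        (a, b): (n / unigram[a]) * (n / unigram[b])
        for (a, b), n in bigram.items()
    }


def segment(text, waf, threshold=0.6, max_len=4):
    """Split text into words by dynamic programming.

    Assumed objective: a candidate word scores the sum of
    (waf - threshold) over its internal adjacent pairs; a
    single character scores 0. best[j] holds the best total
    score for text[:j], and back[j] the start of its last word.
    """
    n = len(text)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            score = sum(
                waf.get((text[k], text[k + 1]), 0.0) - threshold
                for k in range(i, j - 1)
            )
            if best[i] + score > best[j]:
                best[j] = best[i] + score
                back[j] = i
    # Walk the back-pointers from the end to recover the words.
    words = []
    j = n
    while j > 0:
        words.append(text[back[j]:j])
        j = back[j]
    return words[::-1]


# Toy unlabeled corpus (hypothetical data, for illustration only).
corpus = ["北京大学", "北京天气", "大学生活"]
waf = bigram_waf(corpus)
print(segment("北京大学", waf))  # ['北京', '大学']
```

In this toy corpus the pair 京大 has a WAF of 0.25, below the threshold, while 北京 and 大学 both score 1.0, so the best-scoring path cuts between 京 and 大. No dictionary or labeled data is involved; the cut points come only from corpus statistics, which is the property the abstract emphasizes.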
Keywords/Search Tags: Word Activation Force, text processing, word segmentation, new word identification, dynamic programming