
Text Categorization Based On The Conditional Random Fields

Posted on: 2011-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: S G Zhang
Full Text: PDF
GTID: 2178360308958133
Subject: Computer application technology
Abstract/Summary:
As the volume of electronic text grows day by day, modern information technology faces the problem of organizing and managing this information efficiently so that users can quickly find what they need. As an important information management technology, text categorization has become a major research direction in the information field. Word segmentation, an important factor in text categorization, has also become an active research topic.

Chinese text has no inherent delimiters between words, so word segmentation is required before information extraction. There are two main approaches to Chinese word segmentation: dictionary-based and statistics-based. The dictionary-based approach segments quickly but suffers from ambiguity. The statistics-based approach achieves high precision. This thesis proposes a dictionary-based word segmentation method combined with Conditional Random Fields (CRFs). It labels the positions where an ambiguity may occur so that the ambiguity can be resolved and precision improved. The resulting word segmentation system combines the features of both approaches. The thesis also introduces the system and its experiments.

Text categorization is the task of assigning a text of unknown category to one of a set of predefined categories. This thesis proposes a text categorization method based on CRFs. As a statistical model, CRFs can improve the effectiveness of text categorization by integrating many kinds of textual information. The thesis describes the model construction and the system implementation. Experiments investigate how terms and term groups influence the effectiveness of text categorization. The results show that the method performs well on text categorization.
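
The abstract does not say how the segmenter finds the positions where an ambiguity may occur. One common way to do this with a dictionary, shown here only as a minimal sketch and not as the thesis's method, is to run forward and backward maximum matching and flag every character offset where the two results place word boundaries differently. Those offsets are the candidates a statistical model such as a CRF could then label. The function names, the `max_len` parameter, and the toy lexicon are all assumptions made for illustration.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right longest-match segmentation."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, dictionary, max_len=4):
    """Greedy right-to-left longest-match segmentation."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.append(piece)
                j -= size
                break
    words.reverse()
    return words


def boundaries(words):
    """Return the set of character offsets at which a word ends."""
    cuts, pos = set(), 0
    for word in words:
        pos += len(word)
        cuts.add(pos)
    return cuts


def ambiguous_offsets(text, dictionary, max_len=4):
    """Offsets where forward and backward matching disagree on a boundary."""
    fwd = boundaries(forward_max_match(text, dictionary, max_len))
    bwd = boundaries(backward_max_match(text, dictionary, max_len))
    return sorted(fwd ^ bwd)


if __name__ == "__main__":
    lexicon = {"研究", "研究生", "生命", "起源"}  # hypothetical toy dictionary
    sentence = "研究生命起源"
    print(forward_max_match(sentence, lexicon))   # ['研究生', '命', '起源']
    print(backward_max_match(sentence, lexicon))  # ['研究', '生命', '起源']
    print(ambiguous_offsets(sentence, lexicon))   # [2, 3]
```

In this toy example the two passes disagree on whether the sentence begins with 研究生 or 研究, so offsets 2 and 3 are flagged. Only such flagged positions would need the slower statistical decision, which is one plausible reading of how the combined system keeps the speed of dictionary lookup.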
Keywords/Search Tags:Chinese Word Segmentation, Text Categorization, Conditional Random Fields, Dictionary