A Feature Extraction Method Using Base Phrase And Keyword

Posted on: 2010-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: L L Zhao
GTID: 2178360302461956
Subject: Computer software and theory
Abstract/Summary:
The rapid development of the Internet has caused a rapid increase in the amount of available information. How to find useful information within such a huge volume of messages has become a pressing problem. Automatic text analysis is an effective way to address it, and one of its main techniques is text categorization, which lets users find helpful information based on their own requirements. Text categorization therefore not only makes the use of information more efficient, but is also of broad importance in research and business applications.

Text categorization has four parts: preprocessing, feature extraction, weight calculation, and classification. Feature extraction is the key step, and word sense disambiguation (WSD) and dimension reduction of the vector space are its persistent difficulties. This thesis focuses on feature extraction methods. Traditional text categorization methods usually use single words as text features, but the content covered by a single word is quite limited. To address this problem, the thesis improves the distinctiveness of feature items and explores a method based on a mixed mode of base phrases and words. We start with part-of-speech (POS) tagging and then use CILIN to handle synonyms and polysemes. As a result, WSD is successfully applied to the feature items both semantically and grammatically, and the dimension of the vector space is reduced.

Text categorization experiments are carried out with a KNN classifier and an SVM (support vector machine). The experimental data show that the precision and recall of the categorization are improved.
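The mixed phrase-and-word idea described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation. The tag names (`ADJ`, `NOUN`, `VERB`), the chunking rule for base noun phrases, the toy `SYNONYM_CLASS` table (a stand-in for a thesaurus such as CILIN), and the function names are all assumptions made for illustration. Only base noun phrases are handled, and classification uses a plain cosine-similarity KNN.

```python
from collections import Counter
import math

# Hypothetical stand-in for a thesaurus lookup: word -> synonym-class code.
SYNONYM_CLASS = {"car": "VEHICLE", "automobile": "VEHICLE",
                 "film": "MOVIE", "movie": "MOVIE"}

def normalize(word):
    """Map a word to its synonym class so that synonyms share one feature."""
    return SYNONYM_CLASS.get(word, word)

def extract_features(tagged):
    """Count mixed features from a list of (word, tag) pairs.

    Assumed rule (not from the thesis): a base noun phrase is a maximal run
    of ADJ/NOUN tokens with at least two tokens that ends in a NOUN. Such a
    run becomes one phrase feature; other nouns, adjectives, and verbs are
    kept as single-word features; all remaining tags are dropped.
    """
    features = Counter()
    run = []
    for word, tag in tagged + [("", "END")]:  # sentinel flushes the last run
        if tag in ("ADJ", "NOUN"):
            run.append((normalize(word), tag))
            continue
        # Trailing adjectives cannot end a phrase; keep them as single words.
        while run and run[-1][1] != "NOUN":
            features[run.pop()[0]] += 1
        if len(run) >= 2:
            features["_".join(w for w, _ in run)] += 1
        else:
            for w, _ in run:
                features[w] += 1
        run = []
        if tag == "VERB":
            features[normalize(word)] += 1
    return features

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train, query, k=3):
    """Majority label among the k training vectors most similar to query."""
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    train = [
        (extract_features([("fast", "ADJ"), ("car", "NOUN")]), "auto"),
        (extract_features([("new", "ADJ"), ("film", "NOUN")]), "cinema"),
    ]
    query = extract_features([("fast", "ADJ"), ("automobile", "NOUN")])
    print(knn_predict(train, query, k=1))
```

Because "car" and "automobile" map to the same class code before phrases are built, both documents yield the same phrase feature. That is the sense in which synonym merging reduces the dimension of the vector space in this sketch.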
Keywords/Search Tags: Text categorization, Feature extraction, BaseNP, BaseVP