Research On Feature Selection Of Text Classification

Posted on: 2011-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: J Song
Full Text: PDF
GTID: 2248330338496199
Subject: Computer Science and Technology
Abstract/Summary:
With the development of computer technology and the growth of the Internet, the number of documents is increasing exponentially. How to effectively access the information people need has become an urgent problem in information processing. One effective way to manage texts is to classify them, a task known as text classification or text categorization, which helps users locate, organize, and manage their information. This thesis first introduces the concept of text categorization and explains the process and difficulties of text classification. It then analyzes the essential technologies in detail, such as text preprocessing, text representation, and weight computation. Feature selection has always been a key technology and a bottleneck of text classification, so the thesis focuses on feature selection algorithms, studying and evaluating many of them in depth. It proposes a new method, TDE, which applies the traditional feature weighting function TFIDF to feature selection and combines it with information entropy, and it applies TDE to text classification. The thesis also discusses several common text classification methods and compares their advantages and disadvantages. Finally, it discusses the criteria for evaluating text categorization performance. Experimental results show that TDE is feasible compared with traditional feature selection methods.
Keywords/Search Tags: Text categorization, Feature selection, Vector Space Model, TFIDF algorithm
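
The abstract names the two ingredients of TDE (TFIDF weighting and information entropy) but does not give its formula. The sketch below is therefore only an illustration of how those two quantities might be computed and combined to rank candidate features; it is not the thesis's TDE method. The function names and the combination rule (TF-IDF divided by one plus the term's entropy over classes) are assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Sum each term's TF-IDF weight over all documents.

    docs is a list of token lists. TF is the relative frequency of the
    term in a document; IDF is log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    scores = Counter()
    for tokens in docs:
        for term, count in Counter(tokens).items():
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += (count / len(tokens)) * idf
    return dict(scores)


def class_entropy(docs, labels):
    """Entropy (in bits) of each term's occurrences across class labels.

    A low value means the term is concentrated in few classes.
    """
    per_class = {}
    for tokens, label in zip(docs, labels):
        for term in tokens:
            per_class.setdefault(term, Counter())[label] += 1
    entropy = {}
    for term, counts in per_class.items():
        total = sum(counts.values())
        entropy[term] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


def select_features(docs, labels, k):
    """Rank terms by an ASSUMED combination rule and keep the top k.

    The rule tfidf / (1 + entropy) is hypothetical: it rewards terms with
    high TF-IDF that are also concentrated in few classes.
    """
    tfidf = tfidf_scores(docs)
    ent = class_entropy(docs, labels)
    combined = {t: tfidf[t] / (1.0 + ent[t]) for t in tfidf}
    return sorted(combined, key=lambda t: (-combined[t], t))[:k]
```

For a small labeled corpus, `select_features(docs, labels, k)` returns the `k` highest-ranked terms; a term that appears in every document receives an IDF of zero and so falls to the bottom of the ranking.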