A Text Automatic Classification System Of Class-Based Feature Selection Algorithm

Posted on: 2005-07-29
Degree: Master
Type: Thesis
Country: China
Candidate: W Z Jiang
GTID: 2168360125954655
Subject: Computer application technology
Abstract/Summary:
Automatic text classification is an important application area of natural language processing and an efficient, necessary replacement for labor-intensive manual classification. With the development of Internet technology, the network has become an effective platform for exchanging and processing information, and the volume of digital information grows rapidly every day. Manual classification cannot cope with this volume and must be replaced by automatic text classification.

Recent research on automatic text classification has focused mostly on exploring and improving classification algorithms. Feature selection, however, has always been both a key technology and a bottleneck in text classification, so feature selection algorithms merit study.

This dissertation discusses the technologies involved in text classification and analyzes the performance, merits, and defects of the feature selection algorithms commonly used in text classification. From this analysis, the author observes that current feature selection algorithms are all based on term frequency and neglect the class information in the training sample set. To address this, the author proposes a new feature selection algorithm based on class information and designs an English automatic text classification system built on it.

The dissertation then determines an initial threshold value from the classification performance observed under different threshold values. Using this initial threshold and a large real-world text set, it tests the system under several experimental conditions: closed and open tests, different original term dimensions, and comparison with an SVM classifier and a Naive Bayes classifier under the same conditions. The dissertation analyzes the test results and concludes that the algorithm can improve classifier performance to some extent. Furthermore, by comparing the classifier's training time with that of the Naive Bayes classifier, it analyzes the efficiency of the feature selection algorithm and of the classifier based on it.

Finally, drawing on the implementation techniques and test results above, the dissertation offers some opinions on automatic text classification and feature selection algorithms and outlines expectations for future work.
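
The abstract does not give the formula for the class-based algorithm, so the following Python sketch is only an illustration of the general idea of using class information together with a threshold. The scoring rule (the largest share of a term's document frequency that falls in a single class), the function names, and the toy data are all assumptions made for this example; they are not the author's method.

```python
from collections import Counter, defaultdict


def class_concentration_scores(docs, labels):
    """Hypothetical class-aware score: for each term, the largest fraction
    of its document frequency that falls within a single class."""
    df_total = Counter()                 # documents containing the term
    df_by_class = defaultdict(Counter)   # same count, split per class
    for tokens, label in zip(docs, labels):
        for term in set(tokens):         # count each term once per document
            df_total[term] += 1
            df_by_class[label][term] += 1
    return {
        term: max(df_by_class[c][term] for c in df_by_class) / df_total[term]
        for term in df_total
    }


def select_features(scores, threshold):
    """Keep terms whose score reaches the threshold (assumed selection rule)."""
    return sorted(t for t, s in scores.items() if s >= threshold)


# Toy training set: tokenized documents with class labels (made up).
docs = [
    ["stock", "market", "the"],
    ["market", "trade", "the"],
    ["match", "goal", "the"],
    ["goal", "team", "the"],
]
labels = ["finance", "finance", "sport", "sport"]

scores = class_concentration_scores(docs, labels)
selected = select_features(scores, threshold=0.8)
```

In this toy set, "the" is the most frequent term but is spread evenly across both classes, so a class-aware score ranks it low, whereas a purely frequency-based ranking would place it first. The threshold plays the role that the abstract describes as being tuned from classification performance.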
Keywords/Search Tags: Text Automatic Classification, Feature selection, VSM model, SVM, Naive Bayes