The Application And Evaluation Studies Of Multiconlitron In Text Categorization

Posted on: 2017-02-24 | Degree: Master | Type: Thesis
Country: China | Candidate: M L Wang | Full Text: PDF
GTID: 2348330563952180 | Subject: Computer Science and Technology
Abstract/Summary:
Text categorization is the process of automatically labelling text by computer according to predefined criteria. It is a popular research direction in machine learning and usually involves text representation, classifier selection and training, and evaluation. Widely used methods at present include the Naive Bayes text classifier (Naive Bayes), the support vector machine (SVM), and k-Nearest Neighbour (KNN).

The piecewise linear classifier, a hot research topic in pattern recognition, is characterized by a decision surface composed of several hyperplane segments. Compared with a general hypersurface, such a surface requires less memory and is simpler and easier to implement. At the same time, it can approximate hypersurfaces of various shapes, so it is highly adaptable. Scholars have proposed a number of methods for designing such classifiers, including the committee machine, linear programming, and the multiconlitron. As a commonly used framework, the multiconlitron has the particular advantages of low computational cost and strong adaptability. However, its performance in text classification needs further research and evaluation.

In this paper, we propose a text categorization method based on the multiconlitron from the perspective of piecewise learning, and evaluate its performance in text categorization. The work covers the following three aspects:

1) We introduce the multiconlitron framework. The multiconlitron is a general theoretical framework for designing piecewise linear classifiers and is the theoretical basis of this paper. The central task of the paper is therefore to study and evaluate the application of the multiconlitron to classic text classification. In doing so, the main question is how to apply these methods to text classification efficiently. To address it, each text is first represented in a unified form using the vector space model so that it can be processed by computer, and classification algorithms are then applied.

2) We study a range of feature extraction and feature weighting methods. Several widely used feature extraction methods are described, and the effect of the weighting schemes tf.idf, tf.χ², tf.ig and tf.rf on the classification results is studied and assessed comprehensively.

3) We apply the multiconlitron framework to text classification. The text dimension may still be high after feature extraction. To avoid the high memory cost of handling high-dimensional text, we use principal component analysis to reduce the dimension of the sample feature space. A number of experiments were carried out on the Reuters-21578 and RCV14 data sets, comparing the multiconlitron with the traditional support vector machine and k-Nearest Neighbour classifiers, and using evaluation measures such as accuracy, precision, recall and F1 to present and evaluate the performance of the multiconlitron in text classification.
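
As an illustration of two steps the abstract mentions, tf.idf weighting in the vector space model and evaluation by precision, recall and F1, here is a minimal Python sketch. It is not the thesis's implementation: the function names `tfidf_vectors` and `precision_recall_f1` are hypothetical, documents are assumed to be already tokenized, and idf is taken as log(N/df), which is only one common variant.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map tokenized documents to sparse tf.idf vectors (dict: term -> weight).

    Assumption: idf(t) = log(N / df(t)); the thesis may use another variant.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

A term that occurs in every document receives weight 0 under this idf variant, which is why rarer terms dominate the vector; the other schemes named in the abstract (tf.χ², tf.ig, tf.rf) would replace the idf factor with a class-aware statistic.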
Keywords/Search Tags: Text categorization, Piecewise linear classifier, Multiconlitron, Feature weighting