The Application And Evaluation Studies Of Multiconlitron In Text Categorization

Posted on:2017-02-24

Degree:Master

Type:Thesis

Country:China

Candidate:M L Wang

GTID:2348330563952180

Subject:Computer Science and Technology

Abstract/Summary:

Text categorization is the process of having a computer automatically assign labels to texts according to predefined criteria. It is a popular research direction in machine learning and usually involves text representation, classifier selection and training, and evaluation. Widely used methods include the Naive Bayes text classifier, the support vector machine (SVM), and k-Nearest Neighbour (KNN).

The piecewise linear classifier is an active research topic in pattern recognition. Its decision surface is composed of several hyperplane segments. Compared with a general hypersurface, such a surface needs less memory and is simpler and easier to implement. It can also approximate hypersurfaces of many different shapes, so it adapts well to different data. Researchers have proposed a number of ways to design such classifiers, including committee machines, linear programming, and the multiconlitron. The multiconlitron is a commonly used framework and stands out for its low computational cost and strong adaptability. However, its performance in text classification still needs further research and evaluation. In this thesis, we propose a text categorization method based on the multiconlitron from the perspective of piecewise linear learning and evaluate its performance in text categorization. The work covers the following three aspects:

1) We introduce the multiconlitron framework, a general theoretical framework for designing piecewise linear classifiers, which is the theoretical basis of this thesis. The central task is therefore to study and evaluate how the multiconlitron performs in classic text classification. Along the way, we focus on how to apply these methods to text classification efficiently. To this end, each text is first represented in a unified form with the vector space model so that a computer can process it, and classification algorithms are then applied to these representations.
2) We study a range of feature extraction and feature weighting methods. Several widely used methods are described, and their effect on the classification results (tf.idf, tf.χ², tf.ig, tf.rf) is studied and assessed comprehensively.
3) We apply the multiconlitron framework to text classification. The dimensionality of the text may remain high after feature extraction, and processing high-dimensional text has a high memory cost. To avoid this cost, we use principal component analysis (PCA) to reduce the dimension of the sample feature space. We carried out a number of experiments on the Reuters-21578 and RCV1₄ data sets that compare the multiconlitron with the traditional support vector machine and k-Nearest Neighbour. Accuracy, precision, recall, and F1 are used to present and evaluate the performance of the multiconlitron in text classification.
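The abstract names four weighting schemes (tf.idf, tf.χ², tf.ig, tf.rf) but gives no formulas. The sketch below assumes the standard textbook forms of each. All four multiply a raw term frequency by a per-term global factor. The three supervised factors are computed from a one-vs-rest contingency table of document counts. Relevance frequency follows the common definition rf = log2(2 + A / max(1, B)). The thesis may use variants, such as log-scaled tf, length normalization, or a different way of combining χ² and IG across several categories. The names `contingency`, `weight_vectors` and `SCHEMES` are hypothetical.

```python
import math
from collections import Counter


def contingency(term, docs, labels):
    """One-vs-rest document counts for `term` (label 1 = positive category):
    A = positive docs containing it, B = negative docs containing it,
    C = positive docs without it,    D = negative docs without it."""
    A = B = C = D = 0
    for tokens, y in zip(docs, labels):
        present = term in tokens
        if y == 1:
            if present:
                A += 1
            else:
                C += 1
        elif present:
            B += 1
        else:
            D += 1
    return A, B, C, D


def _entropy(*counts):
    """Shannon entropy in bits of the distribution given by raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def idf(A, B, C, D):
    df, n = A + B, A + B + C + D
    return math.log(n / df) if df else 0.0


def chi2(A, B, C, D):
    n = A + B + C + D
    den = (A + C) * (B + D) * (A + B) * (C + D)
    return n * (A * D - B * C) ** 2 / den if den else 0.0


def ig(A, B, C, D):
    # class entropy minus the expected entropy after observing the term
    n = A + B + C + D
    return (_entropy(A + C, B + D)
            - (A + B) / n * _entropy(A, B)
            - (C + D) / n * _entropy(C, D))


def rf(A, B, C, D):
    # relevance frequency: favours terms concentrated in the positive category
    return math.log2(2 + A / max(1, B))


SCHEMES = {"idf": idf, "chi2": chi2, "ig": ig, "rf": rf}


def weight_vectors(docs, labels, scheme="idf"):
    """Return (vocabulary, one vector of tf * factor per document)."""
    doc_sets = [set(tokens) for tokens in docs]
    vocab = sorted(set().union(*doc_sets))
    factor = {t: SCHEMES[scheme](*contingency(t, doc_sets, labels))
              for t in vocab}
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term frequency, no normalisation
        vectors.append([tf[t] * factor[t] for t in vocab])
    return vocab, vectors
```

Take the toy corpus `[["oil", "price", "rise"], ["oil", "export"], ["game", "team", "win"], ["team", "oil"]]` with labels `[1, 1, 0, 0]`. Here "oil" occurs in both positive documents and in one negative document. Under tf.rf its weight in the first document is therefore 1 × log2(2 + 2/1) = 2.0.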
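The abstract uses PCA to shrink the feature space after weighting but does not give the number of components or the implementation. As a dependency-free illustration, the sketch below finds the top-k components by power iteration with deflation on the sample covariance matrix. The function name `pca` and its parameters `k`, `iters` and `seed` are assumptions.

```python
import math
import random


def pca(X, k, iters=200, seed=0):
    """Project the rows of X onto their top-k principal components.

    Power iteration with deflation on the sample covariance matrix.
    Returns (mean, components, projected rows); assumes len(X) >= 2."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]  # centred data
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        # random unit start vector for power iteration
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in the deflated matrix
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: variance captured along v
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # deflation: remove this direction before searching for the next one
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    Z = [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in Xc]
    return mean, components, Z
```

This sketch builds a dense d × d covariance matrix, which is exactly the memory cost the abstract is trying to avoid. For vocabularies of Reuters-21578 size, an SVD-based routine is the practical choice. Examples are scikit-learn's `PCA`, or `TruncatedSVD` on the sparse document matrix.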
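The claim that a piecewise linear decision surface is built from hyperplane segments can be made concrete with a multiconlitron's decision rule. The sketch below assumes the max-min form. A conlitron is a set of linear functions and accepts a point when all of them are positive. A multiconlitron is a set of conlitrons and accepts a point when at least one of them does. Training the weights from data is the substance of the framework and is not shown. The two-box model below is hand-picked and hypothetical, and all function names are assumptions.

```python
def linear(w, b, x):
    """Value of the linear function w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def conlitron_score(conlitron, x):
    """A conlitron is a list of (w, b) pairs; x is inside its convex region
    when every linear function is positive, i.e. when the minimum is > 0."""
    return min(linear(w, b, x) for w, b in conlitron)


def multiconlitron_score(multiconlitron, x):
    """Union of conlitrons: x is positive when some conlitron accepts it."""
    return max(conlitron_score(c, x) for c in multiconlitron)


def classify(multiconlitron, x):
    return 1 if multiconlitron_score(multiconlitron, x) > 0 else -1


# Hypothetical hand-picked model: the positive region is the union of the
# squares [0,1]x[0,1] and [2,3]x[0,1], which together are not convex.
BOX_A = [((1, 0), 0), ((-1, 0), 1), ((0, 1), 0), ((0, -1), 1)]
BOX_B = [((1, 0), -2), ((-1, 0), 3), ((0, 1), 0), ((0, -1), 1)]
MODEL = [BOX_A, BOX_B]
```

Here `classify(MODEL, (0.5, 0.5))` and `classify(MODEL, (2.5, 0.5))` both return 1. The point (1.5, 0.5) lies in the gap between the squares and scores -0.5, so it is rejected. A single hyperplane could not separate this non-convex region from its surroundings, but eight hyperplane segments combined by min and max can.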
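The four evaluation measures named in the abstract can be computed from one category's confusion counts. The sketch below treats one category as positive. The abstract does not say how per-category results on Reuters-21578 and RCV1₄ are averaged (micro or macro), so the sketch stops at a single category. The function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred, positive=1):
    """Accuracy, plus precision, recall and F1 for the `positive` category."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0 when nothing predicted
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For `y_true = [1, 1, 1, -1, -1, -1]` and `y_pred = [1, 1, -1, 1, -1, -1]` there are 2 true positives, 1 false positive and 1 false negative. Accuracy is therefore 4/6, and precision, recall and F1 are all 2/3.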

Keywords/Search Tags:

Text categorization, Piecewise linear classifier, Multiconlitron, Feature weighting

Related items

1	Research On Construction Of Piecewise Linear Classifiers In The Multiconlitron Framework
2	A Study On Chinese Text Categorization
3	The Research Of Multiclass Categorization Algorithm Based On Multiconlitron
4	Studies On Some Essential Problems In Automatic Text Categorization
5	The Research And Implementation Of Automatic Text Categorization For Chinese Web Documents
6	The Text Categorization Algorithm Based On Nearest Subspace Search
7	A Study On Chinese Text Automatic Categorization
8	A Study On M3-kNN Network And Application In Text Categorization
9	Research On The Method Of Chinese Text Categorization Based On Machine Learning
10	Research On XML Text Categorization Based On Bayesian Classifier