Algorithm Research For Multi-label Chinese Text Classification

Posted on:2015-12-09

Degree:Master

Type:Thesis

Country:China

Candidate:H Zhou

Full Text:PDF

GTID:2298330452464145

Subject:Computer technology

Abstract/Summary:

With the explosive growth of information on the Internet, text classification has gradually become a core technology in data mining, and evaluating the performance of classification algorithms has become an important issue in its own right. With multi-label text classification, a piece of information can be located quickly and accurately among its related topics and categories. At present, work in this field focuses mainly on up-front feature selection and on the research and development of classification algorithms. This thesis concentrates on features that discriminate strongly between categories (strong category texture): it deletes sparse features and retains the feature dimensions that are useful for classification. Then, based on the feature selection results, it proposes an Adaptive Algorithm for Multi-Label Classification Based on Related Information Weighting, which is characterized by weighting of single-label classification results, adaptive threshold adjustment, and related information noting.

For feature selection in Chinese multi-label text, this thesis aims to find an effective feature selection method that reduces the dimensionality of the feature space and improves classification accuracy and efficiency. A feature whose frequency is unevenly distributed across document classes, that is, one whose distribution is highly dispersed, tends to be more important for determining the category, and this property can be used to assess how important a feature is for classification. Dispersion is usually measured by the standard deviation or the variance. This thesis uses the standard deviation of a feature's probability across document classes to quantify the feature's importance, and uses this standard deviation as the basic weight in text categorization. Experimental results show that, on some commonly used performance measures for multi-label text categorization, the algorithm outperforms existing feature selection methods. It combines several feature evaluation functions to choose the feature subset, is not limited to a specific text corpus, and reduces the effect of "noise".

For the multi-label Chinese text classification algorithm, this thesis considers the problems of existing multi-label algorithms and ways to improve them. It proposes an algorithm that adjusts the feature selection baseline, weights the results of existing single-label classifiers, sets thresholds adaptively, and combines different weighted voting methods, so that an instance to be classified can be assigned multiple labels. This can improve the accuracy and precision of multi-label text classification. Experimental results show that the proposed algorithm is a more effective and more reliable multi-label classification algorithm, and that it outperforms some commonly used multi-label classification methods on some performance measures.
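The feature-weighting idea in the abstract, scoring a feature by the standard deviation of its probability across document classes, can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the following is an assumption: a feature's per-class probability is taken to be the fraction of documents in that class that contain the feature, and the population standard deviation is used. The function names `std_dev_feature_scores` and `select_top_features` and the toy corpus are hypothetical and not taken from the thesis.

```python
import math
from collections import defaultdict


def std_dev_feature_scores(docs, labels):
    """Score each term by the standard deviation of its per-class probability.

    Assumption (not from the thesis): the per-class probability of a term is
    the fraction of documents in that class that contain the term.
    """
    class_doc_count = defaultdict(int)
    term_class_count = defaultdict(lambda: defaultdict(int))

    for tokens, label in zip(docs, labels):
        class_doc_count[label] += 1
        for term in set(tokens):  # count each term once per document
            term_class_count[term][label] += 1

    classes = list(class_doc_count)
    scores = {}
    for term, per_class in term_class_count.items():
        probs = [per_class.get(c, 0) / class_doc_count[c] for c in classes]
        mean = sum(probs) / len(probs)
        variance = sum((p - mean) ** 2 for p in probs) / len(probs)
        scores[term] = math.sqrt(variance)  # population standard deviation
    return scores


def select_top_features(scores, k):
    """Keep the k terms with the highest score (ties broken by term text)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy, pre-segmented corpus (hypothetical data for illustration only).
docs = [
    ["股票", "市场", "的"],
    ["股票", "上涨", "的"],
    ["足球", "比赛", "的"],
    ["足球", "进球", "的"],
]
labels = ["finance", "finance", "sports", "sports"]

scores = std_dev_feature_scores(docs, labels)
top_terms = select_top_features(scores, 2)
```

In this toy corpus, a term that appears equally often in every class (such as "的") receives a score of zero, while a term concentrated in a single class receives a high score, which matches the intuition stated in the abstract that unevenly distributed features are more informative about the category.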

Keywords/Search Tags:

Multi-Label, Feature Selection, Adaptive Regression, Related Information Weighted Noting, Strong Category Texture

Related items

1	Research On Label Weighted Multi-label Feature Selection Algorithm
2	Research On Multi-label Feature Selection Based On Weighted Labels And Consistent Neighborhood
3	The Research Of Multi-Label Learning Problem About Feature Selection And Classification
4	Streaming Feature Selection Algorithm Research For Multi-label Classification
5	Research On Acquisition And Application Of Label Correlation In Multi-label Learning
6	Multi-label Feature Selection Method In The Context Of Missing Labels
7	Research On The Multi-label Feature Selection And Classification Methods With The Label Correlations
8	Research On Feature Selection Algorithm Based On Multi-label
9	Research On Multi-label Feature Selection Algorithm Based On Sparse Learning
10	Feature Selection Research For Multi-label And Weak-label Based On Fuzzy Entropy