
Arithmetic Research For Multi-label Chinese Text Classification

Posted on: 2015-12-09  Degree: Master  Type: Thesis
Country: China  Candidate: H Zhou  Full Text: PDF
GTID: 2298330452464145  Subject: Computer technology
Abstract/Summary:
With the explosive growth of information on the Internet, text classification has gradually become a core technology in data mining, and evaluating the performance of classification algorithms has become an important issue in its own right. Label-based text classification allows information to be located quickly and accurately under its related topics and categories. Current work in this field focuses mainly on up-front feature selection and on the research and development of classification algorithms. This thesis concentrates on features with a high degree of category discrimination ("strong category texture"): it deletes sparse features and retains the feature dimensions that are useful for classification. Then, based on the feature selection results, it proposes an Adaptive Algorithm for Multi-Label Classification Based on Related Information Weighting, which is characterized by weighting of single-label classification results, adaptive threshold adjustment, and noting of related information.

For feature selection on labeled Chinese text, this thesis aims to find an effective method that reduces the dimensionality of the feature space and improves classification accuracy and efficiency. A feature whose frequency is unevenly distributed across document classes, that is, a feature with a dispersed distribution, tends to be more important for identifying categories, and this property can be used to measure how important a feature is for classification. Dispersion is usually measured by the standard deviation or the variance. This thesis uses the standard deviation of a feature's probability across document classes as a quantitative measure of feature importance, and uses this value as the basic weight in text categorization. Experimental results show that the algorithm outperforms existing feature selection methods on some commonly used performance measures for multi-label text categorization. It combines several feature evaluation functions to choose the feature subset, is not limited to a specific text corpus, and reduces the effect of "noise".

For multi-label Chinese text classification, this thesis examines the problems of existing multi-label algorithms and proposes an improvement: an algorithm that adjusts the feature selection benchmark and, building on existing single-label classification results, applies weighting, adaptive threshold setting, and a combination of different weighted voting methods to instances that carry multiple labels. This can improve the accuracy and precision of multi-label text classification. Experimental results show that the algorithm is more effective and more reliable, and that on some performance measures it outperforms some commonly used multi-label classification methods.
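
The abstract describes scoring each feature by the standard deviation of its probability across document classes. The sketch below illustrates that idea only; it is not the thesis's implementation. It assumes that a feature's probability in a class is the fraction of that class's documents containing the term, that the population standard deviation is used, and that documents are already segmented into tokens. The function names `class_probability_std` and `select_top_k` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def class_probability_std(docs, labels):
    """Score each term by the std dev of its per-class document probability.

    Assumption: P(term | class) is estimated as the share of documents in
    the class that contain the term at least once.
    """
    class_sizes = defaultdict(int)
    doc_freq = defaultdict(lambda: defaultdict(int))  # term -> class -> count
    for tokens, label in zip(docs, labels):
        class_sizes[label] += 1
        for term in set(tokens):  # count each term once per document
            doc_freq[term][label] += 1

    classes = sorted(class_sizes)
    scores = {}
    for term, per_class in doc_freq.items():
        probs = [per_class.get(c, 0) / class_sizes[c] for c in classes]
        # A term spread evenly over all classes scores 0; a term
        # concentrated in few classes scores higher.
        scores[term] = pstdev(probs)
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus; in practice the tokens would come from a Chinese word segmenter.
docs = [
    ["stock", "market", "today"],
    ["stock", "fund", "today"],
    ["soccer", "match", "today"],
    ["soccer", "player", "today"],
]
labels = ["finance", "finance", "sports", "sports"]

scores = class_probability_std(docs, labels)
selected = select_top_k(scores, 2)
```

In this toy corpus, "today" appears in every document of both classes, so its score is zero and it is dropped, while "stock" and "soccer" each occur in only one class and rank highest.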
Keywords/Search Tags: Multi-Label, Feature Selection, Adaptive Regression, Related Information Weighted Noting, Strong Category Texture