
Research On Text Classification Based On Ensemble Learning

Posted on: 2019-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: P P Li
GTID: 2428330545457839
Subject: Computer application technology
Abstract/Summary:
With the explosive growth of data on the Internet, the cost of retrieving the many types of data available keeps rising, so classifying information conveniently and efficiently has become very important. Text classification makes information retrieval easier, more accurate, and more efficient. In recent years, several new techniques and ideas have been applied to text classification, but aspects such as feature weighting algorithms and the application of ensemble learning still leave room for improvement. Building on a thorough study of the basic theory and framework of text classification, this paper improves traditional methods to raise the accuracy and generalization ability of text classification.

The traditional feature weighting algorithm TF-IDF ignores both how documents are distributed across categories and how they are distributed among the documents of a single category. This paper proposes two factors, introC and interC, that take this distribution information into account, and compares them with traditional algorithms; combining the two yields a higher F1 score. The introC factor also considers the distribution of documents between categories, which effectively improves the classification algorithm's ability to handle unbalanced datasets. Experiments are designed to verify the effectiveness of the resulting improved algorithm, TF-IDF-dist.

Mainstream ensemble learning algorithms are based on homogeneous classifiers, which obtain diverse base classifiers by perturbing the training samples. Based on experimental results showing that heterogeneous classifiers offer rich diversity, this paper designs an ensemble learning model built on heterogeneous classifiers with multi-angle perturbation. By enriching the diversity of the base classifiers through perturbation of the feature selection algorithm, the feature dimension, and the classifier parameters, the model effectively improves the accuracy and generalization ability of classification. Experiments verify the feasibility and effectiveness of this algorithm.
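The abstract does not give the formulas for introC and interC, so the sketch below only illustrates the general idea under stated assumptions. Classic TF-IDF weights a term t in document d by tf(t, d) · log(N / df(t)) and never looks at class labels. The hypothetical `class_aware_tf_idf` multiplies that weight by two assumed, illustrative factors: a between-class factor (the share of documents containing t that fall in t's most frequent class) and a within-class factor (the share of documents in d's own class that contain t). These are stand-ins chosen for the example, not the thesis's definitions of introC, interC, or TF-IDF-dist.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Classic TF-IDF: (count / doc length) * log(N / df). Labels are never used."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return out


def class_aware_tf_idf(docs, labels):
    """Hypothetical variant: scales TF-IDF by two ASSUMED distribution factors.

    Not the thesis's introC/interC; the abstract does not define them.
    Needs labels, so it applies to labelled training documents.
    """
    base = tf_idf(docs)
    df = Counter()
    df_by_class = defaultdict(Counter)  # class -> term -> docs in class containing term
    class_size = Counter(labels)
    for doc, y in zip(docs, labels):
        terms = set(doc)
        df.update(terms)
        df_by_class[y].update(terms)
    out = []
    for weights, y in zip(base, labels):
        scaled = {}
        for t, w in weights.items():
            # Between-class factor: 1.0 if t occurs in only one class.
            between = max(df_by_class[c][t] for c in class_size) / df[t]
            # Within-class factor: how widely t is spread over d's own class.
            within = df_by_class[y][t] / class_size[y]
            scaled[t] = w * between * within
        out.append(scaled)
    return out
```

For example, with two "sport" and two "finance" documents, a term like "goal" that also appears in a finance document is down-weighted by the between-class factor, while a sport-only term like "team" keeps its full TF-IDF weight.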
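The ensemble idea above can be sketched as follows, assuming documents are token lists and using only the Python standard library. Each member is trained on a perturbed view of the data: a randomly sized feature subset (feature dimension), chosen either by document frequency or at random (feature selection method), and, for naive Bayes members, a randomly drawn smoothing parameter `alpha` (classifier parameters). Members alternate between two classifier families to make the ensemble heterogeneous, and predictions are combined by majority vote. All names (`NaiveBayes`, `NearestCentroid`, `select_features`, `train_ensemble`, `predict_ensemble`), the two classifier families, the parameter ranges, and plain majority voting are hypothetical choices for illustration, not the configuration used in the thesis.

```python
import math
import random
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over token lists; alpha is the smoothing parameter."""

    def __init__(self, alpha):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for d in docs for t in d}
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for d, y in zip(docs, labels):
            self.counts[y].update(d)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)

        def score(c):
            denom = self.totals[c] + self.alpha * v
            return self.log_prior[c] + sum(
                math.log((self.counts[c][t] + self.alpha) / denom)
                for t in doc if t in self.vocab
            )

        return max(self.classes, key=score)


class NearestCentroid:
    """Predicts the class whose summed term-count vector is most cosine-similar."""

    def fit(self, docs, labels):
        self.centroids = defaultdict(Counter)
        for d, y in zip(docs, labels):
            self.centroids[y].update(d)
        return self

    def predict(self, doc):
        q = Counter(doc)
        q_norm = math.sqrt(sum(x * x for x in q.values()))

        def cosine(c):
            vec = self.centroids[c]
            norm = q_norm * math.sqrt(sum(x * x for x in vec.values()))
            return sum(q[t] * vec[t] for t in q) / norm if norm else 0.0

        return max(sorted(self.centroids), key=cosine)


def select_features(docs, k, method, rng):
    """Keep k terms, ranked by document frequency ("df") or drawn at random."""
    vocab = sorted({t for d in docs for t in d})
    if method == "df":
        df = Counter(t for d in docs for t in set(d))
        return set(sorted(vocab, key=lambda t: (-df[t], t))[:k])
    return set(rng.sample(vocab, k))


def train_ensemble(docs, labels, n_members=6, seed=0):
    rng = random.Random(seed)
    vocab_size = len({t for d in docs for t in d})
    members = []
    for i in range(n_members):
        # Perturb the feature dimension k and the feature-selection method.
        k = rng.randint(vocab_size // 2 + 1, vocab_size)
        feats = select_features(docs, k, rng.choice(["df", "random"]), rng)
        projected = [[t for t in d if t in feats] for d in docs]
        # Alternate classifier families; perturb naive Bayes' smoothing parameter.
        if i % 2 == 0:
            model = NaiveBayes(alpha=rng.uniform(0.1, 1.0))
        else:
            model = NearestCentroid()
        members.append((feats, model.fit(projected, labels)))
    return members


def predict_ensemble(members, doc):
    """Majority vote; each member sees only its own feature subset."""
    votes = Counter(m.predict([t for t in doc if t in feats]) for feats, m in members)
    return votes.most_common(1)[0][0]
```

Each member is stored together with its feature set so that the same projection used in training is applied at prediction time; without that, a member would be scored on terms it never saw.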
Keywords/Search Tags: Text Classification, TF-IDF, Unbalanced Dataset, Ensemble Learning