Research On Text Classification Based On Active Self-Paced Learning

Posted on:2019-12-07

Degree:Master

Type:Thesis

Country:China

Candidate:T Mei

Full Text:PDF

GTID:2428330572955611

Subject:Computer system architecture

Abstract/Summary:

Text is one of the most common carriers of information, and the growing volume of it has made text classification an important task. Traditional text classification methods often fail to reach satisfactory accuracy and require a large amount of manual annotation, which is labor-intensive. How to improve classification accuracy while reducing the annotation effort has therefore become a significant problem.

This thesis first analyzes existing text classification methods. Because manual annotation is expensive, only part of the unlabeled text data can be annotated, yet supervised learning methods often perform poorly when labeled samples are scarce. To address this, semi-supervised learning methods such as self-training and co-training have been proposed. By exploiting unlabeled samples they improve classification to some extent, but the results are still unsatisfactory. To improve accuracy, we use active learning to select highly informative samples for annotation during the iterative process. We then introduce self-paced learning and propose a text classification algorithm based on active self-paced learning to further improve classification performance.

To address the low accuracy of text classification when few samples are manually annotated, a text classification algorithm based on active learning is proposed. First, the feature extraction scheme is studied and improved: the word2vec model is used to build text features based on word meaning, which are combined with frequency-based TF-IDF features to form a comprehensive text feature. Next, a small number of samples are manually annotated and an SVM classifier is trained on them as the initial classification model. The model is used to evaluate how informative each unlabeled sample is; the most informative samples are selected, manually annotated, and added to the training set, and the final text classification model is obtained after several iterations.

Active learning alone selects only the informative samples and leaves the remaining unlabeled samples unused. To address this, we introduce self-paced learning and propose a text classification method based on active self-paced learning. In each iteration, the SVM classification model predicts labels for the unlabeled samples and computes a confidence for each prediction. Samples with higher confidence are automatically pseudo-labeled with their predicted labels. The pseudo-labeled samples from the self-paced learning stage and the manually labeled samples from the active learning stage are then added to the training set to update the classification model.

We conduct experiments with these methods on the Sohu News and Sina News datasets. The results show that the text classification method based on active learning significantly improves classification accuracy. The method based on active self-paced learning improves accuracy further and, with fewer annotations, reaches the accuracy that the supervised learning method achieves only with a large amount of manual annotation.
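
The sample selection inside one active self-paced iteration can be illustrated with a minimal sketch. The abstract does not say how informativeness or confidence is measured, so the sketch assumes the highest predicted class probability as the confidence score. The function name `split_by_confidence` and the parameters `query_budget` and `pseudo_threshold` are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple


def split_by_confidence(
    probabilities: Dict[str, List[float]],
    query_budget: int,
    pseudo_threshold: float,
) -> Tuple[List[str], Dict[str, int]]:
    """Choose samples for manual annotation and for pseudo-labeling.

    `probabilities` maps a sample id to its predicted class probabilities,
    for example calibrated outputs of a classifier (an assumption here).
    """
    # Assumed confidence measure: the highest class probability.
    confidence = {sid: max(p) for sid, p in probabilities.items()}

    # Active learning stage: the least confident samples are treated as the
    # most informative ones and are sent to a human annotator.
    ranked = sorted(confidence, key=lambda sid: (confidence[sid], sid))
    to_annotate = ranked[:query_budget]

    # Self-paced stage: remaining samples whose confidence reaches the
    # threshold receive their predicted class index as a pseudo-label.
    pseudo_labels: Dict[str, int] = {}
    for sid in ranked[query_budget:]:
        if confidence[sid] >= pseudo_threshold:
            probs = probabilities[sid]
            pseudo_labels[sid] = probs.index(max(probs))
    return to_annotate, pseudo_labels


# Toy predictions for four unlabeled documents and two classes.
probs = {
    "doc_a": [0.52, 0.48],
    "doc_b": [0.95, 0.05],
    "doc_c": [0.10, 0.90],
    "doc_d": [0.60, 0.40],
}
to_annotate, pseudo = split_by_confidence(
    probs, query_budget=1, pseudo_threshold=0.9
)
```

In this toy run `doc_a` would be queued for manual annotation, `doc_b` and `doc_c` would be pseudo-labeled, and `doc_d` would stay unlabeled for a later iteration. Both groups would then be added to the training set before the classifier is retrained.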

Keywords/Search Tags:

Text Classification, Active Learning, Self-paced Learning, Feature Extraction

Related items

1	Research On Chinese Text Classification Algorithm Based On Active Learning Approach
2	Research On Feature Description And Classifier Construction Algorithm In Chinese Text Classification
3	The Deep Self-paced Learning Approachs For Fully Polarimetric SAR Image Classification
4	Research On High Performance Chinese Text Classification Based On Machine Learning
5	Robust Semi-supervised Classification Method Search For Noisy Labels Based On Self-paced Learning
6	Design And Implementation Of Text Classification System Based On Active Learning
7	Research On Information Parsing Based On Text Classification
8	The Design And Implement Of A Mongolian Text Classifier Based On Active Learning SVM
9	Research On Key Techniques And Applications In Text Classification
10	Research Of Dimension Reduction Algorithm And Its Application