
Research And Application Of Chinese Text Classification Based On Active Learning

Posted on: 2021-10-05 | Degree: Master | Type: Thesis
Country: China | Candidate: J J Ming | Full Text: PDF
GTID: 2518306512987849 | Subject: Software engineering
Abstract/Summary:
Chinese text classification plays an important role in network information management and network platform construction. Domestic information publishing and exchange platforms rely mainly on Chinese text to transmit information, and as the number of users grows, the volume of new text grows rapidly as well. The platforms' personalized recommendation and spam filtering functions are built mainly on text classification technology. However, as information spreads across the network, texts belonging to fresh categories repeatedly appear in explosive numbers. The need to annotate a large number of samples for each new category leads to poor text classification, which in turn affects the related functions. Fast and accurate text classification in scenarios with high real-time requirements is therefore of great significance for keeping network platform functions running normally. Against this background, this thesis studies Chinese text classification based on active learning, which reduces the number of labeled samples needed while preserving the performance of the text classifier. The main research work of this thesis is as follows:

1. A topic paragraph vector (TPV) representation model for active learning text classification is proposed. Topic information is extracted from the text data with a topic model, and the topic vector of each text is integrated into the training process of the paragraph vector, so that the vector representation of the text includes global topic semantics. This provides a data source for the subsequent active learning Chinese text classification model and reduces the computational pressure of feature extraction in the classification model.

2. An active learning Chinese text classification model based on measuring sample density and information entropy weighting (MSD-IEW) is proposed. To make the initial sample set representative, the MSD algorithm is introduced to select it. Considering the value of each sample to the classifier, the IEW algorithm is proposed to perform weighted training on the samples and complete the classification of Chinese text. The number of labeled samples is reduced while the performance of the text classifier is preserved.

3. The TPV text representation model and the MSD-IEW active learning model are combined to design and implement a Chinese text classification system based on active learning. The system requirements are analyzed, the overall system framework is designed, the structure, implementation method, and working mechanism of each module are described in detail, and the system is tested. The results show that the system can effectively classify Chinese texts while reducing sample annotation.
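
The abstract does not give the formulas behind IEW, so the following is only a minimal sketch of generic entropy-based uncertainty scoring, the standard idea that the keywords "Active Learning" and "Uncertainty" point to. It is not the thesis's MSD-IEW algorithm. The function names `prediction_entropy`, `select_most_uncertain`, and `entropy_weights`, as well as the normalization used for the weights, are assumptions made for illustration.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    # Terms with p == 0 contribute nothing, so skip them to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(prob_rows, k):
    """Return indices of the k samples with the highest entropy.

    prob_rows holds one list of class probabilities per unlabeled sample.
    Ties keep their original order because Python's sort is stable.
    """
    ranked = sorted(
        range(len(prob_rows)),
        key=lambda i: prediction_entropy(prob_rows[i]),
        reverse=True,
    )
    return ranked[:k]


def entropy_weights(prob_rows):
    """Hypothetical weighting: each sample's share of the total entropy."""
    entropies = [prediction_entropy(row) for row in prob_rows]
    total = sum(entropies)
    if total == 0.0:
        # Every prediction is fully confident; fall back to uniform weights.
        return [1.0 / len(entropies)] * len(entropies)
    return [e / total for e in entropies]


# Hypothetical classifier outputs for three unlabeled texts, three classes.
pool = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # almost uniform, high entropy
    [0.60, 0.30, 0.10],  # in between
]
print(select_most_uncertain(pool, 2))  # [1, 2]
```

In an active learning loop, the selected indices would be sent for annotation and the classifier retrained. The weights are one possible way to let informative samples count more during training, under the assumptions stated above.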
Keywords/Search Tags: Text Classification, Active Learning, Paragraph Vector Model, Topic Model, Uncertainty