
Application of Weakly Supervised Learning to Text Classification

Posted on: 2021-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: S S Liu
GTID: 2428330632963026
Subject: Information and Communication Engineering
Abstract/Summary:
Text classification is a fundamental task in natural language processing. As the volume of online information grows rapidly, text classification helps reduce information clutter and supports the accurate retrieval and use of information. Neural network models have achieved good results on text classification and are widely used, but the lack of training data remains the key bottleneck in many practical scenarios. Training a text classifier that performs well and generalizes strongly usually requires a labeled corpus on the order of millions of documents. Collecting such data requires domain experts to read millions of documents and label them carefully using domain knowledge, which is too expensive and difficult to achieve. In practice, researchers often have only a small amount of labeled data. How to make effective use of unlabeled data for text classification has therefore become an important research direction in natural language processing.

Given the current state of weakly supervised learning for text classification, this thesis explores two weakly supervised text classification methods, one based on an autoencoder and one based on co-training (collaborative training). The first model uses an autoencoder to learn from unlabeled data. During training, the hidden-layer neurons of the autoencoder compete with each other, which guides the autoencoder to focus on the features that are most informative for text classification, so that the model learns text features that are meaningful for classification. The second method is a semi-supervised text classification approach based on co-training, which optimizes the semi-supervised classification task through collaborative models and collaborative rules.

The experimental results show that the proposed methods can effectively utilize unlabeled data and improve the performance of the text classifier. Using weak supervision to address the scarcity of labeled data in text classification saves manpower and material resources, makes full use of unlabeled data, and greatly reduces the cost of manual labeling. In addition, weakly supervised methods can be extended to other tasks, can provide some reference and inspiration for other major deep learning tasks, and are of high value for addressing the scarcity of labeled data in deep learning.
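
The abstract does not say how the hidden-layer neurons compete. One common way to realize such competition is a k-winners-take-all rule, in which only the strongest hidden activations for each input are kept and the rest are set to zero. The sketch below illustrates that rule only; the function name `k_winners_take_all`, the parameter `k`, and the choice of ranking by absolute activation are assumptions made for illustration, not the thesis's confirmed method.

```python
def k_winners_take_all(activations, k):
    """Keep the k largest-magnitude hidden activations and zero out the rest.

    Illustrative assumption: competition is modeled as a simple top-k
    selection over one hidden-layer activation vector.
    """
    if k <= 0:
        return [0.0] * len(activations)
    # Rank hidden-unit indices by activation magnitude, strongest first.
    ranked = sorted(
        range(len(activations)),
        key=lambda i: abs(activations[i]),
        reverse=True,
    )
    winners = set(ranked[:k])
    # Losing units are silenced, so only the winners contribute downstream.
    return [a if i in winners else 0.0 for i, a in enumerate(activations)]


# Hypothetical hidden-layer activations for a single document.
hidden = [0.1, -0.9, 0.4, 0.05, 0.7]
sparse = k_winners_take_all(hidden, k=2)
print(sparse)  # [0.0, -0.9, 0.0, 0.0, 0.7]
```

In a full autoencoder, a rule like this would be applied to the hidden layer before decoding, so that the reconstruction loss updates only the winning units for each document.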
Keywords/Search Tags:Text classification, Weak supervision, Autoencoder, Collaborative training