
Research On Short Text Classification Of Semi-supervised Pre-training Based On Autoencoders And Word Order Dependencies

Posted on: 2021-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: B Guan
Full Text: PDF
GTID: 2428330614465909
Subject: Software engineering
Abstract/Summary:
With the development of information technology and the arrival of the era of intelligence, global information reserves are growing exponentially. As an important carrier of information interaction, short text is especially active on social networks with large user bases and in everyday comments. These unstructured short texts contain a great deal of valuable information that is labor-intensive and expensive to extract manually. Therefore, using machine learning to annotate the vast quantity of unlabeled short texts on the Internet, and to organize and manage short-text data efficiently, has become one of the hot topics in natural language processing (NLP).

Pre-trained language models based on deep learning have been shown to improve text classification effectively. The basic idea is to pre-train a language model on a large amount of unlabeled text and then fine-tune it on a supervised downstream task. However, these models require large amounts of reliable data and industry-scale computing resources, which limits their use in resource-constrained environments. In addition, compared with long text, short text classification faces the difficulties of fewer feature words and irregular diction. Short text classification is therefore generally optimized at the preprocessing, text representation, and classifier construction stages to improve both the speed and the accuracy of classification.

Motivated by these requirements and problems, this thesis focuses on a lightweight semi-supervised pre-training method for short text classification. First, a variational document model is pre-trained on a large number of unlabeled short texts to extract the probability distribution of the latent variables underlying the text data; the internal state of the pre-trained model is then used as the feature input of the downstream classifier. As a variant of the generative model, this method achieves competitive results on short text classification under limited data and computation. However, the existing model still has problems that need to be optimized; to address them, DPCNN and the Free Bits technique are introduced. Experimental results show that the improved model outperforms the original model on text classification tasks.
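The Free Bits technique mentioned above is usually described as flooring the per-dimension KL term of the variational objective so that latent dimensions cannot collapse onto the prior. Below is a minimal NumPy sketch of that idea for a diagonal-Gaussian posterior against a standard-normal prior; the function names and the floor value `lam` are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def gaussian_kl_per_dim(mu, log_var):
    """KL(q(z|x) || N(0, I)), computed separately for each latent dimension."""
    return 0.5 * (np.exp(log_var) + mu ** 2 - 1.0 - log_var)

def free_bits_kl(mu, log_var, lam=0.5):
    """Free Bits: floor each dimension's KL at `lam` nats, so the optimizer
    gains nothing by pushing individual latent dimensions all the way to the
    prior (the posterior-collapse failure mode of text VAEs)."""
    kl_per_dim = gaussian_kl_per_dim(mu, log_var)
    return np.maximum(kl_per_dim, lam).sum()

# A posterior identical to the prior has zero KL in every dimension,
# but Free Bits still charges lam nats per dimension:
mu, log_var = np.zeros(4), np.zeros(4)
print(free_bits_kl(mu, log_var))  # 2.0 with lam=0.5 over 4 dimensions
```

In a full training loop this clamped KL would replace the plain KL term in the evidence lower bound, alongside the reconstruction loss; the pre-trained encoder's latent state (e.g. `mu`) would then serve as the feature vector for the downstream classifier.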
Keywords/Search Tags: Short Text Classification (STC), Semi-supervised, Variational Autoencoder, Neural Network, Pre-trained Language Model