
Research Of Text Classification Algorithm Based On Document Representation

Posted on: 2020-12-12
Degree: Master
Type: Thesis
Country: China
Candidate: B Shu
Full Text: PDF
GTID: 2428330575496900
Subject: Computer technology
Abstract/Summary:
With the rise of deep learning and the generation of large amounts of data, including text, speech, and images, learning useful features from such data has become a central problem. In the field of natural language processing, learning document representations is essential for the precise understanding of natural language and can be applied to a variety of tasks, including text categorization, text similarity matching, and named entity recognition. This paper focuses on the recurrent neural network and the BERT model, optimizes the input or output of these two network architectures to improve generalization performance, and uses the text classification task to verify the scalability of the classification algorithms. The results and main work of this paper are as follows:

1. Training a long short-term memory (LSTM) network directly on the text classification task is not effective. To better learn document representations for text classification, an LSTM network with pooling and dropout is designed to represent documents: pooling retains the main features while reducing the number of parameters and producing a fixed-length output, and dropout prevents over-fitting and improves generalization when learning document representations in a supervised fashion. Compared with the bag-of-words model, the convolutional neural network, and the plain LSTM network, the optimized LSTM improves accuracy on four data sets by at least 0.2% over the direct use of an LSTM network.

2. For the BERT model, which currently performs well in natural language processing, the category probability distribution produced by the single softmax output layer is too limited. Inspired by the mixture of softmaxes, the softmax layer of BERT is replaced with an improved mixture of softmaxes, using the idea of ensembling to weight the output of each softmax component. On the four data sets, accuracy improves by more than 1% over the BERT-Base model.
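The abstract does not give the exact pooling architecture, but the key step it describes, collapsing a variable-length sequence of LSTM hidden states into a fixed-length document vector, can be sketched as an element-wise max over time. This is a minimal illustration, not the thesis's implementation; the function name and plain-list representation are assumptions.

```python
def max_pool_over_time(hidden_states):
    """Element-wise max over a variable-length sequence of hidden vectors.

    hidden_states: list of T vectors (each a list of D floats), e.g. the
    per-timestep outputs of an LSTM. Returns one D-dimensional vector,
    so documents of different lengths T all map to the same output size.
    """
    dim = len(hidden_states[0])
    return [max(h[d] for h in hidden_states) for d in range(dim)]


# Example: a 3-step sequence of 2-dimensional hidden states
doc_vector = max_pool_over_time([[1.0, 5.0], [3.0, 2.0], [2.0, 4.0]])
# doc_vector == [3.0, 5.0]
```

Because the output dimension depends only on the hidden size, not the sequence length, the pooled vector can feed a fixed-size classification layer, which is what allows the fixed-length output the abstract mentions.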
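The mixture-of-softmaxes idea in item 2 replaces the single softmax with a weighted average of K softmax distributions, where the mixture weights themselves come from a softmax. A minimal sketch of the probability computation follows; the function names are assumptions, and the thesis's improved variant and its integration with BERT are not reproduced here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_softmax(component_logits, mixture_scores):
    """Weighted average of K softmax distributions.

    component_logits: K logit vectors, one per softmax component
    mixture_scores:   K unnormalized scores; normalized with a softmax
                      to give the mixture weights pi_k
    Returns p(y) = sum_k pi_k * softmax(component_logits[k]).
    """
    pis = softmax(mixture_scores)
    num_classes = len(component_logits[0])
    probs = [0.0] * num_classes
    for pi, logits in zip(pis, component_logits):
        for i, p in enumerate(softmax(logits)):
            probs[i] += pi * p
    return probs


# Two components over three classes, with unequal mixture weights
probs = mixture_softmax([[2.0, 1.0, 0.1], [0.5, 1.5, 0.2]], [0.3, 0.7])
```

A convex combination of probability distributions is itself a probability distribution, so the output always sums to 1; the gain over a single softmax is that the mixture can express class distributions no single low-rank softmax layer can.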
Keywords/Search Tags: dropout, LSTM, BERT, pooling, mixture of softmaxes