
Subject Classification of News Text Data Based on a BERT Pre-training Model and VAE Feature Reconstruction

Posted on: 2022-08-15
Degree: Master
Type: Thesis
Country: China
Candidate: J W Yi
GTID: 2518306491977169
Subject: Applied Statistics
Abstract/Summary:
This paper introduces the main stages of text classification in detail and gives a theoretical analysis of the common methods used at each stage. We then propose a text classification model based on the Bidirectional Encoder Representations from Transformers (BERT) pre-training model and Variational Auto-Encoder (VAE) feature reconstruction. The model processes different types of features in the documents differently. First, for continuous language sequences, comparisons with multiple baseline models show that, as a form of transfer learning, the BERT pre-training model needs only fine-tuning on the existing dataset to achieve an accuracy far higher than that of the best-performing baseline, TextCNN. The BERT pre-training model is therefore used for feature representation, and its output is the feature vector of a sentence. Second, for keywords that appear separately in documents, we use the Word2vec method for vector representation and then apply a VAE for feature reconstruction, so that the dimension of the keyword vectors matches that of the sentence feature vectors. Finally, the vectors output by the above two steps are concatenated; after adding a fully connected layer, a fusion model of the BERT pre-training model and VAE feature reconstruction is obtained, which outputs the classification results. This paper uses news texts published on the Toutiao website as experimental data. The experimental results show that the fusion model combined with keyword features performs best, especially on hard-to-classify samples.
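To make the sentence-feature step concrete, the following is a minimal sketch in PyTorch, assuming the Hugging Face transformers library and the bert-base-chinese checkpoint; the thesis names neither, so both are illustrative choices. A news text is encoded, and the pooled [CLS] output is taken as the sentence feature vector.

    # Minimal sketch: sentence feature vector from a BERT pre-training model.
    # Assumes the Hugging Face `transformers` library and the
    # `bert-base-chinese` checkpoint (not specified in the thesis).
    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    bert = BertModel.from_pretrained("bert-base-chinese")

    def sentence_vector(text: str) -> torch.Tensor:
        """Return a 768-dimensional feature vector for one news text."""
        inputs = tokenizer(text, truncation=True, max_length=512,
                           return_tensors="pt")
        with torch.no_grad():
            outputs = bert(**inputs)
        # Use the pooled [CLS] representation as the sentence feature vector.
        return outputs.pooler_output.squeeze(0)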
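The keyword branch could look like the sketch below, assuming the gensim library; the toy corpus, the 100-dimensional vector size, and the example keyword are illustrative assumptions, not values from the thesis.

    # Minimal sketch: Word2vec vectors for keywords, assuming gensim.
    from gensim.models import Word2Vec

    # Toy tokenized corpus for illustration only.
    tokenized_docs = [["央行", "降息", "经济"], ["球队", "夺冠", "体育"]]

    w2v = Word2Vec(sentences=tokenized_docs, vector_size=100,
                   window=5, min_count=1, workers=4)
    keyword_vector = w2v.wv["经济"]  # 100-dim vector for one keyword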
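Finally, a sketch of one plausible reading of the VAE feature reconstruction and fusion step: the keyword vector passes through a VAE whose latent code is sized to match the 768-dimensional sentence vector, the two vectors are concatenated, and a fully connected layer outputs the class logits. All layer sizes and the class count here are assumptions for illustration; the thesis does not give the exact architecture.

    # Minimal sketch of VAE feature reconstruction and fusion (PyTorch).
    # The 100-dim keyword input, 768-dim latent code, hidden sizes, and
    # 15 Toutiao classes are illustrative assumptions.
    import torch
    import torch.nn as nn

    class KeywordVAE(nn.Module):
        """VAE that maps a Word2vec keyword vector to a 768-dim code."""
        def __init__(self, in_dim: int = 100, z_dim: int = 768):
            super().__init__()
            self.enc = nn.Linear(in_dim, 256)
            self.mu = nn.Linear(256, z_dim)       # latent mean
            self.logvar = nn.Linear(256, z_dim)   # latent log-variance
            self.dec = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))

        def forward(self, x):
            h = torch.relu(self.enc(x))
            mu, logvar = self.mu(h), self.logvar(h)
            # Reparameterization trick: sample z from N(mu, sigma^2).
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            return self.dec(z), mu, logvar, z

    class FusionClassifier(nn.Module):
        """Concatenate the BERT sentence vector with the VAE code of the
        keyword vector, then classify with a fully connected layer."""
        def __init__(self, sent_dim: int = 768, z_dim: int = 768,
                     num_classes: int = 15):
            super().__init__()
            self.vae = KeywordVAE(z_dim=z_dim)
            self.fc = nn.Linear(sent_dim + z_dim, num_classes)

        def forward(self, sent_vec, kw_vec):
            _, _, _, z = self.vae(kw_vec)
            fused = torch.cat([sent_vec, z], dim=-1)  # splice the two vectors
            return self.fc(fused)                     # class logits

In training, the classification loss would presumably be combined with the VAE's reconstruction and KL terms; the thesis does not specify the weighting, so that detail is left open here.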
Keywords/Search Tags: news text classification, BERT pre-training model, VAE, feature extraction, model fusion