
Subject Classification of News Text Data Based on a BERT Pre-training Model and VAE Feature Reconstruction

Posted on: 2022-08-15
Degree: Master
Type: Thesis
Country: China
Candidate: J W Yi
GTID: 2518306491977169
Subject: Applied Statistics
Abstract/Summary:
This paper introduces the main stages of text classification in detail and gives a theoretical analysis of the common methods used at each stage. We then propose a text classification model based on the Bidirectional Encoder Representations from Transformers (BERT) pre-training model and Variational Auto-Encoder (VAE) feature reconstruction. The model processes different types of features in the documents differently. First, for continuous language sequences, comparisons with multiple baseline models show that, as a form of transfer learning, the BERT pre-training model needs only fine-tuning on the existing dataset to achieve an accuracy far higher than that of the best-performing baseline, TextCNN. The BERT pre-training model is therefore used for feature representation, and its output is the feature vector of a sentence. Second, for keywords that appear separately in documents, we use the Word2vec method for vector representation and then apply a VAE for feature reconstruction, so that the dimension of the keyword vectors matches that of the sentence feature vectors. Finally, the vectors output by the above two steps are concatenated; after adding a fully connected layer, a fusion model of the BERT pre-training model and VAE feature reconstruction is obtained, which outputs the classification results. This paper uses news texts published on the Toutiao website as experimental data. The experimental results show that the fusion model combined with keyword features performs best, especially on hard-to-classify samples.
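To make the sentence-feature step concrete, the following is a minimal sketch in PyTorch, assuming the Hugging Face transformers library and the bert-base-chinese checkpoint; the thesis names neither, so both are illustrative choices. A news text is encoded, and the pooled [CLS] output is taken as the sentence feature vector.

    # Minimal sketch: sentence feature vector from a BERT pre-training model.
    # Assumes the Hugging Face `transformers` library and the
    # `bert-base-chinese` checkpoint (not specified in the thesis).
    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    bert = BertModel.from_pretrained("bert-base-chinese")

    def sentence_vector(text: str) -> torch.Tensor:
        """Return a 768-dimensional feature vector for one news text."""
        inputs = tokenizer(text, truncation=True, max_length=512,
                           return_tensors="pt")
        with torch.no_grad():
            outputs = bert(**inputs)
        # Use the pooled [CLS] representation as the sentence feature vector.
        return outputs.pooler_output.squeeze(0)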
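The keyword branch could look like the sketch below, assuming the gensim library; the toy corpus, the 100-dimensional vector size, and the example keyword are illustrative assumptions, not values from the thesis.

    # Minimal sketch: Word2vec vectors for keywords, assuming gensim.
    from gensim.models import Word2Vec

    # Toy tokenized corpus for illustration only.
    tokenized_docs = [["央行", "降息", "经济"], ["球队", "夺冠", "体育"]]

    w2v = Word2Vec(sentences=tokenized_docs, vector_size=100,
                   window=5, min_count=1, workers=4)
    keyword_vector = w2v.wv["经济"]  # 100-dim vector for one keyword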
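Finally, a sketch of one plausible reading of the VAE feature reconstruction and fusion step: the keyword vector passes through a VAE whose latent code is sized to match the 768-dimensional sentence vector, the two vectors are concatenated, and a fully connected layer outputs the class logits. All layer sizes and the class count here are assumptions for illustration; the thesis does not give the exact architecture.

    # Minimal sketch of VAE feature reconstruction and fusion (PyTorch).
    # The 100-dim keyword input, 768-dim latent code, hidden sizes, and
    # 15 Toutiao classes are illustrative assumptions.
    import torch
    import torch.nn as nn

    class KeywordVAE(nn.Module):
        """VAE that maps a Word2vec keyword vector to a 768-dim code."""
        def __init__(self, in_dim: int = 100, z_dim: int = 768):
            super().__init__()
            self.enc = nn.Linear(in_dim, 256)
            self.mu = nn.Linear(256, z_dim)       # latent mean
            self.logvar = nn.Linear(256, z_dim)   # latent log-variance
            self.dec = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))

        def forward(self, x):
            h = torch.relu(self.enc(x))
            mu, logvar = self.mu(h), self.logvar(h)
            # Reparameterization trick: sample z from N(mu, sigma^2).
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            return self.dec(z), mu, logvar, z

    class FusionClassifier(nn.Module):
        """Concatenate the BERT sentence vector with the VAE code of the
        keyword vector, then classify with a fully connected layer."""
        def __init__(self, sent_dim: int = 768, z_dim: int = 768,
                     num_classes: int = 15):
            super().__init__()
            self.vae = KeywordVAE(z_dim=z_dim)
            self.fc = nn.Linear(sent_dim + z_dim, num_classes)

        def forward(self, sent_vec, kw_vec):
            _, _, _, z = self.vae(kw_vec)
            fused = torch.cat([sent_vec, z], dim=-1)  # splice the two vectors
            return self.fc(fused)                     # class logits

In training, the classification loss would presumably be combined with the VAE's reconstruction and KL terms; the thesis does not specify the weighting, so that detail is left open here.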
Keywords/Search Tags: news text classification, BERT pre-training model, VAE, feature extraction, model fusion