Classification Of News Short Text Based On Deep Learning

Posted on:2022-12-12

Degree:Master

Type:Thesis

Country:China

Candidate:R K Wang

Full Text:PDF

GTID:2518306770978479

Subject:Journalism and Media

Abstract/Summary:


With the advent of the big data era, the volume of online news text has exploded. Manual processing can no longer keep up with the identification and classification of this information. Using statistical modeling or deep learning methods to organize and mine the important information in unstructured online news text therefore not only saves manpower and material resources, but is also of great significance for news push to users, classified navigation, public opinion monitoring, and spam filtering. Because news short texts consist of short sentences with indistinct features, effectively improving their classification accuracy is a major challenge facing news media today. With the wide application of deep learning in natural language processing, more and more researchers are using deep neural networks to solve text classification problems.

From the perspectives of text representation and feature extraction, this thesis improves both the text representation method and the deep learning model. It proposes a PTF-IDF weighted Word2vec text representation model and a BERT-LSTM hybrid deep model, and compares and analyzes their classification results, with the aim of improving the accuracy of news short text classification.

In terms of text representation, the Word2vec model cannot distinguish influential feature words, and the TF-IDF model ignores the uneven distribution of feature words between and within classes. This thesis improves the TF-IDF model by introducing a part-of-speech contribution factor that weights words according to their part of speech. Combined with word vectors trained by the Word2vec model, this yields a text representation model based on PTF-IDF weighted Word2vec. Compared with other text representation models on the same dataset, the proposed method effectively improves the accuracy of news short text classification.

In terms of feature extraction, the BERT model weakens text position information, whereas the LSTM model can effectively learn dependencies in the observed sequence and extract global features from the preceding and following text. This thesis therefore combines the BERT and LSTM models for feature extraction and introduces an attention mechanism to further screen and fuse the extracted features, building the BERT-LSTM model. The classification results of this model and of the BERT model are compared on the same dataset. The results show that the new hybrid deep model improves the classification accuracy of short news texts.
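
The text representation idea described above can be illustrated with a minimal sketch. The abstract does not give the exact PTF-IDF formula, so everything specific here is an assumption: the part-of-speech factor is applied as a simple multiplier on a smoothed TF-IDF score, the factor values in POS_WEIGHT are invented for illustration, and the document vector is a weight-normalized average of word vectors. The toy embeddings stand in for vectors that would come from a trained Word2vec model. The sketch does not address the between-class and within-class distribution issue mentioned in the abstract.

```python
import math
from collections import Counter

# Hypothetical part-of-speech contribution factors (not taken from the thesis).
POS_WEIGHT = {"noun": 1.0, "verb": 0.8, "adj": 0.5}
DEFAULT_POS_WEIGHT = 0.2


def ptfidf_weights(doc, corpus):
    """Return {word: weight} for one POS-tagged document.

    doc is a list of (word, pos) pairs; corpus is a list of such documents.
    Weight = term frequency * smoothed IDF * part-of-speech factor (assumed form).
    """
    n_docs = len(corpus)
    tf = Counter(word for word, _ in doc)
    pos_of = {word: pos for word, pos in doc}
    weights = {}
    for word, count in tf.items():
        # Document frequency: number of documents containing the word.
        df = sum(1 for d in corpus if any(w == word for w, _ in d))
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        pos_factor = POS_WEIGHT.get(pos_of[word], DEFAULT_POS_WEIGHT)
        weights[word] = (count / len(doc)) * idf * pos_factor
    return weights


def document_vector(doc, corpus, embeddings):
    """Weight-normalized average of the word vectors of a document."""
    weights = ptfidf_weights(doc, corpus)
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, weight in weights.items():
        vec = embeddings.get(word)
        if vec is None:  # skip out-of-vocabulary words
            continue
        for i in range(dim):
            total[i] += weight * vec[i]
        weight_sum += weight
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


# Toy data: two tagged headlines and 2-dimensional stand-in embeddings.
headline = [("stocks", "noun"), ("rise", "verb")]
toy_corpus = [headline, [("team", "noun"), ("wins", "verb")]]
toy_embeddings = {"stocks": [1.0, 0.0], "rise": [0.0, 1.0]}

print(document_vector(headline, toy_corpus, toy_embeddings))
```

In this toy case both words have the same term frequency and IDF, so the noun contributes more to the document vector than the verb only because of its larger assumed part-of-speech factor.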

Keywords/Search Tags:

Text classification, Text representation, Feature extraction, TF-IDF model, BERT model


Related items

1	Research On Improved Text Representation Model Based On BERT
2	A Subject Classification To News Text Data Based On BERT Pre-training Model And VAE Feature Reconstruction
3	Research On Short Text Classification Method Based On Improved BERT Model
4	Research On Text Classification Based On Subword-level Occlusion Prediction Method Of Bert Model
5	Research Of Text Classification Based On Word2vec And Self-attention
6	Research On News Texts Classification Based On Keyword Extraction And BERT Word Embedding
7	Research On Text Multi-Feature Classification Algorithm Based On BERT-LSTM
8	Research On Short Text Classification Technology Based On Deep Learning
9	Text Representation Model Based On Semantics And Structured Tensor
10	Research On Text Classification Model Based On Dynamic Representation