
Classification Of News Short Text Based On Deep Learning

Posted on: 2022-12-12
Degree: Master
Type: Thesis
Country: China
Candidate: R K Wang
Full Text: PDF
GTID: 2518306770978479
Subject: Journalism and Media
Abstract/Summary:
With the advent of the big data era, the volume of online news text has exploded. Faced with massive news text data, manual processing can no longer meet the need to identify and classify online document information. Using statistical modeling or deep learning methods to organize and mine the important information in disordered online news texts therefore not only saves manpower and material resources, but is also significant for pushing news to users, category navigation, public opinion monitoring, and spam filtering. Because news short texts consist of short sentences with inconspicuous features, effectively improving their classification accuracy is a major challenge facing news media work today. With the wide application of deep learning in natural language processing, more and more researchers are using deep neural networks to solve text classification problems. From the perspective of text representation and feature extraction, this paper improves both the text representation method and the deep learning model. It proposes a PTF-IDF weighted Word2vec text representation model and a BERT-LSTM hybrid deep model, and compares and analyzes their classification results, with the aim of improving the accuracy of news short text classification.

In terms of text representation, the Word2vec model cannot distinguish influential feature words, and the TF-IDF model ignores the uneven distribution of feature words between and within classes. This paper improves the TF-IDF model by introducing a part-of-speech contribution factor that weights words according to their part of speech. Combined with word vectors trained by the Word2vec model, this yields a text representation model based on PTF-IDF weighted Word2vec. Compared with other text representation models on the same data set, the proposed method effectively improves the accuracy of news short text classification.

In terms of feature extraction, the BERT model weakens text position information, whereas the LSTM model can effectively learn dependencies within an observation sequence and extract global features from the preceding and following text. This paper therefore combines the BERT and LSTM models for feature extraction and introduces an attention mechanism to further filter and fuse the extracted features, building a BERT-LSTM model. The classification results of this model and the BERT model are compared on the same data set. The results show that the new hybrid deep model improves the classification accuracy of short news texts.
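
The abstract does not give the PTF-IDF formula, so the following is only a minimal sketch of how a part-of-speech weighted TF-IDF could be combined with word vectors. It assumes the weight of a word is the product of its term frequency, a smoothed IDF, and a part-of-speech factor, and that a document vector is the weighted average of its word vectors. The factor values in `POS_WEIGHT`, the function names, and the toy 2-dimensional embeddings are all hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter

# Hypothetical part-of-speech contribution factors (not from the thesis).
POS_WEIGHT = {"NOUN": 1.0, "VERB": 0.8, "ADJ": 0.5}
DEFAULT_POS_WEIGHT = 0.2


def idf_table(docs):
    """Smoothed inverse document frequency for every word in the corpus."""
    n_docs = len(docs)
    df = Counter(word for doc in docs for word in {w for w, _ in doc})
    return {w: math.log((1 + n_docs) / (1 + c)) + 1.0 for w, c in df.items()}


def ptf_idf_weights(doc, idf):
    """Assumed PTF-IDF: term frequency * IDF * POS contribution factor."""
    counts = Counter(w for w, _ in doc)
    pos_of = dict(doc)  # last tag wins if a word appears with several tags
    total = len(doc)
    return {
        w: (c / total) * idf.get(w, 1.0)
        * POS_WEIGHT.get(pos_of[w], DEFAULT_POS_WEIGHT)
        for w, c in counts.items()
    }


def doc_vector(doc, idf, embeddings):
    """Weighted average of word vectors, using the PTF-IDF weights."""
    weights = ptf_idf_weights(doc, idf)
    dim = len(next(iter(embeddings.values())))
    acc = [0.0] * dim
    norm = 0.0
    for word, weight in weights.items():
        vec = embeddings.get(word)
        if vec is None:  # skip out-of-vocabulary words
            continue
        norm += weight
        for i, x in enumerate(vec):
            acc[i] += weight * x
    return [x / norm for x in acc] if norm > 0 else acc


# Toy data: each document is a list of (word, POS tag) pairs.
docs = [
    [("stocks", "NOUN"), ("rise", "VERB"), ("sharply", "ADV")],
    [("team", "NOUN"), ("wins", "VERB"), ("final", "NOUN")],
]
# Stand-in 2-d vectors; in practice these would come from a trained Word2vec model.
embeddings = {
    "stocks": [1.0, 0.0], "rise": [0.0, 1.0], "sharply": [0.5, 0.5],
    "team": [1.0, 1.0], "wins": [0.0, 1.0], "final": [1.0, 0.0],
}
idf = idf_table(docs)
vector = doc_vector(docs[0], idf, embeddings)
```

In this sketch, nouns pull the document vector more strongly than verbs or adverbs, which is one plausible reading of weighting words by part of speech. The resulting vector could then be passed to any classifier.
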
Keywords/Search Tags: Text classification, Text representation, Feature extraction, TF-IDF model, BERT model