Research on Text Classification System of Chinese News Headlines Based on Deep Learning

Posted on: 2023-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: X D Gao
GTID: 2568306836472184
Subject: Electronic and communication engineering
Abstract/Summary:
With the development of Internet technology and the ever-wider channels through which news is disseminated, the number of news texts keeps growing, and it is increasingly difficult for readers to find news of a specific category among the massive volume of headline texts. In text classification, news headlines are short and their features are sparse, so a single headline can hardly provide enough valuable features for classification learning, which results in low classification accuracy. This thesis carries out algorithm analysis, system design, and development for Chinese news headline classification and builds a text classification system with high accuracy. The specific work is as follows:

(1) A feature extraction algorithm for news headlines based on feature expansion is proposed. Because news headlines are short, this thesis improves the traditional TF-IDF algorithm by introducing synonym expansion and category density when measuring the importance of words in a text, which yields the E-TF-IDF algorithm. On this basis, it proposes a feature proximity expansion algorithm, which expands each keyword using the occurrence probability, in the corpus, of the words adjacent to that keyword. Combining the two, the thesis proposes a news headline feature extraction algorithm based on feature expansion and evaluates it on the Sogou news dataset. The experimental results show that the algorithm outperforms the traditional algorithms it was compared with and helps improve text classification performance.

(2) A deep-learning-based text classification model for news headlines is proposed. This thesis combines the TextCNN and GRU models, uses the proposed feature extraction algorithm for word embedding, and proposes the TC-GRU (TextCNN-GRU) model. To verify its effectiveness, experiments were conducted on the THUCNews dataset, the Sogou news dataset, and the Weibo hot-search dataset. The results show that the TC-GRU model achieves higher accuracy than the other models compared, and its advantage is most pronounced on the Weibo hot-search dataset.

(3) A news headline text classification system based on deep learning is designed and implemented. To make it easy for users to verify classification accuracy, label data, add samples, and perform similar tasks, this thesis builds a news headline classification system on top of the proposed model. The system provides text classification, data correction, data labeling, sample addition, and data collection. Users enter a headline to be classified and receive the classification result together with its confidence.
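The abstract does not give the E-TF-IDF formula, so what follows is only a minimal sketch of the idea in item (1), under loudly stated assumptions: headlines are already word-segmented; "synonym expansion" is modeled simply as mapping synonyms onto one canonical word through a hypothetical `SYNONYMS` table so that their counts pool; and "category density" is assumed to be the share of a term's headlines that fall in its most frequent category. The weight is then TF × smoothed IDF × density. The function name `e_tf_idf` and both definitions are illustrative, not the thesis's own.

```python
import math
from collections import Counter, defaultdict

# Hypothetical synonym table (not from the thesis): maps each word to a canonical
# form so that synonyms pool their counts.
SYNONYMS = {"足球赛": "足球", "球赛": "足球"}

def e_tf_idf(docs, labels, synonyms=SYNONYMS):
    """docs: list of tokenized headlines; labels: category of each training headline.
    Returns one {term: weight} dict per headline (illustrative E-TF-IDF variant)."""
    docs = [[synonyms.get(w, w) for w in d] for d in docs]  # merge synonyms
    n = len(docs)
    df = Counter()                 # df[t]: number of headlines containing t
    df_cat = defaultdict(Counter)  # df_cat[t][c]: headlines of category c containing t
    for d, y in zip(docs, labels):
        for t in set(d):
            df[t] += 1
            df_cat[t][y] += 1
    # Assumed category density: fraction of t's headlines in its most frequent
    # category; 1.0 means t occurs in a single category only.
    density = {t: max(cats.values()) / df[t] for t, cats in df_cat.items()}
    weights = []
    for d in docs:
        tf = Counter(d)
        weights.append({
            # term frequency * smoothed IDF * category density
            t: (c / len(d)) * (math.log(n / df[t]) + 1.0) * density[t]
            for t, c in tf.items()
        })
    return weights
```

Under this assumed definition, a term concentrated in one category (density near 1) keeps its full TF-IDF weight, while a term spread evenly across categories is discounted. Because density here does not use the headline's own label, the same term weights could also be applied to unlabeled headlines at prediction time.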
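The feature proximity expansion in item (1) can likewise be sketched, assuming that "the occurrence probability of the adjacent words" means the conditional probability P(n | w) that word n appears within a small window around keyword w in the training corpus. The helper names `neighbor_probs` and `expand`, the window size, the probability threshold, and the top-k cutoff are all hypothetical parameters chosen for illustration.

```python
from collections import Counter, defaultdict

def neighbor_probs(corpus, window=1):
    """Estimate P(n | w): probability that token n occurs within `window`
    positions of token w, from a corpus of tokenized headlines."""
    counts = defaultdict(Counter)
    for doc in corpus:
        for i, w in enumerate(doc):
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[w][doc[j]] += 1
    probs = {}
    for w, c in counts.items():
        total = sum(c.values())
        probs[w] = {n: k / total for n, k in c.items()}
    return probs

def expand(keywords, probs, threshold=0.3, top_k=2):
    """Append up to `top_k` likely neighbors of each original keyword
    whose neighbor probability is at least `threshold`."""
    expanded = list(keywords)
    for kw in keywords:
        # Most probable neighbors first; ties keep corpus insertion order.
        cands = sorted(probs.get(kw, {}).items(), key=lambda item: -item[1])
        for n, p in cands[:top_k]:
            if p >= threshold and n not in expanded:
                expanded.append(n)
    return expanded
```

For a short headline, the appended neighbors supply extra co-occurring context words, which is the feature-sparsity problem the expansion step is meant to address; keywords never seen in the corpus are returned unchanged.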
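The abstract says only that TC-GRU combines TextCNN and GRU (item (2)); it does not say how the two are connected. The dependency-free sketch below assumes one plausible arrangement: a TextCNN-style convolution (a single kernel width with ReLU and no max pooling) turns the embedding sequence into a sequence of local n-gram features, a GRU reads that sequence, and a softmax over its final hidden state yields a label plus a confidence score, matching the (result, confidence) output described for the system in item (3). The class `TCGRUSketch`, all layer sizes, and the GRU variant used are hypothetical; the weights are random and untrained, and random embeddings stand in for the thesis's feature-extraction-based embeddings, so this illustrates data flow only.

```python
import math
import random

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def conv1d_relu(seq, filters, width):
    """TextCNN-style convolution: each filter spans `width` consecutive embeddings."""
    out = []
    for i in range(len(seq) - width + 1):
        window = [v for vec in seq[i:i + width] for v in vec]  # concatenate the window
        out.append([max(0.0, f) for f in matvec(filters, window)])
    return out

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_last_state(seq, P, hidden):
    """One common GRU formulation (biases omitted); returns the final hidden state."""
    h = [0.0] * hidden
    for x in seq:
        z = [sigmoid(a + b) for a, b in zip(matvec(P["Wz"], x), matvec(P["Uz"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(P["Wr"], x), matvec(P["Ur"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(P["Wh"], x), matvec(P["Uh"], rh))]
        h = [(1 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]
    return h

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

class TCGRUSketch:
    """Hypothetical, untrained TextCNN -> GRU -> softmax pipeline."""

    def __init__(self, vocab, classes, emb=8, filters=6, width=2, hidden=5, seed=0):
        rng = random.Random(seed)
        self.classes, self.width, self.hidden = classes, width, hidden
        self.E = {w: [rng.uniform(-0.5, 0.5) for _ in range(emb)] for w in vocab}
        self.unk = [0.0] * emb  # out-of-vocabulary and padding vector
        self.F = rand_matrix(rng, filters, emb * width)
        # W* matrices act on conv features, U* matrices on the hidden state.
        self.P = {k: rand_matrix(rng, hidden, filters if k[0] == "W" else hidden)
                  for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
        self.out = rand_matrix(rng, len(classes), hidden)

    def predict(self, tokens):
        seq = [self.E.get(t, self.unk) for t in tokens]
        seq += [self.unk] * max(0, self.width - len(seq))  # pad very short headlines
        feats = conv1d_relu(seq, self.F, self.width)
        h = gru_last_state(feats, self.P, self.hidden)
        probs = softmax(matvec(self.out, h))
        best = max(range(len(probs)), key=lambda i: probs[i])
        return self.classes[best], probs[best]  # (label, confidence)
```

For example, `TCGRUSketch(["国足", "世预赛", "出线"], ["sports", "finance"]).predict(["国足", "出线"])` returns a (label, confidence) pair. In a real system the weights would be learned with a deep learning framework; this sketch only shows how a convolutional stage can feed a recurrent stage.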
Keywords/Search Tags: Chinese news headlines, Text classification, Feature extraction, Deep learning, TC-GRU, System