Research And Implementation Of News Text Classification System Based On Deep Learning

Posted on: 2020-07-27
Degree: Master
Type: Thesis
Country: China
Candidate: J W Li
GTID: 2428330572973594
Subject: Computer technology
Abstract/Summary:
With the rapid development of internet technology and the explosive growth of social media, Chinese short texts such as news headlines, instant messages and online comments are produced continuously. One of the most notable characteristics of these short texts is sparseness. A Chinese short text usually carries very little useful information because it consists of only a few words, so its sample features are sparse and high-dimensional. As a result, it is hard to extract the critical features of short texts accurately for classification learning. This thesis studies the application of deep learning to Chinese text classification and proposes a text classification model based on mixed word-level and character-level features. Following the Chinese text analysis process and the improved text classification model, a prototype news text classification system is designed and a news short-text classification platform is developed. The main contributions are as follows:

1. A novel method is proposed to address the insufficient representation provided by character-level features alone or word-level features alone. Because short texts are brief, sparse and strongly context dependent, the method takes word-level vectors and character-level vectors as inputs simultaneously and encodes sentence semantics with two Long Short-Term Memory (LSTM) networks or two bidirectional LSTM networks. The representation of the whole sentence combines the two outputs obtained from the word-level vectors and the character-level vectors. Experiments on the NLPCC 2017 News Headline Categorization data set show that word embeddings and character embeddings complement each other in representing sentence semantics, which helps to improve the classification performance on Chinese short texts.

2. Following the Chinese text analysis process and the improved text classification model proposed in this thesis, a prototype news text classification system is designed. The system is divided into three parts: a news acquisition and storage module, a news text classification module and a news display module. The news acquisition and storage module crawls news text from Internet pages, cleans and processes the crawled data, and saves it in the database. The news text classification module is responsible for feature construction and automatic category labeling of the crawled news text. The news display module is responsible for displaying the classified news text to users. By setting up the system environment and implementing the text classification algorithm and each functional module, a complete and reliable news text classification system is formed.

3. The implementation and testing of the news text classification system are completed. First, the system environment is deployed. Then the implementation of the three core modules (the news gathering module, the news storage module and the news classification module) is described, the key functions are given, and the implementation results are shown, including data crawling, database operations and construction of the text classification model, together with the overall operation of the system. Functional and performance tests show that the implementation of each module meets the requirements of the system design.
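
The mixed-feature encoding described in point 1 can be illustrated with a minimal sketch. It is not the thesis implementation: it uses plain Python, a single-direction LSTM with random untrained weights, and lazily created random embeddings. The names `TinyLSTM`, `Embedding` and `encode_headline`, the dimensions, and the sample headline are all hypothetical, and the headline is assumed to be already segmented into words by an external tool. The sketch only shows the data flow: one encoder reads the word sequence, another reads the character sequence, and the two final hidden states are concatenated into one sentence vector.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Minimal single-layer LSTM with random, untrained weights (illustration only)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        # Each row acts on the concatenation [x_t, h_{t-1}].
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                   for _ in range(hidden_dim)]
            for gate in "ifog"
        }
        self.b = {gate: [0.0] * hidden_dim for gate in "ifog"}

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(self.W[gate], self.b[gate])]

    def encode(self, vectors):
        """Run over a sequence of input vectors and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in vectors:
            z = list(x) + h
            i = [sigmoid(v) for v in self._affine("i", z)]
            f = [sigmoid(v) for v in self._affine("f", z)]
            o = [sigmoid(v) for v in self._affine("o", z)]
            g = [math.tanh(v) for v in self._affine("g", z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h


class Embedding:
    """Lookup table that creates a random vector the first time a token is seen."""

    def __init__(self, dim, rng):
        self.dim, self.rng, self.table = dim, rng, {}

    def __call__(self, token):
        if token not in self.table:
            self.table[token] = [self.rng.uniform(-0.5, 0.5) for _ in range(self.dim)]
        return self.table[token]


def encode_headline(words, word_emb, char_emb, word_lstm, char_lstm):
    """Concatenate the word-level and character-level encodings of one headline."""
    chars = [ch for w in words for ch in w]
    word_vec = word_lstm.encode([word_emb(w) for w in words])
    char_vec = char_lstm.encode([char_emb(ch) for ch in chars])
    return word_vec + char_vec  # list concatenation, not element-wise addition


rng = random.Random(0)
word_emb, char_emb = Embedding(6, rng), Embedding(4, rng)
word_lstm, char_lstm = TinyLSTM(6, 8, rng), TinyLSTM(4, 8, rng)

# Hypothetical, already-segmented headline.
sentence_vec = encode_headline(["深度", "学习", "新闻", "分类"],
                               word_emb, char_emb, word_lstm, char_lstm)
print(len(sentence_vec))
```

In a trained model the concatenated vector would feed a classification layer, and the weights and embeddings would be learned rather than random; a bidirectional variant would additionally run each encoder over the reversed sequence and concatenate those states as well.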
Keywords/Search Tags: Text classification, Deep learning, Long short-term memory, Word embedding