Research On Short Text Classification Algorithm Based On LDA And Deep Learning

Posted on: 2021-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: F Zheng
GTID: 2428330614958186
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of the Internet, people obtain information not only through everyday life but also over networks. The great convenience that network information brings to daily life is inseparable from the development of text classification technology, and there is an urgent need to uncover the patterns in large-scale, disorganized data. The research objective of this thesis is to classify short text data from the network using a topic model and deep learning models.

In the feature-based short text classification algorithm proposed in this thesis, large-scale text data is segmented and then filtered with a specific stopword list rather than directly with a conventional one. The specific stopword list is generated from the common stopword list, a corpus list, and the topic model, so that the text retains more effective semantic features. To address sparse features and the varying number of words per text in the document set, the model pads the word vector matrix with each short text's maximum-probability topic and uses a convolutional neural network with a fusion layer, which adds effective semantic features to the short texts.

The thesis then improves part of the short text classification algorithm based on feature expansion. The convolutional neural network is replaced by a bidirectional long short-term memory (BiLSTM) network, and two feature paths are added: the latent topic feature path and the weighted representation path. The latent topic feature path generates text-topic feature vectors from the topic model. The weighted representation path computes a sum of word vectors weighted by term frequency-inverse document frequency (TF-IDF) and, to a certain extent, avoids errors caused by word segmentation. The short text feature vectors generated by the three paths are fused at the input of the fully connected layer, forming deep short text representation vectors that enrich the effective semantic features and represent short texts better.

The data set used in the experiments is a web news data set, with headlines and summaries treated as the short texts to be classified. To verify the performance of the proposed short text classification algorithm, a series of comparative experiments was performed on this data set, with accuracy, recall, and F1-Score as the model evaluation indicators. The experimental results show an average accuracy rate of 97.58%, an average recall rate of 97.16%, and an average F1-Score of 97.37%. The final results show the effectiveness and accuracy of the proposed short text classification algorithm.
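
The weighted representation path mentioned in the abstract, a TF-IDF-weighted sum of word vectors, can be illustrated with a minimal sketch. This is not the thesis's implementation: the smoothed IDF formula, the function names `tfidf_weights` and `weighted_representation`, and the toy two-dimensional embeddings are all assumptions made for illustration, since the abstract does not give the exact formulation.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {word: tf-idf weight} dict per tokenized document.

    Assumption: IDF is smoothed as ln((1 + N) / (1 + df)) + 1; the
    abstract does not state which TF-IDF variant the thesis uses.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            word: (count / total) * (math.log((1 + n_docs) / (1 + df[word])) + 1.0)
            for word, count in counts.items()
        })
    return weights


def weighted_representation(doc_weights, embeddings, dim):
    """Sum the word vectors of one document, each scaled by its TF-IDF weight."""
    vec = [0.0] * dim
    for word, weight in doc_weights.items():
        emb = embeddings.get(word)
        if emb is None:
            continue  # Skip words that have no embedding.
        for i in range(dim):
            vec[i] += weight * emb[i]
    return vec


# Hypothetical toy data: two tokenized short texts and 2-d word vectors.
demo_docs = [["stock", "market"], ["stock", "match"]]
demo_embeddings = {"stock": [1.0, 0.0], "market": [0.0, 1.0], "match": [1.0, 1.0]}

demo_weights = tfidf_weights(demo_docs)
demo_vec = weighted_representation(demo_weights[0], demo_embeddings, dim=2)
print(demo_vec)
```

In this sketch a word that appears in every document ("stock") gets a lower weight than a word specific to one document ("market"), so the document vector leans toward its more distinctive words. In the architecture the abstract describes, such a vector would be one of three inputs fused before the fully connected layer.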
Keywords/Search Tags:short text classification, topic model, deep learning, feature expansion