
Research On Short Text Classification Based On Generalization And Memorization

Posted on: 2020-07-17  Degree: Master  Type: Thesis
Country: China  Candidate: S Zhang  Full Text: PDF
GTID: 2428330590495762  Subject: Computer technology
Abstract/Summary:
With the spread of the Internet and rapid advances in hardware, the volume of short text is exploding, particularly on social networking platforms such as Twitter, Facebook and Weibo. These platforms have billions of users, and the daily comments of active users keep increasing the amount of short text. There is therefore an urgent need for automated language understanding techniques to process and analyze these texts. Among these techniques, text classification has proven to be a basic and critical natural language processing task that is useful in many scenarios. Because short texts contain few characters, how fully their information is exploited greatly affects classification accuracy.

At present, mainstream methods for short text classification fall into traditional machine learning methods and deep learning methods. Traditional machine learning methods suffer from high-dimensional, sparse text representations, complex feature engineering and the difficulty of classifier selection, so their results on short text tasks are not ideal. Deep learning methods solve these three problems to some extent, but they do not make sufficient use of information about the local relevance of text.

To address these problems, this thesis proposes combining generalization and memorization for short text classification. Memorization records the correlation and co-occurrence of known information, while generalization uses low-dimensional dense representations that can express previously unseen features. The GM-CNN model is proposed by integrating generalization information and memorization information into a CNN deep learning model. GM-CNN makes full use of text information, and its experimental results are better than those of some existing benchmark models. After GM-CNN is proposed, several aspects of the model that need optimization are studied. Based on these problems, improvements involving the Batch Normalization and Chunk-Max Pooling techniques are made, and the IGM-CNN model is proposed. The experimental results show that IGM-CNN achieves better classification results than GM-CNN. An experiment on the number of chunks in Chunk-Max Pooling is also carried out, with the aim of reducing the number of parameters and the complexity of the model while maintaining its classification performance.
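The abstract does not describe how Chunk-Max Pooling is configured in IGM-CNN, so the following is only a minimal sketch of the general operation, written in plain Python: a one-dimensional convolution feature map is split into contiguous chunks and the maximum of each chunk is kept. The function name `chunk_max_pool`, the boundary rule and the sample values are illustrative assumptions, not the thesis's implementation.

```python
def chunk_max_pool(feature_map, num_chunks):
    """Split a 1-D feature map into contiguous chunks and keep each chunk's max.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    n = len(feature_map)
    if not 1 <= num_chunks <= n:
        raise ValueError("num_chunks must be between 1 and len(feature_map)")
    # Chunk boundaries; chunk sizes differ by at most one element.
    bounds = [i * n // num_chunks for i in range(num_chunks + 1)]
    return [max(feature_map[bounds[i]:bounds[i + 1]]) for i in range(num_chunks)]


# Made-up activations of one convolution filter over eight positions.
feature_map = [0.1, 0.9, 0.3, 0.2, 0.4, 0.8, 0.05, 0.6]

print(chunk_max_pool(feature_map, 1))  # ordinary max pooling: [0.9]
print(chunk_max_pool(feature_map, 2))  # [0.9, 0.8]
print(chunk_max_pool(feature_map, 4))  # [0.9, 0.3, 0.8, 0.6]
```

With one chunk the operation reduces to ordinary max pooling. More chunks retain coarse positional information but produce a longer pooled vector, which increases the input size of the following layer; this is the trade-off behind an experiment on the number of chunks.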
Keywords/Search Tags: Short Text Classification (STC), Generalization, Memorization, Deep Learning, Batch Normalization, Chunk-Max Pooling