Application And Implementation Of Micro-blog Recommendation System Based On LDA Topic Model

Posted on: 2019-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: K Liu
GTID: 2428330596964840
Subject: Computer technology
Abstract/Summary:
In recent years, many kinds of network applications have emerged, and micro-blog is a typical representative. With the rapid development of micro-blog, the volume of micro-blog short text has grown explosively and is now extremely large. Classifying these short texts is the key to extracting further value from them, and it is also the key to personalized micro-blog recommendation. However, because micro-blog posts are short and their features are sparse, micro-blog short text classification faces enormous challenges. Traditional text categorization methods designed for long texts do not work well on micro-blog short text, so it is imperative to study classification methods for short texts.

Research on short text classification focuses mainly on feature extension. Two methods are in common use today: feature expansion based on a knowledge base, and feature expansion based on a search engine. Both are widely applied in short text classification, but both have problems. The knowledge-base method has no effect on vocabulary that the knowledge base does not contain, so its scope of use is limited. The search-engine method inevitably introduces noise, and extending the features is very time consuming. To address the sparse features of micro-blog short text, this thesis proposes a short text feature extension method and combines it with the LDA topic model to classify and recommend micro-blog short text. The main contributions are as follows.

1. A micro-blog short text classification algorithm based on lexical chain feature extension and the LDA model is proposed, namely the “Lexical Chain Expansion plus LDA” algorithm. To address the sparse features and poor classification results in micro-blog short text classification, a lexical chain feature extension method based on Tongyici Cilin is put forward. The lexical chain covers not only words that are included in Tongyici Cilin but also words that are not. In addition, the lexical chain itself is enriched as the micro-blog text is extended. Because VSM suffers from high dimensionality and weak semantic features in text classification, the topic probability vector extracted from the LDA topic model is used as the vector representation of each micro-blog text. Compared with VSM, LDA effectively reduces the dimension of the similarity calculation and incorporates some semantic features. The “Lexical Chain Expansion plus LDA” algorithm is compared with existing algorithms, and the experimental results show that it takes full advantage of lexical chain feature extension and the LDA model and effectively improves micro-blog short text classification.

2. Based on the “Lexical Chain Expansion plus LDA” classification algorithm, a micro-blog recommendation system is designed and implemented. The system includes four functional modules: a data import module, a preprocessing module, a feature extension module, and an LDA recommendation module. The data import module imports the collected micro-blog data into the system and divides it into a training set and a test set at a random ratio. The preprocessing module provides four functions: text cleanup, Chinese word segmentation, stop word elimination, and viewing the preprocessing results. The feature extension module provides four functions: lexical chain generation, lexical chain extension, viewing the generated lexical chain, and viewing the result of the feature expansion. The LDA recommendation module provides four functions: LDA modeling, classification, viewing the classification results, and micro-blog recommendation. According to the similarity calculated during micro-blog short text classification, the three micro-blogs with the highest similarity in each category are selected for recommendation. Finally, the “Lexical Chain Expansion plus LDA” algorithm is compared with existing algorithms. The experimental results show that the proposed algorithm performs slightly worse in some categories, but overall it effectively improves micro-blog short text classification.
Keywords/Search Tags:Short Text Classification, Lexical Chain, Feature Expansion, LDA, Micro-blog Recommendation
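
The recommendation step described in the abstract (classify each micro-blog by similarity over its LDA topic probability vector, then recommend the three most similar micro-blogs in each category) can be illustrated with a minimal Python sketch. It is not the thesis implementation. The abstract does not state which similarity measure is used or how a category is represented, so cosine similarity and one reference topic vector per category are assumptions here, and the function names, post ids, and numbers are all hypothetical. The topic vectors are assumed to come from an already trained LDA model.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend_top_k(post_topics, category_topics, k=3):
    """Assign each post to its most similar category, then keep the k
    most similar posts per category.

    post_topics:     {post_id: topic probability vector}  (hypothetical input)
    category_topics: {category: reference topic vector}   (hypothetical input)
    """
    scored = defaultdict(list)
    for post_id, vec in post_topics.items():
        # Classification: pick the category with the highest similarity.
        best = max(category_topics, key=lambda c: cosine(vec, category_topics[c]))
        scored[best].append((cosine(vec, category_topics[best]), post_id))
    # Recommendation: highest similarity first, truncated to k posts.
    return {
        category: [post_id for _, post_id in sorted(items, reverse=True)[:k]]
        for category, items in scored.items()
    }


# Made-up two-topic vectors, for illustration only.
categories = {"sports": [0.9, 0.1], "tech": [0.1, 0.9]}
posts = {
    "p1": [0.80, 0.20],
    "p2": [0.95, 0.05],
    "p3": [0.60, 0.40],
    "p4": [0.70, 0.30],
    "p5": [0.20, 0.80],
}
recommended = recommend_top_k(posts, categories, k=3)
```

Because the similarity is computed over a vector with one entry per topic rather than one entry per vocabulary word, the comparison stays low-dimensional, which is the advantage over VSM that the abstract points out.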