
The Personalized Recommendation Based On LDA Text Topic Mining And Implementation On Spark

Posted on: 2017-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: J Liang
GTID: 2308330503985309
Subject: Electronic and communication engineering
Abstract/Summary:
With the development of Internet technology, the Internet has penetrated all aspects of life. Data volumes have grown from the GB level to the TB level, and even to the PB and EB levels. Traditional information retrieval usually returns results based only on the user's query and cannot meet the user's real needs. Faced with a massive result set, it cannot retrieve accurately enough to satisfy the differing needs of different users. Personalized recommendation technology mines and analyzes a user's historical records to build a user interest model, and then proactively generates recommended content based on that model. This approach replaces the traditional pattern, in which the user actively retrieves information, with active recommendation by the website, meeting each user's information needs while accounting for the differences among users.

Based on text resource mining, this paper studies how to analyze the history of user behavior and use text topic mining to obtain user preferences at the topic level. The work includes the following aspects:

This paper proposes the Personalized Topic Network model, an undirected graph model built on users, documents, and topics. By analyzing users' historical behavior and mining the latent topics of documents, the model connects each user to latent topics via documents, so that user preferences are described at the topic level as far as possible. The model uses the latent Dirichlet allocation (LDA) algorithm to mine the latent topics of each document and obtain a topic description of the document in vector form. Mining the Personalized Topic Network model then yields a vector representing the user's topic preferences.
This vector can be used to calculate the similarity between the user and each document and to form a recommendation list, which realizes personalized recommendation based on the user's topic preferences.

For the Personalized Topic Network, this paper optimizes the traditional Gibbs sampling method and designs a new parallelized Gibbs sampling algorithm on the Spark platform. It explains how the data sets are split and reconstructed in the parallel optimization algorithm, and gives the update formulas applied between iterations of parallel Gibbs sampling on the reconstructed data sets, together with the method for adjusting the statistics. It also designs the system architecture and processing flow on the Spark platform and explains the functions of the main modules.

Following the system design of the Personalized Topic Network model, this paper sets up a pseudo-distributed Spark test platform and runs experiments on open-source text data sets from Stanford and on simulated web logs of user browsing behavior. It details the process of deploying the pseudo-distributed Hadoop platform and the Spark computing framework, explains the parameter selection for the LDA algorithm, and uses charts to analyze the computed results and the accuracy of the recommendation list.
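
The recommendation step in the abstract (a user topic vector compared against document topic vectors) can be illustrated with a small sketch. This is not the thesis's implementation. The abstract does not say how the user vector is derived from the Personalized Topic Network, so the sketch assumes it is the plain average of the topic vectors of the documents the user has read, and it assumes cosine similarity as the similarity measure. The function names and the toy topic distributions are hypothetical, standing in for the output of an LDA run.

```python
import math

def user_topic_vector(read_doc_ids, doc_topics):
    """Assumed aggregation: average the topic vectors of the documents a user has read."""
    vectors = [doc_topics[d] for d in read_doc_ids]
    k = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(k)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(read_doc_ids, doc_topics, top_n=2):
    """Rank unread documents by similarity to the user's topic vector."""
    user_vec = user_topic_vector(read_doc_ids, doc_topics)
    seen = set(read_doc_ids)
    scored = [
        (cosine(user_vec, vec), doc_id)
        for doc_id, vec in doc_topics.items()
        if doc_id not in seen
    ]
    # Highest similarity first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_n]]

# Hypothetical per-document topic distributions over 3 topics (stand-in for LDA output).
doc_topics = {
    "d1": [0.8, 0.1, 0.1],
    "d2": [0.7, 0.2, 0.1],
    "d3": [0.1, 0.8, 0.1],
    "d4": [0.6, 0.3, 0.1],
    "d5": [0.1, 0.1, 0.8],
}

print(recommend(["d1", "d2"], doc_topics))  # ['d4', 'd3']
```

A user who has read d1 and d2, both dominated by the first topic, is recommended d4 first because its topic mix is closest to theirs. On Spark, the same scoring could be distributed across documents, but that layout is not described in the abstract and is left out here.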
Keywords/Search Tags: Personalized Topic Network, Personalized Recommendation, LDA, Spark