Research On News Recommendation Algorithm Based On LDA In Hadoop Platform

Posted on: 2016-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: K Song
GTID: 2208330470952868
Subject: Software engineering
Abstract/Summary:
With the arrival of the information age, the Internet has grown at very high speed, and online news data is being produced faster and faster. Extracting useful and significant information from such vast amounts of data has therefore become a serious problem. In response, recommendation systems based on big data have begun to be applied and are becoming an indispensable tool for online news media. Because Hadoop-based news clustering and recommendation can analyze and process all of the metadata quickly, it can handle the cold-start problem effectively and can balance the cost, efficiency, and accuracy of the recommendation system. However, text segmentation and denoising, the selection and tuning of the clustering algorithm, and the optimization of jobs on the Hadoop platform still have many issues to be analyzed and improved, so in-depth research is necessary.
For word segmentation and noise reduction, based on a detailed analysis of the advantages and disadvantages of existing word segmentation tools, this thesis shows that segmentation ambiguity can be eliminated by using a new-word dictionary. It also presents three levels of denoising to address the remaining problems of segmentation and denoising: a balanced binary tree based on part-of-speech analysis for the first level, a tree storage structure based on stop words for the second level, and feedback denoising for the third level. For the clustering algorithm, we carry out a thorough analysis of the advantages and disadvantages of clustering by keyword extraction and by topic extraction, and propose using keyword extraction to support topic extraction in order to strengthen word segmentation and optimize text clustering.
For topic extraction clustering, this thesis analyzes the clustering parameters in depth based on the theory of topic extraction clustering, puts forward an optimized scheme for the LDA algorithm, and carries out experiments to verify our hypothesis.
The results of the comparative experiments show that the hierarchical denoising algorithm can significantly reduce the effect of noise on clustering and recommendation, and that feedback noise reduction can avoid the word segmentation errors and incomplete denoising caused by relying on a single method of segmentation and denoising. Moreover, the optimized topic extraction clustering can effectively avoid the sharpness problem in clustering and can improve the accuracy of the extraction result. The recommended news shows that the proposed algorithm optimization and parameter tuning can effectively increase the accuracy of clustering and recommendation. On the whole, the final result can basically meet the practical needs of news clustering and recommendation.
Keywords/Search Tags: Hadoop, LDA, recommendation system, word segmentation, balanced binary tree, noise reduction, Chinese text
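The three-level denoising described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: plain sets stand in for the balanced binary tree and the stop-word tree, the part-of-speech tags and word lists are hypothetical, and the feedback rule (treating terms that survive both filters but occur in nearly every document as new stop words) is an assumption, since the abstract does not specify how feedback denoising works.

```python
from collections import Counter

# Hypothetical tag set and stop words, chosen only for illustration.
# Plain sets stand in for the tree structures named in the abstract.
NOISE_POS = {"particle", "conjunction", "preposition", "punctuation"}
STOP_WORDS = {"today"}


def pos_filter(tokens):
    """Level 1: drop tokens whose part-of-speech tag marks them as noise."""
    return [word for word, pos in tokens if pos not in NOISE_POS]


def stop_word_filter(words, stop_words):
    """Level 2: drop words found in the stop-word collection."""
    return [word for word in words if word not in stop_words]


def feedback_terms(docs, max_doc_ratio):
    """Level 3 (assumed rule): find words present in more than
    max_doc_ratio of the documents, which carry little topical signal."""
    if not docs:
        return set()
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document
    return {w for w, n in doc_freq.items() if n / len(docs) > max_doc_ratio}


def denoise(tagged_docs, stop_words, max_doc_ratio=0.8):
    """Run all three levels; return cleaned documents and learned stop words."""
    stop_words = set(stop_words)
    cleaned = [stop_word_filter(pos_filter(d), stop_words) for d in tagged_docs]
    learned = feedback_terms(cleaned, max_doc_ratio)
    # Feed the learned terms back into the stop-word level and filter again.
    stop_words |= learned
    cleaned = [stop_word_filter(d, stop_words) for d in cleaned]
    return cleaned, learned


if __name__ == "__main__":
    docs = [
        [("market", "noun"), ("of", "preposition"), ("report", "noun")],
        [("team", "noun"), ("and", "conjunction"), ("report", "noun")],
        [("film", "noun"), ("report", "noun"), ("today", "noun")],
    ]
    cleaned, learned = denoise(docs, STOP_WORDS)
    print(cleaned, learned)
```

In this toy input, "report" passes the first two levels but appears in every document, so the feedback step adds it to the stop words before the final pass. The cleaned token lists would then be the input to topic extraction such as LDA.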