A Research On Recommendation System Based On Big Data

Posted on: 2015-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: B J Yu
GTID: 2298330467463763
Subject: Electronic and communication engineering
Abstract/Summary:
In the age of the Internet, more and more people prefer to acquire knowledge from encyclopedia websites. However, the information explosion on the Internet has seriously affected the quality of these websites. Big data has made editing much harder: because the work can no longer be done by hand, it is necessary to build a recommendation system that automatically acquires suitable information from numerous Internet data sources and recommends it in time. KBA 2012 (the Knowledge Base Acceleration track of TREC 2012) raised a similar problem: filter a document stream for documents that are relevant to a set of entities in Wikipedia.

First, this thesis studies methods that were used in KBA 2012 and produced good results, including one that combines entity linking with document ranking, one that exploits the Google Cross-Lingual Dictionary, and one that uses the Random Forests text classification algorithm. The thesis then uses query expansion to retrieve relevant documents from the corpus. The corpus is preprocessed, and a full-text index is built with Indri's dynamic document indexing. An improved TF-IDF formula is used to weight the terms generated by query expansion, and these terms are used to retrieve a first-stage set of relevant documents from the Indri index. For each of these documents, the Jaccard coefficient with the entity's Wikipedia page is computed; the thesis sets a threshold and keeps as final relevant documents those whose coefficient exceeds it. The results of this query expansion method are not satisfactory. Building on this, the thesis deploys the KNN algorithm on a Storm cluster to retrieve relevant documents. The document set is preprocessed specifically for this approach, and the KNN algorithm is implemented in four steps: feature selection, document representation, similarity calculation, and classification voting.
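The first-stage query-expansion pipeline described above (weight expansion terms by TF-IDF, retrieve candidate documents, then keep those whose Jaccard coefficient with the entity's Wikipedia page exceeds a threshold) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation: it uses the standard smoothed TF-IDF rather than the thesis's unspecified improved formula, a plain whitespace tokenizer instead of the thesis's preprocessing, and a simple term-overlap filter as a hypothetical stand-in for the Indri query. The function names, the toy corpus, and the default `k` and `threshold` values are all illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for the thesis's preprocessing.
    return text.lower().split()


def tfidf_expansion_terms(entity_doc, corpus, k=5):
    """Return the k highest-weighted terms of the entity page.

    Uses standard smoothed TF-IDF, NOT the thesis's improved formula.
    """
    tf = Counter(tokenize(entity_doc))
    doc_sets = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for s in doc_sets if term in s)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed inverse document frequency
        scores[term] = count * idf
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken alphabetically
    return [term for term, _ in ranked[:k]]


def jaccard(a, b):
    """Jaccard coefficient: size of the intersection over size of the union of two token sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_relevant(entity_doc, corpus, k=5, threshold=0.2):
    terms = set(tfidf_expansion_terms(entity_doc, corpus, k))
    # Stage 1 (hypothetical stand-in for the Indri query): keep documents containing any expansion term.
    candidates = [d for d in corpus if terms & set(tokenize(d))]
    # Stage 2: keep candidates whose Jaccard coefficient with the entity page exceeds the threshold.
    entity_tokens = tokenize(entity_doc)
    return [d for d in candidates if jaccard(tokenize(d), entity_tokens) > threshold]


# Toy data, invented for illustration only.
entity = "apache storm stream processing cluster topology"
corpus = [
    "storm topology runs on a cluster",
    "cooking pasta with tomato sauce",
    "apache storm processes stream data in a cluster topology",
]
print(tfidf_expansion_terms(entity, corpus))  # ['processing', 'apache', 'stream', 'cluster', 'storm']
print(filter_relevant(entity, corpus, threshold=0.4))  # only the third document (Jaccard 0.5)
```

Raising `threshold` trades recall for precision: at 0.2 both Storm-related documents pass (Jaccard 1/3 and 0.5), while at 0.4 only the closer one survives. The strict `>` comparison follows the abstract's wording of keeping documents with a coefficient higher than the threshold.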
With the KNN topology deployed on the Storm cluster, a distributed recommendation system for text processing is completed. Finally, the thesis conducts a series of experiments with different parameter settings, analyzes and compares the results, and also compares them with the official KBA 2012 results. The comparison shows better performance in recommending relevant documents, demonstrating the high efficiency of a recommendation system built on massive data.
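The four KNN steps listed in the abstract (feature selection, document representation, similarity calculation, classification vote) map onto a small single-machine classifier like the one below. It is a hedged sketch, not the code that runs in the thesis's Storm topology: document-frequency feature selection, TF-IDF vectors, cosine similarity, and a similarity-weighted vote are assumed choices, and the class name, labels, and training data are hypothetical. In a Storm deployment, a call like `predict` is roughly the per-document work a classification bolt might perform.

```python
import math
from collections import Counter


def select_features(tokenized_docs, max_features=1000):
    # Assumption: keep the terms with the highest document frequency;
    # the thesis's actual selection criterion is not stated in the abstract.
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_features]]


class KNNClassifier:
    """Hypothetical single-machine KNN text classifier (not the thesis's Storm code)."""

    def __init__(self, k=3, max_features=1000):
        self.k = k
        self.max_features = max_features

    def fit(self, docs, labels):
        tokenized = [d.lower().split() for d in docs]
        features = select_features(tokenized, self.max_features)  # 1. feature selection
        df = Counter()
        for tokens in tokenized:
            df.update(set(tokens))
        n = len(tokenized)
        self.idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in features}
        self.train_vecs = [self._vectorize(t) for t in tokenized]  # 2. document representation
        self.labels = list(labels)
        return self

    def _vectorize(self, tokens):
        # Sparse TF-IDF vector restricted to the selected features.
        tf = Counter(tokens)
        return {t: c * self.idf[t] for t, c in tf.items() if t in self.idf}

    @staticmethod
    def _cosine(u, v):
        # 3. similarity calculation: cosine similarity of two sparse vectors.
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    def predict(self, doc):
        vec = self._vectorize(doc.lower().split())
        scored = [(self._cosine(vec, tv), label) for tv, label in zip(self.train_vecs, self.labels)]
        neighbours = sorted(scored, key=lambda p: -p[0])[: self.k]
        votes = Counter()
        for sim, label in neighbours:
            votes[label] += sim  # 4. classification vote, weighted by similarity (an assumed choice)
        return max(votes, key=votes.get)


# Toy training data, invented for illustration only.
train_docs = [
    "storm cluster topology stream",
    "apache storm stream processing",
    "wikipedia entity stream filtering",
    "pasta tomato sauce recipe",
    "football match score goal",
    "tomato garden soil water",
]
train_labels = ["relevant"] * 3 + ["irrelevant"] * 3
clf = KNNClassifier(k=3).fit(train_docs, train_labels)
print(clf.predict("storm stream topology on a cluster"))  # relevant
print(clf.predict("tomato sauce for pasta"))  # irrelevant
```

Weighting each vote by its similarity keeps zero-similarity neighbours from influencing the result; a plain majority vote is the other common choice, and `k` is one of the parameters an experiment like the thesis's would tune.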
Keywords/Search Tags: recommendation system, KNN classifier, Storm cluster, feature selection, KBA