A Research On Recommendation System Based On Big Data

Posted on:2015-01-16

Degree:Master

Type:Thesis

Country:China

Candidate:B J Yu

Full Text:PDF

GTID:2298330467463763

Subject:Electronic and communication engineering

Abstract/Summary:

In the Internet age, more and more people turn to encyclopedia websites for knowledge. However, the explosion of information on the Internet has seriously affected the quality of these websites, and big data has made editing them much harder. Because the work can no longer be done by hand alone, a recommendation system is needed that automatically gathers suitable information from the many data sources on the Internet and recommends it promptly. KBA 2012 (the TREC Knowledge Base Acceleration track) posed a similar problem: filter a document stream for documents relevant to a set of Wikipedia entities.

First, the thesis studies the methods that performed well in KBA 2012. These include one that combines entity linking with document ranking, one that exploits the Google Cross-Lingual Dictionary, and one that uses the Random Forests text classification algorithm. It then applies query expansion to retrieve relevant documents from the corpus. The corpus is preprocessed, and a full-text index is built with Indri's dynamic document indexing. An improved TF-IDF formula then weights the terms generated by query expansion, and these terms retrieve a first-stage set of candidate documents from the Indri index. For each candidate, the Jaccard coefficient with the entity's Wikipedia article is computed, and documents whose coefficient exceeds a preset threshold are selected as the final relevant documents. The results of this query expansion method were unsatisfactory.

Building on this, the thesis deploys the KNN algorithm on a Storm cluster to retrieve relevant documents. The document set first undergoes task-specific preprocessing, and the KNN algorithm is then implemented in four steps: feature selection, document representation, similarity calculation, and classification voting.
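
The query-expansion stage described above (TF-IDF-weighted expansion terms, first-stage retrieval, Jaccard thresholding) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The thesis's improved TF-IDF formula is not given, so plain tf·log(N/df) is used, with tf taken from the entity's Wikipedia text and df from the corpus. Indri retrieval is replaced by a hypothetical stand-in that keeps any document containing an expansion term. The helper names (`expansion_terms`, `filter_documents`), `top_k`, the threshold value, and the toy corpus are all illustrative choices.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive lowercase whitespace tokenizer standing in for real preprocessing.
    return text.lower().split()

def expansion_terms(entity_text, corpus, top_k):
    """Rank the entity article's terms by plain TF-IDF (ASSUMPTION: not the thesis's improved formula)."""
    n = len(corpus)
    doc_sets = [set(tokenize(d)) for d in corpus]
    scores = {}
    for term, tf in Counter(tokenize(entity_text)).items():
        df = sum(1 for s in doc_sets if term in s)
        if df == 0:
            continue  # a term absent from the corpus cannot retrieve anything
        scores[term] = tf * math.log(n / df)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken alphabetically
    return [term for term, _ in ranked[:top_k]]

def jaccard(a, b):
    # Jaccard coefficient of two token collections: |A ∩ B| / |A ∪ B|.
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def filter_documents(entity_text, corpus, threshold, top_k=3):
    terms = set(expansion_terms(entity_text, corpus, top_k))
    # Hypothetical stand-in for Indri retrieval: any document sharing an expansion term.
    candidates = [d for d in corpus if terms & set(tokenize(d))]
    entity_tokens = tokenize(entity_text)
    # Keep candidates whose Jaccard coefficient with the entity text exceeds the threshold.
    return [d for d in candidates if jaccard(tokenize(d), entity_tokens) > threshold]

entity = "storm distributed realtime computation system"
corpus = [
    "apache storm is a distributed realtime computation system for streams",
    "the weather storm hit the coast",
    "pandas are bears",
    "a distributed database system",
]
print(expansion_terms(entity, corpus, top_k=3))      # → ['computation', 'realtime', 'distributed']
print(filter_documents(entity, corpus, threshold=0.3))
# → ['apache storm is a distributed realtime computation system for streams']
```

In this toy run the last document is retrieved in the first stage (it shares "distributed") but is dropped by the Jaccard filter (2/7 ≈ 0.29, below the hypothetical 0.3 threshold). This illustrates why the second-stage comparison against the entity article matters.
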
Deploying the KNN algorithm as a topology on the Storm cluster completes a distributed recommendation system for text processing. Finally, the thesis runs a series of experiments with different parameter settings, analyzes and compares the results, and compares them with the official KBA 2012 results. The results show better performance in recommending relevant documents, demonstrating the efficiency of the recommendation system on large-scale data.
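
The four KNN steps listed above (feature selection, document representation, similarity calculation, classification voting) can be sketched as a single-machine Python function. The following assumptions apply. Feature selection is document-frequency thresholding, since the thesis does not name its selector. Documents are TF-IDF vectors compared by cosine similarity, and the label is a plain majority vote among the k nearest neighbours. The function names, `k`, `min_df`, and the toy training set are hypothetical. In a Storm deployment this logic would presumably run inside a bolt, with the vocabulary and training vectors computed once rather than on every call as done here for brevity.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive lowercase whitespace tokenizer (placeholder for real preprocessing).
    return text.lower().split()

def select_features(train_texts, min_df):
    """Feature selection. ASSUMPTION: keep terms whose document frequency >= min_df."""
    df = Counter(t for text in train_texts for t in set(tokenize(text)))
    vocab = sorted(t for t, c in df.items() if c >= min_df)
    return vocab, df

def to_vector(text, vocab, df, n_docs):
    """Document representation: TF-IDF weights over the selected vocabulary."""
    tf = Counter(tokenize(text))
    return [tf[t] * math.log(n_docs / df[t]) for t in vocab]

def cosine(u, v):
    """Similarity calculation: cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_classify(query, train, k=3, min_df=2):
    """Classification vote: majority label among the k most similar training documents."""
    texts = [text for text, _ in train]
    vocab, df = select_features(texts, min_df)
    vectors = [to_vector(t, vocab, df, len(texts)) for t in texts]
    q = to_vector(query, vocab, df, len(texts))
    sims = [cosine(q, v) for v in vectors]
    nearest = sorted(range(len(train)), key=lambda i: sims[i], reverse=True)[:k]
    votes = Counter(train[i][1] for i in nearest)
    return votes.most_common(1)[0][0]

train = [
    ("storm cluster topology spout bolt", "relevant"),
    ("storm topology bolt stream processing", "relevant"),
    ("storm spout stream tuple", "relevant"),
    ("weather storm rain coast", "irrelevant"),
    ("rain coast flood weather", "irrelevant"),
]
print(knn_classify("storm topology spout stream", train))  # → relevant
```

With two labels, an odd `k` rules out ties in the vote. A similarity-weighted vote is a common alternative when neighbours differ widely in similarity.
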

Keywords/Search Tags:

recommendation system, KNN classifier, Storm cluster, feature selection, KBA

Related items

1	The Research Of Two-stage Feature Selection Ensemble Classifier Based On Bagging
2	Research And Design Of Real-time Recommendation System Based On Storm
3	The Research And Implementation On Personalized News Recommendation Based On Storm
4	Research On Key Technologies Of High Availability For Storm Cluster
5	Design And Implementation Of Real-time Microblogging Storm Based On The Recommendation System
6	Research On Tabu Search Algorithm-based Feature Selection
7	Application Of Optimal Feature Selection Algorithm In Text Classification
8	Research On Multi-label Feature Selection And Classifier Chains Algorithms
9	A Study On The Multiple Classifier Systems
10	The Design And Implementation Of Self-health Services System Based On MongoDB And Storm