Research And Implementation Of Distributed Retrieval And Recommendation System Based On Spark Platform

Posted on:2018-08-06

Degree:Master

Type:Thesis

Country:China

Candidate:Y Hu

Full Text:PDF

GTID:2348330542986995

Subject:Software engineering

Abstract/Summary:

With the rapid development of Internet technology, we have entered an era of information explosion, and information filtering has received increasing attention in recent years. On the one hand, users retrieve information of interest through conditions and keywords; on the other hand, recommendation algorithms push information of interest to them. Retrieval is a means for users to filter information actively. Personalized recommendation, by establishing a binary relationship between users and information, discovers information of interest for users and is regarded as a powerful complement to information retrieval.

Starting from the background and significance of the subject, this thesis reviews the research status of related fields and describes the retrieval technologies, collaborative filtering recommendation technologies, and distributed cluster technologies that are popular in current industry. By analyzing the existing problems of index creation efficiency and retrieval load balancing in single-machine full-text retrieval and Map/Reduce-based retrieval, it proposes a tree-based distributed inverted index scheme and a distributed retrieval scheme based on hashing and redundant shard storage. Furthermore, by analyzing users' historical retrieval behavior, which is a no-rating scenario, it proposes a collaborative filtering recommendation scheme based on a co-occurrence model. A factorization machine model is used to predict the recommendation sequence in order to improve the quality of the recommendation results.

In the experiments, compared with the traditional single-machine retrieval implementation based on Lucene and the distributed retrieval implementation based on Map/Reduce, the proposed scheme substantially improves index creation speed and completes full-text index creation for rich text data in polynomial time. For metadata retrieval, the retrieval time is linearly related to the number of documents. For full-text retrieval, the retrieval speed and the number of documents show an approximately linear relation. For concurrent queries, the scheme builds an index backup mechanism, adopts a divide-and-conquer strategy, and nearly achieves load balancing. Compared with traditional collaborative filtering recommendation, the proposed scheme improves the click-through rate.
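
As an illustration of the co-occurrence idea for no-rating scenarios, the following is a minimal single-machine sketch in Python. It is not the thesis's Spark implementation: the function names, the sample histories, and the scoring rule (summing co-occurrence counts over a user's retrieved items) are assumptions made for this example, and the factorization machine re-ranking step is omitted.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(histories):
    """Count how often two items appear in the same user's retrieval history."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in histories.values():
        # Use each item once per user so repeated retrievals do not inflate counts.
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(user, histories, counts, top_n=3):
    """Score unseen items by their summed co-occurrence with the user's items."""
    seen = set(histories[user])
    scores = defaultdict(int)
    for item in seen:
        for other, c in counts[item].items():
            if other not in seen:
                scores[other] += c
    # Sort by descending score, then by item id for a deterministic order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical retrieval histories: user id -> documents the user retrieved.
histories = {
    "u1": ["doc_a", "doc_b", "doc_c"],
    "u2": ["doc_a", "doc_b"],
    "u3": ["doc_b", "doc_c", "doc_d"],
    "u4": ["doc_a"],
}
counts = build_cooccurrence(histories)
print(recommend("u4", histories, counts))
```

No explicit ratings are needed here: a retrieval event alone counts as implicit feedback. In a distributed setting the pair counting would presumably be expressed as a keyed aggregation over user histories, but that mapping is an assumption rather than something the abstract specifies.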

Keywords/Search Tags:

information filtering, Spark, distributed retrieval, co-occurrence, recommendation

Related items

1	Research On Context-Aware Information Collaborative Filtering Recommendation Algorithm Based On Spark
2	An Item-based Collaborative Filtering Recommendation Algorithm Optimization And Parallel Implementation On Spark Platform
3	Research And Optimization Of Recommendation Algorithm Based On Spark Platform
4	Research On IPTV Users' Behaviors Analysis And Distributed Collaborative Filtering Recommendation Algorithm Based On Spark
5	Research And Application Of Distributed Hybrid Recommendation Algorithm Based On Spark
6	Research And Implementation Of Hybrid Collaborative Filtering Recommendation Based On Spark
7	Design And Implementation Of Movie Recommendation System Based On Spark
8	Research On Distributed Collaborative Filtering Recommendation Algorithm Based On Fast Matrix Factorization
9	Experiment And Research Of Recommendation System Based On Spark Parallel Framework
10	Research On Collaborative Filtering Recommendation System Based On Spark