
Research And Implementation Of Distributed Retrieval And Recommendation System Based On Spark Platform

Posted on: 2018-08-06  Degree: Master  Type: Thesis
Country: China  Candidate: Y Hu  Full Text: PDF
GTID: 2348330542986995  Subject: Software engineering
Abstract/Summary:
With the rapid development of Internet technology, we have entered an era of information explosion, and information filtering has received increasing attention in recent years. On the one hand, users retrieve information of interest through conditions and keywords; on the other hand, recommendation algorithms push information of interest to them. Retrieval is a means for users to filter information actively. Personalized recommendation, by establishing a binary relationship between users and information, discovers information of interest for users and is regarded as a powerful complement to information retrieval. Starting from the background and significance of the subject, this thesis introduces the research status of related fields and describes the retrieval technologies, collaborative filtering recommendation technologies, and distributed cluster technologies that are popular in current industry. By analyzing the existing problems of index creation efficiency and retrieval load balancing in single-machine full-text retrieval and Map/Reduce-based retrieval, a tree-based distributed inverted index scheme and a distributed retrieval scheme based on hashing and redundant shard storage are proposed. Furthermore, a collaborative filtering recommendation scheme based on a co-occurrence model is proposed by analyzing users' historical retrieval behavior, which is a scenario without ratings. A factorization machine model is used to predict the recommendation order so as to improve the quality of the recommendation results. In the experiments, compared with the traditional single-machine retrieval implementation based on Lucene and the distributed retrieval implementation based on Map/Reduce, the proposed scheme substantially improves index creation speed and completes full-text index creation for rich text data in polynomial time. For metadata retrieval, the retrieval time is linearly related to the number of documents. For full-text retrieval, the retrieval speed and the number of documents show an approximately linear relation. For concurrent queries, the scheme builds an index backup mechanism, adopts a divide-and-conquer strategy, and comes close to achieving load balancing. Compared with traditional collaborative filtering recommendation, the proposed scheme improves the click-through rate.
Keywords/Search Tags: information filtering, Spark, distributed retrieval, co-occurrence, recommendation
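The abstract does not spell out how the co-occurrence model is built, so the following is only a minimal single-machine sketch of the general idea: with no ratings available, items that appear together in users' retrieval histories are counted, and unseen items are ranked by how often they co-occur with what a user has already retrieved. The function names (`build_cooccurrence`, `recommend`), the toy `histories` data, and the plain-count scoring are illustrative assumptions, not the thesis's Spark implementation, and the factorization machine re-ranking step is omitted.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(histories):
    """Count, for each item pair, how many users retrieved both items.

    `histories` maps a user id to the list of items that user retrieved
    (implicit feedback only, no ratings). Hypothetical data layout.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for items in histories.values():
        # Deduplicate so repeated retrievals by one user count once.
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(user, histories, counts, k=3):
    """Rank unseen items by summed co-occurrence with the user's history."""
    seen = set(histories[user])
    scores = defaultdict(int)
    for item in seen:
        for other, c in counts[item].items():
            if other not in seen:
                scores[other] += c
    # Highest score first; ties broken by item id for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]


# Toy retrieval histories (assumed data, for illustration only).
histories = {
    "u1": ["doc_a", "doc_b", "doc_c"],
    "u2": ["doc_a", "doc_b"],
    "u3": ["doc_b", "doc_c", "doc_d"],
    "u4": ["doc_a"],
}
counts = build_cooccurrence(histories)
print(recommend("u4", histories, counts))
```

On a cluster, the pair counting would presumably be expressed as a distributed aggregation over user histories rather than nested in-memory dictionaries; that mapping is not described in the abstract and is left out here.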