
The Research And Implementation Of Data Cleansing And Ranking Algorithm On Vertical Search Engine

Posted on: 2015-08-02  Degree: Master  Type: Thesis
Country: China  Candidate: J B Li
GTID: 2348330518986380  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, the amount of information on the web is increasing dramatically, and the search engine remains the main gateway to that information. As the categories of information become increasingly diverse, a general-purpose search engine cannot serve every domain equally well, so the demand for domain-specific vertical search engines is growing. This thesis first introduces the concept of vertical search by contrasting it with general search engines, then analyzes several problems that vertical search faces and proposes solutions.

Because information on the Internet contains many duplicate, erroneous, and incomplete records and is constantly updated, this thesis proposes a data cleansing framework based on record aggregation. The framework cleans data effectively while meeting real-time requirements. For the record linkage problem in data cleansing, we propose a Lucene-based blocking method that improves precision and recall while maintaining high efficiency.

To keep improving users' search efficiency, this thesis first analyzes possible ways to improve search ranking and then extends the study from the perspective of personalization. Drawing on research in personalized recommendation systems, we apply a content-based algorithm and a latent factor model to the personalized ranking problem. We propose a ranking-oriented latent factor model for personalized ranking and evaluate it on the MovieLens dataset.

Finally, we design a vertical search engine for video, apply the proposed methods to the system, and verify their effectiveness. System testing shows that the system offers high data quality, fast response, and high search efficiency, and that it meets users' requirements.
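The blocking step for record linkage can be illustrated with a minimal sketch. The abstract does not say how the Lucene index is queried, so the code assumes one common pattern: index every record's tokens, issue each record as a query, and keep only its top-k hits as candidate pairs, so that the expensive pairwise comparison runs on a few pairs instead of all n(n-1)/2. A plain Python inverted index stands in for Lucene, and the function names, the `top_k` and `threshold` parameters, and the Jaccard comparison are hypothetical choices for illustration, not the thesis's method.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a record and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(records):
    """Inverted index: token -> ids of records containing it.

    Hypothetical stand-in for a Lucene index over the record fields.
    """
    index = defaultdict(set)
    for rid, text in records.items():
        for tok in tokenize(text):
            index[tok].add(rid)
    return index


def candidate_pairs(records, index, top_k=3):
    """Blocking: for each record keep the top_k records sharing the most tokens.

    Mimics querying the index with each record and keeping only the top hits.
    """
    pairs = set()
    for rid, text in records.items():
        overlap = defaultdict(int)
        for tok in tokenize(text):
            for other in index[tok]:
                if other != rid:
                    overlap[other] += 1
        best = sorted(overlap.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        for other, _ in best:
            pairs.add((min(rid, other), max(rid, other)))
    return pairs


def jaccard(a, b):
    """Token-set Jaccard similarity, used here as the (assumed) match test."""
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def link_records(records, top_k=3, threshold=0.6):
    """Return candidate pairs whose similarity reaches the threshold."""
    index = build_index(records)
    return sorted(
        (a, b)
        for a, b in candidate_pairs(records, index, top_k)
        if jaccard(records[a], records[b]) >= threshold
    )


if __name__ == "__main__":
    videos = {
        1: "The Dark Knight (2008)",
        2: "Dark Knight, The 2008",
        3: "Inception (2010)",
        4: "The Dark Knight Rises (2012)",
    }
    print(link_records(videos))  # → [(1, 2)]
```

On these toy records, blocking produces 3 candidate pairs instead of all 6, and only the two spellings of the same title pass the threshold. A real Lucene index would order hits by its relevance score, which weights rare tokens more heavily than common ones such as "the"; the raw overlap count here is a simplification.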
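The abstract does not give the objective of its ranking-oriented latent factor model. As one well-known member of that family (not necessarily the thesis's model), the sketch below uses the pairwise objective of Bayesian Personalized Ranking (BPR): for user u, an item i the user interacted with, and an item j they did not, it maximizes ln sigmoid(x_uij) with x_uij = p_u·q_i - p_u·q_j, so observed items are pushed above unobserved ones. The factor dimension, learning rate, regularization, sampling scheme, and function names are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def bpr_step(P, Q, u, i, j, lr, reg):
    """One SGD step raising user u's score for item i above item j.

    Ascends ln sigmoid(x_uij) - reg * ||params||^2 / 2 (BPR-style, assumed).
    """
    pu, qi, qj = P[u][:], Q[i][:], Q[j][:]  # copies: every update uses old values
    g = sigmoid(-(dot(pu, qi) - dot(pu, qj)))  # d/dx ln sigmoid(x) = sigmoid(-x)
    for f in range(len(pu)):
        P[u][f] += lr * (g * (qi[f] - qj[f]) - reg * pu[f])
        Q[i][f] += lr * (g * pu[f] - reg * qi[f])
        Q[j][f] += lr * (-g * pu[f] - reg * qj[f])


def train_bpr(interactions, n_users, n_items, k=4, lr=0.1, reg=0.01, epochs=500, seed=0):
    """Fit user factors P and item factors Q from implicit feedback.

    interactions maps each user id to the set of item ids they viewed.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    # Only users with at least one observed and one unobserved item yield pairs.
    users = [u for u, seen in interactions.items() if 0 < len(seen) < n_items]
    for _ in range(epochs):
        for u in users:
            i = rng.choice(sorted(interactions[u]))  # observed item
            j = rng.randrange(n_items)  # uniformly sampled unobserved item
            while j in interactions[u]:
                j = rng.randrange(n_items)
            bpr_step(P, Q, u, i, j, lr, reg)
    return P, Q


def rank_items(P, Q, u, exclude=()):
    """Return item ids ordered by predicted score for user u, best first."""
    scores = {i: dot(P[u], Q[i]) for i in range(len(Q)) if i not in exclude}
    return sorted(scores, key=scores.get, reverse=True)
```

Unlike a rating-prediction model that minimizes squared error on observed ratings, this objective only constrains the relative order of items for each user, which is what a personalized result ranking needs; `rank_items(P, Q, u, exclude=seen)` would then reorder unseen candidates for user u.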
Keywords/Search Tags: Vertical Search, Data Cleansing, Record Linkage, Personalized Ranking