Research And Implementation Of Medical Vertical Search Engine Based On Improved PageRank Algorithm

Posted on: 2018-06-15
Degree: Master
Type: Thesis
Country: China
Candidate: M X Zhou
GTID: 2348330536484882
Subject: Computer technology
Abstract/Summary:
In recent years, the Internet has become an important platform for obtaining healthcare information, and search engines have made looking up medical information far more convenient. Existing medical search engines, however, are weak in two areas: judging topic similarity and ranking pages. This thesis therefore improves both the topic-similarity judgment and the PageRank algorithm, and builds a medical vertical search engine on these improvements. The main research contents and results are as follows:
(1) Seed URLs were selected, a medical-domain lexicon was built, and the vector space model was studied. After a page is crawled, its topic similarity is computed from three aspects: hyperlinks, meta-information, and the lexicon. This effectively filters out pages unrelated to the topic and greatly improves the efficiency of the search engine.
(2) The PageRank and HITS algorithms were studied and analyzed. PageRank was adopted because it is computationally efficient and can handle large volumes of data. To address its shortcomings (a bias toward old pages, unreasonable weight distribution, and topic drift), three factors were added: a time feedback factor that raises the PageRank of "new" pages, an authority feedback factor that improves how weight is distributed over a page's outbound links, and a topic-relevance factor that suppresses topic drift.
(3) Based on this research, a vertical search engine for the medical field was designed, comprising a crawling module and a query service module. In addition, because Nutch is highly scalable and has a plug-in mechanism, the IKAnalyzer word segmenter was integrated into Nutch to improve the engine's ability to process Chinese text.
(4) Finally, the system was deployed on a server and tested. The experiments showed that the engine performs Chinese word segmentation with an accuracy of 90%; filtering pages by topic similarity increased crawl efficiency by 8%; and the improved PageRank algorithm significantly raised the engine's precision, with precision over the first 10 results returned to the user reaching 0.7 or higher.
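The abstract states that topic similarity is judged from hyperlinks, meta-information, and the lexicon using the vector space model, but it does not give the formula. The Python sketch below shows one plausible way such a filter could work, under stated assumptions: each page part becomes a term-frequency vector, its cosine similarity to a topic vector built from the lexicon is computed, and the three similarities are combined by a weighted sum and compared with a threshold. The lexicon contents, the weights in `WEIGHTS`, `THRESHOLD`, and all function names are hypothetical, and a simple English regex tokenizer stands in for Chinese word segmentation.

```python
import math
import re
from collections import Counter

# Hypothetical medical lexicon (the thesis builds its own, not reproduced here).
MEDICAL_LEXICON = {"diabetes", "insulin", "hypertension", "symptom", "treatment", "vaccine"}

# Hypothetical weights for the three evidence sources (they sum to 1).
WEIGHTS = {"anchor": 0.2, "meta": 0.3, "body": 0.5}
THRESHOLD = 0.3  # hypothetical cut-off for "on topic"

# Topic vector: every lexicon term with weight 1.
TOPIC = Counter({term: 1 for term in MEDICAL_LEXICON})


def tf_vector(text: str) -> Counter:
    """Term-frequency vector over lowercase alphabetic tokens (stand-in for segmentation)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def topic_score(anchor_text: str, meta_text: str, body_text: str) -> float:
    """Weighted sum of the similarities of anchor text, meta tags, and body to the topic."""
    parts = {"anchor": anchor_text, "meta": meta_text, "body": body_text}
    return sum(WEIGHTS[k] * cosine(tf_vector(v), TOPIC) for k, v in parts.items())


def is_on_topic(anchor_text: str, meta_text: str, body_text: str) -> bool:
    """Keep the page only if its combined score reaches the threshold."""
    return topic_score(anchor_text, meta_text, body_text) >= THRESHOLD


print(is_on_topic("diabetes treatment",
                  "insulin and diabetes symptom guide",
                  "diabetes symptom and treatment with insulin"))  # True
print(is_on_topic("cheap flights", "travel deals", "book hotels now"))  # False
```

In a crawler, pages that fail such a check would simply not be indexed or expanded further, which is the kind of filtering the abstract credits with improving crawl efficiency.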
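The abstract names the three factors added to PageRank (time feedback, authority feedback, topic relevance) but not their formulas. The Python sketch below is one hypothetical way to fold them into the standard power iteration, not the thesis's own method: a freshness weight `exp(-decay * age)` biases the random-jump (teleport) distribution toward newer pages, and each page's rank is split among its outbound links in proportion to the target's authority weight times its topic relevance, so off-topic targets receive less rank. The function name, its parameters, and both weighting schemes are assumptions for illustration.

```python
import math


def improved_pagerank(links, age_days, authority, relevance,
                      d=0.85, decay=0.01, max_iter=100, tol=1e-10):
    """Hypothetical PageRank variant with time, authority, and topic factors.

    links:     page -> list of pages it links to
    age_days:  page -> age in days (missing pages are treated as brand new)
    authority: page -> positive authority weight (default 1.0)
    relevance: page -> topic relevance, e.g. a similarity score (default 1.0)
    """
    pages = sorted(set(links) | {v for outs in links.values() for v in outs})
    if not pages:
        return {}
    n = len(pages)

    # Time feedback (assumed form): random jumps favour newer pages.
    fresh = {p: math.exp(-decay * age_days.get(p, 0.0)) for p in pages}
    total = sum(fresh.values())
    teleport = {p: fresh[p] / total for p in pages}

    # Authority and topic factors (assumed form): split a page's rank among its
    # outbound links in proportion to each target's authority * relevance.
    out_w = {}
    for u in pages:
        raw = {v: authority.get(v, 1.0) * relevance.get(v, 1.0) for v in links.get(u, [])}
        s = sum(raw.values())
        out_w[u] = {v: w / s for v, w in raw.items()} if s > 0 else {}

    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages without usable outbound links is spread via teleport.
        dangling = sum(rank[u] for u in pages if not out_w[u])
        new = {p: ((1 - d) + d * dangling) * teleport[p] for p in pages}
        for u in pages:
            for v, w in out_w[u].items():
                new[v] += d * rank[u] * w
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical graph: "hub" links to "x" and "y"; "y" is a year older than "x".
graph = {"hub": ["x", "y"], "x": ["hub"], "y": ["hub"]}
ranks = improved_pagerank(graph, {"hub": 0, "x": 0, "y": 365}, {}, {})
print({p: round(r, 3) for p, r in ranks.items()})
```

Because the teleport vector and every row of outbound-link weights are normalized, the ranks still sum to 1 as in standard PageRank; with all factors left at their defaults the function reduces to ordinary PageRank. The `relevance` input could, for instance, come from a crawl-time topic-similarity score, though the abstract does not say how the thesis wires the two together.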
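The reported figure, precision of 0.7 or more over the first 10 results, corresponds to the usual precision-at-k measure: the number of relevant documents among the top k returned, divided by k. A minimal sketch with hypothetical document IDs and relevance judgments (the thesis's actual test queries and judgments are not given in the abstract):

```python
def precision_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 10) -> float:
    """Precision@k: relevant results among the first k, divided by k.

    If fewer than k results are returned, the missing positions count as non-relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked_ids[:k] if doc in relevant_ids)
    return hits / k


# Hypothetical query: 7 of the top 10 results are judged relevant.
ranked = [f"doc{i}" for i in range(1, 11)]
relevant = {"doc1", "doc2", "doc3", "doc5", "doc6", "doc8", "doc9"}
print(precision_at_k(ranked, relevant))  # 0.7
```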
Keywords/Search Tags: medical field, vertical search engine, subject discrimination, ranking algorithm, Nutch