
The Research And Realization Of Vertical Search Engine System Based On Nutch For Medicine

Posted on: 2016-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: Q L Lv
Full Text: PDF
GTID: 2308330479495244
Subject: Electronic and communication engineering
Abstract/Summary:
With the rapid development of the Internet, people have more and more ways to access information. The sheer variety of information that crowds into daily life brings great convenience, but it can also leave people at a loss. Faced with so much information, people long struggled to find the specific information they need. The emergence of search engines eased this problem considerably. However, as the number of web pages grows exponentially, it is becoming harder and harder for general search engines to cover all of this data. Vertical search engines, with their highly concentrated information and strong domain expertise, have therefore become a hot research topic, and a number of outstanding vertical search engines have appeared one after another. However, there is still no good search platform for matters of everyday life and health. People can obtain information about diseases and treatments only from doctors, which is clearly too narrow a channel. In addition, because of geography, economic development, and other factors, high-quality medical resources are unevenly distributed. A medical vertical search engine would let people obtain medical information without leaving home, which would help ease the strain caused by current medical problems and weak infrastructure.
Based on the open-source framework Nutch, this thesis studies and designs a focused crawling model and an information retrieval component, and then builds a vertical search engine for the medical field. Focused crawling is the main research focus of the construction process. The thesis studies the Fish-Search and Shark-Search algorithms. The system evaluates page relevance objectively on the basis of both web links and web content. Subject to a limit on the “tunnel phenomenon”, the system crawls and downloads web pages in the medical field.
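The focused-crawling idea described above can be illustrated with a small sketch. It is written in the spirit of Shark-Search (a link's priority mixes the relevance of its anchor text with a decayed score inherited from the parent page) and is not the thesis's implementation. The in-memory `PAGES` graph, the `TOPIC` string, the weights `decay` and `anchor_weight`, the `threshold`, and the reading of the tunnel limit as a cap on consecutive off-topic pages (`max_tunnel`) are all assumptions made for illustration.

```python
import heapq
import math
import re
from collections import Counter


def cosine(text, topic):
    """Cosine similarity between the term-frequency vectors of two strings."""
    a = Counter(re.findall(r"\w+", text.lower()))
    b = Counter(re.findall(r"\w+", topic.lower()))
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def focused_crawl(pages, seed, topic, threshold=0.1, max_tunnel=1,
                  decay=0.5, anchor_weight=0.8):
    """Best-first crawl over a toy graph.

    pages maps url -> (page_text, [(anchor_text, child_url), ...]).
    Returns the URLs judged on-topic, in the order they were visited.
    """
    # Heap entries: (-priority, url, score inherited from ancestors, tunnel depth)
    frontier = [(-1.0, seed, 0.0, 0)]
    seen = {seed}
    fetched = []
    while frontier:
        _, url, inherited, tunnel = heapq.heappop(frontier)
        text, links = pages[url]
        relevance = cosine(text, topic)
        on_topic = relevance >= threshold
        # Assumed tunnel limit: count consecutive off-topic pages on this path.
        tunnel = 0 if on_topic else tunnel + 1
        if tunnel > max_tunnel:
            continue  # give up on this path
        if on_topic:
            fetched.append(url)
        # Children inherit a decayed score from the nearest relevant ancestor.
        child_inherited = decay * (relevance if on_topic else inherited)
        for anchor_text, child in links:
            if child in pages and child not in seen:
                seen.add(child)
                score = (anchor_weight * cosine(anchor_text, topic)
                         + (1.0 - anchor_weight) * child_inherited)
                heapq.heappush(frontier, (-score, child, child_inherited, tunnel))
    return fetched


# Hypothetical toy data, not taken from the thesis.
TOPIC = "disease treatment diabetes"
PAGES = {
    "seed": ("diabetes treatment and disease symptoms",
             [("diabetes drug treatment", "a"), ("site news", "b"),
              ("football scores", "c")]),
    "a": ("insulin treatment for diabetes disease", []),
    "b": ("general news portal", [("disease treatment guide", "d")]),
    "c": ("football match results", [("more football", "e")]),
    "d": ("treatment of heart disease", []),
    "e": ("football transfer rumours", [("disease", "f")]),
    "f": ("disease treatment", []),
}

fetched = focused_crawl(PAGES, "seed", TOPIC)
print(fetched)
```

In this toy graph, page `d` is reachable only through one off-topic page and page `f` only through two, so the `max_tunnel=1` setting decides which of them the crawler can reach.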
After that, using web page block segmentation, the system parses the relevant pages with web parsing tools and Chinese word segmentation. The system then builds an inverted index over the pages. For ranking the pages returned by information retrieval, the thesis studies result scoring with the HITS and HillTop algorithms. Because PageRank transfers weight along links, a time feedback factor is added to the PageRank algorithm to reduce the advantage of old pages. The system combines the PageRank algorithm with the vector space model (VSM) of Lucene to improve topical relevance and authority and to counter topic drift. Finally, the system returns pages to users ranked by this new method, realizing a vertical search engine for the medical field. With the vertical search engine designed and studied here, users can obtain more authoritative medical knowledge efficiently. This work not only has a positive effect on health and hygiene but also helps people lead a healthy lifestyle.
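The ranking idea above (PageRank with a time factor, combined with a VSM content score) can be sketched as follows. The abstract does not give the thesis's formulas, so everything specific here is an assumption: the exponential half-life decay in `time_adjusted`, the linear mix in `combined_score`, the weight `alpha`, and the hard-coded `vsm` values, which merely stand in for scores a Lucene query would return.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain PageRank by power iteration; links maps page -> list of outlinks."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            # Dangling pages spread their rank evenly over all pages.
            targets = [q for q in outs if q in rank] or pages
            share = damping * rank[p] / len(targets)
            for q in targets:
                new[q] += share
        rank = new
    return rank


def time_adjusted(rank, age_days, half_life_days=365.0):
    """Discount each page's rank by its age, then scale so the best page is 1.0.

    The half-life form is an assumed stand-in for the thesis's time feedback factor.
    """
    adjusted = {p: r * 0.5 ** (age_days[p] / half_life_days)
                for p, r in rank.items()}
    top = max(adjusted.values())
    return {p: v / top for p, v in adjusted.items()}


def combined_score(vsm_score, authority, alpha=0.7):
    """Assumed linear mix of content relevance and link authority."""
    return alpha * vsm_score + (1.0 - alpha) * authority


# Hypothetical toy data: "old" and "new" have identical link structure.
links = {"old": ["new"], "new": ["old"], "other": ["old", "new"]}
age_days = {"old": 1460, "new": 30, "other": 10}
vsm = {"old": 0.6, "new": 0.6, "other": 0.6}  # placeholder content scores

rank = pagerank(links)
authority = time_adjusted(rank, age_days)
ranking = sorted(links, key=lambda p: combined_score(vsm[p], authority[p]),
                 reverse=True)
print(ranking)
```

Because `old` and `new` receive the same plain PageRank in this graph and the same placeholder content score, any difference in their final order comes only from the age discount.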
Keywords/Search Tags: vertical search engine, focused crawler, information retrieval, medical field