Research and Development of Medical Search Engine Based on Nutch

Posted on: 2015-10-06 | Degree: Master | Type: Thesis
Country: China | Candidate: E G Yuan | Full Text: PDF
GTID: 2298330431992018 | Subject: Control theory and control engineering
Abstract/Summary:
With the continuous improvement of living standards, people are paying increasing attention to their own health. The rapid growth of information on the network has made the Internet an important channel through which the public obtains healthcare information. To find useful information in the vast amount of Internet data, people usually use a search engine. General search engines are easy to use and their results have broad coverage. However, when only specialized information from a single field is needed, the results of such engines often show low accuracy, information lag, and other shortcomings. A vertical search engine serves a particular industry and can intelligently collect related resources from the Internet. Such an engine integrates these resources and builds a data resource for the industry in order to meet the retrieval requirements of a specific group of users. Today, vertical search is a hot topic in the field of information retrieval. To help the public access medical and health information from the Internet quickly and efficiently, this paper designs a medical vertical search engine based on Nutch components. This paper analyzes and studies the key technologies of a medical vertical search engine and examines in depth the working principle of Nutch 1.2. Based on the practical needs of the public, the paper completes the design of the system, which is built on Nutch 1.2 through secondary development. The work focuses on three problems: Chinese word segmentation, topic judgment, and result ranking. The specific methods are as follows. The system implements Chinese word segmentation with IKAnalyzer. It also obtains a term library from training texts.
Using SVM, the engine calculates the relevance of a web page to the medical domain and uses this relevance to filter web pages. The system then adds a topic relevance factor to the ranking algorithm. Finally, the system is deployed on a Tomcat server. Experiments verify the feasibility of the method. A comparison of search results between this system and general search engines shows the advantage of this system for retrieving information in the medical and health field.
Keywords/Search Tags: vertical search engine, Nutch, Chinese word segmentation, text categorization, VSM
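
The abstract describes topic filtering and relevance-aware ranking without giving formulas, so the following Python fragment is only a minimal sketch under stated assumptions, not the thesis's implementation. It substitutes a plain vector space model cosine similarity for the SVM-based relevance calculation mentioned in the abstract, and it assumes pages are already segmented into tokens (the thesis uses IKAnalyzer for this step). The term library contents, the filtering threshold, the mixing weight `alpha`, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical term library (term -> weight). The thesis builds its library
# from training texts; these terms and weights are invented for illustration.
TERM_LIBRARY = {
    "diabetes": 1.0,
    "insulin": 0.9,
    "treatment": 0.7,
    "symptom": 0.6,
    "hospital": 0.5,
}


def topic_relevance(tokens, term_library=TERM_LIBRARY):
    """Cosine similarity between a page's term-frequency vector and the
    term-library weight vector. Returns a value in [0, 1]."""
    counts = Counter(tokens)
    dot = sum(counts[term] * weight for term, weight in term_library.items())
    page_norm = math.sqrt(sum(c * c for c in counts.values()))
    lib_norm = math.sqrt(sum(w * w for w in term_library.values()))
    if page_norm == 0 or lib_norm == 0:
        return 0.0
    return dot / (page_norm * lib_norm)


def blended_score(base_score, relevance, alpha=0.7):
    """Mix the engine's base score with topic relevance.
    alpha is an assumed weight, not a value from the thesis."""
    return alpha * base_score + (1 - alpha) * relevance


def rank(pages, threshold=0.2, alpha=0.7):
    """Drop pages whose relevance is below the (assumed) threshold, then
    sort the rest by blended score, highest first.

    pages: iterable of (url, base_score, tokens) tuples.
    Returns a list of (url, score) tuples."""
    scored = []
    for url, base_score, tokens in pages:
        relevance = topic_relevance(tokens)
        if relevance < threshold:
            continue  # page filtering: treated as off-topic
        scored.append((url, blended_score(base_score, relevance, alpha)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    pages = [
        ("a", 0.90, ["football", "score", "league"]),
        ("b", 0.50, ["diabetes", "insulin", "treatment", "symptom"]),
        ("d", 0.55, ["insulin", "treatment", "dose"]),
    ]
    for url, score in rank(pages):
        print(url, round(score, 3))
```

In this sketch the off-topic page is removed before ranking regardless of its base score, and a page with a lower base score can outrank another when its topic relevance is higher, which is the effect the abstract attributes to adding a topic relevance factor to the ranking algorithm.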