
Research And Application Of Key Technologies For Biomedical Thematic Information Tracking And Service System

Posted on: 2012-04-12    Degree: Master    Type: Thesis
Country: China    Candidate: X L Wang
GTID: 2214330371962975    Subject: Biomedical engineering
Abstract/Summary:
Internet search engines are one of the main ways to access information, and vertical search engines can provide more accurate and efficient information services for specialized fields. Many search engines and retrieval systems exist in China and abroad, but they have limitations: low retrieval quality, few tracking and retrieval systems that proactively deliver professional information services, and high prices. Our academy is the top institute for military medical research and disease control, and its research and management staff have a strong demand for proactive, customized research information. To meet researchers' information-service needs and to provide intelligence support for scientific decision-making by the leadership, the author studied the key technologies required for a biomedical vertical search engine and built a biomedical thematic information tracking and service system on top of them.

First, using literature research, expert consultation and system analysis, the author analyzes the system requirements and examines the key technologies needed to implement its functions, including web crawling, full-text search, vertical search, Chinese word segmentation and re-crawling. The thesis explains why these technologies and software components were chosen. Second, the author discusses the principles and current state of several key technologies and carefully compares the influential products and software components available in China and abroad. On this basis the author selects the open-source components Nutch, Lucene and the Chinese word segmenter Paodingjieniu, and analyzes how these components are implemented and how they can be customized.
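Chinese text has no spaces between words, so a segmenter must split it into terms before full-text indexing can work. The Python sketch here illustrates the general idea with forward maximum matching (a common dictionary-based baseline) feeding a toy inverted index. It is not Paodingjieniu's algorithm or Lucene's API; the dictionary and the names `segment` and `InvertedIndex` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy lexicon; a real system would load a large dictionary.
DICTIONARY = {"生物", "医学", "生物医学", "信息", "跟踪", "服务",
              "系统", "疾病", "控制", "疾病控制"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def segment(text):
    """Forward maximum matching: at each position, take the longest
    dictionary word that matches; fall back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in DICTIONARY:
                tokens.append(candidate)
                i += length
                break
    return tokens


class InvertedIndex:
    """Maps each term to the documents containing it (term -> {doc_id: tf})."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        for term in segment(text):
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, query):
        """Return ids of documents containing every query term, ranked by
        summed term frequency (a crude stand-in for real relevance scoring)."""
        terms = segment(query)
        if not terms:
            return []
        matching = set(self.postings.get(terms[0], {}))
        for term in terms[1:]:
            matching &= set(self.postings.get(term, {}))
        scores = {doc: sum(self.postings[t][doc] for t in terms) for doc in matching}
        return sorted(scores, key=lambda doc: (-scores[doc], doc))


index = InvertedIndex()
index.add(1, "生物医学信息跟踪服务系统")
index.add(2, "疾病控制信息服务")
index.add(3, "生物医学信息")
print(segment("生物医学信息跟踪服务系统"))  # ['生物医学', '信息', '跟踪', '服务', '系统']
print(index.search("生物医学信息"))       # [1, 3]
print(index.search("疾病"))               # []
```

The last query returns nothing because longest matching indexes the compound "疾病控制" as one term and hides its sub-word "疾病". This trade-off between precision and recall is one reason production segmenters often offer finer-grained output, and it shows why dictionary coverage of biomedical terms strongly affects search quality.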
Third, following current component-based software development practice, the author uses Eclipse, MySQL, Tomcat and other development tools and platforms to assemble and integrate these key components, modifies Nutch's re-crawl module so that it actually works in practice, and implements the biomedical thematic information tracking and service system. Finally, the thesis discusses remaining problems, such as low search precision caused by the lack of a professional biomedical thesaurus and limited capacity for large-scale search, and proposes using UMLS and cloud computing to improve the system.

The system can fetch information from the Internet, process document formats, and index and retrieve content, and it also supports Chinese word segmentation and re-crawling, so that researchers obtain more accurate and timely search results. It also offers users personalized services built on the latest biomedical information, such as real-time updates, category navigation and full-text search. The system is now in trial operation: it tracks and crawls 30 sites, and returns search results in under 2 seconds with 20 concurrent users. Results of this research have been published in Chinese core journals including "Beijing Biomedical Engineering" and the "Bulletin of the Academy of Military Medical Sciences", and can serve as a reference for institutions and researchers who want to design and implement a similar system.
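Re-crawling keeps the index fresh by revisiting pages that change. One common strategy is an adaptive interval: shorten it when a page has changed since the last fetch and lengthen it when it has not. The Python sketch here assumes this strategy, using content hashes to detect changes. The rates, bounds, and the names `PageState`, `update_schedule` and `due_for_recrawl` are hypothetical. It does not reproduce Nutch's re-crawl module or the author's modifications to it.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical scheduling parameters, for illustration only.
MIN_INTERVAL = 60 * 60             # never revisit more often than hourly
MAX_INTERVAL = 30 * 24 * 60 * 60   # always revisit at least every 30 days
INC_RATE = 0.5                     # grow interval by 50% if page unchanged
DEC_RATE = 0.25                    # shrink interval by 25% if page changed


@dataclass
class PageState:
    url: str
    interval: float         # seconds between fetches
    next_fetch: float       # absolute time of the next fetch, in seconds
    content_hash: str = ""  # fingerprint of the last fetched content


def fingerprint(content):
    """Hash page content so changes can be detected cheaply."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()


def update_schedule(page, fetch_time, content):
    """Record a fetch and adapt the page's re-crawl interval."""
    new_hash = fingerprint(content)
    if page.content_hash and new_hash == page.content_hash:
        page.interval *= 1 + INC_RATE      # unchanged: check less often
    elif page.content_hash:
        page.interval *= 1 - DEC_RATE      # changed: check more often
    # On the first fetch (no previous hash) the initial interval is kept.
    page.interval = max(MIN_INTERVAL, min(MAX_INTERVAL, page.interval))
    page.content_hash = new_hash
    page.next_fetch = fetch_time + page.interval
    return page


def due_for_recrawl(pages, now):
    """URLs whose next scheduled fetch time has arrived."""
    return [p.url for p in pages if p.next_fetch <= now]


page = PageState("http://example.org/news", interval=86400.0, next_fetch=0.0)
update_schedule(page, 0.0, "v1")        # first fetch: interval stays 86400
update_schedule(page, 86400.0, "v1")    # unchanged: interval -> 129600
update_schedule(page, 216000.0, "v2")   # changed: interval -> 97200
print(page.interval, page.next_fetch)   # 97200.0 313200.0
```

Clamping the interval keeps fast-changing news pages from being fetched too aggressively and ensures that static pages are still revisited occasionally.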
Keywords/Search Tags:search engine, component development, Nutch, Lucene, full text search, Chinese Word Segmentation, re-crawl