Research on Web Biological Information Retrieval and Extraction Technologies Based on Ontology

Posted on: 2006-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y Cheng
Full Text: PDF
GTID: 2178360212482701
Subject: Computer applications
Abstract/Summary:
With the rapid development of both the Internet and biological information science, finding biological information data sources in a timely way has become very important. Traditional keyword-based search engines ignore the semantic information that keywords carry, so they achieve low recall and precision and are gradually becoming unsuited to this requirement. Moreover, the Web has evolved into a tremendous, distributed, and shared information resource, but at present most Web data are wrapped in HTML, so applications cannot reuse the information directly. Web information extraction technology has emerged to address this problem.

In this thesis, after studying Semantic Web and ontology technologies and surveying information retrieval and semi-structured Web information extraction technologies, the author focuses on discovering biological information data sources and extracting biological information data from them. To discover useful biological information data sources, the author presents a biological information retrieval system based on ontology and feature phrases. The author also presents an ontology-driven extraction method that locates key information through document structure and pattern matching in order to extract the required data. The author has implemented a user-guided, interactive information extraction prototype system. First, it fetches the specified Web page and converts it into a well-formed XML document using JTidy. Then an XML parser represents the XML document as a DOM tree. Next, the user specifies an XPath expression to select the required data slot, and the OntPMatch algorithm extracts the data contained in that slot. Finally, the extracted data are stored in a structured form.

The thesis has implemented a prototype system for discovering biological information data sources and extracting biological information data. It helps users obtain more useful and satisfactory information from the Web than before, and provides a valuable tool for making full use of the Web's tremendous volume of data.
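The extraction steps described above (well-formed XML, DOM tree, XPath-selected data slot, pattern matching) can be illustrated with a minimal sketch. This is not the thesis's OntPMatch algorithm, whose details the abstract does not give; it is a simplified stand-in that assumes the page has already been tidied into well-formed XML, and it uses Python's standard library in place of JTidy and a Java XML parser. The sample page, the `CONCEPT_PATTERNS` table, and the `extract_slot` function are all hypothetical names and data introduced only for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the HTML-to-XML cleanup step (JTidy in the thesis) has already
# run, so the input is well-formed. The page content is illustrative only.
WELL_FORMED_PAGE = """
<html>
  <body>
    <div id="nav">Home | Search</div>
    <table id="record">
      <tr><td>Gene: BRCA1</td></tr>
      <tr><td>Organism: Homo sapiens</td></tr>
      <tr><td>Accession: NM_007294</td></tr>
    </table>
  </body>
</html>
""".strip()

# Hypothetical stand-in for ontology knowledge: each concept name maps to a
# pattern describing how a value of that concept appears in page text.
CONCEPT_PATTERNS = {
    "gene": re.compile(r"Gene:\s*(\w+)"),
    "organism": re.compile(r"Organism:\s*([A-Z][a-z]+ [a-z]+)"),
    "accession": re.compile(r"Accession:\s*([A-Z]{2}_\d+)"),
}


def extract_slot(xml_text, slot_path, patterns):
    """Parse the document into a tree, select the nodes of the data slot with
    an XPath-style expression, and match each concept pattern against them."""
    root = ET.fromstring(xml_text)
    record = {}
    for node in root.findall(slot_path):
        # Collect all text under the node, including nested elements.
        text = "".join(node.itertext()).strip()
        for concept, pattern in patterns.items():
            match = pattern.search(text)
            # Keep the first value found for each concept.
            if match and concept not in record:
                record[concept] = match.group(1)
    return record


# The user-supplied path plays the role of the XPath expression in the thesis.
record = extract_slot(
    WELL_FORMED_PAGE, ".//table[@id='record']/tr/td", CONCEPT_PATTERNS
)
print(record)
```

Restricting the pattern matching to the user-selected slot keeps navigation text and other page furniture out of the result, which is the practical benefit of combining document structure with pattern matching. Note that `xml.etree.ElementTree` supports only a subset of XPath; a full XPath engine would be needed for more complex expressions.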
Keywords/Search Tags: Ontology, Information Retrieval, Information Extraction, XML, DOM, Feature Phrase