
Research and Implementation of a Wrapper for Web Data Extraction Based on Ontology

Posted on: 2010-08-12
Degree: Master
Type: Thesis
Country: China
Candidate: H Wang
GTID: 2178360275455066
Subject: Computer software and theory
Abstract/Summary:
As a research focus in information retrieval and a technology with great potential, Web data extraction has become a major research topic at many universities and institutions. Web data extraction is also known as Web information collection (associated with the Web crawler, Web spider, Web robot, or Web worm). Its main function is to recognize the data that users are interested in within the unstructured or semi-structured content of Web pages, and to transform that data into a form with explicit structure and semantics (XML, relational data, object-oriented data, etc.). Web data extraction is carried out mainly by a Web wrapper, a software component that takes the pages collected by a Web spider and converts their information into a specific, well-defined format according to predefined rules. A Web wrapper is generally designed for one kind of page from a given data source and, in principle, extracts only the data from that source that is actually needed. Contemporary Web wrappers mainly select a suitable template from a template base for the extraction target and then perform extraction by matching documents against the templates in that base. The expressive power of the templates therefore directly affects the accuracy of the system. A template is usually built by combining keywords that are matched against the page in a fixed order, but this construction has clear limitations: first, in many cases the words do not occur in a fixed order; second, polysemous words often lead to incorrect analysis results.
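The two limitations of keyword templates can be illustrated with a minimal sketch. The abstract does not specify the template format, so the Python below assumes, hypothetically, that a template is a list of keywords that must appear in a page in the given order; `tokenize` and `matches_ordered_template` are illustrative names, not part of the thesis.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; punctuation is discarded.
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_ordered_template(text: str, template: list[str]) -> bool:
    """Hypothetical keyword template: every keyword must occur in `text`,
    in template order, with any words allowed in between."""
    tokens = iter(tokenize(text))
    # `kw in tokens` advances the shared iterator past the match, so each
    # keyword is only searched for after the previous one.
    return all(kw in tokens for kw in template)


if __name__ == "__main__":
    price_template = ["price", "yuan"]
    print(matches_ordered_template("Price: 4999 yuan", price_template))        # True
    # Same facts, different word order: the template no longer matches.
    print(matches_ordered_template("4999 yuan is the price", price_template))  # False
    # Polysemy: a template meant for fruit prices also fires on a company name.
    print(matches_ordered_template("Apple cuts the price of its phones", ["apple", "price"]))  # True
```

The second call fails only because the words appear in a different order, and the third succeeds even though "Apple" names a company rather than a fruit: these are the two failure modes the abstract attributes to keyword templates.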
To overcome these limitations, one approach is to improve the expressive power of the template; the other is to resolve the ambiguity of word meanings. This thesis proceeds as follows. First, it proposes a design method for an ontology-based wrapper (OBW). The data-extraction wrapper is driven by an ontology, which it uses to analyze Web pages and extract the relevant data. After analysis by the wrapper, Web documents containing data pages from the relevant domain are saved as semantic Web page files (SWPFs). A query processor uses the ontology to analyze and process these SWPFs, finds the relevant data and links, and returns them to users. Next, the thesis designs an ontology for a specific domain and analyzes its structural characteristics in detail. It then describes the OBW workflow based on this ontology, including data extraction from Web pages and the generation of SWPF files. Finally, it designs and implements an OBW data-extraction system based on the methods and technologies described above, validates it on real Web data, analyzes the results, and proposes solutions to the problems found.

The ontology is the core and foundation of the OBW, so it is important to construct a well-designed ontology for the application domain, which helps improve the precision of data extraction. However, there is currently no mature technique for building a universal ontology: ontologies can only be built for specific domains and require the participation of people with specialized expertise, so this area still needs further research.
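The ontology-driven idea can be sketched in the same spirit. The abstract gives no concrete ontology language or SWPF format, so this Python sketch assumes, hypothetically, that a domain ontology is a set of domain context words plus properties, each with synonym terms and a value pattern. The output dictionary merely stands in for an SWPF, and `Property`, `DomainOntology`, `extract`, `query`, and the mobile-phone domain are illustrative inventions. Checking for domain context words is a crude substitute for real sense disambiguation.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Property:
    terms: set[str]      # synonyms that signal this property on a page
    value_pattern: str   # regular expression for a valid value


@dataclass
class DomainOntology:
    name: str
    # Words whose presence suggests a page belongs to this domain; used to
    # reject pages where a shared word carries a different sense.
    context_terms: set[str]
    properties: dict[str, Property] = field(default_factory=dict)


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def extract(page: str, onto: DomainOntology, min_context: int = 2) -> dict[str, str] | None:
    """Return a property -> value record (a stand-in for an SWPF) or None."""
    if len(words(page) & onto.context_terms) < min_context:
        return None  # page is probably not about this domain
    record: dict[str, str] = {}
    for clause in re.split(r"[.;\n]", page):
        clause_words = words(clause)
        for name, prop in onto.properties.items():
            if name in record or not (clause_words & prop.terms):
                continue
            # Term and value only need to share a clause; their order is free.
            m = re.search(prop.value_pattern, clause, re.IGNORECASE)
            if m:
                record[name] = m.group(0)
    return record or None


def query(records: list[dict[str, str]], **criteria: str) -> list[dict[str, str]]:
    """Toy query processor: keep records whose properties equal the criteria."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


# Hypothetical example domain and pages.
phone = DomainOntology(
    name="mobile phone",
    context_terms={"phone", "smartphone", "screen", "battery", "camera"},
    properties={
        "price": Property({"price", "cost", "yuan"}, r"\d+\s*yuan"),
        "storage": Property({"storage", "memory"}, r"\d+\s*GB"),
    },
)

pages = [
    "New smartphone with a large screen. 4999 yuan is the price. Storage: 128 GB.",
    "Apple trees need water. The price of apples rose to 9 yuan per kilo.",
]
records = [r for r in (extract(p, phone) for p in pages) if r is not None]
print(records)                           # [{'price': '4999 yuan', 'storage': '128 GB'}]
print(query(records, storage="128 GB"))  # the same single record
```

Because a property term and its value need only share a clause, the reordered phrase "4999 yuan is the price" is still extracted, while the fruit page is rejected by the context check instead of being misread as a phone listing. A real OBW would rely on a richer ontology and disambiguation than this word-overlap heuristic.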
Keywords: Data Extraction, Wrapper, Ontology