Research On Semi-structure Information Extraction For Web

Posted on:2010-10-17

Degree:Master

Type:Thesis

Country:China

Candidate:S Q Zhou

Full Text:PDF

GTID:2178360272979365

Subject:Computer application technology

Abstract/Summary:

With the rapid development and popularization of the Internet, more and more people obtain information from the Web, and finding the information they need quickly and efficiently has become a serious problem. Web information extraction technology has emerged in response. Many approaches have been proposed for generating wrappers, but each has limitations that keep the resulting wrappers from being accurate, robust, or general. Generating better wrappers has therefore become a research focus of information extraction.

After analyzing XML and information extraction technologies, this thesis develops an XML-based Web information extraction system. With this system, users can extract information of interest from HTML pages, and the extraction results are expressed in XML, which is strongly structured and extensible. The system is general and flexible: users can quickly customize Web information extraction wrappers for different domains. Exploiting XPath's ability to locate data regions, the thesis implements a DOM-based XPath algorithm. XSLT is used as the language for describing extraction rules, and XPath is used to locate the information to be extracted.

The Web information extraction method presented in this thesis addresses the extraction problem more effectively, and the system achieves relatively high precision and recall.
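The pipeline described above (build a DOM from the page, locate the data region and its fields with XPath, emit the results as XML, then score them by precision and recall) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. It uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath 1.0 and no XSLT, so plain path rules stand in for the XSLT stylesheets the thesis uses. It also assumes the page has already been normalized into well-formed XHTML. The sample page, the `RULES` dictionary, and the `extract` and `precision_recall` functions are all hypothetical names introduced for illustration. Precision is taken as correct extracted records divided by all extracted records, and recall as correct extracted records divided by all expected records.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page, assumed already cleaned into well-formed XHTML
# (real-world HTML usually needs normalizing before XML parsing succeeds).
SAMPLE_PAGE = """<html><body>
  <div id="nav"><a href="/">Home</a></div>
  <table id="books">
    <tr class="item"><td class="title">Book A</td><td class="price">39.95</td></tr>
    <tr class="item"><td class="title">Book B</td><td class="price">44.99</td></tr>
  </table>
</body></html>"""

# Hypothetical wrapper: one path locating each record in the data region,
# plus one path per field, relative to that record. These use ElementTree's
# XPath subset and stand in for the XSLT extraction rules of the thesis.
RULES = {
    "record": ".//table[@id='books']/tr[@class='item']",
    "fields": {
        "title": "td[@class='title']",
        "price": "td[@class='price']",
    },
}


def extract(page: str, rules: dict, root_tag: str = "records") -> ET.Element:
    """Apply path-based extraction rules to a page and return an XML tree."""
    dom = ET.fromstring(page)                   # parse the page into an element tree
    out = ET.Element(root_tag)
    for node in dom.findall(rules["record"]):   # locate each record in the data region
        rec = ET.SubElement(out, "record")
        for name, path in rules["fields"].items():
            hit = node.find(path)               # locate the field inside the record
            field = ET.SubElement(rec, name)
            field.text = (hit.text or "").strip() if hit is not None else ""
    return out


def precision_recall(extracted: set, expected: set) -> tuple:
    """Precision = correct / extracted; recall = correct / expected."""
    correct = len(extracted & expected)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(expected) if expected else 0.0
    return precision, recall


if __name__ == "__main__":
    result = extract(SAMPLE_PAGE, RULES)
    print(ET.tostring(result, encoding="unicode"))
    got = {(r.findtext("title"), r.findtext("price")) for r in result}
    print(precision_recall(got, {("Book A", "39.95"), ("Book B", "44.99")}))
```

Keeping the rules as data rather than code mirrors the abstract's point that users can customize a wrapper for a new domain by editing its extraction rules rather than the extractor itself. In the sketch, the navigation link outside the `books` table is never matched, because the record path anchors on that data region.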

Keywords/Search Tags:

data mining, information extraction, semi-structured data, Web

Related items

1	Research And Application Of Extraction Method Of Semi-structured Text Information
2	Research On Semi-structure Information Extraction For Web
3	Study Of Mining Data Streams Based On Semi-Structured Data
4	Research On Data Mining Technology Of Semi-structured Data
5	Research On Related Technology Of Frequent Pattern Mining For Semi-structured Data
6	Study On Semi-structured Data Mining
7	Research On Structured Data Extraction From Web Forums
8	Research Of Pattern Extraction From Semi-structured Data Based On Rules
9	Information Extraction Research And Application From Network Data
10	Research On Methods Of Semi-structured Data Implication Rules Extraction