Subject-oriented Mode of the XML Page and Data Extraction

Posted on: 2005-10-19
Degree: Master
Type: Thesis
Country: China
Candidate: L Deng
Full Text: PDF
GTID: 2208360122995518
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the amount of information on the Web has grown greatly, giving users a massive and valuable information resource. However, because the Internet is open and heterogeneous, this growth also makes it harder for users to quickly obtain what they need on the WWW. How to find the needed information quickly and accurately among so many information sources has become a difficult problem for Internet users.

With the development of XML technology, XML-tagged Web documents have begun to appear on the WWW. In this thesis, we put forward an information extraction method for given topics. Based on the topic of the user's query and the attributes of that topic, the method extracts pattern information from sample XML documents. It then uses a pattern-matching algorithm to recognize and extract all occurrences of the extraction pattern in the parsed target XML documents.

The thesis first discusses Web mining technology and research on information extraction. It then describes a topic-oriented pattern and data extraction system for XML documents, concentrating on the realization of the pattern and data extraction algorithms we present. Finally, we test the algorithm on an example. The experimental results show that the system meets the expected requirements for generality and precision.
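The two-stage idea described in the abstract (derive an extraction pattern from a sample XML document, then match it against target documents) can be illustrated with a minimal sketch. This is not the thesis's algorithm, which the abstract does not specify. It assumes a pattern is simply the topic's element tag plus the relative tag path to each topic attribute, and that "attributes" are child elements rather than XML attributes. The names `learn_pattern`, `find_path`, `extract`, and the sample documents are all hypothetical.

```python
import xml.etree.ElementTree as ET


def find_path(elem, tag):
    """Depth-first search below elem for a descendant named tag.

    Returns the relative tag path (e.g. 'info/price'), or None if absent.
    """
    for child in elem:
        if child.tag == tag:
            return child.tag
        sub = find_path(child, tag)
        if sub is not None:
            return child.tag + "/" + sub
    return None


def learn_pattern(sample_xml, topic, attributes):
    """Assumed pattern format: topic tag plus a relative path per attribute.

    Only the first occurrence of the topic element in the sample is used.
    """
    root = ET.fromstring(sample_xml)
    for elem in root.iter(topic):
        paths = {}
        for attr in attributes:
            path = find_path(elem, attr)
            if path is not None:
                paths[attr] = path
        return {"topic": topic, "paths": paths}
    return None


def extract(target_xml, pattern):
    """Collect one record per topic element in the target that matches the pattern."""
    root = ET.fromstring(target_xml)
    records = []
    for elem in root.iter(pattern["topic"]):
        record = {}
        for attr, path in pattern["paths"].items():
            node = elem.find(path)
            if node is not None and node.text is not None:
                record[attr] = node.text.strip()
        if record:
            records.append(record)
    return records


# Hypothetical sample and target documents with different outer structure.
SAMPLE = (
    "<catalog><book><title>XML Basics</title>"
    "<info><price>30</price></info></book></catalog>"
)
TARGET = (
    "<store><shelf>"
    "<book><title>Data Mining</title><info><price>45</price></info></book>"
    "<book><title>Web IR</title><info><price>52</price></info></book>"
    "</shelf></store>"
)

pattern = learn_pattern(SAMPLE, "book", ["title", "price"])
records = extract(TARGET, pattern)
```

Because the pattern stores paths relative to the topic element, the target document may nest that element at a different depth than the sample did. A real system would also need to handle variation in tag names and structure, which this sketch ignores.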
Keywords/Search Tags: Information Extraction, XML, Pattern Extraction, Data Extracting