Topic- and Structure-Based Data Extraction from XML Pages

Posted on: 2006-09-16
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhu
Full Text: PDF
GTID: 2208360152991670
Subject: Computer application technology
Abstract/Summary:
The Internet has greatly changed the world and, in recent years, our society. It has become a major source of information, and retrieving useful information from complex data precisely and completely has become an important task.

As Web applications have grown, HTML has been unable to keep pace with increasing demands, which led to a new Web language, XML. With the development of XML technology, XML pages have begun to appear on the Web. In this paper we put forward an information extraction method for given topics.

We then discuss a topic-oriented pattern and data extraction system for XML documents, concentrating on the implementation of the pattern and data extraction algorithm we present. The core technology is parsing XML documents and extracting patterns and data from sample documents. Here, pattern information is the structural pattern of a semantic block. We obtain the pattern information for a topic by comparing semantic blocks. We then extract the information based on the resulting rules and submit the result to the client...
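The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea (learn a structural pattern by comparing semantic blocks in a sample document, then use it as an extraction rule). It assumes that a semantic block is an element with a known tag (`block_tag`) and that the pattern is the set of leaf tag paths shared by every sample block. These assumptions, and the function names `leaf_paths`, `learn_pattern`, and `extract`, are hypothetical and are not taken from the thesis.

```python
import xml.etree.ElementTree as ET


def leaf_paths(block, prefix=""):
    """Collect the tag paths (relative to the block) of all leaf elements."""
    paths = set()
    for child in block:
        path = f"{prefix}/{child.tag}" if prefix else child.tag
        if len(child) == 0:          # no sub-elements: this is a leaf
            paths.add(path)
        else:
            paths |= leaf_paths(child, path)
    return paths


def learn_pattern(sample_xml, block_tag):
    """Compare all blocks in a sample document and keep the paths they share."""
    root = ET.fromstring(sample_xml)
    path_sets = [leaf_paths(block) for block in root.iter(block_tag)]
    if not path_sets:
        return []
    return sorted(set.intersection(*path_sets))


def extract(xml_text, block_tag, pattern):
    """Apply the learned pattern as a rule: one record per block."""
    root = ET.fromstring(xml_text)
    records = []
    for block in root.iter(block_tag):
        record = {}
        for path in pattern:
            node = block.find(path)  # ElementTree resolves relative tag paths
            if node is not None and node.text:
                record[path] = node.text.strip()
        records.append(record)
    return records


# Hypothetical sample document with two semantic blocks of the same topic.
SAMPLE = """<catalog>
  <book><title>A</title><author><name>X</name></author><price>10</price></book>
  <book><title>B</title><author><name>Y</name></author></book>
</catalog>"""

# Hypothetical new document to extract from.
NEW = ("<catalog><book><title>C</title><author><name>Z</name></author>"
       "<isbn>1</isbn></book></catalog>")

pattern = learn_pattern(SAMPLE, "book")
records = extract(NEW, "book", pattern)
```

In this sketch, `price` appears in only one sample block and `isbn` was never seen in the samples, so neither is part of the learned pattern. A practical system would need a more tolerant comparison (for example, optional or repeated fields) than a strict intersection.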
Keywords/Search Tags: XML, tree structure, pattern extraction, data extraction