
Research Of Web Information Extraction & Practice Based On Web Service

Posted on: 2005-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Zhang
Full Text: PDF
GTID: 2168360125454813
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the World Wide Web has become the largest information resource. However, most valuable Web information is in HTML form: it is marked up for presentation and lacks schema and semantic information. To access Web information in a structured and uniform way, people apply information extraction technology to the Web. In previous work, we implemented a prototype system, and experiments showed that it works well. At present, however, no work has analyzed Web page structure theoretically. In this paper, we introduce unnest/nest theory to describe the structure of Web pages. After studying some typical structure-based information extraction systems, we find four kinds of nest/unnest, among them: (1) unnest of set objects; (2) unnest of record objects; (3) overly coarse granularity of DOM nodes. Under this theory, we provide a different Restructure_Rull for each kind of nest/unnest and analyze the adaptability of each Restructure_Rull. Through the Restructure_Rull, other information extraction technologies can be used in our system, which gives us a method for integrating many information extraction technologies. Web Service is the future of the Internet and provides a good solution for information integration. We combine Web Service technology with information extraction technology and develop a prototype system based on Web Service.
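The abstract does not define nest/unnest formally. As a rough illustration only, the sketch below uses the operators as they are commonly understood in the nested relational model: unnest flattens a set-valued attribute into one record per element, and nest regroups flat records. The function names, the dictionary representation, and the sample data are assumptions made for this example; they are not the thesis's Restructure_Rull.

```python
def unnest(records, attr):
    """Flatten a set-valued attribute: emit one record per element of rec[attr]."""
    flat = []
    for rec in records:
        for value in rec[attr]:
            row = dict(rec)      # copy the other attributes unchanged
            row[attr] = value    # replace the collection with a single element
            flat.append(row)
    return flat


def nest(records, attr):
    """Group records that agree on every other attribute, collecting attr into a list."""
    groups = {}
    for rec in records:
        # The grouping key is every attribute except the one being nested.
        key = tuple(sorted((k, v) for k, v in rec.items() if k != attr))
        groups.setdefault(key, []).append(rec[attr])
    return [dict(key, **{attr: values}) for key, values in groups.items()]


# Hypothetical data: one author record holding a set of titles.
nested = [
    {"author": "A", "titles": ["t1", "t2"]},
    {"author": "B", "titles": ["t3"]},
]
flat = unnest(nested, "titles")
regrouped = nest(flat, "titles")
```

In this sketch, `flat` holds three single-title records and `regrouped` restores the original two records. A page whose repeated items appear as sibling DOM nodes could be read as the flat form, and a page that groups items under a shared parent as the nested form, though that mapping is an interpretation and is not stated in the abstract.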
Keywords/Search Tags:Information extraction, semantic mode, DOM, unnest/nest, Web Service