
Design and Implementation of Information Extraction Based on XML

Posted on: 2008-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: S H Cheng
Full Text: PDF
GTID: 2178360242971522
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet in recent years, the Web has grown enormously and become a platform for sharing information. How to obtain information from the Web quickly and efficiently is a problem that has long troubled Internet users. Much research has focused on Web data extraction, yet the current state of the art is still far from satisfying Web users. XML has become the standard for representing data on the Web, and it provides a uniform data model for Web data. This dissertation reviews the state of Web information extraction and presents a practical Web information extraction method based on XML. It further studies several key technologies, such as HTML-to-XML transformation and the Web information extraction method itself, with the aim of contributing to the field of information extraction.

① Based on XML technology, the dissertation analyzes the increasingly popular information extraction technologies and the scope of their application. It then adopts a common data-structure algorithm, tree traversal, to implement the HTML-to-XML transformation. This simplifies information extraction and conveniently produces XML documents from which the appropriate data can be extracted.

② The criteria for robust information extraction in XML are analyzed and applied to two steps of XML information extraction: locating the target region, and mapping and merging data. A method is provided for each step, and the results show that the methods are effective.

③ A prototype is completed. It combines the two techniques above and makes comprehensive use of information extraction, XML, and Visual Studio .NET. The prototype provides a general XML-based solution for Web information extraction, with good adaptability and portability.

In summary, this thesis analyzes Web information extraction in terms of technology, standards, design, and implementation, and experiments demonstrate its feasibility. The design and implementation of XML-based information extraction therefore has both theoretical and practical value, and it provides technical support for subsequent work on information extraction.
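
As an illustration of the two steps named in the abstract (HTML-to-XML transformation by tree traversal, then locating a region and mapping its data), the following is a minimal sketch in Python using only the standard library. It is not the thesis's implementation, which the abstract says was built with Visual Studio .NET. The names `HtmlToXmlBuilder`, `html_to_xml`, `preorder`, and `extract_rows`, the `document` wrapper element, and the sample page are all hypothetical and chosen only for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never have a closing tag (partial list, for illustration).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class HtmlToXmlBuilder(HTMLParser):
    """Hypothetical helper: builds a well-formed element tree from loose HTML."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")  # assumed wrapper element
        self.stack = [self.root]            # path of currently open elements

    def handle_starttag(self, tag, attrs):
        attributes = {name: value or "" for name, value in attrs}
        node = ET.SubElement(self.stack[-1], tag, attributes)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()  # simplification: surrounding whitespace is dropped
        if not text:
            return
        parent = self.stack[-1]
        if len(parent):  # text after a child element belongs to that child's tail
            last = parent[-1]
            last.tail = (last.tail or "") + text
        else:
            parent.text = (parent.text or "") + text


def html_to_xml(html):
    """Parse an HTML string and return the root of the resulting XML tree."""
    builder = HtmlToXmlBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def preorder(node, depth=0):
    """Yield (depth, element) pairs in pre-order (parent before children)."""
    yield depth, node
    for child in node:
        yield from preorder(child, depth + 1)


def extract_rows(root, table_id):
    """Locate the table with the given id and map each row to a list of cell texts."""
    table = root.find(f".//table[@id='{table_id}']")
    if table is None:
        return []
    return [[td.text or "" for td in tr.findall("td")] for tr in table.findall("tr")]


if __name__ == "__main__":
    page = (
        '<html><body><table id="prices">'
        "<tr><td>Book</td><td>12.50</td></tr>"
        "<tr><td>Pen</td><td>1.20</td></tr>"
        "</table><p>footer<br></body></html>"
    )
    tree = html_to_xml(page)
    for depth, element in preorder(tree):
        print("  " * depth + element.tag)
    print(ET.tostring(tree, encoding="unicode"))
    print(extract_rows(tree, "prices"))
```

The stack in the builder mirrors the path from the root to the current node, so unclosed tags such as `<p>` or `<br>` are closed implicitly and the result can be serialized as well-formed XML. The sketch does not attempt full HTML error recovery (for example, unclosed `<td>` cells would be nested rather than treated as siblings).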
Keywords/Search Tags: Web Information Extraction, XML, Transform, Robustness