Identify And Extract Web Information And The Emergence Mode

Posted on: 2006-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: Q Lei
Full Text: PDF
GTID: 2208360152491714
Subject: Computer application technology
Abstract/Summary:
With the rapid development and openness of the Internet, the amount of available information has grown greatly, and the Web has become an indispensable source of information. A large amount of information on the Web describes relations between entities, and much valuable information is hidden in those relations. However, today's search engines rely on keyword matching and lack the ability to manipulate and understand knowledge, so they cannot discern relations on the Web.

In this paper, we take XML, a new standard for publishing and exchanging information on the Web, as the object of our research, and we propose a method for mining relations and patterns in XML documents on the Web. The method first collects XML documents according to the user's requirements. It then identifies the target XML files, that is, those containing the relations the user requires, by calculating the similarity between XML documents. Finally, it builds the user's search pattern and uses a pattern-matching algorithm to extract all relation occurrences from the target documents.

Experimental results show that the similarity calculation method in this paper performs well at identifying target XML documents. In addition, our representation of the user's requirements, together with the pattern-matching algorithm we adopt, accurately extracts most of the target relations from the given XML documents.
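The abstract does not state which similarity measure or pattern representation the thesis uses. Purely as an illustration of the two steps it describes (identifying target documents by similarity, then extracting relation occurrences by pattern matching), the following minimal Python sketch assumes a Jaccard similarity over root-to-node tag paths and a flat "record tag plus required child fields" pattern. The function names, the sample documents, and both techniques are assumptions made for this sketch, not the author's method.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths in an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def path_similarity(xml_a, xml_b):
    """Jaccard similarity of the two documents' tag-path sets (assumed measure)."""
    a, b = tag_paths(xml_a), tag_paths(xml_b)
    return len(a & b) / len(a | b)


def extract_relations(xml_text, record_tag, fields):
    """Return one tuple per <record_tag> element that has every child in `fields`."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iter(record_tag):
        values = [record.findtext(f) for f in fields]
        # Skip records that are missing any required field.
        if all(v is not None for v in values):
            results.append(tuple(v.strip() for v in values))
    return results


# Hypothetical sample data: a candidate document, a user query skeleton,
# and an unrelated document.
SAMPLE = (
    "<library>"
    "<book><title>XML Basics</title><author>Li</author></book>"
    "<book><title>No Author</title></book>"
    "</library>"
)
QUERY = "<library><book><title/><author/></book></library>"
OTHER = "<orders><order><id>1</id></order></orders>"

if __name__ == "__main__":
    print(path_similarity(SAMPLE, QUERY))
    print(path_similarity(SAMPLE, OTHER))
    print(extract_relations(SAMPLE, "book", ["title", "author"]))
```

In this sketch a document would be treated as a target when its similarity to the query skeleton exceeds some chosen threshold. The second `book` element is skipped during extraction because it lacks the `author` field required by the pattern.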
Keywords/Search Tags: relations, XML similarity, pattern matching, data extraction