
Application Research on Web Content Mining Based on XML

Posted on: 2008-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: J Ding
GTID: 2178360212480771
Subject: Computer application technology
Abstract/Summary:
This thesis addresses the problem of information explosion, in which information is abundant but knowledge is scarce, by studying the application of XML-based web content mining. After introducing the relevant theory of data mining, Web content mining, and XML technology, it first studies in detail the principle of the Hypertext Induced Topic Search (HITS) algorithm used in web mining. It then analyzes the cause of the algorithm's topic drift problem and discusses several improved algorithms. Next, it proposes ECDM, an XML-oriented semi-structured data model, to represent semi-structured data on the Web. The objects of ECDM are described, and the data of the model are formalized. The correspondence between XML documents and ECDM is also presented, which lays the foundation for web mining. Finally, the main framework of an XML-based web content mining system is designed, and its processes are briefly explained.
Keywords/Search Tags: Data Mining, Web Content Mining, XML, HITS, Semi-structured Data Model