Research Of Web Mining Based On XML

Posted on: 2008-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: S Li
GTID: 2178360212481357
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid growth of the Internet, the Web has become the densest and most abundant source of information. Finding the information that users are interested in within such large volumes of data has therefore attracted increasing attention. Web mining is an effective technology for extracting useful patterns and information from the Internet. XML is well suited to transporting structured data because it is extensible, structured, and effective. As a result, the combination of XML and Web mining has received growing attention in the Web mining field.

This research starts from the combination of XML and Web mining. It proposes a Web mining system based on XML, designs the functions of the Web page mining subsystem, and presents a solution that applies XML.

This thesis studies the XML-based Internet data exchange technology of recent years. In the data preprocessing stage, algorithms for converting HTML Web pages into XML documents are implemented. The approach aims to offer a general-purpose methodology that automatically converts HTML Web pages into XML documents without any tuning for a particular domain.

Web data exists in many different formats and is referred to as semi-structured data. Association rule mining, one of the main techniques in data mining, is used to determine relationships among attributes or objects and to find valuable dependencies among fields. Frequent itemset mining is the key problem in generating association rules, but traditional methods cannot be applied directly to semi-structured data. This thesis implements association rule mining by mining frequent subtrees, the data model for semi-structured data. The TreeMiner algorithm is improved by adding pruning to the frequent subtree mining process. Experimental results show that the pruning is effective: it reduces the number of counting operations and saves time.
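As a rough illustration of the preprocessing step described above, the following Python sketch converts an HTML fragment into a well-formed XML document using only the standard library. It is not the thesis's algorithm, which is not given in the abstract. The class name `HtmlToXml`, the function `html_to_xml`, and the `document` wrapper element are hypothetical names chosen for this example. The sketch closes any elements left open when an ancestor's end tag arrives, but it does not apply HTML's implicit-closing rules (for instance, a second `<li>` does not close the first).

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HtmlToXml(HTMLParser):
    """Builds an XML tree from loosely written HTML (illustrative only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        # Hypothetical wrapper so the result always has a single root.
        self.root = ET.Element("document")
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "disabled") arrive as None; store "".
        clean = {name: (value or "") for name, value in attrs}
        node = ET.SubElement(self.stack[-1], tag, clean)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, together with any
        # unclosed descendants; ignore stray end tags with no open match.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        if not data.strip():
            return  # skip whitespace-only text
        parent = self.stack[-1]
        if len(parent):
            # Text after a child element belongs to that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html: str) -> str:
    """Return a well-formed XML string for the given HTML text."""
    parser = HtmlToXml()
    parser.feed(html)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")


if __name__ == "__main__":
    # The <b> element is never closed in the input; the converter closes it
    # when the enclosing </p> is seen.
    print(html_to_xml("<p>Hello <b>web<br>mining</p>"))
```

Because the output is well-formed XML, each page can then be treated as a labeled tree, which is the form of input that frequent subtree mining operates on.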
Keywords/Search Tags: Web Mining, XML, Mining Frequent Subtrees