XML-Based Web Data Mining Research

Posted on: 2010-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: P Wang
Full Text: PDF
GTID: 2208360275498903
Subject: Control theory and control engineering
Abstract/Summary:
Web data mining is a technology that discovers and retrieves information from the Internet. Currently, data on the Internet follow no specific model; most are semi-structured or even unstructured, which makes data mining considerably more difficult.

First, based on the characteristics of web data mining and the application of XML to web mining, this paper designs an XML-based web data mining model, explains how HTML documents are transformed into XML documents, and analyzes the key technologies in that process. The paper focuses mainly on data mining for XML documents.

Second, the paper studies the basic theory and procedure of the Apriori association rule algorithm, analyzes its shortcomings, and proposes an improved segmentation algorithm based on transaction length. Experiments show that the proposed method improves the performance of the Apriori algorithm.

Finally, the paper describes in detail the basic theory and procedure of the k-means clustering algorithm, analyzes its dependence on the initial centroids, and makes the following improvements: (1) it proposes a new method for excluding isolated points so that they are not chosen as initial centroids; (2) following the idea of density, it selects initial centroids that are kept as far apart from one another as possible. Experiments show that the improved method gives better results.
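The two k-means initialization ideas in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so this assumes that a point's density is the number of other points within a fixed radius, that an isolated point is one with fewer than `min_neighbors` such neighbours, and that "maximum distance" means greedily picking the candidate farthest from the centroids already chosen. The names `pick_initial_centroids`, `radius`, and `min_neighbors` are hypothetical and do not come from the thesis.

```python
import math


def pick_initial_centroids(points, k, radius, min_neighbors):
    """Choose k initial centroids from dense, mutually distant points.

    Illustrative only: the density definition and the greedy
    farthest-point rule are assumptions, not the thesis's method.
    """
    # Assumed density: number of other points within `radius` of each point.
    density = [
        sum(1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]

    # Idea (1): exclude isolated points so they can never be initial centroids.
    candidates = [i for i, d in enumerate(density) if d >= min_neighbors]
    if len(candidates) < k:
        raise ValueError("not enough non-isolated points for k centroids")

    # Idea (2): start from the densest candidate, then repeatedly add the
    # candidate whose distance to its nearest chosen centroid is largest.
    chosen = [max(candidates, key=lambda i: density[i])]
    while len(chosen) < k:
        remaining = [i for i in candidates if i not in chosen]
        farthest = max(
            remaining,
            key=lambda i: min(math.dist(points[i], points[c]) for c in chosen),
        )
        chosen.append(farthest)

    return [points[i] for i in chosen]


if __name__ == "__main__":
    # Two small clusters plus one isolated point at (50, 50).
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(pick_initial_centroids(data, k=2, radius=1.5, min_neighbors=2))
```

With this toy data the isolated point is filtered out before selection, so the two returned centroids come from the two dense clusters. The returned points would then be passed to an ordinary k-means loop as its starting centroids.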
Keywords/Search Tags: XML, Web data mining model, Apriori, k-means