XML-based Web Data Mining Research

Posted on:2010-09-26

Degree:Master

Type:Thesis

Country:China

Candidate:P Wang

GTID:2208360275498903

Subject:Control theory and control engineering

Abstract/Summary:

Web data mining is a technology for discovering and retrieving information from the Internet. Data on the Internet currently follow no specific model; most of it is semi-structured or even unstructured, which makes data mining considerably more difficult.

First, based on the characteristics of web data mining and the application of XML to web mining, this thesis designs an XML-based web data mining model, explains how HTML documents are transformed into XML documents, and analyzes the key technologies in that process. The thesis focuses mainly on the problem of mining XML documents.

Second, the thesis studies the basic theory and procedure of the Apriori association rule algorithm, analyzes its shortcomings, and proposes an improved segmentation algorithm based on transaction length. Experiments show that the method improves the performance of the Apriori algorithm.

Finally, the thesis describes the basic theory and procedure of the k-means clustering algorithm in detail, analyzes the algorithm's dependence on the initial centroids, and makes the following improvements: (1) it proposes a new method for excluding isolated points so that they are not chosen as initial centroids; (2) following the idea of density, it selects initial centroids that are as far apart from one another as possible. Experiments show that the method yields better results.
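The k-means initialization idea can be illustrated with a minimal sketch. The abstract does not give the exact density definition or thresholds, so everything specific here is an assumption made for illustration: density is taken as the number of neighbors within a radius `eps`, points with fewer than `min_neighbors` neighbors are treated as isolated, and the function name `select_initial_centroids` is hypothetical. The thesis's actual procedure may differ.

```python
import math


def select_initial_centroids(points, k, eps, min_neighbors):
    """Pick k initial centroids, skipping isolated points (illustrative sketch).

    Assumptions (not taken from the thesis): density = number of other points
    within distance eps; a point is isolated if density < min_neighbors.
    """
    # Density of each point: how many other points lie within radius eps.
    density = [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= eps)
        for i, p in enumerate(points)
    ]

    # Exclude isolated points so they can never become initial centroids.
    candidates = [i for i, d in enumerate(density) if d >= min_neighbors]
    if len(candidates) < k:
        raise ValueError("not enough non-isolated points for k centroids")

    # Start from the densest candidate (ties go to the earliest point).
    chosen = [max(candidates, key=lambda i: density[i])]

    # Repeatedly add the candidate farthest from all centroids chosen so far.
    while len(chosen) < k:
        remaining = [i for i in candidates if i not in chosen]
        farthest = max(
            remaining,
            key=lambda i: min(math.dist(points[i], points[c]) for c in chosen),
        )
        chosen.append(farthest)

    return [points[i] for i in chosen]


if __name__ == "__main__":
    data = [
        (0, 0), (0, 1), (1, 0), (1, 1),          # cluster A
        (10, 10), (10, 11), (11, 10), (11, 11),  # cluster B
        (50, 50),                                # isolated point
    ]
    print(select_initial_centroids(data, k=2, eps=1.5, min_neighbors=2))
```

In this toy data set, a plain farthest-point rule would pick the outlier `(50, 50)` as the second centroid. Filtering by density first keeps both initial centroids inside real clusters.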

Keywords/Search Tags:

XML, Web data mining model, Apriori, k-means

Related items

1	Design And Implementation Of A Book Recommendation System Based On Apriori And K-means Algorithms
2	Research And Application Of Campus Card Consumption Based On Data Mining
3	Research And Application Of Data Mining In Insurance Customer Data
4	Application Of Data Mining Methods In B2C E-commerce
5	Research On The Application Of Data Mining Technique In Snort
6	Research And Application Of Apriori Algorithm Based On The Compressed Matrix
7	Research On Financial Loss Customer Mining Model Based On K-MEANS Clustering And Association Model
8	Research On The Application Of Data Mining In The Combination Business Of The Logistics Enterprises' Key Customers
9	Research On The Analysis Of Mobile Device Usage Based On Data Mining
10	The Research And Application Of Property Transactions Tax Based On Data Mining