Research Of Weighted Association Rules Algorithm On Web Data Mining

Posted on: 2010-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: G Di
Full Text: PDF
GTID: 2178360272979353
Subject: Computer software and theory
Abstract/Summary:
Data mining is the process of extracting implicit, previously unknown, and useful knowledge from large amounts of data. With the rapid development of the Internet, more and more data is generated on the Web. Applying data mining approaches to Web data has given rise to a new field, Web data mining. Among its many research topics, association rule mining is one of the most active.

This thesis first reviews data mining, Web data mining, Web data preprocessing, and related background. It then examines the basic theory and the classical algorithms of association rule mining. Finally, to handle the imbalance among items in a database and their differing importance, it focuses on weighted association rule algorithms.

The well-known New-Apriori algorithm is studied in depth, and three problems are identified. First, its itemset join step has a shortcoming. Second, it needs multiple database scans to count the candidate itemsets, which severely reduces its efficiency. Third, it does not prune candidate itemsets, so many useless candidates are retained.

An improved algorithm, WARDM (Weighted Association Rules Data Mining), is proposed to address these flaws. WARDM treats the generation of candidate 1-itemsets, candidate 2-itemsets, and candidate k-itemsets (k > 2) separately, which avoids missing weighted frequent itemsets. It counts candidate itemsets using sets of transaction identifiers, so the transaction database is scanned only once. Based on the properties of weighted association rules, it prunes candidate itemsets twice, which reduces the number of candidates.
Experimental results show that WARDM takes less time than New-Apriori, greatly improving efficiency, and that it also reduces the number of candidate itemsets.
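
The idea of counting candidates from sets of transaction identifiers, rather than rescanning the database, can be illustrated with a minimal Python sketch. This is not the thesis's WARDM implementation: the function names, the toy data, and the weighted support definition (sum of item weights multiplied by the fraction of transactions containing the itemset) are assumptions chosen for illustration, and candidate generation and pruning are omitted.

```python
from collections import defaultdict


def build_tid_sets(transactions):
    """Scan the database once, mapping each item to the IDs of transactions containing it."""
    tid_sets = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            tid_sets[item].add(tid)
    return dict(tid_sets)


def itemset_tids(itemset, tid_sets):
    """TID set of an itemset: the intersection of its items' TID sets (no rescan needed)."""
    items = list(itemset)
    tids = set(tid_sets.get(items[0], set()))
    for item in items[1:]:
        tids &= tid_sets.get(item, set())
    return tids


def weighted_support(itemset, tid_sets, weights, n_transactions):
    """Assumed definition: (sum of item weights) * (share of transactions containing the itemset)."""
    count = len(itemset_tids(itemset, tid_sets))
    return sum(weights[i] for i in itemset) * count / n_transactions


# Hypothetical toy database and item weights, not taken from the thesis.
transactions = {
    1: {"a", "b", "c"},
    2: {"a", "c"},
    3: {"b", "c"},
    4: {"a", "b"},
}
weights = {"a": 0.5, "b": 0.25, "c": 1.0}

tid_sets = build_tid_sets(transactions)  # the only pass over the database
ws_ac = weighted_support({"a", "c"}, tid_sets, weights, len(transactions))
print(ws_ac)
```

After the single pass in `build_tid_sets`, the count of any candidate itemset comes from set intersections alone, which is why the original transactions never need to be read again in this sketch.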
Keywords/Search Tags: web data mining, association rules, weighted association rules, New-Apriori algorithm