Association Rules Algorithm And Its Application In Web Log Mining

Posted on: 2009-06-29   Degree: Master   Type: Thesis
Country: China   Candidate: G L Long   Full Text: PDF
GTID: 2178360245968388   Subject: Computer software and theory
Abstract/Summary:
With the rapid development and spread of the Internet, the volume of digital information grows daily. The Internet holds a mass of information, and finding the useful part of it is a difficult problem for network users. Web Mining is the application of Data Mining techniques to the Web: a process of extracting interesting and potentially useful patterns and implicit information from Web documents and website browsing data. Web Mining consists of Web Content Mining, Web Structure Mining and Web Usage Mining. Web Usage Mining extracts patterns from website browsing data and helps in understanding users' browsing behavior, so that the structure of a website can be improved and personalized service can be provided to users.

Association rules are a very useful kind of knowledge pattern found in databases; their purpose is to discover interesting relationships among the itemsets of the data. Association rule algorithms have received extensive attention and study and have made great progress. Apriori is the most influential algorithm for mining the frequent itemsets of Boolean association rules.

Building on a study of Data Mining theory, this thesis examines Apriori, the typical association rule mining algorithm. In the process of mining frequent patterns, Apriori generates a huge number of candidate itemsets and needs multiple scans over the database, so its time and space costs are high. To address these flaws, we aim to improve mining efficiency in two respects: (1) reducing the size of the transaction database, and (2) reducing the number of comparisons needed to decide whether a candidate itemset is frequent.
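For orientation, the baseline Apriori procedure discussed above can be sketched as follows. This is a minimal illustration in Python, not the thesis's implementation: the function name `apriori`, the `min_support` parameter (taken here as an absolute transaction count) and the sample page names are assumptions made for the example. Each level k performs one full pass over every transaction and compares every candidate against it, which is the cost the thesis sets out to reduce.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is an absolute count (an assumption for this sketch).
    """
    db = [frozenset(t) for t in transactions]

    # Level 1: one scan to count every single item.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(s) in frequent for s in combinations(union, k - 1)):
                candidates.add(union)

        # Counting step: a full scan of the database for this level.
        counts = {c: 0 for c in candidates}
        for t in db:
            for c in candidates:
                if c <= t:  # candidate is contained in the transaction
                    counts[c] += 1

        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    # Hypothetical browsing sessions; each is the set of pages one visitor viewed.
    sessions = [
        ["/index", "/news", "/about"],
        ["/index", "/news"],
        ["/index", "/about"],
        ["/news", "/about"],
        ["/index", "/news", "/about"],
    ]
    for itemset, support in apriori(sessions, min_support=3).items():
        print(sorted(itemset), support)
```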
This thesis presents three improved algorithms: (1) to reduce the size of the transaction database and the number of database scans, transactions whose length is less than k are deleted before the frequent itemsets are generated; (2) to reduce the number of database scans, the "count" function and the "and" operation of the SQL query language are used to judge whether a candidate itemset is frequent; (3) the two methods above are combined, deleting transactions whose length is less than k to shrink the transaction database and using the SQL "count" function and "and" operation to reduce the number of database scans, which improves the algorithm further. Finally, the three improved algorithms are tested on transaction databases of different sizes and their mining results are compared. A virtual host log mining program was then designed using the more efficient algorithm. Because the virtual host used by Site of Chinese Engine has no log-recording function, we designed a log-recording module that is included in the site's pages. We then mined association rules from the recorded log data, analyzed the relationships between pages based on the mining results, and identified users' browsing habits, providing decision support for improving the site's function and structure.
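The combined approach (3) can be illustrated with a rough sketch. The thesis relies on the SQL "count" function and "and" operation; the abstract does not give the queries, so the version below substitutes Python integer bitmasks as a stand-in: each page gets a bitmask over the remaining transactions, the bitwise AND of the masks marks the transactions containing every page of a candidate, and counting the set bits gives its support. The names `improved_apriori` and `page_rules`, the absolute `min_support` count, and the sample sessions are all hypothetical. Rebuilding the masks still touches the (shrinking) database once per level.

```python
from itertools import combinations


def improved_apriori(transactions, min_support):
    """Frequent itemsets via transaction reduction plus AND/count support."""
    db = [frozenset(t) for t in transactions]
    result, frequent, k = {}, {}, 1
    while True:
        # Idea 1: a transaction with fewer than k items cannot contain any
        # k-itemset, so it is dropped before level k is processed.
        db = [t for t in db if len(t) >= k]
        if not db:
            break

        # One bitmask per item: bit i is set if transaction i contains it.
        masks = {}
        for pos, t in enumerate(db):
            for item in t:
                masks[item] = masks.get(item, 0) | (1 << pos)

        if k == 1:
            candidates = {frozenset([item]) for item in masks}
        else:
            # Standard Apriori join and prune on the previous level.
            candidates = set()
            for a, b in combinations(list(frequent), 2):
                union = a | b
                if len(union) == k and all(
                    frozenset(s) in frequent for s in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # Idea 2: support = number of set bits in the AND of the item masks,
        # standing in for the SQL "and" operation and "count" function.
        frequent = {}
        for cand in candidates:
            acc = (1 << len(db)) - 1
            for item in cand:
                acc &= masks.get(item, 0)
            support = bin(acc).count("1")
            if support >= min_support:
                frequent[cand] = support

        if not frequent:
            break
        result.update(frequent)
        k += 1
    return result


def page_rules(frequent, min_confidence):
    """Derive rules lhs -> rhs with confidence = support(all) / support(lhs)."""
    rules = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


if __name__ == "__main__":
    # Hypothetical sessions reconstructed from a page-embedded log recorder.
    sessions = [
        ["/index", "/news", "/about"],
        ["/index", "/news"],
        ["/index", "/about"],
        ["/news", "/about"],
        ["/index", "/news", "/about"],
        ["/index"],
    ]
    found = improved_apriori(sessions, min_support=3)
    for lhs, rhs, conf in page_rules(found, min_confidence=0.7):
        print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

Dropping short transactions does not change any support count at level k, since a k-itemset can only occur in a transaction with at least k items; it only shrinks the data that later levels must examine.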
Keywords/Search Tags: Data Mining, Log Mining, Association Rules, Apriori algorithm, Virtual Host