Improved Apriori Algorithm In Web Log Study

Posted on: 2014-10-18
Degree: Master
Type: Thesis
Country: China
Candidate: T H Shao
GTID: 2268330401473436
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the tension between the rapid growth of information and people's limited attention keeps increasing, and it is hard to find valuable information in the vast amount available. How to structure a website sensibly and provide users with more targeted, personalized service is a concern for all web designers. The behaviors and characteristics of users visiting a site are implicit in its Web log, and Web log mining is an effective way to address this problem: analyzing user behavior from the Web log makes it possible to improve pages and optimize the site structure, and thus helps visitors find content of interest more easily.

This thesis first introduces the background of Web log mining and the state of research in China and abroad, summarizes Web log mining techniques and the analysis of mining results, and on this basis sets out to improve the algorithms. It then analyzes the structure of Web logs and the Web log data source used in the thesis. The data source is selected according to the purpose of the thesis and then preprocessed through data cleaning, user identification, session identification and path supplement; the algorithm flowcharts and the resulting data are given. The path supplement step is handled by the improved PFS algorithm, which simplifies the preprocessing steps, reduces preprocessing time and improves efficiency.

Next, in light of the requirements arising from Web log preprocessing, the thesis analyzes the classic Apriori algorithm. After summarizing the shortcomings of the classic algorithm and weighing the advantages and disadvantages of existing improved algorithms, it proposes an improved Apriori algorithm based on an array of vectors.
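The user identification and session identification steps named in the preprocessing pipeline above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the record layout, the use of the (IP, user agent) pair as a user key, the 30-minute timeout and the function name `identify_sessions` are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed threshold for splitting sessions; the abstract does not state one.
SESSION_TIMEOUT = timedelta(minutes=30)


def identify_sessions(records, timeout=SESSION_TIMEOUT):
    """Split cleaned log records into per-user sessions.

    `records` is a hypothetical list of (ip, user_agent, timestamp, url)
    tuples that have already been through data cleaning.
    """
    # User identification: treat each (ip, user_agent) pair as one user.
    by_user = defaultdict(list)
    for ip, agent, ts, url in records:
        by_user[(ip, agent)].append((ts, url))

    # Session identification: start a new session whenever the gap
    # between two consecutive requests of a user exceeds the timeout.
    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # order each user's requests by timestamp
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions
```

Each resulting session is a list of requested URLs, which is the form of transaction that an association rule miner such as Apriori takes as input. Path supplement is not sketched here because the abstract does not describe how the PFS algorithm works.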
The improved Apriori algorithm reads the data into memory instead of repeatedly reading it from disk, which reduces the number of I/O operations. It removes itemsets that fail certain conditions before the join step, which reduces the number of joins and the time spent on them. It also checks the dimension of each transaction's itemset when generating the candidate set, which reduces the number of scans. Together these changes speed up the generation of frequent itemsets and improve mining efficiency.

Finally, the improved algorithm is tested experimentally on the Web log of a university website. The experimental results demonstrate the efficiency of the improved algorithm. From the association rules and page clustering produced by the mining, the thesis extracts users' access behavior and implicit interests in the site, which is important for improving the site and providing personalized service.
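To make the idea of a vector-based, in-memory Apriori concrete, here is a minimal Python sketch. It is an illustration under stated assumptions rather than the thesis's algorithm: each item's vector is encoded as an integer bitmask over transaction indices, the function names `build_vectors` and `apriori_vectors` are hypothetical, and the standard Apriori subset pruning stands in for the thesis's own pruning conditions, which the abstract does not specify. The transaction-dimension check is likewise not reproduced.

```python
from itertools import combinations


def build_vectors(transactions):
    """Scan the transactions once and keep everything in memory.

    Returns a dict mapping each item to an int bitmask in which
    bit `tid` is set if transaction `tid` contains the item.
    """
    vectors = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors


def support(vec):
    """Number of transactions marked in a bitmask."""
    return bin(vec).count("1")


def apriori_vectors(transactions, min_support):
    """Return {itemset tuple: support count} for all frequent itemsets."""
    vectors = build_vectors(transactions)  # the only pass over the data
    level = {(item,): vec for item, vec in vectors.items()
             if support(vec) >= min_support}
    frequent = {}
    k = 1
    while level:
        frequent.update({iset: support(vec) for iset, vec in level.items()})
        k += 1
        next_level = {}
        for a, b in combinations(sorted(level), 2):
            # Join step: only combine itemsets sharing their first k-2 items.
            if a[:-1] != b[:-1]:
                continue
            cand = a + (b[-1],)
            # Prune step: skip candidates with an infrequent (k-1)-subset.
            if any(sub not in level for sub in combinations(cand, k - 1)):
                continue
            # Support counting by bitwise AND, with no rescan of the data.
            vec = level[a] & level[b]
            if support(vec) >= min_support:
                next_level[cand] = vec
        level = next_level
    return frequent
```

For Web log mining, each transaction would be the set of pages in one user session, for example `apriori_vectors([["/index.html", "/news.html"], ["/index.html"]], min_support=2)`. Because support is computed by intersecting vectors held in memory, the input is read only once, which corresponds to the I/O reduction the abstract describes.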
Keywords/Search Tags: Data Mining, Web Log Mining, Association Rules, Apriori Algorithm