
The Research Of Frequent Pattern Algorithm Based On Web Log Mining

Posted on: 2012-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: J Feng
GTID: 2218330338470698
Subject: Computer software and theory
Abstract/Summary:
With the rapid development and growing popularity of the Internet, Web sites have become the main platform on which people produce, publish, handle, and process data. At the same time, Web site structures have grown more and more complex, and the quantity of data on the Web is expanding rapidly. Site owners are therefore concerned with how to mine potentially useful knowledge and use it to improve Web site structure, so that users receive better service and owners earn more profit. To address these problems, traditional data mining theory and techniques have been introduced into the Web domain, where Web logs are mined to obtain useful information and patterns. Data mining methods are now applied to providing personalized services for Web users, business intelligence, improving system performance, and optimizing Web sites. Web log mining based on client-side Web log data has attracted the attention of many researchers.

This thesis introduces data mining theory and the complete process of Web log mining in detail, and puts forward some innovations and improvements for problems in Web log mining.

First, the thesis systematically introduces the research significance, background, data sources, and the whole procedure of Web log data preprocessing. It discusses the related knowledge and the solutions to problems concerning client-side Web logs, and then introduces the characteristics of client-side Web logs and the differences between server logs and client logs.

Second, after analyzing the shortcomings of current methods for calculating the interest level of Web pages, an improved method is proposed that is based on client-side Web log data and calculates the real browsing time. Analysis shows that the improved method reflects users' interest level in Web pages more reasonably and more accurately.
Next, the directed structure graph of the site is analyzed. The interest-level values described above are treated as weights and assigned to the corresponding nodes of the graph, producing a weighted directed graph.

Lastly, users' frequent patterns are mined from the weighted directed graph and the user access transaction database with a new improved algorithm, the GTWF algorithm, which is used to mine users' frequent access patterns. The algorithm mines the graph through pruning and generation operations built on concepts such as weighted support degree, extensible pattern, and weighted frequent pattern. Finally, the performance of the algorithm is verified by experiments.
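The abstract does not give the formulas behind the interest level, the weighted support degree, or the GTWF algorithm itself, so the following Python sketch is only an illustration of the general idea under stated assumptions, not the thesis's method. It assumes that a page's interest level is its mean browsing time normalized to the largest mean, that the weighted support of a path is the mean node weight multiplied by the fraction of transactions containing the path contiguously, and that paths are extended only along edges of the site graph. All function names and sample data are hypothetical.

```python
from collections import defaultdict

def interest_levels(visits):
    """Map each page to a weight in (0, 1]: mean browsing time / largest mean.
    Assumed definition; `visits` is an iterable of (page, seconds_viewed)."""
    total, count = defaultdict(float), defaultdict(int)
    for page, seconds in visits:
        total[page] += seconds
        count[page] += 1
    mean = {p: total[p] / count[p] for p in total}
    top = max(mean.values())
    return {p: m / top for p, m in mean.items()}

def support(transactions, path):
    """Fraction of transactions that contain `path` as a contiguous run."""
    n = len(path)
    hits = sum(
        any(tuple(t[i:i + n]) == path for i in range(len(t) - n + 1))
        for t in transactions
    )
    return hits / len(transactions)

def mine_weighted_paths(edges, weights, transactions, min_wsup):
    """Level-wise path extension over the weighted directed graph.
    Assumed weighted support: mean node weight of the path * support.
    Pruning uses the bound max_weight * support, which can only shrink
    when a path is extended, so no qualifying path is lost."""
    max_w = max(weights.values())
    frequent = {}
    frontier = [(p,) for p in weights]
    while frontier:
        next_frontier = []
        for path in frontier:
            sup = support(transactions, path)
            if max_w * sup < min_wsup:
                continue  # prune: no extension of this path can qualify
            wsup = sum(weights[p] for p in path) / len(path) * sup
            if wsup >= min_wsup:
                frequent[path] = wsup
            # Generate candidates by following outgoing edges (no revisits).
            for nxt in edges.get(path[-1], ()):
                if nxt not in path:
                    next_frontier.append(path + (nxt,))
        frontier = next_frontier
    return frequent

# Hypothetical sample data: browsing times, site links, access transactions.
visits = [("A", 30), ("A", 50), ("B", 20), ("C", 80), ("C", 80)]
edges = {"A": ["B", "C"], "B": ["C"]}
transactions = [["A", "B", "C"], ["A", "C"], ["A", "B", "C"], ["B", "C"]]

weights = interest_levels(visits)  # {"A": 0.5, "B": 0.25, "C": 1.0}
frequent = mine_weighted_paths(edges, weights, transactions, min_wsup=0.3)
```

With this sample data the sketch reports the paths ("A",), ("C",), and ("B", "C"). Page B alone falls below the threshold because of its low weight, yet the path B to C qualifies, which is why the sketch prunes on an upper bound instead of on the weighted support directly.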
Keywords/Search Tags: Web log mining, data preprocessing, client log data, interest level of page, weighted frequent access pattern