
Research And Realization Of Data Preprocessing And Association Rule Algorithm In Web Log Mining Technique

Posted on: 2014-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: C L Wen
Full Text: PDF
GTID: 2248330398470885
Subject: Logistics engineering
Abstract/Summary:
With the rapid development of Internet technology, Web servers generate large quantities of unstructured, disorganized log files every day, so the Internet can be viewed as a huge database. These log files contain a great deal of valuable information. As data mining technology has matured, applying mining techniques to Web server log files has become an important research direction. Analyzing the log files can uncover the latent usage rules and access patterns of Web users. Applying these results in various fields therefore has both theoretical value and practical significance, for discovering potential value and for improving the service quality and efficiency of the Web.

This thesis analyzes and studies the existing data preprocessing methods and association rule algorithms used in Web log mining, and improves on the shortcomings of the existing algorithms to provide methods that perform better and are more suitable for Web log mining. It consists of two main parts.

First, the processes and methods of data preprocessing used in Web log mining are introduced. The work focuses on user identification, and on that basis proposes a URL-rewriting session tracking technique and the IASR user identification algorithm, which is combined with heuristics. The algorithm uses the session mechanism, IP address and access time to identify users in the server logs accurately. Experimental results show that the improved algorithm performs better than the original user identification algorithm.

Second, starting from the classic Apriori association rule algorithm, a degree-of-interest threshold is introduced to improve the algorithm, and a new association rule algorithm is proposed that combines degree of interest, confidence and support. The algorithm finds frequent itemsets through a hash table and generates association rules that meet the minimum support, minimum confidence and minimum degree of interest. Experimental data show that the improved algorithm performs better in both space and time, improving execution efficiency.
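
The second part of the abstract describes filtering association rules by three thresholds at once. The sketch below illustrates that general idea only and is not the thesis's algorithm. The abstract does not define "degree of interest", so lift (confidence divided by the support of the consequent) is used here as an assumed stand-in. The function names `frequent_itemsets` and `rules`, and the representation of each session as a set of URLs, are hypothetical. Candidate counts are kept in a Python dict, which is a hash table.

```python
from itertools import combinations


def frequent_itemsets(sessions, min_support):
    """Level-wise Apriori; returns {itemset: support} for frequent itemsets."""
    sessions = [frozenset(s) for s in sessions]
    n = len(sessions)
    support = {}
    # Level 1 candidates: every single page seen in any session.
    candidates = {frozenset([page]) for s in sessions for page in s}
    k = 1
    while candidates:
        # Count how many sessions contain each candidate (dict = hash table).
        counts = {c: 0 for c in candidates}
        for s in sessions:
            for c in candidates:
                if c <= s:
                    counts[c] += 1
        level = {c: cnt / n for c, cnt in counts.items()
                 if cnt / n >= min_support}
        support.update(level)
        k += 1
        # Join frequent sets into size-k candidates, then prune any
        # candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return support


def rules(support, min_conf, min_interest):
    """Keep rules lhs -> rhs that pass both confidence and interest.

    Interest is ASSUMED to be lift = confidence / support(rhs).
    """
    kept = []
    for itemset, supp in support.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                rhs = itemset - lhs
                conf = supp / support[lhs]
                interest = conf / support[rhs]
                if conf >= min_conf and interest >= min_interest:
                    kept.append((lhs, rhs, supp, conf, interest))
    return kept
```

Under these assumptions, a caller would pass the sessions produced by preprocessing to `frequent_itemsets` with a minimum support, then pass the result to `rules` with the minimum confidence and minimum interest. A rule whose lift is below 1 is dropped even when its confidence is high, which is the usual motivation for adding an interest measure on top of support and confidence.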
Keywords/Search Tags: Web log mining, rewrite URL, heuristics, degree of interest, Apriori algorithm