
Research And Application Of Web Log Mining Based On Association Rules

Posted on: 2017-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: B W Tan
GTID: 2348330533950151
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of computers and the Internet, the Web has become an important platform for global information sharing and communication, bringing great convenience to people's study, life and work. However, the explosive growth of Web information causes "information overload", making it very difficult to obtain knowledge from the huge volume of available information, so website designers must consider how to provide accurate and personalized service to users. Web log mining is a good way to address this problem: by analyzing Web log data, we can uncover latent information about users and improve Web design and service.
This paper first introduces Web log mining and data preprocessing. Data preprocessing is the basis of Web log mining, so the quality of the preprocessed data directly affects the mining results. Because determining user identity is a very important and difficult part of this process, this paper focuses on the existing heuristic-rule algorithm for user identification and adds the Cookie, the URL, and the user's residence time on the accessed page as further heuristic rules, yielding a new user identification method. The experimental results show that the improved method has higher efficiency and precision.
According to existing technical approaches, Web log mining can be divided into two categories. In the first, the Web log data is processed into a standard format that can be stored in a database, and the data in the database is then mined and analyzed with data mining algorithms. In the second, the Web log data is mined and analyzed directly after it is cleaned. This paper adopts the first approach, using traditional association rule mining to process the Web log data and find users' frequent access patterns, which can be used to adjust the structure of the Web site or to provide users with personalized service. This paper proposes an improved algorithm based on AprioriTid. Experimental data show that the improved algorithm performs well in both time and space efficiency. Finally, the improved algorithm is applied to mine the preprocessed Web log, and the mined results are briefly analyzed.
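For readers unfamiliar with AprioriTid, the following is a minimal sketch of the standard baseline algorithm applied to page-access sessions. It is not the improved variant proposed in the thesis, whose details the abstract does not give. The function names (`apriori_gen`, `apriori_tid`), the session data, and the use of an absolute support count are assumptions made for illustration only.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Build candidate k-itemsets from frequent (k-1)-itemsets.

    Itemsets are sorted tuples. Returns {candidate: (gen1, gen2)}, where
    gen1 and gen2 are the two (k-1)-itemsets that were joined.
    """
    prev_sorted = sorted(prev_frequent)
    candidates = {}
    for i, a in enumerate(prev_sorted):
        for b in prev_sorted[i + 1:]:
            if a[:k - 2] != b[:k - 2]:
                break  # sorted order: no later itemset shares this prefix
            cand = a + (b[-1],)
            # Prune: every (k-1)-subset of a candidate must be frequent.
            if all(s in prev_frequent for s in combinations(cand, k - 1)):
                candidates[cand] = (a, b)
    return candidates


def apriori_tid(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets.

    transactions: dict mapping a session id to the pages it visited.
    min_support:  absolute support threshold (assumed; a ratio also works).
    """
    # Pass 1 is the only pass that reads the raw transactions.
    counts = {}
    for items in transactions.values():
        for item in set(items):
            counts[(item,)] = counts.get((item,), 0) + 1
    frequent = {c: n for c, n in counts.items() if n >= min_support}
    all_frequent = dict(frequent)

    # Encoded database: session id -> itemsets of the current size it contains.
    encoded = {tid: {(i,) for i in set(items)}
               for tid, items in transactions.items()}
    k = 2
    while frequent and encoded:
        candidates = apriori_gen(set(frequent), k)
        counts = dict.fromkeys(candidates, 0)
        next_encoded = {}
        for tid, present in encoded.items():
            # A candidate is in the session iff both of its generators are.
            contained = {c for c, (g1, g2) in candidates.items()
                         if g1 in present and g2 in present}
            for c in contained:
                counts[c] += 1
            if contained:  # sessions with no candidates drop out
                next_encoded[tid] = contained
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        all_frequent.update(frequent)
        encoded = next_encoded
        k += 1
    return all_frequent


if __name__ == "__main__":
    # Hypothetical sessions produced by log preprocessing.
    sessions = {
        1: ["/index", "/news", "/login"],
        2: ["/index", "/news"],
        3: ["/index", "/login"],
        4: ["/news", "/login"],
        5: ["/index", "/news", "/login"],
    }
    for itemset, support in sorted(apriori_tid(sessions, 3).items()):
        print(itemset, support)
```

The point of the encoded database is that after the first pass the raw log is never rescanned: each later pass works on a per-session set of candidate itemsets, and that set tends to shrink as k grows because sessions containing no candidates are dropped.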
Keywords/Search Tags: Web log mining, Web log preprocessing, Association Rule, AprioriTid