Research And Application On Web Log Mining Technology

Posted on: 2013-07-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Pan
Full Text: PDF
GTID: 2248330371999445
Subject: Computer software and theory
Abstract/Summary:
With the spread of the Internet and the development of computer technology, the number of network users keeps growing, and the ways in which users access the network are becoming more diverse and complex. The network is widely used for e-commerce, online services, and information search. To attract more customers, many commercial web sites continually improve their service quality, performance, and competitiveness. How can a site's service performance be improved? How can its structure be improved? And how can it provide personalized service? To address these problems, researchers have proposed Web log mining, which is currently an active research direction. The main data source for log mining is server-side log data. By applying preprocessing and pattern mining to this data, we can obtain user access patterns, learn the access patterns and interests of user groups, and provide decision support for optimizing site structure and personalizing service.

Based on a study and analysis of a large body of literature, this paper examined the key steps in the preprocessing phase of Web log mining and proposed corresponding algorithms. To address a disadvantage of the K-means algorithm, it also proposed an improved algorithm and applied it to Web log mining.

First, this paper reviewed the current state of research and then introduced the concepts of data mining and Web mining in Chapter 2.

Second, this paper described in detail the five key preprocessing steps: data cleaning, user identification, session identification, path completion, and transaction identification. The accuracy of the preprocessed data directly affects the performance of data mining, so preprocessing the log data is necessary.
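The first three preprocessing steps can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the record fields (`ip`, `agent`, `time`, `method`, `status`, `url`), the list of filtered file suffixes, and the 30-minute inactivity timeout are all assumptions chosen for illustration, and users are approximated by the (IP address, user agent) pair.

```python
from collections import defaultdict

# Assumed values for illustration; neither comes from the thesis.
SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that close a session
NOISE_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")


def clean(records):
    """Data cleaning: keep successful GET page requests, drop embedded resources."""
    return [
        r for r in records
        if r["method"] == "GET"
        and 200 <= r["status"] < 300
        and not r["url"].lower().endswith(NOISE_SUFFIXES)
    ]


def identify_sessions(records, timeout=SESSION_TIMEOUT):
    """User identification by (ip, agent), then session splitting on inactivity gaps."""
    by_user = defaultdict(list)
    for r in sorted(records, key=lambda r: r["time"]):
        by_user[(r["ip"], r["agent"])].append(r)

    sessions = []
    for user, hits in by_user.items():
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit["time"] - prev["time"] > timeout:
                # Gap exceeds the threshold: close the session and start a new one.
                sessions.append((user, [h["url"] for h in current]))
                current = []
            current.append(hit)
        sessions.append((user, [h["url"] for h in current]))
    return sessions
```

Calling `identify_sessions(clean(log))` on a list of such records returns one `(user, [urls])` pair per session. Path completion and transaction identification would operate on these sessions and are not sketched here.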
Chapter 3 analyzed in detail three key steps of current preprocessing (data cleaning, user identification, and transaction identification) and proposed an algorithm for each. By incorporating a time threshold, the algorithm for data cleaning can identify more users, which is in line with reality. A new algorithm for transaction identification was proposed based on the link relations between the requested page and the referring page of each log record, combined with a time threshold; it can identify many meaningful transactions.

Finally, this paper introduced cluster analysis and focused on the K-means algorithm. To address the disadvantage of choosing the initial cluster centers at random in K-means, an improved K-means algorithm was proposed in Chapter 4. By incorporating the hierarchical clustering algorithm AGNES, the proposed algorithm obtains k initial cluster centers with higher density. The experimental results show that accuracy improves noticeably and the number of iterations decreases. The improved algorithm is also applied to Web log mining to cluster users, identify the needs of user groups, and provide decision support for site optimization and personalized service.
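The general idea of seeding K-means from a hierarchical clustering, instead of from random points, can be sketched as follows. This is an illustrative assumption, not the algorithm from Chapter 4: it merges clusters by centroid distance until k remain and uses their centroids as initial centers, and it does not reproduce the thesis's density criterion. The function names are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def agnes_centers(points, k):
    """Bottom-up merging (assumed centroid linkage) until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return [centroid(c) for c in clusters]


def kmeans(points, centers, max_iter=100):
    """Lloyd's iterations from the given centers; returns (centers, labels, iterations)."""
    labels = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: index of the nearest center for each point.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: recompute each center; keep the old one if a cluster is empty.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:
            return centers, labels, iteration
        centers = new_centers
    return centers, labels, max_iter
```

A call such as `kmeans(points, agnes_centers(points, k))` runs the two stages together. The pairwise merge loop is cubic in the number of points, so on real log data a sample or an optimized library implementation would be needed.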
Keywords/Search Tags: Web log mining, data preprocessing, clustering, K-means algorithm, user similarity