
Research And Implementation Of Website Log Analysis System Based On Web Usage Mining

Posted on: 2015-08-28
Degree: Master
Type: Thesis
Country: China
Candidate: L Weng
Full Text: PDF
GTID: 2308330461974991
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, websites have become the most important medium for storing, publishing, acquiring, and exchanging information. However, as the volume of website data has grown sharply, users face increasing difficulty in retrieving information. Solving this problem takes more than search engines: web designers should also address it in the design of their own sites. That is, designers should design and optimize their sites for the convenience of users, which requires extracting useful information from web data to guide the design. Web logs are one part of web data, and a comparatively complete and well-structured part. By analyzing web logs, we can discover correlations among the pages of a website and analyze the access preferences and intentions of different kinds of users, and from these suggest improvements to the structure and content of the website and improve its overall performance.

This paper draws on a large body of domestic and international literature on web usage mining. On that basis it analyzes the related algorithms, improves some of them, and verifies the improvements through experiments. It also designs a website log analysis system based on web usage mining that implements the algorithms, and applies the system to a specific website. The work consists of the following parts.

First, this paper focuses on the preprocessing stage of web usage mining, including page identification, user identification, session identification, and transaction database generation. It analyzes the existing algorithms, selects appropriate ones, and proposes a preprocessing algorithm based on the "paper content feature", which places more emphasis on the semantic content of the page and makes the subsequent analysis results more valuable.

Second, this paper studies association analysis and cluster analysis. It analyzes the classical Apriori association mining algorithm, proposes an improved Apriori algorithm that is more efficient, and demonstrates its effectiveness through comparative experiments on data sets of various sizes. It also analyzes existing work on cluster analysis, proposes an improved "User-Page Feature Access Matrix", and then clusters it with the classical k-means algorithm.

Third, this paper describes the implementation of the website log analysis system based on web usage mining, and presents the outline design and the detailed design, including the functions of the main modules, the database design, the data flow diagram, the algorithms used by each module, and the key source code of the main classes.

Fourth, this paper applies the system to a specific website and obtains experimental results by mining and analyzing its web logs. The results are used to propose improvements to the structure and content of the website, which demonstrates the effectiveness of the system.

Finally, this paper summarizes the results and shortcomings of the study and proposes directions for future work.
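
For orientation, the classical Apriori algorithm that the abstract takes as its baseline can be sketched as below. This is a minimal illustration of the textbook level-wise method only, not the improved algorithm proposed in the thesis, whose details are not given here. The function name `apriori`, the `min_support` parameter, and the sample sessions and page paths are all assumptions made for the example; each session is treated as one transaction holding the set of pages visited.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset.

    Textbook Apriori; `min_support` is an absolute count (an assumption
    of this sketch, a relative threshold would work equally well).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single page.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: keep a candidate only if all (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count support of the surviving candidates with one pass over the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical sessions: each set holds the pages requested in one visit.
sessions = [
    {"/index", "/news", "/contact"},
    {"/index", "/news"},
    {"/index", "/products"},
    {"/news", "/contact"},
]
result = apriori(sessions, min_support=2)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The repeated scan of all transactions at each level is the cost that improved variants of Apriori typically try to reduce, which is the kind of efficiency gain the abstract refers to.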
Keywords/Search Tags:Web Usage Mining, Data Preprocessing, Association Rules, Apriori Algorithm, Clustering, k-means Algorithm, Website Log Analysis System