
Web Usage Mining Technology Research And Realization

Posted on: 2008-12-26
Degree: Master
Type: Thesis
Country: China
Candidate: B Li
GTID: 2208360215450349
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, the digital resources available online have become increasingly abundant. Vast numbers of users browse and search the Internet for useful information every day. However, because of the enormous volume of information on the Internet, it is very difficult for each user to find useful information in a timely manner. Web mining techniques emerged to address this problem. In particular, many researchers have turned their attention to Web usage mining, which operates on Web server logs. Web logs record the visit information of a Web site's visitors. By analyzing these logs, we can therefore learn the browsing behavior and visiting habits of customers, which is very important for reorganizing pages, optimizing the structure of a Web site, improving the performance of Web systems, and strengthening e-commerce applications. This dissertation systematically analyzes and studies Web mining and Web usage mining, and on that basis presents some novel techniques and methods. The main contents of this dissertation are as follows:

1. Summarize the relevant knowledge and techniques of data mining and Web usage mining, and explain the significance, current state of research, and open problems of Web usage mining.

2. Discuss the three phases of Web usage mining: data preprocessing, pattern discovery, and pattern analysis. The application fields and research directions of Web usage mining are also analyzed.

3. Chapter 4 investigates the preprocessing stage of Web usage mining and analyzes some typical Web log preprocessing techniques. A frame-filtering algorithm is then applied to Web log preprocessing to eliminate the influence of subframes and improve the efficiency and accuracy of the Web mining system.

4. Chapter 5 first introduces the existing clustering techniques used in Web mining. A classical clustering algorithm based on a distance matrix is then analyzed in detail, and it is shown that the algorithm has some limitations, such as a lack of accuracy and operability. To address this, the thesis proposes a new, fast clustering algorithm based on relative Hamming distance and designs a simple Web usage mining experimental system to implement the algorithm. The experimental results indicate that applying this algorithm yields more accurate groups of similar customers and relevant Web pages.
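The abstract does not define relative Hamming distance or give the clustering procedure, so the following is only a minimal sketch of the general idea, not the thesis's algorithm. It assumes each customer is represented by the set of pages they visited, and it assumes "relative Hamming distance" means the number of pages visited by exactly one of two customers divided by the number of pages visited by at least one of them. The function names, the threshold, and the simple leader-based grouping are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def relative_hamming(a: Set[str], b: Set[str]) -> float:
    """Assumed definition: pages visited by exactly one of the two
    customers, divided by pages visited by at least one of them."""
    union = a | b
    if not union:
        return 0.0  # two empty visit sets are treated as identical
    return len(a ^ b) / len(union)


def cluster_customers(visits: Dict[str, Set[str]], threshold: float) -> List[List[str]]:
    """Illustrative single-pass grouping (not the thesis's method):
    each customer joins the first cluster whose first member is within
    the threshold, otherwise starts a new cluster."""
    clusters: List[List[str]] = []
    for customer in sorted(visits):
        for cluster in clusters:
            leader = cluster[0]
            if relative_hamming(visits[customer], visits[leader]) <= threshold:
                cluster.append(customer)
                break
        else:
            clusters.append([customer])
    return clusters


# Hypothetical visit data, as might be extracted from preprocessed Web logs.
visits = {
    "u1": {"/index", "/news", "/sports"},
    "u2": {"/index", "/news"},
    "u3": {"/shop", "/cart"},
    "u4": {"/shop", "/cart", "/checkout"},
}
clusters = cluster_customers(visits, threshold=0.5)
print(clusters)
```

With this sample data, customers who share most of their visited pages end up in the same group. Normalizing by the union, rather than by the total number of pages on the site, keeps two customers who each visited only a few pages from looking similar merely because they both skipped most of the site.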
Keywords/Search Tags: Web usage mining, Preprocessing, Clustering algorithm, Relative Hamming distance