
Research On User Access Pattern Mining Model Based On Web Log

Posted on: 2016-04-06 | Degree: Master | Type: Thesis
Country: China | Candidate: X Y Wei | Full Text: PDF
GTID: 2308330461968799 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology and the continual growth of users' requirements, Internet-based applications have penetrated every aspect of social life at an astonishing rate, and the Web site has become a huge center for collecting and distributing information. The urgent problem now is how to locate information in this ocean of data quickly, efficiently, and accurately enough to meet users' demands. Because of the complex characteristics of Web sites, data mining technology has been applied to analyze them, giving rise to Web mining. As an important branch of Web mining, Web log mining analyzes server logs to capture users' browsing behavior and interests, so as to guide the restructuring and optimization of the Web site and provide users with higher-quality service.

This paper systematically summarizes the concepts related to Web mining, discusses the entire Web log mining process in four phases (data acquisition, data preprocessing, pattern mining, and pattern analysis), and focuses on sequential pattern mining methods and their application to Web user access pattern mining. The main contents include:

(1) We propose a modified sequential pattern mining algorithm, Variable Support Sequential Pattern Mining (VS_SPM). After a brief survey of existing sequential pattern mining algorithms, VS_SPM is introduced to address a shortcoming they share in how the minimum support threshold is set. The algorithm uses a matrix storage structure to reduce the number of database scans, and then sets a variable minimum support threshold for different levels of frequent sequential patterns. Finally, test datasets generated by the IBM Data Generator are used to verify the validity of the algorithm.
The results show that the proposed algorithm can overcome the "combinatorial explosion" and "rare item" problems caused by an unreasonable threshold.

(2) We propose a Web user access pattern mining algorithm based on browsing interest, Interested Web User Access Pattern Mining (IWUAPM), obtained by modifying VS_SPM to account for the sparsity of Web server log data. First, the algorithm constructs a model of users' browsing interest by combining visiting time, access frequency, page size, and degree. Next, SD (Support Difference) and LS (Least Support) are defined; then, following the ideas of multiple minimum supports and weighting, the users' browsing interest is introduced as a page weight for mining user access patterns. Finally, simulation experiments are carried out on preprocessed server logs from the Chongqing Agricultural and Rural Information Network. The test results show that the proposed algorithm can find the patterns users are interested in, and can thus guide managers in improving the design of the Web site and the quality of customer service.
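The variable-threshold idea in contribution (1) can be illustrated with a small sketch. This is not the VS_SPM algorithm itself: the abstract does not give its threshold schedule or its matrix structure, so the geometric decay in `level_min_support`, the parameter names `base`, `decay`, and `floor`, and the toy sessions are all assumptions made for illustration. The sketch only shows how lowering the minimum support for longer patterns can keep a longer access path that a single fixed threshold would discard.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Sequence[str], session: Sequence[str]) -> bool:
    """True if the pages of `pattern` occur in `session` in order (gaps allowed)."""
    remaining = iter(session)
    return all(page in remaining for page in pattern)


def support(pattern: Sequence[str], sessions: List[List[str]]) -> float:
    """Fraction of sessions that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sessions) / len(sessions)


def level_min_support(level: int, base: float = 0.5,
                      decay: float = 0.7, floor: float = 0.2) -> float:
    """Hypothetical schedule: the threshold shrinks with pattern length,
    but never drops below `floor`. Not taken from the thesis."""
    return max(floor, base * decay ** (level - 1))


def mine_variable_support(sessions: List[List[str]], max_len: int = 3,
                          base: float = 0.5, decay: float = 0.7,
                          floor: float = 0.2) -> Dict[Pattern, float]:
    """Level-wise mining with a per-level minimum support."""
    pages = sorted({page for s in sessions for page in s})
    frequent: Dict[Pattern, float] = {}
    frontier: List[Pattern] = [()]
    for level in range(1, max_len + 1):
        threshold = level_min_support(level, base, decay, floor)
        next_frontier: List[Pattern] = []
        for prefix in frontier:
            for page in pages:
                candidate = prefix + (page,)
                sup = support(candidate, sessions)
                # Extend anything at or above the global floor so that longer
                # patterns with a lower threshold are not pruned too early.
                if sup >= floor:
                    next_frontier.append(candidate)
                # Report only what meets this level's own threshold.
                if sup >= threshold:
                    frequent[candidate] = sup
        frontier = next_frontier
    return frequent


# Toy user sessions (invented for this example).
sessions = [
    ["home", "news", "contact"],
    ["home", "news"],
    ["home", "products", "news"],
    ["products", "contact"],
    ["home", "news", "contact"],
]
patterns = mine_variable_support(sessions)
for pattern, sup in sorted(patterns.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(" -> ".join(pattern), round(sup, 2))
```

In this toy data the path home, news, contact appears in two of five sessions. A fixed threshold of 0.5 at every level would drop it, while the decaying schedule keeps it at level 3. The sketch recomputes support by rescanning the sessions for every candidate, which is exactly the cost the matrix structure described in the abstract is meant to reduce.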
Keywords/Search Tags: Web mining, Web log mining, Sequential pattern mining, User access pattern