Research On User Access Sequential Pattern Mining Based On Web Log

Posted on: 2015-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2348330503475086
Subject: Computer Science and Technology
Abstract/Summary:
Web log mining occupies a pivotal position in web mining and is currently one of its most active research areas. Its purpose is to analyze the logs that users leave behind when they visit web sites, uncover the regularities and implicit knowledge in them, and obtain user access patterns that can improve web server performance, improve site topology, and provide users with intelligent services. This paper describes the theoretical foundations and general process of web log mining and analyzes the current state of research in the area. We study several key steps of web log mining in depth and propose corresponding improvements. Web log mining consists of three main processes: data cleaning, pattern discovery, and analysis and application. First, data cleaning includes data collection and data preprocessing. Data preprocessing is the foundation of the web log mining process, since it directly affects the quality of the mining results.
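To make the preprocessing stage concrete, the following is a minimal sketch of two commonly used steps: dropping requests for static resources and splitting each user's requests into sessions when the gap between two requests is too long. The abstract does not list these steps, so the record layout `(user, timestamp, url)`, the suffix list, and the 30-minute gap are all assumptions chosen for illustration, not the thesis's procedure.

```python
from collections import defaultdict

# Assumed, illustrative values; the abstract does not specify them.
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")
SESSION_GAP_SECONDS = 30 * 60


def clean(records):
    """Drop requests for static resources and keep page views only."""
    return [r for r in records if not r[2].lower().endswith(STATIC_SUFFIXES)]


def split_sessions(records, gap=SESSION_GAP_SECONDS):
    """Group (user, timestamp, url) records by user and split on long gaps."""
    by_user = defaultdict(list)
    for user, ts, url in sorted(records, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, url))

    sessions = []
    for user, visits in by_user.items():
        current = [visits[0][1]]
        for (prev_ts, _), (ts, url) in zip(visits, visits[1:]):
            if ts - prev_ts > gap:
                # The pause is longer than the gap: close the session.
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions


# Hypothetical log records: (user id, seconds since start, requested URL).
log = [
    ("u1", 0, "/index.html"),
    ("u1", 5, "/logo.gif"),
    ("u1", 60, "/news.html"),
    ("u1", 4000, "/index.html"),
    ("u2", 10, "/index.html"),
]
sessions = split_sessions(clean(log))
```

Each resulting session is an ordered list of page URLs for one user, which is the form of input that a sequential pattern mining step typically expects.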
In this paper, we analyze current data preprocessing methods in detail and illustrate them with an example. We identify some problems in session identification and propose a new method that identifies sessions based on users' interests: following users' browsing habits on the site, the home page and navigation pages serve as the markers of a new session. This method avoids the drawbacks of the original method. In addition, we add frame page filtering to the original preprocessing, which reduces the space consumed by session identification at the log preprocessing stage and captures access behavior more accurately. Second, pattern discovery is the core of web log mining; it aims to extract interesting knowledge using a sequential pattern mining algorithm. We compare several classical sequential pattern mining algorithms and select the PrefixSpan algorithm for in-depth study. Because its frequent pattern search strategy produces a large amount of intermediate data, we propose improvements that optimize the construction of projected databases, with the aim of reducing repeated scans of the sequence database and improving the efficiency of PrefixSpan. Finally, the experimental results on these key issues are analyzed and compared; they show that the proposed methods achieve the desired effect.
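For readers unfamiliar with PrefixSpan, the sketch below shows the basic prefix-projection idea on sequences of single items such as page visits. It is plain PrefixSpan with the well-known pseudo-projection trick (storing offsets into the original sequences instead of copying suffixes), not the improved projected-database construction proposed in the thesis, whose details the abstract does not give. The function name `prefixspan` and the sample data are illustrative assumptions.

```python
def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for sequences of single items."""
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start offset) pairs, so each
        # projected database is a list of offsets rather than copied suffixes.
        support = {}
        for sid, start in projected:
            seq = sequences[sid]
            seen = set()
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:
                    # Count each item at most once per sequence.
                    seen.add(item)
                    support[item] = support.get(item, 0) + 1

        for item, count in sorted(support.items()):
            if count < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # Project on the first occurrence of item in each suffix.
            next_projected = []
            for sid, start in projected:
                seq = sequences[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        next_projected.append((sid, pos + 1))
                        break
            mine(pattern, next_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical sessions, each an ordered list of visited pages.
sessions = [["A", "B", "C"], ["A", "C"], ["B", "C"]]
patterns = prefixspan(sessions, min_support=2)
```

With a minimum support of 2, this example yields the patterns `A`, `B`, `C`, `A, C`, and `B, C`. Each recursive call scans only the suffixes that follow the current prefix, which is the cost that optimizations of projected-database construction aim to reduce.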
Keywords/Search Tags: web log mining, sequential pattern mining, session identification, frame page filtering, PrefixSpan