
The Research On The Algorithm Of Mining User's Frequent Access Path From Web Log

Posted on: 2012-06-18 | Degree: Master | Type: Thesis
Country: China | Candidate: H Wang
GTID: 2178330332498060 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, the number of e-commerce sites keeps growing. How to optimize a site's structure, or how to proactively provide personalized services based on customers' behavior, has become a problem that troubles site administrators, and web log mining offers a new direction for solving it. As an important branch of web mining, web log mining has become a research focus. It applies traditional data mining techniques to web logs to discover users' behavior patterns and interests when they visit a site, and to analyze how the site is used. This thesis makes an in-depth study of mining users' frequent access paths from web logs.
The thesis first studies data preprocessing in depth. It introduces the concept of page level into the session identification phase, which makes the page browsing time threshold more accurate. It then improves transaction identification and proposes the IMFR algorithm, which merges path completion and transaction identification to simplify data preprocessing.
The author focuses on two types of algorithms used to mine frequent paths: those that generate candidate sets and those that do not. The thesis mainly studies the WAP algorithm, which belongs to the second type, and proposes an improved algorithm based on it, named NGCWAP. NGCWAP avoids physically constructing conditional trees by using pre-order and post-order traversal numbers to record the subtrees in which candidates are located.
Finally, the author implements a web log mining prototype system based on the B/S (browser/server) architecture. The system uses the IMFR and NGCWAP algorithms to mine users' frequent access paths. In addition, it can discover some general patterns, for example the most popular pages and the external websites from which users arrive.
The thesis tests the improved algorithms and the system in detail, and analyzes and summarizes the test results.
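The abstract only outlines how NGCWAP replaces conditional trees with pre-order and post-order numbers. The following is a minimal sketch of that general idea, not the thesis's NGCWAP implementation: it builds a WAP-tree (a prefix tree of access sequences with counts), numbers every node in pre-order and post-order, and uses the rule "node d lies in the subtree of node a iff a.pre < d.pre and d.post < a.post" to locate the subtrees that hold candidates. Mining then recurses on lists of subtree roots instead of building conditional trees. All names (`Node`, `build_wap_tree`, `number_nodes`, `is_descendant`, `mine_frequent_paths`) are hypothetical, and counting each event's first occurrence below a subtree root is an assumption of this sketch. It also skips the usual WAP step of pruning infrequent events before building the tree.

```python
from collections import defaultdict


class Node:
    """WAP-tree node: an event (page) label, a count, and traversal numbers."""

    def __init__(self, label=None):
        self.label = label
        self.count = 0
        self.children = {}  # event label -> child Node, in insertion order
        self.pre = -1
        self.post = -1


def build_wap_tree(sequences):
    """Insert each access sequence into a prefix tree, counting shared prefixes."""
    root = Node()
    for seq in sequences:
        node = root
        for event in seq:
            if event not in node.children:
                node.children[event] = Node(event)
            node = node.children[event]
            node.count += 1
    return root


def number_nodes(root):
    """Assign pre-/post-order numbers; index each event's nodes in pre-order."""
    index = defaultdict(list)
    pre = post = 0

    def visit(node):
        nonlocal pre, post
        node.pre = pre
        pre += 1
        if node.label is not None:
            index[node.label].append(node)
        for child in node.children.values():
            visit(child)
        node.post = post
        post += 1

    visit(root)
    return index


def is_descendant(node, anc):
    """True if `node` lies strictly inside the subtree rooted at `anc`."""
    return anc.pre < node.pre and node.post < anc.post


def first_occurrences(roots, event, index):
    """Nodes labelled `event` under each root that have no `event` ancestor below that root."""
    found = []
    for root in roots:
        last = None
        for node in index[event]:  # already sorted by pre-order number
            if not is_descendant(node, root):
                continue
            # In pre-order, a node under an earlier kept occurrence must lie
            # under the most recently kept one, so one check suffices.
            if last is not None and is_descendant(node, last):
                continue
            found.append(node)
            last = node
    return found


def _mine(roots, prefix, index, min_count, out):
    for event in sorted(index):
        occ = first_occurrences(roots, event, index)
        support = sum(node.count for node in occ)
        if support >= min_count:
            pattern = prefix + (event,)
            out.append((pattern, support))
            # Recurse on the subtree roots only; no conditional tree is built.
            _mine(occ, pattern, index, min_count, out)


def mine_frequent_paths(sequences, min_count):
    """Return (pattern, support) pairs whose support is at least `min_count`."""
    root = build_wap_tree(sequences)
    index = number_nodes(root)
    out = []
    _mine([root], (), index, min_count, out)
    return out
```

For example, `mine_frequent_paths([["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "c"]], 2)` returns `[(('a',), 3), (('a', 'b'), 2), (('a', 'c'), 2), (('b',), 3), (('b', 'c'), 2), (('c',), 3)]`. Because the subtree roots passed to each recursive call never contain one another, every session is counted at most once per pattern. Each step only compares pairs of integers, which is what lets this style of mining avoid copying subtrees.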
Keywords/Search Tags: Web log, Data mining, Sequential pattern, Frequent path, WAP algorithm