Research On Technique Of Web Log Mining

Posted on:2009-07-08

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Guo

GTID:2178360245989325

Subject:Computer application technology

Abstract/Summary:

With the rapid development and growing popularity of the Internet, ever more Web log data is being generated. How to analyze and use this large volume of data has become a pressing problem. Web log mining is a relatively new technique for network information processing and an important application of data mining on the Internet: it applies data mining to Web server logs to discover users' access patterns and behavior. The results help improve a Web site's structure, access quality, and performance.

Data preprocessing is an important step in Web log mining because it determines how well the subsequent pattern discovery and pattern analysis algorithms perform. Web log preprocessing consists of data cleaning, user identification, session identification, path completion, and transaction identification. This thesis studies each step of Web log preprocessing and introduces the relevant methods for each. Based on an analysis of existing session construction algorithms, a method that builds sessions by combining two time windows is presented.

Frequent sequential pattern mining is an important research area in Web log mining. Apriori-like sequential pattern mining algorithms must scan the sequence database many times and generate enormous candidate sets, so this thesis stores transaction sequences in a WAP-Tree, which requires only two database scans. The WAP-Mine algorithm, however, builds conditional subtrees recursively, which consumes a large amount of memory. To address this deficiency of WAP-Mine, a new WAP-Tree-based algorithm, NWAP-Mine, is proposed, and experiments confirm its validity.

Existing sequential pattern mining algorithms do not weight Web pages, so a definition of interest level based on average dwell time is proposed. To address the shortcomings of existing page interest measures, an improved Web page interest level is suggested. This interest level serves as the weight in a weighted sequential pattern mining algorithm that finds the access paths of interest to users. Experiments show that using the improved interest measure in sequential pattern mining yields access patterns that better reflect users' access behavior.
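
As an illustration of the preprocessing step that builds sessions from two time windows, the following is a minimal Python sketch, not the thesis's actual algorithm. The abstract does not say which two windows are combined or what values they take. The sketch assumes a cap on total session length and a cap on the idle gap between consecutive requests. The function name split_sessions, both threshold values, and the (timestamp, URL) input format are hypothetical.

```python
from typing import List, Tuple

# Assumed thresholds: the abstract does not give the values used in the thesis.
MAX_SESSION_SECONDS = 30 * 60  # window 1: cap on total session length
MAX_GAP_SECONDS = 10 * 60      # window 2: cap on idle time between requests

Request = Tuple[int, str]  # (timestamp in seconds, requested URL)


def split_sessions(
    requests: List[Request],
    max_session: int = MAX_SESSION_SECONDS,
    max_gap: int = MAX_GAP_SECONDS,
) -> List[List[Request]]:
    """Split one user's requests into sessions using two time windows.

    A new session starts when the next request would make the current
    session longer than max_session, or when it arrives more than
    max_gap seconds after the previous request.
    """
    sessions: List[List[Request]] = []
    current: List[Request] = []
    for ts, url in sorted(requests):
        if current:
            too_long = ts - current[0][0] > max_session
            too_idle = ts - current[-1][0] > max_gap
            if too_long or too_idle:
                sessions.append(current)
                current = []
        current.append((ts, url))
    if current:
        sessions.append(current)
    return sessions
```

The input is assumed to hold the requests of a single, already identified user, so user identification must run first. Combining the two checks means a long visit with short gaps is still cut at the session-length cap, and a long pause ends a session even if the total visit time is short.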

Keywords/Search Tags:

web log mining, data preprocessing, sequential pattern, interest measure, WAP-Tree

Related items

1	Research On The Sequential Pattern Mining Algorithms Using Prefix-tree Structure
2	Design And Implementation Of The Phone Virus System Based On Sequential Patterns Mining
3	Constraint-based Sequential Pattern Mining And Its Applications
4	Mining User Traversal Sequential Patterns Based On User Traversal Interest From Web Log
5	Research On Intrusion Detection Based On Sequential Pattern Mining
6	Mining Sequential Patterns With Periodic General Gap Constraints
7	Pre-order linked WAP-tree mining of sequential patterns
8	The Research Of Frequent Pattern Algorithm Based On Web Log Mining
9	Research On An Algorithm For Time Sequential Pattern Mining
10	The Sequence Association Rules Mining For The E-commerce Personalized Recommendation