Research on Clustering Mining Based on Web Usage Data Preprocessing

Posted on: 2005-07-27
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Zhang
GTID: 2168360122498820
Subject: Computer applications
Abstract/Summary:
Hyperlinks, the medium that connects the dispersed information on the World Wide Web, are being adopted at a dramatic pace. They give us access to a large amount of useful information, but they also challenge us to analyse the behaviour of web users and the characteristics of web resources. As websites grow in capacity and complexity, processing server logs with simple statistical methods alone is no longer enough. The goal of Web Usage Mining is to discover previously unknown, valuable information (knowledge) in users' access records (logs) by applying different data mining methods.

In this thesis, after summarizing the relevant background on data mining and data warehousing, we introduce the basic concepts, classification, and development of web mining. After explaining the significance of data preprocessing, we analyse the whole usage data preprocessing process: how to remove useless data, how to distinguish genuine records from spurious ones, and how to organise the log data at different levels of granularity. We also propose a usage data preprocessing model that strictly follows the W3C log data structure, and design an experiment to validate the soundness of the model.

Then, from the perspective of preprocessing, we analyse the role of clustering, one kind of data mining function, and introduce several clustering methods. Taking the results of usage preprocessing, page views (a kind of click-stream), as input, we use the Longest Common Subsequence (LCS) to define the similarity between click-streams, build an undirected weighted graph from these similarities, and then simplify the graph-partition clustering method.

Finally, we summarize the thesis and outline our future work.
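
As an illustration of the LCS-based similarity and the undirected weighted graph described above, here is a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how the similarity is normalized, so dividing the LCS length by the length of the longer click-stream is an assumption, and the names `lcs_length`, `similarity`, `build_graph`, the `threshold` parameter, and the sample URLs are all hypothetical.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)          # DP row for the previous element of a
    for x in a:
        curr = [0]                     # column 0: empty prefix of b
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]. ASSUMPTION: normalize by the longer stream."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def build_graph(streams, threshold=0.0):
    """Undirected weighted graph as {(i, j): weight} with i < j.

    Nodes are click-stream indices; an edge is kept only when the
    similarity exceeds `threshold` (a hypothetical pruning parameter).
    """
    edges = {}
    for i, j in combinations(range(len(streams)), 2):
        w = similarity(streams[i], streams[j])
        if w > threshold:
            edges[(i, j)] = w
    return edges


# Hypothetical click-streams (ordered page views per session).
streams = [
    ["/", "/news", "/news/1", "/about"],
    ["/", "/news", "/about"],
    ["/shop", "/cart"],
]
graph = build_graph(streams)
```

The resulting edge dictionary could then be passed to a graph-partitioning step, so that sessions connected by heavy edges end up in the same cluster.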
Keywords/Search Tags: web usage mining, clustering, session, page view, click-stream, Longest Common Subsequence