Research On Web Clickstreams Data Clustering Technology

Posted on: 2010-11-16
Degree: Master
Type: Thesis
Country: China
Candidate: X M Li
Full Text: PDF
GTID: 2218330371450054
Subject: Computer software and theory
Abstract/Summary:
The rapid development of WWW technology has created great opportunities for corporations and organizations, and it has also produced large volumes of web clickstream data. Analyzing and mining the clickstream data held on Web servers can reveal a great deal of latent, useful information that helps managers and decision-makers analyze market trends, find potential customers, optimize website structure, and improve the user experience; such analysis therefore plays an irreplaceable role in the development of enterprises. Clustering analysis is a significant field in web mining: it divides preprocessed data into several groups so that, as far as possible, members of the same group are similar and members of different groups are dissimilar. Data stream mining has recently become a hot research topic, and processing large volumes of stream data rapidly, effectively, and in real time remains a challenge for researchers.

Focusing on the features of web clickstreams, and after analyzing traditional clustering and data stream algorithms, this thesis proposes two clustering algorithms: WCSCluster, for static environments, and WCSCluStream, for dynamic environments. For WCSCluster, the thesis defines the similarity between sessions and the storage structure of the algorithm, states and verifies the algorithm's properties, and illustrates its processing steps. For WCSCluStream, it proposes a three-layer online-midline-offline framework that extends the two-layer framework of CluStream, builds a sliding window model based on the Poisson process, and modifies the decay function of HPStream.

To verify the validity of the two algorithms, the experimental environment and the real and synthetic datasets used are described in detail. The results show that the two proposed algorithms are feasible and that they outperform comparable existing algorithms on performance measures such as run time, memory consumption, and clustering purity.
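The abstract names three building blocks without giving their definitions: a similarity between sessions, a sliding window over the stream, and a decay function taken from HPStream. The sketch below illustrates what such pieces can look like; it is not the thesis's method. Jaccard similarity over visited pages is an assumed stand-in for the session similarity, the window uses a fixed width rather than the Poisson-process model, and the decay is the original HPStream fading function f(t) = 2^(-λt), not the modified version. All names (`session_similarity`, `fading_weight`, `SlidingWindow`) are hypothetical.

```python
from collections import deque


def session_similarity(a, b):
    """Jaccard similarity between two sessions (assumed stand-in measure).

    Each session is a collection of page URLs.
    """
    pages_a, pages_b = set(a), set(b)
    if not pages_a and not pages_b:
        return 1.0
    return len(pages_a & pages_b) / len(pages_a | pages_b)


def fading_weight(age, decay_rate):
    """Original HPStream-style fading function f(t) = 2 ** (-decay_rate * t)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return 2.0 ** (-decay_rate * age)


class SlidingWindow:
    """Fixed-width, time-based window (a simplification, not Poisson-based)."""

    def __init__(self, width):
        self.width = width
        self._items = deque()  # (timestamp, session) pairs, oldest first

    def add(self, timestamp, session):
        """Append a session and drop sessions older than the window width."""
        self._items.append((timestamp, session))
        while self._items and timestamp - self._items[0][0] > self.width:
            self._items.popleft()

    def weighted_sessions(self, now, decay_rate):
        """Return (session, weight) pairs, with older sessions weighted less."""
        return [(s, fading_weight(now - t, decay_rate)) for t, s in self._items]
```

With `decay_rate = 0.5`, a session that is 2 time units old receives weight 0.5, so the half-life is `1 / decay_rate`. A stream clusterer could use these weights when updating cluster summaries so that recent sessions dominate.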
Keywords/Search Tags: Web mining, Web clickstream, clustering analysis, stream clustering, Poisson process, sliding window