Graph based click-stream mining for categorizing browsing activity in the World Wide Web

Posted on: 2005-06-17
Degree: M.S.
Type: Thesis
University: The University of Texas at Arlington
Candidate: Maniam, Abhilash
Full Text: PDF
GTID: 2458390008977827
Subject: Computer Science
Abstract/Summary:
This thesis addresses the following question: Is there an inherent structure in the way users browse the Web? In machine learning terms, the question can be rephrased as follows: Using only the click-streams of two browsing activities, can we learn a concept that describes one activity (for example, buying a digital camera) and distinguishes it from a concept that describes another (for example, reading technology news)? SUBDUE, a graph-based data mining tool, is used to learn such discriminating concepts from Netscape 7.1 browser click-streams. We have developed a component for Netscape 7.1 that logs a user's click-stream without hampering the browsing experience. These click-stream log files are converted into directed graphs that represent the user's browsing activity. Because a click-stream log file is semi-structured, a graph is an appropriate representation for it. We discuss various ways of constructing the click-stream graph, as well as methods for adding contextual information to the graph to aid SUBDUE's learning algorithm. We present results generated from both synthetic click-streams and browser click-streams.

Our results demonstrate that SUBDUE is capable of learning recursive rules, which describe a structural pattern, for classifying client-side click-streams. When client-side click-stream log files are classified as either "search" click-streams or "random browser" click-streams, the learned structural pattern achieves higher accuracy than a decision tree classifier.
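
The conversion of a click-stream log into a directed graph can be illustrated with a minimal sketch. The abstract does not specify the log format or the graph construction used in the thesis, so everything here is an assumption made for illustration: the log is taken to be a sequence of (referrer, url) pairs, each distinct URL becomes a vertex, and each followed link becomes a directed edge labeled "link". The function name `build_click_graph` and the example URLs are hypothetical.

```python
def build_click_graph(clicks):
    """Build a directed graph from a click-stream.

    clicks: iterable of (referrer, url) pairs in visit order.
            referrer is None when the page was opened directly.
    Returns (vertices, edges) where vertices maps url -> integer id
    and edges is a list of (source_id, target_id, label) tuples.
    This record layout is an assumed one, not the thesis's log format.
    """
    vertices = {}
    edges = []

    def vertex_id(url):
        # Assign ids in order of first appearance, starting at 1.
        if url not in vertices:
            vertices[url] = len(vertices) + 1
        return vertices[url]

    for referrer, url in clicks:
        if referrer is None:
            # Page opened directly: add the vertex, no incoming edge.
            vertex_id(url)
            continue
        source = vertex_id(referrer)
        target = vertex_id(url)
        edges.append((source, target, "link"))

    return vertices, edges


# Hypothetical click-stream: a search page, a category page, two items.
sample_clicks = [
    (None, "search.example/"),
    ("search.example/", "shop.example/cameras"),
    ("shop.example/cameras", "shop.example/cameras/item1"),
    ("shop.example/cameras", "shop.example/cameras/item2"),
]

vertices, edges = build_click_graph(sample_clicks)
for source, target, label in edges:
    print(source, "->", target, label)
```

In this sketch, returning to a category page and following a second link produces a vertex with two outgoing edges, which is the kind of branching structure a graph representation keeps and a flat list of visited URLs loses. Contextual information of the sort the abstract mentions could be attached as extra vertex or edge labels, though how the thesis does this is not stated here.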
Keywords/Search Tags: Click-stream, Browsing activity, Graph