Web usage mining: Discovery and application of interesting patterns from web data

Posted on: 2001-12-19
Degree: Ph.D
Type: Thesis
University: University of Minnesota
Candidate: Cooley, Robert Walker
Full Text: PDF
GTID: 2468390014456958
Subject: Computer Science
Abstract/Summary:
Web Usage Mining is the application of data mining techniques to Web clickstream data in order to extract usage patterns. As Web sites continue to grow in size and complexity, the results of Web Usage Mining have become critical for a number of applications such as Web site design, business and marketing decision support, personalization, usability studies, and network traffic analysis. The two major challenges involved in Web Usage Mining are preprocessing the raw data to provide an accurate picture of how a site is being used, and filtering the results of the various data mining algorithms in order to present only the rules and patterns that are potentially interesting. This thesis develops and tests an architecture and algorithms for performing Web Usage Mining. An evidence combination framework referred to as the information filter is developed to compare and combine usage, content, and structure information about a Web site. The information filter automatically identifies the discovered patterns that have a high degree of subjective interestingness. The results of experiments with the information filter show that lists of thousands of discovered patterns can be reliably ranked according to interestingness, as defined by the deviation from a predefined set of expectations about the usage of a Web site. Also, the necessary steps for preprocessing clickstream data are identified, and algorithms for each of these steps are developed and tested against data from a large electronic commerce Web site. The results show that proper preprocessing, including the incorporation of knowledge about the structure of a Web site, is required to obtain meaningful usage patterns.
Keywords/Search Tags:Web usage mining, Patterns, Web site, Information, Clickstream data
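
The abstract names clickstream preprocessing as one of the two major challenges but does not describe the algorithms used. As an illustration only, the sketch below shows one common preprocessing step, session identification, using a simple inactivity-timeout heuristic. The function name `sessionize`, the `(user, timestamp, url)` record layout, and the 30-minute threshold are assumptions made for this example and are not taken from the thesis.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sessionize(requests, timeout=timedelta(minutes=30)):
    """Group (user, timestamp, url) records into per-user sessions.

    A new session starts whenever the gap between two consecutive
    requests from the same user exceeds `timeout`. The 30-minute
    default is an assumed value, not one given in the abstract.
    """
    # Collect each user's requests, keeping first-seen user order.
    by_user = defaultdict(list)
    for user, ts, url in requests:
        by_user[user].append((ts, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort(key=lambda h: h[0])  # order requests by time
        current = [hits[0][1]]
        last_ts = hits[0][0]
        for ts, url in hits[1:]:
            if ts - last_ts > timeout:
                # Gap too long: close the current session, start a new one.
                sessions.append((user, current))
                current = []
            current.append(url)
            last_ts = ts
        sessions.append((user, current))
    return sessions


if __name__ == "__main__":
    log = [
        ("u1", datetime(2001, 1, 1, 10, 0), "/home"),
        ("u1", datetime(2001, 1, 1, 10, 5), "/products"),
        ("u1", datetime(2001, 1, 1, 11, 0), "/home"),
        ("u2", datetime(2001, 1, 1, 10, 1), "/home"),
    ]
    for user, pages in sessionize(log):
        print(user, pages)
```

A timeout alone ignores site structure. The abstract reports that incorporating knowledge about the structure of a Web site is required to obtain meaningful usage patterns, so a heuristic like this one should be read as a starting point rather than as the approach the thesis takes.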