Mining Web Access Logs

Posted on: 2004-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: X F Duan
Full Text: PDF
GTID: 2168360122470538
Subject: Computer system architecture
Abstract/Summary:
With the explosive growth of data available on the Internet, making that information usable has become a necessity. The application of data mining techniques to the World Wide Web, referred to as Web mining, has been the focus of several recent research projects and papers. In this paper, we discuss Web mining and present a taxonomy of it. Web information comes in three types: usage, content, and structure. Correspondingly, there are three types of Web mining: Web usage mining, Web content mining, and Web structure mining. We present the research issues, techniques, and development efforts for each type, but focus mainly on Web usage mining. Web usage mining is the application of data mining techniques to discover usage patterns from Web usage data. It differs from Web content mining and Web structure mining, which draw on content and structure data, in that it draws its information from usage data. Usage data include client-side log records, server log records, proxy server log records, and user registration or survey data gathered via CGI scripts. Web usage mining consists of three phases: preprocessing, pattern discovery, and pattern analysis. This paper describes each of these phases in detail. We also describe the data mining methods that can be useful for Web mining, such as statistical analysis, association rules, clustering, classification, and sequential patterns, with the main focus on association rules. Association rules are widely used in data mining to find patterns in data; the typical example is shopping basket analysis. In applying association rules to Web usage mining, the key question is how to form the shopping basket.
When a user clicks through Web pages, the server records each request in its log files. By analyzing the log files together with the topology and contents of the Web site, and by passing the data through four preprocessing phases (data cleaning, user identification, session identification, and transaction identification), we obtain a set of transactions, which forms the shopping baskets. Apriori is a well-known association rule algorithm. We describe an implementation of the Apriori algorithm and apply it to a typical Web mining task. Finally, building on this work, the author used the Apriori algorithm to find the usage patterns of the Chongqing Television Station Web site and demonstrated some interesting patterns.
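The pipeline described above, from log records to shopping baskets to association rules, can be illustrated with a minimal sketch. It is not the thesis's implementation. The sample records, the page names, the 30-minute session timeout, and the function names `build_baskets`, `apriori`, and `rules` are all assumptions made for illustration. The records are assumed to be already cleaned and tagged with a user identifier, so only session identification and a plain Apriori pass are shown.

```python
from itertools import combinations

# Hypothetical, already-cleaned log records: (user_id, unix_seconds, page).
RECORDS = [
    ("u1", 0, "/index"), ("u1", 60, "/news"), ("u1", 120, "/tv-guide"),
    ("u1", 5000, "/index"), ("u1", 5060, "/news"),
    ("u2", 10, "/index"), ("u2", 70, "/tv-guide"),
    ("u3", 20, "/index"), ("u3", 90, "/news"), ("u3", 150, "/tv-guide"),
]


def build_baskets(records, timeout=1800):
    """Session identification: a gap longer than `timeout` seconds
    (an assumed heuristic) starts a new basket for that user."""
    baskets = []
    last_seen = {}  # user -> (time of last request, current basket)
    for user, ts, page in sorted(records, key=lambda r: (r[0], r[1])):
        prev = last_seen.get(user)
        if prev is None or ts - prev[0] > timeout:
            basket = set()
            baskets.append(basket)
        else:
            basket = prev[1]
        basket.add(page)
        last_seen[user] = (ts, basket)
    return [frozenset(b) for b in baskets]


def apriori(baskets, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    level = {frozenset([item]) for b in baskets for item in b}
    level = {s for s in level if support(s) >= min_support}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


def rules(frequent, min_confidence):
    """Derive rules lhs -> rhs with confidence = support(all) / support(lhs)."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    found.append((lhs, itemset - lhs, sup, conf))
    return found


baskets = build_baskets(RECORDS)
frequent = apriori(baskets, min_support=0.5)
strong_rules = rules(frequent, min_confidence=0.9)
```

In this sketch each basket is the set of pages seen in one session, so repeated visits to a page within a session count once. The transaction identification phase named in the abstract may split sessions further; that step is not modeled here.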
Keywords/Search Tags:Data Mining, Web Mining, Web Usage Mining, Apriori