The Research Of Knowledge Discovery Based On Web Usage Mining

Posted on: 2006-10-13
Degree: Master
Type: Thesis
Country: China
Candidate: M Chen
Full Text: PDF
GTID: 2168360152990386
Subject: Computer software and theory
Abstract/Summary:
Web mining is an active research area that combines data mining techniques with the World Wide Web. In general, it covers three research domains: Web Content Mining, Web Structure Mining and Web Usage Mining. Of these, web usage mining aims to discover rules in site visitors' browsing behavior, improve site structure and the link structure among pages, enhance the quality of web services, and support decisions in customer relationship management for e-commerce. After surveying the development of web usage mining, the thesis discusses the procedure of web usage mining and the technologies relevant to each of its phases. The main work and novel ideas of the thesis are as follows:
· A detailed description of the definition, classification, characteristics and challenges of web mining;
· A detailed description of the definition, data sources, applications, main research areas and related technologies of web usage mining, together with a detailed description of the transaction-based procedure of web usage mining and a presentation of a classical method, the MF algorithm, which identifies transactions in the data preparation phase;
· In Chapter Four, the thesis analyzes the Apriori algorithm in the context of web usage mining and puts forward three improved algorithms for discovering users' frequent access patterns. First, RD_Apriori, an improved algorithm based on Apriori; second, a Close algorithm, which improves the Close method for mining frequent itemsets in data mining; finally, RD_Close, which is based on RD_Apriori and Close. Theoretical analysis and experimental tests show that these algorithms can discover the access patterns of all types of users, as well as frequent access patterns, according to a support threshold chosen by experts;
· The design and development of a web usage mining prototype system. The prototype consists of four functional modules: Data Cleaning Module, Session Construction Module, Transaction Identifying Module, and Access Patterns Mining Module. These modules carry out the preprocessing of the original log data and implement four algorithms: Apriori, RD_Apriori, Close and RD_Close. Finally, the performance of these algorithms is analyzed and validated on real data.
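
To make the baseline concrete, the following is a minimal sketch of the textbook Apriori algorithm applied to web transactions. It is not the thesis's RD_Apriori, Close or RD_Close algorithm, whose details the abstract does not give. The function name `apriori`, the page paths and the threshold value are assumptions chosen for illustration, and each transaction is assumed to be an unordered set of pages visited together.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset_of_pages: support_count} for every frequent page set.

    transactions: iterable of page collections (hypothetical example input).
    min_support:  absolute count threshold (assumed to be chosen by an expert).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single page.
    counts = {}
    for t in transactions:
        for page in t:
            key = frozenset([page])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-sets into candidate k-sets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: scan the transactions once per level.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent


# Hypothetical transactions, as a transaction-identification step might emit.
example_transactions = [
    {"/home", "/products", "/cart"},
    {"/home", "/products"},
    {"/home", "/cart"},
    {"/products", "/help"},
]

patterns = apriori(example_transactions, min_support=2)
for pages, support in sorted(patterns.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(pages), support)
```

With this toy data and a threshold of 2, the pairs {/home, /products} and {/home, /cart} are retained, while {/products, /cart} is dropped because it occurs in only one transaction. The three-page set is therefore pruned without being counted, which is the candidate reduction that Apriori-style algorithms rely on.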
Keywords/Search Tags: Data Mining, Web Mining, Web Usage Mining, Frequent Access Patterns