Study On Some Key Issues Of Web Usage Mining

Posted on: 2005-03-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B J Ruan
GTID: 1118360125967521
Subject: Computer software and theory
Abstract/Summary:
Web Usage Mining (WUM) is the process of applying data mining techniques to the discovery of usage patterns from Web data. As a human-computer interface that can be used anywhere and at any time, the Web offers many opportunities for developing techniques to record, collect, analyze and extract usage information on a large scale. In this context, WUM attracts enormous interest from both the academic and industrial communities. WUM techniques have broad applications in scientific research, software design and business intelligence.

In this thesis, an up-to-date survey of WUM research is given and the results of a study on some key issues of WUM are presented. The investigated issues concern frequent sequential-pattern mining, methods for measuring the difference between two user behaviors, and data mining techniques for optimizing Web site design.

The main contributions are as follows.

(1) A frequent sequential-pattern mining algorithm, called TD-WAP-Mine, is proposed. It differs from previous algorithms in applying a new frequent-pattern searching strategy, which greatly reduces the workload of building intermediate data. Experimental results on various datasets show that this algorithm performs better than the previous ones, especially when the datasets contain prolific frequent patterns.

(2) A new method is proposed for measuring the difference between two user behaviors based on the semantic information contained in the Web structure data. The presentation structure of the relationships between feature items is formalized as a special kind of directed acyclic graph. Based on a core concept, called maximum similarity width, several distance functions are defined for quantifying the difference between two user behaviors in terms of their sets of feature items. When the presentation structure of the relationships is a rooted directed tree, these distance functions satisfy the triangle inequality. The triangle inequality is very useful for making searching more efficient, but this property has not been investigated in previous research. The results of preliminary experiments show that these new functions perform comparably to the previous ones in computing speed and nearest-neighbor searching.

(3) A new data mining method is proposed for optimizing Web site design. It measures the searching cost of Web pages by computing the average searching time based on the details of the information foraging paths. Moreover, an efficient data mining method is proposed to discover a group of hyperlinks that are useful for reducing the searching time. The experimental results show that the mining results can provide useful information for identifying problems in Web site design.
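The remark that the triangle inequality makes searching more efficient can be illustrated with a small sketch. It is not the thesis's method: the maximum-similarity-width distance functions are not specified in this abstract, so the sketch substitutes the ordinary path-length distance on a rooted tree, which is known to satisfy the triangle inequality. The tree, the item names, and the helpers `tree_distance` and `nearest_with_pruning` are all hypothetical. The idea is that for any pivot p, |d(q, p) - d(c, p)| is a lower bound on d(q, c), so a candidate c can be skipped without computing d(q, c) whenever that bound is no better than the best distance found so far.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rooted tree of feature items: child -> parent (root maps to None).
PARENT: Dict[str, Optional[str]] = {
    "home": None,
    "news": "home",
    "shop": "home",
    "sports": "news",
    "politics": "news",
    "books": "shop",
    "music": "shop",
}


def path_to_root(node: str) -> List[str]:
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    cur: Optional[str] = node
    while cur is not None:
        path.append(cur)
        cur = PARENT[cur]
    return path


def tree_distance(a: str, b: str) -> int:
    """Number of edges on the unique path between a and b.

    A stand-in metric (assumption): it is NOT the thesis's distance function.
    """
    depth_a = {n: i for i, n in enumerate(path_to_root(a))}
    for j, n in enumerate(path_to_root(b)):
        if n in depth_a:  # first common ancestor reached
            return depth_a[n] + j
    raise ValueError("nodes are not in the same tree")


def nearest_with_pruning(
    query: str, candidates: List[str], pivot: str
) -> Tuple[Optional[str], float, int]:
    """Return (nearest candidate, its distance, number of candidates skipped)."""
    d_qp = tree_distance(query, pivot)
    # In practice these pivot distances would be precomputed once, offline.
    pivot_dist = {c: tree_distance(c, pivot) for c in candidates}
    best, best_d, skipped = None, float("inf"), 0
    for c in candidates:
        # Triangle inequality: d(query, c) >= |d(query, pivot) - d(c, pivot)|.
        lower_bound = abs(d_qp - pivot_dist[c])
        if lower_bound >= best_d:
            skipped += 1  # c cannot beat the current best; skip the full distance
            continue
        d = tree_distance(query, c)
        if d < best_d:
            best, best_d = c, d
    return best, best_d, skipped


result = nearest_with_pruning("sports", ["news", "books", "music", "shop"], "politics")
print(result)
```

In this toy run the first candidate already gives distance 1, and the lower bound lets the remaining three candidates be discarded without a direct distance computation. The pruning is only sound because the distance satisfies the triangle inequality, which is why that property matters for search.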
Keywords/Search Tags: Web Mining, Web Usage Mining, Frequent Pattern Mining, Semantic Distance