
The Research And Application Of Web Log Mining System Based On Clustering Algorithm

Posted on: 2008-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: B P Tang
Full Text: PDF
GTID: 2178360245490630
Subject: Computer technology
Abstract/Summary:
Thanks to its excellent support for multimedia transfer and communication, the Internet has grown rapidly and has become an everyday tool for publishing information and for entertainment, as well as an important means of communication. The information space of this distributed system holds a great deal of knowledge with enormous potential value, which can be used extensively in economics, politics, education, scientific research, medicine, and other fields.

Yet while the WWW brings us a huge amount of information and great convenience, many problems remain to be solved. Providing personalized web services and building intelligent web sites are among the key ones. On a popular web site, the web log grows by about ten megabytes every day, which makes it impossible to analyze the collected data by hand.

Traditional data mining techniques can be applied to web log analysis and processing. They can quickly discover user access patterns, including frequent traversal paths and user clusters, from a large volume of site browsing activity. The patterns obtained from web log mining can be used not only to improve the performance and security of a web site, but also to optimize the topological structure and hyperlink relations of its pages. This is the basis for marketing and transactions in Web-based e-business, and it helps in providing personalized services and constructing intelligent web sites.

In this thesis, our work focuses mainly on the preprocessing and post-processing stages of web log mining. It begins with an introduction to traditional data mining, followed by the basic concepts, methods, and procedures of web log mining. Then, by importing the structure of the site using XML, we improve the accuracy and efficiency of preprocessing; in the mining stage, we propose a new modeling method for web log mining, outlier analysis. Finally, we implement a general-purpose web log mining system.

The work mainly includes:
1. Resolved the difficulty of having a single mining system mine logs in various formats.
2. Improved the methods for solving difficulties in data preprocessing.
3. Integrated two algorithms into the mining system and put forward a new one, outlier analysis.
4. Proposed and validated a general-purpose web log mining system model.
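As an illustration of the kind of preprocessing the abstract refers to, the following is a minimal sketch, not the thesis's own implementation. It assumes the logs are in NCSA Common Log Format (the abstract does not name its formats) and that sessions are split by a 30-minute inactivity timeout per host, a common heuristic that the abstract does not specify. The names `parse_line` and `sessionize` are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: NCSA Common Log Format; the thesis does not say which formats it handles.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    return {
        "host": m.group("host"),
        "time": datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "path": m.group("path"),
        "status": int(m.group("status")),
    }


def sessionize(records, timeout=timedelta(minutes=30)):
    """Group records into per-host sessions (hypothetical helper).

    Assumption: a gap longer than `timeout` between two consecutive
    requests from the same host starts a new session.
    """
    by_host = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["time"]):
        by_host[rec["host"]].append(rec)

    sessions = []
    for host, recs in by_host.items():
        current = [recs[0]]
        for prev, cur in zip(recs, recs[1:]):
            if cur["time"] - prev["time"] > timeout:
                sessions.append((host, [r["path"] for r in current]))
                current = []
            current.append(cur)
        sessions.append((host, [r["path"] for r in current]))
    return sessions
```

Each resulting session is a host paired with the ordered list of pages it requested, which is the sort of input that traversal-path mining and user clustering would typically consume.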
Keywords/Search Tags: Data mining, Web log mining, Preprocessing, Outlier analysis