
Applied Research On XML And Association Rule In Web Log Mining

Posted on: 2012-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Wu
Full Text: PDF
GTID: 2178330332485985
Subject: Computer application technology
Abstract/Summary:
Web mining is an emerging research direction in the field of data mining, and Web log mining is one of its important branches. Web log mining applies data mining techniques to Web server log files to discover users' access patterns. Using these patterns, Web designers can improve a site's structure and performance and thereby enhance its quality of service. This thesis first introduces the basic concepts of data mining and Web data mining. For Web log mining, it mainly studies Web log data preprocessing techniques, the application of association rule algorithms to Web log mining, and a user similarity calculation method based on multiple evaluation factors. The main research work is as follows:

1. The entire process of Web log data preprocessing is studied in detail, including data cleaning, site topology recognition, page filtering, user identification, session identification, path completion, and transaction identification. Because the experimental data lack the referrer field, a path completion algorithm based on the site topology is proposed. Because log files are semi-structured, XML is proposed for storing the preprocessing results, and the detailed structure of that storage is given.
2. An improved FP-growth algorithm is proposed for mining users' frequent access sequence patterns. The algorithm first builds an FS-tree and then runs its mining procedure on the FS-tree to obtain the frequently visited sequences of all users. Experimental comparison with other existing mining algorithms shows that the improved algorithm is effective.
3. A user similarity calculation method based on multiple evaluation factors is proposed for fuzzy clustering of Web users. The method considers the number of pages visited, the page sequence, the access time, and other factors, and calculates the proportion (weight) of each factor. Experimental results show that the clustering algorithm using this similarity calculation method produces better clustering results.
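The abstract does not give the similarity formula, so the following Python sketch only illustrates the general idea of combining several evaluation factors into one user similarity score. Everything in it is an assumption made for illustration: the three factor measures (Jaccard overlap of visited pages, longest-common-subsequence ratio for page order, ratio of total access times), the function names, and the fixed weights. The thesis states that its method calculates the proportion of each factor, which this sketch does not attempt to reproduce.

```python
from typing import Sequence, Tuple


def page_set_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard overlap of the sets of pages two users visited (assumed measure)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def sequence_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Share of the longer visit sequence that both users followed in the same order."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def time_similarity(seconds_a: float, seconds_b: float) -> float:
    """Ratio of the shorter total access time to the longer one."""
    longest = max(seconds_a, seconds_b)
    return min(seconds_a, seconds_b) / longest if longest else 1.0


def user_similarity(
    pages_a: Sequence[str],
    pages_b: Sequence[str],
    seconds_a: float,
    seconds_b: float,
    weights: Tuple[float, float, float] = (0.4, 0.4, 0.2),  # hypothetical fixed weights
) -> float:
    """Weighted sum of the three factor similarities; weights should sum to 1."""
    w_pages, w_seq, w_time = weights
    return (
        w_pages * page_set_similarity(pages_a, pages_b)
        + w_seq * sequence_similarity(pages_a, pages_b)
        + w_time * time_similarity(seconds_a, seconds_b)
    )
```

A pairwise matrix of such scores could then serve as input to a fuzzy clustering step. Because each factor is normalized to the range 0 to 1, the combined score stays in that range as long as the weights are non-negative and sum to 1.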
Keywords/Search Tags: Log pre-processing, XML, Association Rule, Frequent Path, User Similarity, Fuzzy Clustering