
Research And Improvement On Web Usage Mining

Posted on: 2010-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: J J Huang
GTID: 2178360275959168
Subject: Computer application technology
Abstract/Summary:
With the rapid development of data mining technologies, researchers have applied them to the study of the web, which has produced a new research area called web mining. Web usage mining, which extracts hidden and interesting information by analysing web server logs, is one of the important branches of web mining. It supports personalization and navigation in web systems, and it is also the foundation for restructuring web sites. Web usage mining is divided into four phases: data collection, data preprocessing, interest model building and pattern analysis. This thesis focuses on two of them, data preprocessing and interest model building.

First, data preprocessing is one of the difficulties of web usage mining. It is divided into data cleaning, session reconstruction, path completion (path supplement) and transaction reconstruction. For the session reconstruction step, this thesis proposes a session reconstruction method based on DFA, and for the path completion step it proposes a path completion method based on multiple windows. It also proposes SRDFA, which reconstructs sessions for dynamic framework web sites. For the transaction reconstruction step, this thesis improves the maximal forward path method so that it records the hyperlinks that need to be appended.

Second, interest model building is an important phase. For this phase, this thesis presents an improved Apriori algorithm called RSApriori. It obtains all frequent itemsets one by one through a series of iterations beginning from the largest frequent itemsets. Users need to set the parameter k before running the algorithm, and the algorithm does not finish until the frequent k-itemsets are found.

In addition, this thesis designs two experiments to demonstrate the feasibility of the entire framework. Finally, the complete method is applied to a real MVC-based web site built on the Struts framework, which demonstrates its efficiency and practicality. The improved web usage mining algorithms in this thesis have practical significance. First, they provide a reference for session reconstruction, transaction reconstruction and some data mining algorithms, which promotes further research on web usage mining to a certain extent. Second, they also promote research on web access analysis, structural analysis and web site optimization.
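
For readers unfamiliar with the transaction reconstruction step, the following is a minimal sketch of the textbook maximal forward path (maximal forward reference) method that the abstract names as its starting point. It is not the improved variant proposed in the thesis, whose details the abstract does not give. The function name `maximal_forward_references` and the representation of a session as a list of page identifiers are assumptions made for illustration.

```python
def maximal_forward_references(session):
    """Split one session (ordered list of page ids) into maximal forward paths.

    Baseline method only: a revisit of a page already on the current path is
    treated as a backward move. When a backward move follows a forward move,
    the path built so far is emitted as one transaction.
    """
    path = []            # current forward path from the session's first page
    transactions = []    # emitted maximal forward paths
    extending = False    # True while the most recent move was a forward move

    for page in session:
        if page in path:
            # Backward reference: emit the path only if we were moving forward.
            if extending:
                transactions.append(list(path))
            # Cut the path back to the revisited page.
            path = path[: path.index(page) + 1]
            extending = False
        else:
            # Forward reference: extend the current path.
            path.append(page)
            extending = True

    # A session that ends on a forward move leaves one last path to emit.
    if extending and path:
        transactions.append(list(path))
    return transactions


# Hypothetical session: single letters stand in for page URLs.
example = list("ABCDCBEGHGWAOUOV")
result = maximal_forward_references(example)
```

For the hypothetical session above, the sketch yields five transactions: A-B-C-D, A-B-E-G-H, A-B-E-G-W, A-O-U and A-O-V. Each transaction is a candidate input record for a later frequent-pattern step such as Apriori.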
Keywords/Search Tags: Web Usage Mining, Session Recognition, Path Supplement, Association Rules, Apriori Algorithm