Web Log Mining System Based On Association Rules And Sequential Patterns

Posted on: 2010-10-15
Degree: Master
Type: Thesis
Country: China
Candidate: J F Li
GTID: 2178330338485453
Subject: Computer applications
Abstract/Summary:
With the rapid development of the Internet, the World Wide Web is becoming a distributed global information resource containing a large amount of data relevant to all domains of human activity. Extracting potentially useful knowledge from this vast body of information has become a necessity. Web data mining, which applies data mining technology to the Internet, is currently a hot spot in many research domains. Web log mining is a branch of Web data mining and, as an important part of it, has particular theoretical and practical significance.

This thesis discusses the research background and current status of Web data mining, and briefly introduces the definition, process, and most commonly used techniques of data mining, together with the definition and classification of Web data mining. It describes the preprocessing stage of Web log mining, including its main techniques and underlying ideas. It then focuses on how association rules and sequential pattern algorithms are implemented for Web log mining, which deepens the understanding of data mining theory and provides technical support for building the Web log mining system.

The main findings of this thesis are the following three points:

(1) It uses the maximal forward reference (MFR) model for transaction identification during Web log preprocessing, dividing each user's visit records into Web page browsing sequences. This eliminates the effect of users clicking the "Back" button when they switch to a different topic, so that users' browsing patterns can be mined more accurately. On top of this method, it uses the PrefixSpan sequential pattern algorithm to mine the paths that users visit frequently, which makes the mining results more precise and effective.

(2) A Web log mining system has been built on the basis of the above theoretical research and the overall system design. Starting from the home page of a website, it can build a relational database of the sections at each level and obtain the topology of the whole site. Through human-computer interaction, the system also uses a trial-and-error method and a test program to map the thousands of distinct Web pages that users visited (i.e., distinct access addresses) to the sections of the website (usually fewer than one hundred). This gives Web log mining a solid foundation and makes the results more meaningful. It is also the application-level innovation of this thesis.

(3) It applies the Web log mining system to a specific website and, based on the association rules and frequent access paths that were mined, suggests improvements to the site's organizational structure and linking scheme. This makes the theory and research on Web log preprocessing and data mining algorithms more practical and realistic. At the same time, the effectiveness of the system is verified.
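
To make the transaction identification step in point (1) more concrete, the following is a minimal Python sketch of maximal forward reference (MFR) splitting. It is not the thesis's implementation: the function name `maximal_forward_references`, the single-letter page labels, and the assumption that the log has already been cleaned and split into one ordered page list per user session are all illustrative. A revisit of a page already on the current path is treated as a backward move (for example, a click on the "Back" button), which closes the current forward path.

```python
from typing import Hashable, List, Sequence


def maximal_forward_references(session: Sequence[Hashable]) -> List[list]:
    """Split one user session into maximal forward reference paths.

    `session` is assumed to be the ordered list of pages one user
    requested (hypothetical input; real logs need cleaning and user/session
    identification first).
    """
    path: list = []        # current forward path from the session start
    result: List[list] = []
    extending = False      # True if the previous move was a forward move

    for page in session:
        if page in path:
            # Backward reference: the user returned to an earlier page.
            if extending:
                # The path just before turning back is a maximal forward path.
                result.append(list(path))
            # Cut the path back to the revisited page.
            del path[path.index(page) + 1:]
            extending = False
        else:
            # Forward reference: extend the current path.
            path.append(page)
            extending = True

    # A session that ends on a forward move leaves one last path to emit.
    if extending and path:
        result.append(list(path))
    return result


if __name__ == "__main__":
    clicks = list("ABCDCBEGHGWAOUOV")
    for p in maximal_forward_references(clicks):
        print("".join(p))
    # prints ABCD, ABEGH, ABEGW, AOU, AOV (one per line)
```

Each returned path could then be treated as one sequence for a sequential pattern miner such as PrefixSpan, so that backward clicks do not appear as spurious transitions in the frequent paths.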
Keywords/Search Tags: Web Log Mining, Web Log Pretreatment, Association Rules, Sequential Pattern