Research On Web Log Mining Based On Rough Set

Posted on: 2009-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: M G Liu
GTID: 2178360275478725
Subject: Computer application technology
Abstract/Summary:
The continuous development of the Internet has narrowed the distance between people. At the same time, the vast and constantly growing and changing body of information on the Web leaves users increasingly confused. Applying data mining technology to obtain users' access information is therefore essential to a website's survival. Web data mining focuses mainly on text, hypertext documents, Web link structure, and Web log files. Web server log files are an important data source for the whole range of Web data mining applications: they keep a clear record of visitors' browsing behavior and reflect the browsing habits of different types of users.

This thesis surveys and summarizes the state of Web log mining research at home and abroad, and presents Web log mining techniques that use rough set theory to generate rules. Rough set theory is a mathematical tool for discovering knowledge in incomplete and uncertain systems. On the basis of rough set theory, the thesis studies in depth the discretization algorithms for continuous attributes and several rough-set-based attribute reduction algorithms. For attribute discretization, it discusses the Naive Scaler algorithm, the Semi Naive Scaler algorithm, and a discretization algorithm that combines logical operations with rough set theory. A comparison of these algorithms shows that different data sets call for different discretization algorithms, and that the choice of algorithm may lead to significant differences in the reduction results.

Rough-set-based attribute reduction is the key point of the thesis. The reduction algorithms discussed include an attribute reduction algorithm based on the discernibility matrix and logical operations, a retrospective logic discernibility matrix reduction algorithm, and an improved heuristic attribute reduction algorithm, and the thesis discusses the advantages and disadvantages of each. The retrospective logic discernibility matrix reduction algorithm and the improved heuristic attribute reduction algorithm were improved by the author. The thesis proposes applying the heuristic reduction algorithm to Web log processing. The first step is to set up a rough set model for Web log mining; the second is to obtain discrete attribute values through the discretization module; the final step is to perform attribute reduction in order to generate decision rules. Detailed flow charts are given for the discretization processing and for rule acquisition.
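The first two stages of the pipeline described above (discretization, then attribute reduction) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: `naive_cuts` is a simplified reading of the Naive Scaler idea (a cut is placed midway between neighbouring values whose decision classes differ), and `greedy_reduct` is a generic Johnson-style greedy heuristic over the discernibility matrix, not the improved heuristic algorithm the thesis proposes. The session attributes (pages viewed, seconds on site) and the decision labels are hypothetical.

```python
from bisect import bisect_right
from itertools import combinations


def naive_cuts(values, decisions):
    """Cut points for one numeric attribute (simplified Naive Scaler)."""
    by_value = {}
    for v, d in zip(values, decisions):
        by_value.setdefault(v, set()).add(d)
    ordered = sorted(by_value)
    cuts = []
    for lo, hi in zip(ordered, ordered[1:]):
        # Cut only where the two neighbouring values do not share one pure class.
        if len(by_value[lo] | by_value[hi]) > 1:
            cuts.append((lo + hi) / 2)
    return cuts


def discretize(values, cuts):
    """Map each value to the index of the interval it falls into."""
    return [bisect_right(cuts, v) for v in values]


def greedy_reduct(rows, decisions):
    """Greedy attribute subset that still discerns all differing-decision pairs."""
    n_attrs = len(rows[0])
    # Discernibility matrix entries: attributes on which two objects with
    # different decisions take different values.
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(a for a in range(n_attrs) if rows[i][a] != rows[j][a])
            if diff:
                entries.append(diff)
    reduct = []
    while entries:
        # Pick the attribute that appears in the most uncovered entries.
        best = max(range(n_attrs), key=lambda a: sum(a in e for e in entries))
        reduct.append(best)
        entries = [e for e in entries if best not in e]
    return sorted(reduct)


# Hypothetical sessions: attribute 0 = pages viewed, attribute 1 = seconds on site.
pages = [2, 3, 8, 9, 10]
secs = [30, 200, 40, 220, 210]
decision = ["leave", "leave", "buy", "buy", "buy"]

columns = [discretize(col, naive_cuts(col, decision)) for col in (pages, secs)]
rows = list(zip(*columns))
print(greedy_reduct(rows, decision))
```

On this toy table the number of pages viewed alone separates the two decision classes, so the greedy step keeps only attribute 0. Decision rules would then be read off the reduced table, a stage this sketch leaves out.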
Keywords: Data Mining, Web Log, Rough Set, Discretization, Attribute Reduction