Web Log Mining Based On Rough Set

Posted on: 2008-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: B T Wang
Full Text: PDF
GTID: 2178360242475567
Subject: Computer application technology
Abstract/Summary:
Rough Set theory is a relatively new mathematical tool for dealing with vagueness and uncertainty. Its main advantage is that it can efficiently analyze and process many kinds of incomplete information and find the underlying knowledge and rules without using any preliminary or additional information. Rough Set theory has been applied successfully in many areas, including decision support, pattern recognition, process control, and machine learning, and it has received much attention from researchers all over the world. The key problems of Rough Set theory lie in data discretization and attribute reduction. It has been proved, however, that finding all reducts or the minimum reduct is an NP-hard problem, so the search for a fast reduction algorithm is a major research issue in Rough Set theory.

This thesis therefore concentrates on Rough Set theory for Data Mining, focusing especially on data discretization and attribute reduction. For attribute discretization, we mainly discuss the EqualFrequencyScaler algorithm, the EqualFrequency binning algorithm, the Naive Scaler algorithm, and the Semi-Naive Scaler algorithm. We compare these discretization algorithms on UCI data and find that using different discretization algorithms leads to large differences in the resulting attribute reduction. Reduction is the core of Rough Set theory. In this thesis we mainly discuss the following reduction algorithms: an algorithm based on the discernibility matrix and logic operations, an improved heuristic reduction method, and a genetic algorithm. These algorithms have different characteristics.

Data Mining is a new information technology that developed together with database and artificial intelligence technology. It integrates databases, artificial intelligence, machine learning, statistics, and other disciplines. Web Mining is the application of traditional Data Mining technology to the Web environment.
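To make the notion of attribute reduction above concrete, here is a minimal sketch on a toy decision table. It is an exhaustive search, not any of the algorithms studied in the thesis, and it is feasible only for very small tables because the general problem is NP-hard. The table, the attribute names `a`, `b`, `c`, `d`, and the function names are all hypothetical, and the table is assumed to be consistent (objects that agree on all condition attributes share the same decision).

```python
from itertools import combinations


def is_consistent(rows, attrs, decision):
    """True if objects indiscernible on attrs always share the same decision."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        # Remember the first decision seen for this class; any mismatch
        # means attrs cannot discern two objects with different decisions.
        if seen.setdefault(key, row[decision]) != row[decision]:
            return False
    return True


def minimum_reducts(rows, condition_attrs, decision):
    """Return all smallest attribute subsets that keep the table consistent."""
    for size in range(1, len(condition_attrs) + 1):
        found = [subset for subset in combinations(condition_attrs, size)
                 if is_consistent(rows, subset, decision)]
        if found:
            return found
    return []  # the table is inconsistent even with every attribute


# Hypothetical, already discretized decision table; attribute c duplicates a.
table = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": "yes"},
]

print(minimum_reducts(table, ["a", "b", "c"], "d"))
```

In this toy table either `a` or `c` can be dropped, but `b` cannot, so the search reports the two-attribute subsets `("a", "b")` and `("b", "c")`. Because the search works on discrete values, a different discretization of the same raw data can yield a different table and therefore different reducts.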
Web Mining has broad prospects: it uses Data Mining technology to analyze large-scale Web data and reveal hidden patterns. Web Mining can be divided into three areas: Usage Mining (Log Mining), Content Mining, and Structure Mining. This thesis focuses on Usage Mining. By mining Web logs, Usage Mining can discover Web access patterns and capture users' browsing behavior, which makes it possible to improve the structure of a website, provide personalized services to users, and further analyze the rules contained in the Web log. Usage Mining also helps to improve server performance. The data used in Web Mining include access logs, referrer logs, proxy server logs, and error logs. Web usage mining consists of three stages: preprocessing, pattern discovery, and pattern analysis. The techniques used include statistical analysis, association rules, clustering, classification, and sequential patterns.

The thesis pays particular attention to data preprocessing, which is a major component of the Web Mining process. It processes the raw Web log, which is incomplete, redundant, and inaccurate. When a user visits a website through a browser, each operation is recorded as an entry in a log file on the server. Cookies, firewalls, and proxy servers complicate data preprocessing, so it comprises data cleaning, user identification, session identification, and transaction identification. The last chapter describes in detail the entire process of applying Rough Set theory to Web Usage Mining.
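The first three preprocessing steps named above can be sketched as follows. This is an illustration under stated assumptions, not the thesis's procedure: the record layout, the list of ignored file suffixes, the use of the (IP address, user agent) pair to identify a user, and the 30-minute session timeout are all assumptions chosen for the example. Transaction identification is not covered.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed noise: embedded resources that do not represent page views.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")
# Assumed heuristic: a gap longer than this starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)


def clean(records):
    """Data cleaning: keep successful requests for pages only."""
    return [r for r in records
            if r["status"] == 200
            and not r["url"].lower().endswith(IGNORED_SUFFIXES)]


def identify_users(records):
    """User identification: treat each (IP, user agent) pair as one user."""
    users = defaultdict(list)
    for r in records:
        users[(r["ip"], r["agent"])].append(r)
    return users


def identify_sessions(user_records):
    """Session identification: split one user's requests at long time gaps."""
    sessions = []
    for r in sorted(user_records, key=lambda rec: rec["time"]):
        if sessions and r["time"] - sessions[-1][-1]["time"] <= SESSION_TIMEOUT:
            sessions[-1].append(r)
        else:
            sessions.append([r])
    return sessions


def rec(ip, agent, time, url, status):
    """Build one hypothetical, already parsed log record."""
    return {"ip": ip, "agent": agent, "url": url, "status": status,
            "time": datetime.strptime(time, "%Y-%m-%d %H:%M:%S")}


log = [
    rec("10.0.0.1", "Mozilla", "2008-01-01 10:00:00", "/index.html", 200),
    rec("10.0.0.1", "Mozilla", "2008-01-01 10:00:01", "/logo.gif", 200),
    rec("10.0.0.1", "Mozilla", "2008-01-01 10:05:00", "/products.html", 200),
    rec("10.0.0.1", "Mozilla", "2008-01-01 11:00:00", "/index.html", 200),
    rec("10.0.0.2", "Opera", "2008-01-01 10:01:00", "/missing.html", 404),
    rec("10.0.0.2", "Opera", "2008-01-01 10:02:00", "/index.html", 200),
]

for user, records in identify_users(clean(log)).items():
    paths = [[r["url"] for r in s] for s in identify_sessions(records)]
    print(user, paths)
```

Identifying users by IP address and user agent is exactly where proxy servers and firewalls cause trouble: several people behind one proxy collapse into a single "user", which is one reason preprocessing is harder than this sketch suggests.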
Keywords/Search Tags: Rough Set, Attribute reduction, Data discretization, Web Usage Mining