Study On Density-Based Outlier Mining Algorithm

Posted on: 2008-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: G X Cui
Full Text: PDF
GTID: 2178360215490240
Subject: Computer software and theory
Abstract/Summary:
Data mining is the process of discovering potentially useful knowledge in large quantities of data. Its main purpose is to extract hidden, previously unknown, and potentially useful information and knowledge from large volumes of incomplete and noisy application data. Outliers are data that deviate markedly from the rest of the data: they do not conform to the common patterns or behaviors and are inconsistent with the other existing data. At present, outlier mining is applied in many areas, such as telecommunications, finance, weather forecasting, the stock market, and intrusion detection. Outlier mining consists of two parts: outlier detection and outlier analysis. Outlier analysis depends on background knowledge. This dissertation addresses the most pivotal problem, outlier detection.

In this dissertation, the advantages of different outlier mining algorithms are studied, an improved density-based outlier mining algorithm is proposed, and the algorithm is applied to network intrusion detection. Specifically, the research covers the following aspects:

① The process and current status of outlier mining, the meaning of outlier mining, and the relationship between outlier mining and the data warehouse are examined. By analyzing the general process of knowledge discovery, a typical overall framework for an outlier mining system is presented, the functions of each module are analyzed, and the data mining techniques applied in it are described in detail.

② The existing outlier mining algorithms are studied comprehensively, and the advantages, disadvantages, and scope of application of the commonly used algorithms are analyzed.

③ Building on the existing density-based outlier mining algorithms and on the DBSCAN and CURE algorithms, a new, improved density-based outlier mining algorithm is proposed, and the differences between the improved algorithm and the original algorithms are validated experimentally.

④ The improved algorithm is applied to network intrusion detection, and its detection rate and false alarm rate are assessed.

⑤ Future development directions for outlier mining techniques are discussed.

The algorithm is simple in theory and proves effective in experiments. Its performance is evaluated experimentally on two datasets: the first was used with the OPTICS (Ordering Points To Identify the Clustering Structure) algorithm put forward by Jörg Sander and colleagues, and the second is the KDD CUP 1999 dataset. The experimental results indicate that the algorithm is effective on both datasets.
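For readers unfamiliar with the terms used above, the following is a minimal illustrative sketch of density-based outlier flagging in the style of DBSCAN noise points, together with the detection rate and false alarm rate used to assess an intrusion detector. It is not the improved algorithm proposed in the dissertation, whose details are not given in this abstract. The function names, the parameters `eps` and `min_pts`, and the toy data are all assumptions made for illustration.

```python
import math


def density_outliers(points, eps, min_pts):
    """Flag DBSCAN-style noise: points that are not core points and
    have no core point within distance eps (illustrative only)."""
    n = len(points)
    # Indices of all points within eps of each point (the point itself included).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # A core point has at least min_pts points in its eps-neighborhood.
    core = [len(neighbors[i]) >= min_pts for i in range(n)]
    return [
        not core[i] and not any(core[j] for j in neighbors[i])
        for i in range(n)
    ]


def detection_and_false_alarm_rates(flagged, is_attack):
    """Detection rate = flagged attacks / all attacks;
    false alarm rate = flagged normal records / all normal records."""
    true_pos = sum(f and a for f, a in zip(flagged, is_attack))
    false_pos = sum(f and not a for f, a in zip(flagged, is_attack))
    attacks = sum(is_attack)
    normals = len(is_attack) - attacks
    detection_rate = true_pos / attacks if attacks else 0.0
    false_alarm_rate = false_pos / normals if normals else 0.0
    return detection_rate, false_alarm_rate


# Hypothetical toy data: a tight cluster of four points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
flagged = density_outliers(points, eps=0.3, min_pts=3)
```

On this toy data only the distant point is flagged, because it has no neighbors within `eps`. The pairwise distance computation is quadratic in the number of points, so a sketch like this is suitable only for small examples.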
Keywords/Search Tags: Data Mining, Outliers, Density, Partition