Research On Efficient Outlier Detection Algorithms In DRDB

Posted on: 2014-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: X L Yin
GTID: 2268330401462275
Subject: Computer application technology
Abstract/Summary:
The data mining community has noted that knowledge discovery tasks can be classified into four general categories: (1) dependency detection, (2) class identification, (3) class description, and (4) outlier/exception detection. The first three categories correspond to patterns that apply to many objects, or to a large percentage of objects, in the dataset. Much recent work in data mining, including association rules, classification, data clustering, and concept generalization, belongs to these three categories. The fourth category focuses on a very small minority of data objects, which are often discarded as noise. For many applications, such as credit card fraud detection, the monitoring of criminal activities in electronic commerce, and the analysis of performance statistics of professional athletes, identifying outliers can lead to the discovery of truly unexpected knowledge.

As traditional database technology has matured and computer networks have developed and come into wide use, most database applications are now built on networks, and distributed databases are increasingly common.

The paper applies outlier detection to distributed relational databases and proposes an improved cell-based outlier detection approach and a partitioning approach for distributed relational database systems.

First, the paper introduces the concepts of outliers and distributed relational database systems, briefly reviews the cell-based outlier detection algorithm and other existing outlier detection algorithms, and summarizes the basic features of distributed relational database systems.

Second, the paper introduces ICODA, an improved cell-based outlier detection algorithm. It has three stages: data mapping, data filtering, and final processing. In the mapping stage, each data point is mapped into a cell, and ICODA stores only the cells that are not empty. The filtering stage uses the cells constructed in the first stage to filter out non-outliers. The final stage uses a nested-loop algorithm to search for outliers. In addition, the paper proposes a parallel processing algorithm based on partitioning. The algorithm first maps each data point into a cell that contains many points, then extends each cell by radius d in both directions along every dimension. Each cell is then divided into segments, and the algorithm is run on each segment.

Finally, these algorithms are tested in a distributed relational database system. Experiments show that ICODA and DRDBDA are both more efficient than the other algorithms and reduce running time.
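The three stages described for ICODA (mapping, filtering, nested-loop search) can be illustrated with a minimal Python sketch. The abstract does not state the outlier definition or the cell geometry, so the sketch assumes the common distance-based definition (a point is an outlier if at most m other points lie within distance d) and the cell side d/(2√k) of the classic cell-based algorithm in k dimensions. The function name `cell_based_outliers` and the parameter `m` are hypothetical, and this is not the thesis's ICODA implementation.

```python
import math
from collections import defaultdict
from itertools import product


def cell_based_outliers(points, d, m):
    """Return sorted indices of points with at most m other points within distance d."""
    k = len(points[0])
    side = d / (2 * math.sqrt(k))        # cell diagonal is d / 2
    reach = math.ceil(2 * math.sqrt(k))  # layer 2 extends this many cells outwards

    # Stage 1 (mapping): assign each point to a cell; only non-empty cells are stored.
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(math.floor(x / side) for x in p)].append(i)

    def members(cell, lo, hi):
        """Point indices in stored cells whose largest per-axis cell offset is in [lo, hi]."""
        found = []
        for off in product(range(-hi, hi + 1), repeat=k):
            if max(abs(o) for o in off) >= lo:
                key = tuple(c + o for c, o in zip(cell, off))
                found.extend(cells.get(key, ()))
        return found

    outliers = []
    for cell, own in cells.items():
        # Own cell plus layer 1: every such point is within d of every point in `own`.
        near = members(cell, 0, 1)
        # Stage 2 (filtering): enough guaranteed neighbours, so no outliers in this cell.
        if len(near) - 1 > m:
            continue
        # Layer 2: points that may or may not be within d; anything further is beyond d.
        far = members(cell, 2, reach)
        if len(near) + len(far) - 1 <= m:
            outliers.extend(own)         # too few candidates: every point is an outlier
            continue
        # Stage 3 (nested loop): check the layer-2 candidates point by point.
        for i in own:
            count = len(near) - 1
            for j in far:
                if math.dist(points[i], points[j]) <= d:
                    count += 1
                    if count > m:
                        break
            if count <= m:
                outliers.append(i)
    return sorted(outliers)


sample = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(cell_based_outliers(sample, d=1.0, m=0))  # [4]
```

The dictionary keeps memory proportional to the number of occupied cells rather than to the full grid, which matches the abstract's point that only non-empty cells are saved. Enumerating neighbouring cells by offset grows quickly with the number of dimensions, so this form is only practical for low-dimensional data.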
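The partitioning idea (extend each region by d in every dimension, then process the regions independently) can be sketched in the same way. This is an illustration under assumptions, not the thesis's algorithm: the names `partition_with_halo` and `partitioned_outliers` and the parameter `block` (the width of a partition) are hypothetical, the outlier definition is again "at most m other points within distance d", and a brute-force nested loop stands in for whatever detector runs on each partition.

```python
import math
from collections import defaultdict
from itertools import product


def partition_with_halo(points, block, d):
    """Split points into blocks of width `block`, each extended by d on every side.

    Each unit lists its own points ("core") and the points from other blocks that
    fall inside its extended box ("halo").
    """
    units = defaultdict(lambda: {"core": [], "halo": []})
    for i, p in enumerate(points):
        home = tuple(math.floor(x / block) for x in p)
        units[home]["core"].append(i)
        # Blocks whose box, grown by d on each axis, contains this point.
        lo = [math.floor((x - d) / block) for x in p]
        hi = [math.floor((x + d) / block) for x in p]
        for b in product(*(range(a, z + 1) for a, z in zip(lo, hi))):
            if b != home:
                units[b]["halo"].append(i)
    return units


def partitioned_outliers(points, d, m, block):
    """Return sorted indices of points with at most m other points within distance d."""
    outliers = []
    # Units share no state, so each one could be processed at a different site.
    for unit in partition_with_halo(points, block, d).values():
        local = unit["core"] + unit["halo"]
        for i in unit["core"]:           # halo points are judged by their home block
            neighbours = sum(
                1 for j in local
                if j != i and math.dist(points[i], points[j]) <= d
            )
            if neighbours <= m:
                outliers.append(i)
    return sorted(outliers)


sample = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(partitioned_outliers(sample, d=1.0, m=0, block=2.0))  # [4]
```

Because any point within distance d of a core point differs from it by at most d on each axis, the halo contains every neighbour that matters, so each unit can reach its decision without consulting the others. The cost is that points near partition borders are copied into more than one unit.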
Keywords/Search Tags: outlier, distributed relational database, cell, partition