Research On Efficient Outlier Detection Algorithms In DRDB

Posted on:2014-02-26

Degree:Master

Type:Thesis

Country:China

Candidate:X L Yin

Full Text:PDF

GTID:2268330401462275

Subject:Computer application technology

Abstract/Summary:

The data mining community has noted that knowledge discovery tasks can be classified into four general categories: (1) dependency detection, (2) class identification, (3) class description, and (4) outlier/exception detection. The first three categories correspond to patterns that apply to many objects, or to a large percentage of objects, in the dataset. Recent work in data mining, including association rules, classification, data clustering, and concept generalization, belongs to these three categories. The fourth category focuses on a very small minority of data objects, which are often discarded as noise. In many applications, such as credit card fraud detection, the monitoring of criminal activities in electronic commerce, and the analysis of performance statistics of professional athletes, identifying outliers can lead to the discovery of truly unexpected knowledge.

As traditional database technology has matured and computer networks have developed and spread, most database applications are now built on top of networks, and distributed databases are becoming increasingly common.

This thesis applies outlier detection to distributed relational databases (DRDBs). It proposes an improved cell-based outlier detection approach and a division (partitioning) approach for distributed relational database systems.

First, the thesis introduces the concepts of outliers and of distributed relational database systems, briefly reviews the cell-based outlier detection algorithm and other existing outlier detection algorithms, and summarizes the basic features of distributed relational database systems.

Second, the thesis presents ICODA, an improved cell-based outlier detection algorithm with three stages: data mapping, data filtering, and final processing. In the mapping stage, each data point is mapped to a cell, and ICODA stores only the non-empty cells. The filtering stage uses the cells constructed in the first stage to filter out non-outliers.
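The abstract does not spell out ICODA's cell size or pruning rules. The Python sketch below illustrates the mapping and filtering stages under an assumption borrowed from the classic cell-based DB(p, D) scheme: a point is an outlier if at most M points (itself included) lie within distance D, cells are hypercubes of side D/(2√k) for k-dimensional data, and only non-empty cells are stored in a dictionary. The function names (`cell_side`, `map_to_cells`, `filter_cells`) are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

# Hypothetical sketch: cell size and pruning rules follow the classic
# cell-based DB(p, D) scheme, not necessarily ICODA's exact rules.


def cell_side(D, k):
    # Side D / (2*sqrt(k)): points in the same or in adjacent cells are < D apart.
    return D / (2 * math.sqrt(k))


def map_to_cells(points, D):
    """Mapping stage: map each point to an integer cell key; keep only non-empty cells."""
    k = len(points[0])
    side = cell_side(D, k)
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(x / side) for x in p)
        cells[key].append(idx)
    return dict(cells)


def _ring(k, radius, inner):
    # Cell offsets whose largest absolute component lies in (inner, radius].
    return [off for off in product(range(-radius, radius + 1), repeat=k)
            if inner < max(abs(c) for c in off) <= radius]


def filter_cells(cells, k, M):
    """Filtering stage: label each non-empty cell 'inlier', 'outlier' or 'undecided'.

    M is the largest number of points (the point itself included) that an
    outlier may have within distance D.
    """
    layer1 = _ring(k, 1, 0)                          # adjacent cells
    layer2 = _ring(k, math.ceil(2 * math.sqrt(k)), 1)  # cells that may still hold neighbours

    def count(key, offsets):
        return sum(len(cells.get(tuple(a + b for a, b in zip(key, off)), ()))
                   for off in offsets)

    labels = {}
    for key, members in cells.items():
        near = len(members) + count(key, layer1)   # all of these are within D
        if near > M:
            labels[key] = "inlier"
        elif near + count(key, layer2) <= M:       # no other point can be within D
            labels[key] = "outlier"
        else:
            labels[key] = "undecided"              # left for the nested-loop stage
    return labels
```

Only cells labelled `undecided` still need point-by-point distance checks. The number of neighbour offsets grows as (2·⌈2√k⌉ + 1)^k, so this style of cell pruning suits low-dimensional data.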
The final stage of ICODA uses a nested-loop algorithm to search for outliers.

In addition, the thesis proposes a parallel processing algorithm based on data division. The algorithm first maps each data point into a cell (a cell may contain many points), then extends each cell by radius d in both directions along every dimension. Next, it assigns the cells to segments and runs the algorithm on each segment.

Finally, these algorithms are tested on a distributed relational database system. Experiments show that both ICODA and DRDBDA are more efficient than the compared algorithms and reduce running time.
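The nested-loop stage and the division-based parallel scheme can be sketched together as follows. This is a minimal Python illustration under stated assumptions, not the thesis's DRDBDA: segments form an even grid over the data's bounding box (`n_splits` per dimension), the extension radius d is taken to equal the outlier distance D, and each segment's own points are checked by a nested loop against every point inside the segment's box widened by D in both directions along each dimension. Because any neighbour within D of a point must fall inside that widened box, segments can be processed independently, for example on different database nodes; here they simply run one after another. All names are hypothetical.

```python
import math


def nested_loop_outliers(core, context, D, M):
    """Nested-loop stage (hypothetical): a core point is an outlier if at most M
    points of `context` (which contains the point itself) lie within distance D."""
    found = []
    for p in core:
        count = 0
        for q in context:
            if math.dist(p, q) <= D:
                count += 1
                if count > M:      # early exit: p already has too many neighbours
                    break
        if count <= M:
            found.append(p)
    return found


def partitioned_outliers(points, D, M, n_splits):
    """Division sketch (hypothetical): grid the bounding box into n_splits segments
    per dimension, widen each segment by D (standing in for radius d) on both sides
    of every dimension, and run the nested loop per segment."""
    k = len(points[0])
    lo = [min(p[i] for p in points) for i in range(k)]
    hi = [max(p[i] for p in points) for i in range(k)]
    width = [(hi[i] - lo[i]) / n_splits or 1.0 for i in range(k)]  # avoid zero width

    def segment_of(p):
        return tuple(min(int((p[i] - lo[i]) / width[i]), n_splits - 1) for i in range(k))

    # Each segment "owns" the points that fall inside it.
    owned = {}
    for p in points:
        owned.setdefault(segment_of(p), []).append(p)

    outliers = []
    for core in owned.values():
        # Widen the box spanned by this segment's own points by D in both
        # directions; every neighbour within D of a core point lies inside it.
        box_lo = [min(p[i] for p in core) - D for i in range(k)]
        box_hi = [max(p[i] for p in core) + D for i in range(k)]
        context = [q for q in points
                   if all(box_lo[i] <= q[i] <= box_hi[i] for i in range(k))]
        outliers.extend(nested_loop_outliers(core, context, D, M))
    return outliers
```

Running with `n_splits=1` degenerates to a single nested loop over the whole dataset, which is a convenient way to check that partitioning does not change the result.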

Keywords/Search Tags:

outlier, distributed relational database, cell, partition

Related items

1	Auto-sharding Technique And Algorithm For Distributed Relation Database Based On SQL History
2	Research On Data Partition Optimization Method Of Shared-Nothing Relational In-Memory Database
3	Research On Virtual Partition Strategies Of A Shared Storage Distributed Database
4	The Design And Implementation Of Distributed Relational Database Based On KVM Cloud Computing Platform
5	Design And Implementation Of Integrated Query Middleware About Relational Database And Non-Relational Database For Structure Safety Monitoring
6	Research On Migration Algorithm From Traditional Relational Database To Non Relational Database
7	Design And Realization Of Storage Subsystem Of Distributed Relational Database
8	Research On Privacy Protection Mechanism Of Non-relational Database For SaaS
9	Research On Application Of Relational And Non-Relational Database
10	Design And Implementation Of Critical Technologies In Distributed Graph Database