Research On Algorithm Of High Dimensional Outlier Detection

Posted on:2008-09-21

Degree:Master

Type:Thesis

Country:China

Candidate:S Y Zhou

GTID:2178360242488984

Subject:Computer application technology

Abstract/Summary:

Outlier detection is an important branch of data mining and has attracted increasingly in-depth research because of its distinctive role in knowledge discovery. Many efficient outlier detection algorithms already exist and are widely used in financial fraud detection, network intrusion detection, ecosystem imbalance detection, weather forecasting, and other risk control areas. As the scope of application keeps expanding, however, traditional outlier detection algorithms have run into serious obstacles: their efficiency does not scale to large data sets; their parameters are difficult to choose, which leads to unstable results; and they cannot cope with the characteristics of high-dimensional data. This thesis studies outlier detection techniques with these problems in mind.

First, we introduce the traditional outlier detection algorithms and analyze them by comparison. On the basis of this analysis, we propose an outlier detection algorithm based on average density (ADOD). ADOD is designed to make it easier for the user to select parameters. To address the difficulties that high-dimensional data pose for outlier detection, we first propose a limited comparison algorithm for mining maximal frequent itemsets (LCMFI). We then use LCMFI to improve FindFPOF and propose an outlier detection algorithm based on weighted maximal frequent patterns (FindWMFPOF). The improved algorithm is time-efficient and detects outliers well.

The main work covers:

1. The existing outlier detection algorithms are introduced and analyzed. Their common deficiency is the lack of automation in parameter selection.
2. We propose ADOD and demonstrate its effectiveness by experiments. ADOD makes parameter selection easier for the user and detects outliers well.
3. The characteristics of high-dimensional data and their effects on the traditional outlier detection algorithms are analyzed. We also analyze and compare the existing high-dimensional outlier detection algorithms and point out their lack of efficiency.
4. We propose LCMFI and introduce the relevant definitions, theorems, and provisions. These provide a theoretical basis for improving FindFPOF.
5. We propose FindWMFPOF. FindWMFPOF uses maximal frequent patterns in place of frequent patterns, which effectively reduces the scale of the data sets. Using LCMFI to mine the maximal frequent patterns gives FindWMFPOF better scalability when detecting outliers in high-dimensional data.
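The relationship between frequent patterns, maximal frequent patterns, and a frequent-pattern outlier score can be illustrated with a small sketch. It assumes the commonly cited definition of the frequent pattern outlier factor (the summed support of the frequent patterns contained in a transaction, divided by the total number of frequent patterns). The abstract does not describe LCMFI or the weighting used by FindWMFPOF, so neither is reproduced here: the brute-force miner and the function names (`frequent_itemsets`, `maximal_only`, `fpof`) are hypothetical stand-ins chosen for illustration only.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force miner (illustrative only, not LCMFI).

    Returns {itemset: relative support} for every itemset whose
    support is at least min_support.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(cset <= t for t in transactions) / n
            if support >= min_support:
                result[cset] = support
                found = True
        if not found:
            # Support is anti-monotone: if no itemset of this size is
            # frequent, no larger itemset can be frequent either.
            break
    return result


def maximal_only(patterns):
    """Keep only patterns that have no frequent proper superset."""
    return {p: s for p, s in patterns.items()
            if not any(p < q for q in patterns)}


def fpof(transaction, patterns):
    """Assumed FPOF definition: summed support of contained patterns,
    divided by the number of patterns. Lower scores suggest outliers."""
    if not patterns:
        return 0.0
    contained = sum(s for p, s in patterns.items() if p <= transaction)
    return contained / len(patterns)


# Toy data (made up for this sketch): four similar records and one odd one.
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("abc"),
    frozenset("de"),
]

freq = frequent_itemsets(transactions, min_support=0.5)
maximal = maximal_only(freq)
scores = [fpof(t, freq) for t in transactions]

print(len(freq), "frequent patterns,", len(maximal), "maximal")
print("lowest-scoring transaction:", sorted(transactions[scores.index(min(scores))]))
```

On this toy data the maximal set is much smaller than the full frequent set, which is the reduction the abstract attributes to switching from frequent patterns to maximal frequent patterns. Scoring against the maximal patterns alone discards the support of their subsets, which is presumably where a weighting scheme comes in; the abstract does not say how the thesis defines it.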

Keywords/Search Tags:

Average Density, High-dimensional Data, Maximal Frequent Itemsets, Outlier Detection, WMFPOF

Related items

1	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application In Simulation System
2	Research On Algorithms For Mining Maximal Frequent Itemsets
3	The Research And Implementation Of Mining Frequent Itemsets Algorithm Over Streaming Data
4	FP-Tree Based Mining Frequent Itemsets Over Data Streams
5	Research On Mining Algorithms Of Maximal Frequent Itemsets And Opened Frequent Itemsets
6	Research On Algorithm For Mining Maximal Frequent Itemsets Over Data Streams
7	Research On Mining Algorithm Of Maximal Frequent Itemsets
8	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application
9	An Algorithm And Context Analysis Of Mining Frequent Closet Itemsets
10	An Algorithm Of Maximal Frequent Itemsets Mining Based On Dynamic Reordering