
Research On Algorithm Of High Dimensional Outlier Detection

Posted on: 2008-09-21
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Zhou
GTID: 2178360242488984
Subject: Computer application technology
Abstract/Summary:
Outlier detection is an important area of data mining and has attracted in-depth research because of its distinctive role in knowledge discovery. Many efficient outlier detection algorithms are now widely used in financial fraud detection, network intrusion detection, ecosystem imbalance analysis, weather forecasting, and other risk-control areas. As the scope of application keeps expanding, however, traditional outlier detection algorithms have run into serious obstacles: their efficiency does not scale to large data sets; their parameters are difficult to choose, which leads to unstable results; and they cope poorly with the characteristics of high-dimensional data. This thesis studies outlier detection techniques with these problems in mind.

First, we introduce the traditional outlier detection algorithms and analyze them by comparison. Based on this analysis, we propose an outlier detection algorithm based on average density (ADOD), which is designed to make parameter selection easier for the user. To address the difficulties that high-dimensional data pose for outlier detection, we first propose a limited comparison algorithm for mining maximal frequent itemsets (LCMFI). We then use LCMFI to improve FindFPOF and propose an outlier detection algorithm based on weighted maximal frequent patterns (FindWMFPOF). The improved algorithm is time-efficient and detects outliers well.

The main work covers:
1. The existing outlier detection algorithms are introduced and analyzed. Their common deficiency is that parameter selection is not automated.
2. We propose ADOD and demonstrate its effectiveness through experiments. ADOD makes it easier for the user to select parameters and detects outliers well.
3. We analyze the characteristics of high-dimensional data and their effects on traditional outlier detection algorithms. We also analyze and compare the existing high-dimensional outlier detection algorithms and point out their lack of efficiency.
4. We propose LCMFI and introduce the relevant definitions, theorems, and provisions. These provide a theoretical basis for improving FindFPOF.
5. We propose FindWMFPOF. FindWMFPOF uses maximal frequent patterns in place of frequent patterns, which effectively reduces the scale of the data sets. Using LCMFI to mine the maximal frequent patterns gives FindWMFPOF better scalability when detecting outliers in high-dimensional data.
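
As background for the frequent-pattern approach mentioned above, the following is a minimal sketch, not the thesis's algorithms. It assumes the usual definitions: an itemset is frequent if its support reaches a minimum threshold, a frequent itemset is maximal if no proper superset is also frequent, and the frequent pattern outlier factor of a transaction (the score underlying FindFPOF) is the sum of the supports of the frequent patterns it contains, divided by the total number of frequent patterns. The brute-force miner stands in for LCMFI, whose details the abstract does not give, and the weighting used by FindWMFPOF is likewise not reproduced. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force miner (illustrative only, NOT LCMFI).

    Returns {itemset: support} for every itemset whose relative
    support is at least min_support.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                result[candidate] = support
                found = True
        if not found:
            # No frequent k-itemset implies no frequent (k+1)-itemset.
            break
    return result


def maximal_only(freq):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return {s: sup for s, sup in freq.items()
            if not any(s < other for other in freq)}


def fpof(transaction, patterns):
    """Unweighted frequent pattern outlier factor; lower means more outlier-like."""
    if not patterns:
        return 0.0
    contained = sum(sup for s, sup in patterns.items() if s <= transaction)
    return contained / len(patterns)


# Hypothetical toy data: four similar transactions and one unusual one.
transactions = [frozenset("abc"), frozenset("abc"), frozenset("ab"),
                frozenset("abc"), frozenset("xy")]
freq = frequent_itemsets(transactions, min_support=0.4)
maximal = maximal_only(freq)
scores = [fpof(t, freq) for t in transactions]
most_outlying = transactions[scores.index(min(scores))]
```

In this toy data the seven frequent itemsets collapse to a single maximal itemset, which illustrates why working with maximal patterns shrinks the pattern set that a detector has to scan.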
Keywords/Search Tags:Average Density, High-dimensional Data, Maximal Frequent Itemsets, Outlier Detection, WMFPOF