Research Of Outlier Testing Methods In High-Dimensional Dataspace

Posted on:2006-03-14

Degree:Master

Type:Thesis

Country:China

Candidate:W Li

Full Text:PDF

GTID:2168360155952982

Subject:Computer software and theory

Abstract/Summary:

With the development of computers and networks, data plays an increasingly important role in many fields. Data analysis and data testing in particular have become a research subject within information technology: data mining. Most data mining research concentrates on mining normal patterns, and research on abnormal patterns is relatively scarce. However, the applications of outlier testing in e-business fraud detection, network intrusion detection, data cleaning and similar tasks cannot be ignored. Several approaches to outlier testing have been proposed in recent years, including distance-based, density-based, depth-based, deviation-based, statistics-based, and pattern- or expert-system-based methods. Their common disadvantage is that they are not suitable for high-dimensional data spaces or for very large datasets. I therefore put forward two methods that efficiently detect outliers in large, high-dimensional datasets. The first method is the Weighted Hypergraph-based Outlier Test (WHOT). It is based on weighted association rule mining, using a weighted support and significance framework, and on multilevel hypergraph partitioning algorithms. The main idea of the algorithm is as follows. The traditional association rule mining model assumes that all items have the same significance, without taking into account their weights or attributes within a transaction or within the whole item space, but this is not always the case. I therefore design an association rule mining algorithm, WApriori, built on a definition of weighted support. The efficiency of the algorithm in time and space is satisfactory. A hypergraph G=(V,E) is then constructed from the significant association rules mined from the dataset by the WApriori algorithm, where V is the vertex set corresponding to the record set of the dataset and E is the hyperedge set corresponding to the significant itemsets.
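The construction described above can be illustrated with a minimal Python sketch. It is not the thesis's WApriori algorithm: the abstract does not state how weighted support is defined, so the sketch assumes one simple definition (the mean weight of the items in an itemset multiplied by the fraction of records that contain it) and enumerates candidate itemsets by brute force instead of using WApriori's candidate generation. The function names weighted_support, significant_itemsets and build_hyperedges, and the max_size parameter, are hypothetical.

```python
from itertools import combinations


def weighted_support(itemset, transactions, weights):
    """Assumed definition: mean item weight times relative frequency."""
    itemset = frozenset(itemset)
    if not itemset or not transactions:
        return 0.0
    # Number of records that contain every item of the itemset.
    count = sum(1 for t in transactions if itemset <= t)
    mean_weight = sum(weights[i] for i in itemset) / len(itemset)
    return mean_weight * count / len(transactions)


def significant_itemsets(transactions, weights, min_wsupport, max_size=2):
    """Brute-force stand-in for WApriori: keep itemsets whose weighted
    support reaches the threshold (no candidate pruning is attempted)."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            ws = weighted_support(combo, transactions, weights)
            if ws >= min_wsupport:
                result[frozenset(combo)] = ws
    return result


def build_hyperedges(transactions, itemsets):
    """Vertices are record indices; each significant itemset yields one
    hyperedge joining all records that contain it."""
    return {
        s: frozenset(i for i, t in enumerate(transactions) if s <= t)
        for s in itemsets
    }
```

Under these assumptions, a record that contains no significant itemset belongs to no hyperedge, which gives an intuition for why such records are natural outlier candidates. The window, hyperedge weight, and the WS, WB and WD measures of the thesis are not modeled here.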
I define the notion of a Window in the hypergraph and the way to calculate the weight of a hyperedge. The cluster set C is then obtained by applying the multilevel hypergraph partitioning algorithm hMETIS to the weighted hypergraph. After that, I define the measures that decide whether a vertex in the hypergraph (that is, a record in the dataset) is an outlier: the weighted support of a vertex to a window (WS), the weighted belongingness of a vertex to a cluster (WB), and the deviation of size of a vertex to a window (WD). The user-given thresholds for these measures are WSt, WBt and WDt, and if the measures of a vertex in a window have the relation that WSWBt and WD

Keywords/Search Tags:

Data mining, Clustering, Outlier, High dimension, Hypergraph-based Partitioning, Pattern-based Clustering

Related items

1	Research Of Outlier Mining Algorithms Based On Space Partitioning In High-dimension
2	Research On Outlier Data Mining In High Dimensional Space
3	Study On Space Partitioning-based Optimized Clustering Algorithms And Related Techniques
4	Study On Outlier Mining Algorithms Based On Clustering
5	The Researches On Related To Key Technologies Among Clustering Based On High-dimensional Data Space
6	Study Of Data Mining Algorithms Based On Rough Set And Clustering And Application In Anti-Money Laundering
7	Research On Intrusion Detection Based On Clustering And Outlier Detection
8	Research And Implementation Of Clustering Algorithm For Multidimensional Data Sets
9	A Study Of The Pattern-Based Clustering Theories
10	Research On Subspace Clustering Algorithm Based On Sparse, Adaptive And Hypergraph