Outlier Detection For High Dimensional Data Sets Based On Projection

Posted on:2008-10-31

Degree:Master

Type:Thesis

Country:China

Candidate:J Y Dai

GTID:2178360215490900

Subject:Computer system architecture

Abstract/Summary:

Data mining is the process of extracting implicit, previously unknown and useful knowledge from large amounts of data. Outlier detection is one of its important branches: it discovers small patterns in which interesting information may be hidden. It is worth studying in many applications, such as credit card fraud detection, disaster warnings in weather forecasting and network intrusion detection.

In practice, most of the data we face is high dimensional, for example business transaction data and document index data, so handling high dimensional data is an important research topic in data mining. High dimensional data, however, has some special characteristics. As the number of dimensions grows, the efficiency of high dimensional indexes becomes steadily worse. In addition, because of the sparsity of data in high dimensional space, similarity measured by Lp-distance loses its meaning. These characteristics make data mining on high dimensional data difficult.

Many conventional clustering algorithms can detect outliers, but only as a by-product. In recent years a few dedicated outlier detection algorithms have appeared, but most of them focus on low dimensional data. Some data sets are inherently high dimensional; for these, the existing algorithms have many defects, and the interpretation of the detected outliers clearly lags behind.

In this thesis, focusing on the shortcomings of the conventional algorithms, we study outlier detection techniques in depth and point out their defects when applied to high dimensional data. We then present a new outlier detection algorithm based on the concepts of projection and frequent items. The algorithm copes well with the sparsity of high dimensional data, extends from purely numeric attributes to mixed attribute types, and gives a reasonable interpretation of each outlier, which helps to distinguish outliers from noise. Experiments show that the algorithm is feasible.

The thesis thus presents a new approach to outlier detection for high dimensional data and makes a preliminary exploration of the problem of interpreting outliers. Both are meaningful for outlier detection research and offer significant advantages in applications.
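
The abstract does not describe the algorithm in detail. As a rough illustration only of how frequent items over low dimensional projections can produce an outlier score, the Python sketch below treats each (dimension, value) pair as an item, counts itemsets of up to max_size dimensions, and scores a record by the supported frequent itemsets it contains. The function names, the brute-force counting, the min_support threshold and the assumption that numeric attributes have already been discretized into categories are all hypothetical choices made for this sketch, not the method of the thesis.

```python
from collections import Counter
from itertools import combinations


def to_items(record):
    """Map a record (tuple of categorical values) to a set of (dimension, value) items."""
    return frozenset(enumerate(record))


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force count of every itemset up to max_size items.

    Each itemset corresponds to a projection onto at most max_size dimensions.
    Returns {itemset: support} for itemsets whose support >= min_support.
    """
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(t)  # items sort by their unique dimension index
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def outlier_scores(records, min_support=0.3, max_size=2):
    """Score each record by the total support of the frequent itemsets it contains,
    divided by the number of frequent itemsets. Lower scores suggest outliers."""
    transactions = [to_items(r) for r in records]
    freq = frequent_itemsets(transactions, min_support, max_size)
    total = len(freq)
    scores = []
    for t in transactions:
        covered = sum(sup for s, sup in freq.items() if s <= t)
        scores.append(covered / total if total else 0.0)
    return scores, freq


def missing_patterns(record, freq):
    """Frequent itemsets the record does not contain: a crude explanation
    of why the record looks unusual (hypothetical helper)."""
    t = to_items(record)
    return [s for s in freq if not s <= t]


# Toy data set with two categorical dimensions (made up for illustration).
records = [("a", "x"), ("a", "x"), ("a", "x"), ("a", "x"), ("b", "y")]
scores, freq = outlier_scores(records, min_support=0.3, max_size=2)
suspect = scores.index(min(scores))  # index of the lowest-scoring record
```

Because the score is built from low dimensional itemsets rather than full-space distances, it does not depend on Lp-distance, and the list returned by missing_patterns gives a simple, readable reason for each low score. A practical implementation would replace the brute-force counting with a proper frequent itemset miner.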

Keywords/Search Tags:

Data Mining, Outlier Detection, Projection, Frequent Item

Related items

1	Research On Mining Algorithms Of Maximal Frequent Item Sets
2	Application Of Frequent-pattern-based Outlier Mining In Intrusion Detection
3	Study On Mining Algorithm Of Target Frequent Itemsets And Application
4	Search Of Algorithms For Mining Maximum Frequent Item-sets
5	Research On Frequent Item Mining And Correlation Analysis In Data Streams
6	Improvement Of Frequent 1-Item Set Generation Method And Experimental Study
7	Mining Of Maximal Frequent Item Sets Based On AFOPT
8	Research On And Implementation Of Frequent Item Set Mining System In Data Stream
9	Frequent Itemsets Mining Algorithm And Its Application In Data Flow
10	Research On The Algorithm For Mining Frequent Items From Data Streams