
Outlier Detection for High-Dimensional Data Sets Based on Projection

Posted on: 2008-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Dai
Full Text: PDF
GTID: 2178360215490900
Subject: Computer system architecture
Abstract/Summary:
Data mining is the process of extracting implicit, previously unknown, and useful knowledge from large amounts of data. Outlier detection is an important branch of data mining: it uncovers small patterns in which interesting information may be hidden. It is worth studying for many applications, such as credit card fraud detection, disaster warnings in weather forecasting, and network intrusion detection.

In practice, most of the data we encounter is high dimensional, for example business transaction data and document index data, so handling high-dimensional data is an important research topic in data mining. High-dimensional data, however, has some special characteristics. As the number of dimensions grows, high-dimensional indexes become less and less efficient. In addition, because high-dimensional space is sparse (the curse of dimensionality), similarity measures based on the Lp-distance lose their meaning. These characteristics make data mining on high-dimensional data difficult.

Many conventional clustering algorithms can detect outliers, but only as a by-product. In recent years a few dedicated outlier detection algorithms have appeared, but most of them focus on low-dimensional data. Some data sets are inherently high dimensional; on such data these algorithms have many shortcomings, and their ability to interpret the outliers they find clearly lags behind.

In this thesis, focusing on the shortcomings of conventional algorithms, we study outlier detection techniques in depth, point out their deficiencies when applied to high-dimensional data, and finally present a new outlier detection algorithm based on the concepts of projection and frequent items.
The algorithm copes well with the sparsity of high-dimensional data, extends from purely numeric attributes to mixed attribute types, and gives a reasonable interpretation of each outlier, which helps distinguish outliers from noise. Experiments show that the algorithm is feasible.

In summary, this thesis presents a new approach to outlier detection for high-dimensional data and takes a first look at the problem of interpreting outliers; both are meaningful for outlier detection research and offer significant advantages in applications.
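
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea of combining projections with frequent items, not the thesis's actual method. It assumes all attributes are categorical (numeric attributes would first need to be discretized), projects the records onto every small subset of attributes, and counts a projection against a record when its value combination is infrequent. The function name `projection_outlier_scores`, the parameters `subspace_size` and `min_support`, and the toy records are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def projection_outlier_scores(records, subspace_size=2, min_support=0.2):
    """Score records by how many low-dimensional projections are infrequent.

    Illustrative only: this is an assumed scoring scheme, not the
    algorithm from the thesis.
    """
    n = len(records)
    n_attrs = len(records[0])
    scores = [0] * n
    reasons = [[] for _ in range(n)]

    for dims in combinations(range(n_attrs), subspace_size):
        # Project every record onto the chosen attributes.
        projected = [tuple(rec[d] for d in dims) for rec in records]
        counts = Counter(projected)
        for i, key in enumerate(projected):
            # A value combination is "infrequent" if its support is
            # below the (hypothetical) min_support threshold.
            if counts[key] / n < min_support:
                scores[i] += 1
                # Keep the offending projection as an explanation.
                reasons[i].append((dims, key))
    return scores, reasons


# Toy data (made up): protocol, service, traffic volume.
records = (
    [("tcp", "http", "low")] * 4
    + [("udp", "dns", "low")] * 4
    + [("tcp", "dns", "high")]
)
scores, reasons = projection_outlier_scores(records)

# Report the highest-scoring record together with its explanation.
top = max(range(len(records)), key=lambda i: scores[i])
print(records[top], scores[top], reasons[top])
```

Recording the infrequent projections alongside the score is one simple way to attach an interpretation to each flagged record, in the spirit of the interpretability goal stated above. Enumerating all attribute subsets grows combinatorially, so a practical version would have to restrict which projections are examined.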
Keywords/Search Tags: Data Mining, Outlier Detection, Projection, Frequent Item