Outliers Detection And Its Application Research

Posted on:2014-01-21

Degree:Master

Type:Thesis

Country:China

Candidate:F P Yang

Full Text:PDF

GTID:2248330398958032

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of computer science, the growing popularity of networks, and advances in data collection and storage technology, large volumes of complex data have produced not only "data tombs" but also the "curse of dimensionality". The question is how to extract valuable information from such data tombs, eliminate noise, and discover potentially meaningful knowledge. Outlier detection in data mining emerged in response. For complex, high-dimensional, massive data, outlier detection can address data noise and effectively mine potentially valuable information, so it has considerable practical significance and broad application prospects.

Researchers at home and abroad have proposed many algorithms for outlier detection in data mining, laying an important foundation for subsequent research. However, existing work still has shortcomings:
(1) With the growth of computing and large databases, traditional clustering methods cannot operate effectively in high-dimensional data spaces or find local anomalies.
(2) Because they are limited by a global threshold, existing distance-based outlier detection methods can only mine global outliers and have difficulty mining local outliers.
(3) As measurement and acquisition equipment improves and multiplies, data sources and dimensionality have increased dramatically, with some data reaching hundreds or thousands of dimensions. This makes outlier detection on complex, high-dimensional, massive data difficult to solve effectively.

Against the background of complex, high-dimensional, massive network data, and in response to the problems above, the main work and contributions of this thesis are as follows:

1. A two-stage outlier detection algorithm based on clustering division is proposed for mining local outliers. Compared with traditional detection algorithms, it improves efficiency and offers a new approach to local outlier mining. First, the value of k required by k-means is obtained iteratively using agglomerative hierarchical clustering. Then the data set is divided into a number of micro-clusters by k-means. To improve mining efficiency, a cluster filtering mechanism based on information entropy is proposed to determine whether a micro-cluster contains outliers. Finally, a distance-based approach is used to detect local outliers within the micro-clusters that contain outliers. Experimental results show that the proposed algorithm has high efficiency, high precision, and low time complexity.

2. An outlier detection algorithm based on attribute reduction, AROD, is proposed for high-dimensional data, effectively solving the outlier mining problem for massive, high-dimensional data. AROD uses entropy to partition and reduce the data attributes, keeping the important attributes that reflect the information in the data, and then combines those attributes and their weights (contribution degrees) with a weighted distance formula to detect outliers. Experimental results show that the improved algorithm achieves higher accuracy or efficiency on multi-dimensional data than existing algorithms.
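
The abstract does not give the formulas behind AROD, so the following Python sketch only illustrates the general idea of entropy-based attribute reduction followed by weighted-distance outlier scoring. Everything concrete in it is an assumption made for illustration rather than the thesis's method: the equal-width binning, the rule "keep the highest-entropy attributes", the use of normalised entropy as the attribute weight, the k-nearest-neighbour score, and all function and parameter names.

```python
import math
from collections import Counter


def attribute_entropy(values, bins=4):
    """Shannon entropy (bits) of one numeric attribute after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant attribute carries no information
    width = (hi - lo) / bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_weights(rows, keep=2):
    """Assumed reduction rule: keep the `keep` highest-entropy attributes
    and normalise their entropies into weights that sum to 1."""
    columns = list(zip(*rows))
    entropies = [attribute_entropy(col) for col in columns]
    kept = sorted(range(len(columns)), key=lambda j: entropies[j], reverse=True)[:keep]
    total = sum(entropies[j] for j in kept)
    if total == 0:
        return {j: 1.0 / len(kept) for j in kept}  # fall back to equal weights
    return {j: entropies[j] / total for j in kept}


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance over the retained attributes only."""
    return math.sqrt(sum(w * (a[j] - b[j]) ** 2 for j, w in weights.items()))


def outlier_scores(rows, weights, k=2):
    """Assumed score: mean weighted distance to the k nearest neighbours."""
    scores = []
    for i, a in enumerate(rows):
        dists = sorted(
            weighted_distance(a, b, weights) for m, b in enumerate(rows) if m != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores


# Toy data: the third attribute is constant, the last row lies far from the rest.
rows = [
    (1.0, 2.0, 5.0),
    (1.2, 1.9, 5.0),
    (0.9, 2.1, 5.0),
    (1.1, 2.2, 5.0),
    (1.0, 1.8, 5.0),
    (6.0, 7.0, 5.0),
]
weights = entropy_weights(rows, keep=2)
scores = outlier_scores(rows, weights, k=2)
most_suspicious = scores.index(max(scores))
```

On this toy data the constant third attribute has zero entropy and is dropped, and the last row receives the largest score. A real implementation would need the thesis's actual definitions of attribute importance and contribution degree, which may differ from the normalised-entropy weighting assumed here.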

Keywords/Search Tags:

outlier detection, information entropy, micro-clustering, filtering mechanism, attribute reduction

Related items

1	Research On Outlier Detection Based On Density Difference
2	Based On Information Entropy And The Subspace Outlier Mining Algorithm
3	Improvement And Research Of Attribute Reduction Algorithm Based On Information Entropy
4	Research And Application Of Outlier Detection Algorithm
5	Research And Implementation On Outlier Detection Method Based On SOFM Clustering Algorithm
6	Study Of Outlier Detecting Algorithm Based On Natural Nearest Neighbor And Weighted Attribute Entropy
7	The Research On Several Methods Of Attribute Reduction In Information Systems
8	Research On Attribute Reduction Methods Based On Information Entropy
9	Study On Attribute Reduction Criteria And Information Loss Of Attribute Reduction Based On Rough Sets
10	Study On Algorithms For Fast Outlier Detection