Outliers Detection And Its Application Research

Posted on: 2014-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: F P Yang
GTID: 2248330398958032
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of computer science and the growing popularity of networks, data collection and storage technologies have advanced quickly. The resulting volume of complex data has produced not only "data tombs" but also the "curse of dimensionality". The question is how to extract valuable information from a data tomb, eliminate noise, and discover potentially meaningful knowledge. Outlier detection, a branch of data mining, emerged in response. For complex, high-dimensional, massive data, outlier detection can address the noise problem and effectively mine potentially valuable information, so it has profound practical significance and broad application prospects.

Scholars in China and abroad have proposed many algorithms for outlier detection in data mining, laying an important foundation for subsequent research. However, existing work still has shortcomings:
(1) With the growth of computing and large databases, traditional clustering methods cannot operate effectively in high-dimensional data spaces or find local anomalies.
(2) Because they rely on a global threshold, existing distance-based outlier detection methods can only mine global outliers and have difficulty mining local outliers effectively.
(3) As measurement and acquisition equipment improves and multiplies, data sources and dimensionality have increased dramatically, with some data reaching hundreds or thousands of dimensions, which makes outlier detection in complex, high-dimensional, massive data hard to solve effectively.

This thesis addresses high-dimensional, complex, massive network data. In response to the problems above, its main work and contributions are as follows:

1. A two-stage outlier detection algorithm based on clustering division is proposed for mining local outliers.
Compared with traditional detection algorithms, it improves efficiency and offers a new approach to mining local outliers. The algorithm works in two stages. First, agglomerative hierarchical clustering is iterated to obtain the value of k required by k-means, and the k-means method then divides the data set into a number of micro-clusters. To improve mining efficiency, a cluster filtering mechanism based on information entropy determines whether each micro-cluster contains outliers. Finally, a distance-based approach detects local outliers within the micro-clusters that do contain outliers. Experimental results show that the proposed algorithm has high efficiency, high precision, and low time complexity.

2. An outlier detection algorithm based on attribute reduction, AROD, is proposed for high-dimensional data; it effectively addresses outlier mining in massive, high-dimensional data. AROD uses entropy to partition and reduce the data attributes, keeps the important attributes that reflect the information in the data, and combines those attributes and their weights (contribution degrees) with a weighted distance formula to detect outliers. Experimental results show that the improved algorithm detects outliers in multi-dimensional data with higher accuracy or efficiency than existing algorithms.
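The attribute-reduction idea behind AROD can be illustrated with a small Python sketch. It is not the thesis's implementation: the abstract does not state the binning scheme, the rule for keeping attributes, the weight formula, or the outlier criterion. The sketch therefore assumes equal-width binning, treats higher-entropy attributes as more important, normalizes entropies into weights, and flags a point when too few neighbors lie within a weighted-distance radius. All function names and parameters (`bins`, `keep_ratio`, `radius`, `min_neighbors`) are hypothetical.

```python
import math
from collections import Counter


def attribute_entropy(values, bins=4):
    """Shannon entropy (bits) of one numeric attribute after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant attribute carries no information
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def reduce_and_weight(rows, keep_ratio=0.5, bins=4):
    """Assumed reduction rule: keep attributes whose entropy is at least
    keep_ratio * (max entropy); weights are the kept entropies, normalized to 1."""
    columns = list(zip(*rows))
    entropies = [attribute_entropy(col, bins) for col in columns]
    cutoff = keep_ratio * max(entropies)
    kept = [j for j, h in enumerate(entropies) if h > 0 and h >= cutoff]
    total = sum(entropies[j] for j in kept)
    return {j: entropies[j] / total for j in kept}


def weighted_distance(a, b, weights, scales):
    """Weighted Euclidean distance over the kept attributes, each range-scaled."""
    return math.sqrt(sum(w * ((a[j] - b[j]) / scales[j]) ** 2
                         for j, w in weights.items()))


def detect_outliers(rows, radius=0.5, min_neighbors=2, keep_ratio=0.5):
    """Assumed criterion: a point is an outlier if fewer than min_neighbors
    other points lie within `radius` in weighted distance."""
    weights = reduce_and_weight(rows, keep_ratio)
    columns = list(zip(*rows))
    # Kept attributes have entropy > 0, so their range is never zero.
    scales = {j: max(columns[j]) - min(columns[j]) for j in weights}
    outliers = []
    for i, p in enumerate(rows):
        neighbors = sum(
            1 for k, q in enumerate(rows)
            if k != i and weighted_distance(p, q, weights, scales) <= radius
        )
        if neighbors < min_neighbors:
            outliers.append(i)
    return outliers


# Toy data: the third attribute is constant and is dropped by the reduction.
rows = [
    (1.0, 1.0, 7.0), (1.2, 0.9, 7.0), (0.9, 1.1, 7.0),
    (1.1, 1.2, 7.0), (1.0, 0.8, 7.0), (5.0, 5.0, 7.0),
]
print(detect_outliers(rows))  # -> [5]
```

The pairwise scan is quadratic in the number of points, which is acceptable for an illustration but not for massive data. One caveat of the assumed weighting: an attribute whose only variation comes from a single extreme value gets low entropy under equal-width binning, so a different importance measure may be needed in practice.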
Keywords/Search Tags: outlier detection, information entropy, micro-clustering, filtering mechanism, attribute reduction