
Research And Application Of Local Outlier Detecting Algorithm Based On Density

Posted on: 2017-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Zhou
GTID: 2308330488985678
Subject: Computer software and theory
Abstract/Summary:
In recent years, with the rapid development of computer and database technology, data mining has been widely applied in many areas of daily life. Data mining extracts novel, valuable, and potentially useful knowledge from massive, noisy databases. Outlier detection is one of the most popular and important branches of data mining. It aims to find objects that deviate significantly from the behavior of the general population of objects. At present, outlier detection is mainly used in network attack detection, credit card fraud detection, extreme weather forecasting, telecommunications fee fraud analysis, and similar fields. As the applications of outlier detection receive more and more attention, existing outlier detection algorithms face great challenges. The main problems are as follows: (1) Data sets are growing larger and their dimensionality higher, reaching hundreds or even thousands of dimensions, while existing outlier detection algorithms mainly target low-dimensional, small data sets and have difficulty mining high-dimensional or massive data sets effectively. Moreover, the outliers they detect are global rather than local. (2) Outlierness is simply treated as an "either/or" binary attribute, so existing algorithms fail to assess the degree to which an object is an outlier. The traditional LOF and ELSC algorithms are difficult to adapt to high-dimensional, large-volume data mining, and they do not fully consider the relationship between a data object and the objects in its neighborhood, which leads to many repeated calculation steps during mining. To address these limitations, a new density-based outlier detection algorithm, NELSC, is proposed.
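The abstract does not spell out how a density-based method scores the degree of outlierness. For orientation, the sketch below computes the standard local outlier factor (LOF) that the thesis takes as its starting point; it is not the proposed NELSC algorithm. The function names, the brute-force neighbor search, and the assumption that no two points coincide are illustrative choices, not details from the thesis.

```python
import math

def k_neighbors(points, i, k):
    """Return (k-distance of point i, indices in its k-distance neighborhood)."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    k_dist = dists[k - 1][0]
    # Keep ties at the k-distance, as in the standard LOF definition.
    return k_dist, [j for d, j in dists if d <= k_dist]

def lof_scores(points, k):
    """Brute-force LOF for every point; assumes no duplicate points."""
    n = len(points)
    info = [k_neighbors(points, i, k) for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability
        # distance from point i to each of its neighbors.
        nbrs = info[i][1]
        reach = [
            max(info[o][0], math.dist(points[i], points[o])) for o in nbrs
        ]
        return len(nbrs) / sum(reach)

    lrds = [lrd(i) for i in range(n)]
    # LOF: mean ratio of the neighbors' densities to the point's own density.
    return [
        sum(lrds[o] for o in info[i][1]) / (len(info[i][1]) * lrds[i])
        for i in range(n)
    ]
```

A score close to 1 means a point is about as dense as its neighbors, while a score well above 1 marks a local outlier, so the result is a graded degree rather than a binary label.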
The main work of this paper includes:
(1) For massive, large-scale data sets, this paper presents a pruning strategy based on the DBSCAN algorithm. Because DBSCAN is sensitive to its parameters, multiple sets of different parameters are used to obtain different DBSCAN clustering results. These results are then analyzed and integrated, and a preliminary outlier set is obtained by pruning the clustered data. Using multiple parameter sets avoids mistakenly pruning data objects at cluster edges, so the amount of data and the computational complexity are reduced as far as possible while detection accuracy is preserved.
(2) To address the inefficiency of traditional algorithms when detecting outliers in high-dimensional or massive data sets, this paper puts forward a subspace strategy based on information entropy. This strategy assigns different weights to different object attributes; the weights form an attribute weight vector, which is then used to calculate the entropy distance between objects. It can effectively address the "curse of dimensionality" in high-dimensional data mining and enables density-based mining of local outliers in high-dimensional space.
(3) During DBSCAN clustering and computation of the local outlier factor, the neighborhood query information of an object p is used only to process p itself and is discarded once the query completes. However, this information is also useful when processing the objects in the neighborhood of p. Based on this observation, a neighborhood query optimization strategy based on a memory effect is adopted, which effectively reduces the region that neighborhood queries must search.
(4) Both theoretical analysis and experimental results show that the improved algorithm is effective and efficient.
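The abstract gives no formula for the entropy distance in item (2). The following is a minimal sketch of one plausible reading, under stated assumptions: each attribute's Shannon entropy is estimated from an equal-width histogram, the entropies are normalized into a weight vector, and the weights scale a Euclidean distance. The bin count, the choice to give higher-entropy attributes larger weights, and all function names are hypothetical and may differ from the thesis.

```python
import math
from collections import Counter

def attribute_entropy(values, bins=4):
    """Shannon entropy (bits) of one attribute after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant attribute carries no information
    width = (hi - lo) / bins
    # The maximum value would fall into bin index `bins`, so clamp it.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def entropy_weights(rows, bins=4):
    """Attribute weight vector: per-attribute entropies normalized to sum to 1.

    Assumption: higher entropy means larger weight (the thesis may differ).
    """
    entropies = [attribute_entropy(col, bins) for col in zip(*rows)]
    total = sum(entropies)
    if total == 0:
        # All attributes constant: fall back to uniform weights.
        return [1 / len(entropies)] * len(entropies)
    return [e / total for e in entropies]

def weighted_distance(p, q, weights):
    """Euclidean distance with each squared difference scaled by its weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q)))
```

Under this reading, an attribute that is constant across the data set receives weight 0 and drops out of the distance, which is one way a weighting scheme can act as a soft subspace selection.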
Keywords/Search Tags: Algorithm ELSC, Outliers, Density, High Dimensional Space