Research On Data Mining Algorithm Of Uncertain Data

Posted on: 2019-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Chen
Full Text: PDF
GTID: 2428330569996087
Subject: Computer application technology
Abstract/Summary:
Owing to factors such as measurement error, limited equipment precision, and privacy protection, uncertain data is increasingly common and plays a significant role in many practical fields. If traditional mining methods designed for certain data are applied directly to uncertain data, the probability dimension of the data is ignored and the mining results become inaccurate. Uncertain data mining extracts information from uncertain data while taking that dimension into account, so it is meaningful and valuable in practice. After explaining the background and classification of uncertain data and introducing the concepts and related algorithms of data mining, this thesis focuses on outlier detection and clustering over uncertain data, and addresses the problem of how to obtain more diverse and accurate outlier detection and clustering results on such data. The main research contents are as follows.

The importance of outlier detection on uncertain data is illustrated with specific examples. Distance-based outlier detection algorithms, however, ignore how neighbors are distributed around a data object. To overcome this defect, the algorithm IDDOD is proposed. A series of concepts, such as the probability neighborhood, local density, and local outlier factor of continuous uncertain objects, are formally defined. IDDOD combines the distance-based and density-based approaches to outlier detection. First, pruning is carried out by the distance-based method: IDDOD marks uncertain objects whose probability neighborhood is larger than the threshold value as non-outliers, which excludes the non-outliers that account for most of the data set. Then the local outlier factor values of the remaining points are calculated, and the final outliers are determined by the density-based method. When deciding whether a data object is an outlier, IDDOD therefore considers not only the total number of the object's close neighbors but also the densities of those neighbors. It uses integrals to calculate the probability that the distance between two uncertain objects is less than or equal to a given threshold, which preserves the uncertainty information of the data and leads to a more accurate result. Experiments show that IDDOD can accurately detect outliers in uncertain data in less time.

The algorithm IAUC uses the expected distance to define both the distance between uncertain objects and the measure used in a bottom-up agglomerative clustering algorithm. It improves the distance formula of the traditional clustering algorithm so that IAUC is applicable to uncertain data. A new average distance between clusters is also defined, which serves as the criterion for merging clusters. Experimental results indicate that IAUC can accomplish the clustering task accurately and effectively.
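
The distance notions named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes each uncertain object is represented by a finite list of equally weighted sample points, so the integrals mentioned above are approximated by averages over sample pairs. The function names (`prob_within`, `neighbor_count`, `expected_distance`, `average_linkage`, `agglomerate`) and the parameters `d`, `p`, and `k` are hypothetical, and the reading of the probability neighborhood as a count of objects within distance `d` with probability at least `p` is an assumption. The local outlier factor step of IDDOD is not shown.

```python
import itertools
import math


def prob_within(a, b, d):
    """Estimate P(dist(A, B) <= d) as the fraction of sample pairs within d."""
    pairs = list(itertools.product(a, b))
    hits = sum(1 for p, q in pairs if math.dist(p, q) <= d)
    return hits / len(pairs)


def neighbor_count(idx, objects, d, p):
    """Count objects that lie within d of objects[idx] with probability >= p.

    Assumed stand-in for the size of the probability neighborhood; an object
    whose count exceeds a threshold would be pruned as a non-outlier.
    """
    return sum(
        1
        for j, other in enumerate(objects)
        if j != idx and prob_within(objects[idx], other, d) >= p
    )


def expected_distance(a, b):
    """Estimate E[dist(A, B)] as the mean distance over all sample pairs."""
    pairs = list(itertools.product(a, b))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def average_linkage(c1, c2, objects):
    """Average expected distance between the members of two clusters."""
    total = sum(expected_distance(objects[i], objects[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(objects, k):
    """Bottom-up clustering: merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(objects))]
    while len(clusters) > k:
        i, j = min(
            itertools.combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]], objects),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters


# Three toy uncertain objects, each given by two sample points (made-up data).
A = [(0, 0), (0, 1)]
B = [(1, 0), (1, 1)]
C = [(10, 10), (10, 11)]
print(prob_within(A, B, 1.0))
print(neighbor_count(2, [A, B, C], 1.0, 0.5))
print(agglomerate([A, B, C], 2))
```

Working from samples keeps the sketch free of any particular probability density. With continuous densities, the pairwise averages would be replaced by the corresponding integrals.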
Keywords/Search Tags: Uncertain Data, Data Mining, Outlier Detection, Clustering, Hierarchical Agglomeration