
Research On Uncertain Data Clustering And Classification

Posted on: 2019-09-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Liu
GTID: 1368330545469091
Subject: Computer application technology
Abstract/Summary:
Traditional clustering and classification algorithms are applicable only to certain data. However, because of equipment measurement errors, network transmission interference, and user privacy protection, uncertain data is ubiquitous in many real applications. Once uncertainty is introduced, traditional clustering and classification algorithms cannot handle the data directly and therefore cannot meet the demands of real applications. It is thus important to design dedicated clustering and classification algorithms for uncertain data. This dissertation focuses on the problem of uncertain data clustering and classification and aims to design effective algorithms for both tasks. The main contributions are as follows.

(1) Self-adapted mixture distance measure for clustering uncertain data. Geometric distance measures cannot identify the difference between uncertain objects that have different distributions but heavily overlap in location. Probability distribution distance measures cannot distinguish between different pairs of completely separated uncertain objects. This dissertation proposes a self-adapted mixture distance measure that considers the geometric distance and the probability distribution distance simultaneously and adaptively adjusts the importance of each according to the location overlap information of the dataset, thereby improving the performance of uncertain data clustering.

(2) Density-based and hierarchical density-based clustering algorithms for uncertain data. Existing density-based and hierarchical density-based clustering algorithms suffer from loss of uncertainty information, high computational complexity, and non-adaptive probability thresholds. This dissertation proposes novel density-based and hierarchical density-based clustering algorithms for uncertain data. By using an accurate method to compute the probability that the distance between uncertain objects is less than or equal to a boundary value, and by introducing a series of new definitions such as probability neighborhood, support degree, core object probability, direct reachability probability, fuzzy core distance, and fuzzy reachability distance, the proposed algorithms overcome these issues and improve the performance of uncertain data clustering.

(3) AdaBoost classification algorithm for uncertain data based on possible worlds. Existing uncertain data classification algorithms rely on ideal probability distributions, and traditional classification algorithms cannot handle uncertain data directly. This dissertation proposes an AdaBoost classification algorithm for uncertain data based on possible worlds. By introducing possible worlds in multiple phases and adding extra majority voting and weighted voting procedures, the proposed algorithm can handle arbitrary probability distributions and makes it possible to apply any existing classification algorithm for certain data to uncertain data. This expands the application field of uncertain data classification and improves its performance.

(4) Consistency learning based uncertain data clustering and classification. First, existing uncertain data clustering and classification algorithms ignore the consistency among different possible worlds. This dissertation proposes a similarity matrix based consistency learning framework for uncertain data clustering and classification. The framework uses the consistency principle to learn a consensus affinity matrix for uncertain data, which makes full use of the information across different possible worlds and improves clustering and classification performance. Second, marginal possible worlds may negatively affect the clustering result, so this dissertation designs a representative possible world selection strategy to filter them out. By integrating this strategy with eigenvector matrix based consistency learning, this dissertation proposes a representative possible world based eigenvector matrix consistency learning algorithm for clustering uncertain data, which achieves better performance.
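
As a rough illustration of the quantity underlying contribution (2), the probability that the distance between two uncertain objects is at most a boundary value, the sketch below assumes each uncertain object is given as a finite list of equally likely instances. Under that assumption the probability can be obtained exactly by enumerating all instance pairs. The representation, the function name distance_probability, and the sample data are hypothetical and are not taken from the dissertation, whose actual computation method is not described in this abstract.

```python
from itertools import product
from math import dist  # Euclidean distance between two points (Python 3.8+)


def distance_probability(obj_a, obj_b, eps):
    """Return P(dist(A, B) <= eps) for two uncertain objects.

    Assumption (not from the source): each object is a non-empty list of
    equally likely instances, so every instance pair has the same weight.
    """
    pairs = list(product(obj_a, obj_b))
    within = sum(1 for p, q in pairs if dist(p, q) <= eps)
    return within / len(pairs)


# Two hypothetical 2-D uncertain objects with three instances each.
a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
b = [(1.0, 1.0), (2.0, 0.0), (3.0, 3.0)]

# 3 of the 9 instance pairs lie within distance 1.0 of each other.
prob = distance_probability(a, b, eps=1.0)
print(prob)
```

A density-based algorithm could then compare such probabilities against a threshold to decide neighborhood membership. How the dissertation defines its probability neighborhood and adaptive threshold is not stated here, so that step is left out of the sketch.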
Keywords/Search Tags: Uncertain Data, Clustering, Classification