Research On Granular Computing Based Outlier Mining Methods

Posted on: 2018-02-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J H Yang
GTID: 1318330542491547
Subject: Computer application technology
Abstract/Summary:
Outlier mining is an important research topic in data mining. Its task is to discover abnormal knowledge in data sets, so it has many applications in production, daily life, and scientific research. In recent years, outlier mining has faced new problems and challenges as data volumes grow and application scenarios become more complex. Granular computing, an important theory for dealing with fuzzy and massive information, has become a hot topic in the field of artificial intelligence. Granular computing simulates the global analysis ability of human beings. It abstracts an intricate problem into simple model granules at different levels, and the problem is then analyzed and solved on the basis of these granules. In particular, granular computing analyzes and solves problems hierarchically by grouping, classifying, and clustering data, and it has become a new concept and paradigm of information processing. To address several problems of existing outlier mining methods, this thesis proposes four novel outlier mining methods based on clustering, classification, and hierarchical analysis from the perspective of granular computing, and verifies their effectiveness in comparative experiments. The content of this thesis is summarized as follows:

(1) Existing clustering-based outlier mining methods only optimize the clustering and neglect to optimize outlier mining itself. To address this, and to use a small amount of label information to improve the accuracy of outlier mining, we propose an outlier mining method based on feature-weighted semi-supervised PCM clustering. Considering the mutual influence between clustering and outlier mining, the objective function of this model adaptively assigns a different weight to each feature. The method maximizes the membership degree of a labeled normal object to the cluster it belongs to and minimizes its membership degree to the clusters it does not belong to. Meanwhile, it minimizes the membership degrees of a labeled outlier to all clusters. The semi-supervised clustering yields a fuzzy partition of the dataset, from which the fuzzy information granule of each cluster is deduced. Within the framework of fuzzy information granules, the outlying degree is defined based on the principle that outliers have low membership in every fuzzy information granule. Outliers in the dataset can then be mined effectively according to the outlying degree of each sample.

(2) Outlier mining methods based on support vector data description (SVDD) describe and model a training dataset that consists of normal samples. Samples located outside the decision boundary are naturally recognized as outliers. To reduce the negative influence of outliers on the trained model, an SVDD model based on one-cluster kernel possibilistic C-means (PCM) clustering, called OCP-SVDD, is proposed for outlier mining. In the OCP-SVDD model, each training sample is assigned a confidence level based on its membership degree to the target class granule, which is obtained through granulation based on one-cluster kernel PCM clustering. The information granule of the target class is deduced according to the confidence level of each sample. To distinguish the different contributions of the samples to the trained model, the membership of each training sample to the fuzzy granule is incorporated into the OCP-SVDD training model. Because outliers lie far from the center, they receive lower confidence levels, which weakens their negative influence on the decision boundary.

(3) To reduce the negative influence of contamination on the training process of the one-class support vector machine, a manifold distance based one-class support vector machine (MD-OCSVM) is proposed for outlier mining in high-dimensional datasets. The manifold distance between normal samples located on the same manifold is small, whereas the manifold distances between normal samples and outliers are large because outliers lie outside the manifold. A fuzzy granulation of the training set is conducted according to the manifold distance between each sample and the center, and sample memberships are calculated in the process. These memberships reflect the importance of each sample in the training process. The memberships of most outliers are small because these points deviate from the manifold. MD-OCSVM introduces the fuzzy granulation information into the training model to increase the importance of normal samples located on the manifold and to reduce the adverse influence of outliers on the decision boundary. As a result, the effectiveness of outlier mining in high-dimensional datasets is improved. The effectiveness of MD-OCSVM is validated in experiments on synthetic datasets, UCI datasets, and fault detection.

(4) To overcome the limitations of outlier mining methods based on a single granularity, an unsupervised outlier mining model is proposed based on multi-granularity theory. First, we granulate the dataset by neighborhoods and build a multi-granularity neighborhood hierarchical model. Under this model, new outlier features are defined to describe outlying characteristics from multiple views. Because the scale and depth of analysis differ across granularities, the multi-granularity neighborhood hierarchical model is formulated as a group decision problem so that the decisions of the multiple views complement one another. A compromise decision is then formed and used to estimate whether a point is an outlier. The optimal weight of each granule in the fusion process is obtained by maximizing the consistency between the group decision and the individual decisions. According to the group decision, we obtain the compromise outlying degree of each point and then detect the outliers in the dataset.

Finally, we summarize the main contributions of this thesis and discuss its weaknesses and future research work.
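
As a rough illustration of the membership-based scoring idea in method (1), the sketch below computes possibilistic memberships with the standard PCM typicality formula, u_ik = 1 / (1 + (d_ik^2 / eta_i)^(1/(m-1))), and then scores each point by how weakly it belongs to every cluster. It is not the thesis's model: it omits the feature weighting and the semi-supervised label terms, the cluster centers are fixed by hand rather than learned, and the function names, the bandwidths `etas`, the toy data, and the score `1 - max membership` are all assumptions made for illustration.

```python
def pcm_memberships(points, centers, etas, m=2.0):
    """Possibilistic memberships of each point to each cluster.

    Uses the standard PCM typicality formula with squared Euclidean
    distance; etas[i] is the (assumed, hand-picked) bandwidth of cluster i.
    """
    memberships = []
    for x in points:
        row = []
        for c, eta in zip(centers, etas):
            d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
            row.append(1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0))))
        memberships.append(row)
    return memberships


def outlying_degrees(memberships):
    """Assumed score: high when membership in every cluster is low."""
    return [1.0 - max(row) for row in memberships]


# Toy data: two tight groups plus one point far from both (hypothetical).
points = [(0.0, 0.0), (0.2, -0.1), (5.0, 5.0), (5.1, 4.8), (10.0, -10.0)]
centers = [(0.0, 0.0), (5.0, 5.0)]  # fixed by hand, not learned
etas = [1.0, 1.0]

u = pcm_memberships(points, centers, etas)
scores = outlying_degrees(u)
most_outlying = max(range(len(points)), key=lambda k: scores[k])
print(most_outlying, round(scores[most_outlying], 3))
```

Unlike fuzzy C-means, possibilistic memberships are not forced to sum to one across clusters, so a point far from every center can receive a low membership in all of them, which is what makes this kind of score usable for outliers.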
Keywords/Search Tags: outlier mining, clustering, granular computing, one-class classification, multi-granularity, group decision making