The Research On Uncertain Outlier Detection Algorithm For Internet Fraud Detection

Posted on: 2017-07-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Liu
GTID: 1318330536967209
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Internet, people's social lives have become more and more tightly connected with it. At the same time, network fraud has become an important factor affecting people's normal lives. Outlier detection is an important data mining technique and an important means of anomaly detection, and distance-based outlier detection is one of the most commonly used outlier detection techniques. In this paper, we study distance-based outlier detection algorithms for uncertain datasets, with network fraud detection as the application.

Network fraud occurs in the process of network transactions and is accompanied by abnormal transaction behavior. We treat each user's network transaction behavior as a data object mapped into a multidimensional space, where each attribute of the behavior is one dimension. Abnormal network transactions then typically appear as a few data objects that lie far away from most other data objects, and detecting these few isolated objects is the task of outlier detection. We therefore try to find network fraud by applying distance-based outlier detection to network transaction log data.

Uncertainty, however, often exists in network transaction logs. A user's network transaction behavior often has many features and is difficult to describe with a single fixed data object in a multidimensional space, so traditional distance-based outlier detection algorithms cannot be applied directly to uncertain datasets. To solve this problem, we study distance-based outlier detection in uncertain datasets. First, we use the x-tuple model and possible world semantics to describe uncertain datasets. Each uncertain data object is represented by an x-tuple. Each tuple of an x-tuple is an instance of the data object and is associated with a probability. Tuples from different x-tuples compose a possible world, and each possible world is an instance of the dataset.

We then treat outlier detection in uncertain datasets as a type of query. Focusing on different features of uncertain datasets, we propose four novel outlier concepts: the expected outlier, the semi-expected outlier, the top-(k1,k2) outlier, and the relative outlier. Expected outlier detection is the simplest of them: every x-tuple is assigned a constant expected outlier score, and the x-tuples with the largest expected outlier scores in an uncertain dataset are the expected outliers. Semi-expected outlier detection, an improvement on expected outlier detection, addresses incompleteness in the datasets. Top-(k1,k2) outlier detection addresses the problem of noise; it does not calculate outlier scores for tuples or x-tuples, but instead compares x-tuples within each possible world. Relative outlier detection differs from the others: it finds outliers by comparing each pair of x-tuples and does not need an expert-specified threshold, which makes it widely applicable. We formally define these four kinds of outliers and construct the corresponding algorithms, and we propose several pruning strategies for acceleration. Finally, we test the algorithms experimentally on synthetic and real datasets.

Existing research has some limitations. First, it often assumes that each data object follows some known distribution, especially the normal distribution; however, it is hard to obtain an analytical distribution for a dataset. Second, although some researchers have also used the x-tuple model and possible world semantics to describe uncertain datasets, they did not consider the variety of datasets. In this paper, we avoid these drawbacks of existing research and propose effective algorithms for detecting outliers in the presence of data incompleteness and variety.
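
The x-tuple model and the idea of an expected outlier score can be illustrated with a small brute-force sketch. It is not the dissertation's algorithm: the abstract does not define the score, so the sketch assumes a common distance-based choice (distance to the k-th nearest neighbor within a possible world) and takes its expectation over all possible worlds. It also assumes that the probabilities inside each x-tuple sum to 1, so every object appears in every world. The data and all function names are hypothetical.

```python
from itertools import product
from math import dist

# Hypothetical toy data: each x-tuple is a list of (point, probability) alternatives.
# Assumption: probabilities within an x-tuple sum to 1 (no missing objects).
xtuples = [
    [((0.0, 0.0), 0.5), ((0.0, 1.0), 0.5)],
    [((1.0, 0.0), 1.0)],
    [((0.0, 2.0), 0.7), ((1.0, 1.0), 0.3)],
    [((10.0, 10.0), 0.6), ((9.0, 9.0), 0.4)],
]

def possible_worlds(xtuples):
    """Yield (world, probability) pairs, picking one tuple from each x-tuple."""
    for choice in product(*xtuples):
        prob = 1.0
        for _, p in choice:
            prob *= p
        yield [point for point, _ in choice], prob

def knn_distance(world, i, k):
    """Assumed outlier score: distance from object i to its k-th nearest neighbor."""
    dists = sorted(dist(world[i], q) for j, q in enumerate(world) if j != i)
    return dists[k - 1]

def expected_outlier_scores(xtuples, k=1):
    """Expectation of the assumed score over all possible worlds, per x-tuple."""
    scores = [0.0] * len(xtuples)
    for world, prob in possible_worlds(xtuples):
        for i in range(len(xtuples)):
            scores[i] += prob * knn_distance(world, i, k)
    return scores

scores = expected_outlier_scores(xtuples, k=1)
# Index of the x-tuple with the largest expected score.
top = max(range(len(scores)), key=scores.__getitem__)
```

Enumerating every possible world grows exponentially with the number of x-tuples, so this form is only practical for toy inputs; the pruning strategies mentioned in the abstract are aimed at avoiding that cost.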
Keywords/Search Tags: network fraud, uncertain data, outlier detection, x-tuple model, possible world semantics