
Research on Outlier Detection Algorithms for Categorical Matrix-Object Data

Posted on: 2020-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: X L Wu
Full Text: PDF
GTID: 2428330578473738
Subject: Computer software and theory
Abstract/Summary:
Outlier detection is an important problem in data mining that aims to discover useful anomalous objects and anomalous patterns hidden in large data sets. It has been widely used in credit card fraud detection, network monitoring, e-commerce, fault detection, severe weather forecasting, and health system monitoring. The input of existing outlier detection algorithms is a collection of objects, each described by a single feature vector. However, in many real-world applications, an object is described by a number of feature vectors. In this thesis, we define an object described by more than one feature vector as a matrix-object, and a data set consisting of matrix-objects as a matrix-object data set. At present, there is no effective algorithm for detecting outliers in matrix-object data sets. If existing outlier detection algorithms are applied to a matrix-object data set, the direct approach is to compress and transform the data first. However, the compressed data usually loses a lot of information and is not sufficient to reflect the user's behavior characteristics. Therefore, we study outlier detection for categorical matrix-object data in depth and propose new algorithms. The main work is as follows:

(1) Since each object in a matrix-object data set contains multiple records, each matrix-object can be regarded as a small data set. We define the outlier factor of a matrix-object from its own cohesion degree and its coupling degree with other matrix-objects, and propose an outlier detection algorithm based on information entropy.

(2) There are complex interactions between data attributes, so the influence of these interactions on outlier detection for matrix-object data sets also needs to be considered. We therefore measure the interaction between attributes by mutual information when calculating the cohesion of a matrix-object, and propose an outlier detection algorithm based on information entropy and mutual information.

In summary, we study how to detect outliers in categorical matrix-object data, propose new algorithms, and verify the effectiveness and scalability of the new algorithms on real data sets. The research results provide new ideas and methods for outlier detection on matrix-object data and have both theoretical and practical value.
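
The abstract does not give the thesis's exact definitions of cohesion degree, coupling degree, or the outlier factor, so the following is only a minimal Python sketch of the two standard building blocks it names: the Shannon entropy of a categorical attribute within one matrix-object, and the mutual information between two attributes. The function names and the toy data are assumptions made for illustration, not the author's formulation.

```python
from collections import Counter
from math import log2

# A matrix-object is modeled here as a list of records (rows); each record is
# a tuple of categorical values. This representation is an assumption.

def attribute_entropy(obj, j):
    """Shannon entropy (in bits) of attribute j over the records of one matrix-object."""
    n = len(obj)
    counts = Counter(row[j] for row in obj)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def joint_entropy(obj, j, k):
    """Shannon entropy (in bits) of the value pairs of attributes j and k."""
    n = len(obj)
    counts = Counter((row[j], row[k]) for row in obj)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def mutual_information(obj, j, k):
    """I(A_j; A_k) = H(A_j) + H(A_k) - H(A_j, A_k)."""
    return attribute_entropy(obj, j) + attribute_entropy(obj, k) - joint_entropy(obj, j, k)

# Hypothetical matrix-objects with two categorical attributes each.
uniform = [("a", "x")] * 4                                   # every record identical
dependent = [("a", "x"), ("b", "y"), ("a", "x"), ("b", "y")]  # attribute 1 determines attribute 2
independent = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]

for name, obj in [("uniform", uniform), ("dependent", dependent), ("independent", independent)]:
    print(name, attribute_entropy(obj, 0), mutual_information(obj, 0, 1))
```

A matrix-object whose records all agree has zero entropy on every attribute, which is one plausible reading of high cohesion. The mutual information term separates the case where two attributes vary together from the case where they vary independently, which per-attribute entropy alone cannot distinguish.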
Keywords/Search Tags:data mining, outlier detection, categorical matrix-object data, cohesion degree, coupling degree