
Research On Outlier Detection Algorithms For Mixed Attribute Data

Posted on: 2017-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: Q Jiao
GTID: 2348330512451092
Subject: Computer technology
Abstract/Summary:
Outlier detection, a prominent research topic in pattern recognition, aims to find abnormal behavior patterns in data sets that do not conform to expectations. Existing outlier detection algorithms already perform well in efficiency and quality. However, they still have shortcomings; for example, most of them handle only numerical data or only categorical data. Real-world data often contain both kinds of attributes (mixed data), which limits the practicality of traditional algorithms on real data. This thesis studies outlier detection algorithms for numerical and mixed data. The main results are:

(1) We propose the PNNRD algorithm for numerical data. Its main steps are: ① because the standard deviation estimates the dispersion of a data set, we assign a weight to each column of the data set using its standard deviation; ② we measure the dissimilarity between objects with an appropriate distance formula; ③ we combine the difference between the reverse and forward nearest-neighbor ranks with a density-based method to describe the degree of isolation of each data object, and finally obtain the set of outliers. Experiments and visualization demonstrate the effectiveness of the algorithm.

(2) We propose the MPNNRD algorithm for mixed data, based on the PNNRD algorithm for numerical data and the WDOD algorithm for categorical data. Its main steps are: ① we compute each data object's isolation score on the numerical attributes using the PNNRD algorithm; ② we compute each data object's isolation score on the categorical attributes using the WDOD algorithm; ③ using an appropriate parameter, we combine the object's isolation scores on the numerical and categorical attributes to describe its overall degree of isolation, and finally obtain the set of outliers. Experiments on UCI data sets demonstrate the effectiveness of the algorithm.

(3) Using the MATLAB 2011a GUI platform, we implemented an experimental outlier detection system for numerical and mixed data, which provides friendly human-computer interaction. The main functions of the system are data preparation, outlier detection for numerical and mixed data, and visualization of the detection results.

This work significantly improves the effectiveness and adaptability of existing outlier detection algorithms. It also broadens the range of applications of outlier detection and lays a foundation for further research. We believe that advances in this kind of algorithm can help solve more and more real-world problems.
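The abstract does not give the formulas behind PNNRD or MPNNRD, so the following Python sketch is only an illustration of the general idea under stated assumptions, not the thesis's method. It assumes weights proportional to each column's standard deviation, a weighted Euclidean distance, and a score equal to the mean difference between reverse and forward neighbor ranks over the k nearest neighbors; the density-based component and the WDOD algorithm are omitted. All function names and the parameters `k` and `alpha` are hypothetical.

```python
import numpy as np

def std_weights(X):
    """Assumed scheme: weight each column by its share of the total standard deviation."""
    s = X.std(axis=0)
    total = s.sum()
    if total == 0:
        return np.full(X.shape[1], 1.0 / X.shape[1])
    return s / total

def weighted_distances(X, w):
    """Pairwise weighted Euclidean distances (an assumed choice of distance formula)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((w * diff ** 2).sum(axis=2))

def rank_matrix(D):
    """R[i, j] = rank of object j as a neighbor of i (0 = i itself, 1 = nearest)."""
    D = D.copy()
    np.fill_diagonal(D, -1.0)  # make sure each object ranks itself first
    order = np.argsort(D, axis=1, kind="stable")
    n = D.shape[0]
    R = np.empty_like(order)
    R[np.arange(n)[:, None], order] = np.arange(n)
    return R

def rank_difference_scores(X, k=3):
    """Mean of (reverse rank - forward rank) over each object's k nearest neighbors.

    A large positive score means the object's neighbors do not regard it
    as a close neighbor in return, which suggests isolation.
    """
    R = rank_matrix(weighted_distances(X, std_weights(X)))
    n = X.shape[0]
    scores = np.empty(n)
    for i in range(n):
        nbrs = np.where((R[i] >= 1) & (R[i] <= k))[0]
        scores[i] = np.mean(R[nbrs, i] - R[i, nbrs])
    return scores

def combine_scores(num_scores, cat_scores, alpha=0.5):
    """Assumed combination: min-max normalize both scores, then mix with weight alpha."""
    def normalize(s):
        s = np.asarray(s, dtype=float)
        span = s.max() - s.min()
        return (s - s.min()) / span if span > 0 else np.zeros_like(s)
    return alpha * normalize(num_scores) + (1 - alpha) * normalize(cat_scores)

# Five clustered points and one distant point (index 5).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [0.1, 0.1], [0.05, 0.05], [5.0, 5.0]])
scores = rank_difference_scores(X, k=3)
print(int(np.argmax(scores)))  # index of the most isolated object
```

For mixed data, `combine_scores` would take the numerical scores above together with categorical scores produced by some other method; here the categorical scores are simply an input, since the abstract does not describe how WDOD computes them.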
Keywords/Search Tags:Numerical data, Mixed data, Outlier detection