Research On Outlier Detection For Targeted Poverty Alleviation Data In Cloud Environment

Posted on: 2019-08-11; Degree: Master; Type: Thesis
Country: China; Candidate: S J Ma
GTID: 2428330545982412; Subject: Computer Science and Technology
Abstract/Summary:
Targeted Poverty Alleviation (TPA) data is usually produced through statistical collection, and analyzing and mining its potential value is meaningful. However, data distortion is one of the main challenges in the field of statistics. Because outlier detection helps improve data quality and yields more reliable results, it is used here to address data distortion. Taking the TPA data of one province as an example, this thesis explores outlier detection for TPA data. Focusing on the characteristics of TPA data, namely complexity, high dimensionality, and large volume, the main contents of the thesis are as follows.

(1) Because TPA data contains both discrete and continuous attributes, it is complex to some extent, so a novel approach to discretizing continuous attributes based on information entropy is proposed. Compared with several classic discretization approaches on five public datasets from UCI, the proposed approach loses less information. It is also applied to 100,000 randomly selected samples of the TPA data to verify its effectiveness.

(2) To address the high dimensionality of TPA data, concepts from signal processing are introduced, and an approach to attribute sorting based on Maslow's Hierarchy of Needs theory is proposed. The sorted attributes are then regarded as signal sampling points, and a novel approach to outlier detection based on the Fourier transform is proposed. Experiments on a dataset from UCI show that the proposed approach has advantages in recall rate and error detection rate. Subsequently, the discretized TPA sample is processed with this method.

(3) TPA data contains the details of millions of poor families, and conventional computing environments are not suited to processing it. The proposed approach is therefore implemented on the Hadoop platform to improve its efficiency through distributed data parallelism.

On the one hand, experiments show that the abnormal rate of the TPA data is approximately 0.005% to 0.013% under the proposed outlier detection approach, so the TPA data is, to some extent, very reliable. On the other hand, the proposed approach offers a candidate method for outlier detection on complex, high-dimensional, large-scale data.
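
The idea in item (2), treating a record's ordered attributes as signal sampling points and comparing records in the frequency domain, can be illustrated with a minimal Python sketch. This is not the thesis's algorithm, whose details the abstract does not give. The function names `dft_magnitudes` and `spectral_outliers`, the Euclidean distance to a median spectrum, the modified z-score with a 3.5 cutoff, and the sample records are all assumptions chosen for illustration. The sketch also assumes the attributes are already discretized to ordinal codes and sorted, and that every record has the same length.

```python
import cmath
import math
from statistics import median


def dft_magnitudes(signal):
    """Magnitude spectrum of a real sequence (naive O(n^2) DFT, stdlib only)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def spectral_outliers(records, cutoff=3.5):
    """Return indices of records whose spectrum lies far from the median spectrum.

    Assumptions (not from the thesis): Euclidean distance between magnitude
    spectra, and a modified z-score (0.6745 * deviation / MAD) with a 3.5 cutoff.
    """
    spectra = [dft_magnitudes(r) for r in records]
    width = len(spectra[0])
    # The component-wise median spectrum stands in for a "typical" record.
    center = [median(s[k] for s in spectra) for k in range(width)]
    dists = [math.dist(s, center) for s in spectra]
    med = median(dists)
    mad = median(abs(d - med) for d in dists)
    if mad == 0:
        # Degenerate case: most distances coincide; flag anything above the median.
        return [i for i, d in enumerate(dists) if d > med]
    return [i for i, d in enumerate(dists) if 0.6745 * (d - med) / mad > cutoff]


if __name__ == "__main__":
    # Hypothetical ordinal codes for four sorted attributes per household record.
    records = [
        [1, 1, 2, 2],
        [1, 2, 2, 2],
        [1, 1, 1, 2],
        [1, 1, 2, 3],
        [1, 2, 2, 3],
        [1, 1, 2, 2],
        [9, 0, 9, 0],  # oscillating pattern unlike the others
    ]
    print(spectral_outliers(records))
```

Because each record is scored independently once the median spectrum is known, a computation of this shape splits naturally across workers, which is consistent with the data-parallel Hadoop implementation described in item (3). The naive DFT is adequate for a handful of attributes; an FFT routine would be the usual choice for longer records.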
Keywords: Outlier Detection, Information Entropy, Fourier Transform, Targeted Poverty Alleviation, Cloud Computing