Research On Anomaly Detection And Classification Of Labeled Data Based On Data Density

Posted on: 2019-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: X N Gao
Full Text: PDF
GTID: 2428330566489350
Subject: Engineering
Abstract/Summary:
Data mining can quickly extract valuable information from large volumes of complex data, and data classification is one of its important branches and applications. Based on an analysis of data density, this paper focuses on two pieces of work. It first analyzes two characteristics of data density. The first is that each category of data has its own density information: in a given business scenario, the density values of different categories differ, so categories can be distinguished and identified by their density. The second is that, if a subset is obtained by random sampling from the overall data, the subset and the overall data are similarly distributed, and the ratio of the subset's density to the overall density equals the ratio of the subset's size to the overall size. The first key piece of work is to improve data quality and thereby the quality of the classification model. This paper proposes a preprocessing method based on the density characteristics of labeled data to “purify” the original data. Using the first characteristic, the method calculates the density of each category, labels the samples that do not match their category's density characteristics as anomalies, and removes them; the “purified” data are then used to build a classifier model. The second key piece of work is to explore a new data classification algorithm. This paper proposes a classification algorithm based on data density that uses both characteristics: it calculates the data density of each category, finds the data to be classified that have the same density, and assigns those data to that category. Finally, the proposed algorithm is implemented in R on the Windows platform, and experiments verify its validity and feasibility.
Keywords/Search Tags:data mining, classification algorithm, anomaly detection, data density, unbalanced data
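
The density-based "purification" step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's density formula or its anomaly criterion, so everything specific here is an assumption: density is taken as the inverse of the mean distance to the `k` nearest same-class samples, a sample is treated as an anomaly when its density falls below `ratio` times its class's median density, and the names `knn_density` and `purify` are hypothetical. The thesis implementation is in R; Python is used here only for illustration.

```python
import math
from statistics import median


def knn_density(point, others, k):
    """Assumed density: inverse of the mean distance to the k nearest same-class points."""
    dists = sorted(math.dist(point, o) for o in others)
    nearest = dists[:k]
    mean_dist = sum(nearest) / len(nearest)
    return 1.0 / (mean_dist + 1e-12)  # small constant avoids division by zero


def purify(samples, labels, k=3, ratio=0.5):
    """Return the sorted indices of samples kept after per-class density filtering.

    The threshold rule (ratio * class median density) is an assumption,
    not the criterion used in the thesis.
    """
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    kept = []
    for idxs in by_class.values():
        if len(idxs) <= k:
            kept.extend(idxs)  # too few points to estimate density; keep them all
            continue
        dens = {}
        for i in idxs:
            others = [samples[j] for j in idxs if j != i]
            dens[i] = knn_density(samples[i], others, k)
        threshold = ratio * median(dens.values())
        kept.extend(i for i in idxs if dens[i] >= threshold)
    return sorted(kept)


# Toy data: class "a" is a tight cluster plus one far-away sample labeled "a".
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0),
           (9.0, 9.0), (9.1, 9.0), (9.0, 9.1), (9.1, 9.1)]
labels = ["a", "a", "a", "a", "a", "b", "b", "b", "b"]
kept = purify(samples, labels, k=2, ratio=0.5)
print(kept)  # [0, 1, 2, 3, 5, 6, 7, 8]
```

In this toy run the sample at index 4 is dropped because its density is far below that of the other samples labeled "a". The remaining samples would then be passed to whatever classifier is being trained.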