Research On Anomaly Detection And Classification Of Labeled Data Based On Data Density

Posted on:2019-03-29

Degree:Master

Type:Thesis

Country:China

Candidate:X N Gao

GTID:2428330566489350

Subject:Engineering

Abstract/Summary:

Data mining can quickly extract valuable information from large volumes of complex data, and data classification is one of its important branches and applications. Based on an analysis of data density, this thesis focuses on two pieces of work.

First, the thesis analyzes two characteristics of data density. The first is that each category of data has its own density information: in a given business scenario, different categories of data have different density values, so this information can be used to distinguish and identify categories. The second is that, if a subset of the data is obtained by random sampling from the overall data, the subset and the overall data are similarly distributed, and the ratio of the subset's density to the overall density equals the ratio of the subset's size to the overall size.

Second, the first key piece of research aims to improve data quality and thereby the quality of the classification model. The thesis proposes a preprocessing method, based on the density characteristics of labeled data, to "purify" the original data. The method uses the first characteristic to compute the density of each category, labels the samples that do not match the density characteristics of their category as anomalies and removes them, and then builds a classifier model from the "purified" data.

Third, the second key piece of research explores a new data classification algorithm based on data density. The algorithm uses both characteristics of data density: it computes the data density of each category, finds the data to be classified that has the same density, and assigns that data to the category.

Finally, the proposed algorithm is implemented in R on the Windows platform, and experiments verify its validity and feasibility.
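The "purification" step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how density is defined or what threshold is used, so the version below makes its own assumptions: density is taken as the inverse of the mean distance to the k nearest neighbors within the same category, and a sample is treated as an anomaly when its density falls below a fraction of the category's median density. The names `knn_density`, `purify`, `k`, and `ratio` are hypothetical, and Python with NumPy is used here for illustration even though the thesis reports an R implementation.

```python
import numpy as np

def knn_density(points, k=3):
    """Assumed density: inverse of the mean distance to the k nearest
    neighbors among `points` (not necessarily the thesis's definition)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # After sorting each row, column 0 is the zero distance to the point itself.
    nearest = np.sort(dists, axis=1)[:, 1:k + 1]
    return 1.0 / (nearest.mean(axis=1) + 1e-12)

def purify(X, y, k=3, ratio=0.5):
    """Return a boolean mask of samples to keep.

    A sample is kept when its within-category density is at least
    `ratio` times the median density of its category (assumed rule).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    keep = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        if len(idx) <= k:
            keep[idx] = True  # too few samples to estimate density
            continue
        dens = knn_density(X[idx], k)
        keep[idx] = dens >= ratio * np.median(dens)
    return keep

# Two compact categories; the sample at (10, 10) is labeled 0 but lies
# far from the rest of category 0.
X = np.array([
    [0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10],
    [5, 5], [5, 6], [6, 5], [6, 6], [5.5, 5.5],
])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
mask = purify(X, y)
X_clean, y_clean = X[mask], y[mask]
```

The filtered `X_clean` and `y_clean` would then be passed to whatever classifier is being trained. Using the category median as the reference keeps the threshold robust to the anomalies themselves, which is a design choice of this sketch rather than something stated in the abstract.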

Keywords/Search Tags:

data mining, classification algorithm, anomaly detection, data density, unbalanced data

Related items

1	Research On Classification Algorithms For Unbalanced Data
2	Anomaly Detection System For Big Data Of Mobile Printed Circuit Board Industry
3	Research On The Classification Ensemble Algorithm For Medical Insurance Anomaly Detection
4	Research On SVM Classification Of Unbalanced Data And Its Application In Identify Poor Students In Colleges And Universities
5	The Classification Algorithm Research Based On Imbalanced Data
6	Research On Classification Algorithms Of Data Mining Based On Imbalanced Data Sets
7	Research On Anomaly Detection Methods For Financial Data
8	Research On Adaboost Improved Algorithm For Unbalanced Data
9	Research And Implementation Of Anomaly Detection System Of Network Traffic Based On Data Mining
10	Unbalanced Data Classification Algorithm Based On SVM For Research And Application