Classification of Non-equilibrium Hybrid Data and Its Application

Posted on:2009-04-04

Degree:Master

Type:Thesis

Country:China

Candidate:Y Z Chen

GTID:2208360245483026

Subject:Computer application technology

Abstract/Summary:

Imbalanced mixed data are very common in real-world applications. The classes are unevenly distributed, and the attributes are heterogeneous (both numeric and categorical). Traditional classification methods perform poorly on this type of data. When the minority samples are sufficiently important, misclassifying them can cause substantial losses. Methods for processing imbalanced mixed data have therefore become one of the focal points of current data mining research, both in China and internationally.

This thesis builds on traditional classification methods and improves them so that they can handle imbalanced mixed data. An analysis of the k-nearest neighbours by counting algorithm (CwkNN) shows that it classifies mixed data effectively, but its performance on imbalanced data is unsatisfactory. The thesis therefore combines the characteristics of imbalanced data with the CwkNN algorithm and proposes three improved classification methods:

(1) Overall density classification algorithm. CwkNN cannot handle imbalanced data, so an overall density is introduced to rebalance the influence of the data on classification. Experiments show that this raises the classification accuracy of the minority samples but lowers that of the majority samples.

(2) K-local density classification algorithm. The overall density algorithm lowers the accuracy of the majority samples. To address this, a K-local density is introduced so that the accuracy of the minority samples improves while the accuracy of the majority samples does not decrease. Experiments show that this effectively increases classification accuracy on imbalanced data.

(3) Density-based boundary point detection and classification algorithm. To handle the boundary points in the data, the thesis proposes a density-based boundary point detection method. It then applies three boundary-point classification methods to the detected points. Experiments show that these methods correctly classify imbalanced data that contain boundary points.
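The abstract does not give formulas for CwkNN or for the density terms. The following Python is therefore only a minimal sketch under stated assumptions, not the thesis's method.

The sketch makes these choices:

- It uses a HEOM-style mixed distance: range-normalised difference for numeric attributes and 0/1 overlap for categorical ones.
- It counts labels among the k nearest neighbours.
- It reads the "overall density" of a class as that class's share of the training set, and divides each class's vote count by that share.
- The boundary test flags a point when at least a `threshold` fraction of its k nearest neighbours belong to other classes. This is a neighbourhood-purity proxy swapped in for the thesis's density criterion.

All function and parameter names (`mixed_distance`, `classify`, `boundary_points`, `k`, `overall_density`, `threshold`) are hypothetical. The K-local density variant is not sketched, because the abstract does not define it.

```python
from collections import Counter


def attribute_ranges(X):
    """Range (max - min) of each numeric column; 0.0 for categorical columns."""
    ranges = []
    for column in zip(*X):
        numbers = [v for v in column if not isinstance(v, str)]
        ranges.append(max(numbers) - min(numbers) if numbers else 0.0)
    return ranges


def mixed_distance(a, b, ranges):
    """HEOM-style distance (an assumption, not necessarily the thesis's metric):
    numeric attributes use range-normalised absolute difference,
    categorical (str) attributes use 0/1 overlap."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if isinstance(x, str) or isinstance(y, str):
            d = 0.0 if x == y else 1.0
        else:
            d = abs(x - y) / r if r > 0 else 0.0
        total += d * d
    return total ** 0.5


def nearest(X, query, k, ranges, skip=None):
    """Indices of the k training points closest to `query`, optionally excluding index `skip`."""
    ranked = sorted(
        (mixed_distance(x, query, ranges), i) for i, x in enumerate(X) if i != skip
    )
    return [i for _, i in ranked[:k]]


def classify(X, y, query, k=3, overall_density=False):
    """Count neighbour labels; optionally divide each count by the class's
    share of the training set (hypothetical reading of "overall density")."""
    ranges = attribute_ranges(X)
    votes = Counter(y[i] for i in nearest(X, query, k, ranges))
    if overall_density:
        n = len(y)
        class_sizes = Counter(y)
        scores = {c: votes[c] / (class_sizes[c] / n) for c in votes}
    else:
        scores = dict(votes)
    return max(scores, key=scores.get)


def boundary_points(X, y, k=3, threshold=0.5):
    """Flag points whose k nearest neighbours are at least `threshold`
    other-class: a neighbourhood-purity proxy, not the thesis's density test."""
    ranges = attribute_ranges(X)
    flagged = []
    for i, x in enumerate(X):
        neighbours = nearest(X, x, k, ranges, skip=i)
        foreign = sum(1 for j in neighbours if y[j] != y[i])
        if foreign / k >= threshold:
            flagged.append(i)
    return flagged
```

Consider a toy set of six majority points and two minority points. For a query whose three nearest neighbours are two majority points and one minority point, plain counting returns the majority label. Dividing by class share (0.75 vs 0.25) flips the result to the minority label. This is one simple way to picture the rebalancing idea behind the overall density algorithm, under the assumptions above.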

Keywords/Search Tags:

k-nearest neighbours by counting, imbalanced data, overall density, k-local density, boundary point detection

Related items

1	Research And Application Of Financial Big Data Based On Density Peak Clustering Of K Near Neighbors
2	The Outlier Mining Algorithm Based On Gaussian Kernel Function And Local Density
3	Study Of Boundary Detecting Algorithm For Each Cluster
4	The Research And Implementation Of Density-based Clustering Algorithm With Pattern Evaluation Methods
5	Research On Clustering Algorithm For Fast Recognition Of Density Backbone
6	Research On Improved Density Peak Clustering Algorithm
7	Research And Improvement On Density-Based Clustering Algorithm
8	Density Estimation-based Crowd Counting Methods For Complex Scenes
9	Research Of Density-based Clustering Algorithm By KNN
10	Research Of Clustering Algorithm Based On Data Local Distribution