
Instance Reduction With Granular Computing Based Data Importance Labeling

Posted on: 2020-10-28    Degree: Master    Type: Thesis
Country: China    Candidate: L Liu    Full Text: PDF
GTID: 2428330596977374    Subject: Control engineering
Abstract/Summary:
Nowadays, the amount of data generated in different fields is growing exponentially. However, the processing performance of instance-based machine learning struggles under this growth, and the large storage cost of big data also needs to be addressed. Instance reduction is therefore one of the hot topics in large-scale data processing. Many existing instance reduction algorithms struggle with the trade-off among computational complexity, reduction rate, and learner performance on the reduced datasets, especially for large-scale datasets. Motivated by this, instance reduction algorithms with granular computing based data importance labeling are studied here. The main research contents are as follows:

(1) Fast data reduction with granulation based instance importance labeling: Drawing on research results of granular computing in the field of feature selection, we propose a fast data reduction algorithm with granulation based instance importance labeling (FDR-GIIL). The original dataset is first mapped into a lower-dimensional space and granulated into K granules by applying K-means; the importance of each instance in every granule is then labeled based on its Hausdorff distance, and instances whose importance values are lower than an experimentally tuned threshold are selected for deletion. Furthermore, the crowding degrees of instances with the same data importance are calculated, and the less crowded instances are retained in the reduced subset, so that well-distributed samples are preserved. The presented algorithm is applied to 18 datasets of different sizes from the UCI Repository, and its strong performance in classification accuracy, size reduction rate, and running time is illustrated by comparison with seven other data reduction methods. The experimental results demonstrate that the proposed algorithm can greatly reduce the computational cost while achieving higher classification accuracy when the reduction size is the same across all compared algorithms.

(2) Improved data reduction combining noise deletion and feature selection: Although FDR-GIIL can quickly reduce instances, the classification accuracies on large-scale datasets still need further improvement. Therefore, noise deletion and feature selection are combined with the FDR-GIIL algorithm to enhance the performance of data reduction (EPDR). First, the edited nearest neighbor (ENN) rule is used to remove noisy instances from the initial dataset, and a granulation mapping based on principal component analysis (PCA) is proposed; the Euclidean distance and the Value Difference Metric (VDM) are then mixed to calculate instance importance. EPDR is applied to the popular datasets and compared with FDR-GIIL as well as a popular data reduction method. The experimental results show that the proposed algorithm can effectively enhance the classification accuracy on the reduced datasets within an acceptable running time.

The proposed fast data reduction with granulation based data importance labeling uses a 'divide and conquer' strategy to label data importance, so it can quickly remove most of the unimportant data from the original dataset; FDR-GIIL has a clear advantage in reducing computational cost. The improved data reduction combining noise deletion and feature selection builds further on FDR-GIIL: the performance of data reduction is enhanced by ENN denoising, PCA dimensionality reduction, and importance labeling based on a mixed distance calculation.
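The granulation based importance labeling described in (1) can be outlined in code. The following is a minimal sketch, not the thesis implementation: it assumes PCA for the lower-dimensional mapping, approximates the Hausdorff-distance importance by each instance's distance to its granule centroid, replaces the experimentally tuned threshold with a simple per-granule quantile, and omits the crowding-degree tie-breaking step. The function and parameter names (granulation_importance_reduction, n_granules, keep_quantile) are illustrative, not taken from the thesis.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans


def granulation_importance_reduction(X, n_granules=10, n_components=2,
                                     keep_quantile=0.5, random_state=0):
    """Return indices of retained instances from data matrix X (n_samples x n_features)."""
    # 1. Map the data into a lower-dimensional space (assumed here to be PCA).
    X_low = PCA(n_components=n_components,
                random_state=random_state).fit_transform(X)

    # 2. Granulate the mapped data into K granules with K-means.
    km = KMeans(n_clusters=n_granules, n_init=10,
                random_state=random_state).fit(X_low)
    labels, centers = km.labels_, km.cluster_centers_

    keep = []
    for g in range(n_granules):
        idx = np.where(labels == g)[0]
        # 3. Label importance inside the granule; here the distance to the
        #    granule centre stands in for the Hausdorff-distance labeling.
        importance = np.linalg.norm(X_low[idx] - centers[g], axis=1)
        # 4. Delete low-importance instances; the threshold is a per-granule
        #    quantile rather than the experimentally tuned value in the thesis.
        threshold = np.quantile(importance, 1.0 - keep_quantile)
        keep.extend(idx[importance >= threshold])
    return np.array(sorted(keep))

For an n_samples-by-n_features matrix X, the returned index array selects the reduced subset; under these assumptions, keep_quantile roughly controls the size reduction rate.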
Keywords/Search Tags: instance reduction, granular computing, data importance labeling, K-means, mixed distance calculation