Small-scale Data Classification On Deep Forest

Posted on:2021-03-07

Degree:Master

Type:Thesis

Country:China

Candidate:M Y Zhang

Full Text:PDF

GTID:2428330611464283

Subject:Computer application technology

Abstract/Summary:

With the rapid development of digital technology, large amounts of data are being generated and stored in all walks of life. Accurate classification of these data is the basis for subsequent effective analysis. Because of personal privacy and security concerns, only a small amount of stored data can be obtained in some industries with high confidentiality requirements, and the labor cost of labeling large amounts of data is high, so the available data are very limited. Research shows that deep learning models need large amounts of training data and are prone to overfitting on small-scale data tasks. Research on small-scale dataset classification is therefore of far-reaching significance. Because of its high interpretability and its automatic determination of the number of cascade layers, the deep forest model has clear advantages in small-dataset classification tasks. Because of their small sample size, small datasets usually suffer from class imbalance and poor diversity. Class imbalance impairs the ability of the random forest to learn accurate distinguishing features between classes. Poor data diversity prevents the model from learning the overall distribution of the original data, which may cause the deep forest model to overfit and results in poor classification performance. This thesis analyzes these two problems in depth as follows:

(1) To solve the problem of class imbalance in small datasets, the strategy of building trees by class in multi-grained scanning is studied, and the Skip Connection Forest (SCForest) model is proposed. Adding skip connections to the cascade forest effectively alleviates feature disappearance or feature explosion as the feature vector is passed on through the cascade. Five types of classifiers are used in each cascade layer to improve ensemble diversity, and the standard deviation of the top k most important features is used as an enhanced feature, which optimizes the transmission of effective features during model learning. The experimental results show that the proposed SCForest model can effectively avoid the influence of class imbalance in the classification of small datasets, especially high-dimensional and multi-class datasets, which improves the generalization ability of the model on small datasets.

(2) To solve the problem of poor diversity in small datasets, a generative adversarial network, which performs well at generating artificial samples, is used to obtain weakly labelled generated data with the same distribution as the original data. Based on SCForest, the Joint Learning Forest (JLForest) model is proposed. JLForest dynamically updates the weak labels of the generated data using the first i cascade layers until the labels reach a certain level of confidence. A joint loss function is designed so that the cascade forest can be trained on both the original data and the generated data. The experimental results show that generated data used as additional data give slightly worse classification results than real data used as additional data, and that JLForest obtains the best classification performance on these datasets when an appropriate data generation rate is set for each small dataset.

In this thesis, the deep forest model is studied for the problem of small-dataset classification. Using the strategy of building trees by class, we propose SCForest to solve the problem of class imbalance. By further improving the cascade forest, we improve the transmission efficiency of effective features. Then, based on the SCForest model, we propose the JLForest model, a joint training strategy that increases data diversity by adding generated samples. Experiments show that JLForest can improve the classification accuracy of small datasets by adding a certain amount of generated data. This method provides a new solution for special industries that can only obtain a small amount of training data. Based on the data classification results, enterprises can carry out subsequent customer behavior analysis and precision marketing.
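
The abstract does not give the exact construction used in SCForest, so the following is only a minimal sketch, under stated assumptions, of how a cascade layer's output might be assembled with a skip connection and a top-k standard-deviation feature. The function names (`top_k_std`, `next_layer_input`), the choice to carry an earlier layer's class-probability vector as the skip connection, and the use of the population standard deviation are all illustrative assumptions, not the thesis's implementation.

```python
from statistics import pstdev


def top_k_std(sample, importances, k):
    """Standard deviation of the k features with the highest importance scores.

    `importances` is assumed to hold one score per feature, for example
    from a trained forest; ties keep their original index order.
    """
    top = sorted(range(len(importances)),
                 key=lambda i: importances[i], reverse=True)[:k]
    return pstdev([sample[i] for i in top])


def next_layer_input(raw, class_probs, skip_probs, importances, k):
    """Assemble the feature vector passed to the next cascade layer.

    Assumed layout (hypothetical):
      raw features + this layer's class probabilities
      + class probabilities from an earlier layer (skip connection)
      + one enhanced feature (std of the top-k important features).
    """
    return (
        list(raw)
        + list(class_probs)
        + list(skip_probs)
        + [top_k_std(raw, importances, k)]
    )


if __name__ == "__main__":
    sample = [1.0, 5.0, 3.0, 9.0]
    importances = [0.1, 0.4, 0.2, 0.3]
    print(next_layer_input(sample, [0.6, 0.4], [0.7, 0.3], importances, k=2))
```

In a standard cascade forest only the raw features and the previous layer's class vector are concatenated; the two extra terms here stand in for the skip connection and the enhanced feature that the abstract describes.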

Keywords/Search Tags:

Small-scale Dataset, Deep Forest, Generative Adversarial Network, Diversity, Generated Data

Related items

1	Stereo Target Recognition For Small Dataset Based On Generation Network And Classifiers Fusion
2	Research Of SAR Image Data Diversity And Data Augmentation Method
3	Research Of Crowd Density Estimation Based On Generative Adversarial Network
4	High-Resolution Realistic Image Synthesis From Text Description Using Iteratively Generative Adversarial Network
5	Data Privacy Masking Of Text Sequence Dataset Based On Generative Adversarial Network
6	Research On Image Data Generation Technology Based On Generative Adversarial Network
7	Research On Data Generation Model Based On Generative Adversarial Network
8	Image Diversity Generation Based On Generative Adversarial Network
9	Research And Application Of Generated Anti-Neural Network In Intrusion Detection
10	Research On HRRP Generation Method Based On Generative Adversarial Networks