
Research And Application Of Multi-trick Hybrid On Long-tail Distribution Data In The Field Of Image Recognition

Posted on: 2024-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Yan
GTID: 2568307064996909
Subject: Computer technology
Abstract/Summary:
In traditional image recognition tasks, the training data is usually balanced artificially, so the number of samples does not differ noticeably between classes. A balanced dataset reduces the reliance on algorithmic robustness and helps ensure the reliability of the model, but as the number of categories grows, the collection cost of keeping the categories balanced increases exponentially. Collecting data at random from natural sources, without paying that cost, inevitably leads to a situation in which some categories have far too many samples and others far too few. This is called the "long tail" problem. If the long-tail problem is ignored, the classifier will favor the predictions that maximize overall accuracy, that is, it will label almost all data as the majority category, which degrades performance. The long-tail problem is also extremely common in facial expression recognition: some expressions, such as fear and disgust, are underrepresented in most datasets, so solutions to the long-tail problem need to be proposed and applied in this field.

There are many solutions in the long-tail area, and this thesis classifies them into two types. The first is simple tricks that improve the training process, abbreviated here as "tricks". The second is relatively complex methods, which often require changes to the network or propose an entirely new kind of model. The latter can yield excellent results in some scenarios, but the former is the focus of this thesis because such tricks are simple, usable, and easy to implement. Despite these advantages, there is currently no good standard for choosing among specific tricks. The main goal of this thesis is to compare several common tricks in the long-tail field and find a more general best solution. The work can be divided into the following aspects:

1. Collecting the dominant tricks in the long-tail field, such as resampling, reweighting, and data mixup. Several specific implementations are selected for each trick, and comparative experiments are conducted on the long-tail version of the public CIFAR dataset to compare the basic effect of each method. The experimental conclusions include: when the long-tail problem is severe, resampling is less effective than mixup; no single solution is best for all datasets; not all tricks improve on the baseline model; and two-stage training with resampling or reweighting works better than a single trick. Based on these conclusions, the thesis compares combinations of tricks, such as resampling with reweighting and resampling with data augmentation, and analyzes each result in detail. The analysis not only searches for the best combination but also discusses whether using multiple tricks together has a positive effect and how the tricks influence the model. The results provide theoretical support for the new data augmentation method proposed in this thesis: combining resampling with data augmentation works best, while the other combinations are lackluster and can even have negative effects.

2. A new data augmentation method, CAM-based Cut Mix, is proposed. Building on the preceding analysis, it is combined with resampling and two-stage training. It achieves the best results among both the single tricks and the combined tricks, with an error rate of 18.55 on the CIFAR50-LT-IF50 dataset when paired with class-balanced sampling, and 22.90 on the CIFAR50-LT-IF100 dataset. The idea of active learning is then introduced into the CAM-Cut Mix method: the entropy is calculated during data generation and high-entropy data is retained, in order to continuously increase the information content of the generated data. The feasibility of this variant, CAM-entropy-Cut Mix, is explored, but the experimental results show that it does not work well: its top-1 error is higher than that of the CAM-Cut Mix method alone, indicating that an excessive pursuit of high-quality information weakens the robustness of the model. It is therefore concluded that data augmentation should not only focus on preserving complete features in the generated images, but should also ensure that the data is sufficiently diverse.

3. The combination of methods from steps 1 and 2 is applied to facial expression recognition. The FER model Efficient Face is used as the baseline, and its accuracy is increased by 1% using the method in this thesis. It also exceeds other classical facial emotion recognition methods, which demonstrates the effectiveness, usability, and generalizability of the proposed method.
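The long-tailed splits, class-balanced sampling, and entropy score mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how its long-tail CIFAR splits were built, so the exponential decay profile, the function names (`long_tail_counts`, `class_balanced_weights`, `entropy`), and the toy sizes are hypothetical. The sketch does not implement CAM-based Cut Mix or the thesis's training pipeline.

```python
import math
import random
from collections import Counter


def long_tail_counts(n_max, num_classes, imbalance_factor):
    """Per-class sample counts that decay exponentially from n_max
    down to n_max / imbalance_factor (an assumed, commonly used profile)."""
    return [
        int(n_max * (1.0 / imbalance_factor) ** (i / (num_classes - 1)))
        for i in range(num_classes)
    ]


def class_balanced_weights(labels):
    """Per-sample sampling weights such that every class receives the
    same total probability mass, regardless of how many samples it has."""
    counts = Counter(labels)
    num_classes = len(counts)
    return [1.0 / (num_classes * counts[y]) for y in labels]


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Toy long-tailed label set: 10 classes, imbalance factor 50 (illustrative sizes).
counts = long_tail_counts(n_max=500, num_classes=10, imbalance_factor=50)
labels = [c for c, n in enumerate(counts) for _ in range(n)]

# Draw a class-balanced resample of the labels with replacement.
weights = class_balanced_weights(labels)
rng = random.Random(0)
resampled_counts = Counter(rng.choices(labels, weights=weights, k=10_000))
```

With these weights, head and tail classes are drawn at roughly the same rate, which is the effect class-balanced sampling aims for. The `entropy` helper shows the kind of score an entropy-based filter could rank generated samples by; how the thesis actually applies it is not specified in the abstract.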
Keywords/Search Tags: Image recognition, Long-tail distribution, Facial emotion recognition, Two-stage training, Class activation mapping