Research on a Partial Label Learning Algorithm to Correct the Problem of Data Imbalance

Posted on: 2022-09-17
Degree: Master
Type: Thesis
Country: China
Candidate: P Yi
GTID: 2518306554971269
Subject: Computer technology
Abstract/Summary:
Many real-world scenarios involve samples with uncertain labels: each sample comes with a candidate label set in which only one label is true and the others are false. The main goal of partial label learning is to use the uncertain label information in the training data to obtain a stable classification model. However, few traditional partial label learning algorithms take into account the correlation between labels, the generalization of the classification model, or the imbalance of the data. To address these problems, this thesis makes the following contributions:

1. A Partial Label learning via Feature-Guided Disambiguation (PL-FGD) algorithm is proposed to address the fact that label information is not fully used in partial label learning, and it effectively improves classification accuracy. First, the similarity between sample features is calculated by the least squares method. Then the Pearson correlation coefficient of neighboring samples' labels is used to determine the similarity between samples, and a comprehensive similarity between samples is determined for disambiguation. Finally, in the classification stage, a Bagging strategy is used to build the classification trees. Experiments on UCI datasets (Deter, Segment and Vehicle) and partial label datasets (MSRCv2, Bird Song, Yahoo! News, Soccer Player and Lost), together with comparisons against existing algorithms (PL-LEAF, IPAL, M3PL, PALOC, LSB-CMM and PL-ECOC), show that classification performance is improved.

2. A Partial Label learning based on Balanced Local Linear Embedding (PL-BLLE) algorithm is proposed to address class imbalance in high-dimensional partial label data, and it improves classification accuracy. First, balanced clustering is used to divide the data into intervals so as to minimize the local imbalance coefficient of the sample data. Then neighbor selection is optimized based on the manifold curvature of the feature space and the sample density, which yields the optimal neighbor set for each sample; the dimension-reduced data are obtained by linear reconstruction and by solving for the low-dimensional space. Finally, a multiple regression classifier is used to classify the data after dimensionality reduction. The experimental results show that the proposed algorithm clearly improves classification performance on class-imbalanced datasets.
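The disambiguation idea behind PL-FGD can be illustrated with a minimal Python sketch. It is not the thesis implementation. The abstract names only the ingredients (least squares feature similarity, Pearson correlation of neighbor labels, a comprehensive similarity), so everything else here is an assumption: the function names, the k-nearest-neighbor search, the clipping of negative weights, the mixing weight `alpha`, and the iterative propagation of labeling confidence. The Bagging classification stage is not shown.

```python
import numpy as np

def knn_indices(X, k):
    # Pairwise squared Euclidean distances; a sample is never its own neighbor.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def feature_weights(X, nbrs):
    # Least squares weights that reconstruct each sample from its neighbors.
    n, k = nbrs.shape
    W = np.zeros((n, k))
    for i in range(n):
        A = X[nbrs[i]].T                       # shape (n_features, k)
        w, *_ = np.linalg.lstsq(A, X[i], rcond=None)
        w = np.clip(w, 0.0, None)              # assumption: drop negative weights
        s = w.sum()
        W[i] = w / s if s > 0 else np.full(k, 1.0 / k)
    return W

def label_weights(Y, nbrs):
    # Pearson correlation between candidate-label indicator vectors.
    n, k = nbrs.shape
    R = np.zeros((n, k))
    for i in range(n):
        for j, nb in enumerate(nbrs[i]):
            a, b = Y[i], Y[nb]
            if a.std() > 0 and b.std() > 0:    # correlation undefined otherwise
                R[i, j] = max(np.corrcoef(a, b)[0, 1], 0.0)
    return R

def disambiguate(X, Y, k=2, alpha=0.5, iters=20):
    # Assumption: the comprehensive similarity is a convex mix of both parts.
    nbrs = knn_indices(X, k)
    S = alpha * feature_weights(X, nbrs) + (1 - alpha) * label_weights(Y, nbrs)
    F0 = Y / Y.sum(axis=1, keepdims=True)      # uniform over candidate labels
    F = F0.copy()
    for _ in range(iters):
        prop = np.einsum("ik,ikc->ic", S, F[nbrs])  # neighbors vote
        F = (0.5 * prop + 0.5 * F0) * Y        # keep mass on candidates only
        F = F / F.sum(axis=1, keepdims=True)
    return F.argmax(axis=1)

# Toy data: two clusters, with one ambiguous sample (both labels) in each.
X_toy = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]], dtype=float)
Y_toy = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [1, 1]], dtype=float)
print(disambiguate(X_toy, Y_toy))
```

Each row of `Y_toy` is a candidate label set encoded as a 0/1 indicator vector. The two ambiguous samples receive their label from the unambiguous neighbors in the same cluster.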
Keywords/Search Tags: Weakly supervised learning, Partial label learning, Disambiguation, Local linear embedding, Balanced clustering