| Weakly supervised learning differs from supervised learning, which fully trusts the label information of the training instances, and from unsupervised learning, which uses only their feature information. It assumes that some instance labels are missing, unreliable, or wrong. The key to weakly supervised learning is therefore to mine effective information from inaccurately labeled instances and to build models from it. Partial label learning is a weakly supervised learning framework of the inaccurate-supervision type. In partial label learning, the label of a training sample is not unique and explicit; instead, each sample carries a set of candidate labels that contains inaccurate label information. The goal of partial label learning is to mine the candidate label sets and build a label prediction model that determines the labels of new samples. Most partial label learning algorithms generate candidate label sets for training by randomly adding extra labels to the original labels. This does not make good use of the reliable prior information available for some samples, and it yields models poorly suited to realistic situations in which some samples have confusable candidate labels. In addition, many partial label learning algorithms use only nearest-neighbor nodes to construct the graph, ignoring the importance of reliable sample information. At the same time, many real-world datasets have label information that is not fully accurate, so partial label learning methods that can extract valid information from such labels have wider applicability. For example, because manual labeling is costly, only part of the instances in a dataset can be guaranteed to be labeled; and because manual labeling demands a high level of expertise from the labelers, some of the data have ambiguous labels. In particular, in the field of psychiatric diagnosis, highly heterogeneous psychiatric disorders can have
overlapping symptoms and similar disorders, making it difficult for doctors to confirm a diagnosis. Partial label learning is therefore well suited to extracting biotypes of psychiatric disorders from neuroimaging data. The main research of this dissertation is as follows: (1) A new Instance-based Nearest Neighbor Propagation-based Partial Label Learning (INNPL) algorithm is proposed to address the problem of how to generate more realistic candidate label sets and effectively utilize the information of highly reliable samples. INNPL makes better use of a priori information when generating candidate label sets: highly reliable samples keep only their original labels, with no random labels added, while for the other samples highly confusable labels are added to the original labels to form the candidate label sets. It then iteratively builds and updates the graph structure based on nearest-neighbor samples and highly reliable samples, and progressively labels the samples in a hierarchical manner. Finally, a reliable classification model and classification results are obtained. (2) The effectiveness of the INNPL algorithm was validated on nine different kinds of data and compared comprehensively with seven partial label learning methods. The evaluation results confirm that INNPL achieves better classification results than the other methods. (3) The INNPL algorithm was applied to psychiatric data to identify psychiatric biotypes. Based on fMRI data from 113 bipolar disorder with psychosis (BPP) patients, 113 schizoaffective disorder (SAD) patients, 113 schizophrenia (SZ) patients, and 113 healthy controls (HC), we obtained meaningful biotypes with INNPL, showing significant differences between the identified biotypes and functional connectivity (FC). |
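
The candidate-label-set generation described in point (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the function name `build_candidate_sets`, the `reliable` flags, the `n_extra` parameter, and the pairwise `confusion` scores are all hypothetical, since the abstract does not say how INNPL measures sample reliability or label confusion.

```python
def build_candidate_sets(labels, reliable, confusion, n_extra=1):
    """Build one candidate label set per sample.

    labels    : list of original labels, one per sample
    reliable  : list of bools; True marks a highly reliable sample (assumed given)
    confusion : dict mapping label -> {other_label: confusion score} (assumed given)
    n_extra   : number of confusable labels added to each non-reliable sample
    """
    candidate_sets = []
    for label, is_reliable in zip(labels, reliable):
        # Every sample keeps its original label.
        candidates = {label}
        if not is_reliable:
            # Rank the other labels by how easily they are confused with
            # the original label, highest score first (ties broken by name).
            ranked = sorted(
                (other for other in confusion[label] if other != label),
                key=lambda other: (-confusion[label][other], other),
            )
            candidates.update(ranked[:n_extra])
        candidate_sets.append(candidates)
    return candidate_sets


# Toy example using the diagnostic groups named in the abstract;
# the confusion scores are invented purely for illustration.
confusion = {
    "BPP": {"SAD": 0.6, "SZ": 0.3, "HC": 0.1},
    "SAD": {"BPP": 0.5, "SZ": 0.4, "HC": 0.1},
    "SZ": {"SAD": 0.6, "BPP": 0.3, "HC": 0.1},
    "HC": {"BPP": 0.4, "SAD": 0.3, "SZ": 0.3},
}
labels = ["BPP", "SZ", "HC"]
reliable = [True, False, False]
candidate_sets = build_candidate_sets(labels, reliable, confusion)
```

In this sketch the reliable sample keeps a single-label candidate set, while each remaining sample receives its original label plus the label it is most easily confused with, in contrast to the random label augmentation the abstract criticizes.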