Error-Correcting Output Codes (ECOC) is an ensemble learning framework for multi-class classification. Its basic idea is "divide and conquer": the multi-class problem is decomposed into several binary classification problems, each of which is simpler to solve, and the classifiers trained on these binary problems are then combined into the final multi-class classifier. Although ECOC is a relatively mature technique, it often proves ill-suited when applied to a specific domain. This thesis aims to make ECOC adaptive in two settings where the standard framework falls short: partial label learning and targeted drug combination. The main contributions of this thesis are as follows:
1. In partial label learning, the ECOC coding matrix cannot be generated adaptively. This thesis therefore proposes a method that applies cluster analysis to the samples to extract information about their feature distribution and label distribution, and combines it with a random flip coding strategy to generate an adaptive coding matrix. The resulting matrix retains sufficient prior information about the samples while increasing the distance between its rows, and thus the diversity of the codewords.
2. To address the weak performance of the base classifiers associated with the coding matrix, this thesis proposes an adaptive greedy feature selection algorithm, which improves the base classifiers by adaptively selecting the training samples used for each column. Experiments show that the proposed algorithm has some performance advantages over current partial label learning algorithms.
3. Because ECOC cannot be applied directly to the problem of targeted drug combination, this thesis designs a homogeneous ensemble ECOC. The features of the data are first pre-selected to retain the most influential ones, and independent component analysis (ICA) is then used to reduce the dimensionality of the selected features, yielding several feature representations of the same data set. Coding matrices are generated with the one-versus-one (OVO) and one-versus-all (OVA) strategies, the data are trained under both the OVO and OVA coding matrices using the same base classifier, and the resulting models are finally integrated into an ensemble ECOC algorithm. This algorithm performs substantially better than classic machine learning algorithms.
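The core ECOC mechanics behind these contributions can be sketched with the two standard coding matrices the abstract names. The following is a minimal illustration, not the thesis's method. It builds an OVA matrix (+1 for one class, -1 for the rest) and a ternary OVO matrix (one column per class pair, 0 for classes a column ignores). It then trains one binary classifier per column and decodes by choosing the class whose codeword is closest to the column outputs. Several pieces are assumptions introduced here for illustration: the helper names (`ova_matrix`, `ovo_matrix`, `fit_ecoc`, `ecoc_distances`, `ensemble_predict`), the least-squares linear base classifier, the decoding rule that skips zero entries, and the rule that combines OVA and OVO by summing row-normalised distances. Feature pre-selection, ICA, clustering and random flipping are omitted.

```python
import numpy as np
from itertools import combinations


def ova_matrix(k):
    """One-versus-all coding: column j separates class j (+1) from all others (-1)."""
    return 2 * np.eye(k, dtype=int) - 1


def ovo_matrix(k):
    """One-versus-one coding: one column per class pair; 0 marks classes a column ignores."""
    pairs = list(combinations(range(k), 2))
    M = np.zeros((k, len(pairs)), dtype=int)
    for j, (a, b) in enumerate(pairs):
        M[a, j], M[b, j] = 1, -1
    return M


class LeastSquaresLinear:
    """Hypothetical stand-in base classifier: linear least-squares fit to +/-1 targets."""

    def fit(self, X, t):
        Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
        self.w, *_ = np.linalg.lstsq(Xb, t, rcond=None)
        return self

    def decision(self, X):
        return np.hstack([X, np.ones((len(X), 1))]) @ self.w


def fit_ecoc(X, y, M, make_clf=LeastSquaresLinear):
    """Train one binary classifier per column of the coding matrix M (same base learner)."""
    models = []
    for j in range(M.shape[1]):
        code = M[y, j]    # the code of each sample's class in column j
        mask = code != 0  # OVO columns only train on the two classes they compare
        models.append(make_clf().fit(X[mask], code[mask].astype(float)))
    return models


def ecoc_distances(X, M, models):
    """Distance between each sample's column outputs and every class codeword."""
    s = np.sign(np.column_stack([m.decision(X) for m in models]))  # shape (n, L)
    # Each column with a nonzero code adds 0 on agreement and 1 on disagreement.
    # Zero entries are masked out, so they never count toward a class's distance.
    return (np.abs(M)[None] * (1 - M[None] * s[:, None, :]) / 2).sum(axis=2)


def ensemble_predict(X, parts):
    """Hypothetical integration: sum row-normalised distances over (M, models) pairs."""
    total = sum(ecoc_distances(X, M, models) / np.abs(M).sum(axis=1) for M, models in parts)
    return np.argmin(total, axis=1)


# Toy data: three well-separated Gaussian clusters in 2-D.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + rng.normal(scale=0.5, size=(150, 2))

parts = [(M, fit_ecoc(X, y, M)) for M in (ova_matrix(3), ovo_matrix(3))]
pred = ensemble_predict(X, parts)
print("training accuracy:", (pred == y).mean())
```

Normalising each distance by the number of nonzero entries in a codeword puts OVA distances (k columns per class) and OVO distances (k - 1 columns per class) on the same [0, 1] scale before they are summed. Under this sketch's assumptions, each additional feature representation (for example an ICA-reduced view) would contribute one more `(M, models)` pair to `parts`.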