| Tumors are a major threat to human health, and researchers have long sought the best way to cure them. However, there are many types of tumors, a single type can be divided into many subtypes, and different subtypes require different treatments. Accurate and rapid tumor classification can therefore maximize the therapeutic effect and prolong or even save patients' lives. Classifying tumors from their gene expression profiles is a newer approach that is faster, can be automated, and saves considerable manpower and material resources, and it has become a research hotspot in tumor classification. However, most traditional machine learning methods achieve low classification accuracy on tumor gene expression profiles, so a better-suited classification algorithm is needed. To improve classification accuracy, we carried out three studies, as follows: 1. We propose a dictionary learning classification algorithm based on discriminant projection. Dictionary learning is well suited to gene expression profile data, but the general dictionary learning model focuses only on how well the trained dictionary reconstructs the samples and ignores how well it discriminates between them. To address this problem, this paper designs a dictionary learning classification model based on discriminant projection. During training, we learn a sub-dictionary for each class of training samples, and each sub-dictionary is constrained to reconstruct only samples of its own class. A projection matrix is learned together with the dictionary; projecting samples with this matrix widens the distance between samples of different classes. Finally, a test sample is assigned to the class whose sub-dictionary yields the smallest reconstruction error. Experimental results on multiple public datasets show that the classification accuracy of this method is higher than that of current mainstream methods. 2. We propose a dictionary learning classification algorithm that incorporates ensemble learning. Because a single dictionary learning classifier has weak classification ability, we combine ensemble learning with dictionary learning. We randomly draw a subset of genes from all the genes in the training samples and use the selected genes to train a specially designed dictionary learning model as a weak classifier. In the experiments, multiple such weak classifiers are trained, and the class of a test sample is determined by a vote among them. The experimental results show that the classification performance of this method is substantially better than that of other methods. 3. We propose a feature selection process for gene expression data. To deal with the redundancy and noise in gene expression profiles, the process is designed to screen out the key genes. First, outliers in the data are replaced with reasonable values, and the data are then normalized. Next, a random sequence method and a sample distance method are each used to screen out a certain number of key genes, and the intersection of the two sets is taken as the final set of key genes. Compared with other conventional feature gene selection methods, the proposed method achieves higher classification accuracy while selecting fewer key genes. |
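
As a rough illustration of the decision rules in studies 1 and 2, the sketch below assigns a sample to the class with the smallest reconstruction error and then combines several such classifiers, each trained on a random subset of genes, by majority vote. It is not the model proposed in this work: the learned sub-dictionaries and the projection matrix are replaced by a simple stand-in, in which the training samples of each class serve directly as dictionary atoms and the coding step is ridge-regularized least squares. The names `residual_classify` and `ensemble_predict`, and the parameters `ridge`, `n_classifiers`, `n_genes`, and `seed`, are assumptions made for this example.

```python
import numpy as np


def residual_classify(X_train, y_train, x, ridge=1e-3):
    """Assign x to the class whose samples reconstruct it with the least error.

    X_train: (n_samples, n_genes) array, y_train: (n_samples,) labels,
    x: (n_genes,) test sample. Stand-in assumption: the training samples of
    each class act as the atoms of that class's sub-dictionary.
    """
    best_label, best_err = None, np.inf
    for label in np.unique(y_train):
        # Columns of D are the atoms (training samples) of this class.
        D = X_train[y_train == label].T
        # Ridge-regularized least-squares code of x over the atoms.
        gram = D.T @ D + ridge * np.eye(D.shape[1])
        code = np.linalg.solve(gram, D.T @ x)
        # Reconstruction error of x using this class only.
        err = np.linalg.norm(x - D @ code)
        if err < best_err:
            best_label, best_err = label, err
    return best_label


def ensemble_predict(X_train, y_train, x, n_classifiers=25, n_genes=20, seed=0):
    """Majority vote over weak classifiers built on random gene subsets."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_classifiers):
        # Each weak classifier sees only a random subset of the genes.
        genes = rng.choice(X_train.shape[1], size=n_genes, replace=False)
        votes.append(residual_classify(X_train[:, genes], y_train, x[genes]))
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]
```

Calling `ensemble_predict(X_train, y_train, x)` with an expression matrix of shape `(n_samples, n_genes)` returns the label that wins the vote. `n_genes` must not exceed the number of genes in the data. Ties are resolved in favor of the smallest label, which is a property of this sketch rather than of the method described above.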