
Research On Semi-supervised Classification Based On Tumor Gene Expression Data

Posted on: 2018-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: D Chen
GTID: 2348330542960048
Subject: Information and Communication Engineering
Abstract/Summary:
With advances in microarray technology, gene expression profiles have shown great potential for predicting tumor subtypes. Accurately predicting the tumor category from gene expression data helps in selecting an appropriate treatment plan for each patient. Nevertheless, small sample size remains a bottleneck in designing suitable classifiers. Traditional supervised classifiers can only work with labeled data, so the large number of gene expression profiles that lack adequate follow-up information are disregarded. Semi-supervised classifiers, which incorporate the distribution information of unlabeled gene expression profiles, have been shown to significantly improve the classification performance and generalization ability of the classification model. This thesis presents an in-depth analysis of semi-supervised classification based on gene expression profiles and proposes improved semi-supervised classification algorithms to raise classification performance and generalization ability. The main work is as follows.

The Transductive Support Vector Machine (TSVM) must estimate the distribution of the unlabeled samples from the distribution of the labeled samples over the whole sample space. When the labeled set is small and the unlabeled samples are distributed differently from the labeled samples, this estimate is prone to large errors. This thesis proposes a Progressive Filtering Transductive Support Vector Machine (PL-TSVM), which labels the unlabeled samples through progressive filtering. This avoids the loss of learner performance caused by errors in estimating the data distribution over the sample space. It also filters out, to a certain degree, semi-labeled samples whose labels are not necessarily accurate, so that the samples newly added to the working set are more likely to carry correct labels. This in turn reduces error accumulation and improves learner performance. PL-TSVM effectively addresses the problem of unbalanced distributions between unlabeled and labeled samples in semi-supervised learning. Simulation experiments on four publicly available gene expression data sets show that PL-TSVM performs significantly better than TSVM and S4VM when the distributions of the unlabeled and labeled samples are unbalanced.

Because different misclassified samples carry different misclassification costs, a cost-sensitive strategy is introduced into the PL-TSVM algorithm. The kernel distance between each sample and the center of its class is evaluated and used to assign a sample-specific misclassification cost, yielding the Cost Sensitive Progressive Filtering Transductive Support Vector Machine (CS-PL-TSVM). Simulation experiments on gene expression data verify the superiority of this method.
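The cost-sensitive step described above relies on the kernel distance between a sample and the center of its class. The following is a minimal sketch of how such a distance can be computed in feature space without an explicit feature map, using the identity d²(x, c) = k(x, x) − (2/n) Σᵢ k(x, xᵢ) + (1/n²) Σᵢ Σⱼ k(xᵢ, xⱼ). The abstract does not state which kernel is used or how the distance is converted into a cost, so the RBF kernel, the `gamma` value, the `floor` parameter, and the linear distance-to-cost mapping in `misclassification_costs` are all illustrative assumptions, not the thesis's actual formulation.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel. The kernel choice and gamma are assumptions."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_distance_to_center(x, class_samples, kernel=rbf_kernel):
    """Feature-space distance between phi(x) and the mean of phi(class_samples)."""
    n = len(class_samples)
    k_xx = kernel(x, x)
    k_xc = sum(kernel(x, s) for s in class_samples) / n
    k_cc = sum(kernel(s, t) for s in class_samples for t in class_samples) / (n * n)
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(k_xx - 2.0 * k_xc + k_cc, 0.0))


def misclassification_costs(class_samples, kernel=rbf_kernel, floor=0.1):
    """Hypothetical mapping: samples nearer the class center get a higher cost.

    The most distant sample receives `floor`; a sample sitting exactly on the
    center would receive 1.0. This mapping is an assumption for illustration.
    """
    dists = [kernel_distance_to_center(x, class_samples, kernel)
             for x in class_samples]
    d_max = max(dists)
    if d_max == 0.0:
        return [1.0 for _ in dists]
    return [floor + (1.0 - floor) * (1.0 - d / d_max) for d in dists]


# Toy data (made up): three tightly grouped samples and one outlier.
samples = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0]]
costs = misclassification_costs(samples)
```

In this toy example the outlier `[3.0, 3.0]` lies farthest from the class center and therefore receives the lowest cost. Per-sample weights of this kind could then be passed to any SVM solver that accepts them; how CS-PL-TSVM actually uses the costs is not described in the abstract.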
Keywords/Search Tags: semi-supervised classification, transductive, progressive filtering, cost-sensitive