Research On Several Key Problems Of Machine Learning In Computer Vision And Cancer Bioinformatics

Posted on: 2020-05-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X C Yu
Full Text: PDF
GTID: 1368330575981198
Subject: Computer application technology
Abstract/Summary:
Computer vision technologies aim to enable computers, like human beings, to recognize objects in the visual field, perceive the environment, and understand the world around them. In the real world, however, there are always various kinds of interference, such as noise, occlusion, illumination changes, geometric distortion, view change, translation, rotation, affine transformation, scale change, scene layout, and object appearance, which make computer vision tasks difficult to carry out effectively. The development of machine learning methods promotes the solution of key problems in computer vision. Machine learning methods can effectively reduce the influence of interference factors, alleviate model overfitting, and address the difficulty of training models on small samples. With the help of relevant algorithms, techniques, and frameworks in machine learning, this paper is devoted to solving the corresponding problems in computer vision. In addition, cancer bioinformatics, as an interdisciplinary subject of computer science and biology, has attracted more and more researchers' attention. It is of great significance to apply machine learning methods to problems related to cancer classification. Solving problems such as breast cancer subtype classification and cancer staging classification with machine learning can not only help researchers recognize the different functions of breast cancer subtypes, the functional changes across cancer stages, etc., but also assist doctors in diagnosing cancer and providing more accurate treatment for patients.

The main contents and contributions of this paper are as follows.

1. The key issues related to linear subspace learning in face recognition include: 1) the dissimilarity measurement of face images; 2) the spatial structure preservation of face images. In this paper, we carry out specific research on these two problems. For the first problem, this paper
proposes to adopt the improved EMD metric to measure the dissimilarity of PHOG histograms on blocks of face images. The improved EMD aims, on the one hand, to reduce computational complexity and, on the other hand, to remain strongly robust to occlusion, exposure, and other interference factors. For the second problem, face images are divided into blocks, and a PHOG histogram is extracted on each block. In this paper, the improved EMD metric for LPP is proposed. The algorithm first uses the sum of the dissimilarities between PHOG histograms on the corresponding blocks of two face images as the dissimilarity of the two face images, then constructs the adjacency graph G_IE by calculating the K-nearest neighbors, and finally completes subspace learning for face images. In order to better utilize the spatial structure information of face images, the improved EMD metric for BSLPP is also proposed. The algorithm first calculates the sub-adjacency graph G_b on the corresponding blocks in the face images, then combines these sub-adjacency graphs to obtain the final adjacency graph G_IE, and finally conducts subspace learning for face images.

2. The two main issues of visual dictionary learning are: 1) how to determine the size of the visual dictionary, that is, the number of visual words or codewords; 2) how to preserve the spatial structure information of local features. In this paper, we study these two problems specifically. For the first problem, this paper proposes a data-driven two-layer visual dictionary learning framework, which divides the visual dictionary into an attribute layer and a detail layer. In the attribute layer, the latent attributes are determined by a Bayesian nonparametric model, which automatically determines both the number of latent attributes and the complexity of the mixture model according to the data scale, and thereby alleviates the overfitting problem. For the
second problem, this paper uses the pyramid BOF algorithm to preserve the approximate geometric spatial correspondence of each image. Benefiting from the study of these two issues, the framework proposed in this paper achieves high accuracy on the challenging fifteen scene categories dataset.

3. The static nature of human actions such as “Phoning”, “Riding Horse”, and “Running” motivates the study of recognition methods based on static cues. In this paper, a non-sequential convolutional neural network (NCNN) model is constructed for action recognition in still images. The advantages of the model include: 1) Following the transfer learning approach, the pre-trained VGG16 is used to initialize the weights of the convolutional layer module. 2) Data augmentation is used to mitigate overfitting, and Global Average Pooling (GAP) is used to make the model lighter, with the aim of improving its generalization ability. 3) An end-to-end structure is designed for the Baseline CNN model and the NCNN model, making it possible to train the proposed CNN models on a PC with a single CPU. 4) The NCNN model has a non-sequential network topology, which enables it to learn the spatial and channel features of parallel branches separately. Finally, this paper also proposes a model ensemble method, which combines the predictions of the Baseline CNN model and the NCNN model with weighted coefficients to obtain the final prediction.

4. High-weight differentially expressed genes (DEGs) have high discriminatory power and biological importance. Therefore, based on high-weight DEGs, this paper constructs a binary classifier for each breast cancer subtype. The evaluation results of these binary classifiers validate the effectiveness of high-weight DEGs. In this study, we construct gene co-expression networks using high-weight DEGs for the control group and the experimental group of each subtype, respectively, and analyze the differences in the interaction mechanism between the
control group and the experimental group. Based on this discovery, a novel pathway enrichment analysis method, pathway enrichment based on gene co-expression networks (PEGCN), is proposed to analyze pathway enrichment at the level of whether gene co-expression is activated or inhibited. To a certain extent, PEGCN gives a reasonable explanation for the biological function changes of breast cancer subtypes.

5. Cancer staging classification based on gene expression data can explain the functional changes of different cancer stages by examining changes in gene expression values, and can thus help to discover and reveal the mechanism of cancer development and evolution. However, at present, the performance of cancer staging classification based on gene expression data is not satisfactory. One reason may be that one-dimensional gene expression values lack strong discriminatory power. Analyzing the interaction mechanism between genes can therefore enrich the information of a single sample. In this paper, a significantly differential co-expression network (SDCN) is constructed for the control group and the experimental group, respectively. The SDCN is used to reveal significant differences in the interaction network structure between the control group and the experimental group. We further propose the significantly differential co-expression network with sparsity (SDCNS) to extract sparse features, so as to enhance the effectiveness of the cancer staging classification model. SDCN and SDCNS are effective because they extend the information of one-dimensional gene expression values to two-dimensional significantly differential co-expression gene pairs, which improves the discriminability of features. Based on the SDCN structures with significant differences between the control group and the experimental group, this paper proposes an enrichment analysis method, pathway enrichment using co-expressed gene pairs (PEUCGP), which is used to perform
generalized up-regulated and down-regulated co-expression enrichment analysis. Through the cancer staging framework proposed in this paper, classification models for different cancer stages have been constructed, and the evolutionary mechanism of cancer has been explained to some extent.
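
The idea behind a differential co-expression network, moving from per-gene expression values to gene pairs whose co-expression differs between the control and experimental groups, can be illustrated with a minimal sketch. The abstract does not state how significance is assessed, so the code below assumes Pearson correlation and a Fisher z-test on the difference of correlations; this may differ from the dissertation's actual SDCN construction. The function name `differential_coexpression` and the threshold `alpha` are hypothetical.

```python
import math
import numpy as np


def differential_coexpression(control, case, alpha=0.01):
    """Return gene pairs whose correlation differs between two groups.

    control, case: arrays of shape (n_samples, n_genes).
    Assumption (not from the source): Pearson correlation compared with a
    two-sided Fisher z-test, without multiple-testing correction.
    Returns a list of (i, j, r_control, r_case) tuples with i < j.
    """
    n1, n2 = control.shape[0], case.shape[0]
    r1 = np.corrcoef(control, rowvar=False)  # gene-by-gene correlations, control
    r2 = np.corrcoef(case, rowvar=False)     # gene-by-gene correlations, case

    # Fisher z-transform; clip so that |r| = 1 does not produce infinities.
    z1 = np.arctanh(np.clip(r1, -0.999999, 0.999999))
    z2 = np.arctanh(np.clip(r2, -0.999999, 0.999999))

    # Standard error of the difference of two Fisher-transformed correlations.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se

    # Two-sided p-value under the standard normal distribution.
    p = np.vectorize(math.erfc)(np.abs(z) / math.sqrt(2.0))

    # Keep each unordered gene pair once (upper triangle, no diagonal).
    iu, ju = np.triu_indices(control.shape[1], k=1)
    keep = p[iu, ju] < alpha
    return [(int(i), int(j), float(r1[i, j]), float(r2[i, j]))
            for i, j in zip(iu[keep], ju[keep])]


# Synthetic example: genes 0 and 1 are co-expressed only in the control group.
rng = np.random.default_rng(0)
g0 = rng.normal(size=100)
control = np.column_stack([g0, g0 + 0.1 * rng.normal(size=100),
                           rng.normal(size=100)])
case = rng.normal(size=(100, 3))
edges = differential_coexpression(control, case)
print(len(edges), "differentially co-expressed gene pair(s)")
```

Each returned pair could serve as an edge of a differential network, and the sign of the correlation change could separate gained from lost co-expression. In practice a multiple-testing correction would be needed when the number of gene pairs is large.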
Keywords/Search Tags: Machine learning, Computer vision, Cancer bioinformatics, Model overfitting, Deep convolutional neural network