
Research On The Algorithm Of Breast Cancer Gene Recognition Based On Co-Change Network

Posted on: 2021-03-26  Degree: Master  Type: Thesis
Country: China  Candidate: Y Feng  Full Text: PDF
GTID: 2370330605472973  Subject: Software engineering
Abstract/Summary:
Breast cancer is a malignant tumor whose threat to women's health grows year by year. Its incidence is shifting toward younger women and is rising markedly, especially in China. Predicting the pathogenic genes of breast cancer is therefore an important open problem. Most current analyses of breast cancer pathogenic genes start from a single type of gene feature, which is no longer sufficient. To improve the accuracy of gene recognition, this thesis uses three types of breast cancer feature data: copy number, methylation, and gene expression. Differential genes are first screened at the single-feature level, and a breast cancer gene co-change network is then constructed so that interactions between genes are taken into account when predicting breast cancer-related genes. The thesis completes the following work.

First, breast cancer genomics data are acquired and processed, including gene copy number data, methylation data, gene expression data, protein interaction data, and clinical information.

Second, each single feature of breast cancer (gene expression, copy number, and methylation) is analyzed separately, and differential genes are extracted as the basis for the subsequent joint analysis.

Third, the three kinds of gene features are integrated for joint analysis. Protein interaction data are used to construct a human protein interaction network, which serves as the base network for the experiments. Breast cancer gene data are used to select the breast cancer-related nodes in this network, and canonical correlation analysis is used to compute correlation coefficients between pairs of genes. These coefficients are assigned as weights to the corresponding edges, yielding the breast cancer gene co-change network.

Finally, random walk with restart is used to extract breast cancer-related genes. Known breast cancer genes are downloaded from the COSMIC database and used as seed genes, which serve as the starting nodes of the random walk. The remaining nodes are ranked by their walk probability, and high-ranking nodes are the genes most strongly associated with the seed genes. Enrichment analysis and survival analysis are then carried out on the selected genes, and the results are checked against previous studies.

The predictions obtained from the breast cancer gene co-change network include the established breast cancer genes ERBB2 and GATA3. Survival analysis and GO enrichment analysis further suggest that ID4 and NTHL1 are likely key genes in the occurrence and development of breast cancer, which supports the accuracy of the algorithm.
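
The pipeline described in the abstract (canonical correlation as edge weight, then random walk with restart from seed genes) can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation. The abstract does not say how canonical correlation analysis is applied to a gene pair, so the sketch assumes each gene is represented by a samples x 3 matrix (expression, copy number, methylation) and takes the first canonical correlation as the edge weight. The function names, the restart probability of 0.3, the gene labels G1 to G4, and all data are hypothetical.

```python
import numpy as np


def first_canonical_correlation(x, y):
    """Largest canonical correlation between two (samples x features) blocks.

    Assumes both blocks have full column rank and more samples than features.
    """
    # Center each block, then take an orthonormal basis of its column space.
    qx, _ = np.linalg.qr(x - x.mean(axis=0))
    qy, _ = np.linalg.qr(y - y.mean(axis=0))
    # The singular values of qx.T @ qy are the canonical correlations.
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(np.clip(s[0], 0.0, 1.0))


def random_walk_with_restart(weights, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Approximate steady-state probabilities of a walk that restarts at the seeds."""
    weights = np.asarray(weights, dtype=float)
    col_sums = weights.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # isolated nodes keep an all-zero column
    transition = weights / col_sums    # column-normalised transition matrix
    p0 = np.zeros(weights.shape[0])
    p0[seeds] = 1.0 / len(seeds)       # walk starts uniformly on the seed genes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (transition @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy data (entirely synthetic): 4 hypothetical genes, 50 samples,
# 3 features per gene standing in for expression, copy number, methylation.
rng = np.random.default_rng(0)
genes = ["G1", "G2", "G3", "G4"]
features = {g: rng.normal(size=(50, 3)) for g in genes}
# Hypothetical protein-interaction edges between those genes.
ppi_edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G1", "G3")]

# Weight each interaction edge by the first canonical correlation of its endpoints.
index = {g: i for i, g in enumerate(genes)}
weights = np.zeros((len(genes), len(genes)))
for a, b in ppi_edges:
    w = first_canonical_correlation(features[a], features[b])
    weights[index[a], index[b]] = weights[index[b], index[a]] = w

# Rank all genes by walk probability, using G1 as the only seed gene.
scores = random_walk_with_restart(weights, seeds=[index["G1"]])
ranking = sorted(genes, key=lambda g: scores[index[g]], reverse=True)
print(ranking)
```

In a real analysis the seed list would come from known breast cancer genes (the abstract uses COSMIC), and non-seed genes near the top of the ranking would be the candidates passed on to enrichment and survival analysis.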
Keywords/Search Tags: Multi-omics data, Breast cancer, Canonical correlation analysis algorithm, Co-change network, Random walk algorithm