
Gene Storage Method And Topological Data Analysis For Breast Cancer Data

Posted on: 2020-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
GTID: 2404330611999584
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Breast cancer is a malignant tumor that arises in the epithelial tissue of the breast gland. In the national tumor registration areas, its incidence ranks first among malignant tumors in women, so storing and predicting breast cancer information is of great significance. mRNA expression and mammography imaging can both be used for early diagnosis of breast cancer. This thesis presents a complete pipeline for the storage, factor screening, and classification of breast tissue data.

Based on mRNA expression data from breast cancer tissues and normal tissues and on mammography images, breast tissue information is stored in test tubes as genetic information. The digital information is converted into a ternary gene code, the long gene chain is divided end to end into gene fragments, and forward primers, reverse primers, and error-correction positions are added. Because a single information storage test tube is "insecure", the thesis adopts a distributed storage method: the information is stored in several test tubes, and part of the information in each tube is discarded according to a congruence-based rule, so that the original information can be restored only when all the test tubes are present. After a certain amount of artificial disturbance is added, the gene sequences in the gene pool are compared with the original information one by one. Computer simulation shows that the error rate is very low, the robustness is high, and the security is strong.

Topological data analysis is performed on an open-source breast tissue data set (mRNA expression data from different breast tissues), with the linear discriminant method used for dimensionality reduction. A simplicial complex and its filtered simplicial complex are constructed from the 1133-dimensional mRNA data, the boundary operators are computed to find all edges of the filtered simplicial complex, and the Betti numbers and topological features are obtained from the differences between the ranks of the simplified boundary operators. The Betti numbers and topological features of cancer tissues and normal tissues are compared, and homology groups whose persistent homology barcodes exceed a specified parameter are sought for cancer tissues. A total of 53 mRNAs correspond to these homology groups, of which 43 are supported by the literature, an accuracy rate of 81.13%.

Breast tissue is then classified on the selected topological target data. The decision tree, kNN, random forest, AdaBoost, and GBDT methods give accuracy rates of 0.975, 0.75, 1.0, 1.0, and 1.0, respectively, so the random forest, AdaBoost, and GBDT methods classify the mRNA expression data set best. The deep neural network algorithm, however, cannot effectively extract features from one-dimensional data, so it does not classify well. In this thesis, medical data are stored in genetic form and disease targets are obtained through persistent homology screening, which provides a basis for early diagnosis.
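
The Betti-number step mentioned in the abstract can be illustrated on a toy example. The following is a minimal sketch, not the thesis's implementation: it assumes real coefficients and uses the standard identity b_k = dim C_k - rank(d_k) - rank(d_{k+1}). The function names `boundary_matrix` and `betti_numbers` and the two triangle complexes are hypothetical choices made for illustration only.

```python
import numpy as np


def boundary_matrix(k_simplices, faces):
    """Matrix of the boundary operator from k-simplices to (k-1)-simplices.

    Simplices are sorted tuples of vertex ids; `faces` lists the
    (k-1)-simplices that index the rows.
    """
    row = {s: i for i, s in enumerate(faces)}
    mat = np.zeros((len(faces), len(k_simplices)))
    for j, simplex in enumerate(k_simplices):
        for pos in range(len(simplex)):
            face = simplex[:pos] + simplex[pos + 1:]  # drop one vertex
            mat[row[face], j] = (-1) ** pos           # alternating sign
    return mat


def betti_numbers(simplices_by_dim):
    """Betti numbers b_0..b_top from ranks of the boundary operators.

    simplices_by_dim[k] is the list of k-simplices (assumed to form a
    valid simplicial complex). Ranks are taken over the reals.
    """
    top = len(simplices_by_dim) - 1
    # rank[k] = rank of d_k; d_0 and d_{top+1} are zero maps.
    rank = [0] * (top + 2)
    for k in range(1, top + 1):
        if simplices_by_dim[k] and simplices_by_dim[k - 1]:
            d_k = boundary_matrix(simplices_by_dim[k], simplices_by_dim[k - 1])
            rank[k] = int(np.linalg.matrix_rank(d_k))
    return [
        len(simplices_by_dim[k]) - rank[k] - rank[k + 1]
        for k in range(top + 1)
    ]


# Toy complexes (hypothetical data): a hollow and a filled triangle.
vertices = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]
hollow = [vertices, edges]
filled = [vertices, edges, [(0, 1, 2)]]

print(betti_numbers(hollow))
print(betti_numbers(filled))
```

The hollow triangle has one connected component and one loop, while filling in the 2-simplex removes the loop. In a persistent homology setting the same computation would be repeated along the filtration, which this sketch does not attempt.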
Keywords/Search Tags: gene storage, topological data analysis, persistent homology, breast cancer