Research On Cancer Subtype Clustering Algorithm Of Gene Expression Profile Data

Posted on: 2021-05-25  Degree: Master  Type: Thesis
Country: China  Candidate: L X Wei  Full Text: PDF
GTID: 2404330605460943  Subject: Computer application technology
Abstract/Summary:
Because of the rapid development of high-throughput gene sequencing technology, large-scale gene expression profile data has been generated. Combining these data with data mining technology to obtain biological knowledge has become a research hotspot in precision medicine. Raw cancer gene expression profile data are high-dimensional, imbalanced in distribution, redundant and structurally complex, which increases the computational cost and reduces the accuracy of high-dimensional clustering analysis. This in turn hampers the exploration of whether cancer subtypes exist and biases the resulting molecular markers. Based on the characteristics of gene expression profile data, we weigh the advantages and disadvantages of various clustering algorithms. The thesis focuses on constructing a framework for cancer-subtype clustering analysis that covers data preprocessing, feature selection, clustering methods and evaluation indexes for clustering results, and it concentrates on two subtype-clustering algorithm models for gene expression profile data.

The first model addresses the density peak clustering algorithm (DPC), which has difficulty selecting cluster centres accurately when gene expression profile data contain multiple high-density points. An improved aggregation density peak clustering algorithm (IA-DPC) is proposed. First, an enhanced aggregation method is constructed as an evaluation function of node importance and used to calculate the local significance of each node. Second, nodes are sorted by importance, and within each cluster the node with the extreme value of the product of node importance and node distance is selected as the cluster centre. Finally, experimental simulations comparing IA-DPC with the DPC and ADPC-KNN algorithms show that the proposed algorithm finds cluster centres more accurately and improves the accuracy of subtype clustering.

The second model addresses the fact that traditional clustering methods cannot give a reasonable biological explanation for cancer subtype clustering and cannot explain interactions between genes. A consensus clustering algorithm with the Davies-Bouldin index (CC-DBI) is proposed. The method uses resampling to select subsets of the data, constructs consensus matrices for the subsets, and repeats the resampling across multiple runs so that the clustering results become consistent. The consensus matrix reflects the affinity between sample points and overcomes the influence of stochastic factors, and the final clustering results are then visualized. The DBI is used to evaluate the clustering results and select the best one.

In summary, the thesis runs experiments on 8 cancer expression profile datasets using the two clustering algorithm models, IA-DPC and CC-DBI. Feature selection is based on differential gene expression, and evaluation indexes are introduced to reflect the quality of the clustering results. The experiments verify that the methods proposed in the thesis are reasonable and effective for identifying new cancer subtypes, and they demonstrate the advantages of the IA-DPC and CC-DBI algorithms in cancer subtype clustering.
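
For orientation, the baseline density peak clustering (DPC) procedure that IA-DPC sets out to improve can be sketched as follows. This is a minimal illustration of standard DPC, not the IA-DPC algorithm: the abstract does not give the aggregation-based node-importance function, so a plain Gaussian-kernel local density stands in for it. The function name `density_peaks`, the 2nd-percentile cutoff heuristic and the fallback for an unlabelled densest point are assumptions made for this sketch.

```python
import numpy as np


def density_peaks(X, n_clusters, dc=None):
    """Baseline DPC sketch (hypothetical helper): returns (labels, centres)."""
    n = X.shape[0]
    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    if dc is None:
        # Assumed heuristic: cutoff at the 2nd percentile of pairwise distances.
        dc = np.percentile(dist[np.triu_indices(n, k=1)], 2)
    # Local density with a Gaussian kernel; subtract the self term exp(0) = 1.
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0
    order = np.argsort(-rho)  # sample indices from densest to sparsest
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    # The densest sample has no denser neighbour; use its largest distance.
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]  # samples ranked as denser than i
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j
    # Decision value: samples that are both dense and far from denser samples.
    gamma = rho * delta
    centres = np.argsort(-gamma)[:n_clusters]
    labels = np.full(n, -1)
    labels[centres] = np.arange(n_clusters)
    # Each remaining sample inherits the label of its nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            j = nearest_higher[i]
            if j >= 0:
                labels[i] = labels[j]
            else:
                # Fallback if the densest sample was not chosen as a centre.
                labels[i] = labels[centres[np.argmin(dist[i, centres])]]
    return labels, centres
```

The product `gamma = rho * delta` plays the role that the abstract assigns to "node importance times node distance"; in IA-DPC the density term would presumably be replaced by the thesis's own importance function.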
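
The resampling and DBI-based selection described for CC-DBI can be illustrated with a generic consensus clustering sketch. It is written under assumptions that the abstract does not confirm: k-means as the base clusterer, 80% subsampling without replacement, average-linkage hierarchical clustering on one minus the consensus matrix for the final partition, and the DBI used to choose the number of clusters. The function names `consensus_matrix`, `consensus_labels` and `select_k_by_dbi` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score


def consensus_matrix(X, k, n_runs=30, frac=0.8, seed=0):
    """Fraction of co-sampled runs in which each pair shared a cluster."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    together = np.zeros((n, n))  # times i and j fell in the same cluster
    sampled = np.zeros((n, n))   # times i and j were drawn in the same subset
    for run in range(n_runs):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = KMeans(n_clusters=k, n_init=10, random_state=run).fit_predict(X[idx])
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    # Pairs that were never drawn together keep a consensus of 0.
    return np.divide(together, sampled, out=np.zeros((n, n)), where=sampled > 0)


def consensus_labels(C, k):
    """Final partition: average-linkage clustering on 1 - consensus."""
    D = 1.0 - C
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=k, criterion="maxclust")


def select_k_by_dbi(X, k_values=(2, 3, 4, 5)):
    """Pick the k whose consensus partition has the lowest Davies-Bouldin index."""
    scores = {}
    for k in k_values:
        labels = consensus_labels(consensus_matrix(X, k), k)
        if len(np.unique(labels)) > 1:  # DBI needs at least two clusters
            scores[k] = davies_bouldin_score(X, labels)
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

A lower DBI indicates compact, well-separated clusters, which is why the sketch takes the minimum. The consensus matrix returned by `consensus_matrix` is also the object that is usually drawn as a heatmap when consensus clustering results are visualized.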
Keywords/Search Tags:Cancer Subtypes, Density Peak Clustering, Consensus Clustering, Differential Gene Expression, Gene Expression Profile Data