Research On Time-Series Data Of Expression Profiles Based On Dual Eigen Analysis Method

Posted on:2023-11-26

Degree:Master

Type:Thesis

Country:China

Candidate:Y B Wang

GTID:2530306617968769

Subject:Probability theory and mathematical statistics

Abstract/Summary:

Type 2 diabetes (T2D) is a complex chronic metabolic disease that involves multiple genes and is usually associated with obesity. Its pathological state is characterized by insulin resistance and pancreatic dysfunction, in which the body loses its metabolic fuel homeostasis. Previous studies have mainly examined the disease at the RNA level, and most of the commonly used research methods have limitations. This paper introduces a new method for analyzing expression data matrices and applies it to time-series data of type 2 diabetes proteomic profiles, which is a practical research topic.

Expression profile data include mRNA expression data and proteomic expression data. The former focuses on cell function, and the latter focuses on structure and function. Proteomic profiling is a technology for measuring protein expression levels. In the past, the algorithms used to process expression data matrices have mainly included unsupervised algorithms such as K-means clustering and supervised algorithms such as the support vector machine. However, the clustering algorithm needs to scan the whole data set many times, and its initial centroids are selected blindly, which leads to inaccurate clustering results. Supervised learning algorithms usually need a large amount of labeled sample data and many iterations to classify all the unlabeled data points, which makes them inefficient on high-throughput data.

To improve the robustness of the experimental method and to cope with the large size and heavy biological noise of expression profile data, this paper introduces a dual eigen analysis method based on singular value decomposition. The method makes a theoretical innovation on the original algorithm. For high-dimensional arrays such as RNA sequencing data or proteomic profile data, the method reduces the dimension directly and extracts the "principal components" at the same time, which improves the efficiency of data processing. The method first performs robust singular value decomposition on the quantile-normalized time-series data, and then sorts the sample eigenvector and the gene eigenvector of each dimension to obtain a polarized eigenvector group for every dimension. Each group includes a polarized gene eigenvector and a polarized sample eigenvector. Observing the bipolar loadings of each sample eigenvector reveals the pattern of the experimental variables (sample type and age). Enrichment analysis based on the Wilcoxon rank-sum test is then performed on the positive and negative genes of the corresponding polarized gene eigenvector, which finally yields biologically significant gene pathways and reveals the relationship between the macroscopic factors of the samples and the microscopic biology of the genes.

In this paper, the dual eigen analysis method is first used to process a yeast cell cycle data matrix, yielding the correspondence between each phase of the cell cycle and the sample eigenvectors. This discovery has important scientific significance. A comparison with the results that the original method, cluster analysis, obtained on the alpha-factor experiment leads to the conclusion that the dual eigen analysis method performs better. Finally, the worked example uses an islet proteomic data set isolated from diseased GK rats and control WST rats. The data set was measured by a TMT-labeling-based proteomic method at five consecutive time points and was processed with the dual eigen analysis method. Further analysis of the time course yielded the differentially expressed genes of GK rats and the biological processes represented by the gene subsets, and the pancreatic cancer model was constructed. The result is a comprehensive framework of islet dysfunction mechanisms, which can help develop potential measures to prevent T2D.
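
The pipeline described above (quantile normalization, singular value decomposition, sorting of the gene and sample eigenvectors into polarized groups) can be illustrated with a minimal sketch. This is not the thesis's implementation: it swaps the robust SVD for NumPy's ordinary SVD, the function names `quantile_normalize` and `polarized_groups` are hypothetical, and the data matrix is synthetic (20 genes by 6 samples, with three genes planted as "up" and three as "down" in the last three samples).

```python
import numpy as np


def quantile_normalize(X):
    """Give every column (sample) of X the same empirical distribution."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)  # rank of each value within its column
    reference = np.sort(X, axis=0).mean(axis=1)        # mean of the sorted columns
    return reference[ranks]


def polarized_groups(X, n_dims=1, top=3):
    """Ordinary SVD (an assumption; the thesis uses a robust SVD), then sort loadings."""
    Xn = quantile_normalize(X)
    Xc = Xn - Xn.mean(axis=1, keepdims=True)           # center each gene across samples
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    groups = []
    for d in range(n_dims):
        gene_order = np.argsort(U[:, d])               # genes sorted by loading on dimension d
        groups.append({
            "singular_value": float(s[d]),
            "negative_genes": gene_order[:top].tolist(),         # negative pole
            "positive_genes": gene_order[-top:][::-1].tolist(),  # positive pole
            "sample_loadings": Vt[d].copy(),                     # one loading per sample
        })
    return groups


# Synthetic data (hypothetical): columns 0-2 are "control", columns 3-5 are "case".
rng = np.random.default_rng(0)
control = np.concatenate([[2.0] * 3, [8.0] * 3, np.linspace(3.0, 7.0, 14)])
case = control.copy()
case[:3] = 8.0    # genes 0-2 are higher in case samples
case[3:6] = 2.0   # genes 3-5 are lower in case samples
X = np.column_stack([control] * 3 + [case] * 3)
X = X + rng.normal(0.0, 0.02, X.shape)                 # small measurement noise
X = X * np.array([0.8, 1.0, 1.2, 0.9, 1.0, 1.1])       # per-sample scale differences

groups = polarized_groups(X, n_dims=1, top=3)
print("positive pole genes:", groups[0]["positive_genes"])
print("negative pole genes:", groups[0]["negative_genes"])
print("sample loadings:", np.round(groups[0]["sample_loadings"], 2))
```

The sign of a singular vector is arbitrary, so which planted gene set lands on the "positive" pole may differ between runs or libraries; only the pairing between a gene pole and the samples with same-signed loadings is meaningful. The enrichment step is not shown; one option, again an assumption rather than the thesis's procedure, would be to compare the loadings of genes inside and outside a pathway with `scipy.stats.ranksums`.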

Keywords/Search Tags:

Expression Profiles Data Research, Data Preprocessing, Dual Eigen Analysis Method, Singular Value Decomposition, Enrichment Analysis

Related items

1	Research On Gene Expression Data Analysis Method And Its Application
2	Research On Regression-based Sparse Matrix Decomposition Method And Its Application In Sequencing Data
3	Research On Seismic Data Denoising Method Based On Matrix Completion And Optimization Theory
4	Seismic Data Reconstruction Based On Multichannel Singular Spectrum Analysis Method
5	Research On The Method Of Prognostic Survival Analysis From Data With High Dimension And Small Sample Size
6	Research On EQTL Data Processing And Functional Analysis Methods
7	Research On GPS Data Pre-Processing Method Based On Wavelet Analysis On RINEX Level And Its Program Realization
8	Non-Negative Matrix Decomposition And Its Application In Gene Expression Data Analysis
9	Research On Method Of 3D Visualization Based On Logging Data
10	Research On Multi-level Preprocessing Method For Complex Fault Trees