| With the emergence of massively parallel sequencing, genome-wide expression data are being generated at an unprecedented scale. This wealth of data has greatly advanced research on genes. Constructing gene coexpression networks with systems biology methods, in order to reveal relationships between genes at the system level, has become an important research field. By constructing a gene coexpression network, we can better understand gene function, biological processes, and the mechanisms of complex diseases. Coexpression network analysis has been widely used to identify which genes participate in cellular activities. In a gene coexpression network, nodes represent genes, and an edge between two genes represents the strength of their coexpression. In this paper, the maize genome was selected as the research object, and a gene coexpression network was constructed from maize RNA-seq data. The main contributions of this paper are as follows:

1. A new data standardization method is proposed. Gene expression data are log-transformed, which turns multiplicative relationships between expression values into linear (additive) ones, and the transformed data are then centered, so that the RNA expression levels of different genes in different samples can be measured on the same scale.
2. A new statistical model for gene coexpression networks is proposed, consisting of two steps: gene similarity measurement and generation of a p-value lookup table. Gene similarity is computed after replacing each gene's expression levels with their ranks, and the p-value lookup table is computed from randomly generated rank sequences.
3. Two correlation measures are proposed: the r-order Pearson correlation coefficient and a correlation coefficient based on the degree of difference. Evaluation with sensitivity, specificity, AUROC, and other metrics shows that, for constructing gene coexpression networks, the difference-based correlation coefficient outperforms the r-order Pearson correlation coefficient, which in turn outperforms the traditional Pearson correlation coefficient.
4. Two algorithms, brute-force relaxation and queue-based relaxation, are proposed to remove indirect correlations between genes. Hierarchical clustering is then performed on the updated correlation coefficient matrix, and GO enrichment analysis shows that the resulting gene modules are functionally enriched. |
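The standardization step (log transformation followed by centering) can be illustrated with a minimal sketch. The abstract does not specify the log base, the pseudocount, or the centering axis; here we assume log2 with a pseudocount of 1 and subtract each gene's mean across samples. The function name `log_center` is hypothetical.

```python
import math


def log_center(expr, pseudocount=1.0):
    """Log-transform a genes x samples matrix, then center each gene.

    expr: list of rows, one row per gene, one non-negative value per sample.
    pseudocount: assumed value added before the log so zero counts stay finite.
    """
    result = []
    for row in expr:
        # log2 turns multiplicative (fold-change) differences into additive ones
        logged = [math.log2(x + pseudocount) for x in row]
        # subtract the gene's own mean so all genes share a common zero point
        mean = sum(logged) / len(logged)
        result.append([v - mean for v in logged])
    return result


# Two genes whose (count + 1) profiles differ only by a constant factor of 2
print(log_center([[1, 3, 7], [0, 1, 3]]))
```

After the transformation both genes map to `[-1.0, 0.0, 1.0]`: the log turns the factor-of-2 scaling into a constant offset, and centering removes that offset, which is the sense in which different genes end up on the same scale.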
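The two-step statistical model (rank-based similarity plus a p-value lookup table built from random rank sequences) might look roughly like the sketch below. This is an illustration under assumptions, not the paper's model: similarity is taken to be Pearson correlation on ranks (Spearman's rho), and the table is an empirical null of |rho| between the identity ranking and shuffled rankings. All function names are hypothetical. A plausible reason a single table suffices is that, without ties, every gene's ranks over n samples form a permutation of 1..n, so the null distribution depends only on n.

```python
import bisect
import math
import random


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Plain Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_similarity(x, y):
    """Pearson correlation computed on ranks (Spearman's rho)."""
    return pearson(ranks(x), ranks(y))


def build_null_table(n_samples, n_perm=10000, seed=0):
    """Sorted |rho| values between the identity ranking and random rank sequences."""
    rng = random.Random(seed)
    base = list(range(1, n_samples + 1))
    null = []
    for _ in range(n_perm):
        perm = base[:]
        rng.shuffle(perm)
        null.append(abs(pearson(base, perm)))
    null.sort()
    return null


def lookup_p_value(rho, null):
    """Two-sided empirical p-value, with +1 smoothing so it is never exactly 0."""
    k = bisect.bisect_left(null, abs(rho))  # number of null values below |rho|
    return (len(null) - k + 1) / (len(null) + 1)
```

In use, the table is built once per sample count, for example `null = build_null_table(10)`, and every gene pair's `rank_similarity` is then converted with `lookup_p_value(rho, null)` by binary search instead of fresh permutations.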
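The comparison of correlation coefficients relies on sensitivity, specificity, and AUROC. The sketch below shows how these metrics are typically computed for predicted network edges. It assumes each candidate gene pair has a score (for example |r|) and a boolean label from some reference set of true interactions; the abstract does not say which reference set was used. The function names are hypothetical.

```python
def confusion_rates(scores, labels, threshold):
    """Sensitivity and specificity when pairs with score >= threshold are called edges."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and not y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    sensitivity = tp / (tp + fn)  # fraction of true edges recovered
    specificity = tn / (tn + fp)  # fraction of non-edges correctly rejected
    return sensitivity, specificity


def auroc(scores, labels):
    """AUROC as the probability that a true edge outscores a non-edge (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.4, 0.3]
labels = [True, False, True, False]
print(confusion_rates(scores, labels, 0.5))  # (0.5, 0.5)
print(auroc(scores, labels))  # 0.75
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC summarizes ranking quality over all thresholds, which is why the two kinds of metric complement each other when comparing correlation measures. The pairwise AUROC loop is quadratic in the number of pairs; for genome-scale edge lists a sort-based computation would be preferable.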
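The abstract does not define the brute-force or queue-based relaxation rules, so the sketch below swaps in a different, well-known criterion purely for illustration: the data processing inequality (DPI) rule used by ARACNE. Under that rule, an edge (i, j) is treated as indirect when some third gene k is more strongly connected to both i and j. The loop checks every triple by brute force, which is O(n^3). The function name `remove_indirect` is hypothetical, and this is not the paper's algorithm.

```python
def remove_indirect(corr):
    """Zero out edges explained by a stronger two-step path (DPI-style rule).

    corr: symmetric n x n list of lists of correlation coefficients.
    Edge (i, j) is flagged when some k has min(|r_ik|, |r_kj|) > |r_ij|.
    """
    n = len(corr)
    a = [[abs(v) for v in row] for row in corr]
    flagged = set()
    # brute force: examine every pair (i, j) against every intermediate k
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(n):
                if k == i or k == j:
                    continue
                if min(a[i][k], a[k][j]) > a[i][j]:
                    flagged.add((i, j))
                    break
    # remove flagged edges together so the result does not depend on loop order
    out = [row[:] for row in corr]
    for i, j in flagged:
        out[i][j] = out[j][i] = 0.0
    return out


# A -> B -> C chain: the A-C correlation (0.72 = 0.9 * 0.8) is indirect
corr = [[1.0, 0.9, 0.72],
        [0.9, 1.0, 0.8],
        [0.72, 0.8, 1.0]]
print(remove_indirect(corr))
```

Only the A-C entry is set to 0; the direct A-B and B-C edges survive. A queue-based variant would presumably revisit only pairs affected by earlier updates instead of rescanning all triples, but the abstract gives no details.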
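Hierarchical clustering of the updated correlation matrix can be sketched as agglomerative clustering. The abstract does not state the distance or linkage used; this example assumes the distance 1 - |r|, average linkage, and a hypothetical cutoff `max_distance` at which merging stops. The function name `average_linkage` is also hypothetical.

```python
def average_linkage(corr, max_distance):
    """Agglomerative clustering with distance 1 - |r| and average linkage.

    Merging stops once the closest pair of clusters is farther apart than
    max_distance. Returns a list of clusters, each a sorted list of gene indices.
    """
    n = len(corr)
    dist = [[1.0 - abs(corr[i][j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        # mean pairwise distance between members of the two clusters
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > max_distance:
            break
        merged = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so deleting y leaves index x valid
        clusters[x] = merged
    return clusters


corr = [[1.0, 0.9, 0.1, 0.1],
        [0.9, 1.0, 0.1, 0.1],
        [0.1, 0.1, 1.0, 0.85],
        [0.1, 0.1, 0.85, 1.0]]
print(average_linkage(corr, 0.5))  # [[0, 1], [2, 3]]
```

Each resulting cluster is a candidate gene module for enrichment analysis. This naive version recomputes linkages on every pass, which is fine for illustration but far too slow for a whole genome, where an optimized library routine would normally be used.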
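GO enrichment of a module is commonly assessed with a one-sided hypergeometric test, equivalent to a one-sided Fisher's exact test. The abstract does not say which test was used, so the sketch below shows only this common choice. The function and parameter names are hypothetical: N background genes, K of which carry the GO term, and a module of n genes, k of which carry it.

```python
from math import comb


def enrichment_p(module_size, hits_in_module, annotated_total, genome_size):
    """One-sided hypergeometric p-value P(X >= hits_in_module).

    genome_size (N): genes in the background set
    annotated_total (K): background genes annotated with the GO term
    module_size (n): genes in the module
    hits_in_module (k): module genes annotated with the GO term
    """
    N, K, n, k = genome_size, annotated_total, module_size, hits_in_module
    total = comb(N, n)
    # sum the probabilities of observing k or more annotated genes in the module
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# All 5 module genes carry a term held by 5 of 10 background genes
print(enrichment_p(5, 5, 5, 10))  # 1/252, about 0.00397
```

Because a module is usually tested against many GO terms, the resulting p-values would normally be corrected for multiple testing before a module is called functionally enriched.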