
Research on Gene Expression Level Prediction Method Based on High-Throughput Sequencing Data

Posted on: 2021-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: H J Shi
GTID: 2370330620476904
Subject: Control Science and Engineering
Abstract/Summary:
Histone modification is ubiquitous in organisms and can affect gene expression through several regulatory mechanisms, either by changing the spatial structure of DNA or by providing specific binding surfaces. Studying histone modifications is therefore essential for understanding how genetic material is expressed. With the rapid development of high-throughput sequencing technology, the large amount of available sequencing data makes it possible to explore the internal connection between histone modification signals and gene expression levels. Using statistical methods, this thesis studies data for ten histone modifications, together with gene expression data, from the human GM12878 cell line. The research covers four aspects:

(1) Using gene locus information, we locate and extract the features at gene-specific loci in the histone modification data and construct the design matrix. With 100 bp as the feature extraction length and the location of each gene on its chromosome, we extract histone modification feature values from the 4 kb before the transcription start site and the 4 kb after the transcription termination site of each gene. These histone modification features and the gene expression data are then analyzed and processed further.

(2) Using correlation analysis and hierarchical clustering, we analyze the relationship between histone modifications and gene expression to find potential associations among histone modifications. We first analyze the correlations among histone modification features to identify strongly correlated modifications. We then analyze the correlation between histone modification features and gene expression to find the variables most strongly correlated with expression. Finally, hierarchical clustering groups histone modification signals with similar features, which further clarifies the combination patterns of these ten histone modifications.

(3) We build support vector machine (SVM) classifiers for gene expression level. Guided by the correlation analysis among histone modifications, we construct models that predict gene expression level from a single histone modification feature and evaluate their performance. To further improve classification, we then construct a more comprehensive classification model.

(4) Based on the generalized linear model and the master-slave model, we build a high-precision model for predicting gene expression values. Because the response variable is zero-inflated, we propose a master-slave model based on the generalized linear model and use it to perform regression analysis on gene expression values. Finally, we compare it with several existing regression algorithms to verify the effectiveness of the proposed method.

Based on the histone modification data and gene expression data of the GM12878 cell line, we analyze the potential connection between the two datasets, focusing on building prediction models for gene expression level. We identify which features of the ten histone modifications are strongly related to gene expression and construct an efficient classifier for gene expression level. Moreover, taking the characteristics of the response variable into account, we propose a high-precision regression model for gene expression values, which can help researchers further clarify the role of histone modifications in regulating gene expression.
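The binning in aspect (1) can be sketched in plain Python. This is a minimal sketch, not the thesis's code, and every name in it is hypothetical. It assumes reads from a single chromosome are given as a sorted list of 0-based start positions, that `tss` and `tes` are the positions of the first and last transcribed bases, and that each feature is a read count in one of 40 bins of 100 bp on each flank (4 kb before the TSS, 4 kb after the TES).

```python
from bisect import bisect_left

BIN_SIZE = 100   # bp per feature bin, as stated in the abstract
FLANK = 4000     # bp taken before the TSS and after the TES, as stated in the abstract

def bin_counts(sorted_positions, start, end, bin_size=BIN_SIZE):
    """Count reads whose start position falls in each bin of the half-open window [start, end)."""
    counts = []
    for left in range(start, end, bin_size):
        right = min(left + bin_size, end)
        counts.append(bisect_left(sorted_positions, right) - bisect_left(sorted_positions, left))
    return counts

def gene_features(sorted_positions, tss, tes, strand):
    """Hypothetical layout: 40 bins before the TSS followed by 40 bins after the TES,
    ordered 5'->3' along the gene's direction of transcription."""
    if strand == "+":
        before = bin_counts(sorted_positions, tss - FLANK, tss)
        after = bin_counts(sorted_positions, tes + 1, tes + 1 + FLANK)
    else:
        # On the minus strand, "before the TSS" means higher coordinates, so the bins are reversed.
        before = bin_counts(sorted_positions, tss + 1, tss + 1 + FLANK)[::-1]
        after = bin_counts(sorted_positions, tes - FLANK, tes)[::-1]
    return before + after

def design_matrix(genes, mark_reads):
    """One row per (tss, tes, strand) gene; columns are the bins of every mark, in sorted mark order."""
    return [
        [count for mark in sorted(mark_reads) for count in gene_features(mark_reads[mark], *gene)]
        for gene in genes
    ]
```

With ten marks this layout gives 10 × 80 = 800 columns per gene. Raw counts would usually be normalized (for example, by sequencing depth) before modeling; that step is omitted here.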
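The analysis in aspect (2) can be sketched with Pearson correlation and agglomerative clustering. The abstract does not say which correlation coefficient, distance, or linkage the thesis uses. Pearson's r, the distance 1 - r, and average linkage are assumptions here, and the function names are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_by_correlation(columns, expression):
    """Sort named feature columns by |r| with expression, strongest first."""
    scores = {name: pearson(col, expression) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of named profiles with distance 1 - r, repeatedly merging
    the two clusters whose mean pairwise distance is smallest."""
    names = list(profiles)
    dist = {frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = sum(dist[frozenset((a, b))] for a in clusters[i] for b in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # i < j, so deleting j leaves i in place
        del clusters[j]
    return clusters
```

With real data, each profile would be one mark's feature column across all genes. For larger inputs, SciPy's `scipy.cluster.hierarchy.linkage(..., method="average")` performs the same kind of merge far more efficiently than this quadratic loop.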
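The classifier in aspect (3) can be sketched as a linear SVM trained with the Pegasos stochastic sub-gradient method. The abstract does not specify the kernel, the solver, or how expression levels become classes. The median split into high and low expression, the linear kernel, the regularized bias, and the hyperparameters `lam` and `epochs` are all assumptions, and the names are hypothetical. A single-feature model corresponds to passing one column of the design matrix; a more comprehensive model passes several.

```python
import random
import statistics

def binarize_by_median(expression):
    """Hypothetical labelling: +1 for genes above the median expression, -1 otherwise."""
    m = statistics.median(expression)
    return [1 if e > m else -1 for e in expression]

def _augment(x):
    return list(x) + [1.0]   # constant feature acts as a (regularized) bias term

def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Pegasos: stochastic sub-gradient descent on lam/2 * ||w||^2 + mean hinge loss.
    Labels must be +1 / -1. Returns weights for the augmented features."""
    rng = random.Random(seed)
    data = [(_augment(x), yi) for x, yi in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, yi in data:
            t += 1
            eta = 1.0 / (lam * t)                              # Pegasos step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]           # shrink from the L2 penalty
            if margin < 1.0:                                   # hinge loss active: move toward the point
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, _augment(x))) >= 0 else -1
```

A kernel SVM, such as scikit-learn's `SVC` with an RBF kernel, may fit histone signals better than this linear sketch. The linear version is shown only because it is short and dependency-free.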
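The abstract does not define the master-slave model in detail. One plausible reading of aspect (4), sketched here purely as an assumption, is a two-part (hurdle-style) GLM. A binomial GLM with a logit link models whether a gene's expression is nonzero, and a Gaussian GLM on log expression models the size of the nonzero values. Both parts are fitted by full-batch gradient descent, which assumes roughly standardized features. The function names and learning rates are hypothetical.

```python
import math

def _dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def fit_logistic(X, z, lr=0.1, iters=5000):
    """Binomial GLM (logit link) fitted by gradient ascent on the mean log-likelihood; z is 0/1."""
    w, n = [0.0] * len(X[0]), len(X)
    for _ in range(iters):
        grad = [0.0] * len(w)
        for x, zi in zip(X, z):
            p = 1.0 / (1.0 + math.exp(-_dot(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (zi - p) * xj / n
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    return w

def fit_gaussian_log(X, y, lr=0.1, iters=5000):
    """Gaussian GLM (identity link) on log(y), fitted by gradient descent on mean squared error."""
    w, n = [0.0] * len(X[0]), len(X)
    logy = [math.log(v) for v in y]
    for _ in range(iters):
        grad = [0.0] * len(w)
        for x, t in zip(X, logy):
            r = _dot(w, x) - t
            for j, xj in enumerate(x):
                grad[j] += r * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def fit_two_part(X, y):
    """Part 1 models P(y > 0); part 2 models log(y) using only the genes with y > 0."""
    Xa = [[1.0] + list(x) for x in X]            # prepend an intercept column
    w_zero = fit_logistic(Xa, [1 if v > 0 else 0 for v in y])
    pos = [(x, v) for x, v in zip(Xa, y) if v > 0]
    w_pos = fit_gaussian_log([x for x, _ in pos], [v for _, v in pos])
    return w_zero, w_pos

def predict_two_part(model, x):
    """P(y > 0) times exp(predicted log y); exp(mu) is the median, not the mean, of a log-normal."""
    w_zero, w_pos = model
    xa = [1.0] + list(x)
    p = 1.0 / (1.0 + math.exp(-_dot(w_zero, xa)))
    return p * math.exp(_dot(w_pos, xa))
```

Splitting the model this way lets the excess zeros be handled by their own classifier instead of distorting the regression on the positive values. A Gamma GLM with a log link, or a zero-inflated count model, would be equally reasonable choices for the second part.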
Keywords/Search Tags: Histone Modification, Gene Expression, Hierarchical Clustering, Support Vector Machine, Generalized Linear Model