
Soybean Gene Expression Data Analysis Based On Overlapping Community Detection Method

Posted on: 2018-09-04 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Z Li | Full Text: PDF
GTID: 2323330515978438 | Subject: Computer application technology
Abstract/Summary:
With the development of gene microarray and RNA-Seq technologies, a large volume of gene expression data has become available for many species in recent years. Gene expression data reflect the transcription level of genes in cells at a given time and carry information about cell activity under different conditions. Soybean is an important crop, and researchers have studied it extensively with microarray technology, producing a large number of valuable gene expression profiles. Analyzing the biological information contained in soybean gene expression data is of great importance for studying soybean disease resistance and improving crop varieties.

Common methods for analyzing gene expression data include differential expression analysis, classification, and clustering. Clustering is an unsupervised learning technique that is widely used for exploratory analysis of gene expression data. Genes often form community structures through their interactions in order to carry out biological functions. Genes in such structures are called co-expressed genes, and it is important to find them. In recent years, many methods for finding community structure in complex networks have been proposed. By calculating the similarities between genes, we can construct a gene network and convert the clustering problem into a community discovery problem. Studies have shown that a gene is often involved in more than one biological function, so different groups of co-expressed genes overlap. Traditional clustering methods such as k-means and hierarchical clustering cannot find this overlapping structure. Fuzzy clustering can, but its parameters are hard to set. Overlapping community discovery algorithms offer another way to handle this problem.

SpeakEasy is a typical overlapping community discovery algorithm. It is a label propagation algorithm that combines top-down and bottom-up strategies: a node is assigned to a community using not only local information but also information about the overall network structure. SpeakEasy has several advantages: it predicts the number of communities automatically, requires no manual parameter setting, and runs fast. However, in our experiments we found that SpeakEasy is deficient at recognizing overlapping nodes. We improved the algorithm and demonstrated the effectiveness of the improved version experimentally.

In this thesis, we selected gene expression data on soybean rust from the GEO database under the GPL4592 platform. First, following the general gene expression analysis workflow, we screened 7971 differentially expressed genes. Second, we measured the similarity between genes with the Pearson correlation coefficient and constructed a weighted network G(V,E) of soybean genes. We then implemented the SpeakEasy method and used it to partition the graph. Finally, we performed enrichment analysis on every community with the DAVID tools. We found meaningful information about soybean rust: genes in community S3 regulate the biosynthesis of flavonoids, which can strengthen soybean disease resistance, and genes in community S2 regulate the defense and stress responses of soybean. Comparing our results with the existing literature gave us insight into the pathology of soybean rust. We also found that, under the influence of rust, soybean cells mount defense responses such as increased levels of flavonoids and aromatic compounds, which strengthen the cell wall.

To sum up, this thesis makes three contributions. First, we preprocessed the data and identified the set of differentially expressed genes. Second, we improved the SpeakEasy overlapping community detection method and partitioned the differentially expressed genes with the improved algorithm. Third, we used DAVID to perform enrichment analysis on the community detection results and carried out KEGG mapping and GO analysis on the overlapping genes or gene sets. This study helps to understand the mechanism of the soybean pathogen, supports further analysis of the soybean defense response under rust stress, and contributes to the study of soybean disease resistance.
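
The network construction step described in the abstract (Pearson correlation between genes, then a weighted network G(V,E)) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function name `build_coexpression_network`, the use of the absolute correlation as edge weight, and the `threshold` cutoff are all hypothetical choices made here. The abstract states only that Pearson correlation was used to measure gene similarity.

```python
import numpy as np

def build_coexpression_network(expr, threshold=0.8):
    """Build a weighted gene network from an expression matrix.

    expr: 2-D array, one row per gene, one column per sample.
          Rows are assumed to be non-constant (otherwise the
          correlation is undefined).
    threshold: hypothetical cutoff; weaker correlations are dropped.
    Returns (weights, edges): a symmetric weight matrix and a list
    of (gene_i, gene_j, weight) tuples with gene_i < gene_j.
    """
    corr = np.corrcoef(expr)           # Pearson correlation between rows
    weights = np.abs(corr)             # assumption: sign is ignored
    np.fill_diagonal(weights, 0.0)     # no self-loops
    weights[weights < threshold] = 0.0 # keep only strong similarities
    rows, cols = np.nonzero(np.triu(weights, k=1))
    edges = [(int(i), int(j), float(weights[i, j]))
             for i, j in zip(rows, cols)]
    return weights, edges

# Toy data: 4 genes measured in 4 samples (made-up values).
toy_expr = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0, 4.0],
])
toy_weights, toy_edges = build_coexpression_network(toy_expr, threshold=0.9)
for i, j, w in toy_edges:
    print(f"gene {i} -- gene {j}: weight {w:.3f}")
```

The resulting weight matrix or edge list is the kind of input a community detection method would partition. Whether the thesis thresholded the correlations or kept negative correlations as signed weights is not stated in the abstract.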
Keywords/Search Tags:Gene expression data, Differential expression, Complex networks, Overlapping communities, Functional enrichment analysis