
Research On Biclustering Algorithm And Its Application In Gene Expression Data Analysis

Posted on: 2020-09-03  Degree: Master  Type: Thesis
Country: China  Candidate: H T Yang  Full Text: PDF
GTID: 2370330575977691  Subject: Computer application technology
Abstract/Summary:
Traditional clustering methods group genes that are highly similar across all experimental conditions. For real gene expression data, it is more meaningful to extract sub-matrices whose genes are highly similar under a subset of the conditions. When all conditions are considered, genes that behave alike under some conditions can end up in different sets because their overall similarity does not reach the threshold. Moreover, in real biological processes genes are not regulated under all conditions, and a single gene can take part in several biological activities. Extracting local patterns from real data is therefore more practical and more informative. Biclustering was proposed to address these shortcomings of traditional clustering.

The main work of this thesis is to improve biclustering algorithms, apply the improved algorithm to gene expression data to obtain trend-preserving biclusters, and derive information and knowledge from these biclusters to support biological research. A novel biclustering algorithm, YUNIBIC, is proposed. It works in three steps. First, an index matrix is constructed from the original matrix, and the longest common subsequence (LCS) algorithm is applied to pairs of rows of the index matrix to select suitable genes and form bicluster seeds. Second, related rows and columns are added to expand the seeds; a behavior matrix is constructed to guide this expansion, and rows that are negatively correlated with a bicluster can also be added. Finally, Spearman's Biclustering Measure (SBM) is used as the evaluation function to filter the final biclusters.

Because gene expression data are large-scale matrices, they must be preprocessed before the algorithm runs. The preprocessing stage in this thesis consists of data separation and data reassignment. For data reassignment, a new grouping method is proposed that takes into account both the similarity between data values and the order of the data; elements in the same group are assigned the same value.

YUNIBIC is compared with two algorithms, UNIBIC and QUBIC, on simulated and real data. YUNIBIC retains all seeds of significant length during seed selection. This strategy effectively remedies UNIBIC's tendency to miss narrow biclusters (those with few columns) when identifying trend-preserving biclusters. In the simulation experiments, the recovery and correlation scores of YUNIBIC's biclusters are significantly higher than those of the other two algorithms, which shows that YUNIBIC can accurately identify embedded trend-preserving biclusters. On real data, YUNIBIC achieves the highest GO enrichment, which indicates not only that it effectively groups the original data by biological activity but also that the grouped genes are enriched in the GO terms of those activities.

Finally, a YUNIBIC system is designed and implemented. Its interface is designed with Glade and implemented with GTK+. Through this system, users can set YUNIBIC's parameters, run YUNIBIC, and display and analyze its results.

This thesis focuses on biclustering algorithms. Through the improvement of a biclustering algorithm and its application to gene expression data, YUNIBIC offers the following advantages:
(a) A new data-reassignment strategy for preprocessing, based on both the positions and the values of elements, allows a subset of adjacent column labels in a sequence to be grouped as an order-equivalent group when their values are similar.
(b) Compared with the method of constructing the expression matrix proposed by Ayadi, W. et al., YUNIBIC considers only the columns related to a bicluster rather than all columns, which increases the accuracy of finding trend-preserving biclusters.
(c) SBM accurately selects final biclusters with both positive and negative patterns from a large number of candidate biclusters.
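The index-matrix and LCS seeding step can be sketched as follows. This assumes the UNIBIC-style convention that each row of the index matrix lists column indices in order of increasing expression value, so the LCS of two index rows is a column order that both genes follow, i.e. a trend-preserving seed. The helpers `index_matrix`, `lcs`, `follows_order` and `grow_seed` are hypothetical names. The expansion shown (adding every row that follows the seed order, or its reverse for negative correlation) is a deliberate simplification of YUNIBIC's behavior-matrix-based expansion, not a reproduction of it.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def index_matrix(data: Matrix) -> List[List[int]]:
    """Each row lists column indices ordered by increasing value (ties keep column order)."""
    return [sorted(range(len(row)), key=lambda j: row[j]) for row in data]


def lcs(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Longest common subsequence of two index rows via O(n*m) dynamic programming."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk forward through the table to recover one LCS.
    out: List[int] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def follows_order(row: Sequence[float], cols: Sequence[int]) -> bool:
    """True if the row's values strictly increase along the column order `cols`."""
    return all(row[c1] < row[c2] for c1, c2 in zip(cols, cols[1:]))


def grow_seed(data: Matrix, cols: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Rows following the seed order (positive) and the reversed order (negative)."""
    rev = list(cols)[::-1]
    pos = [i for i, row in enumerate(data) if follows_order(row, cols)]
    neg = [i for i, row in enumerate(data) if follows_order(row, rev)]
    return pos, neg


data = [
    [1.0, 5.0, 3.0, 9.0],
    [2.0, 8.0, 4.0, 1.0],
    [0.5, 7.0, 6.0, 9.5],
    [3.0, 1.0, 2.0, 0.0],
]
idx = index_matrix(data)
seed_cols = lcs(idx[0], idx[1])    # column order shared by genes 0 and 1
print(seed_cols)                   # [0, 2, 1]
print(grow_seed(data, seed_cols))  # ([0, 1, 2], [3])
```

Here gene 3 falls along the seed's column order rather than rising with it, so it is collected as a negatively correlated row, which mirrors the abstract's point that such rows can also join a bicluster.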
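To illustrate how a Spearman-based evaluation function can reward both positive and negative patterns, the sketch below scores a candidate bicluster as the mean absolute Spearman correlation over all pairs of its rows. This is a hypothetical stand-in: the abstract does not give the precise definition of SBM, and `bicluster_score` is not the thesis's function.

```python
from itertools import combinations
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    if n < 2:
        return 0.0  # too few columns to define an ordering
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant row carries no ordering information
    return cov / (vx * vy) ** 0.5


def bicluster_score(data, rows, cols) -> float:
    """Hypothetical score: mean |rho| over all row pairs of the sub-matrix data[rows][cols]."""
    sub = [[data[r][c] for c in cols] for r in rows]
    pairs = list(combinations(range(len(sub)), 2))
    if not pairs:
        return 0.0
    return sum(abs(spearman(sub[a], sub[b])) for a, b in pairs) / len(pairs)


data = [
    [1.0, 3.0, 5.0, 0.2],
    [2.0, 4.0, 8.0, 9.9],
    [9.0, 6.0, 1.0, 4.4],
]
print(bicluster_score(data, rows=[0, 1, 2], cols=[0, 1, 2]))  # 1.0
```

On columns 0 to 2, rows 0 and 1 rise together while row 2 falls, so every pair has |rho| = 1 and the sub-matrix scores 1.0 even though it mixes positive and negative patterns. Using ranks rather than raw values makes the score depend only on the ordering of values, which matches the trend-preserving notion of a bicluster.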
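The data-reassignment idea, grouping elements by both value similarity and order, can be sketched per row as follows. Columns are sorted by value, and columns that are adjacent in that order and whose values differ by at most a tolerance share a group label. The function names `reassign_row` and `reassign` and the parameter `tol` are hypothetical; the abstract does not state YUNIBIC's exact similarity criterion.

```python
from typing import List, Sequence


def reassign_row(row: Sequence[float], tol: float) -> List[int]:
    """Group labels for one row: neighbours in sorted order within `tol` share a label."""
    order = sorted(range(len(row)), key=lambda j: row[j])  # columns by increasing value
    labels = [0] * len(row)
    group = 0
    for pos, col in enumerate(order):
        # Start a new group when the gap to the previous column in sorted order exceeds tol.
        if pos > 0 and row[col] - row[order[pos - 1]] > tol:
            group += 1
        labels[col] = group
    return labels


def reassign(data: Sequence[Sequence[float]], tol: float) -> List[List[int]]:
    """Apply the row-wise reassignment to every gene (row) of the matrix."""
    return [reassign_row(row, tol) for row in data]


row = [0.10, 2.0, 0.15, 5.0, 2.05]
print(reassign_row(row, tol=0.1))  # [0, 1, 0, 2, 1]
```

Because each value is compared only with its neighbour in sorted order, a long chain of small steps can merge values that are far apart. A fuller implementation might bound the spread of each group instead.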
Keywords/Search Tags: biclustering algorithm, gene expression data, enrichment analysis