
Data Mining Algorithms For TAL Effector Targets Prediction Based On Weighted Matrix

Posted on: 2015-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: K L Li
GTID: 2298330431489799
Subject: Computer application technology
Abstract/Summary:
Transcription activator-like (TAL) effectors are a family of type III effectors secreted by plant-pathogenic bacteria of the genus Xanthomonas. They play important roles in pathogen-host interactions. TAL effector targets directly reflect the pathogenicity and avirulence of pathogens. Identifying TAL effector targets therefore has a direct impact on finding susceptibility genes and resistance genes, and helps to reveal the long-term interaction and co-evolution between pathogen and host. Bioinformatics applications, especially target prediction for these effectors, greatly facilitate target identification. As a result, it is essential and valuable to develop data mining algorithms for finding TAL effector targets.

This thesis investigates TAL effectors in depth and proposes two algorithms, TargetMinerA and TargetMinerB, for predicting TAL effector targets. TargetMinerA models the binding specificity of repeat-variable diresidues (RVDs). For a given RVD sequence, it constructs a probability matrix of RVD specificity, generates from it the weighted matrix of RVD binding specificity, and uses a novel function to score all possible target sites. TargetMinerB predicts TAL effector targets from both the weighted matrix of RVD binding specificity and RVD efficiency. It quantifies RVD efficiency in terms of RVD strength, and likewise uses a novel function to score all possible target sites.

To evaluate the performance of TargetMinerA and TargetMinerB, both algorithms are implemented in parallel in MATLAB. To test their effectiveness, the known target sites of TAL effectors are divided into an initial training set and a test set. The former is used to determine the parameters of the algorithms, and the latter is used for target prediction. The results suggest that the proposed algorithms can find all known target sites of the TAL effectors in the test set.
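The general idea of scoring candidate sites with a weighted matrix can be illustrated with a minimal Python sketch. It is not the scoring function of TargetMinerA or TargetMinerB, which the abstract does not specify. It assumes a standard log-odds position weight matrix, and the probabilities in `RVD_PROBS`, the uniform background of 0.25, and the function names are all hypothetical placeholders chosen for illustration.

```python
import math

NUCLEOTIDES = "ACGT"

# Hypothetical RVD-to-nucleotide binding probabilities (illustrative values,
# not taken from the thesis). Each row sums to 1.
RVD_PROBS = {
    "HD": {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    "NI": {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    "NG": {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    "NN": {"A": 0.40, "C": 0.05, "G": 0.50, "T": 0.05},
}


def weight_matrix(rvds, background=0.25):
    """Build one row of log-odds weights per RVD in the given RVD sequence."""
    return [
        {n: math.log2(RVD_PROBS[rvd][n] / background) for n in NUCLEOTIDES}
        for rvd in rvds
    ]


def score_site(matrix, site):
    """Sum the weight of the observed base at each RVD position."""
    return sum(row[base] for row, base in zip(matrix, site))


def scan(rvds, sequence, threshold):
    """Score every window of len(rvds) bases; keep those at or above threshold.

    Returns (start, site, score) tuples sorted from best to worst score.
    """
    matrix = weight_matrix(rvds)
    width = len(rvds)
    hits = []
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width]
        if any(base not in NUCLEOTIDES for base in site):
            continue  # skip windows containing ambiguous bases such as N
        score = score_site(matrix, site)
        if score >= threshold:
            hits.append((start, site, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)
```

Calling `scan(["HD", "NI", "NG", "NN"], "GGCATGTT", 0.0)` scores each 4-base window of the sequence and returns the windows that reach the threshold. The `threshold` argument plays the role of a scan threshold. An efficiency term of the kind TargetMinerB uses could, under this sketch, be a per-RVD multiplier on each row, but that is an assumption rather than the thesis's formulation.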
The method for determining the scan threshold is also discussed, and the resulting threshold is able to ensure good prediction performance when scanning and scoring the known TAL effector-DNA interactions. Overall, a comparison of the rankings of known TAL effector target sites in the genome shows that the algorithms presented here rank the known target sites better than other existing algorithms. To obtain more reliable predictions, gene expression data are combined with the candidate targets produced by our algorithms to predict TAL effector targets. The results show that our methods can predict not only known targets but also novel candidate targets. Moreover, the positional preference of candidate TAL effector target sites relative to the transcription start site is analyzed. The results suggest that candidate target sites of TAL effectors are significantly enriched approximately 25 bp upstream of the transcription start site.
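The positional analysis relative to the transcription start site (TSS) can be pictured as a histogram of upstream distances. The Python sketch below is only one plausible way to tabulate such distances and is not the procedure used in the thesis. The input format (site position, TSS position, strand), the 25 bp bin width, the 500 bp window, and the function names are assumptions made for illustration.

```python
from collections import Counter


def upstream_offset(site_pos, tss_pos, strand):
    """Distance of a site upstream of its gene's TSS (negative if downstream).

    On the plus strand, upstream means a smaller coordinate than the TSS;
    on the minus strand, upstream means a larger coordinate.
    """
    if strand == "+":
        return tss_pos - site_pos
    return site_pos - tss_pos


def upstream_histogram(sites, bin_width=25, max_distance=500):
    """Count candidate sites per upstream-distance bin.

    `sites` is an iterable of (site_pos, tss_pos, strand) tuples. Keys of the
    returned Counter are the lower edge of each bin in bp; sites that lie
    downstream of the TSS or beyond max_distance are ignored.
    """
    counts = Counter()
    for site_pos, tss_pos, strand in sites:
        offset = upstream_offset(site_pos, tss_pos, strand)
        if 0 < offset <= max_distance:
            counts[(offset // bin_width) * bin_width] += 1
    return counts
```

Given such counts, an enrichment claim like the one above would still need a statistical comparison against a background distribution, which this sketch does not attempt.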
Keywords/Search Tags: TAL effectors, weighted matrix, target prediction, score function