Research And Implementation Of Text Mining For Transcription Regulatory Information

Posted on:2010-03-17

Degree:Master

Type:Thesis

Country:China

Candidate:Q Yang

Full Text:PDF

GTID:2178360275491628

Subject:Computer software and theory

Abstract/Summary:

Biological data on the regulatory mechanisms of eukaryotic organisms is growing daily. A transcription factor is a special kind of protein that regulates gene expression by binding to cis-regulatory elements, which are usually located upstream of a gene. A large amount of information about transcription factors and cis-regulatory elements is now stored in documents, and mining or extracting this information is a major challenge. Researchers still extract it by slow manual reading rather than with computer assistance. To help biology experts, this thesis proposes and implements two main algorithms.

The first algorithm mines sentences in biological documents that describe cis-regulatory elements. The thesis extends the vector space model of traditional information retrieval systems by adding two-word phrase dimensions and part-of-speech information. From the training text, a system model is trained that captures the context of sentences describing cis-regulatory elements. Given a sentence, the algorithm converts it into the extended vector space representation and compares it with the trained system model. Using a sentence similarity function, the sentence is treated as a target when its similarity score against the system model exceeds a given threshold.

The second algorithm extracts more specific information, including the text segments that name transcription factors and binding sites. From the given training data, the algorithm constructs a context-free grammar and uses the Earley algorithm to analyse sentence structure. After extracting the noun phrases and verb phrases, it builds a knowledge base. Each sentence to be analysed is split into noun phrases and verb phrases, which are compared against the knowledge base. Only noun phrases that match the knowledge base are regarded as candidates.

Both algorithms are implemented in Java. Recall and precision are above 60% in the corresponding experiments.
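
The sentence-scoring step of the first algorithm can be illustrated with a minimal Java sketch. It is not the thesis's implementation: the feature prefixes (`w:`, `b:`, `p:`), the use of raw term counts, cosine similarity as the sentence similarity function, the summed-vector model, and the threshold value 0.3 are all assumptions made for illustration. Part-of-speech tags are passed in by the caller, since the abstract does not say how they were obtained.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExtendedVsmSketch {

    // Builds a sparse count vector with three kinds of dimensions (assumed layout):
    // single words ("w:"), two-word phrases ("b:"), and word/POS pairs ("p:").
    public static Map<String, Double> vectorize(String[] tokens, String[] posTags) {
        Map<String, Double> v = new HashMap<>();
        for (int i = 0; i < tokens.length; i++) {
            String w = tokens[i].toLowerCase();
            v.merge("w:" + w, 1.0, Double::sum);
            v.merge("p:" + w + "/" + posTags[i], 1.0, Double::sum);
            if (i + 1 < tokens.length) {
                v.merge("b:" + w + " " + tokens[i + 1].toLowerCase(), 1.0, Double::sum);
            }
        }
        return v;
    }

    // Sums the vectors of the positive training sentences into one model vector
    // (a hypothetical stand-in for the trained "system model").
    public static Map<String, Double> train(List<Map<String, Double>> positives) {
        Map<String, Double> model = new HashMap<>();
        for (Map<String, Double> v : positives) {
            v.forEach((k, x) -> model.merge(k, x, Double::sum));
        }
        return model;
    }

    // Cosine similarity between two sparse vectors; returns 0 if either is empty.
    public static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            na += e.getValue() * e.getValue();
            Double y = b.get(e.getKey());
            if (y != null) {
                dot += e.getValue() * y;
            }
        }
        for (double y : b.values()) {
            nb += y * y;
        }
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // A sentence counts as a target when its score exceeds the threshold.
    public static boolean isTarget(Map<String, Double> model,
                                   Map<String, Double> sentence,
                                   double threshold) {
        return cosine(model, sentence) > threshold;
    }

    public static void main(String[] args) {
        // Made-up training and test sentences with hand-assigned POS tags.
        Map<String, Double> train1 = vectorize(
                new String[] {"GATA1", "binds", "the", "promoter", "element"},
                new String[] {"NN", "VB", "DT", "NN", "NN"});
        Map<String, Double> model = train(Arrays.asList(train1));

        Map<String, Double> candidate = vectorize(
                new String[] {"Sp1", "binds", "the", "promoter"},
                new String[] {"NN", "VB", "DT", "NN"});
        Map<String, Double> unrelated = vectorize(
                new String[] {"Cells", "were", "cultured"},
                new String[] {"NN", "VB", "VB"});

        double threshold = 0.3; // hypothetical value, not taken from the thesis
        System.out.println("candidate is target: " + isTarget(model, candidate, threshold));
        System.out.println("unrelated is target: " + isTarget(model, unrelated, threshold));
    }
}
```

Prefixing each feature with its kind keeps words, two-word phrases, and word/POS pairs in separate dimensions of one sparse map, so a single similarity function can score all three together.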

Keywords/Search Tags:

Transcription factor, Cis-regulatory elements, text mining, data mining, Bioinformatics

Related items

1	Research And Implementation Of Transcription Regulatory Sequences Data Mining
2	Applications Of Data Mining Techniques To Text Classification And Bioinformatics
3	Research On Co-regulated Gene Mining Algorithms
4	Support Vector Machine Based Prediction Of Regulatory Networks In A Genome-wide Study
5	Application Of Artificial Neural Network In Research In Bioinformatics
6	The Application Of Factor Space Theory In Text Mining
7	Key Techniques Of Text Mining On Criminal Cases
8	Relational clustering and its applications in text mining and bioinformatics
9	Text Data Mining For Applied Research In Information Monitoring
10	Applications Of Data Mining For The Competitive Intelligence System In The Enterprise