Extracting Transporter Substrate Information From Semi-structured Text

Posted on: 2012-03-30  Degree: Master  Type: Thesis
Country: China  Candidate: Y M Chen  Full Text: PDF
GTID: 2218330362953601  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth of available molecular biology data, tools that automatically extract biological information from specialized databases are being actively developed. In this paper, we show how to construct a pipeline tool (TTSI) that extracts new transporter-substrate interactions (TSIs) from UniProt for the transporter substrate database (TSdb). The most important component of TTSI is a maximum-entropy classifier trained on UniProt records; it achieves high precision and recall in cross-validation experiments on extracting sentences that express transport relationships. The tool can also quickly map the compound names in these sentences to compound IDs in KEGG Ligand Compound. Experiments on extracting human TSI data from the human transporter annotations in UniProt show that the tool leaves human experts only about 1% of the sentences in the annotations to examine, and that 68.63% of the extracted TSIs are new when compared with other specialized transporter databases. TTSI thus greatly reduces the amount of data submitted to human experts while extracting many new TSIs, which substantially supplement TSdb and can assist biologists in designing experiments, systematically analyzing transporter-substrate systems, and locating transporters in metabolic pathways. The methods we describe are flexible and general, so they can be applied easily to other specialized databases.
Keywords/Search Tags: transporter, substrate, maximum-entropy, classifier, extracting-information
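
The two stages named in the abstract (a maximum-entropy sentence classifier, then mapping compound names to KEGG compound IDs) can be illustrated with a minimal sketch. This is not the TTSI implementation: the training sentences, the bag-of-words features, the hyperparameters, the two-entry lexicon, and all function names are assumptions made for illustration only. For two classes, a maximum-entropy model is equivalent to logistic regression, which is what the sketch trains.

```python
import math
import re
from collections import defaultdict

# Hypothetical toy training data (not from UniProt or the thesis):
# label 1 = sentence expresses a transport relationship, 0 = it does not.
TRAIN = [
    ("Mediates the uptake of glucose across the plasma membrane.", 1),
    ("Transports ATP into the mitochondrial matrix.", 1),
    ("Involved in the efflux of glucose from the cell.", 1),
    ("Belongs to the major facilitator superfamily.", 0),
    ("Expressed in liver and kidney.", 0),
    ("Interacts with the regulatory subunit.", 0),
]

# Toy name-to-ID lexicon (assumption); a real one would be built from
# the full KEGG compound name list, including synonyms.
KEGG_LEXICON = {"glucose": "C00031", "atp": "C00002"}


def tokens(sentence):
    """Lowercase the sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def features(sentence):
    """Binary bag-of-words features plus a bias feature."""
    return set(tokens(sentence)) | {"<bias>"}


def train_maxent(data, epochs=200, lr=0.5, l2=0.01):
    """Fit a binary maximum-entropy model by stochastic gradient ascent
    on the L2-regularized log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for sentence, label in data:
            feats = features(sentence)
            score = sum(weights[f] for f in feats)
            p = 1.0 / (1.0 + math.exp(-score))
            for f in feats:
                weights[f] += lr * ((label - p) - l2 * weights[f])
    return weights


def transport_probability(weights, sentence):
    """Model probability that the sentence expresses transport."""
    score = sum(weights.get(f, 0.0) for f in features(sentence))
    return 1.0 / (1.0 + math.exp(-score))


def map_compounds(sentence, lexicon=KEGG_LEXICON):
    """Return the sorted KEGG compound IDs whose names occur as tokens."""
    return sorted({lexicon[t] for t in tokens(sentence) if t in lexicon})


def extract_candidates(weights, sentences, threshold=0.5):
    """Keep sentences classified as transport that mention a known compound."""
    candidates = []
    for s in sentences:
        if transport_probability(weights, s) >= threshold:
            ids = map_compounds(s)
            if ids:
                candidates.append((s, ids))
    return candidates
```

Calling `extract_candidates(train_maxent(TRAIN), sentences)` on a list of annotation sentences returns only the sentences the classifier accepts, each paired with its matched compound IDs; these are the candidates a human expert would then review. Exact token matching is the simplest possible mapping and would miss multi-word names and synonyms.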