With the rapid growth of available biological molecular data, tools that automatically extract biological information from specialized databases are being actively developed. In this paper, we show how to construct a pipeline tool (TTSI) that extracts new transporter-substrate interactions (TSIs) from UniProt for the transporter substrate database (TSdb). We describe a maximum-entropy classifier, the most important component of TTSI, that is trained on UniProt records and achieves high precision and recall in cross-validation experiments on extracting sentences that express transport relationships. The tool can also quickly map the compound names in these sentences to compound IDs in KEGG Ligand Compound. Experiments on extracting human TSI data from the human transporter annotations in UniProt show that the tool requires human experts to examine only about 1% of the sentences in the annotations, and that 68.63% of the extracted data are new TSIs when compared with other specialized transporter databases. TTSI thus greatly reduces the amount of data submitted to human experts while extracting many new TSIs, which substantially supplement TSdb and can assist biological experts in designing experiments, systematically analyzing transporter-substrate systems, and locating transporters in metabolic pathways. The methods we describe are flexible and general, so they can be applied easily to other specialized databases.