TransportTP is a two-phase decision system for transporter prediction
and categorization on a genome scale.
TransportTP used a two-phase framework. In the first phase,
a nearest-neighbor classification was performed against a curated
transporter database, TCDB. This phase integrated BLAST search
and HMM modeling and classified each query protein either as a non-transporter
or as a member of an initial transporter family. In the second phase, the pieces of evidence
associated with each query protein were collected and integrated to either
reinforce or reject the initial prediction, with the aim of improving
prediction accuracy. This evidence included the protein's transmembrane segments,
its K-nearest neighbors in TCDB, its transporter-related Pfam domains
and GO terms, and its nearest proteins in SWISSPROT. All of this evidence,
together with the original BLAST and HMM scores generated in the initial prediction,
was converted into classification features and fed into a decision
system, specifically decision trees. With the decision system, the initial
predictions were either reinforced or removed according to the feature values.
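
The two phases described above can be illustrated with a minimal Python sketch.
It is not the TransportTP implementation: the class names, field names, E-value
cutoff and decision rules are all hypothetical, and the hand-written rules in
phase_two only stand in for a decision tree that would in practice be learned
from training data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """One BLAST or HMM match against a TCDB entry (illustrative structure)."""
    family: str    # transporter family label of the matched entry
    evalue: float  # E-value of the match; smaller means more significant


def phase_one(hits: List[Hit], max_evalue: float = 1e-5) -> Optional[str]:
    """Nearest-neighbor step: return the family of the most significant hit.

    Returns None (non-transporter) when no hit passes the cutoff.
    The cutoff value is an assumption made for this sketch.
    """
    passing = [h for h in hits if h.evalue <= max_evalue]
    if not passing:
        return None
    return min(passing, key=lambda h: h.evalue).family


@dataclass
class Evidence:
    """Second-phase evidence converted into features (hypothetical encoding)."""
    tms_count: int                  # predicted transmembrane segments
    knn_agreement: float            # share of K nearest TCDB neighbors in the same family
    has_transporter_pfam: bool      # any transporter-related Pfam domain
    has_transporter_go: bool        # any transporter-related GO term
    swissprot_is_transporter: bool  # nearest SWISSPROT protein annotated as transporter
    best_evalue: float              # best BLAST/HMM E-value from phase one


def phase_two(initial_family: Optional[str], ev: Evidence) -> Optional[str]:
    """Reinforce or remove the initial prediction from the feature values.

    These fixed rules are a stand-in for a trained decision tree.
    """
    if initial_family is None:
        return None  # phase one already called it a non-transporter
    if ev.tms_count == 0 and not ev.has_transporter_pfam:
        return None  # no membrane or domain support: remove the prediction
    if ev.best_evalue <= 1e-50 or ev.knn_agreement >= 0.5:
        return initial_family  # strong similarity evidence: reinforce
    if ev.has_transporter_go or ev.swissprot_is_transporter:
        return initial_family  # annotation evidence: reinforce
    return None


# Usage with made-up hits and evidence for a single query protein.
hits = [Hit("2.A.1", 1e-30), Hit("3.A.1", 1e-8)]
initial = phase_one(hits)
evidence = Evidence(12, 0.8, True, True, True, 1e-30)
final = phase_two(initial, evidence)
```

Keeping the two phases as separate functions mirrors the description: the first
phase only proposes a family, and the second phase can only keep or drop that
proposal.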
The performance of TransportTP was benchmarked. A decision tree was first
trained on the yeast genome and then tested on six other model genomes and four
non-model genomes, ranging from archaeal to eukaryotic genomes. A preliminary
literature-based validation has confirmed 82.3% of our predictions, and the
predictions overlapped with 80.4% of the predicted transporters in TransportDB,
a database of putative transporters for hundreds of genomes. Compared with our
previous approach, which applied only BLAST search, HMM modeling and TMS filtering
[Li et al., Bioinformatics, 2008], the new two-phase system increased the
validation rate by more than 15% while maintaining a similar overlap rate with
TransportDB. It also outperformed alternative approaches, such as
suffix trees and Support Vector Machines (SVMs).
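
The two benchmark measures mentioned above can be computed from sets of protein
identifiers, as in the short Python sketch below. The text does not define the
measures precisely, so the definitions here are assumptions: the validation rate
is taken as the share of our predictions confirmed by the literature, and the
overlap rate as the share of reference (e.g. TransportDB) transporters that we
also predict. The function names and the toy identifiers are hypothetical.

```python
from typing import Set


def validation_rate(predicted: Set[str], confirmed: Set[str]) -> float:
    """Share of predicted transporters that are confirmed by the literature."""
    if not predicted:
        return 0.0
    return len(predicted & confirmed) / len(predicted)


def overlap_rate(predicted: Set[str], reference: Set[str]) -> float:
    """Share of reference-database transporters that are also predicted."""
    if not reference:
        return 0.0
    return len(predicted & reference) / len(reference)


# Toy identifiers only; these are not real benchmark data.
predicted = {"p1", "p2", "p3", "p4"}
confirmed = {"p1", "p2", "p3"}
reference = {"p1", "p2", "p5"}

v_rate = validation_rate(predicted, confirmed)  # 3 of 4 predictions confirmed
o_rate = overlap_rate(predicted, reference)     # 2 of 3 reference entries recovered
```

The two measures use different denominators, which is why one can rise while the
other stays roughly constant.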