1. What is the major function of TransportTP?
TransportTP is an automatic transporter prediction system. It assigns an input sequence to a TC family or superfamily (the third taxonomic level of the TC system), or rejects it as a potential transporter. TransportTP can run on a genome scale. It integrates BLAST and HMM searches against TCDB, and searches other transporter-related databases and features to refine its decision. Machine learning methods are used to integrate the various transporter-related features.

2. What is the standard format for input?
The input sequences should be in FASTA format. They can be uploaded as a file from the local file system or entered in an edit box. If both are provided, the file from the local file system takes priority.
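Before submitting, it can help to check locally that a file parses as FASTA. The following is a minimal sketch, not part of TransportTP; the function name parse_fasta is hypothetical. It takes the sequence id to be the text between '>' and the first space, which is the convention the output table uses for its "sequence id" column.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (sequence_id, description, sequence) tuples."""
    records = []
    seq_id, desc, chunks = None, "", []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if seq_id is not None:
                records.append((seq_id, desc, "".join(chunks)))
            # The id is everything between '>' and the first space.
            seq_id, _, desc = line[1:].partition(" ")
            chunks = []
        elif seq_id is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, desc, "".join(chunks)))
    return records


# Hypothetical two-record input used only for illustration.
example = ">seq1 demo protein\nMKT\nLLV\n>seq2\nACGT\n"
parsed = parse_fasta(example)
```

A file that raises ValueError here, or yields no records, is unlikely to be valid FASTA.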

3. What are the options for prediction?
Besides the FASTA input sequences, a few options can be customized for the transporter prediction.
3.1. What sequence types are supported by the system?
The system supports both nucleotide and amino acid sequences. Nucleotide sequences are simply translated into amino acid sequences by the sixpack package before the prediction pipeline runs.
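To illustrate what a six-frame translation produces, here is a minimal sketch in Python. It is not the sixpack implementation and does not reproduce sixpack's output format; it assumes the standard genetic code, marks stop codons with '*', and marks codons containing ambiguous bases with 'X'. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate one reading frame; incomplete trailing codons are dropped."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


def six_frame_translations(dna):
    """Return translations of the three forward and three reverse-strand frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    return [translate(strand[offset:])
            for strand in (dna, reverse)
            for offset in range(3)]


frames = six_frame_translations("ATGGCCTAA")
```

The first three entries of frames are the forward frames and the last three are the reverse-strand frames.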
3.2. What are the reference organisms?
TransportTP relies on a machine learning model to integrate different types of transporter-related features. The model should be trained on a well-studied model organism that is as close as possible to the target organism. It can also be trained on several model organisms, for example, several plant organisms. If no suitable reference organism can be chosen, all model organisms can be selected; the predictive performance is only slightly affected (<5%).
3.3. What is the e-value threshold?
TransportTP first searches for homologs in TCDB, then filters out false positives using a classification model. The initial e-value threshold determines the initial set to be refined. A stricter threshold leads to fewer predicted transporters. The e-value threshold should be somewhat larger than that used in a traditional homology search, because the machine learning methods help remove false positives from the rough initial predictions.
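The effect of the threshold on the initial candidate set can be sketched as follows. This is only an illustration under assumed data: the hit records, their field names, and the function filter_by_evalue are hypothetical and do not reflect TransportTP's internal data structures.

```python
def filter_by_evalue(hits, threshold):
    """Keep hits whose e-value is at or below the threshold."""
    return [hit for hit in hits if hit["evalue"] <= threshold]


# Hypothetical homology-search hits against TCDB.
hits = [
    {"query": "seqA", "evalue": 1e-40},
    {"query": "seqB", "evalue": 1e-8},
    {"query": "seqC", "evalue": 0.05},
]

# A permissive threshold passes more candidates to the refining classifier.
loose = filter_by_evalue(hits, 0.1)
# A stricter threshold leaves fewer candidates, so fewer predicted transporters.
strict = filter_by_evalue(hits, 1e-10)
```

With the permissive threshold all three hits reach the refinement step, while the strict one keeps only seqA.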

4. How can I simply test the function of the system?
You can test the system with a set of sample sequences that we provide. Simply click the "sample input file" button on the home page and leave the other options unchanged. You can compare the results with the expected output to check whether the system is working properly and whether you have operated it correctly.

The sample file is:


The sample output should be the following two rows, shown here with one field per line. Fields left empty have no value in the sample. The sample result fits on one page (Page 1 of 1).

#1
sequence id: IMGA|AC124971_15.2
scores overall: 0E0
scores blast: 0E0
scores hmm: 1.84E-115
transporter family id: 1.A.1.
transporter family size: 115
transporter family name: The Voltage-gated Ion Channel [VIC] Superfamily
family avg tms: 8.47
family deviation: 5.75
protein tms: 6
nearest transporter: gnl|TC-DB|Q38998
sub family: 1.A.1.4.1
5NN coincide (%): 100
Pfam domain ids: PF00520; PF00023; PF00023; PF00023; PF00023; PF07885;
Pfam e-values: 2e-038; 2e-033; 3e-013; 1e-005; 4e-012; 5e-019;
Pfam names: Ion_trans Ion transport protein; Ank Ankyrin repeat; Ank Ankyrin repeat; Ank Ankyrin repeat; Ank Ankyrin repeat; Ion_trans_2 Ion channel;
GO term ids: GO:0005242
GO e-values: 0.0
GO names: inward rectifier potassium channel activity
Swissprot top homolog: sp|Q38998|AKT1_ARATH
Swissprot e-value: 1e-313
Swissprot name: Potassium channel AKT1 OS=Arabidopsis thaliana GN=AKT1 PE=1 SV=2
Phy tree:
confidence (%): 100
protein desc: >IMGA|AC124971_15.2 Cyclic nucleotide-binding; Ion transport protein AC124971.15 59907-59019 E EGN_Mt060719 20060904 TIGR 2137.m00012

#2
sequence id: IMGA|AC135797_3.2
scores overall: 1.76E-11
scores blast: 1.7E-8
scores hmm: 1.82E-14
transporter family id: 2.A.1.
transporter family size: 312
transporter family name: The Major Facilitator Superfamily [MFS]
family avg tms: 12
family deviation: 1.73
protein tms: 12
nearest transporter: gnl|TC-DB|Q5A7S4
sub family: 2.A.1.58.1
5NN coincide (%): 100
Pfam domain ids:
Pfam e-values:
Pfam names:
GO term ids:
GO e-values:
GO names:
Swissprot top homolog:
Swissprot e-value:
Swissprot name:
Phy tree:
confidence (%): 100
protein desc: >IMGA|AC135797_3.2 Protein of unknown function DUF895, eukaryotic AC135797.13 24893-24449 F EGN_Mt060719 20060904 TIGR 2284.m00003




5. What is the format of the output?
The output is displayed in a table that may span multiple pages. The columns of the table, from left to right, are as follows:
Column                   Implication
sequence id              the short name of the sequence, taken as the string between the starting character '>' and the first space
score overall            the overall homology score with respect to TCDB; it is a weighted score of the BLAST search and the HMM search
blast                    the best BLAST score of the sequence against transporters in the predicted transporter family
hmm                      the HMM score of the sequence against the predicted transporter family
transporter family id    the TC id of the predicted transporter family
transporter family size  the number of transporters in the predicted transporter family
transporter family name  the functional description of the predicted transporter family
family avg tms           the average number of transmembrane segments (TMS) in the predicted transporter family
family deviation         the standard deviation of the number of TMS in the predicted transporter family
protein tms              the number of TMS in the protein/input sequence
nearest transporter      the homolog in TCDB with the best weighted score (overall score)
sub family               the TC id of the nearest transporter of the predicted sequence
5NN coincide (%)         the proportion of the top-5 homologs in TCDB that have the same TC family as the best homolog
Pfam domain ids          the set of transporter-related Pfam domain ids for the predicted sequence
Pfam e-values            the e-values of the transporter-related Pfam domains that occur
Pfam names               the functional descriptions of the transporter-related Pfam domains that occur
GO term ids              the GO terms hit by the predicted sequence
GO e-values              the e-values of the GO terms hit by the predicted sequence
GO names                 the functional descriptions of the GO terms hit by the predicted sequence
Swissprot top homolog    the best homolog in the Swissprot database for the predicted sequence
Swissprot top e-value    the smallest e-value of the predicted sequence in the Swissprot database
Confidence (%)           the prediction confidence from the refining classifier, scaled as a percentage
Protein desc             the original functional description in the input sequence
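The "5NN coincide (%)" column can be illustrated with a short calculation. This is a sketch under stated assumptions rather than the system's actual code: it assumes the TC family is given by the first three components of a TC id (the third taxonomic level), and that the best homolog itself counts among the top five. The function names are hypothetical.

```python
def tc_family(tc_id):
    """Return the family-level prefix of a TC id, e.g. '1.A.1.4.1' -> '1.A.1'."""
    return ".".join(tc_id.split(".")[:3])


def five_nn_coincide(ranked_tc_ids):
    """Percentage of the top-5 homologs sharing the best homolog's TC family.

    ranked_tc_ids must be ordered from best to worst homolog.
    """
    top = ranked_tc_ids[:5]
    if not top:
        raise ValueError("at least one homolog is required")
    best_family = tc_family(top[0])
    same = sum(1 for tc_id in top if tc_family(tc_id) == best_family)
    return 100.0 * same / len(top)


# Hypothetical ranked homolog list: four of the five share family 1.A.1.
score = five_nn_coincide(
    ["1.A.1.4.1", "1.A.1.2.3", "1.A.1.4.5", "2.A.1.58.1", "1.A.1.10.1"]
)
```

A value of 100, as in both sample rows, means every one of the top five homologs falls in the same family as the best one.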

6. Can I save the prediction results?
Yes. Click the 'Save' button. The results are saved as a tab-delimited file that can be opened in Microsoft Excel.
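A tab-delimited file can also be processed with a script. The sketch below uses Python's csv module. The column headers in the saved file are an assumption here ("sequence id" and "confidence (%)" are taken from the table on the results page and may differ in the actual file), and the in-memory sample stands in for a real saved file.

```python
import csv
import io


def load_results(handle):
    """Read tab-delimited results into a list of dicts keyed by column header."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical file content; with a real file, pass open(path, newline="") instead.
sample = "sequence id\tconfidence (%)\nseqA\t100\nseqB\t42\n"
rows = load_results(io.StringIO(sample))

# Keep only predictions with at least 90% confidence.
confident = [row["sequence id"] for row in rows
             if float(row["confidence (%)"]) >= 90]
```

Check the header line of your own saved file and adjust the column names before reusing this.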

7. Can I download the predicted transporter sequences? If yes, how?
Yes. You can download the sequences of the predicted transporters by clicking the 'Transporter sequence' button. The file is in FASTA format.

8. Are there some tools to help me browse or verify the predicted transporters?
Several tools help you browse and verify the predicted results:
Multiple-page display: the results may contain thousands of putative transporters, so they are displayed across multiple pages.
Sort on a particular column: the predicted transporters can be sorted on most columns.
Filter the results: users can focus on specific targets by setting constraints on particular columns.
Cross-links to evidence: users can click the links for each predicted sequence to check the corresponding evidence in transporter-related databases.

9. How do I know when my job is complete? How long will you keep my results?
When the task is started or completed, a link is sent to you. You can check the status of your task through this link. If the task is completed, you will see the results page rather than the running progress. You can download the results within one week. After that, you may lose access to them.

10. Are my data and results private and secure?
Please keep the link, which contains your session id, safe. Anyone who knows your id can read your predicted results until they expire. We will not use your data in any publication. However, we reserve the right to keep your data in our database in case you need to check with us later.

11. What is the size limit for my input?
We usually set a size limit on uploaded sequences for performance reasons. If you need to run a very large genome, you can contact us directly at bioinfo@noble.org.