Search phased small RNA clusters:

This is the core function of pssRNAMiner. The server clusters the user-submitted small RNA sequences according to their phasing pattern on the transcripts they map to. The output gives full details of each phased small RNA cluster and of the transcript it maps to (the TAS candidate). The server performs this function once users submit small RNA sequences and specify the transcript/genomic library of interest and the mapping ambiguity factor.

Identify phase-initiator for detected small RNA clusters:

This is an optional function. To run it, users enter the phase-initiators of interest in FASTA format; otherwise, the server only performs Search phased small RNA clusters after submission. Typical phase-initiators are mature miRNA sequences; less commonly, ta-siRNAs act as phase-initiators. This function relies on the result of the preceding Search phased small RNA clusters step. To perform it, the server aligns the input phase-initiators (miRNAs/ta-siRNAs) against the specified transcript library to find complementary regions. The server then identifies valid cleavage sites within those regions, based on the user-defined cleavage site range and the site's distance to the phased small RNA clusters on the same transcript (TAS candidate). A valid cleavage site is expected to link the upstream phase-initiator to the downstream phased small RNA clusters in a cascade.

Upload small RNA sequence(s) in simple format or FASTA format:

Users upload the small RNA sequences of interest for mining ta-siRNA clusters. Before analysis, the pipeline checks and processes the uploaded sequences against the following requirements:

  • The sequences should be in FASTA format or a simple sequence file.
  • Users should submit at least 10 small RNA sequences. Because clustering is based on a P-value, users should upload as many sequences as possible to obtain reliable statistics. In our experience, at least 100 distinct sequences are usually sufficient for a meaningful statistical analysis.
  • Sequence IDs should not contain special characters such as ! % & " ' $ ` \ ^ *, because they interfere with the query function. All such characters are automatically converted to "_" before analysis.
  • For FASTA input, avoid overly long sequence IDs; for example, an ID longer than 50 characters may break the layout of the web display.
  • Valid sequences are 17-28 nt long.
  • Only 'ATCGU' are valid sequence letters; sequences containing any other letter are discarded.
After the job completes, if the pipeline detected any invalid sequence or format, users can download the processed sequence file and the LOG file from the Session Summary in the list view of clusters.
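The checks above can be sketched in Python as follows. This is a minimal illustration, not the server's code: the helper names `parse_small_rnas` and `validate_small_rnas` are hypothetical, the sketch assumes each sequence sits on a single line, and it assumes sequences in a simple file receive generated IDs (`seq_1`, `seq_2`, ...).

```python
import re

# Special characters listed above; each one is replaced by "_" in sequence IDs.
UNSAFE_ID_CHARS = re.compile(r"[!%&\"'$`\\^*]")
VALID_LETTERS = set("ATCGU")


def parse_small_rnas(text):
    """Parse FASTA or plain one-sequence-per-line text into (id, sequence) pairs.

    Assumption: every sequence occupies a single line; sequences without a
    FASTA header get hypothetical IDs seq_1, seq_2, ...
    """
    records, current_id, counter = [], None, 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current_id = UNSAFE_ID_CHARS.sub("_", line[1:].strip())
        else:
            if current_id is None:
                counter += 1
                current_id = f"seq_{counter}"
            records.append((current_id, line.upper()))
            current_id = None
    return records


def validate_small_rnas(records, min_len=17, max_len=28, min_count=10):
    """Split records into accepted sequences and LOG messages for rejected ones."""
    kept, log = [], []
    for seq_id, seq in records:
        if not min_len <= len(seq) <= max_len:
            log.append(f"{seq_id}: length {len(seq)} nt is outside {min_len}-{max_len} nt")
        elif not set(seq) <= VALID_LETTERS:
            log.append(f"{seq_id}: contains letters other than A, T, C, G, U")
        else:
            kept.append((seq_id, seq))
    if len(kept) < min_count:
        raise ValueError(f"only {len(kept)} valid sequences; at least {min_count} are required")
    return kept, log
```

Rejected sequences are collected as messages rather than raising immediately, mirroring the idea of a LOG file that users can download after the job.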

To help users prepare correctly formatted sequences before submission, we strongly recommend the FASTA FORMAT MAKER for quick format conversion: users upload their small RNAs and then download the sequences in the correct format.

Mapping ambiguity (maximum hits on transcript/genomic library):

Some small RNAs hit the transcript/genomic sequences so many times that the algorithm cannot reliably locate their origin in the genome. In our experience, small RNAs derived from ribosomal RNAs and transposons tend to have many hits in the genome. We therefore ignore small RNAs with too many hits because of their mapping ambiguity. This option lets users set a threshold: small RNAs with more hits than this number are ignored.
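A minimal sketch of this filter in Python, assuming exact matching on both strands of DNA library sequences (the server's actual mapping criteria, such as whether mismatches are allowed, are not described here). `filter_ambiguous` and the other helper names are hypothetical.

```python
from collections import Counter

# Complement table for DNA transcripts (small RNAs are converted from U to T first).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def count_occurrences(haystack, needle):
    """Count possibly overlapping exact occurrences of needle in haystack."""
    count, start = 0, haystack.find(needle)
    while start != -1:
        count += 1
        start = haystack.find(needle, start + 1)
    return count


def filter_ambiguous(small_rnas, transcripts, max_hits=6):
    """Keep small RNAs with 1..max_hits exact hits on either strand of the library."""
    hits = Counter()
    for srna in small_rnas:
        sense = srna.upper().replace("U", "T")
        antisense = reverse_complement(sense)
        for seq in transcripts.values():
            hits[srna] += count_occurrences(seq, sense)
            if antisense != sense:  # avoid double-counting palindromic sequences
                hits[srna] += count_occurrences(seq, antisense)
    kept = [s for s in small_rnas if 0 < hits[s] <= max_hits]
    return kept, hits
```

Small RNAs with zero hits are also dropped here, since they cannot contribute to any cluster.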

Maximum offset from phased position (+/- nt):

This option allows users to set a maximum offset from the phased position; valid values are 0-3 nt. A small RNA is considered phased if it maps within a phased position +/- the offset, and that phased position is then also counted as 'having a hit'.
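The offset rule can be illustrated with a short Python sketch. It assumes phased positions are `start + k*21` for integer `k`, where `start` is the coordinate of the cluster's start-site small RNA, and it ignores strand (how the server treats antisense-strand coordinates is not described here). The function names are hypothetical.

```python
def phase_offset(coord, start, phase=21):
    """Signed distance (nt) from coord to the nearest position start + k*phase."""
    d = (coord - start) % phase  # Python's % is always non-negative here
    return d if d <= phase // 2 else d - phase


def phased_position(coord, start, max_offset=2, phase=21):
    """Return the phased position that coord counts as hitting, or None."""
    if not 0 <= max_offset <= 3:
        raise ValueError("valid offsets are 0-3 nt")
    offset = phase_offset(coord, start, phase)
    return coord - offset if abs(offset) <= max_offset else None
```

For example, with the cluster start at 100 and an offset of 2, small RNAs starting at 140 or 143 both count as hits on phased position 142, while one starting at 110 is non-phased.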

Candidate phase-initiators in FASTA format (miRNA/ta-siRNA):

If users enter phase-initiator sequences here in FASTA format, the server will, after clustering the phased small RNAs on the transcript library, go on to analyze potential cleavage sites guided by these sequences. This step relies on the clustering result and the associated transcript (TAS candidate) from the first step of the job, i.e. Search phased small RNA clusters. A typical example follows:

>ath-miR390a trigger TAS3 to produce ta-siRNAs clusters
AAGCUCAGGAGGGAUAGCGCC

The mature ath-miR390 sequence directs cleavage of the TAS3 precursor, which then generates a ta-siRNA cluster.

pssRNAMiner accepts sequences of 20-30 nt as valid input, and at most 300 sequences can be uploaded at one time. Only 'ATCGU' are accepted as valid letters in the sequences.

Maximum expectation for searching complementary fragment:

To score the complementary regions between an input phase-initiator and the transcript sequences, we follow the method described by Yuanji Zhang (2005; PMID: 15980567). The maximum expectation is the score threshold: a complementary region is discarded if its score is greater than this threshold.
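The exact penalties of the cited method are not reproduced here. As a hypothetical stand-in, the Python sketch below uses a simplified gap-free score (0 for Watson-Crick pairs, 0.5 for G:U wobble pairs, 1 for any other mismatch) and keeps every window whose score does not exceed the maximum expectation. It only illustrates how the threshold works, not how pssRNAMiner scores alignments.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def to_rna(seq):
    return seq.upper().replace("T", "U")


def site_score(initiator, window):
    """Simplified penalty for an initiator paired antiparallel to a transcript window.

    Hypothetical scheme: 0 per Watson-Crick pair, 0.5 per G:U wobble, 1 otherwise.
    """
    a, b = to_rna(initiator), to_rna(window)
    if len(a) != len(b):
        raise ValueError("initiator and window must have equal length")
    score = 0.0
    # Initiator read 5'->3' pairs with the window read 3'->5'.
    for x, y in zip(a, reversed(b)):
        if (x, y) in WATSON_CRICK:
            continue
        score += 0.5 if (x, y) in WOBBLE else 1.0
    return score


def complementary_regions(initiator, transcript, max_expectation=5.0):
    """Return (start, end, score) for windows scoring <= max_expectation (0-based, inclusive)."""
    n = len(initiator)
    regions = []
    for start in range(len(transcript) - n + 1):
        score = site_score(initiator, transcript[start:start + n])
        if score <= max_expectation:  # regions scoring above the threshold are discarded
            regions.append((start, start + n - 1, score))
    return regions
```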

Max distance from cleavage site to phased small RNA clusters:

A complementary region does not necessarily mean that the corresponding phase-initiator directs cleavage of the ta-siRNA-related transcript. We therefore let users set a maximum distance: a complementary region is discarded if its distance to the small RNA clusters identified in the first step is larger than this threshold. We generally set the threshold to five times the phasing size (21 nt), i.e. 105 nt (see the FAQ and our paper for a detailed explanation).

Start/End position of cleavage site in complementary fragment:

If a phase-initiator directs the biogenesis of ta-siRNAs, the distance between its valid cleavage site and the boundary of the phased cluster should itself be in phase, i.e. a multiple of 21 nt.

Users set an expected cleavage range within the complementary region beforehand, and the server checks whether any cleavage site in that range lies a multiple of 21 nt away from the phased cluster on the same transcript. Most phase-initiators tend to cleave the transcript between positions 9 and 11 of the phase-initiator in the complementary region (the default setting), most commonly between positions 10 and 11.
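The check described in this and the previous option can be sketched in Python under stated assumptions: "cleavage at position i" is taken to mean a cut between initiator positions i and i+1 (counted from the initiator's 5' end), coordinates are 0-based, and the distance is measured from the cut to a single cluster boundary coordinate. The function name and parameters are hypothetical.

```python
def valid_cleavage_sites(site_end, cluster_start, start_pos=9, end_pos=11,
                         phase=21, max_distance=105):
    """Return (position, cut) pairs whose cut lies in phase with the cluster.

    site_end: 0-based transcript coordinate paired with the phase-initiator's
    5'-most nucleotide (the 3' end of the complementary region on the transcript).
    Assumption: a cut at initiator position i falls between positions i and i+1,
    so the 3' cleavage fragment starts at the nucleotide paired with position i.
    """
    sites = []
    for i in range(start_pos, end_pos + 1):
        cut = site_end - (i - 1)  # first nucleotide of the 3' cleavage fragment
        distance = abs(cluster_start - cut)
        # Too distant regions are discarded; the rest must be a multiple of 21 nt away.
        if distance <= max_distance and distance % phase == 0:
            sites.append((i, cut))
    return sites
```

For a complementary region ending at coordinate 100 and a cluster boundary at 133, only a cut at initiator position 10 (coordinate 91, 42 nt away) qualifies.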

Query tools bar at the top of the list view page:

The Query tools bar lets users include or exclude phased small RNA clusters by keywords in the transcript ID/description, the IDs of phased small RNAs and the IDs of related phase-initiators. Some "ribosomal" RNAs or "transposons" are often detected with low P-values because of their repetitive sequences, whereas "Unknown" transcripts have a high probability of being potential TAS precursors. Users can therefore enter keywords to narrow down the clusters to be viewed. Please note: multiple Include keywords are combined with "OR", whereas multiple Exclude keywords are combined with "AND". Users can also set Minimum cleavage site(s) to show only clusters with more valid cleavage sites than the threshold. Below is an example of using the Query tools bar.
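One way to read the keyword logic is sketched below in Python. It assumes case-insensitive substring matching, and it reads the Exclude "AND" as the condition NOT k1 AND NOT k2 ..., so a cluster is hidden if any exclude keyword matches; both are assumptions about how the page behaves. `passes_query` is a hypothetical name.

```python
def passes_query(fields, include=(), exclude=()):
    """Decide whether a cluster is shown, given its searchable text fields.

    fields: e.g. transcript ID/description, small RNA IDs, phase-initiator IDs.
    Assumptions: case-insensitive substring matching; include keywords are
    OR-ed; exclude keywords form NOT k1 AND NOT k2 ...
    """
    text = " ".join(fields).lower()
    if include and not any(k.lower() in text for k in include):
        return False
    return all(k.lower() not in text for k in exclude)
```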

List view of phased small RNA clusters:

The table lists the phased small RNA clusters that remain after filtering with the Query tools bar. To present all possible combinations of small RNAs, some clusters may appear more than once, i.e. the list is redundant. Note that by default the list is ordered by P-value, so the cluster with the smallest P-value is listed first. However, if users input miRNA/ta-siRNA sequences for cleavage site analysis, the page sorts clusters by the number of valid cleavage sites, then by the number of complementary regions, and finally by P-value.
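The ordering can be expressed as a sort key in Python. This sketch assumes that larger counts of cleavage sites and complementary regions rank higher (the text does not state the direction explicitly), and the dictionary keys are hypothetical.

```python
def sort_clusters(clusters, with_initiators):
    """Order cluster records (dicts with hypothetical keys) as described above."""
    def key(c):
        if with_initiators:
            # Assumed descending counts, then ascending P-value as the third key.
            return (-c["cleavage_sites"], -c["complementary_regions"], c["p_value"])
        return (c["p_value"],)
    return sorted(clusters, key=key)
```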

The columns are defined as follows:

  1. Cluster ID: a unique ID for accessing the phased small RNA cluster. Users can click it to view the details of the cluster.
  2. Phased small RNA(s): IDs of all phased small RNAs in the cluster. The small RNA at the start site of the cluster is also included, although it is ignored when calculating the P-value because it is the reference point for the phased positions.
  3. # of hit position [phased:non-phased]: the number of phased and non-phased positions with small RNA hits within the 231 bp region.
  4. P-value: calculated from the hypergeometric distribution under a random model. The default threshold is 0.05, meaning a phased cluster is considered significant if its P-value is less than 0.05. Users can raise the threshold in the Query tools bar for a less stringent cutoff.
  5. Matched transcript: the ID of a transcript/genomic sequence in the user-selected transcript/genomic library. The small RNA cluster is located on this sequence, which is therefore considered a candidate TAS gene precursor.
  6. Transcript description: the transcript's annotation.
  7. # of cleavage sites guided by input phase-initiator: this column and the next appear only if users submit phase-initiators for cleavage site analysis. It gives the number of valid cleavage sites identified among the complementary regions between the input phase-initiators and the mapped TAS candidate. This number is less than or equal to the number in the next column, because some complementary regions have no valid cleavage site relative to the small RNA cluster.
  8. # of complementary regions: see the description above.
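A hypergeometric tail probability of the kind named in column 4 can be computed exactly with `math.comb`. How pssRNAMiner maps the column 3 counts onto the test is not spelled out here, so the wrapper `cluster_pvalue` is hypothetical: it assumes the start-site reference position is removed from the 231 bp window, leaving 230 positions of which 10 are phased.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N positions, K phased, n positions with hits)."""
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(K, n)
    return sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1)) / comb(N, n)


def cluster_pvalue(phased_hits, nonphased_hits, positions=230, phased_positions=10):
    """Hypothetical wrapper for the [phased:non-phased] counts.

    Assumption: the start-site reference position is excluded from the 231 bp
    window (11 x 21 nt), leaving 230 positions, 10 of them phased.
    """
    n = phased_hits + nonphased_hits
    return hypergeom_tail(positions, phased_positions, n, phased_hits)
```

The more of a cluster's hit positions fall on phased positions, the smaller the tail probability, which is why clusters with many phased hits and few non-phased hits rank at the top of the default list.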

Session Summary:

This table summarizes the session results generated at the various stages of the pipeline. Different jobs may show different table columns because the pipeline may stop at different stages. Each stage is explained below:

In the Mapping small RNAs onto transcript/genomic sequences stage (Stage 1):

  • Number of total hits during mapping: how many hits occurred during mapping. One small RNA may hit multiple transcripts, or even several positions on the same transcript.
  • Number of small RNAs having hits: how many small RNAs are involved in these hits. This value is less than or equal to Number of total hits during mapping, since one small RNA often hits the transcript/genomic library many times.
  • Number of transcript/genomic sequences having hits: how many transcripts were hit by the above small RNAs. A transcript is usually hit by multiple small RNAs at different coordinates.
Please note that Number of total hits during mapping may occasionally be "0", which usually happens when the submitted small RNAs and the specified transcript/genomic library come from different species. In that case the pipeline stops at this stage, since there are no mapped small RNAs to cluster in the next stage.
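The relationship between the three Stage 1 numbers can be made concrete with a small Python sketch. The hit-tuple layout `(small_rna_id, transcript_id, position, strand)` is an assumption made for illustration.

```python
def mapping_summary(hits):
    """Summarize hits given as hypothetical (small_rna_id, transcript_id, position, strand) tuples."""
    hits = list(hits)
    return {
        "total_hits": len(hits),                            # every hit counts once
        "small_rnas_with_hits": len({h[0] for h in hits}),  # distinct small RNAs
        "transcripts_with_hits": len({h[1] for h in hits}), # distinct transcripts
    }
```

Because distinct small RNAs are counted from the same hit list, `small_rnas_with_hits` can never exceed `total_hits`; a `total_hits` of 0 corresponds to the case where the pipeline stops after Stage 1.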

In the Clustering phased small RNAs stage (Stage 2), the table gives the total number of clusters with phased small RNAs, including clusters with only two phased small RNAs, one of which is the small RNA at the cluster start site that serves as the phasing reference coordinate. The pipeline performs this step only when hits are available from the previous stage (Number of total hits during mapping > 0).

In the Identifying cleavage site stage (Stage 3), users get the Number of phased small RNA clusters with valid cleavage sites directed by phase-initiators. This column appears only when users submit phase-initiator sequences and the pipeline detects phased small RNA clusters.

Phased small RNAs:

This table lists all phased small RNAs in the cluster, including their mapping coordinates. Because ta-siRNAs are processed from double-stranded RNA, Strand indicates the strand to which each small RNA maps. Coordinate on TAS precursor is the position of the small RNA's first nucleotide on the TAS candidate. Phased-position is the phased position that the small RNA hits; it equals Coordinate on TAS precursor when Maximum offset from phased position is set to zero. Start site of cluster indicates whether the phased small RNA is the first small RNA of the cluster, used to determine the cluster's phased positions. Clicking the link in the first column jumps to the mapping view at the bottom of the page.

Complementary regions between input phase-initiator and TAS candidate:

This table lists all complementary regions between the user-specified phase-initiator and the TAS candidate on which the phased small RNA cluster(s) are mapped.

View of small RNAs mapping and complementary region:

Please refer to the legend of the figure above.

All phased 21-nt RNA fragments within the cluster and up/downstream region:

The table lists all potential phased 21-nt small RNA fragments that could be processed from both strands of the TAS candidate precursor. These fragments derive from the cluster region (In cluster) and/or from the region between the cleavage sites and the cluster borders (5'-upstream of cluster or 3'-downstream of cluster). Users can use these sequences for further validation.
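Enumerating such fragments can be sketched in Python as follows. This is a simplified illustration: the helper name is hypothetical, the sketch takes the antisense fragment as the reverse complement of the same 21-nt window, and it ignores the 2-nt 3' overhangs of real siRNA duplexes, which shift the antisense register.

```python
# Complement table for DNA transcript sequences.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def phased_fragments(transcript, phase_start, region_start, region_end, size=21):
    """Yield (start, sense, antisense) for complete in-register windows in [region_start, region_end).

    Simplification: antisense is the reverse complement of the same window;
    the 2-nt 3' overhang of real duplexes is ignored.
    """
    # First coordinate >= region_start that is in register with phase_start.
    first = region_start + (phase_start - region_start) % size
    for s in range(first, region_end - size + 1, size):
        sense = transcript[s:s + size]
        yield s, sense, sense.translate(COMPLEMENT)[::-1]
```

For instance, `phased_fragments(seq, phase_start=cluster_start, region_start=cut, region_end=cluster_start)` would list the in-register fragments between a cleavage site and the start of the cluster.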

Demo data:

Demo 1 (quick preview):
  • Purpose: Search phased small RNA clusters.
  • How to try: click [Load demo data #1] to automatically load the data and options listed below.
  • Small RNA sequences for uploading (FASTA format): a selected TAS3-related small RNA dataset from the RDR MPSS library for quick trial (downloadable from this page).
  • Transcript/genomic library for mapping: Arabidopsis thaliana TAIR7, cDNA, released 04/25/2007
  • Mapping ambiguity (maximum hits on transcript/genomic library): 6
  • Maximum offset from phased position: 2
  • Candidate phase-initiator: leave it blank; the server will only "search phased small RNA clusters", and all options below are ignored.
  • Waiting time: 5-10 minutes

Demo 2 (quick preview):
  • Purpose: Search phased small RNA clusters; identify microRNA cleavage site(s) and link them with the detected small RNA clusters.
  • How to try: click [Load demo data #2] to automatically load the data and options listed below.
  • Small RNA sequences for uploading (FASTA format): a selected TAS3-related small RNA dataset from the RDR MPSS library for quick trial (downloadable from this page).
  • Transcript/genomic library for mapping: Arabidopsis thaliana TAIR7, cDNA, released 04/25/2007
  • Mapping ambiguity (maximum hits on transcript/genomic library): 6
  • Maximum offset from phased position: 2
  • Candidate phase-initiator:
    >ath-miR390
    AAGCUCAGGAGGGAUAGCGCC
    After "search phased small RNA clusters", the server proceeds to "Identify phase-initiator for detected small RNA clusters".
  • Maximum expectation for searching complementary fragment: 5
  • Max distance from cleavage site to phased small RNA clusters: 105
  • Start position of cleavage site in complementary fragment: 9
  • End position of cleavage site in complementary fragment: 11
  • Waiting time: 5-10 minutes

Demo 3 (users input real data to test):
  • Purpose: Search phased small RNA clusters.
  • How to try: manually input the data/options listed below.
  • Small RNA sequences for uploading (FASTA format): RDR small RNA MPSS library at Arabidopsis MPSS Plus.
  • Transcript/genomic library for mapping: Arabidopsis thaliana TAIR7, cDNA, released 04/25/2007
  • Mapping ambiguity (maximum hits on transcript/genomic library): 6
  • Maximum offset from phased position: 2
  • Candidate phase-initiator: leave it blank; the server will only "search phased small RNA clusters", and all options below are ignored.
  • Waiting time: 30-60 minutes

Demo 4 (users input real data to test):
  • Purpose: Search phased small RNA clusters; identify microRNA cleavage site(s) and link them with the detected small RNA clusters.
  • How to try: manually input the data/options listed below.
  • Small RNA sequences for uploading (FASTA format): RDR small RNA MPSS library at Arabidopsis MPSS Plus.
  • Transcript/genomic library for mapping: Arabidopsis thaliana TAIR7, cDNA, released 04/25/2007
  • Mapping ambiguity (maximum hits on transcript/genomic library): 6
  • Maximum offset from phased position: 2
  • Candidate phase-initiator: the miRNA FASTA file (downloadable from this page). After "search phased small RNA clusters", the server proceeds to "Identify phase-initiator for detected small RNA clusters".
  • Maximum expectation for searching complementary fragment: 5
  • Max distance from cleavage site to phased small RNA clusters: 105
  • Start position of cleavage site in complementary fragment: 9
  • End position of cleavage site in complementary fragment: 11
  • Waiting time: 30-60 minutes

For all demos:
  • Your Email (optional): if you enter your email address, you will receive notifications about your job's progress and its tracking URL.
  • After submission, wait for your session's results online, or save the URL of the session's output and check back later. The URL is shown on the auto-refreshing output page and in the notification email.