Search phased small RNA clusters:
This is the core function of pssRNAMiner. The server clusters the user-submitted small RNA sequences according to their phasing pattern on the mapped transcript. The output contains the details of each phased small RNA cluster and of the transcript it maps to (the TAS candidate). The server performs this function after users submit small RNA sequences and specify the transcript/genomic library of interest and the mapping ambiguity factor.

Identify phase-initiator for detected small RNA clusters:
This function is optional. Users enter phase-initiators of interest in FASTA format to have the server perform it; otherwise the server only runs Search phased small RNA clusters after submission. Typical phase-initiators are mature miRNA sequences; less commonly, they are ta-siRNAs. This function relies on the result of the preceding Search phased small RNA clusters step. The server aligns the input phase-initiators (microRNA/ta-siRNA) against the specified transcript libraries to find complementary regions. It then identifies valid cleavage sites in those regions based on the user-defined cleavage site position and its distance to the phased small RNA clusters on the transcript (TAS candidate). Each resulting cleavage site is expected to link the upstream phase-initiator to the downstream phased small RNA clusters in a cascade.

Upload small RNA sequence(s) in simple format or FASTA format:
Users upload the small RNA sequences of interest for mining ta-siRNA clusters. Before analysis, the pipeline checks and processes the uploaded sequences according to the requirements below:
To help users obtain correctly formatted sequences before submission, we strongly recommend the FASTA FORMAT MAKER for quick format conversion. Users can upload their small RNAs and then download the sequences in the correct format.

Mapping ambiguity (maximum hits on transcript/genomic library):
Some small RNAs hit the transcript/genomic sequences too many times, which makes it difficult for the algorithm to locate them on the genome. In our experience, small RNAs sequenced from ribosomal RNA and transposons have more hits on the genome. We therefore ignore small RNAs with too many hits because of their mapping ambiguity. This option lets users set a threshold; small RNAs with more hits than the threshold are ignored.

Maximum offset from phased position (+/- nt):
This option lets users set the maximum offset from a phased position. A valid offset is within 0-3 nt. A small RNA is considered a phased small RNA if it maps within a phased position +/- the offset, and that phased position is then also considered as 'having hit'.

Candidate phase-initiators in FASTA format (miRNA/ta-siRNA):
If users enter phase-initiator sequences here in FASTA format, the server proceeds to analyze potential cleavage sites guided by those sequences after clustering the phased small RNAs on the transcript library. This step relies on the clustering result and the associated transcript (TAS candidate) from the first step of the job, i.e. Search phased small RNA clusters. The following is a typical example:
pssRNAMiner accepts sequences of 20-30 nt as valid input, and at most 300 sequences can be uploaded at one time. Only the letters 'ATCGU' are accepted in the sequence(s).

Maximum expectation for searching complementary fragment:
To score the complementary regions between input phase-initiators and transcript sequences, we followed the method described by Yuanji Zhang (2005, PMID: 15980567). The maximum expectation is the threshold for this score. A complementary region is discarded if its score is greater than the threshold.

Max distance from cleavage site to phased small RNA clusters:
A complementary region does not necessarily mean that the corresponding phase-initiator directs the cleavage of the ta-siRNA-related transcript. Here users can set a maximum distance. A complementary region is discarded if its distance to the small RNA clusters (identified in the first step) is larger than the threshold. We generally set the threshold to 5 times the phasing size, i.e. 105 nt (see the FAQ and our paper for a detailed explanation).

Start/End position of cleavage site in complementary fragment:
If a phase-initiator directs the biogenesis of ta-siRNAs, the distance between its valid cleavage site and the boundary of the phased clusters is expected to be phased, i.e. a multiple of 21 nt.

Query tools bar on the top of list view page:
The query tools bar helps users filter phased small RNA clusters by keywords in the description/ID of the transcript, the ID of the phased small RNA, and the ID of the related phase-initiator. "Ribosomal" RNAs and "transposon" sequences are often detected with low P-values because of their repetitive sequences. "Unknown" transcripts have a high probability of being potential TAS precursors. Users can therefore enter keywords to narrow the range of clusters displayed. Please note: in Include keywords, multiple keywords are combined with "OR", whereas in Exclude keywords they are combined with "AND".
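
The Include/Exclude keyword logic described above can be illustrated with a minimal Python sketch. It is not the server's implementation: the `Cluster` record, the `matches` and `filter_clusters` helpers, the case-insensitive substring matching, and the sample rows are all hypothetical. It also assumes one particular reading of the "AND" rule, namely that a cluster is kept only if it does not match the first exclude keyword AND does not match the second, and so on.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical record for one row of the list view."""
    transcript_id: str
    description: str


def matches(cluster: Cluster, keyword: str) -> bool:
    """Case-insensitive substring match on transcript ID and description
    (the matching rule itself is an assumption)."""
    text = f"{cluster.transcript_id} {cluster.description}".lower()
    return keyword.lower() in text


def filter_clusters(clusters, include=(), exclude=()):
    """Keep clusters matching ANY include keyword (OR) and NONE of the
    exclude keywords (assumed reading of AND: not kw1 AND not kw2)."""
    kept = []
    for cluster in clusters:
        # An empty include list means "no include filter".
        included = not include or any(matches(cluster, k) for k in include)
        excluded = any(matches(cluster, k) for k in exclude)
        if included and not excluded:
            kept.append(cluster)
    return kept


# Made-up rows, for illustration only.
clusters = [
    Cluster("T1", "ribosomal RNA"),
    Cluster("T2", "transposon-related protein"),
    Cluster("T3", "unknown protein"),
]

kept = filter_clusters(clusters, exclude=["ribosomal", "transposon"])
print([c.transcript_id for c in kept])  # prints ['T3']
```

Under this reading, excluding "ribosomal" and "transposon" removes the repetitive hits and leaves the "unknown" transcript for inspection.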
Users can also set Minimum cleavage site(s) to show only clusters with more valid cleavage sites than the threshold. Below is a sample of using the query tools bar.

List view of phased small RNA clusters:
The table lists the phased small RNA clusters that remain after filtering with the query tools bar. Because the list gives all possible combinations of small RNAs, some clusters may appear more than once, i.e. the list is redundant. Please note that the default order of the list is by P-value, so the cluster with the lowest P-value is listed first. However, if users input microRNA/ta-siRNA sequences for cleavage site analysis, the page sorts clusters by the number of valid cleavage sites and complementary regions first, and the P-value becomes the third sort key. The following are the column definitions:
Session Summary:
This table summarizes the session results generated during the various stages of our pipeline. Different jobs may show different table columns, depending on which stages the pipeline actually ran. Please see the following explanation of each stage: In the Mapping small RNAs onto transcript/genomic sequences stage (Stage 1),
In the Clustering phased small RNAs stage (Stage 2), the table gives the total number of clusters with phased small RNAs, even if a cluster has only two phased small RNAs and one of them is the small RNA that serves as the phasing reference coordinate at the start site of the cluster. The pipeline performs this step only when the previous stage produced hits (Number of total hits during mapping > 0). In the Identifying cleavage site stage (Stage 3), users get the Number of phased small RNA clusters with valid cleavage sites directed by phase-initiators. This column appears only when users submit phase-initiator sequences and the pipeline detects phased small RNA clusters.

Phased small RNAs:
This table lists all the phased small RNAs in the cluster, including their mapping coordinates. Strand denotes the strand to which the small RNA maps; both strands occur because ta-siRNAs are chopped from double-stranded RNA. Coordinate on TAS precursor is the position of the first nt of the small RNA on the TAS candidate. Phased-position is the phased point that the small RNA hits; it equals Coordinate on TAS precursor if Maximum offset from phased position is set to zero. Start site of cluster indicates whether the phased small RNA is the first small RNA, i.e. the one used to determine the phased positions of the cluster. Clicking the link in the first column jumps to the mapping view at the bottom of the page.

Complementary regions between input phase-initiator and TAS candidate:
This table lists all the complementary regions between the user-specified phase-initiators and the TAS candidate to which the phased small RNA cluster(s) map.

View of small RNAs mapping and complementary region:
Please refer to the legend of the figure above.

All phased 21-nt RNA fragments within the cluster and up/downstream region:
The table lists all potential phased 21-nt small RNA fragments chopped from the double strands of the TAS candidate precursor. These fragments derive from the cluster region (In cluster) and/or the regions between the cleavage sites and the cluster borders (5'-upstream of cluster or 3'-upstream of cluster). Users can use these sequences for further validation.

Demo data: