This track is produced as part of the ENCODE Transcriptome Project. It shows the starts and ends of full-length mRNA transcripts determined by GIS paired-end ditag (PET) sequencing using RNA extracts from different subcellular localizations in different cell lines. The RNA-PET data in this track come in two PET length versions, depending on how the PETs were extracted. The cloning-based PET (18 bp and 16 bp) is the earlier version; details can be found in Ng et al. (2006). The cloning-free PET (25 bp and 25 bp) is a recently modified version that uses the Type III restriction enzyme EcoP15I to generate longer PETs (unpublished), which significantly improves both library construction and mapping efficiency. Both versions of PET templates were sequenced on the Solexa platform using 2 x 36 bp paired-end sequencing. See the Methods and References sections below for more details.
In the graphical display, the ends are represented by blocks connected by a horizontal line. In full and packed display modes, the arrowheads on the horizontal line represent the direction of transcription, and an ID of the format XXXXX-N-M is shown to the left of each PET, where X is the unique ID for each PET, N indicates the number of mapping locations in the genome (1 for a single mapping location, 2 for two mapping locations, and so forth), and M is the number of PET sequences at this location. The total count of PET sequences mapped to the same locus but with slight nucleotide differences may reflect the expression level of the transcripts. PETs that mapped to multiple locations may represent low complexity or repetitive sequences.
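As an illustration of the ID format described above, the following minimal Python sketch splits a label of the form XXXXX-N-M into its three fields. The names `PetLabel`, `parse_pet_label`, and `is_uniquely_mapped` are hypothetical and are not part of the track or any browser tool. Splitting from the right is an assumption made so that a hyphen inside the unique ID, if one ever occurs, would be preserved.

```python
from typing import NamedTuple


class PetLabel(NamedTuple):
    pet_id: str        # unique PET identifier (the XXXXX part)
    n_locations: int   # N: number of mapping locations in the genome
    n_sequences: int   # M: number of PET sequences at this location


def parse_pet_label(label: str) -> PetLabel:
    """Split an 'XXXXX-N-M' label into its three fields."""
    # Split from the right: the last two fields are always N and M.
    # Assumption: the ID itself may contain hyphens; the source does not say.
    pet_id, n, m = label.rsplit("-", 2)
    return PetLabel(pet_id, int(n), int(m))


def is_uniquely_mapped(label: PetLabel) -> bool:
    """True when the PET has a single mapping location (N == 1)."""
    return label.n_locations == 1
```

For example, `parse_pet_label("12345-1-7")` yields a PET with one mapping location and seven PET sequences at that location.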
To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide.
Color differences among the views are arbitrary. They provide a visual cue for distinguishing between the different cell types and compartments.
Cells were grown according to the approved ENCODE cell culture protocols. Two different GIS RNA-PET protocols were used to generate the full-length transcriptome PETs: one is a cloning-free RNA-PET library construction and sequencing strategy (unpublished), and the other is a cloning-based library construction (Ng et al. 2005) followed by Solexa paired-end sequencing.
Method: The cloning-free RNA-PET libraries were generated from polyA mRNA samples and constructed using a recently modified GIS protocol (unpublished). High-quality total RNA was used as starting material and purified through a MACs polyT column to obtain full-length polyA mRNAs. Approximately 5 micrograms of enriched polyA mRNA were reverse transcribed into full-length cDNA. The full-length cDNA was modified and ligated with specific linker sequences, then circularized by ligation to generate circular cDNA molecules. A 25 bp tag from each end of the full-length cDNA was extracted by digestion with the Type III restriction enzyme EcoP15I. The resulting PETs were ligated with sequencing adaptors at both ends, amplified by PCR, and further purified as complex templates for paired-end (PE) sequencing on the Solexa or SOLiD platform. Most data displayed in this track were sequenced using Solexa.
Data: The sequenced RNA-PETs are uniformly 25/25 bp in length, with one tag from each end of a cDNA. After redundant and noise tags were filtered out, the unique PETs entered the analysis pipeline. First, the orientation of each tag was determined from the barcode built into the sequencing template, and the tags were then paired into an orientation-defined PET. Each orientation-defined RNA-PET was mapped onto the reference genome, allowing up to two mismatches. The majority of PETs mapped to known transcripts or splice variants. A small portion of misaligned PETs, defined as discordant PETs, had tags that mapped too far from each other, in the wrong orientation, or to different chromosomes. These indicate transcription variations that could be caused by genome structural variations (such as fusion, deletion, insertion, inversion, tandem repeat, and translocation) or by RNA trans-splicing.
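The three discordance conditions described above can be sketched as a small classifier. This is only an illustration under stated assumptions, not the GIS pipeline: the names `TagHit` and `classify_pet` are hypothetical, the `MAX_SPAN_BP` cutoff is a placeholder because the source does not give the distance threshold, and both tags are assumed to be reported on the strand of the transcript.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the source does not state the maximum genomic span.
MAX_SPAN_BP = 1_000_000


@dataclass(frozen=True)
class TagHit:
    chrom: str
    start: int   # 0-based start of the mapped tag
    end: int     # exclusive end of the mapped tag
    strand: str  # "+" or "-" (assumed to be the transcript strand)


def classify_pet(five_prime: TagHit, three_prime: TagHit,
                 max_span: int = MAX_SPAN_BP) -> str:
    """Label a PET as concordant or as one of three discordant types."""
    if five_prime.chrom != three_prime.chrom:
        return "discordant:different_chromosomes"
    if five_prime.strand != three_prime.strand:
        return "discordant:wrong_orientation"
    # On the plus strand the 5' tag should lie upstream of the 3' tag;
    # on the minus strand the order along the genome is reversed.
    if five_prime.strand == "+":
        ordered = five_prime.start <= three_prime.start
    else:
        ordered = five_prime.start >= three_prime.start
    if not ordered:
        return "discordant:wrong_orientation"
    span = (max(five_prime.end, three_prime.end)
            - min(five_prime.start, three_prime.start))
    if span > max_span:
        return "discordant:too_far"
    return "concordant"
```

The checks run from the most to the least severe condition, so a PET whose tags fall on different chromosomes is never also tested for span.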
Method: The cloning-based RNA-PET (GIS-PET) libraries were generated from polyA RNA samples and constructed using the protocol described by Ng et al. (2005). High-quality total RNA was used as starting material and further purified through a MACs polyT column to enrich polyA mRNA and remove contaminants (e.g., rRNA, tRNA, DNA, and protein). Approximately 10 micrograms of polyA mRNA were then reverse transcribed into full-length cDNA. The full-length cDNA was modified with specific linker sequences and then ligated to a GIS-developed vector (pGIS4) to form a complex full-length cDNA library, which was cloned into E. coli. Plasmid DNA was isolated from the library and digested with MmeI (a type II enzyme) to generate ditags of 18 bp and 16 bp from the two ends of each full-length cDNA. Single ditags (PETs) were then ligated to form a diPET structure (a concatemer of two unrelated PETs joined by a linker sequence) to facilitate Solexa paired-end sequencing.
Data: The cloning-based RNA-PETs are uniformly 18 bp and 16 bp in length, extracted from the 5' and 3' ends of each cDNA, respectively. Redundant reads were filtered out first, and the unique reads were included for analysis. PET sequences were then mapped to the reference genome (hg18) using the specific criteria of Ruan et al. (2007).
PETs mapping to 2-10 locations are also included and may represent duplicated genes or pseudogenes in the genome. The majority of PETs mapped to known transcripts or splice variants. A small portion of misaligned PETs, defined as discordant PETs, had tags that mapped too far from each other, in the wrong orientation, or to different chromosomes. These indicate transcription variations that could be caused by genome structural variations (such as fusion, deletion, insertion, inversion, tandem repeat, and translocation) or by RNA trans-splicing.
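The handling of multiply mapped PETs can be sketched as a simple binning by number of mapping locations. The function names `mapping_class` and `summarize` are hypothetical. The text states only that PETs with 2-10 locations are included; treating PETs with more than 10 locations as excluded is an assumption drawn from that wording, not a documented rule.

```python
from collections import Counter

# Upper bound of the 2-10 range stated in the text.
MAX_INCLUDED_LOCATIONS = 10


def mapping_class(n_locations: int) -> str:
    """Bin a PET by its number of genomic mapping locations."""
    if n_locations < 1:
        return "unmapped"
    if n_locations == 1:
        return "unique"
    if n_locations <= MAX_INCLUDED_LOCATIONS:
        # 2-10 locations: may be duplicated genes or pseudogenes.
        return "multi"
    # Assumption: more than 10 locations is not included in the track.
    return "excluded"


def summarize(location_counts):
    """Count how many PETs fall into each mapping class."""
    return Counter(mapping_class(n) for n in location_counts)
```

Applied to a list of per-PET location counts, `summarize` gives a quick view of how much of a library is uniquely mapped versus multiply mapped.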
The GIS RNA-PET libraries and sequence data for transcriptome analysis were generated and analyzed by scientists Xiaoan Ruan, Atif Shahab, Chialin Wei, and Yijun Ruan at the Genome Institute of Singapore.
Contact: Yijun Ruan
Ng P, Tan JJ, Ooi HS, Lee YL, Chiu KP, Fullwood MJ, Srinivasan KG, Perbost C, Du L, Sung WK, et al., Multiplex sequencing of paired-end ditags (MS-PET): a strategy for the ultra-high-throughput analysis of transcriptomes and genomes. Nucleic Acids Res. 2006;34:e84.
Ng P, Wei CL, Sung WK, Chiu KP, Lipovich L, Ang CC, Gupta S, Shahab A, Ridwan A, Wong CH, et al., Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nat Methods. 2005;2:105-111.
Ruan Y, Ooi HS, Choo SW, Chiu KP, Zhao XD, Srinivasan KG, Yao F, Choo CY, Liu J, Ariyaratne P, et al., Fusion transcripts and transcribed retrotransposed loci discovered through comprehensive transcriptome analysis using Paired-End diTags (PETs). Genome Res. 2007;17:828-838.
Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here.