This track shows high-throughput sequencing of long RNAs (>200 nt) from whole-cell RNA samples of tissues or subcellular compartments of cell lines included in the ENCODE Transcriptome subproject. The overall goal of the ENCODE project is to identify and characterize all functional elements in the sequence of the human genome. RNA-Seq was performed by reverse-transcribing an RNA sample into cDNA and then sequencing the cDNA with high-throughput DNA sequencing, here on the Helicos Genetic Analysis System (Harris et al., 2008; http://www.helicosbio.com/).
To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. The colors assigned to the views are arbitrary; they serve only as a visual cue for distinguishing the different cell types and compartments.
Note that the strand of the RNA is not displayed in the Genome Browser track; it is available in the download file.
RNA was converted into first-strand cDNA using a large excess of random hexamers, without prior fragmentation. Under these conditions, spurious second-strand cDNA synthesis could occur. The 3′ ends of the first-strand cDNA molecules were tailed with polyA residues using terminal transferase, and the tailed molecules were used directly for sequencing.
Filtered reads were aligned to the human genome with indexDPgenomic, Helicos's in-house and freely available alignment software (http://open.helicosbio.com/mwiki/index.php/Docs/Software/Bioinformatics#Executables; free registration required), using a minimum normalized alignment score of 4.5. The normalized score is defined as follows:
Normalized score = (#matches × 5 − #errors × 4) / read length
where #errors counts mismatches, insertions, and deletions, and read length is the number of bases in the read (tag) sequence.
For example, in the following alignment:
Tag sequence: CCTCCGTGTTGTTCCAGCC-CAGTGCTCGCAGG
Ref sequence: C-TCCGTGTTGTTCCAGCCACAGTGCTCGCAGG
Length of alignment block: 33
Length of tag sequence: 32
Number of matches: 31
Number of errors: 2
Score: (31 × 5) − (2 × 4) = 155 − 8 = 147
Normalized score: 147 / 32 = 4.59375
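The scoring rule and the worked example can be reproduced with a short Python sketch. This is an illustration only, not the indexDPgenomic implementation. It assumes the alignment is supplied as two equal-length strings with "-" marking gaps, counts every non-identical column (mismatch, insertion, or deletion) as one error as the example does, and treats the 4.5 minimum as inclusive. The function and constant names are hypothetical.

```python
"""Sketch of the normalized alignment score filter (threshold 4.5).

Assumptions (hypothetical, not taken from the Helicos software): the
alignment is given as two equal-length strings using '-' for gaps, and
every non-identical column counts as one error.
"""

MATCH_WEIGHT = 5
ERROR_WEIGHT = 4
MIN_NORMALIZED_SCORE = 4.5  # assumed inclusive minimum


def count_matches_errors(tag_aln: str, ref_aln: str) -> tuple[int, int]:
    """Return (matches, errors) for two aligned strings of equal length."""
    if len(tag_aln) != len(ref_aln):
        raise ValueError("aligned sequences must have the same length")
    matches = errors = 0
    for t, r in zip(tag_aln.upper(), ref_aln.upper()):
        if t == "-" and r == "-":
            continue  # gap in both sequences: no base to score
        if t == r:
            matches += 1
        else:
            errors += 1  # mismatch, insertion, or deletion
    return matches, errors


def normalized_score(tag_aln: str, ref_aln: str) -> float:
    """(matches * 5 - errors * 4) / read length, where read length excludes gaps."""
    matches, errors = count_matches_errors(tag_aln, ref_aln)
    read_length = len(tag_aln.replace("-", ""))
    return (matches * MATCH_WEIGHT - errors * ERROR_WEIGHT) / read_length


def passes_filter(tag_aln: str, ref_aln: str) -> bool:
    """Keep the read if its normalized score reaches the minimum."""
    return normalized_score(tag_aln, ref_aln) >= MIN_NORMALIZED_SCORE


if __name__ == "__main__":
    tag = "CCTCCGTGTTGTTCCAGCC-CAGTGCTCGCAGG"
    ref = "C-TCCGTGTTGTTCCAGCCACAGTGCTCGCAGG"
    print(count_matches_errors(tag, ref))  # (31, 2)
    print(normalized_score(tag, ref))      # 4.59375
    print(passes_filter(tag, ref))         # True
```

On the example alignment the sketch reports 31 matches and 2 errors, giving a normalized score of 4.59375. Because 4.59375 is at least 4.5, the read is kept. Dividing by read length instead of alignment block length matches the example, which divides by 32 rather than 33.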
Raw data are available from Helicos (free registration required).
The aligned sequence reads confirm the known exon structures displayed in the Genome Browser.
Helicos BioSciences: Philipp Kapranov, Eldar Giladi, Steve Roels, Chris Hart, Stan Letovsky, Patrice Milos.
Cold Spring Harbor Laboratory: Carrie Davis, Kim Bell, Huaien Wang, Tom Gingeras.
Contacts: Philipp Kapranov; Patrice Milos
Harris TD, Buzby PR, Babcock H, Beer E, Bowers J, Braslavsky I, Causey M, Colonell J, Dimeo J, Efcavitch JW, Giladi E, Gill J, Healy J, Jarosz M, Lapen D, Moulton K, Quake SR, Steinmann K, Thayer E, Tyurina A, Ward R, Weiss H, Xie Z. Single-molecule DNA sequencing of a viral genome. Science. 2008 Apr 4;320(5872):106-9.
Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months after the release of the dataset. This date is listed in the Restricted Until column above. The full ENCODE data release policy is published by the ENCODE project.