Description

This track shows the starts and ends of mRNA transcripts determined by paired-end ditag (PET) sequencing. PETs are composed of 18 bases from either end of a cDNA; 36 bp PETs from many clones were concatenated together and cloned into pZero-1 for efficient sequencing. See the Methods and References sections below for more details on PET sequencing.

The PET sequences in this track represent full-length transcripts derived from two cell lines and were mapped to the whole genome:

MCF7 cells in log phase
HCT116 cells treated with 5FU (5-fluorouracil) for 6 hours

In total, 584,624 PETs were generated for MCF7 and 280,340 PETs were generated for HCT116. More than 80% of the PETs in each group were mapped to the genome. The 474,278 MCF7 PETs and 223,261 HCT116 PETs that mapped with single and multiple (up to ten) matches in the genome are shown in the two subtracks.

In the graphical display, the ends are represented by blocks connected by a horizontal line. In full and packed display modes, the arrowheads on the horizontal line represent the direction of transcription, and an ID of the format XXXXX-N-M is shown to the left of each PET. Here XXXXX is the unique ID for the PET, N is the number of mapping locations in the genome (1 for a single mapping location, 2 for two mapping locations, and so forth), and M is the number of PET sequences at this location. The total count of PET sequences mapped to the same locus but with slight nucleotide differences may reflect the expression level of the transcripts. PETs that mapped to multiple locations may represent low-complexity or repetitive sequences.
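As an illustration of the ID format described above, the following minimal Python sketch splits a label into its three fields. The function name parse_pet_label, the PetLabel type, and the example label are hypothetical and are not part of the track data or any browser tool. The sketch assumes N and M are always the last two hyphen-separated fields.

```python
from typing import NamedTuple


class PetLabel(NamedTuple):
    pet_id: str        # XXXXX: unique ID for the PET
    n_locations: int   # N: number of mapping locations in the genome
    n_pets: int        # M: number of PET sequences at this location


def parse_pet_label(label: str) -> PetLabel:
    """Split an 'XXXXX-N-M' label into its fields (hypothetical helper)."""
    # Split from the right so that a hyphen inside the ID part, if one
    # ever occurs, stays with the ID (an assumption, not a documented rule).
    pet_id, n, m = label.rsplit("-", 2)
    return PetLabel(pet_id, int(n), int(m))


# Example with a made-up label: one mapping location, three PETs observed.
example = parse_pet_label("12345-1-3")
```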

The graphical display also uses color coding to reflect the uniqueness and expression level of each PET:

Color	Mapping	PETs observed at location
dark blue	unique	2 or more
light blue	unique	1
medium brown	multiple	2 or more
light brown	multiple	1
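
The color scheme in the table can be expressed as a small lookup on the N and M fields of the PET ID. This is only a sketch of the rule as the table states it; the function name pet_color and its argument names are hypothetical, and the actual display code may differ.

```python
def pet_color(n_locations: int, n_pets: int) -> str:
    """Return the display color named in the table above (hypothetical helper).

    n_locations: number of genomic mapping locations (N in the PET ID)
    n_pets:      number of PETs observed at this location (M in the PET ID)
    """
    if n_locations < 1 or n_pets < 1:
        raise ValueError("both counts must be at least 1")
    unique = n_locations == 1    # "unique" mapping means a single location
    repeated = n_pets >= 2       # "2 or more" PETs observed at the location
    if unique:
        return "dark blue" if repeated else "light blue"
    return "medium brown" if repeated else "light brown"
```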

Methods

PolyA+ RNA was isolated from the cells. A full-length cDNA library was constructed and converted into a PET library for Gene Identification Signature analysis (Ng et al., 2005). Generation of PET sequences involved cloning of cDNA sequences into the plasmid vector, pGIS3. pGIS3 contains two MmeI recognition sites that flank the cloning site, which were used to produce a 36 bp PET. Each 36 bp PET sequence contains 18 bp from each of the 5' and 3' ends of the original full-length cDNA clone. The 18 bp 3' signature contains 16 bp 3'-specific nucleotides and an AA residual of the polyA tail to indicate the sequence orientation. PET sequences were mapped to the genome using the following specific criteria:

a minimal continuous 16 bp match must exist for the 5' signature; the 3' signature must have a minimal continuous 14 bp match
both 5' and 3' signatures must be present on the same chromosome
their 5' to 3' orientation must be correct
the maximal genomic span of a PET genomic alignment must be less than one million bp
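
The four criteria above can be summarized as a single filter. The sketch below is illustrative only: the PetAlignment record and its field names are hypothetical, and it assumes the longest continuous match for each signature, the orientation check, and the genomic span have already been computed by an upstream aligner.

```python
from dataclasses import dataclass

MIN_MATCH_5P = 16      # minimal continuous match for the 5' signature (bp)
MIN_MATCH_3P = 14      # minimal continuous match for the 3' signature (bp)
MAX_SPAN = 1_000_000   # genomic span must be less than one million bp


@dataclass
class PetAlignment:
    """Hypothetical summary of one candidate PET-to-genome alignment."""
    chrom_5p: str          # chromosome hit by the 5' signature
    chrom_3p: str          # chromosome hit by the 3' signature
    match_5p: int          # longest continuous match of the 5' signature (bp)
    match_3p: int          # longest continuous match of the 3' signature (bp)
    orientation_ok: bool   # True if the 5'-to-3' orientation is correct
    span: int              # genomic span of the alignment (bp)


def passes_mapping_criteria(aln: PetAlignment) -> bool:
    """Apply the four mapping criteria listed above to one alignment."""
    return (
        aln.match_5p >= MIN_MATCH_5P
        and aln.match_3p >= MIN_MATCH_3P
        and aln.chrom_5p == aln.chrom_3p
        and aln.orientation_ok
        and aln.span < MAX_SPAN
    )
```

Under these assumptions, a PET whose signatures hit different chromosomes, or whose alignment spans one million bp or more, is rejected regardless of match length.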

Most of the PET sequences (more than 90%) were mapped to specific locations (single mapping loci). PETs mapping to 2 to 10 locations are also included and may represent duplicated genes or pseudogenes in the genome.

Verification

To assess overall PET quality and mapping specificity, the top ten most abundant PET clusters that mapped to well-characterized known genes were examined. Over 99% of the PETs represented full-length transcripts, and the majority fell within ten bp of the known 5' and 3' boundaries of these transcripts. The PET mapping was further verified by confirming the existence of physical cDNA clones represented by the ditags. PCR primers designed from the PET sequences were used to amplify the corresponding cDNA inserts from the parental GIS flcDNA library for sequencing analysis. In a set of 86 arbitrarily selected PETs representing a wide range of annotation categories, including known genes (38 PETs), predicted genes (2 PETs), and novel transcripts (46 PETs), 84 (97.7%) confirmed the existence of bona fide transcripts.

Credits

The GIS-PET libraries and sequence data for transcriptome analysis were produced at the Genome Institute of Singapore. The data were mapped and analyzed by scientists from the Genome Institute of Singapore and the Bioinformatics Institute of Singapore.

References

Ng, P. et al. Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nat. Methods 2(2), 105-11 (2005).