Description

This track displays expression levels of computationally identified first exons and a constitutive exon of genes in ENCODE regions, measured with the real competitive Polymerase Chain Reaction (rcPCR) technique described in Ding and Cantor (2003). Expression levels are indicated by color, ranging from black (no expression) to red (high expression).

Experiments were performed on total RNA samples of ten normal human tissues purchased from Clontech (Palo Alto, CA): cerebral cortex, colon, heart, kidney, liver, lung, skeletal muscle, spleen, stomach, and testis.

The name for each alternative transcript starts with the gene name, followed by an identifier for the alternative first exon or the constitutive exon. For example, for gene CAV1, there are three alternative first exons (CAV1-E1A, CAV1-E1B, and CAV1-E1C) and the third exon is chosen as the constitutively expressed exon (CAV1-E3).
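
As an illustration of this naming convention, the following Python sketch splits an item name into its gene and exon parts. The pattern is inferred only from the CAV1 example above; the helper names (`ExonName`, `parse_exon_name`) are hypothetical, and names in the track that deviate from the "GENE-E<number><letter>" form would need a different pattern.

```python
import re
from typing import NamedTuple


class ExonName(NamedTuple):
    gene: str         # e.g. "CAV1"
    exon_number: int  # e.g. 1 for a first exon, 3 for the third exon
    variant: str      # e.g. "A", "B", "C"; empty if no letter suffix


# Assumed pattern: gene name, "-E", exon number, optional uppercase letter.
_NAME_PATTERN = re.compile(r"^(?P<gene>.+)-E(?P<num>\d+)(?P<variant>[A-Z]?)$")


def parse_exon_name(name: str) -> ExonName:
    """Split an item name such as 'CAV1-E1A' into gene, exon number, variant."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized item name: {name!r}")
    return ExonName(match["gene"], int(match["num"]), match["variant"])


if __name__ == "__main__":
    for item in ("CAV1-E1A", "CAV1-E1B", "CAV1-E1C", "CAV1-E3"):
        print(item, parse_exon_name(item))
```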

Methods

Alternative transcription start sites (TSS) for 20 ENCODE genes were predicted using PromoSer, an in-house computational tool. PromoSer computationally identifies the TSS by considering alignments of a large number of partial and full-length mRNA sequences and ESTs to genomic DNA, with provision for alternative promoters. In PromoSer, the treatment of alternative first exons (or the resulting TSSs) is as follows:

For each gene, all alternative first exons were identified by manual selection from PromoSer predictions. An exon shared by all transcripts (the constitutive exon) was also selected. The selection process involved visually examining the structure of the cluster, preferably using the latest data available at UCSC, to identify distinct first exons that were well formed (having multiple supporting sequences) and showed no evidence (especially from newer sequences) of additional sequence that would make them internal exons. After a first exon was identified, a subsequence of 100-300 bases was selected for use in the experiment. The selection process avoided repeat sequences as much as possible, and if two first exons partially overlapped, the non-overlapping region was selected. If those conditions left the remaining sequence too short (or the first exon itself was too short), a junction with the second exon was used. A constitutive exon included in all (or most) of the alternative transcripts was also selected, and suitable sequences were extracted as above (no exon junctions were used).

The absolute expression levels of all exons were individually quantified by rcPCR by designing four assays with PCR amplicons corresponding to each exon. Amplicons were designed according to transcript sequences and can span a large distance on the genomic sequence. In addition, some amplicons were designed across the junctions between first exons and the constitutive second exons, and thus these amplicons may overlap with the amplicons that correspond to the constitutive second exons.

The rcPCR technique combined competitive PCR and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) for gene expression analysis. To measure the expression level of a gene, an oligonucleotide standard (60-80 bases) of known concentration, complementary to the target sequence with a single base mismatch in the middle, was added as the competitor for PCR. The gene of interest and the oligonucleotide standard resembled two alleles of a heterozygous locus in an allele frequency analysis experiment, and thus could be quantified by the high-throughput MALDI-TOF MS based MassARRAY system (Sequenom Inc.).

After PCR, a base extension reaction was carried out with an extension primer, ThermoSequenase, and a mixture of ddNTPs/dNTP (for example, a mixture of ddA, ddC, ddT, and dG). The extension primer annealed to the sequence immediately 5' upstream of the mismatch position. Depending on the nature of the mismatch and the composition of the ddNTPs/dNTP mixture, one or two bases were added to the extension primer, producing two extension products that differed in length by one base. These two extension products were then detected and quantified by MALDI-TOF MS.
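
The quantification principle can be sketched in Python as follows. This is a minimal illustration, not the analysis code used for this track: it assumes that the ratio of the two extension-product peak areas equals the starting molar ratio of target to competitor, which the allele-frequency analogy above suggests but the text does not state as a formula. The function names and the simple averaging across measurements are hypothetical.

```python
from statistics import mean
from typing import Iterable, Tuple


def estimate_target_concentration(
    competitor_conc: float,
    target_peak_area: float,
    competitor_peak_area: float,
) -> float:
    """Estimate target concentration from one competitive PCR measurement.

    Assumption: peak-area ratio (target / competitor) equals the starting
    concentration ratio, so target = competitor * ratio.
    """
    if competitor_conc <= 0 or competitor_peak_area <= 0 or target_peak_area < 0:
        raise ValueError("concentration and competitor peak area must be positive")
    return competitor_conc * target_peak_area / competitor_peak_area


def combine_estimates(measurements: Iterable[Tuple[float, float, float]]) -> float:
    """Average the estimates from several (competitor_conc, target_area,
    competitor_area) measurements, e.g. different competitor concentrations
    and technical replicates. A plain arithmetic mean is an assumption."""
    return mean(estimate_target_concentration(*m) for m in measurements)
```

For example, `estimate_target_concentration(1.0, 300.0, 100.0)` yields three times the competitor concentration, in whatever unit the competitor concentration was given.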

Expression ratios (e.g. CAV1-E1A/CAV1-E3, CAV1-E1B/CAV1-E3, CAV1-E1C/CAV1-E3) indicate the relative abundance of alternative first exons. 18S rRNA was used to normalize absolute exon expression levels across tissues.

Values shown on this track represent the relative abundance of the alternative first exons with respect to the 18S rRNA. The raw values have been log10 transformed and scaled to show graded colors on the browser.
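
The following Python sketch illustrates one way such values could be derived: divide an exon level by the 18S rRNA level, take log10, and map the result linearly onto a display score. The track's actual scaling is not documented here, so the score range (0 to 1000), the clamping bounds `log_min` and `log_max`, and the function names are all assumptions made for illustration.

```python
import math


def relative_to_reference(exon_level: float, reference_level: float) -> float:
    """Abundance of an exon relative to a reference (e.g. 18S rRNA)."""
    if reference_level <= 0:
        raise ValueError("reference level must be positive")
    return exon_level / reference_level


def log10_scaled_score(
    ratio: float, log_min: float, log_max: float, max_score: int = 1000
) -> int:
    """Map log10(ratio) linearly from [log_min, log_max] onto [0, max_score].

    Values outside the range are clamped; a ratio of zero or less (no
    detectable expression) maps to 0. All bounds are hypothetical.
    """
    if log_max <= log_min:
        raise ValueError("log_max must be greater than log_min")
    if ratio <= 0:
        return 0
    fraction = (math.log10(ratio) - log_min) / (log_max - log_min)
    fraction = min(1.0, max(0.0, fraction))
    return round(fraction * max_score)
```

With these assumed bounds, a low score would be drawn near black and a high score near red.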

Verification

One biological replicate was performed for each gene. Two to four competitor concentrations were used to detect the expression level of each exon. Two to six technical replicates were performed for each competitor concentration. One more biological replicate will be performed in the future.

Credits

Data generation and analysis for this track were performed by ZLAB at Boston University. The following people contributed: Shengnan Jin, Anason Halees, Heather Burden, Yutao Fu, Ulas Karaoz, Yong Yu, Chunming Ding, Charles R. Cantor, and Zhiping Weng.

References

Ding, C. and Cantor, C.R. A high-throughput gene expression analysis technique using competitive PCR and matrix-assisted laser desorption ionization time-of-flight MS. Proc Natl Acad Sci U S A 100(6), 3059-64 (2003).

Ding, C. and Cantor, C.R. Direct molecular haplotyping of long-range genomic DNA with M1-PCR. Proc Natl Acad Sci U S A 100(13), 7449-53 (2003).

Halees, A.S., Leyfer, D. and Weng, Z. PromoSer: A large-scale mammalian promoter and transcription start site identification service. Nucleic Acids Res. 31(13), 3554-9 (2003).

Halees, A.S. and Weng, Z. PromoSer: improvements to the algorithm, visualization and accessibility. Nucleic Acids Res. 32, W191-4 (2004).