Biology 230 - Molecules and Cells


  • 5' cap
  • alternative splicing
  • branch-point sequence
  • consensus sequence
  • direct termination
  • exon
  • general (basal) transcription factors (TFII)
  • intron (intervening sequence)
  • lariat
  • mature mRNA
  • messenger RNA (mRNA)
  • polyadenylation
  • poly-A polymerase
  • poly-A tail
  • pre-mRNA
  • promoter
  • RNA polymerase
  • rho-dependent termination
  • sigma factor
  • site of initiation (start site)
  • small nuclear ribonucleoprotein particle (snRNP)
  • small nuclear RNA (snRNA)
  • splicing
  • spliceosome
  • TATA box
  • termination signal
  • transcription
  • transcription initiation complex

Introduction and Goals

The flow of information in a typical cell is from the DNA in the genome to the proteins manufactured in the cell. This occurs through two sequential processes: transcription and translation. Transcription is the synthesis of messenger RNA (mRNA), which is the intermediate between the DNA sequence of the genome and the proteins expressed by the cell. Translation is the process of decoding the nucleotide sequence of mRNA to direct the synthesis of protein. Transcription will be described in this tutorial, and translation will be described in the next tutorial. By the end of this tutorial you should know:

  • The processes of initiation, elongation and termination of prokaryotic transcription
  • The organization of a typical prokaryotic promoter
  • The organization of a typical eukaryotic promoter, and the assembly of a transcription initiation complex
  • The processing of eukaryotic mRNA, including 5' methyl cap addition, polyadenylation and splicing
  • The transport of a mature eukaryotic mRNA out of the nucleus

From DNA to Protein

Figure 1. DNA to mRNA to Protein. During transcription, a region of DNA is copied into mRNA. One strand of the DNA is used as a template during transcription to synthesize the mRNA. The mRNA sequence is complementary to the DNA template; however, uracil (U) is used in the mRNA as the complementary nucleotide for adenine (A). During translation, each triplet of nucleotides in the mRNA, termed a codon, is used to direct the synthesis of the protein.

Although the DNA of a cell encodes all the proteins that the cell can synthesize, the DNA does not directly direct protein synthesis. In a process termed transcription, regions of the DNA molecule are copied into messenger RNA (mRNA), which, in turn, is used in translation to direct the synthesis of proteins. Not all of the DNA molecule is copied into mRNA; for instance, some regions of DNA do not contain genes, and some genes may not be expressed at all in a particular cell. The levels of different mRNAs can also vary; for instance, one gene may be transcribed (also termed expressed) at a very high rate, while another gene in the same cell may be transcribed at a much lower level or not at all.

Figure 2. Transcription and translation in prokaryotes and eukaryotes. In a prokaryotic cell, transcription and translation occur concurrently in the cell. Ribosomes (the protein-RNA complexes that carry out translation) assemble on the newly synthesized mRNA. In a eukaryotic cell, transcription occurs in the nucleus and the newly synthesized pre-mRNA is modified in a variety of ways (referred to as RNA processing) and exported from the nucleus, whereupon a ribosome assembles on the mature mRNA in the cytoplasm.

Although there are many similarities in transcription between prokaryotes and eukaryotes, there are several important differences. The most striking is that the processes of transcription and translation occur in the same region of a prokaryotic cell, and often occur simultaneously; in a eukaryotic cell they occur in different compartments of the cell (see Figure 2). In a eukaryotic cell, transcription occurs in the nucleus and mRNA is modified in a variety of ways before it is exported out of the nucleus to the cytoplasm for translation. In general, the transcription of eukaryotic genes is more complex than that of prokaryotic genes, requiring additional enzymes (some specifically to carry out mRNA modification) as well as mechanisms to deal with the greater complexity of the eukaryotic genome.

RNA Polymerase

Figure 3. RNA polymerase transcribes mRNA. (1) RNA polymerase binds DNA in the promoter region (pink-shaded area) and unwinds the DNA in front of it. (2) Once the DNA is unwound, one strand is used as a template to synthesize the complementary mRNA. Here, the bottom strand of the DNA is the template. The normal rules of complementary base pairing apply, except uracil (U) substitutes for thymine (T) in the RNA and is complementary to adenine (A). Synthesis of mRNA occurs in the 5' to 3' direction, reading the template strand in the 3' to 5' direction. (3) RNA polymerase travels along the DNA, and the mRNA is no longer base-paired to the DNA (except for a short region of 10-12 nucleotides right behind the RNA polymerase). The double-stranded DNA helix reforms behind the RNA polymerase. (4) The RNA polymerase dissociates from the DNA when it encounters a terminator sequence (green-shaded area).

The DNA of a cell is transcribed into mRNA by the enzyme RNA polymerase. Transcription begins when RNA polymerase binds to a region of DNA (the promoter), and proceeds to unwind the double helix to reveal the two single strands of DNA. RNA polymerase will use one strand of the DNA as a template and catalyze the synthesis of a complementary strand of nucleic acid composed of ribonucleotides. The same rules of complementary base pairing are used in the synthesis of RNA, except that when the DNA template contains the deoxynucleotide adenine (A), RNA polymerase inserts the ribonucleotide uracil (U) in the newly synthesized (nascent) mRNA chain. The synthesis of mRNA is in the 5' to 3' direction, reading the template in the 3' to 5' direction. The nascent mRNA will not remain hydrogen-bonded to the DNA; that is, after a region of DNA is transcribed, it will rapidly reform the DNA double helix. The synthesis of mRNA is processive, meaning that RNA polymerase binds the DNA and copies one strand over a long distance. RNA polymerase dissociates from the DNA at a specific termination signal (see below). Unlike DNA replication, where the DNA is replicated once during the cell cycle, transcription of a region of DNA to form mRNA can occur many times. In fact, for a gene that is highly transcribed, many RNA polymerases can transcribe the DNA simultaneously, each polymerase initiating transcription anew and moving progressively along the DNA, one after another.
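
The base-pairing rule described above can be sketched in a few lines of Python. This is only an illustration: the function name transcribe and the template sequence are hypothetical, and the template is assumed to be written in the 3' to 5' direction so that the resulting mRNA reads 5' to 3'.

```python
# Pairing rule for RNA synthesis: DNA template base -> RNA base inserted.
# Note that a template adenine (A) pairs with uracil (U), not thymine.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (written 5'->3') for a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())

template = "TACGGCATT"  # hypothetical template strand, written 3'->5'
mrna = transcribe(template)
print(mrna)  # AUGCCGUAA
```

Because the template is read 3' to 5' while the mRNA grows 5' to 3', walking the template string from left to right produces the mRNA in its conventional orientation.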

In prokaryotes there is a single RNA polymerase that catalyzes the transcription of mRNA as well as the other types of RNAs (tRNA and rRNA, both used in the process of translation). In eukaryotes there are three distinct RNA polymerases that transcribe different types of RNA. RNA polymerase I transcribes most of the ribosomal RNAs, RNA polymerase II transcribes the mRNAs and small nuclear RNAs, and RNA polymerase III transcribes the tRNAs and one of the ribosomal RNAs. In this tutorial we will focus on eukaryotic RNA polymerase II and the transcription of mRNA.

Initiation of Prokaryotic Transcription

Figure 4. A typical prokaryotic promoter. RNA polymerase is recruited to the promoter, the region upstream of the start of transcription. RNA polymerase binds to DNA in two places (at positions -35 and -10). The sequence at position -10 is referred to as the TATA box. The base pair (bp) at position +1 is the site of transcription initiation, the first bp to be transcribed. Transcription proceeds along the DNA in the direction indicated by the arrow.

Transcription has three distinct phases: initiation, elongation and termination. Transcription initiation is the process of recruiting RNA polymerase to the appropriate place on DNA. Initially, RNA polymerase binds to the region of DNA termed the promoter, which is technically the beginning of the gene. This region is 5' (referred to as upstream) relative to the first nucleotide to be transcribed (referred to as the site of initiation or the start site). When describing the organization of a gene and its promoter, the site of initiation is denoted as +1 (see Figure 4 for illustration). Conventionally, the sequences 3' (or downstream) of the initiation site are denoted as positive numbers (+2, +3, +4 ...) and the sequences upstream of the initiation site are denoted as negative numbers (-1, -2, -3 ...).
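
The numbering convention has no position 0: the start site is +1 and the nucleotide immediately upstream is -1. A minimal sketch of that mapping follows; the function name gene_position and the index 35 chosen for the start site are hypothetical.

```python
def gene_position(index: int, start_index: int) -> int:
    """Convert a 0-based sequence index into gene numbering.

    The start site (start_index) is +1, downstream bases are +2, +3, ...
    and upstream bases are -1, -2, ...; there is no position 0.
    """
    offset = index - start_index
    return offset + 1 if offset >= 0 else offset

# Hypothetical layout: the start site sits at index 35 of the sequence.
START = 35
for idx in (0, 25, 34, 35, 36):
    print(idx, gene_position(idx, START))
```

With this layout, indices 0 and 25 correspond to positions -35 and -10, index 34 is -1, index 35 is +1, and index 36 is +2.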

RNA polymerase initially binds to the promoter DNA weakly; however, when RNA polymerase is associated with an additional protein called the sigma factor, the polymerase binds to the promoter more tightly. When RNA polymerase (plus the sigma factor) binds to double-stranded DNA at the promoter, the enzyme unwinds the DNA immediately in front of it and begins to use one of the two strands of DNA as a template. Transcription proceeds for the first 10 nucleotides and then the sigma factor is released, allowing transcription elongation to continue.

Prokaryotic Promoters

The binding of RNA polymerase to the promoter is dictated by the sequence of the DNA in that region. Specifically, there are two RNA polymerase binding sites in DNA: one at position -35 and one at position -10 (see Figure 4). Both are bound by RNA polymerase but not transcribed into mRNA. The sequence of DNA at these positions is very similar (although not identical) in the promoters of many different prokaryotic genes. For example, the sequence at -10 is referred to as the TATA box because a comparison of the promoters of many genes revealed a sequence of TATAAT. Such sequences are termed consensus sequences: for one gene the actual sequence may be TATATT and for another gene it may be TATGAT, but across many prokaryotic promoters the most common nucleotide at each position gives TATAAT. The binding of specific amino acids of RNA polymerase (plus the sigma factor) to the regulatory sites in DNA at positions -35 and -10 orients the RNA polymerase on the DNA and determines the direction of transcription. Transcription initiates at the +1 nucleotide, approximately 10 bps downstream from the -10 site.
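
One simple way to picture a consensus sequence is to align the -10 sites of several promoters and take the most common nucleotide in each column. The sketch below does that under stated assumptions: the first three sites come from the text above, the last two are hypothetical, and the function name consensus is illustrative. Ties are not treated specially here.

```python
from collections import Counter

def consensus(sequences):
    """Return the most common base in each column of equal-length sequences."""
    return "".join(
        Counter(column).most_common(1)[0][0]  # most frequent base in this column
        for column in zip(*sequences)
    )

# -10 sites from five promoters; the last two are hypothetical examples.
sites = ["TATAAT", "TATATT", "TATGAT", "TACAAT", "GATAAT"]
print(consensus(sites))  # TATAAT
```

Only one of the five sites matches the consensus exactly, which illustrates why a consensus is a summary of many promoters rather than the sequence of any particular one.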

Prokaryotic Elongation and Termination

Figure 5. Intrinsic termination. Termination occurs when a hairpin loop is formed in the mRNA. The sequence highlighted in red indicates the G-C-rich region that forms the stem of the hairpin loop through intramolecular base pairing.

Once transcription has begun and ~10 nucleotides are transcribed, the sigma factor is released and RNA polymerase continues along the DNA, copying one strand into mRNA. Elongation of a strand of mRNA occurs in the 5' to 3' direction. As RNA polymerase proceeds, a short stretch (10-12 bps) of RNA/DNA hybrid is formed by hydrogen bonding, but it rapidly dissociates and the double-stranded DNA helix reforms. The proofreading mechanism observed with DNA polymerase is not as common in RNA transcription, and therefore transcription is inherently more error prone. This is not as devastating to the cell as an error in DNA replication because a cell copies its DNA only once per cell cycle, but a gene can be transcribed many times per cell cycle. So if just one mRNA molecule out of many is incorrect, the cell can tolerate this relatively small inaccuracy.

The termination of transcription occurs when RNA polymerase encounters a termination signal in the mRNA sequence and it dissociates from the DNA. The RNA polymerase is then ready to bind sigma factor again and initiate transcription on another promoter. There are two types of termination: direct and rho-dependent. Direct termination involves the termination signal, a sequence in the newly synthesized mRNA that is rich in the nucleotides guanine and cytosine, followed by a short stretch that is rich in the nucleotide uracil (see Figure 5). The stretch of cytosines and guanines in the mRNA can anneal, creating an intramolecular double-stranded RNA structure referred to as a hairpin loop. Once the hairpin is formed in the nascent mRNA, RNA polymerase will dissociate from the DNA and release the nascent mRNA. In rho-dependent termination a cytosine-rich sequence near the end of the mRNA is bound by the protein rho, which then causes RNA polymerase to dissociate from the DNA and release the nascent mRNA.
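
The direct termination signal can be pictured as a pattern search: a G-C-rich stretch, a short loop, the reverse complement of the first stretch, and then a run of uracils. The sketch below is a deliberately simplified model, not a real terminator predictor; the function names, the default stem, loop and U-run lengths, and the example sequence are all hypothetical.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Return the sequence that can base-pair with rna in antiparallel fashion."""
    return "".join(RNA_PAIR[base] for base in reversed(rna))

def find_terminator(rna, stem=6, min_loop=3, max_loop=8, u_run=4):
    """Return the index where a hairpin stem followed by a U run starts, else -1."""
    for i in range(len(rna) - 2 * stem - min_loop - u_run + 1):
        left = rna[i:i + stem]
        # Require a G-C-rich stem: at most one base that is not G or C.
        if sum(base in "GC" for base in left) < stem - 1:
            continue
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            right = rna[j:j + stem]
            tail = rna[j + stem:j + stem + u_run]
            # The two stem halves must pair, and uracils must follow the stem.
            if right == reverse_complement(left) and tail == "U" * u_run:
                return i
    return -1

# Hypothetical transcript end: stem GCCGCC, loop AAAA, stem GGCGGC, then U's.
rna = "ACGCCGCCAAAAGGCGGCUUUUUU"
print(find_terminator(rna))  # 2
```

Real hairpins tolerate mismatches and variable stem lengths, so exact matching as used here is only meant to make the structure of the signal concrete.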

The Eukaryotic Promoter and Transcription Initiation

Eukaryotic promoters, like prokaryotic promoters, are the regions of initiation of transcription. Unlike prokaryotic RNA polymerase, however, eukaryotic RNA polymerases cannot directly bind to a promoter. General (or basal) transcription factors are required to bind sequences in a promoter and to recruit RNA polymerase. The general transcription factors that recruit RNA polymerase II are termed TFII A, B, D, E, F, and H. These transcription factors (each composed of several protein subunits) and RNA polymerase form a large transcription initiation complex. For many promoters this begins by the binding of TFIID to the TATA box, which for many promoters is 25-30 bps upstream from the start of transcription. After the binding of TFIID, the other transcription factors, as well as RNA polymerase II, assemble on the promoter. TFIIH then phosphorylates RNA polymerase II to allow transcription elongation to occur. There is considerable variation in the organization of eukaryotic promoters and not all contain TATA sequences; nonetheless, they initiate transcription via the transcription initiation complex described above. Transcription proceeds in a fashion similar to transcription in prokaryotes; however, it is believed that part of the transcription initiation complex remains at the promoter to facilitate additional RNA polymerase II molecules being recruited and positioned on the promoter. Transcription proceeds until a region of DNA encoding the termination signal is transcribed (usually a G-U-rich sequence in the mRNA). This region of mRNA assumes a structure that is the trigger for RNA polymerase II to dissociate from the DNA and terminate transcription.

Modification of Eukaryotic mRNA

Figure 6. The 5' and 3' modification of eukaryotic mRNA. The 5' end of the nascent mRNA is capped by the addition of a methylated guanine nucleotide (highlighted in red on left). The 3' end of the mRNA is cleaved and poly-A polymerase adds between 100 and 250 adenine nucleotides at the 3' end (highlighted in red on right).

As RNA polymerase II transcribes DNA into mRNA, the nascent mRNA is modified at the 5' and 3' ends. After the first 25 nucleotides of the mRNA are synthesized, a methylated guanine nucleotide is added to the 5' end in a template-independent fashion. This modification, referred to as the 5' cap, serves to distinguish mRNA from other types of RNA. It is also important for the export of mature mRNA out of the nucleus into the cytoplasm and for the correct positioning of mRNA on the ribosome for translation. Transcription proceeds until a region of DNA encoding the termination signal is transcribed. This sequence of the mRNA is cleaved and a unique template-independent RNA polymerase, poly-A polymerase, adds the nucleotide adenine (A) to the 3' end; this is termed polyadenylation. The length of the poly-A tail will vary from 100 to 250 nucleotides. The presence of the poly-A tail is important for nuclear export of the mRNA. Thus, both the 5' and 3' ends of eukaryotic mRNAs include nucleotides that are not encoded by the DNA (shown in Figure 6).
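
The two end modifications can be modeled as simple string operations to emphasize that neither is copied from the DNA template. In this sketch the label "m7G-" stands for the methylated guanine cap, and the function name, the cleavage position and the example transcript are hypothetical.

```python
def process_ends(pre_mrna: str, cleavage_site: int, tail_length: int = 200) -> str:
    """Cap the 5' end, cleave at cleavage_site, and add a poly-A tail."""
    if not 100 <= tail_length <= 250:
        raise ValueError("poly-A tails in this model are 100 to 250 nucleotides")
    cap = "m7G-"                       # template-independent 5' cap
    body = pre_mrna[:cleavage_site]    # sequence kept after 3' cleavage
    tail = "A" * tail_length           # template-independent poly-A tail
    return cap + body + tail

transcript = "AUGGCUGAACCGUAAGGUUGUU"  # hypothetical nascent transcript
processed = process_ends(transcript, cleavage_site=15, tail_length=120)
print(processed[:19], "...", len(processed))
```

The processed molecule is longer than the region that was kept from the transcript, because the cap and the tail are added without reference to the DNA.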

Eukaryotic mRNAs Are Spliced

Figure 7. Splicing of eukaryotic mRNA. RNA polymerase transcribes all of the sequences of the pre-mRNA. The red boxes indicate the 5' caps and 3' poly-A tails, respectively. In some cases, the mRNA is spliced prior to addition of the poly-A tail. The mRNA is processed to remove the introns and to join the exons into a single linear molecule of mRNA. Alternative splicing is selective inclusion of some (but not all) exons, thus generating multiple variants of the mature mRNA. Often these variants are expressed in specific tissues; for instance, variant I might be exclusively expressed in muscle cells, variant II exclusively in brain cells, and variant III in all tissues.

Almost all coding regions of eukaryotic mRNAs are interrupted by non-coding sequences. The stretches of sequence that interrupt coding sequences are referred to as introns (intervening sequences). The stretches that are coding (or expressed) are referred to as exons. Introns can vary in size from tens of nucleotides to thousands of nucleotides. The number of introns per gene can also vary from a single intron to over two dozen. RNA polymerase II transcribes all of the exons and introns, but as transcription proceeds the introns are removed and the exons are rejoined in a process called splicing. The mRNA containing both exons and introns is referred to as the pre-mRNA; the mRNA after splicing is referred to as the mature mRNA, which is the mRNA to be exported out of the nucleus. This is illustrated in Figure 7. A single pre-mRNA can be spliced in various ways to produce different splice variants that encode different (but related) proteins. For instance, in muscle cells the mature mRNAs will include all exons (variant I in Figure 7), whereas in brain cells the third exon is skipped over and not included in the mature mRNA (variant II in Figure 7). This selective inclusion of exons is referred to as alternative splicing, and often, cells in different tissues will express different mRNA variants of the same gene.
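
Alternative splicing can be sketched as choosing which exons to join, always in their original order. The exon sequences and the function name splice below are hypothetical; the two variants follow the muscle and brain example in the paragraph above.

```python
def splice(exons, include):
    """Join the selected exons (0-based indices) in their original order."""
    return "".join(exons[i] for i in sorted(include))

# Four hypothetical exons; introns are assumed to be already removed.
exons = ["AUGGCU", "GAACCG", "UUUGGA", "CCAUAA"]

variant_I = splice(exons, [0, 1, 2, 3])   # all exons, as in muscle cells
variant_II = splice(exons, [0, 1, 3])     # third exon skipped, as in brain cells

print(variant_I)   # AUGGCUGAACCGUUUGGACCAUAA
print(variant_II)  # AUGGCUGAACCGCCAUAA
```

Both variants come from the same pre-mRNA, yet they differ in length and sequence, which is how one gene can give rise to related but distinct proteins.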

Figure 8. The mechanism of splicing. The exons are shown in dark blue and the intron in light blue. The conserved sequences at the exon-intron boundaries are also shown. The 5' and 3' splice junctions are brought together by the action of snRNPs, which bind the 5' splice junction and the branch-point sequence (the conserved A is highlighted in red). After the cleavage of the 5' splice junction, the intron sequence assumes a distinct structure called the lariat. The 3' splice junction is cleaved and the two exons are joined. The lariat is degraded and the snRNPs are used again for additional splicing.

The introns are recognized and removed in a distinct fashion, in which the mRNA actually cleaves itself. Each intron has a different sequence; however, there are a few nucleotides at the ends of all introns that mark the boundary between exon and intron (see Figure 8). In addition, near the 3' end of the intron is a region referred to as the branch-point sequence, which always contains an A that is critical to the mechanism of cleavage. The A attacks the 5' splice junction, giving rise to the distinct lariat structure of the intron. The 3' splice junction is then cleaved and the exons are joined. The cleavage of the intron and the rejoining of the exons are facilitated by additional RNAs and proteins in a complex called the spliceosome. The spliceosome is composed of small nuclear RNAs (snRNAs) associated with proteins in small nuclear ribonucleoprotein particles (snRNPs). The snRNAs, which recognize sequences at the intron-exon boundaries as well as the branch-point sequence, bring the 5' splice junction and the branch-point sequence together.

Splicing is a unique feature of eukaryotic genomes and this, in part, explains why many eukaryotic organisms, including humans, have a large genome size (relatively little of which is actually coding). It also explains how humans can synthesize ~90,000 different proteins when our genome contains only ~25,000 genes! Alternative splicing provides a means to increase the repertoire of proteins that a cell can synthesize, allowing a multicellular organism to generate tissue-specific versions of proteins.

Nuclear Export of the Mature mRNA

The mature mRNA can leave the nucleus only if it has been capped at the 5' end, polyadenylated and completely spliced. Specific proteins bind both the 5' cap and the poly-A tail and mark the mRNA for transport through the nuclear pore. Once in the cytoplasm, the mRNA will associate with the ribosome for translation. The mRNA will eventually be degraded. The lifetime of different mRNAs varies from minutes to hours.


Summary

Transcription is the process of copying DNA into mRNA, catalyzed by the enzyme RNA polymerase. In prokaryotic cells there is a single RNA polymerase, but in eukaryotic cells there are several; RNA polymerase II catalyzes the synthesis of mRNA. In prokaryotes and eukaryotes transcription is initiated at the promoter region, where RNA polymerase initially binds upstream of the start of transcription. In prokaryotic cells, RNA polymerase plus the sigma factor are sufficient for the precise binding of the polymerase to the sequences at positions -35 and -10 of the promoter. Transcription continues until a termination signal in the DNA is transcribed and the RNA structure assumed by this sequence signals the dissociation of the polymerase from the DNA. Eukaryotic transcription is similar yet has several unique features. First, transcription initiation is much more complex in a eukaryotic nucleus. General transcription factors are required to bind the promoter, to recruit RNA polymerase II, and to modify RNA polymerase to start transcription. Second, the mRNA is modified at both the 5' and 3' ends in a template-independent fashion. At the 5' end, the mRNA is capped by the addition of a methylated guanine nucleotide. At the 3' end, the termination signal in the mRNA is cleaved and the mRNA is polyadenylated to add between 100 and 250 adenine nucleotides. Third, eukaryotic mRNA is spliced to remove the introns and join the exons in the mature mRNA. Cleavage of introns is mediated through an intramolecular cleavage, facilitated by snRNPs, resulting in the distinct intron lariat structure. Alternative splicing creates different mRNAs by selective inclusion of exons; often, this is to create tissue-specific mRNAs. Splicing of the pre-mRNA is mediated by the snRNAs of the spliceosome. Only the capped, spliced and polyadenylated mRNAs are exported from the nucleus into the cytoplasm and used for translation.