From Gene to Protein

Introduction and Goals

In the previous tutorial you learned how DNA is replicated. In this tutorial you will learn how DNA codes for the production of proteins. By the end of the tutorial you should have a basic understanding of:

  • RNA transcription
  • Protein translation
  • Mutations

Performance Objectives:

  • Describe the process of transcription, including the molecules and enzymes that are needed  
  • Interpret the genetic code table and state the properties of the genetic code  
  • Explain the phenomenon of gene expression  
  • Summarize the process of translation, including the molecules that are needed
  • Understand how mutations in the DNA can affect the polypeptide produced

Gene To Protein: The Central Dogma

Figure 1. Transcription and Translation in Prokaryotic Cells versus Eukaryotic Cells.

The central dogma (prevailing theme) of molecular biology is: DNA is transcribed into messenger RNA, and messenger RNA is translated into proteins. In other words, DNA codes for the synthesis of proteins. This is accomplished via two processes: transcription and translation. First, one of the DNA strands (the template strand) is used as a template to make messenger RNA (mRNA).  This molecule of mRNA is complementary to the template strand of the DNA molecule and is identical (except for the substitution of uracil for thymine) to the other strand (known as the "coding strand").
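To make these strand relationships concrete, here is a minimal Python sketch using a short, made-up coding-strand fragment (not a real gene). It assumes both DNA strands are written 5' to 3', so building the template strand means complementing each base and reversing the order (the strands are antiparallel). Transcribing the template then gives the same mRNA as simply swapping U for T in the coding strand. The function names are illustrative only.

```python
# DNA base -> complementary DNA base, and template DNA base -> RNA base.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_from_coding(coding: str) -> str:
    """Return the template strand (5'->3') that pairs with a coding strand (5'->3')."""
    # Complement each base, then reverse, because the two strands are antiparallel.
    return "".join(DNA_PAIR[base] for base in reversed(coding))


def mrna_from_template(template: str) -> str:
    """Transcribe a template strand (5'->3') into mRNA (5'->3')."""
    return "".join(RNA_PAIR[base] for base in reversed(template))


def mrna_from_coding(coding: str) -> str:
    """Shortcut: the mRNA matches the coding strand, with U in place of T."""
    return coding.replace("T", "U")


coding = "ATGGCTTGA"                  # hypothetical coding-strand fragment
template = template_from_coding(coding)
print(template)                       # TCAAGCCAT
print(mrna_from_template(template))   # AUGGCUUGA
print(mrna_from_coding(coding))       # AUGGCUUGA
```

Both routes give the same mRNA, which is exactly why the coding strand is a convenient way to "read" a gene.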

Transcription (Figure 1) occurs in the nucleus of eukaryotic cells; the mRNA must then leave the nucleus and associate with a ribosome before translation of a protein can begin. Figure 1 outlines these steps and shows where they occur in prokaryotic and eukaryotic cells. Because prokaryotes do not have a nucleus, ribosomes can begin translating an mRNA immediately, even while it is still being transcribed.

Gene To Protein: The Triplet Code

It is the sequence of nucleotides (adenine, guanine, cytosine, and thymine) in a DNA molecule that determines which protein is synthesized (as seen in Figure 2). However, there are only four possible bases but twenty amino acids that join to form polypeptide chains, so one base cannot code for one amino acid; that scenario would yield only four possibilities. A two-base code would provide only sixteen possibilities (4 × 4 = 16). A three-base code provides 4 × 4 × 4 = 64 possibilities, so a minimum of three bases is needed to specify a particular amino acid. In 1961, Marshall Nirenberg "cracked" the genetic code by showing that the codon (three-base sequence) UUU specifies the amino acid phenylalanine. Codons for lysine and proline followed, and now the sequence of bases that codes for each of the amino acids found in proteins is known.
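The counting argument above is just a series of powers of 4. This short Python sketch (the function name is hypothetical) finds the smallest codon length whose number of base combinations covers all 20 amino acids.

```python
def min_codon_length(alphabet_size: int = 4, needed: int = 20) -> int:
    """Smallest n such that alphabet_size ** n >= needed."""
    n = 1
    while alphabet_size ** n < needed:
        n += 1
    return n


for n in range(1, 4):
    print(f"{n}-base code: {4 ** n} combinations")
print("minimum codon length:", min_codon_length())  # 3
```

Because 64 triplets are more than the 20 amino acids require, the code has room for several codons per amino acid and for dedicated stop codons.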

Figure 2. The genetic code is a triplet code.

Figure 3. The Genetic Code. Each mRNA codon codes for an amino acid with the exception of UAA, UAG, and UGA (the three stop codons).

Figure 3 shows the genetic code used by almost all organisms on Earth. In this version of the code, each mRNA codon specifies either an amino acid or a stop signal. Notice that the codon AUG codes for the amino acid methionine and also serves as the start codon: it signals the protein translation machinery to begin translating the mRNA at that location. Although every newly synthesized protein begins with the amino acid methionine, enzymes may remove this methionine during the final processing of the protein.
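Because translation begins at AUG, the start codon also fixes the reading frame: from that point on, the mRNA is read in non-overlapping groups of three. The Python sketch below assumes, as a simplification, that the first AUG in the sequence is the one used (in real cells the surrounding sequence also influences which AUG is chosen); the mRNA and the function name are made up for illustration.

```python
def codons_from_start(mrna: str) -> list[str]:
    """Split an mRNA into codons, beginning at the first AUG (start codon)."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon: nothing would be translated in this model
    frame = mrna[start:]
    # Keep only complete triplets; 1-2 leftover bases at the end are not a codon.
    return [frame[i:i + 3] for i in range(0, len(frame) - 2, 3)]


print(codons_from_start("GCAUGUUUGGCUAAGC"))  # ['AUG', 'UUU', 'GGC', 'UAA']
```

The two bases before AUG and the two after UAA are simply outside the reading frame.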

Because most organisms on Earth use this code, the genetic code is considered "universal," which suggests a common origin for life on Earth. The genetic code is also "unambiguous" and "redundant." Unambiguous means that the codons are fixed and that each codon specifies only one amino acid. For example, UGG codes for tryptophan and nothing else. The code is redundant because several codons may code for the same amino acid. For example, GUU, GUC, GUA, and GUG all code for a single amino acid (valine). This redundancy provides some protection against the detrimental effects of mutation, as will be discussed later in this tutorial.
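These two properties map neatly onto a lookup table. The Python sketch below builds the standard genetic code (the table in Figure 3) as a dictionary, using one-letter amino acid abbreviations (for example F = phenylalanine, W = tryptophan, V = valine, M = methionine) and * for stop codons. A dictionary holds exactly one value per key, which mirrors the code being unambiguous; grouping codons by amino acid exposes the redundancy.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in the usual table order (UUU, UUC, UUA, UUG, UCU, ...).
# One-letter amino acid codes; "*" marks a stop codon.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AA)}

# Unambiguous: the dict stores exactly one amino acid per codon.
print(CODON_TABLE["UGG"])  # W (tryptophan)
print(CODON_TABLE["AUG"])  # M (methionine, also the start codon)

# Redundant: several codons can share one amino acid.
synonyms: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    synonyms.setdefault(aa, []).append(codon)

print(synonyms["V"])  # ['GUU', 'GUC', 'GUA', 'GUG']  (valine)
print(synonyms["W"])  # ['UGG']  (tryptophan has a single codon)
print(synonyms["*"])  # ['UAA', 'UAG', 'UGA']  (stop codons)
```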

Gene To Protein: Transcription

The production of a protein takes place in two stages. First, one of the two strands of DNA is transcribed (copied) into a single strand of complementary RNA termed messenger RNA (mRNA). As you should now understand, the mRNA is complementary only to the template strand of DNA. Second, the mRNA is translated into a polypeptide.

Transcription occurs in the nucleus of eukaryotic cells and is broken down into three stages: initiation, elongation, and termination. It requires that the two strands of DNA separate, or open up, enough for mRNA to be produced. Initiation begins when certain proteins, known as transcription factors, bind to the starting point of a gene, known as the "promoter." The promoter is a sequence of DNA bases that signals the beginning of RNA synthesis. The enzyme RNA polymerase II then separates the DNA strands and joins RNA nucleotides that are complementary to the exposed template strand. During elongation, RNA polymerase II adds nucleotides to the 3' end of the growing RNA molecule. The enzyme moves along the DNA, unwinding it as it goes and allowing the DNA helix to re-form after a stretch has been transcribed. This continues until a DNA sequence known as a "terminator sequence" signals the end of RNA synthesis (termination). At the end of transcription, a molecule of mRNA has been produced that carries the instructions for synthesizing a protein.
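The three stages can be mimicked with a toy model. In this Python sketch the template strand is written 3' to 5' (the direction RNA polymerase reads it), and the `start` and `stop` positions are hypothetical stand-ins for the promoter and terminator, not real sequence signals. Each new ribonucleotide is appended to the 3' end of the growing RNA, so the transcript is built 5' to 3'.

```python
# Base pairing used during transcription: template DNA base -> RNA base.
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str, start: int, stop: int) -> str:
    """Toy RNA polymerase. `start` and `stop` are hypothetical positions standing
    in for the promoter and terminator; real cells use sequence signals."""
    rna = []
    # Initiation: polymerase begins reading the template at `start`.
    # Elongation: move along the template 3'->5', adding each complementary
    # ribonucleotide to the 3' end of the growing RNA (so RNA grows 5'->3').
    for base in template_3_to_5[start:stop]:
        rna.append(RNA_PAIR[base])
    # Termination: stop at `stop` and release the finished transcript.
    return "".join(rna)


print(transcribe("GGTACCGGATTCC", start=2, stop=11))  # AUGGCCUAA
```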

This animation shows the process of transcription.

Gene To Protein: Translation Ingredients

Figure 4. Overview of Involvement of Ribosomes, tRNA and mRNA in Translation.

In addition to mRNA, two other types of RNA are needed for protein synthesis: ribosomal RNA (rRNA) and transfer RNA (tRNA) (Fig. 5). Ribosomal RNA combines with proteins to form ribosomes, the cellular structures where polypeptides are synthesized. Ribosomes consist of two subunits: one large and one small. As illustrated in Figure 4, tRNA molecules transport amino acids to the growing polypeptide chain. Each tRNA molecule has an attachment site for a particular amino acid and an anticodon, a sequence of three nucleotides that is complementary to a codon in the mRNA strand.
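Codon-anticodon pairing follows the same base-pairing rules seen in transcription (A with U, G with C). The Python sketch below computes the anticodon for a codon, written 3' to 5' so that it lines up base by base with the codon written 5' to 3'; this is the convention used when AUG is paired with UAC. It ignores the looser "wobble" pairing allowed at the third codon position, so treat it as a simplification; the function name is illustrative.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}  # RNA-RNA base pairing


def anticodon_for(codon: str) -> str:
    """Anticodon (written 3'->5') that pairs with an mRNA codon (written 5'->3')."""
    return "".join(PAIR[base] for base in codon)


print(anticodon_for("AUG"))        # UAC (initiator tRNA, carries methionine)
print(anticodon_for("UGG"))        # ACC (tRNA for tryptophan)
print(anticodon_for("AUG")[::-1])  # CAU (the same anticodon written 5'->3')
```

Reversing the string only changes the direction in which the anticodon is written; it is the same molecule.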

Enzymes called aminoacyl-tRNA synthetases (Fig. 6) ensure that a given tRNA molecule picks up only its specific amino acid. Each aminoacyl-tRNA synthetase contains sites that bind an amino acid and a tRNA, and energy (supplied by ATP) is required to join these raw materials. The resulting tRNA with its attached amino acid is called an aminoacyl-tRNA. To ensure high fidelity of protein translation, each amino acid has its own aminoacyl-tRNA synthetase.
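The specificity of the synthetases can be pictured as a lookup that refuses mismatched pairs. This is a hypothetical, simplified Python model: the `SYNTHETASES` table and `charge` function are illustrative, only three amino acids are included, and real synthetases recognize several structural features of a tRNA, not only its anticodon (written 3' to 5' here).

```python
# Hypothetical, simplified model: each synthetase (keyed by its amino acid)
# accepts only tRNAs whose anticodon (written 3'->5') it recognizes.
SYNTHETASES = {
    "Met": {"UAC"},                       # pairs with codon AUG
    "Trp": {"ACC"},                       # pairs with codon UGG
    "Val": {"CAA", "CAG", "CAU", "CAC"},  # pairs with GUU, GUC, GUA, GUG
}


def charge(anticodon: str, amino_acid: str) -> str:
    """Attach amino_acid to a tRNA, or raise ValueError if the synthetase rejects it."""
    if anticodon not in SYNTHETASES.get(amino_acid, set()):
        raise ValueError(f"{amino_acid} synthetase rejects tRNA with anticodon {anticodon}")
    # In the cell this step consumes ATP; here we just return a label.
    return f"{amino_acid}-tRNA({anticodon})"


print(charge("UAC", "Met"))  # Met-tRNA(UAC)
try:
    charge("UAC", "Trp")
except ValueError as err:
    print(err)  # Trp synthetase rejects tRNA with anticodon UAC
```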


Figure 5. The Structure of Transfer RNA (tRNA).

Figure 6. Aminoacyl-tRNA Synthetase's Role in Translation.

Gene To Protein: Translation

Just like transcription, translation takes place in three stages: initiation, elongation, and termination. 

Initiation: In initiation (Fig. 8), mRNA binds to the small subunit of a ribosome. The start codon, AUG, pairs with an initiator tRNA molecule that bears the anticodon UAC and carries the amino acid methionine. The large ribosomal subunit then attaches, completing the initiation complex and allowing translation to begin.

Figure 8. Initiation of Translation.

Figure 9 illustrates the three attachment sites on a ribosome: the E site is the exit site, the P site is the peptidyl-tRNA binding site, and the A site is the aminoacyl-tRNA binding site.

Figure 9. The Three Attachment Sites On a Ribosome.

Elongation: In the elongation stage (Fig. 10), the polypeptide grows by the addition of amino acids in the order specified by the sequence of codons in the mRNA molecule. Each cycle involves codon recognition, peptide bond formation, and translocation. A tRNA molecule carrying the appropriate amino acid (an aminoacyl-tRNA) binds to the A site, and a peptide bond forms between the new amino acid and the end of the growing polypeptide, which is held by the tRNA in the P site. The ribosome then shifts one codon (three nucleotides) along the mRNA: the P-site tRNA moves to the E site, where it dissociates from the ribosome; the A-site tRNA, now carrying the polypeptide, moves into the P site; and a new aminoacyl-tRNA can attach to the now-open A site.
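The elongation cycle can be traced as a small state machine over the E, P, and A sites. In this Python sketch the codons and the tiny `AMINO_ACID` lookup are illustrative (a real ribosome uses the full genetic code), and the loop starts after initiation, with the initiator tRNA already in the P site. Each pass performs codon recognition, peptide bond formation, and translocation, then prints the site contents.

```python
# Illustrative codon -> amino acid lookup, just enough for this example.
AMINO_ACID = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GUG": "Val"}


def elongate(codons: list[str]) -> list[str]:
    """Toy ribosome that tracks which tRNA occupies the E, P, and A sites."""
    # After initiation, the initiator tRNA carrying Met sits in the P site.
    sites = {"E": None, "P": "tRNA-" + AMINO_ACID[codons[0]], "A": None}
    chain = [AMINO_ACID[codons[0]]]
    for codon in codons[1:]:
        amino_acid = AMINO_ACID[codon]
        # 1. Codon recognition: the matching aminoacyl-tRNA enters the A site.
        sites["A"] = "tRNA-" + amino_acid
        # 2. Peptide bond formation: the new amino acid is added to the chain.
        chain.append(amino_acid)
        # 3. Translocation: P-site tRNA moves to E (then leaves), A-site tRNA moves to P.
        sites["E"], sites["P"], sites["A"] = sites["P"], sites["A"], None
        print(f"E={sites['E']}  P={sites['P']}  A={sites['A']}  chain={'-'.join(chain)}")
    return chain


elongate(["AUG", "UUU", "GGC", "GUG"])
# E=tRNA-Met  P=tRNA-Phe  A=None  chain=Met-Phe
# E=tRNA-Phe  P=tRNA-Gly  A=None  chain=Met-Phe-Gly
# E=tRNA-Gly  P=tRNA-Val  A=None  chain=Met-Phe-Gly-Val
```

After every translocation the A site is empty again, ready for the next aminoacyl-tRNA.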

Termination: Elongation continues until an mRNA stop codon reaches the A site of the ribosome. The three stop codons, UAA, UAG, and UGA, do not code for an amino acid. Instead of a tRNA, a release factor protein binds to the stop codon, and the newly synthesized polypeptide is released from the ribosome.
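Putting initiation, elongation, and termination together gives a simple translation function. This Python sketch uses the standard genetic code with one-letter amino acid abbreviations, assumes (as a simplification) that translation starts at the first AUG, and stops at the first in-frame stop codon (UAA, UAG, or UGA), the point where a release factor would bind. The mRNA in the example is made up.

```python
from itertools import product

# Standard genetic code; one-letter amino acid codes, "*" = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AA)}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")  # initiation at the first AUG (simplification)
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":          # termination: a release factor would bind here
            break
        protein.append(aa)     # elongation: add one amino acid per codon
    return "".join(protein)


print(translate("GCAUGUUUGGCGUGUAAGC"))  # MFGV
```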

Figure 10. Peptide Chain Elongation During Translation.

The newly synthesized polypeptide will undergo coiling and folding to form its secondary and tertiary structures, and it may combine with additional polypeptide chains to achieve quaternary structure (protein structure was discussed in Tutorial 3).


View this animation of translation.

Gene To Protein: Gene Regulation

During sexual reproduction in many organisms, such as humans, a sperm fertilizes an egg to produce a zygote. The zygote then divides by mitosis to produce genetically identical daughter cells. Eventually these genetically identical cells differentiate into different cell and tissue types. This process of cell differentiation is the result of gene regulation (gene regulation describes how genes are "turned on" and "turned off"). In each cell type, only certain genes are expressed (in fact, in most human cells only 3-5% of genes are expressed). This control of gene expression is due, in part, to the binding of the appropriate transcription factors to the promoter region of a gene. When a transcription factor binds to the promoter, it helps start transcription of that gene, the first step of gene expression.



A mutation is a change in the nucleotide sequence of a cell's DNA. Many of the genetic diseases you learned about in Tutorial 32 are the result of a point mutation (a change in a single nucleotide). For example, sickle cell anemia is the result of a single base-pair substitution that replaces a thymine with an adenine on the template strand, changing the mRNA codon GAG to GUG. The result is a change in the amino acid sequence of hemoglobin: a glutamic acid, which is hydrophilic, is replaced with a valine, which is hydrophobic.
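The sickle cell substitution can be followed codon by codon. This Python sketch looks only at the affected codon of the β-globin gene, written on the coding strand; an A-to-T change there is the same base-pair substitution as the thymine-to-adenine change described above, viewed from the template strand. The two-entry codon table and the `substitute` helper are deliberately minimal and illustrative.

```python
CODONS = {"GAG": "Glu", "GUG": "Val"}  # only the two codons needed for this example


def substitute(seq: str, position: int, new_base: str) -> str:
    """Return a copy of seq with one base replaced (a point mutation)."""
    return seq[:position] + new_base + seq[position + 1:]


normal_coding = "GAG"                              # coding-strand codon (DNA)
sickle_coding = substitute(normal_coding, 1, "T")  # A -> T on the coding strand

for label, dna in (("normal", normal_coding), ("sickle", sickle_coding)):
    mrna = dna.replace("T", "U")                   # mRNA = coding strand with U for T
    print(label, dna, mrna, CODONS[mrna])
# normal GAG GAG Glu
# sickle GTG GUG Val
```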

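Additions (insertions) and deletions of nucleotides can cause mutations known as "frameshift mutations" because they throw off the reading frame of the genetic message: every codon after the change is read in the wrong groups of three. (An insertion or deletion of a multiple of three nucleotides adds or removes whole codons and does not shift the frame.) Frameshift mutations often result in a completely nonfunctional protein and are thus often significantly more damaging than a single-base substitution.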
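The effect of an insertion or deletion on the reading frame is easy to see in code. This Python sketch reads a made-up mRNA in frame from its first base, then compares a single-nucleotide insertion (which shifts every codon from that point on) with a three-nucleotide deletion (which removes one codon but keeps the frame). It uses the standard genetic code with one-letter amino acid abbreviations; the helper name is illustrative.

```python
from itertools import product

# Standard genetic code; one-letter amino acid codes, "*" = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AA)}


def read_frame(mrna: str) -> str:
    """Translate mRNA in frame from its first base (ignores start/stop signals)."""
    return "-".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))


normal = "AUGUUUGGCGUGAAA"                 # made-up mRNA
insertion = normal[:4] + "C" + normal[4:]  # an extra C after the fourth base
deletion3 = normal[:3] + normal[6:]        # three bases (one whole codon) removed

print(read_frame(normal))     # M-F-G-V-K
print(read_frame(insertion))  # M-S-W-R-E  (codons from the insertion onward change)
print(read_frame(deletion3))  # M-G-V-K    (one amino acid lost, frame intact)
```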



The process of RNA transcription has some similarities to DNA replication (e.g., synthesis occurs in the 5' to 3' direction), but it also has some important differences. First, only one strand of DNA is used as a template for RNA synthesis. Second, ribonucleotides are used instead of deoxyribonucleotides. Finally, not all DNA in a genome is transcribed at once. Rather, via the action of transcription factors and gene regulation, only selected genes are transcribed at a given time. Be sure that you understand the basic aspects of this process.

Protein translation is the process by which messenger RNA (mRNA) supplies the information needed to synthesize proteins. There are three basic components to a cell's translational machinery: mRNA, tRNA, and ribosomes. Messenger RNA provides the template that specifies the correct sequence of amino acids. Fidelity of the translational process is assured, in part, by the fact that each amino acid has its own aminoacyl-tRNA synthetase, so each transfer RNA (tRNA) carries only its appropriate amino acid. Each tRNA pairs with the mRNA through its anticodon; for example, a tRNA that has the anticodon "UAC" will bind to the mRNA codon with the complementary sequence "AUG." Thus, each tRNA delivers the appropriate amino acid to the ribosome, and the order of amino acids is determined by the linear sequence of codons in the mRNA. Be sure that you understand the relationship between these three components of the cell's translational machinery.



After reading this tutorial, you should have a working knowledge of the following terms:

  • aminoacyl-tRNA
  • aminoacyl-tRNA synthetase
  • anticodon
  • coding strand (DNA)
  • codon
  • frameshift mutation
  • gene regulation
  • genetic code
  • messenger RNA (mRNA)
  • mutation
  • point mutation
  • promoter
  • release factor protein
  • replication fork
  • RNA polymerase
  • ribosomal RNA (rRNA)
  • ribosome
  • stop codon
  • template strand (DNA)
  • terminator sequence
  • transcription
  • transcription factor
  • translation
  • transfer RNA (tRNA)

Case Study for From Gene to Protein

CFTR (Cystic Fibrosis Transmembrane Conductance Regulator) is the gene in which mutations cause the genetic condition cystic fibrosis (CF). The normal CFTR gene codes for an ion channel protein that regulates the movement of chloride ions across the cell membrane. The most common mutant allele (ΔF508) codes for a defective protein that does not reach its proper place in the cell membrane and thus cannot regulate the movement of chloride ions. The result is the buildup of a thick mucus layer in the lungs that allows frequent bacterial infections. Left untreated, CF is a lethal condition. The CFTR gene spans about 250,000 base pairs (250 kb) of DNA. The ΔF508 mutation is located approximately 15,000 base pairs into the coding region.

The normal allele has, as part of its sequence along the coding strand (not the template strand), the following sequence:


The ΔF508 allele has along the coding strand the corresponding sequence:


• First, determine the mRNA sequences that are transcribed from both the normal and the ΔF508 allele.
• Next, use the genetic code (Figure 3) to determine the amino acid sequences that both alleles encode.
• What is the specific mutation in the ΔF508 allele?
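To check your work on the case study, the transcription and translation steps can be automated with a short script. This Python sketch assumes each fragment is given 5' to 3' along the coding strand and is already in the correct reading frame (so it does not search for AUG); the helper names and the example fragments are made up and are not the CFTR sequences. Comparing the two translated sequences locates the first amino acid that differs.

```python
from itertools import product

# Standard genetic code; one-letter amino acid codes, "*" = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AA)}


def fragment_to_protein(coding_fragment: str) -> tuple[str, str]:
    """Transcribe an in-frame coding-strand fragment (5'->3') and translate it."""
    mrna = coding_fragment.replace("T", "U")  # mRNA = coding strand with U for T
    protein = "".join(
        CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3)
    )
    return mrna, protein


def first_difference(a: str, b: str) -> int:
    """Index of the first position where a and b differ, or -1 if they are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return -1 if len(a) == len(b) else min(len(a), len(b))


# Made-up fragments for illustration only (not CFTR).
normal_mrna, normal_protein = fragment_to_protein("GAAGGCTGGCAT")
mutant_mrna, mutant_protein = fragment_to_protein("GAATGGCAT")
print(normal_mrna, normal_protein)                        # GAAGGCUGGCAU EGWH
print(mutant_mrna, mutant_protein)                        # GAAUGGCAU EWH
print(first_difference(normal_protein, mutant_protein))  # 1
```

In this made-up example a whole codon is missing, so the amino acids after the deletion are unchanged and the protein is simply one residue shorter.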


Now that you have read this tutorial and worked through the case study, go to ANGEL and complete the tutorial practice problems  to test your understanding.  Questions?  Either send your instructor a message through ANGEL or attend an online office hour (the times are posted on ANGEL).