Students will identify homologous genes present in diverse animal species. These gene sequences will be aligned so that students can identify highly conserved and highly divergent regions.
Module Objectives (MO6.1, MO6.2, MO6.6, MO6.7):
- Describe biodiversity and phylogeny of important invertebrate animal phyla, including sponges, cnidarians, flatworms, nematodes, molluscs, annelids, arthropods, and echinoderms.
- Describe biodiversity and phylogeny of vertebrate animals.
- Explain how shared characteristics (homologous traits) and DNA are used to construct phylogenies.
- Apply the basic rules of Phylogenetic Systematics (cladistics) to show evolutionary relationships among taxa.
Lab Objectives:
- Search for and identify shared homologous genes in diverse animal species.
- Create a sequence alignment of shared homologous genes.
- Compare homologous genes to identify conserved and divergent regions.
- Explain how sequence alignment data can help clarify evolutionary relationships.
Materials/Supplies:
- Online access for databases
Description:
In this lab, you will explore the connection between sequence homology and evolutionary relatedness. The lab involves three main exercises. To start, you will use a publicly available online database to identify proteins homologous to a provided human protein. These homologous proteins will come from a wide variety of animals. Once the proteins are identified and you have obtained each protein’s amino acid sequence, you will use a second program to create a sequence alignment (i.e., an organized comparison of the sequences). This alignment will allow you to quickly identify regions that are conserved across animal species, as well as regions that are more divergent. Lastly, you will use the sequence alignment information to make predictions about the relatedness of the represented species.
Exercise 1: Finding a protein sequence
For this first exercise, you will use an online database from the National Center for Biotechnology Information (NCBI). Because many researchers have deposited protein sequences into this database, it can be used to find the sequence of a protein of interest. It is also useful for revealing other sequences that are very similar and potentially share the same evolutionary origin (i.e., are homologous).
- Using a browser, navigate to the National Center for Biotechnology Information (NCBI) webpage (https://www.ncbi.nlm.nih.gov/).
- In the search box at the top, select protein from the drop-down menu.
- You can enter the name of a species and the name of a protein in the search area. For this lab, you will be investigating proteins that are homologous to human p53 (an important regulator of the cell cycle). To find the sequence of this protein, type Homo sapiens and p53 in the box. Click the Search button.
- A page containing your search results should now appear. There will likely be many results, and several may represent the same sequence from different sources. Some may even be partial sequences or belong to different proteins. Carefully choose the desired protein by clicking on the FASTA link below it (FASTA is a text-based format for representing nucleotide or amino acid sequences).
- The new page that opens will contain the amino acid sequence of Homo sapiens p53. Select and copy this amino acid sequence (including the unique ID and species). Paste this sequence into a word document. It is important to keep track of the species that this sequence belongs to.
- Now, you will use the same process to identify p53 homologs in other species. You will align ten sequences of your choice. In your selections, choose at least one additional mammal species, two bird species, one amphibian species, and one invertebrate species.
- To find the homologs, use the same search process with the scientific name of the species (look it up if you do not know it) and the name of the protein (p53). Make sure you select full sequences and not “partial” ones. For each homolog, copy/paste the FASTA sequence (including the unique ID and species) into your word document.
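The FASTA format used in the steps above can be illustrated with a short Python sketch. It is only a minimal example, not part of the lab procedure: the function names `parse_fasta` and `looks_partial` are hypothetical, and the two records are invented placeholders rather than real database entries. Each record starts with a ">" header line (ID and species), followed by one or more lines of sequence.

```python
def parse_fasta(text):
    """Split FASTA-formatted text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined later
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def looks_partial(header):
    """Flag headers whose description contains the word 'partial'."""
    return "partial" in header.lower()


# Invented placeholder records (NOT real sequences or accession IDs).
example = """>EXAMPLE_1 hypothetical protein [Species one]
MKTAYIAKQR
QISFVKSHFS
>EXAMPLE_2 hypothetical protein, partial [Species two]
MKTAYIA
"""

records = parse_fasta(example)
for header, seq in records:
    status = "partial" if looks_partial(header) else "full"
    print(header, len(seq), status)
```

A check like this could help confirm that your word document holds ten complete records, each with its header line intact, before you move on to the alignment.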
Exercise 2: Aligning homologous sequences to identify conserved and divergent regions
- Using a browser, navigate to the European Bioinformatics Institute (EBI) webpage (https://www.ebi.ac.uk/).
- Select the Services link at the top.
- Select the link for Clustal Omega, a multiple sequence alignment program that can be used with DNA and protein sequences.
- In step 1, select PROTEIN as your sequence type. Then, copy/paste all of the FASTA sequences that you have collected into the provided textbox.
- In step 2, select the output format ClustalW with character counts. Lastly, submit your job. When ready, the results will appear in a new window.
- Once the alignment is available, click the Show Colors link at the top to make it easier to view the identical amino acids. This will give the same color to the amino acids that are the same. You will also see a variety of symbols. An asterisk (*) means that the sequences are identical at that position; a colon (:) indicates a conserved substitution (same color group); and a period (.) indicates a semi-conserved substitution (similar shapes). The colors group the amino acids by characteristics: red are small, hydrophobic, or aromatic; blue are acidic; magenta are basic; green are hydroxyl, amine, amide, or basic; and gray are the rest.
- Use your mouse to select your entire alignment. Copy/paste this into your word document (the colors may not show up, but that’s ok). Provide a title for your alignment.
- Select the Results Summary. Choose the link for the Percent Identity Matrix.
- Use your mouse to select the Percent Identity Matrix. Copy/paste this into your word document. Provide a title for your matrix.
- Using the information in the matrix, determine the similarity between the sequences. In the word document, identify the two most closely related species and indicate the percent identity between them. Also identify the two least related species and indicate the percent identity between them. A video explanation of the Percent Identity Matrix is found in the Blackboard course under Lab 6. This should help you work through this question. https://youtu.be/UP4nUMhBzrk?si=uEbr-oSYaE0OdpYa
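To make the alignment symbols and the Percent Identity Matrix more concrete, the sketch below computes both for a toy alignment. It is an illustration under stated assumptions, not the program's own calculation: percent identity is defined here as identical positions divided by positions where neither sequence has a gap (the web tool's exact denominator may differ), the function names are hypothetical, and the three aligned rows are invented.

```python
def percent_identity(a, b):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are left out of the count
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Map every (name, name) pair to its percent identity."""
    names = list(aligned)
    return {
        (p, q): percent_identity(aligned[p], aligned[q])
        for p in names
        for q in names
    }


def fully_conserved_columns(aligned):
    """Column indices where all rows share one residue (the '*' positions)."""
    rows = list(aligned.values())
    return [
        i
        for i, column in enumerate(zip(*rows))
        if "-" not in column and len(set(column)) == 1
    ]


# Invented toy alignment; "-" marks a gap.
toy = {
    "species_A": "MKT-LLVA",
    "species_B": "MKTALLVA",
    "species_C": "MRTALIVA",
}

matrix = identity_matrix(toy)
print(matrix[("species_B", "species_C")])  # 6 of 8 columns match: 75.0
print(fully_conserved_columns(toy))        # [0, 2, 4, 6, 7]
```

Note that species_A and species_B score 100% under this definition even though one of them has a gap, which shows why the choice of denominator matters when comparing values from different tools.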
Exercise 3: Inferring evolutionary relatedness from a sequence alignment
- To view how the sequence similarities may correlate with their evolutionary history, select the link for Phylogenetic Tree.
- Take a screenshot of the generated cladogram and paste this into your word document. Provide a title for your cladogram.
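One way to see how a tree can follow from pairwise similarity is to cluster the sequences by distance, where distance is taken as 100 minus percent identity. The sketch below uses average-linkage clustering (UPGMA) as a simple stand-in; it is not necessarily the method the web tool uses to draw its tree. The function name `upgma`, the taxon names, and the distance values are all invented for illustration.

```python
def upgma(names, dist):
    """Average-linkage clustering; returns the tree as nested tuples.

    dist maps frozenset({a, b}) to the distance between taxa a and b.
    """
    # Each key is a (sub)tree; each value lists the leaves it contains.
    clusters = {name: [name] for name in names}

    def average_distance(members_1, members_2):
        total = sum(
            dist[frozenset((a, b))] for a in members_1 for b in members_2
        )
        return total / (len(members_1) * len(members_2))

    while len(clusters) > 1:
        trees = list(clusters)
        best = None
        # Find the pair of clusters with the smallest average distance.
        for i in range(len(trees)):
            for j in range(i + 1, len(trees)):
                d = average_distance(clusters[trees[i]], clusters[trees[j]])
                if best is None or d < best[0]:
                    best = (d, trees[i], trees[j])
        _, left, right = best
        # Merge the pair into a new subtree holding both sets of leaves.
        merged = clusters.pop(left) + clusters.pop(right)
        clusters[(left, right)] = merged
    return next(iter(clusters))


# Invented distances (100 - percent identity) for four hypothetical taxa.
pairs = {
    ("A", "B"): 20, ("A", "C"): 45, ("B", "C"): 47,
    ("A", "D"): 80, ("B", "D"): 82, ("C", "D"): 81,
}
dist = {frozenset(pair): value for pair, value in pairs.items()}

tree = upgma(["A", "B", "C", "D"], dist)
print(tree)  # ('D', ('C', ('A', 'B')))
```

In this toy result, A and B are joined first because they are the most similar pair, C joins them next, and D branches off earliest. The cladogram from the lab can be read the same way, from the most recently joined pairs outward.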
Questions for further analysis:
- Sequence alignments can be created using nucleotide and amino acid sequences. Which one would be preferred for analyzing evolutionary relationships? Explain your reasoning.
- Based on your sequence alignment and cladogram, which species have diverged most recently?
- In viewing your alignment, some amino acids are conserved and some are not. Why do you think some of the amino acids have changed due to mutation while others have remained unchanged across all species?
- Explain how sequence alignment data can be used to clarify evolutionary relationships.