Comparative Protein Structure Modelling: Practical exercises

Lorenza Bordoli


Visualization and analysis tool:

DeepView: [tutorial][download]

Web servers:

Databases:

PDB: repository for 3-D biological macromolecular structure data.
Swiss-Prot: Protein knowledgebase.



Part 1:
Homology Modelling: getting started.

In the first part of the practical exercises, we are going to build a homology model for the human protein Cyclin A1 (Swiss-Prot entry P78396).
1) First have a look at the Swiss-Prot entry to learn more about the function of this protein. Are there any protein domains described for Cyclin A1? (Hint: you can either look at the InterPro annotations for this entry, or use the "Sequence Feature Scan" from the Tools section of the SWISS-MODEL Workspace to run "InterPro domain scan".)

2) Now we can search for a suitable template for building the model: please use the "Template identification" from the Tools section of the SWISS-MODEL Workspace. How many templates are available? Can you build a model for the entire protein? How do the different templates differ? How do you judge the alignment between target and template? (Hint: Detailed information about the individual templates is available directly from the template selection output, as links to the SWISS-MODEL template library and external resources.)

3) Build a homology model for the human Cyclin A1 protein using one of the available templates. Since the alignment between target and template is trivial, you can use the "Automated mode" of the Modelling tools of the SWISS-MODEL Workspace, specifying the desired template. How good is the quality of the resulting model?


Part 2: Homology Modelling of TMPRSS3: the SRCR and the serine protease domain.

While the automated approach yields satisfactory results for closely related proteins, modelling of proteins based on remotely homologous templates requires more user intervention, as illustrated by the case of the human Transmembrane Protease 3 (TMPRSS3). The human transmembrane protease (Swiss-Prot P57727) consists of 3 domains: the LDL receptor domain, the SRCR domain and the serine protease domain. Information about the location and the function of the different domains can be retrieved by consulting the InterPro entry for this protein or by running the "InterPro domain scan" tool as in Part 1 of this practical exercise.

1) First search for a suitable template to build the model for the TMPRSS3 protein. For which domains of the protein do you find homologous proteins that can be used as templates?
You might have found that there is a template (PDB ID 1z8g, chain A) which could be used to build a model for the SRCR and the serine protease domains of the protein. If you have a look at the alignment, you will notice that the target-template alignment is not as "trivial" as in the Cyclin A1 example. For this reason, in order to improve the target-template alignment, we are going to generate a multiple sequence alignment (MSA) of the target and template sequences together with some additional homologues from this protein family. Multiple sequence alignments of target, template and related sequences generally perform better than a simple pairwise target/template sequence alignment.
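
One simple way to put a number on how "trivial" a target-template alignment is would be to compute the percentage of identical residues over the aligned (non-gap) columns. The following is a minimal sketch, not the measure used by the SWISS-MODEL Workspace; the function name and the toy sequences are assumptions made for illustration only.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over columns where both sequences have a residue."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned_columns = 0
    identical = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == gap or b == gap:
            continue  # ignore columns with a gap in either sequence
        aligned_columns += 1
        if a == b:
            identical += 1
    if aligned_columns == 0:
        return 0.0
    return 100.0 * identical / aligned_columns


# Toy alignment (made-up sequences, not TMPRSS3 or 1z8g)
target = "MKT-LVAGG"
template = "MRTALV-GG"
print(f"{percent_identity(target, template):.1f}% identity")
```

Note that other conventions exist (for example dividing by the length of the shorter sequence), so values from different tools are not always directly comparable.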

2) In order to search for homologues in this protein family, you are going to run a BLAST search against the Swiss-Prot protein database with your target protein (only the 2 C-terminal domains!) as query [target.fasta].
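
If you prefer to prepare the query yourself, the fragment covering the 2 C-terminal domains can be cut out of the full-length sequence. The sketch below assumes the sequence is available as FASTA text; the function names are hypothetical, and the residue range in the example is a placeholder, since the real domain boundaries have to be taken from the InterPro annotation.

```python
def read_first_fasta_record(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # stop at the second record
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(chunks)


def slice_fasta(text, start, end, width=60):
    """Return a FASTA record holding residues start..end (1-based, inclusive)."""
    header, sequence = read_first_fasta_record(text)
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("residue range lies outside the sequence")
    fragment = sequence[start - 1:end]
    lines = [fragment[i:i + width] for i in range(0, len(fragment), width)]
    return f">{header}/{start}-{end}\n" + "\n".join(lines) + "\n"


# Toy record (made-up sequence); the range 5-12 is a placeholder,
# not the real SRCR/protease boundaries of TMPRSS3.
toy = ">toy_protein\nACDEFGHIKL\nMNPQRSTVWY\n"
print(slice_fasta(toy, 5, 12), end="")
```

The residue range is appended to the header so that the origin of the fragment stays visible in the BLAST and MSA outputs.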

3) A multiple sequence alignment (MSA) of the target, template and related sequences can then be calculated using the T-Coffee MSA tool.

4) Once you have obtained your MSA, you can then build a model for the SRCR and the serine protease domains of the protein, using the "Alignment Mode" of the Modelling section of the SWISS-MODEL Workspace. How do you judge the quality of the obtained model? Can you identify regions with positive values (in red) corresponding to an unfavourable energy environment? Can you find any reasons why these regions of the model have an unfavourable energy environment?
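
Regions with positive values can also be located programmatically once the per-residue energies are available as a list of numbers. The sketch below assumes such a list (one value per residue, in sequence order); it does not parse the actual server output, and the function name and example values are made up for illustration.

```python
def positive_stretches(energies, min_length=1):
    """Return (start, end) pairs, 1-based and inclusive, for runs of positive values."""
    stretches = []
    run_start = None
    for position, energy in enumerate(energies, start=1):
        if energy > 0:
            if run_start is None:
                run_start = position  # a new unfavourable stretch begins
        elif run_start is not None:
            if position - run_start >= min_length:
                stretches.append((run_start, position - 1))
            run_start = None
    # close a stretch that runs to the end of the sequence
    if run_start is not None and len(energies) - run_start + 1 >= min_length:
        stretches.append((run_start, len(energies)))
    return stretches


# Made-up per-residue energies for a 9-residue toy model
toy_energies = [-1.2, 0.5, 0.8, -0.3, 1.1, 2.0, 0.4, -0.9, 0.2]
print(positive_stretches(toy_energies))
print(positive_stretches(toy_energies, min_length=3))
```

Using a minimum length filters out isolated residues, so that attention goes to longer stretches, which may for instance coincide with insertions or deletions in the alignment.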


Part 3: Homology Modelling of TMPRSS3: the LDL receptor domain.

Now we are going to model the LDL domain of the protein, whose amino acid sequence can be downloaded here [txt]. Please save a file with the sequence locally on your PC.

1) First of all we are going to look for a suitable template, e.g. using the SAM HMM search algorithm accessible from the Tools section (Template identification) of the SWISS-MODEL Workspace. How many different hits do you detect? How do they differ? To build the model we will use the structure corresponding to the PDB entry 2gtl, chain N, as template.
Please save chain N of the PDB entry 2gtl locally on your PC with the help of DeepView:
- DeepView->File->Import ...
- type 2gtlN in the window and then press the ExPDB File button.
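
As an alternative to the DeepView import, a single chain can be extracted from a complete PDB-format file by filtering on the chain identifier, which the PDB format stores in column 22 of ATOM, HETATM and TER records. This is only a sketch under that assumption: the file names in the usage comment are hypothetical, and the result is not guaranteed to be identical to the ExPDB file delivered by DeepView.

```python
def extract_chain(pdb_text, chain_id):
    """Keep ATOM, HETATM and TER records whose chain ID (column 22) matches."""
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record in ("ATOM", "HETATM", "TER") and len(line) > 21 and line[21] == chain_id:
            kept.append(line)
    kept.append("END")
    return "\n".join(kept) + "\n"


# Hypothetical usage, assuming the full entry was saved as 2gtl.pdb:
# with open("2gtl.pdb") as handle:
#     chain_n = extract_chain(handle.read(), "N")
# with open("2gtlN.pdb", "w") as handle:
#     handle.write(chain_n)
```

Header records are dropped on purpose in this sketch, so the output contains coordinates only.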

2) In the second step we are going to build an alignment between the target, the template and related sequences as we did in Part 2 of this exercise. Please repeat steps 2-3 of Part 2.

3) Once we have obtained the MSA, we can then build a model for the LDL domain, using the "Alignment Mode" of the Modelling section of the SWISS-MODEL Workspace, as we did in step 4 of Part 2.

4) Once we have obtained the results of the modelling, we can analyse them with the help of the DeepView program (DeepView-> File-> Open PDB File ...).
Please carefully check the alignment (DeepView->Window->Alignment) between your target and the template sequences. If needed, amend the alignment. To decide where to amend the alignment, check: PROSITE pattern locations, disulfide bridges, Ramachandran plot, ...
- Then save a "Project" file (File->Save->Project...)
- And submit this file to the Project Mode of the Modelling section of the SWISS-MODEL Workspace.


Once you have obtained the model, please answer the following questions:

Are you sure that your model has the correct protein fold? Which structural features are characteristic for this protein domain?
Hint: superpose the model and the template structure (DeepView-> Fit-> Iterative magic Fit) and check the residues around the Ca2+ binding site.
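
To list the residues around a Ca2+ binding site outside DeepView, one can search a PDB-format file for protein atoms close to the calcium ion. The sketch below assumes standard fixed-column PDB records and that the ion is stored as a HETATM record with residue name CA; the 3.0 Angstrom cutoff and all function names are assumptions chosen for illustration.

```python
import math


def parse_atoms(pdb_text):
    """Parse ATOM/HETATM records into dictionaries (fixed PDB columns)."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 54:
            atoms.append({
                "record": line[:6].strip(),
                "name": line[12:16].strip(),
                "resname": line[17:20].strip(),
                "chain": line[21],
                "resseq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms


def residues_near_calcium(pdb_text, cutoff=3.0):
    """Return sorted (chain, residue number, residue name) within cutoff of a Ca2+ ion."""
    atoms = parse_atoms(pdb_text)
    # Calcium ions: HETATM records with residue name CA (not the C-alpha atoms)
    ions = [a for a in atoms if a["record"] == "HETATM" and a["resname"] == "CA"]
    protein = [a for a in atoms if a["record"] == "ATOM"]
    near = set()
    for ion in ions:
        for atom in protein:
            if math.dist(ion["xyz"], atom["xyz"]) <= cutoff:
                near.add((atom["chain"], atom["resseq"], atom["resname"]))
    return sorted(near)
```

Running this on the template and comparing the listed residues with the aligned positions in the model gives a quick check of whether the calcium-coordinating residues are conserved. The model itself may not contain the ion, so the template is the natural input.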


Part 4: Model quality evaluation

A 3D model of the Drosophila UDP-glucose 4-epimerase protein has been generated by homology modelling. The structure of the human homologue (Q14376) has been used as template. The PDB ID for the template is 1ek5. Two different models, Model1 and Model2, have been obtained: they differ in the alignment between the target and the template.
Evaluate the two models by checking the following criteria:

Which of the models would you trust more, and why?

 

Output and input of the different web based programs (as of 26.08.07):

Human Cyclin A1, InterPro Scan [output]
Human Cyclin A1, Template Selection [output]
Human Cyclin A1, Modelling using 1finB template [output]
Human TMPRSS3, Template Identification [output]
Human TMPRSS3, SRCR & Protease Blast [input][output]
Human TMPRSS3, SRCR & Protease MSA [input][output]
Human TMPRSS3, SRCR & Protease domain model [output]
Human TMPRSS3, LDL domain, Template Identification [output]
Human TMPRSS3, LDL domain, MSA [input] [output]
Human TMPRSS3, LDL domain, "Alignment Mode" model [output html] [output.pdb]
Human TMPRSS3, LDL domain "Project Mode" model [input] [output html][output pdb]
ANOLEA and Gromos output for Model1 [html]
ANOLEA and Gromos output for Model2 [html]