Introduction to protein sequence and structure analysis
Toolbox that could be useful for protein sequence analysis:
http://blast.ncbi.nlm.nih.gov/Blast.cgi
https://www.ebi.ac.uk/interpro
After cloning and sequencing of the coding DNA, the sequence of the X protein has been determined. The sequence of X is given here:
LAAVSVDCSEYPKPACTLEYRPLCGSDNKTYGNKCNFCNAVVESNGTLTLSHFGKC
Under normal conditions, this X protein is expressed, but we have no idea about its function. The goal of this practical work is to collect as much information as possible about the structure and function of the X protein.
I - Search Patterns, Profiles
One way to identify the function of X is to check whether it contains signatures (patterns) of a function or a protein family.
2 options:
http://prosite.expasy.org/scanprosite/
NPS@ and follow the link "ProScan: scan a sequence for sites/signatures against PROSITE database" (activate: Include documentation in result file).
Question
- Which signature(s) can you identify? What specific features does this protein have?
- Try changing the parameters and comment on the results.
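To see what a pattern scan does, here is a minimal sketch, assuming only the basic PROSITE pattern syntax (residues, x, [..], {..}, repeat counts and the < / > anchors). It is not how ScanProsite is implemented and it ignores profiles; the function names `prosite_to_regex` and `scan` are hypothetical. As an illustration it uses the PROSITE N-glycosylation site pattern PS00001, N-{P}-[ST]-{P}, on the X sequence.

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression.

    Supported: single residues, x (any residue), [..] (allowed residues),
    {..} (forbidden residues), repeats (n) or (n,m), and the < / > terminal anchors.
    Not supported: '>' inside brackets, e.g. [G>].
    """
    regex = ""
    for element in pattern.rstrip(".").split("-"):
        if element.startswith("<"):           # pattern anchored at the N-terminus
            regex += "^"
            element = element[1:]
        c_anchor = element.endswith(">")      # pattern anchored at the C-terminus
        if c_anchor:
            element = element[:-1]
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:                                 # e.g. x(2) or [DE](2,3)
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element.lower() == "x":
            core = "."
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"
        else:
            core = element                    # residue or [..] set: same syntax in regex
        regex += core + repeat + ("$" if c_anchor else "")
    return regex

def scan(sequence, pattern):
    """Return (start, end, matched text) for each non-overlapping match, 1-based."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end(), m.group()) for m in regex.finditer(sequence)]

X = "LAAVSVDCSEYPKPACTLEYRPLCGSDNKTYGNKCNFCNAVVESNGTLTLSHFGKC"

# N-glycosylation site pattern (PROSITE PS00001)
for start, end, site in scan(X, "N-{P}-[ST]-{P}"):
    print(f"{site} at {start}-{end}")
```

Note that `finditer` reports non-overlapping matches only, so overlapping sites may be missed; real scanning tools handle this and many other cases.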
Note
InterPro gives a summary of several methods. You can find it at the EBI.
Note in your notepad the signatures that could indicate the function.
- What do you think about the function of X?
II - Search homolog proteins with BLAST
- Go to the NCBI BLAST page
- Choose Protein BLAST (blastp)
- Paste your sequence
- Select the SwissProt database
Question
Did you identify homologs? What are their function(s)?
III - Multiple sequences alignment
- Select several homologous sequences from the BLAST results.
- Perform a multiple sequence alignment (MSA) of these sequences, using Clustal Omega for example.
- Try other MSA tools (for example Tcoffee and Muscle).
Question
Do you observe differences between the results obtained from different algorithms?
What can you observe in these MSAs?
Info: you can also retrieve the selected sequences in FASTA format and perform the MSAs elsewhere:
Clustal Omega and Muscle: available in the SeaView alignment viewer
Tcoffee: http://tcoffee.vital-it.ch/apps/tcoffee/index.html
Other tools: http://expasy.org/genomics/sequence_alignment
IV - The Y protein
Another experiment has shown that the X protein interacts specifically with another protein, Y.
After purification of the active Y protein from the complex, a partial sequence of Y was obtained by sequencing one end (extremity) of the protein.
The corresponding peptide could be:
ISGGD or ISGGN
1. Identification of the Y sequence using PROSITE patterns
- Design the pattern (regular expression) corresponding to these peptides.
- Search SwissProt for the sequences containing this pattern, using PATTERN SEARCH at SIB or PATTINPROT at NPS@.
If needed, use the help to design your pattern.
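In PROSITE syntax, "ISGGD or ISGGN" can be written I-S-G-G-[DN], which corresponds to the regular expression `ISGG[DN]`. The sketch below, assuming you have saved candidate sequences as a FASTA file, keeps only the records that contain the motif and reports where. The helpers `read_fasta` and `hits` and the toy records are hypothetical; this is not how PATTERN SEARCH or PATTINPROT work internally.

```python
import re

# PROSITE pattern I-S-G-G-[DN] written as a Python regular expression
ISGG_PATTERN = re.compile(r"ISGG[DN]")

def read_fasta(text):
    """Parse FASTA text into a dict {header: sequence}."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records

def hits(records, pattern=ISGG_PATTERN):
    """Return {header: [1-based start positions]} for records containing the pattern."""
    result = {}
    for header, seq in records.items():
        starts = [m.start() + 1 for m in pattern.finditer(seq)]
        if starts:
            result[header] = starts
    return result

# Made-up records, for illustration only
toy = """>toy_1 hypothetical
MKISGGDAAT
>toy_2 hypothetical
MKISGGEAAT
"""
print(hits(read_fasta(toy)))
```

The start positions can help when choosing among hits, since the peptide was sequenced from one extremity of the protein.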
Question
How many results do you get? How can you identify the right one?
Once you have identified the Y protein sequence, copy its FASTA sequence into your notepad.
2. Composition analysis
After purification of the active Y protein, its amino-acid composition (% of each amino acid in the protein) was determined and is given in the following table:
aa | %     | aa | %     | aa | %     | aa | %     | aa | %
A  | 8.11  | F  | 2.70  | L  | 3.78  | R  | 4.32  | X  | 0
B  | 0     | G  | 17.30 | M  | 1.08  | S  | 11.89 | Y  | 5.41
C  | 2.16  | H  | 1.08  | N  | 5.41  | T  | 15.14 | Z  | 0
D  | 3.78  | I  | 3.78  | P  | 2.70  | V  | 7.57  |    |
E  | 1.08  | K  | 0.54  | Q  | 1.08  | W  | 1.08  |    |
(B = D or N, Z = E or Q, X = unknown residue)
- Compute the composition of the sequence that you retrieved. Use PROTPARAM or the tool 'Amino-acid composition' at NPS@.
- Compare this computed composition with the experimentally determined composition of Y.
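The comparison can be sketched in a few lines, assuming the retrieved sequence is available as a plain string. `composition` computes the same kind of percentage table as PROTPARAM; `differences` lists residues whose computed and experimental values differ by more than a threshold. Both names and the 1.0-point threshold are hypothetical choices; the experimental values are copied from the table above.

```python
from collections import Counter

# Experimental composition of the active Y protein (% of each residue),
# copied from the table above.
EXPERIMENTAL_Y = {
    "A": 8.11, "B": 0.0, "C": 2.16, "D": 3.78, "E": 1.08, "F": 2.70,
    "G": 17.30, "H": 1.08, "I": 3.78, "K": 0.54, "L": 3.78, "M": 1.08,
    "N": 5.41, "P": 2.70, "Q": 1.08, "R": 4.32, "S": 11.89, "T": 15.14,
    "V": 7.57, "W": 1.08, "X": 0.0, "Y": 5.41, "Z": 0.0,
}

def composition(sequence):
    """Percentage of each one-letter residue code in `sequence`."""
    residues = "".join(sequence.split()).upper()  # drop spaces and newlines
    counts = Counter(residues)
    return {aa: 100.0 * n / len(residues) for aa, n in counts.items()}

def differences(computed, experimental=EXPERIMENTAL_Y, threshold=1.0):
    """Residues whose computed and experimental percentages differ by more than
    `threshold` percentage points (the default of 1.0 is an arbitrary choice)."""
    diffs = {}
    for aa in sorted(set(computed) | set(experimental)):
        delta = computed.get(aa, 0.0) - experimental.get(aa, 0.0)
        if abs(delta) > threshold:
            diffs[aa] = round(delta, 2)
    return diffs
```

For example, `differences(composition(y_sequence))`, where `y_sequence` is a placeholder for the sequence you retrieved, returns the residues with the largest discrepancies (positive values mean more in the retrieved sequence).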
Question
Do you observe differences? Explain.
3. Search pattern in Y
Once you have obtained the correct sequence of Y, keep it in your notepad; you will need it for the following analyses.
Question
Identify the signatures (motifs, Pfam profiles) of Y using PROSCAN and/or Interpro.
4. Identification of homologs to Y
- Use NCBI BLASTP or NPS@ BLASTP against the SwissProt database to search for sequences similar to Y.
- Use PSI-BLAST (with SwissProt) to see whether you can detect more distant sequences.
- Select sequences from the BLAST and/or PSI-BLAST results to perform a multiple sequence alignment.
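If you export both result lists in tabular form, comparing them becomes a set operation. This is a sketch assuming the 12-column layout of BLAST+ `-outfmt 6` (comment lines starting with `#` are skipped); the E-value cutoff of 1e-3 and the function names are hypothetical choices, and for PSI-BLAST the best E-value over all iterations is kept.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6)
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def read_hits(tabular_text, max_evalue=1e-3):
    """Return {subject id: best (lowest) E-value} for hits at or below `max_evalue`."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # blank or comment line
        hit = dict(zip(COLUMNS, row))
        evalue = float(hit["evalue"])
        if evalue <= max_evalue:
            sid = hit["sseqid"]
            best[sid] = min(evalue, best.get(sid, evalue))
    return best

def compare(blast_hits, psiblast_hits):
    """Split subject ids into shared, BLAST-only and PSI-BLAST-only sets."""
    b, p = set(blast_hits), set(psiblast_hits)
    return {"shared": b & p, "blast_only": b - p, "psiblast_only": p - b}
```

Hits found only by PSI-BLAST are the candidate distant homologs; they deserve a closer look (alignment length, conserved residues) before being trusted.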
Question
- Did you observe differences between the BLAST and PSI-BLAST results? Comment.
- Propose a strategy to retrieve all the proteins having the same catalytic activity as Y protein.
V - Secondary structure prediction for X and Y
- Go to the consensus secondary structure prediction page at NPS@.
- Analyze the secondary structure of protein Y. Include the predictions of the following methods: DPM, GOR1, PREDATOR, SIMPA96.
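The idea behind a consensus can be illustrated with a per-residue majority vote. This is only an assumption made for illustration, not the algorithm NPS@ actually uses; the state letters (H, E, C), the `consensus` function and the toy strings are hypothetical.

```python
from collections import Counter

def consensus(predictions, undecided="?"):
    """Per-residue majority vote over several secondary structure predictions.

    `predictions` maps a method name to a string of one-letter states
    (assumed here: H = helix, E = extended strand, C = coil), all the same length.
    Positions where the top states are tied are marked with `undecided`.
    """
    if len({len(p) for p in predictions.values()}) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for states in zip(*predictions.values()):
        ranked = Counter(states).most_common()
        tied = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
        result.append(undecided if tied else ranked[0][0])
    return "".join(result)

# Toy predictions for a 6-residue stretch (made-up strings, not real NPS@ output)
toy = {"DPM": "HHHCEE", "GOR1": "HHCCEE", "SIMPA96": "HCHCEC"}
print(consensus(toy))
```

Positions where methods disagree (ties, or narrow majorities) are the least reliable parts of the prediction.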
Question
- Conclude on the organization of secondary structures.
- Perform the same analysis for X protein.
VI - Comparison with solved structures
1. The Z protein
The structure of a protein Z has just been published. The sequence of protein Z is shown below:
IAGGEAITTGGSRCSLGFNVSVNGVAHALTAGHCTNISASWSIGTRTGTSFPNNDYGIIRHSNPAAANGRVYLYNGSYQD
ITTAGNAFVGQAVQRSGSTTGLRSGSVTGLNATVNYGSSGIVYGMIQTNVCAQPGDSGGSLFAGSTALGLTSGGSGNCRT
GGTTFYQPVTEALSAYGATVL
Question
Could you use this information for the study of protein Y? Make your own analysis.
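A natural first step is a pairwise global alignment of Y against Z. The sketch below is a plain Needleman-Wunsch alignment with simple match/mismatch/gap scores; these scores are assumptions chosen for readability, whereas real tools typically use a substitution matrix such as BLOSUM62 and affine gap penalties. `global_align` and `percent_identity` are hypothetical names, and identity is computed here over the full alignment length, gaps included (other definitions exist).

```python
# Z sequence as given above
Z = ("IAGGEAITTGGSRCSLGFNVSVNGVAHALTAGHCTNISASWSIGTRTGTSFPNNDYGIIRHSNPAAANGRVYLYNGSYQD"
     "ITTAGNAFVGQAVQRSGSTTGLRSGSVTGLNATVNYGSSGIVYGMIQTNVCAQPGDSGGSLFAGSTALGLTSGGSGNCRT"
     "GGTTFYQPVTEALSAYGATVL")

def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Returns (aligned_a, aligned_b, score)."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Traceback from the bottom-right cell
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]

def percent_identity(aligned_a, aligned_b):
    """Identical positions divided by alignment length (gaps included), in %."""
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return round(100.0 * identical / len(aligned_a), 2)
```

Usage: `a, b, s = global_align(y_sequence, Z)` followed by `percent_identity(a, b)`, where `y_sequence` is a placeholder for the Y sequence in your notepad.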
2. Find the correct structures
- Download and install Deep-View - SwissPDBViewer. A tutorial and user guide for DeepView are available on its website.
- Download the archive PDB_files_part6.zip and unzip it.
- You should find 8 PDB files in the directory.
- Open them with DeepView.
- Display the secondary structure representation mode (see part VII-A-5 and/or the user guide).
Question
Try to identify the structures corresponding to X and Y proteins.
VII - Three-dimensional protein structure: play with 3D structures using SwissPDBViewer (DeepView)
- Go to the Protein Data Bank
- Search for and download the following PDB files: 1CRN, 1LDM.
You will visualize these protein structures using DeepView
A - Analyze protein structures with DeepView
1. Load a 3D structure
File => Open
Choose the 1CRN.pdb file that you have downloaded from the PDB.
2. Visualize the number of chains
Is it only the protein or can we find ligands? Is it a monomer or a polymer?
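The same questions can be answered directly from the PDB file. This sketch, assuming the standard fixed-column PDB format, collects the chain identifiers of ATOM records and counts the distinct non-water HETATM groups; `summarize` is a hypothetical name, and only the first model of multi-model (NMR) entries is read.

```python
from collections import defaultdict

def summarize(pdb_text):
    """Return (sorted chain ids of ATOM records, {ligand name: number of copies})."""
    chains = set()
    ligands = defaultdict(set)  # residue name -> {(chain, residue number)}
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "ATOM":
            chains.add(line[21])
        elif record == "HETATM":
            resname = line[17:20].strip()
            if resname != "HOH":  # ignore water molecules
                ligands[resname].add((line[21], int(line[22:26])))
        elif record == "ENDMDL":
            break  # stop after the first model
    return sorted(chains), {name: len(ids) for name, ids in ligands.items()}
```

Usage: `summarize(open("1CRN.pdb").read())`; more than one chain id suggests a multimer in the deposited file, although the biological assembly can differ.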
3. Visualize the general shape
Try to visualize the actual space occupied by the molecule. In the control panel, use the ':v' column to activate the space-filling spheres representation (+ menu Display > Render in solid 3D).
Also test the Slab mode to visualize the space within the molecule: Display > Slab
4. Display a distance between 2 atoms, angle between 3 atoms
Use the graphical panel. You can now measure the real dimensions of your protein.
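What the viewer computes when you pick atoms can be reproduced from the coordinates in the PDB file. A minimal sketch, assuming the standard fixed PDB columns; `parse_atoms`, `distance` and `angle` are hypothetical helper names.

```python
import math

def parse_atoms(pdb_text):
    """Map (chain, residue number, residue name, atom name) -> (x, y, z).

    Uses the fixed PDB columns; with alternate locations, the last one read wins.
    """
    atoms = {}
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            key = (line[21], int(line[22:26]), line[17:20].strip(), line[12:16].strip())
            atoms[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return atoms

def distance(p, q):
    """Euclidean distance between two points, in the file's units (Å)."""
    return math.dist(p, q)

def angle(p, vertex, r):
    """Angle p-vertex-r in degrees."""
    v1 = [a - b for a, b in zip(p, vertex)]
    v2 = [a - b for a, b in zip(r, vertex)]
    cos = sum(a * b for a, b in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding noise
```

Usage: `atoms = parse_atoms(open("1CRN.pdb").read())`, then pass two (or three) coordinate tuples looked up by their keys; the values should match what DeepView reports.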
5. Visualize secondary structure elements
In the control panel, activate "ribbon" (rbn). You can also color the molecule by secondary structures.
6. Visualize ligands (if there are any)
Select and color them. You can also remove the rest or, better, look at the residues around these ligands (radius function in the graphical panel).
7. Analysis of other protein structures
The teacher will give you PDB codes of other structures to analyze. Use DeepView or RasMol/Jmol, as you prefer.
B - Optional: if you want to use RasMol/Jmol
1. Load a 3D structure
File => Open
Choose the 1CRN.pdb file that you have downloaded from the PDB.
HELP SECTION FOR RASMOL
Main mouse moves for the molecule:
Left button: XY rotation
Left button + Shift: Zoom
Right button: Translation
Right button + Shift: Z rotation
Keep both the graphical window and the command (text) window on your screen; the commands below are typed in the text window.
Each selection (SELECT command) reports the number of selected atoms in the text window. You can then apply an action to visualize the selected elements (e.g. COLOR GREEN).
There is no undo (Ctrl+Z) in RasMol; you can type the command RESET instead.
To return to a standard representation of your molecule, type:
SELECT ALL
CPK
=> This will reset previous actions on representation modes (but keep colors). CPK: space-filling spheres representation
COLOR CPK: colors 'atom' objects by the atom (element) type
Help for Jmol:
A lot of "actions" (color, selection...) are available by right clicking on the main screen
To get the terminal window: menu File > Console
2. Example: visualize the disulfide bonds
Type in the text window
SELECT CYS
The text window reports "36 atoms selected" (the atoms of the cysteine residues)
COLOR GREEN
- Observe the graphics window.
RESTRICT CYS
- Compare with the SELECT command
Highlight the disulfide bonds:
SSBONDS
COLOR YELLOW
SSBONDS 75
COLOR CPK
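Disulfide bridges can also be detected from coordinates by looking for pairs of cysteine SG atoms that are very close (an S-S bond is about 2.05 Å long). The sketch below uses a 2.5 Å cutoff, which is an assumption and not necessarily the rule RasMol applies; `cys_sulfurs` and `disulfides` are hypothetical names.

```python
import math
from itertools import combinations

def cys_sulfurs(pdb_text):
    """List ((chain, residue number), (x, y, z)) for every cysteine SG atom."""
    sg = []
    for line in pdb_text.splitlines():
        if (line.startswith("ATOM") and line[17:20] == "CYS"
                and line[12:16].strip() == "SG"):
            coords = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            sg.append(((line[21], int(line[22:26])), coords))
    return sg

def disulfides(pdb_text, cutoff=2.5):
    """Pairs of cysteines whose SG atoms are within `cutoff` Å, with the distance."""
    return [(r1, r2, round(math.dist(p1, p2), 2))
            for (r1, p1), (r2, p2) in combinations(cys_sulfurs(pdb_text), 2)
            if math.dist(p1, p2) <= cutoff]
```

Running `disulfides(open("1CRN.pdb").read())` should list the same bridges that SSBONDS draws, which is a useful cross-check.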
3. Visualize secondary structure elements
SSBONDS OFF (remove SS bonds)
SELECT ALL
CARTOONS
COLOR STRUCTURE
4. Display a distance between 2 atoms
Activate the distance measurement mode by typing:
SET PICKING DISTANCE
Then click the 2 atoms.
To display angle values, type:
SET PICKING ANGLE
Then pick the 3 atoms
5. Other useful commands
SHOW SEQUENCE
SHOW INFO
SELECT ALL
CPK ON
RESTRICT NOT HOH (remove water molecules)
CPK OFF
HBONDS
SELECT CYCLIC AND NOT PRO
STEREO ON
Try them to better understand the Rasmol command language.
6. Store a command script and reload it
Repeat the actions described in paragraph 2
WRITE SCRIPT MY_SCRIPT.SC
Exit the software (File => Quit)
Restart the software
SOURCE MY_SCRIPT.SC
7. Select the atoms in a sphere
File => Close
Load the file 1LDM.pdb
Explore and analyze the molecule (number of chains, ligands, etc.).
To select all the atoms within 3 Å of a ligand (e.g. NAD), type:
SELECT ALL
COLOR CHAIN
SELECT WITHIN (3.0, NAD)
CPK
Options => Slab Mode (comment on what you observe).
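The effect of SELECT WITHIN can be reproduced from the coordinates to list the residues surrounding a ligand. A minimal sketch, assuming the standard fixed PDB columns; `parse` and `residues_within` are hypothetical names. Unlike the RasMol selection, which also contains the ligand's own atoms, this function reports only the other residues and skips water.

```python
import math

def parse(pdb_text):
    """Return a list of atom dicts from ATOM/HETATM lines."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "chain": line[21],
                "resseq": int(line[22:26]),
                "resname": line[17:20].strip(),
                "name": line[12:16].strip(),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms

def residues_within(atoms, ligand="NAD", radius=3.0):
    """Residues having at least one atom within `radius` Å of any ligand atom."""
    ligand_xyz = [a["xyz"] for a in atoms if a["resname"] == ligand]
    close = set()
    for a in atoms:
        if a["resname"] in (ligand, "HOH"):
            continue  # skip the ligand itself and water molecules
        if any(math.dist(a["xyz"], p) <= radius for p in ligand_xyz):
            close.add((a["chain"], a["resseq"], a["resname"]))
    return sorted(close)
```

Usage: `residues_within(parse(open("1LDM.pdb").read()))`; the listed residues are candidates for the ligand-binding site.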