SSP - Secondary structure propensities from chemical shifts

AUTHOR: Joseph Marsh (joseph.marsh@igmm.ed.ac.uk)
MRC Human Genetics Unit
Institute of Genetics and Molecular Medicine
University of Edinburgh

Formerly:
Department of Biochemistry, University of Toronto
Molecular Structure and Function, Hospital for Sick Children

-Feel free to e-mail me with any questions or comments!

REFERENCE:
Marsh, J.A., Singh, V.K., Jia, Z. and Forman-Kay, J.D. 2006. Sensitivity of secondary structure propensities to sequence differences between alpha- and gamma-synuclein: Implications for fibrillation. Protein Science 15(12):2795-2804

INTRODUCTION:
SSP combines chemical shifts from different nuclei into a single secondary structure propensity (SSP) score representing the expected fraction of alpha or beta secondary structure at a given residue. The contributions of the different chemical shifts are weighted by their sensitivity to alpha- and beta-structure.

USAGE:
The program can be run like any Perl script. On Unix/Linux/OSX, Perl is most likely already installed. On Windows, install ActiveState Perl or Cygwin.

perl ssp -s <SEQFILE> -ca <CAFILE> ...

EXAMPLE:
perl ssp -s eg.seq -ca eg.ca -cb eg.cb -r

FLAGS:
  -s eg.seq: The sequence of your protein in single letter format (REQUIRED)

Chemical shift flags - at least one of these is required:
  -ca eg.ca: Calpha chemical shifts
  -cb eg.cb: Cbeta chemical shifts
  -co eg.co: Carbonyl chemical shifts
  -ha eg.ha: Halpha chemical shifts
  -hn eg.hn: Backbone amide proton chemical shifts
  -n eg.n: Backbone amide nitrogen chemical shifts

  -r: Rereferencing flag. Carbon chemical shifts will be rereferenced. The amount will be shown at the top of the output file. Both CA and CB shifts are required for this (OPTIONAL)
  -o 0.2: Referencing offset will be adjusted by the value given. This occurs after rereferencing if -r is given. (OPTIONAL)
  -m 5: Weighted averaging will occur over the number of residues given. The default is 5. (OPTIONAL)
  -f -1: Offset sequence numbering of the chemical shift files by the value given. Residue 1 must correspond to the first amino acid in the sequence file. (OPTIONAL)
  -d: Secondary chemical shifts flag. Instead of SSP scores, secondary chemical shifts will be output. Use only one type of chemical shift, or if CA and CB are given, CA-CB secondary chemical shifts will be output. No averaging is used. (OPTIONAL)
  -t 10-20: Total average secondary structure over the range given will be output. To show this for the whole protein, just use -t 1. These values will be shown at the bottom of the output. (OPTIONAL)
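
As a rough illustration of what the -d flag reports: a secondary chemical shift is the observed shift minus the random coil reference value for that residue type, and the combined CA-CB value is the CA secondary shift minus the CB secondary shift. The sketch below is written in Python rather than Perl and is not taken from the ssp script. The function names are hypothetical and the numbers are placeholders chosen for illustration, not values from the provided reference files.

```python
def secondary_shift(observed, random_coil):
    """Secondary chemical shift: observed minus random coil value (ppm)."""
    return observed - random_coil


def ca_cb_secondary_shift(ca_obs, ca_rc, cb_obs, cb_rc):
    """Combined CA-CB secondary shift: CA secondary shift minus CB secondary shift."""
    return secondary_shift(ca_obs, ca_rc) - secondary_shift(cb_obs, cb_rc)


# Placeholder numbers for illustration only (ppm); not real reference values.
ca_obs, ca_rc = 54.0, 52.5
cb_obs, cb_rc = 18.5, 19.0
delta = ca_cb_secondary_shift(ca_obs, ca_rc, cb_obs, cb_rc)
print(f"CA-CB secondary shift: {delta:+.2f} ppm")
```

Positive CA-CB values point towards helical structure and negative values towards beta structure.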

OTHER CONFIGURATION OPTIONS:
Various options can be set by changing variables in the CONFIG section of the ssp file.

-Random coil chemical shift files: $csref{CA}, $csref{CB}...
-Secondary structure chemical shifts and standard deviations: $ssref{CA}...
-Weighting of chemical shift types: $strand_bias{CA}, $helix_bias{CA}...
	These should probably all be left at 1.
-SSP limit: $sslimit [set as 1 for true, 0 for false]
	This sets the maximum SSP score from a single atom in order to prevent dominance of the weighting by extreme outliers. The default is 1.2, which should be fine for disordered proteins but should probably be increased to at least 1.5 or 2 for folded proteins.
-Ignore residue before proline: $ignore_pro [1 - true, 0 - false]
	This ignores residues immediately preceding prolines which are often extreme outliers. Set to 1 by default.
-Ignore cysteines: $ignore_cys [1 - true, 0 - false]
	If there are disulfides in the protein, they can cause strange results. This is set to 0 by default but should be turned on when working with folded proteins with disulfides. Alternatively, known disulfides could be changed from C to X in the sequence file, thus causing them to be ignored.
-Ignore beta glycines: $ignore_gly [1 - true, 0 - false]
	Since glycine has no CB and CA is a poor measure of beta structure, glycines in the beta structure region often are extreme outliers. This is off by default and should not be necessary if weighted averaging is on since these values are very weakly weighted.
-Ignore residues with no chemical shifts: $skip_blank [1 - true, 0 - false]
	If on, no SSP score is given for residues with no chemical shifts. Otherwise, such residues get a value calculated from the weighted average around them. On by default.
-Weighted averaging: $mavg
	This is equivalent to the -m flag above. Note that this is not a normal moving average because it is weighted by the sensitivity of different chemical shifts for secondary structure. The default is 5.
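
The weighted averaging described for $mavg and -m can be pictured as follows. This is a minimal Python sketch, not the Perl code in ssp. It assumes each residue has a single score and a single non-negative weight standing in for how sensitive its available chemical shifts are to secondary structure. The function name, the inputs and the handling of missing residues are all illustrative assumptions.

```python
def weighted_window_average(scores, weights, window=5):
    """Average each position over a centred window, weighting by `weights`.

    Positions whose score is None (no data) or whose weight is zero
    contribute nothing. The result is None where the window holds no data.
    """
    half = window // 2
    result = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        num = den = 0.0
        for j in range(lo, hi):
            if scores[j] is not None and weights[j] > 0:
                num += weights[j] * scores[j]
                den += weights[j]
        result.append(num / den if den > 0 else None)
    return result


# Hypothetical per-residue scores and weights; residue 3 has no data.
scores = [1.0, 0.0, None, 1.0, 0.5]
weights = [1.0, 1.0, 0.0, 3.0, 2.0]
smoothed = weighted_window_average(scores, weights, window=3)
print(smoothed)
```

Unlike a plain moving average, a strongly weighted residue pulls its neighbours' averages towards its own value, while a weakly weighted outlier has little effect.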

REFERENCE CHEMICAL SHIFTS:
We recommend using random coil and secondary structure chemical shifts and standard deviations from RefDB (Zhang et al. 2003 J Biomol NMR 25:173-195), provided as refdb.ca, refdb.cb, refdb-ss.ca, etc. These are based on a large number of properly referenced chemical shifts from proteins with known structures. Also provided are shifts from Wang & Jardetzky (2002 Protein Sci 11:852-861) as wj.ca, wj-ss.ca, etc. These shifts give fairly similar results but are based on a smaller number of shifts, so we prefer the RefDB set. Finally, the commonly used random coil shifts from Wishart et al. (1995 J Biomol NMR 5:67-81) are provided as ala.ca, ala.cb, etc. Any other set of chemical shifts could easily be used by following the format of these provided files.

NOTES:
-To ignore certain residues, the easiest thing to do is change them to the letter X in the sequence file. This would be important for things like phosphorylated residues or oxidized cysteines because we don't have proper reference chemical shifts.
-We recommend only using CA, CB and HA chemical shifts for disordered proteins. HN and N are less useful, although they can sometimes be interesting to try by themselves. CO tends to give some strange results, as it is much more sensitive to local sequence and may be subject to misreferencing due to pulse calibration even if CA and CB are properly referenced (Wishart & Case 2001. Methods Enzymol 338:3-34).
-Rereferencing with -r and manual offset with -o are only applied to carbon atoms. To adjust referencing of other atoms, the input chemical shift files need to be changed with reref. For example, to rereference HA by 0.2 use:
perl reref -s eg.seq -ha eg.ha -o 0.2
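
For the first note above, changing residues to X can be done by hand in the sequence file, or with a few lines of code when there are many positions. The Python sketch below only shows the string manipulation. The function name is hypothetical, the sequence is an example string, and reading or writing the sequence file is left out because it depends on how your file is laid out.

```python
def mask_residues(sequence, positions, mask="X"):
    """Return `sequence` with the 1-based `positions` replaced by `mask`."""
    residues = list(sequence)
    for pos in positions:
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        residues[pos - 1] = mask
    return "".join(residues)


# Example: mask residues 3 and 9 of a ten-residue example sequence.
masked = mask_residues("MDVFMKGLSK", [3, 9])
print(masked)
```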

****************************

reref - A separate rereferencing script

reref uses the same re-referencing algorithm as SSP and nearly the same syntax. It takes as input a sequence file and chemical shift files and outputs rereferenced chemical shift files. Either automatic or manual rereferencing can be selected.

USAGE:
perl reref.pl -s <SEQFILE> -ca <CAFILE> ...

The flags are the same as used by ssp.pl:
  -s eg.seq: The sequence of your protein in single letter format (REQUIRED)

Chemical shift flags - at least one of these is required:
  -ca eg.ca: Calpha chemical shifts
  -cb eg.cb: Cbeta chemical shifts
  -co eg.co: Carbonyl chemical shifts
  -ha eg.ha: Halpha chemical shifts
  -hn eg.hn: Backbone amide proton chemical shifts
  -n eg.n: Backbone amide nitrogen chemical shifts

  -f -1: Offset sequence numbering of the chemical shift files by the value given. Residue 1 must correspond to the first amino acid in the sequence file. (OPTIONAL)
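
The effect of -f can be pictured as adding the given value to every residue number in the chemical shift files. The Python sketch below assumes exactly that (new number = old number + offset) and represents a shift file as a mapping from residue number to shift. Both the representation and the function name are illustrative assumptions, and the shift values are placeholders.

```python
def renumber_shifts(shifts, offset):
    """Shift every residue number by `offset`; the shift values are unchanged."""
    return {res + offset: value for res, value in shifts.items()}


# Hypothetical shift table numbered from 2; an offset of -1 makes it start at 1.
shifts = {2: 55.1, 3: 58.3, 4: 52.7}
renumbered = renumber_shifts(shifts, -1)
print(renumbered)
```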

Rereferencing flags - at least one of these is required:
  -r: Rereferencing flag. Carbon chemical shifts will be rereferenced. The amount will be shown at the top of the output file. Both CA and CB shifts are required for this.
  -o 0.2: Referencing offset will be adjusted by the value given. This occurs after rereferencing if -r is given.

Output files will have _reref appended. For example eg.ca -> eg_reref.ca
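
If you script around reref, the output file name can be predicted from the input name. The Python sketch below reproduces the eg.ca -> eg_reref.ca pattern shown above. The function name is hypothetical, and the behaviour for names without an extension or with several dots is an assumption, not something checked against reref itself.

```python
from pathlib import Path


def reref_output_name(path):
    """Insert '_reref' before the file extension, e.g. eg.ca -> eg_reref.ca."""
    p = Path(path)
    return str(p.with_name(f"{p.stem}_reref{p.suffix}"))


for name in ("eg.ca", "eg.cb", "eg.ha"):
    print(name, "->", reref_output_name(name))
```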

NOTE: While SSP only applies rereferencing to carbon chemical shifts, reref.pl applies rereferencing to ALL chemical shifts.


*************************

CHANGES:
Nov 2009 - '==' changed to 'eq' in line 220. This could cause problems if CACB rereferencing was used, because it would also be applied to non-carbon atoms.
	- Alanine 15N in wj.n corrected (132.52->123.52). This file wasn't used by default so hopefully this didn't cause any problems.
Oct 2010 - Fixed -t flag for total secondary structure, which wasn't using the selected region.
	- Removed TraDES and ENSEMBLE options, which are obsolete
Jan 2014 - fixed automatic rereferencing, which wasn't working in some cases if -r flag used
Jul 2015 - added missing N, M and P values to refdb.co, fixed a couple minor typos
