mkdssp - Calculate secondary structure for proteins in a PDB file
mkdssp [OPTION] pdbfile [dsspfile]
The mkdssp program was originally designed by Wolfgang Kabsch and Chris Sander to standardize secondary structure assignment. DSSP is a database of secondary structure assignments (and much more) for all protein entries in the Protein Data Bank (PDB) and mkdssp is the application that calculates the DSSP entries from PDB entries. Please note that mkdssp does not predict secondary structure.
If you invoke mkdssp with only one parameter, it is interpreted as the
PDB file to process and output is sent to stdout. If a second parameter
is specified, it is interpreted as the name of the DSSP file to create.
Both the input and the output file names may end in .gz or .bz2, in
which case the corresponding compression is applied when the file is
read or written.
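The two calling conventions described above can be sketched in Python as follows. This is only an illustration: build_mkdssp_command and run_mkdssp are hypothetical helper names, the file names are made up, and mkdssp is assumed to be installed and on the PATH. Only the positional parameters documented here are used.

```python
import subprocess
from typing import List, Optional


def build_mkdssp_command(pdb_file: str, dssp_file: Optional[str] = None) -> List[str]:
    """Build the argument list for the positional form of the mkdssp call.

    Hypothetical helper for illustration; it is not part of mkdssp.
    """
    command = ["mkdssp", pdb_file]
    if dssp_file is not None:
        # A second positional parameter names the DSSP file to create.
        command.append(dssp_file)
    return command


def run_mkdssp(pdb_file: str, dssp_file: Optional[str] = None) -> str:
    """Run mkdssp (assumed to be on PATH) and return what it wrote to stdout."""
    result = subprocess.run(
        build_mkdssp_command(pdb_file, dssp_file),
        check=True,           # raise CalledProcessError on a non-zero exit status
        capture_output=True,  # collect stdout and stderr instead of inheriting them
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Made-up file names; the .gz suffix asks for gzip compression.
    print(build_mkdssp_command("example.pdb"))
    print(build_mkdssp_command("example.pdb.gz", "example.dssp.gz"))
```

With a single file name the DSSP text comes back through stdout; with two file names the returned string would normally be empty because the result goes to the named file.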
-i, --input filename
The file name of a PDB formatted file containing the protein
structure data. This file may be a file compressed by gzip or
bzip2.
-o, --output filename
The file name of a DSSP file to create. If the filename ends in
.gz or .bz2 a compressed file is created.
-v, --verbose
Write out diagnostic information.
--version
Print the version number and exit.
-h, --help
Print the help message and exit.
The DSSP program works by calculating the most likely secondary structure assignment given the 3D structure of a protein. It does this by reading the position of the atoms in a protein (the ATOM records in a PDB file) followed by calculation of the H-bond energy between all atoms. The best two H-bonds for each atom are then used to determine the most likely class of secondary structure for each residue in the protein. This means you do need to have a full and valid 3D structure for a protein to be able to calculate the secondary structure. There's no magic in DSSP, so e.g. it cannot guess the secondary structure for a mutated protein for which you don't have the 3D structure.
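This page does not spell out the energy function. As a minimal sketch, the following Python code assumes the electrostatic model published by Kabsch and Sander: partial charges of 0.42e and 0.20e, a dimensional factor of 332, and an H-bond cutoff of -0.5 kcal/mol. The C++ rewrite may differ in detail, and all function names and the sample coordinates are illustrative only.

```python
import math
from typing import Sequence

# Constants assumed from the Kabsch and Sander electrostatic model.
Q1 = 0.42            # partial charge on the C=O group, in units of e
Q2 = 0.20            # partial charge on the N-H group, in units of e
F = 332.0            # dimensional factor giving kcal/mol for distances in Angstrom
HBOND_CUTOFF = -0.5  # energies below this value count as an H-bond (kcal/mol)

Point = Sequence[float]


def hbond_energy(n: Point, h: Point, c: Point, o: Point) -> float:
    """Electrostatic energy between a donor N-H group and an acceptor C=O group."""
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1.0 / r_on + 1.0 / r_ch - 1.0 / r_oh - 1.0 / r_cn)


def is_hbond(energy: float) -> bool:
    """Apply the assumed cutoff to decide whether the pair forms an H-bond."""
    return energy < HBOND_CUTOFF


if __name__ == "__main__":
    # Made-up, perfectly linear N-H...O=C geometry along the x axis.
    n = (0.0, 0.0, 0.0)
    h = (1.0, 0.0, 0.0)
    o = (2.9, 0.0, 0.0)
    c = (4.13, 0.0, 0.0)
    energy = hbond_energy(n, h, c, o)
    print(f"{energy:.2f} kcal/mol, H-bond: {is_hbond(energy)}")
```

In this model a well-formed backbone H-bond comes out at roughly -2 to -3 kcal/mol, so the -0.5 cutoff is deliberately generous.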
The header part of each DSSP file is self-explanatory: it contains some
of the information copied over from the PDB file, along with some
statistics gathered while calculating the secondary structure.
The second half of the file contains the calculated secondary structure
information per residue. What follows is a brief explanation for each
column.
Column Name           Description
#                     The residue number as counted by mkdssp.
RESIDUE               The residue number as specified by the PDB
                      file, followed by a chain identifier.
AA                    The one-letter code for the amino acid. A
                      lower-case letter means this is a cysteine
                      that forms a sulfur bridge with the other
                      amino acid in this column that has the
                      same lower-case letter.
STRUCTURE             A complex column containing multiple
                      sub-columns. The first sub-column contains
                      a letter indicating the secondary
                      structure assigned to this residue. Valid
                      values are:

                      Code  Description
                      H     Alpha Helix
                      B     Beta Bridge
                      E     Strand
                      G     Helix-3
                      I     Helix-5
                      T     Turn
                      S     Bend

                      The next three sub-columns indicate, for
                      each of the three helix types (3, 4 and
                      5), whether this residue is a candidate
                      for forming that helix. A > character
                      indicates that it starts a helix, a number
                      indicates that it is inside such a helix,
                      and a < character means that it ends the
                      helix.

                      The next sub-column contains an S
                      character if this residue is a possible
                      bend.

                      Then follows a sub-column indicating the
                      chirality, which is either positive or
                      negative (i.e. the alpha torsion is either
                      positive or negative).

                      The last two sub-columns contain beta
                      bridge labels. Lower case means a parallel
                      bridge; upper case means an antiparallel
                      bridge.
BP1 and BP2           The first and second bridge pair
                      candidates, followed by a letter
                      indicating the sheet.
ACC                   The accessibility of this residue: the
                      surface area, expressed in square
                      Ångström, that can be accessed by a water
                      molecule.
N-H-->O..O-->H-N      Four columns giving, for each residue, the
                      H-bond energy with another residue where
                      the current residue is either acceptor or
                      donor. Each column contains two numbers:
                      the first is an offset from the current
                      residue to the partner residue in this
                      H-bond (in DSSP numbering), the second is
                      the calculated energy for this H-bond.
TCO                   The cosine of the angle between C=O of the
                      current residue and C=O of the previous
                      residue. For alpha helices TCO is near +1,
                      for beta sheets TCO is near -1. Not used
                      for structure definition.
Kappa                 The virtual bond angle (bend angle)
                      defined by the three C-alpha atoms of the
                      residues current - 2, current and
                      current + 2. Used to define bend
                      (structure code 'S').
PHI and PSI           IUPAC peptide backbone torsion angles.
X-CA, Y-CA and Z-CA   The C-alpha coordinates.
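A few of the column conventions above can be illustrated with small Python helpers. This is a sketch under stated assumptions, not a full DSSP file parser: the function names are hypothetical, a blank structure code is assumed to mean that no class was assigned, and the two numbers in an H-bond column are assumed to be separated by a comma. Fixed character positions are deliberately left out because this page does not define them.

```python
from typing import Tuple

# Secondary structure codes as listed in the STRUCTURE column description.
SECONDARY_STRUCTURE_CODES = {
    "H": "Alpha Helix",
    "B": "Beta Bridge",
    "E": "Strand",
    "G": "Helix-3",
    "I": "Helix-5",
    "T": "Turn",
    "S": "Bend",
}


def describe_structure(code: str) -> str:
    """Translate a one-letter structure code into its description.

    Assumption: a blank or unknown code means no class was assigned.
    """
    return SECONDARY_STRUCTURE_CODES.get(code.strip(), "No assignment")


def normalize_amino_acid(letter: str) -> Tuple[str, bool]:
    """Return (one-letter code, in_sulfur_bridge) for an AA column entry.

    A lower-case letter marks a bridged cysteine, so it is mapped to "C".
    """
    if letter.islower():
        return "C", True
    return letter, False


def parse_hbond_field(field: str) -> Tuple[int, float]:
    """Split one H-bond column into (offset, energy).

    Assumption: the two numbers are separated by a comma, e.g. "-4,-2.3".
    """
    offset_text, energy_text = field.split(",")
    return int(offset_text), float(energy_text)


if __name__ == "__main__":
    print(describe_structure("E"))
    print(normalize_amino_acid("a"))
    print(parse_hbond_field("    -4,-2.3"))
```

Keeping the code table as a dictionary makes it straightforward to count residues per class once the structure letters have been extracted from a file.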
The original DSSP application was written by Wolfgang Kabsch and Chris Sander in Pascal. This version is a complete rewrite in C++ based on the original source code. A few bugs have been fixed since and the algorithms have been tweaked here and there.
The code desperately needs an update. The first thing that needs implementing is improved recognition of pi-helices. A second improvement would be to use an angle-dependent H-bond energy calculation.
If you find any bugs, please let me know.
Maarten L. Hekkelman (m.hekkelman (at) cmbi.ru.nl)