Python-based Hierarchical ENvironment for Integrated Xtallography

Validation tools in Phenix

Overview
Running validation
Geometry restraints outliers
Validation of protein geometry
All-atom contacts
Real-space correlation
POLYGON
Suggestions for improving models
References

Overview

This document covers the use of the validation software in the Phenix GUI, which is run both as a standalone program (phenix.validate) and as part of phenix.refine. Much of this software is derived from the Molprobity web server. Some programs are available as command-line tools; their use is covered in the last section. There are two versions of validation in the GUI: one which performs only geometry validation (phenix.validate_model), and one which also compares the model to an electron-density map and calculates R-factors (phenix.validate). Since the geometry results are identical, this document covers the latter program. The analyses performed by Phenix include:

  • identification of outliers from restrained geometric parameters (bonds, angles, etc.)
  • Ramachandran plot and outlier info
  • identification of disfavored protein sidechain rotamers (and chi1-chi2 plot)
  • all-atom contact analysis with explicit hydrogens (including clashscore)
  • suggestions for asymmetric sidechain flips (Asn, Gln, His)
  • real-space correlation to electron density (RSCC)
  • plot of B, RSCC, and density values by residue number
Any non-protein molecule will be included in the real-space fit evaluation, and if CIF files are provided, geometric outliers will be detected as well. Comprehensive validation of nucleic acid structures is planned for a future release. All of the outliers listed in the GUI link to supported molecular graphics programs (currently Coot, PyMOL, and the internal viewer) if launched from within Phenix. Clicking an outlier will recenter the graphics window on that atom or residue, and in some cases will also highlight the relevant selection. Certain outlier lists are also automatically displayed from within Coot in a separate window, with similar behavior. When the real-space correlation or refinement is performed, buttons in the new tabs will open the resulting maps and model in Coot.

Running validation

If you are performing refinement with the phenix.refine GUI, validation will be performed automatically as soon as the final model is ready. To validate a model with diffraction data, launch phenix.validate from the command line or the main Phenix GUI. At a minimum, you will need a PDB file and a reflections file containing intensities or amplitudes and R-free flags. Most parameters should be extracted automatically.

images/validate_gui.png
In both programs, following an initial tab reporting basic program output and statistics such as R-free, there will be up to three validation tabs, one for geometry restraints, one for advanced geometry validation, and one for real-space correlation. If a problem was encountered while calculating geometry restraints, that tab will be absent but the other analyses will still be displayed.

Geometry restraints outliers

Phenix, like most crystallographic software, uses the Engh and Huber (1991) restraints for proteins, nucleic acids, and other common molecules, here in the form of the CCP4 monomer library. Other restraints come from CIF files provided by the user; these can be generated by phenix.elbow and associated programs. These restraints are used in refinement to prevent distortions of model geometry, and to increase the observation-to-parameter ratio. The default restraints are for bond lengths, bond angles, dihedral (torsion) angles, chiral centers, planar groups (such as aromatic rings), and nonbonded (VDW) interactions. All of these are analyzed here except for the nonbonded restraint outliers, which are made redundant by the much more thorough all-atom contact analysis (see below).

images/validate_restraints.png
All restrained values have an associated "sigma", i.e. the observed standard deviation from that value in very high-resolution structures; this is used to weight the restraint during refinement. Large deviations (greater than 4 sigma) from the restrained values usually indicate something seriously wrong with the model. They should not be taken as examples of genuine structural strain unless the electron density for that feature is exceptionally good (usually only at ultra-high resolution, i.e. better than 1.0 Angstrom) and the deviation is no more than approximately 7 sigma. At moderate-to-low resolutions, there should be almost no outliers, although strained dihedral angles are possible.
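
The 4-sigma criterion described above can be illustrated with a minimal sketch. This is not the Phenix implementation; the `Restraint` class, the `find_outliers` helper, and the example bond values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Restraint:
    label: str    # hypothetical description of the restrained atoms
    ideal: float  # target value from the restraint library
    model: float  # value measured in the model
    sigma: float  # standard deviation associated with the target

    @property
    def deviation_in_sigma(self) -> float:
        # Absolute deviation from the target, in units of sigma.
        return abs(self.model - self.ideal) / self.sigma


def find_outliers(restraints, cutoff=4.0):
    """Return restraints deviating by more than `cutoff` sigma, worst first."""
    flagged = [r for r in restraints if r.deviation_in_sigma > cutoff]
    return sorted(flagged, key=lambda r: r.deviation_in_sigma, reverse=True)


# Illustrative bond-length restraints (values in Angstroms, invented for this example).
bonds = [
    Restraint("A 12 CA-C", ideal=1.525, model=1.530, sigma=0.021),
    Restraint("A 57 N-CA", ideal=1.458, model=1.560, sigma=0.019),
]
outliers = find_outliers(bonds)
for r in outliers:
    print(f"{r.label}: {r.deviation_in_sigma:.1f} sigma")
```

Only the second bond is flagged here. Note that this simple difference is not appropriate for dihedral angles, where periodicity would have to be taken into account.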

Validation of protein geometry

The geometry of protein chains is restricted by additional empirical observations that are not part of the standard restraints. The classic example of this is the Ramachandran plot, which sets limits on the combination of Phi (C-N-CA-C) and Psi (N-CA-C-N) dihedral angles for any residue. In Phenix and Molprobity, the standard distribution of values is taken from the Top500 database of high-resolution protein structures. The main window contains a list of scored outliers (the lower the score, the worse the residue), and all residues are plotted graphically in a separate window. The scoring depends on the residue type and relative position; Gly, Pro, and adjacent residues depend on different distributions, all of which can be viewed in the plot window.
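
For readers unfamiliar with backbone dihedrals, the following sketch shows how Phi and Psi could be computed from atomic coordinates, using the standard definitions Phi = C(i-1)-N-CA-C and Psi = N-CA-C-N(i+1). The function names `dihedral` and `phi_psi` are hypothetical, and this is a generic geometric calculation rather than the code Phenix uses.

```python
import math


def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle in degrees for four points given as (x, y, z)."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    # Bond vectors along the chain of four atoms.
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    # Normals to the planes (p1, p2, p3) and (p2, p3, p4).
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Phi and Psi for one residue, given the five backbone atoms involved."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


# Toy coordinates (not a real peptide): the four points form a right-angle twist.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(f"{angle:.1f}")
```

The same `dihedral` function applies to sidechain Chi angles, given the appropriate four atoms for each residue type.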

images/validate_rama.png
images/rama_plot.png
Rotamer comparisons also use the Top500 database, and all sidechain Chi angles for a residue are evaluated together. Any residue which has an outlying sidechain dihedral restraint is likely to be flagged by this analysis too. The distribution of Chi1-Chi2 angle combinations for each residue type is also plotted in a separate window, against a background distribution taken from the Top500 database. (Note that longer residues may be flagged as outliers based on the value of Chi3 or Chi4, even though they fall into the favored region of the Chi1-Chi2 plot.) Finally, a separate evaluation of C-beta positions is performed; deviations tend to be caused by incompatible sidechain and mainchain positions, which are usually due to incorrect fitting to the density. This will most likely be reflected in the comparison to restraints, with significant deviation from chirality and bond lengths or angles.
images/validate_rota.png
images/chi_plot.png
Target values for these statistics are given at the top of the tab. Many structures contain residues that have been correctly placed but are flagged as Ramachandran or rotamer outliers; however, there are rarely more than a handful of these. At resolutions below 3.0 Angstroms, any outliers should be considered errors. No C-beta outliers are acceptable at anything worse than sub-atomic (< 0.8 Angstroms) resolution.
images/validate_goals.png

All-atom contacts

Nonbonded restraints used in refinement can function with or without explicit hydrogen atoms; however, if hydrogens are absent, the atomic radii of other atoms will be increased to compensate. This approximation works decently on a global structural level, but often leaves chemically impossible geometries in place. Therefore, for this step, hydrogens are added to the model (proteins and nucleic acids only at this time) by phenix.reduce; this will first strip off any existing hydrogens. Reduce will flag residues whose sidechains require flipping based on hydrogen-bonding geometry and clashes caused by newly added hydrogens. These include Asn, Gln, and His, which are easily mis-fit due to the apparent symmetry of the sidechain without hydrogens.

images/validate_asn.png
Once hydrogens have been added, phenix.probe is run to analyze the atomic contacts. All atomic overlaps worse than 0.4 Angstroms are listed, and a global "clash score" is reported for the entire structure (calculated as 1000 * (number of bad overlaps) / (number of atoms)). This should be as low as possible, although a value of zero is both difficult and unusual. We strongly recommend refining with explicit hydrogens at any resolution, as this typically improves overall model geometry (and rarely makes anything worse).
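
The clash score formula quoted above is simple enough to work through directly. The sketch below assumes a plain list of overlap distances in Angstroms as input; the function names and the example numbers are hypothetical, and the details of how Probe counts overlaps and atoms are not reproduced here.

```python
def count_bad_overlaps(overlaps, cutoff=0.4):
    """Count overlaps (in Angstroms) that are worse than the cutoff."""
    return sum(1 for overlap in overlaps if overlap > cutoff)


def clashscore(n_bad_overlaps, n_atoms):
    """Clash score as 1000 * (number of bad overlaps) / (number of atoms)."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    return 1000.0 * n_bad_overlaps / n_atoms


# Invented example: five overlaps found in a model of 2500 atoms.
overlaps = [0.12, 0.45, 0.61, 0.38, 0.92]
n_bad = count_bad_overlaps(overlaps)
score = clashscore(n_bad, n_atoms=2500)
print(f"{n_bad} bad overlaps, clash score {score:.2f}")
```

In this example three overlaps exceed 0.4 Angstroms, giving a clash score of 1.2 bad overlaps per 1000 atoms.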
images/validate_probe.png
Since Coot is able to read the output of Probe and graphically display the contacts as dots and dashes, this information will be automatically read into Coot when run from Phenix. Display of the dots can be toggled using the external validation window popped up in Coot. Bad overlaps will be drawn as hot-pink dashes. It is usually much easier to visualize the cause of the problem if hydrogens are drawn as well.
images/probe_dots.png

Real-space correlation

This is the only part of validation that requires the underlying diffraction data. Phenix will perform bulk-solvent correction and scaling on the data and calculate a likelihood-weighted 2mFo-DFc map. This is compared to a map calculated from the model alone, and correlation coefficients for each residue are obtained. At resolutions better than 2.5 Angstroms, the values for individual atoms will also be displayed. In the GUI, these lists can be filtered and/or sorted by CC.
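
As a rough illustration of the comparison described above, a correlation coefficient between two maps can be computed from density values sampled at the same grid points. The sketch below is a plain Pearson correlation over two equal-length lists; how Phenix selects the grid points around each residue or atom is not covered here, and the function name `map_correlation` and the sample values are hypothetical.

```python
import math


def map_correlation(obs_density, calc_density):
    """Pearson correlation between two sets of density values on the same grid points."""
    if len(obs_density) != len(calc_density) or len(obs_density) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    n = len(obs_density)
    mean_obs = sum(obs_density) / n
    mean_calc = sum(calc_density) / n
    cov = sum((o - mean_obs) * (c - mean_calc)
              for o, c in zip(obs_density, calc_density))
    var_obs = sum((o - mean_obs) ** 2 for o in obs_density)
    var_calc = sum((c - mean_calc) ** 2 for c in calc_density)
    if var_obs == 0 or var_calc == 0:
        raise ValueError("correlation is undefined for a flat map region")
    return cov / math.sqrt(var_obs * var_calc)


# Invented density samples around one residue (arbitrary units).
observed = [0.8, 1.5, 2.1, 1.2, 0.4]
calculated = [0.9, 1.4, 2.3, 1.0, 0.5]
cc = map_correlation(observed, calculated)
print(f"CC = {cc:.2f}")
```

Because the correlation is insensitive to overall scale and offset, a high value says the shapes of the two densities agree, not that their absolute levels match.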

images/validate_rscc.png
Although real-space CC can be a useful indicator of poorly fit regions of a molecule, it should not be interpreted as an absolute score, and it is difficult to identify an ideal cutoff below which the score should never fall. However, it is generally useful to inspect the 2mFo-DFc and mFo-DFc maps for residues which score below 0.8. Some of these will be correctly placed but poorly ordered; this is not a problem as long as the geometry is within normal bounds. The B-factor, occupancy, and absolute density values should also be considered when evaluating the results, as these may indicate parts of the structure that are intrinsically disordered. These values are plotted together in 100-residue blocks in the GUI.
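
The kind of filtering and sorting by CC mentioned above can be sketched as follows, assuming the per-residue values are already available as a dictionary. The residue labels, CC values, and the `residues_to_inspect` helper are invented for illustration, and the 0.8 cutoff is the rule of thumb from the text rather than a hard limit.

```python
def residues_to_inspect(cc_by_residue, cutoff=0.8):
    """Return (residue, CC) pairs scoring below the cutoff, lowest CC first."""
    flagged = [(name, cc) for name, cc in cc_by_residue.items() if cc < cutoff]
    return sorted(flagged, key=lambda item: item[1])


# Invented per-residue correlation coefficients.
residue_cc = {
    "A 10 LEU": 0.95,
    "A 11 LYS": 0.72,
    "A 12 GLY": 0.88,
    "A 13 ARG": 0.64,
}
to_inspect = residues_to_inspect(residue_cc)
for name, cc in to_inspect:
    print(f"{name}: CC = {cc:.2f}")
```

The flagged residues are candidates for visual inspection of the maps, not automatic errors; B-factors, occupancies, and absolute density values should be weighed alongside the CC.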

POLYGON

The program POLYGON (Urzhumtseva et al. 2009) has been ported to the Phenix GUI for comparing model quality indicators to similar structures in the PDB. Pre-computed values for a selection of 1000 structures determined at a similar resolution are plotted radially as one-dimensional histograms. The score for the model for each of these statistics is also plotted on the histograms, and the lines connecting these points form a complete polygon. For a high-quality, well-refined structure, the shape should be approximately symmetric and relatively small.

images/polygon.png

Suggestions for improving models

Except for the pseudo-symmetric sidechain flips performed by Reduce, there are currently no fully automatic corrections for problems identified in validation. phenix.autobuild can be used to rebuild problem regions of the structure, but this becomes more difficult as the quality of the map decreases or the resolution gets worse. Re-refinement with a slightly different protocol is often helpful; in particular, explicit hydrogens can help constrain the model. The weight applied to X-ray terms during refinement may need to be reduced in favor of geometric restraints; this can be done automatically by phenix.refine.

References

  • MolProbity : Chen VB, Arendall WB 3rd, Headd JJ, Keedy DA, Immormino RM, Kapral GJ, Murray LW, Richardson JS, Richardson DC. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Cryst. 2010, D66:12-21.
  • POLYGON : Urzhumtseva L, Afonine PV, Adams PD, Urzhumtsev A. Crystallographic model quality at a glance. Acta Cryst. 2009, D65:297-300.