Tutorial: Validation with MolProbity


This tutorial shows you how to do comprehensive model validation within the PHENIX graphical user interface (GUI). (Validation in the PHENIX GUI reviews many aspects of this tutorial.)


The example used for this tutorial is called Protein Kinase A (3dnd). To get the requisite files, open the GUI and set up the tutorial called Protein Kinase A (validation), as described here. This should set up the main GUI window. On the right-hand side, click Validation and then select Comprehensive Validation (MolProbity).


File inputs

In the "Comprehensive Validation" window, browse to the tutorial directory that you specified above, and select 3dnd.pdb as the "Input model", 3dnd.mtz as the "Reflections file", and 3dnd.ligands.cif as the "Restraints (CIF)" file. This CIF file defines restraints for the ligand in 3dnd. You are now ready to run the program: select Run from the top menu bar.

Analyze validation outputs

When validation has finished, click on the "Compare statistics" button. This will launch a tool called "POLYGON", which is used to simultaneously compare R-work, R-free, RMSD-bonds, RMSD-angles, Clashscore, and Average B factor. Well-built models will usually have a small, fairly equilateral polygon, whereas a larger or significantly asymmetric polygon is indicative of model problems. As you can see for 3dnd, both RMSD-bonds and RMSD-angles are a bit large, which could indicate some misfit areas of the structure that are causing geometric strain.
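To make the idea of "a bit large" concrete, here is a minimal sketch of ranking one statistic against a comparison set. It assumes, following the POLYGON reference cited at the end of this tutorial, that each axis places your model's value within a distribution of values from other structures. The function name percentile_rank and all of the numbers are invented for illustration; they are not taken from the PDB or from 3dnd.

```python
from bisect import bisect_right


def percentile_rank(value, reference):
    """Return the percentage of reference values that are <= value."""
    ordered = sorted(reference)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


# Hypothetical RMSD-bonds values (in Å) for a handful of comparison models.
# These numbers are placeholders, not real PDB statistics.
reference_rmsd_bonds = [0.005, 0.007, 0.008, 0.010, 0.012, 0.014, 0.016, 0.019]

# Hypothetical value for the model being validated.
model_rmsd_bonds = 0.017

rank = percentile_rank(model_rmsd_bonds, reference_rmsd_bonds)
print(f"RMSD-bonds is at or above {rank:.1f}% of the comparison set")
```

A value that ranks near the top of the comparison set on an axis where smaller is better is the numerical counterpart of a vertex that sticks out of the polygon.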


Close the "POLYGON" window and return to the "Comprehensive Validation" window. Click the "Open in Coot" button. When Coot loads, note that it says "Connected to PHENIX" in the toolbar. This connection allows the validation GUI to communicate with Coot.

Now select the "MolProbity" tab. Under this tab, you will see sub-tabs for "Summary", "Basic geometry", "Protein", and "Clashes". If your model had RNA, there would also be an "RNA" tab. Under the "Summary" tab are most of the same overall statistics that you would get from the MolProbity web server. Notice the high percentage of rotamer outliers and the large number of C-beta deviations (a C-beta deviation combines geometry problems around the C-alpha into a single measure, as explained in Lovell et al., 2003).
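The summary numbers are simple ratios, and it can help to see how they are built. The sketch below uses the published MolProbity definitions: Clashscore is the number of serious clashes (overlaps of 0.4 Å or more) per 1000 atoms, and a C-beta deviation of 0.25 Å or more counts as an outlier. The function names and every input value are hypothetical; the real numbers for 3dnd are the ones shown in the "Summary" tab.

```python
CBETA_CUTOFF = 0.25  # Å; C-beta deviations at or above this are outliers


def clashscore(n_serious_clashes, n_atoms):
    """Serious clashes (overlap >= 0.4 Å) per 1000 atoms."""
    return 1000.0 * n_serious_clashes / n_atoms


def percent_outliers(n_outliers, n_evaluated):
    """Outliers as a percentage of the residues that were evaluated."""
    return 100.0 * n_outliers / n_evaluated


def count_cbeta_outliers(deviations):
    """Count C-beta deviations (in Å) at or above the outlier cutoff."""
    return sum(1 for d in deviations if d >= CBETA_CUTOFF)


# Hypothetical inputs, chosen only to show the arithmetic.
print("Clashscore:", clashscore(n_serious_clashes=30, n_atoms=3000))
print("Rotamer outliers (%):", percent_outliers(n_outliers=15, n_evaluated=300))
print("C-beta outliers:", count_cbeta_outliers([0.05, 0.12, 0.25, 0.31]))
```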


Back in the "Comprehensive Validation" window, click on the "Basic geometry" tab. Here you find a summary of all bond, angle, dihedral, chirality, and planarity outliers. Outliers are listed in the associated lists, and clicking on any item will center the Coot view on it. Note the bond outlier for Ile A 163; we'll be seeing this residue again later.
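As a rough illustration of what sits behind the bond statistics, the sketch below computes an RMSD of bond lengths from their ideal values and flags individual bonds by how many standard deviations they are from ideal. It assumes the conventional cutoff of 4 sigma for an outlier; the bond labels, lengths, and sigmas are invented placeholders and do not describe any real bond in 3dnd.

```python
import math


def bond_rmsd(bonds):
    """Root-mean-square deviation of observed bond lengths from ideal (Å)."""
    total = sum((observed - ideal) ** 2 for _, observed, ideal, _ in bonds)
    return math.sqrt(total / len(bonds))


def bond_outliers(bonds, cutoff=4.0):
    """Return (label, z) for bonds more than `cutoff` sigma from ideal."""
    flagged = []
    for label, observed, ideal, sigma in bonds:
        z = abs(observed - ideal) / sigma
        if z > cutoff:
            flagged.append((label, z))
    return flagged


# Hypothetical bonds: (label, observed length, ideal length, sigma), all in Å.
bonds = [
    ("bond 1", 1.53, 1.53, 0.02),
    ("bond 2", 1.56, 1.53, 0.02),
    ("bond 3", 1.63, 1.53, 0.02),
]

print(f"RMSD-bonds: {bond_rmsd(bonds):.3f} Å")
for label, z in bond_outliers(bonds):
    print(f"{label} deviates by {z:.1f} sigma")
```

Note how a single badly strained bond dominates the RMSD, which is why a few misfit residues can push the overall RMSD-bonds value up.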

Next, go to the "Protein" tab. This section contains validation information for Ramachandran and rotamer outliers, C-beta deviations, recommended Asn/Gln/His sidechain flips (these have NOT already been done for you, as you can tell if you click on His 39 in the flip list to see it in Coot), and non-trans peptide bonds.

Find the list of rotamer outliers, and click on Leu 27 from the A chain, which will center on this residue in the Coot window. As you can see in Coot, this orientation is not a terrible fit to the density, but it is a rotamer outlier and energetically unfavorable due to an eclipsed chi angle, and it has a suggestive difference peak.

We'll use the tools in Coot to fix this sidechain. In Coot, click Calculate -> Model/Fit/Refine to bring up the window of modeling tools. First, we need to select a map, which you can do by pressing the "Select Map" button. Choose the 2FOFCWT map. Next, select "Auto Fit Rotamer", and then click on an atom in the Leu A 27 sidechain. Did you see it rotate ~180 degrees? These kinds of misfit Leu residues are common in crystal structures, but are easy to identify and fix. Change your point of view (maybe center on C-gamma) and click "Undo" and "Redo" a few times until you are comfortable with how this change is carried out. Similarly, you can correct many of the other sidechain outliers in the GUI list.
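If you want to check for yourself what "eclipsed" means for a chi angle such as the one in Leu A 27, the sketch below computes a dihedral angle from four atom positions and reports how far it is from the nearest staggered value (-60, 60, or 180 degrees). A result near 0 is staggered and a result near 60 is fully eclipsed. The function names and the test coordinates are illustrative assumptions; they are not atom positions from 3dnd, and this is not how Coot or MolProbity score rotamers.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees (IUPAC sign convention)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def deviation_from_staggered(angle):
    """Degrees between `angle` and the nearest of -60, 60, 180 (0 to 60)."""
    return min(abs((angle - s + 180.0) % 360.0 - 180.0)
               for s in (-60.0, 60.0, 180.0))


# Made-up coordinates giving a dihedral of +90 degrees.
chi = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(f"chi = {chi:.1f}, off staggered by {deviation_from_staggered(chi):.1f}")
```

For a real residue you would pass the coordinates of the four atoms that define the chi angle, for example N, CA, CB, and CG for chi1 of a leucine.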


Return to the Validation GUI window, and navigate to the list of C-beta position outliers. Select "Ile 163" from the A chain. Do you recall this sidechain from the bond-length outlier list? Misfit sidechains will often have multiple diagnostic indicators of a problem, which makes the worst offenders easy to identify. Notice the blob of positive density near the sidechain, indicating that it may not be in an optimal position. Correcting a misfit sidechain such as this one can be tricky, as many rounds of refinement have caused distortions in the model to accommodate the misfit.
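The habit of looking for residues that show up in more than one outlier list is easy to automate once you have the lists in hand. The sketch below is one hypothetical way to do it: the function rank_by_flags and the data layout are assumptions made for this example, and the lists contain only the residues mentioned so far in this tutorial, not the complete validation output for 3dnd.

```python
def rank_by_flags(outlier_lists):
    """Group criteria by residue and sort residues by number of flags."""
    flags = {}
    for criterion, residues in outlier_lists.items():
        for residue in residues:
            flags.setdefault(residue, []).append(criterion)
    # Residues flagged by the most criteria come first.
    return sorted(flags.items(), key=lambda item: len(item[1]), reverse=True)


# Partial lists, limited to residues named in this tutorial.
# Each residue is (chain, residue number, residue name).
outlier_lists = {
    "bond length": [("A", 163, "ILE")],
    "rotamer": [("A", 27, "LEU")],
    "C-beta deviation": [("A", 163, "ILE")],
}

for (chain, number, name), criteria in rank_by_flags(outlier_lists):
    print(f"{name} {chain} {number}: {', '.join(criteria)}")
```

Ile A 163 comes out on top because it is flagged by two independent criteria, which is the same conclusion you reached by eye in the GUI.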

One approach that works well to fix this type of problem is to first mutate the offending residue to an alanine, and then run real-space refinement. These steps allow the C-beta position to be properly refined, without being trapped by the other misfit sidechain atoms (similar to the backrub in KiNG). To do this, select "Simple Mutate", click on an atom in Ile A 163, and then select Ala from the pop-up window. Next, select "Real Space Refine Zone", and click on two atoms to specify a range that runs at least two residues on either side of Ile A 163 (or pick Ile 163 and hit the "a" key for autozone). Notice the subtle but distinct movement of the C-beta position. Accept the change. Next, we'll mutate the residue back to Ile. To do this, select "Mutate and Auto Fit", click on an atom in residue A 163 (now an Ala), and select Ile from the pop-up menu. Notice that the newly fit sidechain is now rotated ~180 degrees, has a much better density fit for the CD1 atom, and has now placed the CG2 atom in the positive density peak. With additional refinement, the position of this Ile will improve further, as the neighboring atoms are able to recover from the strain caused by the initial outlier.


The kind of sidechain fixups you've done in KiNG or Coot can mostly be accomplished using real-space refinement in phenix.refine (which includes a rotamer correction component), at resolutions of roughly 2 to 2.5 Å or better (and of course NQH flips are done automatically by Reduce in either MolProbity or PHENIX). At lower resolution, however, real-space refinement with rotamer correction cannot reliably distinguish between correct and incorrect rotamers. That's a hard job for people as well, but it can often be done if you have the interactive information on clashes and H-bonds from the non-pairwise, H-aware all-atom-contact dots.


Structure validation by Calpha geometry: phi,psi and Cbeta deviation. S.C. Lovell, I.W. Davis, W.B. Arendall, P.I. de Bakker, J.M. Word, M.G. Prisant, J.S. Richardson, and D.C. Richardson. Proteins 50, 437-50 (2003).

MolProbity: all-atom structure validation for macromolecular crystallography. V.B. Chen, W.B. Arendall, J.J. Headd, D.A. Keedy, R.M. Immormino, G.J. Kapral, L.W. Murray, J.S. Richardson, and D.C. Richardson. Acta Cryst. D66, 16-21 (2010).

Crystallographic model quality at a glance. L. Urzhumtseva, P.V. Afonine, P.D. Adams, and A. Urzhumtsev. Acta Cryst. D65, 297-300 (2009).