Summarize geometric quality of a model
This tool summarizes the geometric quality of a model by calculating pseudo-energies (square of the value of the deviation from ideality divided by sigma) for every aspect of a model (each bond length, angle, rotamer, Ramachandran value, etc). For each metric (such as angles), the mean energy and worst energy are noted, and the probabilities of this mean and this worst value not occurring by chance are estimated. To reduce the dominance of very poor values, energy values E over 10 are filtered by reducing them toward 10 with a logarithmic function. The weighted energy for each metric is sum of the mean energy and the filtered worst energy for that metric, each weighted by their probabilities of not occurring by chance.
The overall energy is the sum of weighted energies for all metrics. The metrics used are:
CBetadev: Deviation of CB position from ideal (A) Clash: Clash of non-bonded atoms (A) Omega: Peptide omega angle deviation (degrees) Rama: Ramachandran outlier probability Rota: Rotamer outlier probability Angle: Angle deviation (degrees) Bond: Bond length deviation (A) Chir: Chirality deviation (A**3) Torsion: Torsion angle deviation (degrees) Full-nonbond: Non-bonded deviations, including all instances Nonbond: Non-bonded deviations (using Lennard-Jones potential, excluding bonds that are plausible and have low energies) Plane: Planarity deviations (A)
This tool estimates the expected energy for a structure that has all bonds, angles, etc, distributed as normal distributions with the sigmas used in the geometry validation step above. That is, if the structure had a normal distribution of errors, it would have about this energy. The energy is estimated by sampling from a normal distribution when calculating the energy terms instead of taking the actual deviations. The sampling process is carried out many times (20) and the averages are reported.
An ideal model would have all individual energies distributed normally with variances approximately equal to the corresponding parent variances. The exact parent distributions for different structures would therefore be different. The distribution of overall scores for ideal models can be estimated by sampling all the contributions using normal distributions with appropriate parent variances.
The tool estimates the overall ratio of deviations to sigmas in three steps
First it lists all the values deviation/sigma for all the metrics. If the deviations were drawn from random Gaussian distributions with standard deviations of their corresponding sigmas, these values of deviation/sigma would be normally distributed with a standard deviation of one.
Then the 99.7th percentile of values of deviation/sigma is noted. For a Gaussian distribution, this percentile corresponds to 3 standard deviations. The estimated ratio of deviations/sigma overall is then 1/3 of this 99.7th percentile value, or 1 standard deviation of the ratios of deviation to sigma.
If you want to be more certain about whether model A.pdb is better than B.pdb, you can supply both models to holton_geometry_validation (one as model and the other as other_models). The tool will calculate scores for A.pdb and B.pdb using not just the standard parameters, but all reasonable values of these parameters. The overall estimate of the difference in scores is obtained by averaging over the differences in scores using the sets of varied parameters. An overall difference and SD of the difference is supplied. If A.pdb consistently has a better score than B.pdb, you can be relatively confident that it really is better.
You can supply a list of models as well (or a model and a set of other_models), and each will be compared to the first model in the list.
You can use holton_geometry_validation to compare models for the same structure and identify which has fewer overall geometric problems.
You can use the expected energy to see if your structure has errors that are about what is expected based on ideal model geometry and uncertainties.
You can use the estimate of ratio of deviations to sigmas to further examine whether the deviations from ideality in your structure are about what is expected based on the expected variation in geometric values.
You can compare a series of models to see how each compares to the first model.
Running holton_geometry_validation is easy. From the command-line you can type:
phenix.holton_geometry_validation 1ss8_A.pdb
where 1ss8_A.pdb is the model you would like to evaluate.
If your structure has hydrogen positions that do not match those expected by Phenix, you can ignore them in all metrics except non-bonded interactions with ignore_h_except_in_nonbond=True. Alternatively you can redo their positions with keep_hydrogens=False.
If you want to ignore just hydrogen positions in Arginine residues, you can specify ignore_arg_h_nonbond=True.
If you want to exclude hydrogen nonbond contacts involving waters, use ignore_water_h_bonds, and if you want to exclude all bonds involving hydrogen, ignore_bond_lengths_with_h=True
The value of the score depends strongly on the worst energy in each category. This makes it possible to game the score (down to the next-worst values) by focusing attention on those individual energies. This limitation means that in the current implementation, refinement against the score is not useful. This limitation might be reduced by including not just the worst energy in each category, but rather weighted values of several or all the worst energies.
Parameters in the score function are somewhat arbitrary, and those that are not arbitrary do not have a perfect value. If parameters have ranges of reasonable values, the difference in score between two models cannot be known more accurately than the range of differences in score associated with reasonable values of the parameters.
The method is described at https://bl831.als.lbl.gov/~jamesh/challenge/twoconf/#score