Evaluating a model with Holton geometry validation

Author(s)

holton_geometry_validation: James Holton (Python coding by Tom Terwilliger)

Purpose

Summarize geometric quality of a model

How holton_geometry_validation works:

This tool summarizes the geometric quality of a model by calculating a pseudo-energy, the square of (deviation from ideality / sigma), for every geometric feature of the model (each bond length, angle, rotamer, Ramachandran value, etc.). For each metric (such as angles), the mean energy and the worst energy are noted, and the probabilities that this mean and this worst value did not occur by chance are estimated. To reduce the dominance of very poor values, energy values E over 10 are filtered by reducing them toward 10 with a logarithmic function. The weighted energy for each metric is the sum of the mean energy and the filtered worst energy for that metric, each weighted by its probability of not occurring by chance.
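As an illustration only, the scoring of a single metric might be sketched in Python as below. This is not the tool's actual code. The exact form of the logarithmic filter is an assumption (here, cap + ln(E/cap) above the cap), and the two probabilities are passed in as plain numbers because their estimation is not described here. The function names pseudo_energy, soften, and weighted_energy are hypothetical.

```python
import math


def pseudo_energy(deviation, sigma):
    """Pseudo-energy: square of (deviation from ideality / sigma)."""
    return (deviation / sigma) ** 2


def soften(energy, cap=10.0):
    """Reduce energies above `cap` toward `cap`.

    ASSUMPTION: the filter is cap + ln(energy / cap); the text only says
    that a logarithmic function is used.
    """
    if energy <= cap:
        return energy
    return cap + math.log(energy / cap)


def weighted_energy(energies, p_mean, p_worst):
    """Sum of the mean energy and the filtered worst energy for one metric.

    p_mean and p_worst stand in for the probabilities that the mean and
    the worst value did not occur by chance (supplied by the caller).
    """
    mean_energy = sum(energies) / len(energies)
    worst_energy = soften(max(energies))
    return p_mean * mean_energy + p_worst * worst_energy


# Example: three bond-length deviations (A) with a sigma of 0.02 A
bond_energies = [pseudo_energy(d, 0.02) for d in (0.01, 0.02, 0.10)]
bond_score = weighted_energy(bond_energies, p_mean=0.5, p_worst=0.9)
```

In this sketch a single very large deviation still raises the score, but only logarithmically once its energy passes 10.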

The overall energy is the sum of weighted energies for all metrics. The metrics used are:

CBetadev: Deviation of CB position from ideal (A)
Clash:    Clash of non-bonded atoms (A)
Omega:    Peptide omega angle deviation (degrees)
Rama:     Ramachandran outlier probability
Rota:     Rotamer outlier probability
Angle:    Angle deviation (degrees)
Bond:     Bond length deviation (A)
Chir:     Chirality deviation (A**3)
Torsion:  Torsion angle deviation (degrees)
Full-nonbond: Non-bonded deviations, including all instances
Nonbond:  Non-bonded deviations (using Lennard-Jones potential,
          excluding bonds that are plausible and have low energies)
Plane:    Planarity deviations (A)

Estimate of expected energy for ideal structure

This tool estimates the expected energy for a structure in which all bonds, angles, etc. deviate from ideality according to normal distributions with the sigmas used in the geometry validation step above. That is, if the structure had a normal distribution of errors, it would have about this energy. The energy is estimated by sampling deviations from a normal distribution when calculating the energy terms, instead of using the actual deviations. The sampling process is repeated n_random times (20 by default) and the averages are reported.
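The sampling idea can be sketched as follows. This is a minimal illustration, not the tool's implementation. It assumes each energy term is the square of a deviation/sigma value drawn from a standard normal distribution, and it reports only the mean energy per term. The function name expected_mean_energy is hypothetical; the defaults mirror the n_random and random_seed keywords.

```python
import random


def expected_mean_energy(n_terms, n_random=20, seed=171927):
    """Average, over n_random trials, of the mean energy of n_terms
    terms whose deviation/sigma values are drawn from N(0, 1)."""
    rng = random.Random(seed)
    trial_means = []
    for _ in range(n_random):
        # Sampled deviation/sigma replaces the actual one; energy is its square
        energies = [rng.gauss(0.0, 1.0) ** 2 for _ in range(n_terms)]
        trial_means.append(sum(energies) / n_terms)
    return sum(trial_means) / n_random


# Example: a metric with 5000 restraints
expected = expected_mean_energy(5000)
```

For normally distributed errors the mean of each squared deviation/sigma is 1, so the value returned here should be close to 1. A mean energy from a real model that is well above this suggests deviations larger than the sigmas allow.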

Estimate of ratio of deviations to sigmas

An ideal model would have all individual energies distributed normally with variances approximately equal to the corresponding parent variances. The exact parent distributions for different structures would therefore be different. The distribution of overall scores for ideal models can be estimated by sampling all the contributions using normal distributions with appropriate parent variances.

The tool estimates the overall ratio of deviations to sigmas in three steps:

First it lists all the values deviation/sigma for all the metrics. If the deviations were drawn from random Gaussian distributions with standard deviations of their corresponding sigmas, these values of deviation/sigma would be normally distributed with a standard deviation of one.

Then the 99.7th percentile of the values of deviation/sigma is noted. For a Gaussian distribution, this percentile corresponds to 3 standard deviations. Finally, the overall ratio of deviations to sigmas is estimated as 1/3 of this 99.7th percentile value, which corresponds to 1 standard deviation of the deviation/sigma ratios.
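The percentile estimate described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's code. It takes absolute values of deviation/sigma before finding the percentile and uses linear interpolation between sorted values. The function names percentile and deviation_to_sigma_ratio are hypothetical.

```python
import random


def percentile(sorted_values, fraction):
    """Linear-interpolation percentile of an already sorted list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    position = fraction * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1.0 - weight) + sorted_values[upper] * weight


def deviation_to_sigma_ratio(ratios):
    """One third of the 99.7th percentile of |deviation/sigma|.

    ASSUMPTION: absolute values are used, so that the 99.7th percentile
    corresponds to 3 standard deviations of a Gaussian.
    """
    magnitudes = sorted(abs(r) for r in ratios)
    return percentile(magnitudes, 0.997) / 3.0


# Example: simulated deviations that are twice as large as their sigmas
rng = random.Random(0)
sample = [rng.gauss(0.0, 2.0) for _ in range(200000)]
estimate = deviation_to_sigma_ratio(sample)
print(round(estimate, 2))
```

A value near 1 indicates deviations about as large as the sigmas predict. The simulated sample above has deviations drawn with twice the expected spread, so its estimate should come out near 2.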

Comparing two models

If you want to be more certain about whether model A.pdb is better than B.pdb, you can supply both models to holton_geometry_validation (one as model and the other as other_models). The tool will calculate scores for A.pdb and B.pdb using not just the standard parameters, but all reasonable values of these parameters. The overall estimate of the difference in scores is obtained by averaging over the differences in scores using the sets of varied parameters. An overall difference and SD of the difference is supplied. If A.pdb consistently has a better score than B.pdb, you can be relatively confident that it really is better.

You can supply a list of models as well (or a model and a set of other_models), and each will be compared to the first model in the list.
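The averaging over parameter sets can be illustrated with the sketch below. It is not the tool's code. It assumes every combination of the varied parameter values is scored (the tool may combine them differently), and it uses a hypothetical toy_score function in place of a real scoring run. The names compare_scores and toy_score are invented for this example.

```python
import itertools
import statistics


def compare_scores(score_fn, model_a, model_b, variable_params):
    """Mean and SD of score(A) - score(B) over all parameter combinations.

    ASSUMPTION: a full grid over the listed values is used.
    """
    names = sorted(variable_params)
    differences = []
    for values in itertools.product(*(variable_params[n] for n in names)):
        params = dict(zip(names, values))
        differences.append(score_fn(model_a, params) - score_fn(model_b, params))
    return statistics.mean(differences), statistics.pstdev(differences)


def toy_score(model, params):
    """Hypothetical stand-in for a scoring run: one omega pseudo-energy."""
    return (model["omega_dev"] / params["omega_angle_sigma"]) ** 2


model_a = {"omega_dev": 8.0}    # toy description of A.pdb
model_b = {"omega_dev": 16.0}   # toy description of B.pdb
mean_diff, sd_diff = compare_scores(
    toy_score, model_a, model_b, {"omega_angle_sigma": [4.0, 8.0]}
)
```

Since the score is an energy-like penalty, a mean difference that is negative and large compared with its SD would suggest that the first model is consistently better across the parameter choices.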

Uses

You can use holton_geometry_validation to compare models for the same structure and identify which has fewer overall geometric problems.

You can use the expected energy to see if your structure has errors that are about what is expected based on ideal model geometry and uncertainties.

You can use the estimate of ratio of deviations to sigmas to further examine whether the deviations from ideality in your structure are about what is expected based on the expected variation in geometric values.

You can compare a series of models to see how each compares to the first model.

Examples

Standard run of holton_geometry_validation:

Running holton_geometry_validation is easy. From the command line you can type:

phenix.holton_geometry_validation 1ss8_A.pdb

where 1ss8_A.pdb is the model you would like to evaluate.

Possible Problems

If your structure has hydrogen positions that do not match those expected by Phenix, you can ignore them in all metrics except non-bonded interactions with ignore_h_except_in_nonbond=True. Alternatively you can redo their positions with keep_hydrogens=False.

If you want to ignore just hydrogen positions in Arginine residues, you can specify ignore_arg_h_nonbond=True.

If you want to exclude hydrogen nonbond contacts involving waters, use ignore_water_h_bonds=True, and if you want to exclude all bond lengths involving hydrogen, use ignore_bond_lengths_with_h=True.

Limitations

The value of the score depends strongly on the worst energy in each category. This makes it possible to game the score (down to the next-worst values) by focusing attention on those individual energies. This limitation means that in the current implementation, refinement against the score is not useful. This limitation might be reduced by including not just the worst energy in each category, but rather weighted values of several or all the worst energies.

Parameters in the score function are somewhat arbitrary, and those that are not arbitrary do not have a perfect value. If parameters have ranges of reasonable values, the difference in score between two models cannot be known more accurately than the range of differences in score associated with reasonable values of the parameters.

Literature

The method is described at https://bl831.als.lbl.gov/~jamesh/challenge/twoconf/#score

Additional information

List of all available keywords

model = None input PDB file
other_models = None Models to compare with model
get_individual_residue_scores = None Calculate individual residue scores in addition to overall score
round_numbers = True Round numbers before calculation
worst_clash_is_one_plus_n_times_worst_clash = True Scale worst clash score by (1 + n) where n is total clashes
clash_energy_add_n = True Add number of clashes to clash energy score
minimum_nonbond_score_to_be_worst = -0.1 Only include worst nonbond score if it is at least this value
minimum_nonbond_score_to_be_included_in_average = 0 Only include nonbond score in average if it is at least this value
include_full_nonbond_score = True If set, add additional scoring term in which all nonbond values (even those less than the minimums) are included
keep_hydrogens = True If set, keep input hydrogens, but add any necessary riding H.
ignore_cis_peptides = False If set, ignore cis peptides. Otherwise (if False), penalize them.
ignore_h_except_in_nonbond = True If set, ignore H atoms except in nonbond term
ignore_arg_h_nonbond = True If set, ignore H atoms in Arginine
ignore_bond_lengths_with_h = False If set, ignore bond lengths involving H atoms
ignore_water_h_bonds = False If set, ignore nonbonded contacts with water hydrogens
rotalyze_ramalyze_max_energy = 99 Maximum value of energy for a bad rotamer or phi-psi combination
overall_max_energy = None Maximum value of energy
omega_angle_sigma = 4 Sigma for omega angle
cbetadev_sigma = 0.05 Sigma for CB position
clashscore_ideal_dist = 3 Ideal distance in Lennard-Jones potential for clashscore result
lj_dist_that_yields_zero = 6 Distance at which modified Lennard-Jones potential crosses zero
const_shrink_donor_acceptor = 0 Allow contacts closer by const_shrink_donor_acceptor from normal target for H-bonding atoms. NOTE: matches behavior of Phenix pre-2024.
remove_waters = None Remove waters before analysis
score_this_altloc_only = None Score only this altloc if specified
include_random = True Estimate expected value for structure with normal errors
n_random = 20 Number of samples for estimation of expected values
random_seed = 171927 Random seed
softPnna_params
- y0 = 1 Parameter y0 for softPnna calculation
- a2 = -0.0192266 Parameter a2 for softPnna calculation
- a1 = 0.751694 Parameter a1 for softPnna calculation
- a0 = 1.12482 Parameter a0 for softPnna calculation
- mx = 0.21805 Parameter mx for softPnna calculation
- my = 0.736621 Parameter my for softPnna calculation
variable_params
- clashscore_ideal_dist_values = 2.5 3 3 3 3.5 Values to try for variables that are uncertain in comparison between models
- lj_dist_that_yields_zero_values = 5 6 6 6 7
- omega_angle_sigma_values = 4 5 6 7 8 9 10
- cbetadev_sigma_values = .04 .05 .06
- worst_clash_is_one_plus_n_times_worst_clash_values = 0 1
- clash_energy_add_n_values = 0 1
- minimum_nonbond_score_to_be_worst_values = -0.1 0.0
- minimum_nonbond_score_to_be_included_in_average_values = 0 0 0 0 -1.1
- include_full_nonbond_score_values = 0 1
- ignore_arg_h_nonbond_values = 0 1
- rotalyze_ramalyze_max_energy_values = 20 99 99 99 1000