Python-based Hierarchical ENvironment for Integrated Xtallography
Frequently asked questions for phenix.refine
Note: questions specific to the GUI can be found in its documentation.
The OpenMP parallelization is not particularly efficient, since the FFTs take less than half of the typical runtime; a speedup of 40% is usually the maximum. The process-level parallelization with 'nproc' is most useful when restraint weight optimization is enabled, since these procedures can be run as multiple separate processes. In these circumstances a speedup of 4-5x is possible. However, a run using default parameters will not significantly benefit from setting 'nproc'.
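The ceiling on the OpenMP speedup follows from Amdahl's law: only the parallelized part of the runtime shrinks as threads are added. The sketch below is a minimal illustration in Python, assuming that the FFTs account for 30% of the runtime. That fraction is a hypothetical value chosen to be consistent with "less than half", not a measurement.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Overall speedup when only parallel_fraction of the runtime scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_workers)


def max_speedup(parallel_fraction):
    """Limit of the speedup as the number of workers grows without bound."""
    if parallel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


# Hypothetical FFT share of the runtime (an assumption, not a measurement).
FFT_FRACTION = 0.3

for threads in (1, 2, 4, 8):
    print(threads, round(amdahl_speedup(FFT_FRACTION, threads), 2))
print("limit", round(max_speedup(FFT_FRACTION), 2))
```

With this assumed fraction the limit is about 1.43, a gain of roughly 40% however many threads are used.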
Simulated annealing (SA) is most useful early in refinement, when the model is far from convergence. Manually built models, or MR solutions involving significant local conformational changes, are common inputs where SA can improve over simple gradient-driven refinement. It is generally less helpful later in refinement, and/or at high resolution.
This typically only needs to be performed once after molecular replacement, unless you dock in additional domains later. Continuing to use rigid-body refinement in later runs will not improve your structure, and only adds to the runtime.
Either amplitudes (F) or intensities (I) may be used in refinement (in any file format), but intensities will be used preferentially. Both anomalous and non-anomalous data are supported; there does not appear to be any particular benefit to model quality using one or the other with the default strategy. However, anomalous data may be used to refine anomalous scattering factors for heavy atoms, and anomalous difference map coefficients will automatically be created in the output MTZ file. For these reasons, anomalous data are recommended, but in general the R-factors will be similar.
Reflections with abnormal values tend to reduce the performance of the refinement engine. These are identified based on several criteria (see Read 1999 for details) and filtered out at the beginning of each macro-cycle. You can prevent this by setting xray_data.remove_outliers=False.
We recommend at least five to ensure convergence, but in some cases (especially poorly refined structures) considerably more may be required for optimal results. (The default is three macro-cycles due to speed considerations.)
This is a contentious subject, and not settled, although it is widely agreed that throwing away usable data simply to reduce the R-factors is not an acceptable practice. Traditional criteria for resolution limits include (among others) truncating the data at the resolution where the mean I/sigmaI falls below 2, but this may exclude valuable data. See the documentation on unmerged data (especially the cited references) for more details. In general it may be useful to include additional, weaker data in refinement in the final stages, since the weighting performed by maximum-likelihood refinement prevents these from degrading the model, and may improve it in some cases. (Note that more data will also cause phenix.refine to take longer to run, which often imposes a practical limit on resolution.)
This may be beneficial if the automatic weighting does not pick a good scale for the X-ray and restraint terms; this will often be recognizable by higher-than-expected bond and angle RMSDs. In general it rarely hurts to optimize the weights, and often results in a significantly better refinement, but it is several times slower than ordinary refinement unless you have a highly parallel system. However, we strongly recommend weight optimization in the final round of refinement, where it becomes essential to prevent overfitting.
Our usual response is: don't do this manually, use the automatic optimization instead. Although this takes significantly longer to run, in practice most users will spend an equivalent amount of time manually adjusting the weights by trial-and-error. If you are certain you need to have manual control, the parameters fix_wxc (for geometry restraints) and fix_wxu (for B-factor restraints) will set the weights.
An approximate cutoff for NCS restraints is 2.0 Angstrom - at higher resolution the data alone are usually sufficient, but at lower resolution additional restraints are usually necessary. This is somewhat subjective due to the behavior of the global NCS restraints currently used by default in PHENIX, but will be addressed in future versions.
The global NCS restraints treat groups as rigid bodies, where all atoms in each group are expected to be related to the others by a single rotation and translation operation. This does not respect local deformations in the related molecules, which are common even at lower resolution. The torsion NCS restraints restrain dihedral angles instead, and allow them to be unrestrained if genuinely different. This will eventually become the default, since it often results in significantly better refinement.
In most cases this is due to the restraint of B-factors between NCS groups, which may actually have very different levels of disorder. Setting the NCS B-factor weight term to zero usually fixes the problem.
This is somewhat controversial, but absolute upper limits for a well-refined protein structure at high resolution are typically 0.02 for RMS(bonds) and 2.0 for RMS(angles); usually they will be significantly lower. As resolution decreases the acceptable deviation from geometry restraints also decreases, so at 3.5 Angstrom, more appropriate values would be 0.01 and 1.0. We recommend using the POLYGON tool in the validation summary to judge your structure relative to others at similar resolutions.
This usually means that the automatic X-ray/geometry weighting did not work properly; this can sometimes happen if the starting model also has poor geometry. Optimizing the weight (optimize_xyz_weight=True, or equivalent GUI control in the "Refinement settings" tab) will usually fix this problem.
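On the command line, options such as these are passed as keyword=value arguments after the input files. The Python sketch below only assembles and prints such a command; it does not run it. The file names model.pdb and data.mtz are placeholders, the helper build_refine_command is hypothetical, and the parameter names are the ones quoted in this FAQ, whose exact spelling may differ between PHENIX versions.

```python
import shlex


def build_refine_command(model, data, **params):
    """Return a phenix.refine argument list from input files and options."""
    args = ["phenix.refine", model, data]
    for name, value in params.items():
        # Double underscores stand in for dots, which are not valid in
        # Python keyword names (e.g. xray_data__remove_outliers).
        args.append("%s=%s" % (name.replace("__", "."), value))
    return args


cmd = build_refine_command(
    "model.pdb",  # placeholder file name
    "data.mtz",   # placeholder file name
    optimize_xyz_weight=True,
    nproc=4,
)
print(shlex.join(cmd))
```

Combining weight optimization with 'nproc' is the case where process-level parallelization is expected to pay off, so the two options are shown together here.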
The experimental phases used to restrain refinement describe a bimodal probability distribution for every phase angle, rather than the single values used to generate a map. In most cases the additional restraints will not hurt refinement, and can often help.
This often happens when the restraints were generated using ReadySet from a PDB file, and the ligand code is not recognizable in the Chemical Components Database. eLBOW will try to guess the molecular topology based on the coordinates alone, but this is imprecise and may not yield the desired result. For best results, restraints for non-standard ligands should be generated in eLBOW using a SMILES string or similar source of topology information.
In general, if NCS is present in your structure, you should always use NCS restraints at low resolution; it is worth trying both the Cartesian (global) and torsion restraints to see which works best for your model. This alone usually helps with the geometry and overfitting, although it is rarely sufficient by itself. There are also several different types of restraint specifically designed to help with low-resolution refinement (consult the full phenix.refine manual page for details on each):
Use phenix.ready_set or phenix.metal_coordination to generate custom bond (and optionally, bond angle) restraints, which will be output to a parameter file ending in ".edits". If you are using the PHENIX GUI, there is a toolbar button for ReadySet in the phenix.refine interface, which will automatically load the output files for use in phenix.refine.
I had previously generated custom restraints using ReadySet in the PHENIX GUI, but the atoms have changed. Now phenix.refine crashes because it can't find the atom selections. How do I remove the old custom restraints?
TLS refinement is generally valid at any resolution; at low resolution, it may be best to make each chain a single group, instead of trying to split them into smaller pieces. However, it is best to wait until near the end of refinement to add TLS; until then you should refine with isotropic ADPs only.
phenix.refine does not have a single global switch for defining ADP parameterization; rather, when the "Individual ADPs" strategy is defined, the program uses several criteria to determine how atoms should be treated:
In the GUI, several common parameterizations are pre-defined in the dialog for entering ADP selections. Note that although it is possible to combine all of the different ADP refinement strategies in a single run, the atom selections for individual and grouped refinement may not overlap, nor may the selections for anisotropic ADPs and TLS groups.
There may be circumstances where anisotropic refinement is permissible at slightly lower resolution, but 1.7 Angstrom is probably the lowest resolution at which it should be attempted. Exceptions may sometimes be made for metal ions, since they scatter very strongly. As always, you should use the drop in R-free to judge whether the change in parameterization was appropriate - a decrease of 0.5% (i.e. 0.005) or better indicates success.
It is again difficult to give an exact rule, since it depends on several properties of the crystal including resolution, solvent content, presence of NCS, etc. In general, the higher the data-to-parameter ratio, the more likely individual ADPs are to work well. As an approximate example, consider these two hypothetical structures:
In this case, the latter structure can probably be refined with individual ADPs, while the former is more marginal. If in doubt, early rounds of refinement may be done with grouped ADPs, switching to individual as the structure nears convergence. In general, it is usually worth trying individual ADPs at some point; ultimately the effect on R-factors (primarily R-free, but also the gap between R-work and R-free) is the most important guideline.
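The data-to-parameter ratio can be estimated with simple arithmetic. The sketch below assumes the usual parameter counts of four per atom for individual isotropic ADPs (x, y, z, B) and nine per atom for anisotropic ADPs (x, y, z plus six ADP components). The atom and reflection counts are invented for illustration and do not describe any real structure. Geometry restraints act as additional observations, so this crude ratio is only a rough guide.

```python
PARAMS_PER_ATOM = {
    "isotropic": 4,    # x, y, z, B
    "anisotropic": 9,  # x, y, z, U11, U22, U33, U12, U13, U23
}


def data_to_parameter_ratio(n_reflections, n_atoms, adp_model="isotropic"):
    """Unique reflections divided by refined parameters (restraints ignored)."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    n_params = n_atoms * PARAMS_PER_ATOM[adp_model]
    return n_reflections / n_params


# Hypothetical example: 2500 non-hydrogen atoms, 20000 unique reflections.
for model in ("isotropic", "anisotropic"):
    ratio = data_to_parameter_ratio(20000, 2500, model)
    print("%-12s %.2f" % (model, ratio))
```

A ratio that falls below one when switching parameterization is a warning sign, but the change in R-free remains the deciding test.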
A partial answer can be obtained by looking at POLYGON, which plots histograms of statistics for PDB structures solved at similar resolutions, and compares these to the statistics for your output model. As a general rule, R-factors alone should not be used to decide if a structure is "done", but should be examined in combination with the validation report.
Overfitting during refinement is usually helped by adding more restraints, and/or tightening the standard geometry restraints. If the output geometry is already within reasonable limits (typically RMS(bonds) < 0.016 and RMS(angles) < 1.8), ideas to try include adding NCS restraints if NCS is present, secondary structure restraints, or reference model restraints (if a high-resolution structure is available). At lower resolutions (worse than 3.0 A), it may also be prudent to try grouped ADP refinement, and if desperate, Ramachandran restraints. TLS refinement can often reduce overfitting across a wide range of resolutions. However, depending on the degree of overfitting, it may be necessary to perform extensive manual rebuilding first. (Note that if the large R/R-free gap suddenly appears after refinement of a model that was previously not overfit, this usually indicates incorrect parameterization of the refinement, e.g. using anisotropic ADPs at an inappropriate resolution.)
This is largely a matter of personal preference. Using explicit riding hydrogen atoms can improve geometry at any resolution; at higher resolutions, approximately 2 Angstrom or better, they will generally improve R-free as well. At atomic resolution (1.5 A or better) they should always be part of the final model. Note that unless you have true subatomic resolution (0.9 A or better), the hydrogens should always be refined as "riding", meaning that their coordinates are defined by the heavy atoms, not individually refined.
Versions of Coot prior to 0.6.2 used a version of the CCP4 monomer library with hydrogen atoms named according to the PDB format version 2 standard; PHENIX can recognize these, but defaults to PDB v.3. To reconcile the different conventions, you can download the newer version of the monomer library (currently available here) and set the environment variable COOT_REFMAC_LIB_DIR to point to the directory in which you unpack it.
We strongly discourage removing any atoms used in refinement from the model, as it makes reproducing the published R-factors very difficult and eliminates essential information about how the structure was refined.
The charge occupies columns 79-80 at the end of each ATOM or HETATM record, immediately following the element symbol. The format is the magnitude of the charge followed by its sign, for example "1-" or "2+". You can edit the PDB file manually to add this, but we recommend using phenix.pdbtools:
phenix.pdbtools model.pdb charge_selection="element Mn" charge=2
This is also available in the GUI under "Model tools". The effect of setting the charge will be to use modified scattering factors for X-ray refinement, which can be helpful if you notice difference density appearing at ion sites. Note that it will have no effect on the geometry, since phenix.refine does not take electrostatics into account.
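To make the column layout concrete, here is a minimal Python sketch of what a manual edit of the charge field amounts to. It is an illustration only, not how phenix.pdbtools works internally; the functions format_charge and set_charge are hypothetical helpers, and the example record contains made-up values.

```python
def format_charge(charge):
    """Format an integer charge as the two-character PDB field, e.g. '2+'."""
    if charge == 0:
        return "  "  # blank field for neutral atoms
    if abs(charge) > 9:
        raise ValueError("charge magnitude must fit in one digit")
    return "%d%s" % (abs(charge), "+" if charge > 0 else "-")


def set_charge(record, charge):
    """Return an ATOM/HETATM record with columns 79-80 set to the charge.

    The returned line has no trailing newline.
    """
    if not record.startswith(("ATOM", "HETATM")):
        return record  # leave all other record types untouched
    padded = record.rstrip("\n").ljust(80)
    # Columns 79-80 (1-based) are string indices 78 and 79 (0-based).
    return padded[:78] + format_charge(charge) + padded[80:]


# Illustrative record built column by column (all values are made up).
record = (
    "HETATM%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f          %2s"
    % (1001, "MN", "", "MN", "A", 301, "", 12.345, 8.901, -3.210, 1.00, 25.40, "MN")
)
print(set_charge(record, 2))
```

The element symbol stays in columns 77-78 and the charge is written directly after it, which is why the record must be padded to the full 80 columns before editing.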
A third approach, setting the occupancy of missing atoms to zero but leaving them in the model, is strongly disfavored, as the resulting positions and B-factors are entirely theoretical (but not immediately obvious as such).
phenix.refine and phenix.model_vs_data use the same code to perform the bulk solvent correction and scaling, so they should report approximately the same R-factors given identical inputs. The discrepancy arises when taking a PDB file from refinement and running it back through phenix.model_vs_data. Because of the limited precision of the PDB format (three digits after the decimal point for coordinates, two digits for B-factors and occupancy), the atomic properties recorded in the PDB file will not be exactly the same as their actual refined values. In practice the difference in R-factors is statistically insignificant, however.
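The size of this round-off is easy to see with a small Python sketch that writes values at PDB precision and reads them back. The refined values below are hypothetical, and the field widths follow the fixed-column PDB layout (8.3 for coordinates, 6.2 for occupancy and B-factor).

```python
def pdb_round_trip(x, b_factor, occupancy):
    """Write values at PDB precision and read them back as floats."""
    x_text = "%8.3f" % x          # coordinates: three decimal places
    b_text = "%6.2f" % b_factor   # B-factors: two decimal places
    q_text = "%6.2f" % occupancy  # occupancy: two decimal places
    return float(x_text), float(b_text), float(q_text)


# Hypothetical refined values held at full precision in memory.
refined = (12.3456789, 27.8349, 0.6666667)
written = pdb_round_trip(*refined)
for name, before, after in zip(("x", "B", "occ"), refined, written):
    print("%-3s %.7f -> %.3f  (change %+.7f)" % (name, before, after, after - before))
```

Each coordinate can shift by at most 0.0005 Angstrom and each B-factor or occupancy by at most 0.005, which is why the recomputed R-factors differ only slightly.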
This is most useful when you have a large number of strong anomalous scatterers, where map artifacts are common. In such cases a more precise modeling of atomic scattering may improve the R-factors as well. It is generally not necessary for routine cases, but it may be advantageous for identifying weak anomalous scatterers (such as ions from the crystallization condition) when used to calculate an LLG map at the end of refinement.