Structure refinement in PHENIX

phenix.refine is run from the command line:

% phenix.refine <pdb-file(s)> <reflection-file(s)> <monomer-library-file(s)>

When you do this a number of things happen:

The program automatically generates a ".eff" file which contains all of the parameters for the job (for example if you provided lysozyme.pdb the file lysozyme_refine_001.eff will be generated). This is the set of input parameters for this run.
The program automatically interprets the reflection file(s). If there is an unambiguous choice of data arrays these will be used for the refinement. If there is a choice, you're given a message telling you how to select the arrays. Several reflection files can be provided, for example: one containing Fobs and another one with R-free flags.
Once the data arrays are chosen, the program writes all of the data it will be using in the refinement to a new MTZ file, for example, lysozyme_refine_data.mtz. This makes it very easy to keep track of what you actually used in the refinement (instead of having the arrays spread across multiple files).
At the end of refinement the program generates:
1. a new PDB file, with the refined model, called for example lysozyme_refine_001.pdb;
2. two maps: likelihood weighted mFo-DFc and 2mFo-DFc. These are in ASCII X-PLOR format. A reflection file with map coefficients is also generated for use in Coot or XtalView (e.g. lysozyme_refine_001_map_coeffs.mtz);
3. a new defaults file to run the next cycle of refinement, e.g. lysozyme_refine_002.def. This means you can run the next cycle of refinement by typing:
```
% phenix.refine lysozyme_refine_002.def
```
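
The file names mentioned above follow a visible pattern (prefix, `_refine_`, a three-digit run number). As a rough sketch, the helper below reproduces the example names for a given run; the function name `run_file_names` is hypothetical, and the pattern is inferred only from the lysozyme examples in this section, not from the program itself.

```python
def run_file_names(prefix, serial):
    """Return the example output file names for run number `serial`.

    Assumption: names follow the pattern seen in the lysozyme examples above.
    """
    stem = f"{prefix}_refine_{serial:03d}"
    return {
        "parameters": f"{stem}.eff",                  # input parameters of this run
        "data": f"{prefix}_refine_data.mtz",          # data actually used
        "model": f"{stem}.pdb",                       # refined model
        "map_coefficients": f"{stem}_map_coeffs.mtz", # for Coot or XtalView
        "next_defaults": f"{prefix}_refine_{serial + 1:03d}.def",
    }


names = run_file_names("lysozyme", 1)
print(names["next_defaults"])
```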

To get information about command line options type:

% phenix.refine --help

To have the program generate the default input parameters without running the refinement job (e.g. if you want to modify the parameters prior to running the job):

% phenix.refine --dry_run <pdb-file> <reflection-file(s)>

If you know the parameter that you want to change you can override it from the command line:

% phenix.refine data.hkl model.pdb main.low_resolution=8.0 \
  simulated_annealing.start_temperature=5000

Note that you don't have to specify the full parameter name. What you specify on the command line is matched against all known parameter names, and the best substring match is used if it is unique.
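
The matching rule can be illustrated with a minimal sketch. This is not the program's actual implementation: the function `resolve_parameter` is hypothetical, the list of known names is a small excerpt taken from the examples in this document, and "best match" is simplified here to "exact match, otherwise the only name containing the given text".

```python
def resolve_parameter(user_key, known_names):
    """Map an abbreviated parameter name to the one full name it matches."""
    if user_key in known_names:
        return user_key  # an exact name always wins
    matches = [name for name in known_names if user_key in name]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise KeyError(f"no parameter matches {user_key!r}")
    raise KeyError(f"ambiguous parameter {user_key!r}: {sorted(matches)}")


# Excerpt of full parameter names used elsewhere in this document.
KNOWN = [
    "refinement.main.number_of_macro_cycles",
    "refinement.main.high_resolution",
    "refinement.main.low_resolution",
    "refinement.target_weights.wxc_scale",
    "refinement.target_weights.wxu_scale",
    "refinement.simulated_annealing.start_temperature",
]

print(resolve_parameter("wxc_scale", KNOWN))
print(resolve_parameter("main.low_resolution", KNOWN))
```

In this sketch an abbreviation such as `resolution` is rejected because it matches two names, which is why the examples above spell out `main.low_resolution`.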

To rerun a job that was previously run:

% phenix.refine --overwrite lysozyme_refine_001.def

The --overwrite option allows the program to overwrite existing files. By default the program will not overwrite existing files - just in case this would remove the results of a refinement job that took a long time to finish.

To see all default parameters:

% phenix.refine --show-defaults=all

Available features

Restrained coordinate refinement
Restrained isotropic Atomic Displacement Parameters (ADP) refinement
Rigid body refinement
Bulk solvent correction (flat model using a mask) and anisotropic scaling
Simulated Annealing refinement
Multiple refinement and scale target functions: least-squares (ls), maximum-likelihood (ml), phased maximum-likelihood (mlhl)
FFT and direct summation based refinement
Various electron density map calculations (including likelihood-weighted)
Simple structure factor calculation (with or without bulk solvent and scaling)
Combined automatic ordered solvent building, update and refinement
Complete model and data statistics (including twinning analysis, Wilson B calculation, stereo-chemistry statistics and much more)
Group isotropic ADP refinement
Automatic detection of NCS related copies and building NCS restraints
TLS refinement
Occupancy refinement (individual or group)
Complete ADP refinement (combined TLS + individual ADP)
Refinement using X-ray, neutron or both experimental data
Complex refinement strategies in one run. Example: one part of a model can be refined as a rigid body with B-factors modeled via TLS, another part can have individual sites and B-factors refined, while the rest of the model is kept fixed + automatic water picking

Current limitations

No support for omit maps

Running phenix.refine

Refinement with default parameters:

% phenix.refine data.hkl model.pdb

By default this will perform coordinate refinement and restrained isotropic ADP refinement. Three macrocycles will be executed, each consisting of bulk solvent correction, anisotropic scaling of the data, coordinate refinement (25 iterations of the LBFGS minimizer) and ADP refinement (25 iterations of the LBFGS minimizer). At the end the updated coordinates, maps, map coefficients, and statistics are written to files.

Giving parameters on the command line or in files

Parameters that control the refinement can be given on the command line, for example:

% phenix.refine data.hkl model.pdb simulated_annealing=true

However, sometimes the number of parameters is large enough to make it difficult to type them all on the command line, for example:

% phenix.refine data.hkl model.pdb refine.adp.tls="chain A" \
  refine.adp.tls="chain B" main.number_of_macro_cycles=4 \
  main.high_resolution=2.5 wxc_scale=3 wxu_scale=5 \
  output.prefix=my_best_model strategy=tls+individual_sites+individual_adp \
  simulated_annealing.start_temperature=5000

The same result can be achieved by using (for example):

% phenix.refine data.hkl model.pdb my_parameters

where the my_parameters file contains the following lines:

refinement.refine.strategy=tls+individual_sites+individual_adp
refinement.refine.adp.tls="chain A"
refinement.refine.adp.tls="chain B"
refinement.main.number_of_macro_cycles=4
refinement.main.high_resolution=2.5
refinement.target_weights.wxc_scale=3
refinement.target_weights.wxu_scale=5
refinement.output.prefix=my_best_model
refinement.simulated_annealing.start_temperature=5000

which can also be formatted by grouping the parameters under the relevant scopes:

refinement.main {
   number_of_macro_cycles=4
   high_resolution=2.5
}
refinement.refine {
   strategy = *individual_sites \
               rigid_body \
              *individual_adp \
               group_adp \
              *tls \
               individual_occupancies \
               group_occupancies \
               group_anomalous \
               none
   adp {
     tls = "chain A"
     tls = "chain B"
   }
}
refinement.target_weights {
   wxc_scale=3
   wxu_scale=5
}
refinement.output.prefix=my_best_model
refinement.simulated_annealing.start_temperature=5000

The easiest way to create a file like the my_parameters file is to generate a template file containing all parameters by using the command phenix.refine --show-defaults=all and then take the parameters that you want to use (and remove the rest).
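
When parameter files are produced by a script, the scoped layout shown above can be written from a nested dictionary. The following is only a sketch: `to_phil` is a hypothetical helper, it covers just the `name = value` and `scope { ... }` forms used in this section, and it does no validation of parameter names.

```python
def to_phil(scope, indent=0):
    """Render a nested dict as lines of 'name = value' inside brace scopes."""
    pad = "  " * indent
    lines = []
    for name, value in scope.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{name} {{")
            lines.extend(to_phil(value, indent + 1))
            lines.append(f"{pad}}}")
        elif isinstance(value, (list, tuple)):
            # A list becomes repeated definitions, e.g. several tls selections.
            for item in value:
                lines.append(f"{pad}{name} = {item}")
        else:
            lines.append(f"{pad}{name} = {value}")
    return lines


params = {
    "refinement.main": {"number_of_macro_cycles": 4, "high_resolution": 2.5},
    "refinement.refine": {
        "strategy": "tls+individual_sites+individual_adp",
        "adp": {"tls": ['"chain A"', '"chain B"']},
    },
    "refinement.target_weights": {"wxc_scale": 3, "wxu_scale": 5},
}

text = "\n".join(to_phil(params)) + "\n"
print(text)
```

The printed text can be saved under any file name (my_parameters in the example above) and passed to phenix.refine as an additional argument.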

Refinement scenarios

The refinement of atomic parameters is controlled by the strategy keyword. The available strategies include:

- individual_sites (refinement of individual atomic coordinates)
- individual_adp   (refinement of individual atomic B-factors)
- group_adp        (group B-factors refinement)
- group_anomalous  (refinement of f' and f" values)
- tls              (TLS refinement = refinement of ADP through TLS parameters)
- rigid_body       (rigid body refinement)
- none             (bulk solvent and anisotropic scaling only)

Below are examples to illustrate the use of the strategy keyword as well as a few others.

Refining group isotropic B-factors

One B-factor per residue:

% phenix.refine data.hkl model.pdb strategy=group_adp

Selection of specific groups:

% phenix.refine data.hkl model.pdb strategy=group_adp \
  one_adp_group_per_residue=false adp.group="chain A" adp.group="chain B"

This will refine one isotropic B for chain A and one B for chain B.

This can be combined with other refinement modes (individual sites, ...):
```
% phenix.refine data.hkl model.pdb strategy=group_adp+individual_sites
```

The command above will perform refinement of one B-factor per residue along with refinement of individual coordinates.

Refinement with Simulated Annealing:

% phenix.refine data.hkl model.pdb simulated_annealing=true

This will perform the default refinement (refinement of individual sites and individual isotropic B-factors) plus Simulated Annealing.

Rigid body refinement

One rigid body group (whatever is in the PDB file is refined as a single rigid body):
```
% phenix.refine data.hkl model.pdb strategy=rigid_body
```

Multiple groups (requires a basic knowledge of the PHENIX atom selection language, see below):

% phenix.refine data.hkl model.pdb strategy=rigid_body \
  sites.rigid_body="chain A" sites.rigid_body="chain B"

This will refine chain A and chain B as two rigid bodies. The rest of the model will be kept fixed.

Alternatively (if one has many rigid groups), one can create a parameter file, for example, rigid_body_selections, containing the following lines:

refinement.refine.sites
{
  rigid_body = chain A
  rigid_body = chain B
}

The command line will then be:

% phenix.refine data.hkl model.pdb strategy=rigid_body rigid_body_selections

Files like this can be created, for example, by copy-and-paste from the complete list of parameters (phenix.refine --show-defaults=all).

Combining rigid body refinement with individual site and B-factor refinement:

% phenix.refine data.hkl model.pdb \
  strategy=rigid_body+individual_sites+individual_adp

This will refine the whole model as one rigid body, plus individual coordinates and B-factors.

Using NCS restraints in refinement

phenix.refine can find NCS automatically or use NCS selections defined by the user.

Providing NCS selections manually

Create a model.ncs file with the NCS selections:

refinement.ncs.restraint_group {
  reference = chain B and resid 1:2
  selection = chain C and resid 4:5
  selection = chain D and resid 7:8
}
refinement.ncs.restraint_group {
  reference = chain E
  selection = chain F
}

Specify model.ncs as an additional input when running phenix.refine:

% phenix.refine data.hkl model.pdb model.ncs main.ncs=True

This will perform the default refinement round (individual coordinates and B-factors) using NCS restraints on coordinates and B-factors.

Note: user-specified NCS restraints in model.ncs can be modified automatically if a better selection is found. To disable this potential automatic adjustment:

% phenix.refine data.hkl model.pdb model.ncs main.ncs=True \
  ncs.find_automatically=False

Automatic detection of NCS groups:

% phenix.refine data.hkl model.pdb main.ncs=True

Note: 1) gaps in selected sequences are allowed - a sequence alignment is performed to detect insertions or deletions; 2) we recommend checking the automatically detected or adjusted NCS groups.

TLS refinement:

Refinement of TLS parameters only (whole model as one TLS group):
```
% phenix.refine data.hkl model.pdb strategy=tls
```

Refinement of TLS parameters only (multiple TLS groups):

% phenix.refine data.hkl model.pdb strategy=tls tls_group_selections

where, similar to the rigid body or group B-factor refinement, the selection for TLS groups has been made in a user-created parameter file (tls_group_selections) as follows:

refinement.refine.adp {
  tls = chain A
  tls = chain B
}

Alternatively, the selection for the TLS groups can be made from the command line (see rigid body refinement for an example).

Note: TLS parameters will be refined only for selected fragments. This makes it possible, for example, to exclude the solvent molecules from the TLS groups.

The most comprehensive option:

% phenix.refine data.hkl model.pdb tls_group_selections \
  strategy=tls+individual_sites+individual_adp

Data and model quality permitting, the command above will refine TLS parameters, individual sites and individual isotropic B-factors in one refinement run.

Note: when refining TLS, the output PDB file always has the ANISOU records for the atoms involved in TLS groups. The anisotropic B-factor in ANISOU records is the total B-factor (B_tls + B_individual). The isotropic equivalent B-factor in ATOM records is the mean of the trace of the ANISOU matrix divided by 10000 and multiplied by 8*pi^2 and represents the isotropic equivalent of the total B-factor (B_tls + B_individual). To obtain the individual B-factors, one needs to compute the TLS component (B_tls) using the TLS records in the PDB file header and then subtract it from the total B-factors (on the ANISOU records).
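
The conversion described in the note can be written out as a short calculation. This is a sketch under the standard PDB convention that ANISOU records store U11, U22, U33, U12, U13, U23 as integers scaled by 10000; the function name `b_iso_from_anisou` is hypothetical.

```python
import math


def b_iso_from_anisou(anisou):
    """Isotropic equivalent B from the six ANISOU integers.

    anisou: (U11, U22, U33, U12, U13, U23), each scaled by 10**4.
    Only the diagonal terms contribute to the isotropic equivalent.
    """
    u11, u22, u33 = anisou[0], anisou[1], anisou[2]
    u_eq = (u11 + u22 + u33) / 3.0 / 10000.0  # mean of the trace, in A^2
    return 8.0 * math.pi ** 2 * u_eq


# An isotropic atom with U = 0.25 A^2 stored as 2500 in each diagonal term.
print(round(b_iso_from_anisou((2500, 2500, 2500, 0, 0, 0)), 2))
```

The value obtained this way is the total B-factor (B_tls + B_individual); separating the two components still requires the TLS records from the PDB header, as described above.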

Refinement + water picking:

% phenix.refine data.hkl model.pdb ordered_solvent=true

This will perform new water picking, analysis of existing waters, and refinement of individual coordinates and B-factors for both the macromolecule and the waters. Several cycles will be performed, allowing spurious waters to be sorted out and well-placed ones to be refined. This protocol allows ordered solvent picking in one run.

Note: as a technical limitation, water picking cannot be performed in the same refinement run as TLS, rigid body or group B-factor refinement.

A powerful protocol is the combination of water picking with Simulated Annealing refinement. This can be performed in one refinement run:

% phenix.refine data.hkl model.pdb ordered_solvent=true simulated_annealing=true

Refining only coordinates:

% phenix.refine data.hkl model.pdb strategy=individual_sites

Refining only isotropic B-factors:

% phenix.refine data.hkl model.pdb strategy=individual_adp

Occupancy refinement

Refinement of occupancies for all atoms in the model:

% phenix.refine data.hkl model.pdb strategy=individual_occupancies

Refinement of occupancies for selected atoms only:

% phenix.refine data.hkl model.pdb strategy=individual_occupancies \
  refine.occupancies.individual="chain A and resid 2 and name ca"

In the example above, only the occupancy of the CA atom of residue 2 in chain A will be refined.

Refinement of one occupancy factor per selected group of atoms (group occupancy refinement):

% phenix.refine data.hkl model.pdb strategy=group_occupancies \
  refine.occupancies.group="chain A and resid 1"

group_anomalous refinement

If the structure contains anomalous scatterers (e.g. Se in a SAD or MAD experiment), and if anomalous data are available, it is possible to refine the dispersive (f') and anomalous (f") scattering contributions (see e.g. Ethan Merritt's tutorial for more information). In phenix.refine, each group of scatterers with common f' and f" values is defined via an anomalous_scatterers scope, e.g.:

refinement.refine.anomalous_scatterers {
  group {
    selection = name BR
    f_prime = 0
    f_double_prime = 0
    refine = *f_prime *f_double_prime
  }
}

NOTE: The refinement of the f' and f" values is carried out only if group_anomalous is included under refine.strategy! Otherwise the values are simply used as specified but not refined.

If required, multiple scopes can be specified, one for each unique pair of f' and f" values. These values are assigned to all selected atoms (see below for atom selection details). Often it is possible to start the refinement from zero. If the refinement is not stable, it may be necessary to start from better estimates, or even to fix some values. For example:

refinement.refine.anomalous_scatterers {
  group {
    selection = name BR
    f_prime = -5
    f_double_prime = 2
    refine = f_prime *f_double_prime
  }
}

Here f' is fixed at -5 (note the missing * in front of f_prime in the refine definition), and the refinement of f" is initialized at 2.

The cctbx.form_factor_query command is available for obtaining estimates of f' and f" given an element type and a wavelength, e.g.:

% cctbx.form_factor_query element=Br wavelength=0.8

Information from Sasaki table about Br (Z = 35) at 0.8 A
fp:  -1.0333
fdp: 2.9928

Run cctbx.form_factor_query without arguments for usage information.

Unrestrained refinement

At subatomic resolution one often needs to perform unrestrained refinement. A couple of examples of how to do so are given below.

Unrestrained refinement of individual coordinates:

% phenix.refine data.hkl model.pdb strategy=individual_sites target_weights.wc=0

This assigns the contribution of the geometry restraints target to zero. However, it is still calculated for statistics output.

Unrestrained refinement of individual ADPs:

% phenix.refine data.hkl model.pdb strategy=individual_adp target_weights.wu=0

This assigns the contribution of the ADP restraints target to zero. However, it is still calculated for statistics output.

Useful options

Changing the number of refinement cycles and minimizer iterations:

% phenix.refine data.hkl model.pdb main.number_of_macro_cycles=5 \
  main.max_number_of_iterations=20

Creating R-free flags (if not present in the input reflection files):

% phenix.refine data.hkl model.pdb main.generate_r_free_flags=True

Modifying the automatically determined target weight for coordinate refinement:

% phenix.refine data.hkl model.pdb wxc_scale=5

Note: the default value for wxc_scale is 0.5. Increasing wxc_scale will make the X-ray target contribution greater (the automatically calculated weight is multiplied by wxc_scale).

Note: wxc_scale=0 will completely exclude the experimental data from the refinement resulting in idealization of the stereochemistry. For stereochemistry idealization use the separate command:

% phenix.geometry_minimization [options] pdb_file [output_pdb_file]

Modifying the automatically determined target weight for ADP refinement:

% phenix.refine data.hkl model.pdb wxu_scale=3

Note: the default value for wxu_scale is 1.0. Increasing wxu_scale will make the X-ray target contribution greater and therefore the B-factor restraints weaker.

Specify the name for output files:

% phenix.refine data.hkl model.pdb output.prefix=lysozyme

Reflection output

At the end of refinement a reflection file with Fobs, Fmodel, Fcalc, Fmask, fom, alpha, beta, R-free_flags and resolution can be written out, here in MTZ format:

% phenix.refine data.hkl model.pdb output.prefix=lysozyme \
  export_final_f_model=mtz

Note: Fmodel is the total model structure factors including all scales:

Fmodel = scale_k1 * exp(-h*U_overall*ht) * (Fcalc + k_sol * exp(-B_sol*s^2) * Fmask)
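
For a single reflection the expression above can be evaluated directly. The sketch below is only an illustration: the function `f_model` and its argument names are hypothetical, U_overall is taken as a 3x3 matrix in units compatible with the Miller indices, and s^2 is passed in as given because its exact definition is not stated here.

```python
import math


def f_model(f_calc, f_mask, hkl, u_overall, scale_k1, k_sol, b_sol, s_sq):
    """Total model structure factor for one reflection.

    f_calc, f_mask: complex structure factors of the atomic model and the mask.
    hkl: Miller indices (h, k, l); u_overall: 3x3 nested list.
    """
    # Quadratic form h * U_overall * h^T for the anisotropic scale.
    huh = sum(hkl[i] * u_overall[i][j] * hkl[j]
              for i in range(3) for j in range(3))
    aniso_scale = math.exp(-huh)
    solvent_scale = k_sol * math.exp(-b_sol * s_sq)
    return scale_k1 * aniso_scale * (f_calc + solvent_scale * f_mask)


zero_u = [[0.0] * 3 for _ in range(3)]
# With no anisotropy and B_sol = 0 this reduces to Fcalc + k_sol * Fmask.
print(f_model(10 + 5j, 2 - 1j, (1, 2, 3), zero_u, 1.0, 0.35, 0.0, 0.01))
```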

Setting the resolution range for the refinement:

% phenix.refine data.hkl model.pdb main.low_resolution=15.0 \
  main.high_resolution=2.0

Bulk solvent correction and anisotropic scaling only:

% phenix.refine data.hkl model.pdb strategy=none

Default refinement with user specified X-ray target function:

% phenix.refine data.hkl model.pdb main.target=ls

Manually setting the target weight for coordinate refinement (instead of using the default automatic choice):

% phenix.refine data.hkl model.pdb target_weights.fix_wxc=10

Manually setting the target weight for ADP refinement (instead of using the default automatic choice):

% phenix.refine data.hkl model.pdb target_weights.fix_wxu=10

Modifying the initial model before refinement starts

Randomly shake atomic coordinates:

% phenix.refine data.hkl model.pdb sites.shake=0.3

Randomize isotropic B-factors:

% phenix.refine data.hkl model.pdb adp.randomize=true

Set all B-factors to given value:

% phenix.refine data.hkl model.pdb adp.set_b_iso_to=25

Set all B-factors to Wilson B:

% phenix.refine data.hkl model.pdb adp.set_biso_to_wilson_b=true

To write out the modified model (without any refinement), add: main.number_of_macro_cycles=0, e.g.:

% phenix.refine data.hkl model.pdb adp.set_b_iso_to=25 \
  main.number_of_macro_cycles=0

Turn OFF/ON stereochemical restraints:

% phenix.refine data.hkl model.pdb target_weights.wc=0

Turn OFF/ON restraints on isotropic B-factors:

% phenix.refine data.hkl model.pdb target_weights.wu=0

Refinement using FFT (fast, default) or direct (slow) structure factor calculation algorithm:

% phenix.refine data.hkl model.pdb main.sf_algorithm=fft

or:

% phenix.refine data.hkl model.pdb main.sf_algorithm=direct

phenix.refine for the refinement of twinned data

phenix.refine can handle the refinement of hemihedrally twinned data (two twin domains). Least-squares twin refinement can be carried out using the following command-line instruction:

% phenix.refine data.hkl model.pdb twinning=True twin_law="-k,-h,-l"

The twin law (in this case -k,-h,-l) can be obtained from phenix.xtriage. If more than one twin law is possible for the given unit cell and space group, phenix.twin_map_utils may give clues as to which twin law is the most likely candidate to be used in refinement.

Other twinning options are defined in the following scope:

refinement.twinning{
  twin_law = None
  detwin{
    mode = algebraic proportional *auto
    local_scaling = False
    map_types{
      twofofc = *two_m_dtfo_d_fc two_dtfo_fc
      fofc = m_dtfo_d_fc *gradient m_gradient
      aniso_correct = False
    }
  }
}

At present, it is best to leave the map types alone, although correcting for anisotropy might be useful (detwin.map_types.aniso_correct=True).

The detwinning mode is auto by default: it will perform algebraic detwinning for twin fractions below 40%, and detwinning using proportionality rules (SHELXL style) for twin fractions above 40%.
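
To make the auto rule concrete, here is a minimal sketch of the mode choice together with textbook algebraic detwinning for two twin domains. It is not the program's code: the function names are hypothetical, the behaviour at exactly 40% is an assumption (the text above only covers "below" and "above"), and the proportional mode is not implemented here.

```python
def choose_detwin_mode(twin_fraction, threshold=0.40):
    """Mimic mode=auto: algebraic below the threshold, proportional otherwise."""
    return "algebraic" if twin_fraction < threshold else "proportional"


def detwin_algebraic(j1, j2, alpha):
    """Recover untwinned intensities from a twin-related pair of observations.

    The twinned observations are modelled as
        j1 = (1 - alpha) * i1 + alpha * i2
        j2 = alpha * i1 + (1 - alpha) * i2
    and this function inverts that 2x2 system.
    """
    if not 0.0 <= alpha < 0.5:
        raise ValueError("algebraic detwinning needs 0 <= alpha < 0.5")
    denom = 1.0 - 2.0 * alpha  # goes to zero as alpha approaches 0.5
    i1 = ((1.0 - alpha) * j1 - alpha * j2) / denom
    i2 = ((1.0 - alpha) * j2 - alpha * j1) / denom
    return i1, i2


alpha = 0.2
print(choose_detwin_mode(alpha))
print(detwin_algebraic(88.0, 52.0, alpha))
```

The division by 1 - 2*alpha amplifies errors as the twin fraction approaches 50%, which is consistent with switching away from the algebraic mode at high twin fractions.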

Please note that the gradient maps (fofc=gradient) are detwinned by nature: no detwinning is needed. At this point in time, the gradient maps are the best choice for picking waters or building missing ligands.

An important point to stress is that phenix.refine will only deal properly with twinning that involves two twin domains.

Using phenix.refine to calculate structure factors

Use phenix.refine as a structure factor calculator

Calculate Fcalc from atomic model (no solvent modeling or scaling):

% phenix.refine data.hkl model.pdb main.number_of_macro_cycles=0 \
  main.bulk_solvent_and_scale=false

Calculate Fcalc from atomic model including bulk solvent and all scales:

% phenix.refine data.hkl model.pdb main.number_of_macro_cycles=1 \
  strategy=none

Note: the number of calculated structure factors will be exactly the same as the number of observed data (Fobs) provided in the input reflection files.

Note: If desired, resolution limits can be applied, for example:

% phenix.refine data.hkl model.pdb main.number_of_macro_cycles=1 \
  main.low_resolution=15.0 main.high_resolution=2.0

CIF modifications and links

phenix.refine uses the CCP4 monomer library to build geometry restraints (bond, angle, dihedral, chirality and planarity restraints). The CCP4 monomer library comes with a set of "modifications" and "links" which are defined in the file mon_lib_list.cif. Some of these are used automatically when phenix.refine builds the geometry restraints (e.g. the peptide and RNA/DNA chain links). Other links and modifications have to be applied manually, e.g.:

refinement.pdb_interpretation.apply_cif_modification
{
  data_mod = 5pho
  residue_selection = resname GUA and name O5T
}

Here a custom 5pho modification is applied to all GUA residues with an O5T atom. That is, a single apply_cif_modification block can apply the modification to multiple residues. The CIF modification is supplied as a separate file on the phenix.refine command line, e.g.:

data_mod_5pho
#
loop_
_chem_mod_atom.mod_id
_chem_mod_atom.function
_chem_mod_atom.atom_id
_chem_mod_atom.new_atom_id
_chem_mod_atom.new_type_symbol
_chem_mod_atom.new_type_energy
_chem_mod_atom.new_partial_charge
 5pho     add      .      O5T    O    OH      .
loop_
_chem_mod_bond.mod_id
_chem_mod_bond.function
_chem_mod_bond.atom_id_1
_chem_mod_bond.atom_id_2
_chem_mod_bond.new_type
_chem_mod_bond.new_value_dist
_chem_mod_bond.new_value_dist_esd
 5pho     add      O5T     P         coval        1.520    0.020

Similarly, a link can be applied like this:

refinement.pdb_interpretation.apply_cif_link {
  data_link = MAN-THR
  residue_selection_1 = chain A and resname MAN and resid 900
  residue_selection_2 = chain A and resname THR and resid 42
}

The residue selections for links must select exactly one residue each. The MAN-THR link is pre-defined in mon_lib_list.cif. Custom links can be supplied as additional files on the phenix.refine command line. See mon_lib_list.cif for examples. The full path to this file can be obtained with the command:

phenix.where_mon_lib_list_cif

All apply_cif_modification and apply_cif_link definitions are included in the .def files, so it is not necessary to specify them again when further refinement runs are started from .def files.

Note that all LINK, SSBOND, HYDBND, SLTBRG and CISPEP records in the input PDB files are ignored.

Definition of custom bonds and angles

Most geometry restraints (bonds, angles, etc.) are generated automatically based on the CCP4 monomer library. Additional custom bond and angle restraints, e.g. between protein and a ligand or ion, can be specified in this way:

refinement.geometry_restraints.edits {
  zn_selection = chain A and resname ZN and resid 200 and name ZN
  his117_selection = chain A and resname HIS and resid 117 and name NE2
  asp130_selection = chain A and resname ASP and resid 130 and name OD1
  bond {
    action = *add
    atom_selection_1 = $zn_selection
    atom_selection_2 = $his117_selection
    distance_ideal = 2.1
    sigma = 0.02
  }
  bond {
    action = *add
    atom_selection_1 = $zn_selection
    atom_selection_2 = $asp130_selection
    distance_ideal = 2.1
    sigma = 0.02
  }
  angle {
    action = *add
    atom_selection_1 = $his117_selection
    atom_selection_2 = $zn_selection
    atom_selection_3 = $asp130_selection
    angle_ideal = 109.47
    sigma = 5
  }
}

The atom selections must uniquely select a single atom. Save the geometry_restraints.edits to a file and specify the file name as an additional argument when running phenix.refine for the first time. The edits are included in the .def files, so it is not necessary to specify them again when further refinement runs are started from .def files.
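
To get a feeling for how distance_ideal and sigma act together, the sketch below evaluates a simple harmonic bond restraint, weight * (ideal - model)^2 with weight = 1/sigma^2. This functional form is a common convention and is assumed here rather than taken from this document; the function name `bond_residual` and the coordinates are made up for illustration.

```python
import math


def bond_residual(xyz_1, xyz_2, distance_ideal, sigma):
    """Return (model distance, harmonic restraint residual) for two atoms."""
    distance_model = math.dist(xyz_1, xyz_2)  # requires Python 3.8 or later
    delta = distance_ideal - distance_model
    weight = 1.0 / sigma ** 2  # a smaller sigma gives a tighter restraint
    return distance_model, weight * delta ** 2


# Hypothetical Zn and NE2 positions 2.2 A apart, restrained to 2.1 A.
zn = (0.0, 0.0, 0.0)
ne2 = (0.0, 0.0, 2.2)
print(bond_residual(zn, ne2, distance_ideal=2.1, sigma=0.02))
```

With sigma = 0.02 a deviation of only 0.1 A already gives a large residual, so under this assumed form the values in the example above describe a tight restraint.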

Atom selection examples

All C-alpha atoms (not case sensitive):

name ca

All atoms with H in the name (* is a wildcard character):

name *H*

Atom names with * (backslash disables wildcard function):

name o2\*

Atom names with spaces:

name 'O 1'

Atom names with primes don't necessarily have to be quoted:

name o2'

Boolean and, or and not:

resname ALA and (name ca or name c or name n or name o)
chain a and not altid b
resid 120 and icode c and model 2
segid a and element c and charge 2+ and anisou

Residue 188:

resseq 188

resid is a synonym for resseq:

resid 188

All residues from 188 to the end (including 188):

resseq 188:

Alternative to the previous:

resseq 188-

All residues from the beginning to 188 (including 188):

resseq :188
resseq -188

Residues 2 through 10 (including 2 and 10):

resseq 2:10
resseq 2-10
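
The range forms listed above (188:, 188-, :188, -188, 2:10, 2-10) can be summarised in a toy parser. This is not the PHENIX selection parser: the function names are hypothetical, and negative residue numbers are only handled with the colon form because a leading or embedded minus sign is ambiguous with the dash separator.

```python
def parse_resseq_range(text):
    """Return (first, last) for a resseq argument; None means open-ended."""
    text = text.strip()
    for sep in (":", "-"):  # try the colon first so "-5:10" keeps its sign
        if sep in text:
            first, last = text.split(sep, 1)
            return (int(first) if first else None,
                    int(last) if last else None)
    value = int(text)  # a single residue number
    return (value, value)


def in_range(resseq, bounds):
    """True if resseq lies within the inclusive bounds."""
    first, last = bounds
    return ((first is None or resseq >= first)
            and (last is None or resseq <= last))


for spec in ("188", "188:", "188-", ":188", "-188", "2:10", "2-10"):
    print(spec, parse_resseq_range(spec))
```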

"Smart" selections:

resname ALA and backbone
resname ALA and sidechain
peptide backbone
rna backbone or dna backbone
water or nucleotide
dna and not (phosphate or ribose)
within(5, (nucleotide or peptide) backbone)