Structure refinement in PHENIX

phenix.refine is run from the command line:

% phenix.refine <pdb-file(s)> <reflection-file(s)> <monomer-library-file(s)>

When you do this a number of things happen:

The program automatically generates a defaults file which contains all of the parameters for the job (for example if you provided lysozyme.pdb the file lysozyme_refine_001.eff will be generated). This is the set of input parameters for this run.
The program automatically interprets the reflection file(s). If there is an unambiguous choice of data arrays these will be used for the refinement. If there is a choice, you're given a message telling you how to select the arrays. Several reflection files can be provided as input, for example one containing Fobs and another containing the test/work flags.
Once the data arrays are determined, the program writes all of the data it will be using in the refinement to a new MTZ file, for example, lysozyme_refine_data.mtz. This makes it very easy to keep track of what you actually used in the refinement (instead of having the arrays spread across multiple files).
At the end of refinement the program generates:
1. a new PDB file, with the refined model, called for example lysozyme_refine_001.pdb;
2. a new reflection file, with the calculated structure factors, scale factors, figures of merit for each reflection etc., called for example lysozyme_001.mtz;
3. two maps: likelihood weighted mFo-DFc and 2mFo-DFc. These are in ASCII X-PLOR format. A reflection file with map coefficients is also generated for use in Coot or XtalView (e.g. lysozyme_refine_001_map_coeffs.mtz);
4. a new defaults file to run the next cycle of refinement, e.g. lysozyme_refine_002.def. This means you can run the next cycle of refinement by typing:
```
% phenix.refine lysozyme_refine_002.def
```
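The naming pattern in the examples above can be summarized in a few lines of Python. This is only a sketch for keeping track of files: the helper `expected_files` and its dictionary keys are hypothetical, and it covers only those outputs whose example names follow the `<name>_refine_<NNN>` pattern shown above.

```python
def expected_files(stem, serial):
    """List file names following the <stem>_refine_<NNN> pattern of the examples.

    `stem` is the input PDB name without extension (e.g. "lysozyme"),
    `serial` is the run number (1 for the first run).
    """
    run = f"{stem}_refine_{serial:03d}"
    next_run = f"{stem}_refine_{serial + 1:03d}"
    return {
        "parameters_used": run + ".eff",             # parameters of this run
        "data_used": f"{stem}_refine_data.mtz",      # data arrays used in refinement
        "refined_model": run + ".pdb",               # refined model
        "map_coefficients": run + "_map_coeffs.mtz", # for Coot or XtalView
        "next_cycle_parameters": next_run + ".def",  # input for the next cycle
    }


for role, name in expected_files("lysozyme", 1).items():
    print(f"{role}: {name}")
```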

To get information about command line options type:

% phenix.refine --help

To have the program generate the input defaults file without running the refinement job (i.e. so you can modify the inputs prior to running the job):

% phenix.refine --dry_run <pdb-file> <reflection-file(s)>

If you know the parameter that you want to change you can override it from the command line:

% phenix.refine data.hkl model.pdb main.low_resolution=8.0 \
  simulated_annealing.start_temperature=5000

Note that you don't have to specify the full parameter name. What you specify on the command line is matched against all known parameter names, and the best substring match is used if it is unique.
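The idea of unique substring matching can be illustrated with a short Python sketch. It is not the code used by phenix.refine: the function `resolve` is hypothetical, the list of names is a small subset taken from the examples in this document, and the sketch accepts a partial name only when exactly one known name contains it (any ranking of "best" matches is not modelled).

```python
# A few full parameter names that appear in this document (not the complete list).
KNOWN_PARAMETERS = [
    "refinement.main.low_resolution",
    "refinement.main.high_resolution",
    "refinement.main.number_of_macro_cycles",
    "refinement.simulated_annealing.start_temperature",
    "refinement.target_weights.wxc_scale",
    "refinement.target_weights.wxu_scale",
]


def resolve(partial, known=KNOWN_PARAMETERS):
    """Return the single known name containing `partial`, else raise KeyError."""
    hits = [name for name in known if partial in name]
    if len(hits) == 1:
        return hits[0]
    if not hits:
        raise KeyError(f"no parameter matches {partial!r}")
    raise KeyError(f"{partial!r} is ambiguous: {', '.join(hits)}")


print(resolve("main.low_resolution"))
print(resolve("start_temperature"))
```

In this sketch `resolve("resolution")` fails because it matches both the low and the high resolution limit, which is why a longer fragment such as `main.low_resolution` is used on the command line above.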

To rerun a job that was previously run:

% phenix.refine --overwrite lysozyme_refine_001.def

The --overwrite option allows the program to overwrite existing files. By default the program will not overwrite existing files - just in case this would remove the results of a refinement job that took a long time to finish.

To see all default parameters:

% phenix.refine --show_defaults=all

Current features

Restrained coordinate refinement
Restrained isotropic ADP refinement
Rigid body refinement
Bulk solvent correction (Flat model using mask) and anisotropic scaling
Simulated Annealing refinement
Make use of multiple refinement and scale target functions: least-squares (ls), maximum-likelihood (ml), phased maximum-likelihood (mlhl)
FFT (like CNS, Refmac) and direct summation (like SHELX) based refinement
Various electron density map (including likelihood-weighted) calculations
Simple structure factors calculation (with or without bulk solvent and scaling)
Combined automatic ordered solvent building, update and refinement
Complete model and data statistics output (including twinning analysis, Wilson B calculation and much more)
Group isotropic ADP refinement
TLS refinement
Combined TLS, coordinate and individual isotropic ADP refinement

Current limitations

No restrained individual anisotropic ADP refinement

Running phenix.refine

Refinement with default parameters:

% phenix.refine data.hkl model.pdb

By default this will perform coordinate refinement and restrained isotropic ADP refinement. Three macrocycles will be executed, each consisting of bulk solvent correction, anisotropic scaling of the data, coordinate refinement (25 iterations of the LBFGS minimizer) and ADP refinement (25 iterations of the LBFGS minimizer). At the end the updated coordinates, maps, map coefficients, and statistics are output.
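The order of operations described above can be written out as a small Python sketch. It only lists the steps and runs no refinement; the function name `default_schedule` is hypothetical, and using the parameter names number_of_macro_cycles and max_number_of_iterations for the two counts is an assumption based on the options shown later in this document.

```python
def default_schedule(number_of_macro_cycles=3, max_number_of_iterations=25):
    """Return the steps of a default run as (cycle, step, iterations) tuples."""
    steps = []
    for cycle in range(1, number_of_macro_cycles + 1):
        steps.append((cycle, "bulk solvent correction", None))
        steps.append((cycle, "anisotropic scaling", None))
        steps.append((cycle, "coordinate refinement (LBFGS)", max_number_of_iterations))
        steps.append((cycle, "ADP refinement (LBFGS)", max_number_of_iterations))
    return steps


for cycle, step, iterations in default_schedule():
    suffix = "" if iterations is None else f" [{iterations} iterations]"
    print(f"macrocycle {cycle}: {step}{suffix}")
```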

Giving parameters on the command line or in files

In phenix.refine, parameters that control refinement can be given by the user on the command line, for example:

% phenix.refine data.hkl model.pdb simulated_annealing=true

However, sometimes the number of parameters is large enough to make it difficult to type them all in on the command line, for example:

% phenix.refine data.hkl model.pdb refine_tls=true tls.selection="chain A" \
  tls.selection="chain B" main.number_of_macro_cycles=10 \
  main.high_resolution=2.5 wxc_scale=3 wxu_scale=5 \
  output.prefix=my_best_model \
  simulated_annealing.start_temperature=5000

The same result can be achieved by using (for example):

% phenix.refine data.hkl model.pdb my_parameters

where the my_parameters file contains the following lines:

refinement.main.refine_tls=true
refinement.tls.selection="chain A"
refinement.tls.selection="chain B"
refinement.main.number_of_macro_cycles=10
refinement.main.high_resolution=2.5
refinement.target_weights.wxc_scale=3
refinement.target_weights.wxu_scale=5
refinement.output.prefix=my_best_model
refinement.simulated_annealing.start_temperature=5000

which can also be formatted by grouping the parameters under the relevant scopes:

refinement.main {
   refine_tls=true
   number_of_macro_cycles=10
   high_resolution=2.5
}
refinement.tls {
   selection="chain A"
   selection="chain B"
}

refinement.target_weights {
   wxc_scale=3
   wxu_scale=5
}
refinement.output.prefix=my_best_model
refinement.simulated_annealing.start_temperature=5000

The easiest way to create a file like the my_parameters file is to generate a template file containing all parameters by using the command phenix.refine --show_defaults=all and then take the parameters that you want to use.
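That the scoped layout and the one-parameter-per-line layout carry the same information can be checked with a short Python sketch. It is not the parser used by phenix.refine: the function `flatten` is hypothetical and handles only the simple layout used in this document (one `name=value` per line, a scope name followed by `{` on the same or the next line, and `}` on a line of its own).

```python
def flatten(text):
    """Flatten a simple scoped parameter file into dotted name=value lines."""
    scopes = []     # stack of currently open scope names
    pending = None  # scope name still waiting for "{" on the next line
    flat = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "{":
            if pending is None:
                raise ValueError("'{' without a preceding scope name")
            scopes.append(pending)
            pending = None
        elif line.endswith("{"):
            scopes.append(line[:-1].strip())
        elif line == "}":
            scopes.pop()
        elif "=" in line:
            name, value = line.split("=", 1)
            flat.append(".".join(scopes + [name.strip()]) + "=" + value.strip())
        else:
            pending = line  # a bare scope name; "{" is expected on the next line
    return flat


SCOPED = """
refinement.main {
   refine_tls=true
   number_of_macro_cycles=10
   high_resolution=2.5
}
refinement.tls {
   selection="chain A"
   selection="chain B"
}
refinement.target_weights {
   wxc_scale=3
   wxu_scale=5
}
refinement.output.prefix=my_best_model
refinement.simulated_annealing.start_temperature=5000
"""

for line in flatten(SCOPED):
    print(line)
```

The printed lines are the same nine assignments as in the first version of my_parameters, in a slightly different order.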

Refinement scenarios

Refining group isotropic B-factors:

% phenix.refine data.hkl model.pdb refine_adp_individual=false \
  refine_adp_group=true

Refinement with Simulated Annealing:

% phenix.refine data.hkl model.pdb simulated_annealing=true

Rigid body refinement

One rigid body group (whatever is in the PDB file is refined as a single rigid body):

% phenix.refine data.hkl model.pdb refine_site=false \
  refine_adp_individual=false rigid_body=true

Multiple groups (requires a basic knowledge of the PHENIX atom selection language, see below):

% phenix.refine data.hkl model.pdb main.refine_site=false \
  main.refine_adp_individual=false main.rigid_body=true \
  rigid_body.selection="chain A" rigid_body.selection="chain B"

Alternatively, one can create a parameter file, for example, my_parameters, containing the following lines:

refinement.rigid_body
{
  selection = chain A
  selection = chain B
}

Files like this can be created, for example, by copy-and-paste from the total list of parameters.
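For models with many groups, such a file can also be written by a short script. The following Python sketch is one possible way to do it; the helper `selection_block` is hypothetical, and it produces only the simple `scope { selection = ... }` layout used in this document.

```python
def selection_block(scope, selections):
    """Format a parameter block with one 'selection = ...' line per group."""
    lines = [scope + " {"]
    lines += ["  selection = " + selection for selection in selections]
    lines.append("}")
    return "\n".join(lines) + "\n"


# One rigid body group per chain.
block = selection_block("refinement.rigid_body", ["chain A", "chain B"])
print(block, end="")

# To save it as a parameter file:
# with open("my_parameters", "w") as handle:
#     handle.write(block)
```

The same helper can be used with other scopes that take a list of selections, for example refinement.tls.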

Note: only the selected parts of the model will be refined (the rest will remain fixed).

Combined: rigid body refinement + standard refinement (individual coordinates and ADP):
```
% phenix.refine data.hkl model.pdb my_parameters main.rigid_body=true
```

Using NCS restraints in standard refinement

Create a model.ncs file with the NCS selection:

refinement.ncs_restraint_group {
  reference = chain A
  selection = chain B
  selection = chain C
  selection = chain D
}

Specify model.ncs as an additional input when running phenix.refine:

% phenix.refine data.hkl model.pdb model.ncs

The NCS selections are incorporated in the model_refinement_#.eff and model_refinement_#.def files. This means you don't have to remember to specify model.ncs for future refinement cycles if you use the .def files.

Multiple independent refinement.ncs_restraint_group selection blocks are allowed:

refinement.ncs_restraint_group {
  reference = chain A
  selection = chain B
}
refinement.ncs_restraint_group {
  reference = chain C
  selection = chain D
}

Note: gaps in selected sequences are allowed - a sequence alignment is performed to detect insertions or deletions.

TLS refinement

To refine TLS parameters along with the coordinates and individual ADPs:

% phenix.refine data.hkl model.pdb refine_tls=true my_parameters

where, as for rigid body refinement, the selection for TLS groups has been made in a user-created parameter file (my_parameters) as follows:

refinement.tls {
  selection = chain A
  selection = chain B
}

Note: TLS parameters will be refined only for the selected fragments.

Note: in the absence of any selections TLS parameters, coordinates and individual ADPs will be refined for the whole model as a single group.

To run TLS refinement only (without coordinate and individual ADP refinement):

% phenix.refine data.hkl model.pdb refine_tls=true refine_site=false \
  refine_adp_individual=false

Refinement + water picking:

% phenix.refine data.hkl model.pdb ordered_solvent=true

Refining only coordinates:

% phenix.refine data.hkl model.pdb refine_adp_individual=false

Refining only isotropic B-factors:

% phenix.refine data.hkl model.pdb refine_site=false

Useful options

Changing the number of refinement cycles and minimizer iterations:

% phenix.refine data.hkl model.pdb main.number_of_macro_cycles=5 \
  main.max_number_of_iterations=20

Creating R-free flags (if not present in the input reflection files):

% phenix.refine data.hkl model.pdb main.generate_r_free_flags=True

Modifying the automatically determined target weight for coordinate refinement:

% phenix.refine data.hkl model.pdb wxc_scale=5

Note: the default value for wxc_scale is 0.5. Increasing wxc_scale will make the X-ray target contribution greater (the automatically calculated weight is multiplied by wxc_scale).

Note: wxc_scale=0 will completely exclude the experimental data from the refinement resulting in idealization of the stereochemistry. For stereochemistry idealization use the separate command:

% phenix.geometry_minimization [options] pdb_file [output_pdb_file]

Modifying the automatically determined target weight for ADP refinement:

% phenix.refine data.hkl model.pdb wxu_scale=3

Note: the default value for wxu_scale is 1.0. Increasing wxu_scale will make the X-ray target contribution greater and therefore the B-factor restraints weaker.
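The effect of the scale factors can be illustrated with a few lines of Python. This is a sketch under an assumption that is not stated explicitly above: the total target is taken to be a weighted sum of an X-ray term and a restraint term, with the X-ray weight equal to the automatically calculated weight multiplied by wxc_scale (or wxu_scale). The function `total_target` and all numbers are hypothetical.

```python
def total_target(e_xray, e_restraints, auto_weight, scale, restraint_weight):
    """Weighted sum of X-ray and restraint terms (assumed form, see text).

    auto_weight      -- automatically calculated X-ray weight
    scale            -- wxc_scale (coordinates) or wxu_scale (ADP)
    restraint_weight -- weight on the stereochemistry or ADP restraints
    """
    xray_weight = auto_weight * scale  # automatic weight multiplied by the scale
    return xray_weight * e_xray + restraint_weight * e_restraints


# Made-up numbers: X-ray term 100, restraint term 10, automatic weight 2.
print(total_target(100.0, 10.0, auto_weight=2.0, scale=0.5, restraint_weight=1.0))
print(total_target(100.0, 10.0, auto_weight=2.0, scale=5.0, restraint_weight=1.0))
print(total_target(100.0, 10.0, auto_weight=2.0, scale=0.0, restraint_weight=1.0))
```

With scale=0 only the restraint term is left, which corresponds to the idealization case mentioned in the note on wxc_scale=0.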

Specify the name for output files:

% phenix.refine data.hkl model.pdb output.prefix=lysozyme

Reflection output file format

At the end of refinement a file with FOBS, FMODEL, FOM, ALPHA, BETA, CV_FLAGS is written out. The default format for this file is MTZ. To write this data as a plain text file:

% phenix.refine data.hkl model.pdb output.prefix=lysozyme \
  write_refined_hkl_file=True

Note: FMODEL contains the total model structure factors including all scales:

FMODEL = k_overall * exp(-h*U_overall*ht) * (F_atoms + k_sol * exp(-B_sol*s^2) * F_mask)
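For a single reflection the formula can be evaluated as in the following Python sketch. It takes the expression literally as written above; reading ht as the transpose of h, the conventions for U_overall and s^2, and the function name `f_model` are assumptions of this sketch, and the input numbers are made up.

```python
import math


def f_model(f_atoms, f_mask, h, u_overall, s_sq, k_overall, k_sol, b_sol):
    """Evaluate FMODEL for one reflection.

    f_atoms, f_mask -- complex structure factors of the atomic model and the mask
    h               -- Miller index as a sequence of three numbers
    u_overall       -- 3x3 overall anisotropic scale matrix (nested sequences)
    s_sq            -- s^2 for this reflection
    """
    # Quadratic form h * U_overall * ht
    huh = sum(h[i] * u_overall[i][j] * h[j] for i in range(3) for j in range(3))
    anisotropic_scale = math.exp(-huh)
    bulk_solvent = k_sol * math.exp(-b_sol * s_sq) * f_mask
    return k_overall * anisotropic_scale * (f_atoms + bulk_solvent)


zero_u = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(f_model(1 + 1j, 2 + 0j, (1, 2, 3), zero_u, s_sq=0.01,
              k_overall=2.0, k_sol=0.5, b_sol=0.0))
```

With k_sol=0 and U_overall set to zero the result reduces to k_overall * F_atoms, i.e. no bulk solvent contribution and isotropic scaling only.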

Setting the resolution range for the refinement:

% phenix.refine data.hkl model.pdb main.low_resolution=15.0 \
  main.high_resolution=2.0

Bulk solvent correction and anisotropic scaling only:

% phenix.refine data.hkl model.pdb refine_site=false \
  refine_adp_individual=false

Default refinement with user specified X-ray target function:

% phenix.refine data.hkl model.pdb main.target=ls

Manually setting the target weight for coordinate refinement (instead of using the default automatic choice):

% phenix.refine data.hkl model.pdb target_weights.fix_wxc=10

Manually setting the target weight for ADP refinement (instead of using the default automatic choice):

% phenix.refine data.hkl model.pdb target_weights.fix_wxu=10

Modifying the initial model before refinement starts

Randomly shake atomic coordinates:

% phenix.refine data.hkl model.pdb main.shake_start_model=0.3

Randomize isotropic B-factors:

% phenix.refine data.hkl model.pdb adp.iso.set_biso_random=true

Set all B-factors to given value:

% phenix.refine data.hkl model.pdb adp.iso.set_biso=25

Set all B-factors to Wilson B:

% phenix.refine data.hkl model.pdb adp.iso.set_biso_to_wilson_b=true

To write out the modified model (without any refinement), add: main.number_of_macro_cycles=0, e.g.:

% phenix.refine data.hkl model.pdb adp.iso.set_biso=25 \
  main.number_of_macro_cycles=0

Turn OFF/ON stereochemical restraints:

% phenix.refine data.hkl model.pdb target_weights.wc=0

Turn OFF/ON restraints on isotropic B-factors:

% phenix.refine data.hkl model.pdb target_weights.wu=0

Refinement using FFT (fast, default) or direct (slow) structure factors calculation algorithm:

% phenix.refine data.hkl model.pdb main.sf_algorithm=fft

or:

% phenix.refine data.hkl model.pdb main.sf_algorithm=direct

Using phenix.refine to calculate structure factors

Use phenix.refine as a structure factors calculator

Calculate Fcalc from atomic model (no solvent modeling or scaling):

% phenix.refine data.hkl model.pdb main.number_of_macro_cycles=0 \
  main.bulk_solvent_and_scale=False

Calculate Fcalc from atomic model including bulk solvent and all scales:

% phenix.refine data.hkl model.pdb main.number_of_macro_cycles=1 \
  main.refine_site=false main.refine_adp_individual=false

Note: the number of calculated structure factors will be exactly the same as the number of observed data (Fobs) provided in the input reflection files.

Note: If desired, resolution limits can be applied, for example:

% phenix.refine data.hkl model.pdb main.number_of_macro_cycles=1 \
  main.low_resolution=15.0 main.high_resolution=2.0

Atom selection examples

All C-alpha atoms (not case sensitive):

name ca

All atoms with H in the name (* is a wildcard character):

name *H*

Atom names with * (backslash disables wildcard function):

name o2\*

Atom names with spaces:

name 'O 1'

Atom names with primes don't necessarily have to be quoted:

name o2'

Boolean and, or and not:

resname ALA and (name ca or name c or name n or name o)
chain a and not altid b
resid 120 and icode c and model 2
segid a and element c and charge 2+ and anisou

Residue 188:

resseq 188

resid is a synonym for resseq:

resid 188

All residues from 188 to the end (including 188):

resseq 188:

Alternative to the previous:

resseq 188-

All residues from the beginning to 188 (including 188):

resseq :188
resseq -188

Residues 2 through 10 (including 2 and 10):

resseq 2:10
resseq 2-10
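The range forms above can be summarized in a small Python sketch that turns a range into inclusive bounds. It is not the PHENIX selection parser: the helpers `parse_resseq` and `in_range` are hypothetical, and negative residue numbers are not handled, since a leading "-" is read as an open lower bound.

```python
import re

# Optional start, a ":" or "-" separator, optional end.
_RANGE = re.compile(r"^(\d*)\s*([:-])\s*(\d*)$")


def parse_resseq(spec):
    """Return (low, high) inclusive bounds; None means an open end."""
    spec = spec.strip()
    if spec.isdigit():
        number = int(spec)
        return (number, number)
    match = _RANGE.match(spec)
    if match is None or not (match.group(1) or match.group(3)):
        raise ValueError(f"not a residue range: {spec!r}")
    low = int(match.group(1)) if match.group(1) else None
    high = int(match.group(3)) if match.group(3) else None
    return (low, high)


def in_range(resseq, spec):
    """True if residue number `resseq` falls inside the range `spec`."""
    low, high = parse_resseq(spec)
    return (low is None or resseq >= low) and (high is None or resseq <= high)


for spec in ["188", "188:", "188-", ":188", "-188", "2:10", "2-10"]:
    print(spec, parse_resseq(spec))
```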

"Smart" selections:

resname ALA and backbone
resname ALA and sidechain
peptide backbone
rna backbone or dna backbone
water or nucleotide
dna and not (phosphate or ribose)
within(5, (nucleotide or peptide) backbone)