Python-based Hierarchical ENvironment for Integrated Xtallography

Automated Model Building and Rebuilding using AutoBuild

Author(s)
Purpose
Purpose of the AutoBuild Wizard
Usage
How the AutoBuild Wizard works
Automation and user control
Core modules in the AutoBuild Wizard
How to run the AutoBuild Wizard
What the AutoBuild wizard needs to run
...and optional files
Specifying which columns of data to use from input data files
Specifying other general parameters
Picking waters in AutoBuild
Keeping waters from your input file in AutoBuild
Specifying phenix.refine parameters
Specifying resolve/resolve_pattern parameters
Including ligand coordinates in AutoBuild
Specifying arbitrary commands and cif files for phenix.refine
Output files from AutoBuild
Standard building, rebuild_in_place, and multiple-models
Parallel jobs, nproc, nbatch, number_of_parallel_models and how AutoBuild works in parallel
Model editing during rebuilding with the Coot-PHENIX interface
Resolution limits in AutoBuild
Examples
Run AutoBuild automatically after AutoSol
Run AutoBuild beginning with experimental data
Merge in hires data
Make a SA-omit map around atoms in target.pdb
Make a simple composite omit map
Make an iterative-build omit map around atoms in target.pdb
Make a sa-omit map around residues 3 and 4 in chain A of coords.pdb
Create one very good rebuilt model
Touch up a model
Create 20 very good rebuilt models that are as different as possible
Build an RNA chain
Build a DNA chain
Just make maps; don't do any building.
Just calculate a prime-and-switch map
Possible Problems
General limitations
Specific limitations and problems
Literature
Additional information
List of all AutoBuild keywords

Author(s)

  • AutoBuild Wizard: Tom Terwilliger
  • PHENIX GUI and PDS Server: Nigel W. Moriarty
  • phenix.refine: Ralf W. Grosse-Kunstleve, Peter Zwart and Paul D. Adams
  • RESOLVE: Tom Terwilliger
  • TEXTAL: Kreshna Gopal, Thomas Ioerger, Rita Pai, Tod Romo, James Sacchettini, Erik McKee, Lalji Kanbi
  • phenix.xtriage: Peter Zwart

Purpose

Purpose of the AutoBuild Wizard

The purpose of the AutoBuild Wizard is to provide a highly automated system for model rebuilding and completion. The Wizard design allows the user to specify data files and parameters through an interactive GUI, or alternatively through keyworded scripts. The AutoBuild Wizard begins with data files containing structure factor amplitudes and uncertainties, along with either experimental phase information or a starting model. It carries out cycles of model-building and refinement alternating with model-based density modification, and produces a relatively complete atomic model.

The AutoBuild Wizard uses RESOLVE (and optionally TEXTAL), xtriage and phenix.refine to build an atomic model, refine it, and improve it with iterative density modification, refinement, and model-building.

The Wizard begins with either experimental phases (e.g., from AutoSol) or with an atomic model that can be used to generate calculated phases. The AutoBuild Wizard produces a refined model that can be nearly complete if the data are strong and the resolution is about 2.5 A or better. At lower resolutions (2.5 - 3 A) the model may be less complete, and at resolutions worse than 3 A the model may be quite incomplete and not well refined.

The AutoBuild Wizard can be used to generate OMIT maps (simple omit, SA-omit, iterative-build omit) that can cover the entire unit cell or specific residues in a PDB file.

The AutoBuild Wizard can generate a set of models compatible with experimental data (multiple_models).

Usage

The AutoBuild Wizard can be run from the PHENIX GUI, from the command-line, and from keyworded script files. All three versions are identical except in the way that they take commands from the user. See Running a Wizard from a GUI, the command-line, or a script for details of how to run a Wizard. The command-line version will be described here.

How the AutoBuild Wizard works

The AutoBuild Wizard begins with experimental structure factor amplitudes, along with either experimental or model-based estimates of crystallographic phases. The phase information is improved by using statistical density modification to improve the correlation of NCS-related density in the map (if present) and to improve the match of the distribution of electron densities in the map with those expected from a model map. This improved map is then used to build and refine an atomic model.

In subsequent cycles, the models from previous cycles are used as a source of phase information in statistical density modification, iteratively improving the quality of the map used for model-building.

Additionally, during the first few cycles additional phase information is obtained by detecting and enhancing (1) the presence of commonly-found local patterns of density in the map, and (2) the presence of density in the shape of helices and strands. The final model obtained is analyzed for residue-based map correlation and density at the coordinates of individual atoms, and an analysis including a summary of atoms and residues that are in strong, moderate, or weak density and out of density is provided.

Automation and user control

The AutoBuild Wizard has been designed for ease of use combined with maximal user control. As many parameters as possible are set automatically by the Wizard, but they remain accessible to the user through a GUI and through keyword-based scripts. The Wizard uses the input/output routines of the cctbx library, which accept data files in many different formats, so that the user does not have to convert their data to any particular format before using the Wizard. Use of the phenix.refine refinement package in the AutoBuild Wizard allows a high degree of automation of refinement, so that neither the user nor the Wizard is required to specify parameters for refinement. The phenix.refine package automatically includes a bulk solvent model and automatically places solvent molecules.

Core modules in the AutoBuild Wizard

The five core modules in the AutoBuild Wizard are

  • (1) building a new model into an electron density map
  • (2) rebuilding an existing model
  • (3) refinement
  • (4) iterative model-building beginning from experimental phase information, and
  • (5) iterative model-building beginning from a model.

The standard procedures available in the AutoBuild Wizard that are based on these modules include:

  • (a) model-building and completion starting from experimental phases,
  • (b) rebuilding a model from scratch, with or without experimental phase information, and
  • (c) rebuilding a model in place, maintaining connectivity and sequence register.

Starting from a set of experimental phases and structure factor amplitudes, normally procedure (a) is carried out, and then the resulting model is rebuilt with procedure (b).

Starting from a model (e.g., from molecular replacement) and experimental structure factor amplitudes, procedure (c) is normally carried out if the starting model differs less than about 50% in sequence from the desired model, and otherwise procedure (b) is used.
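
The choice of procedure described above can be summarized in a few lines of Python. This is only an illustrative sketch, not the Wizard's actual code: the function name choose_procedures is hypothetical, and treating "about 50%" as a hard cutoff of 0.5 is an assumption.

```python
def choose_procedures(start, sequence_difference=0.0):
    """Return the letters of the procedures normally run, in order.

    start: "phases" (experimental phases) or "model" (starting model).
    sequence_difference: fraction (0-1) by which the starting model
    differs in sequence from the desired model (only used for "model").
    """
    if start == "phases":
        # Build and complete from experimental phases, then rebuild.
        return ["a", "b"]
    if start == "model":
        # Assumption: "about 50%" is treated as a sharp threshold here.
        if sequence_difference < 0.5:
            return ["c"]  # rebuild in place
        return ["b"]      # rebuild from scratch
    raise ValueError("start must be 'phases' or 'model'")


print(choose_procedures("phases"))
print(choose_procedures("model", sequence_difference=0.2))
print(choose_procedures("model", sequence_difference=0.7))
```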

How to run the AutoBuild Wizard

Running the AutoBuild Wizard is easy. For example, from the command-line you can type:

phenix.autobuild data=w1.sca seq.dat model=coords.pdb

The AutoBuild Wizard will carry out iterative model-building, density modification and refinement based on the data in w1.sca and the model in coords.pdb, editing the model as necessary to match the sequence in seq.dat.
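
If you launch many runs from a script, it can be convenient to assemble the command line in Python. The sketch below is one possible way to do this; the helper autobuild_command is hypothetical (it is not part of PHENIX) and it only uses keywords that appear on this page.

```python
import shlex


def autobuild_command(data, seq_file=None, model=None, **keywords):
    """Assemble a phenix.autobuild argument list from keyword=value pairs."""
    if seq_file is None and model is None:
        raise ValueError("supply a sequence file or a model (or both)")
    args = ["phenix.autobuild", f"data={data}"]
    if seq_file is not None:
        args.append(f"seq_file={seq_file}")
    if model is not None:
        args.append(f"model={model}")
    # Any further Wizard keywords are passed through unchanged.
    for name, value in keywords.items():
        args.append(f"{name}={value}")
    return args


cmd = autobuild_command("w1.sca", seq_file="seq.dat", model="coords.pdb",
                        resolution=3)
print(shlex.join(cmd))
# To launch the run (requires a PHENIX installation on the PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids a second layer of shell quoting.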

What the AutoBuild wizard needs to run

  • (1) a data file, optionally with phases and HL coeffs and freeR flag (w1.sca or data=w1.sca)
  • (2) a sequence file (seq.dat or seq_file=seq.dat) or a model (coords.pdb or model=coords.pdb)

...and optional files

  • (3) coefficients for a starting map (map_file=resolve.mtz)
  • (4) a file for refinement (refinement_file=exptl_fobs_freeR_flags.mtz)
  • (5) a high-resolution datafile (hires_file=high_res.sca)

Specifying which columns of data to use from input data files

If one or more of your data files has column names that the Wizard cannot identify automatically, you can specify them yourself. You will need to provide one column "name" for each expected column of data, with "None" for anything that is missing.

For example, if your data file ref.mtz has columns FP SIGFP and FreeR then you might specify

refinement_file=ref.mtz
input_refinement_labels="FP SIGFP None None None None None None FreeR"

The keywords for labels and anticipated input labels (program labels) are:

input_labels (for data file): FP SIGFP PHIB FOM HLA HLB HLC HLD FreeR_flag
input_refinement_labels: FP SIGFP FreeR_flag
input_map_labels: FP PHIB FOM
input_hires_labels: FP SIGFP FreeR_flag
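
A label string with "None" placeholders can be generated rather than typed by hand. The following sketch assumes the nine program labels listed above for input_labels; the helper label_string is hypothetical and simply fills in "None" for every program label you do not map to a column.

```python
PROGRAM_LABELS = ["FP", "SIGFP", "PHIB", "FOM",
                  "HLA", "HLB", "HLC", "HLD", "FreeR_flag"]


def label_string(columns, program_labels=PROGRAM_LABELS):
    """Build a label string, using "None" for any missing column.

    columns maps a program label (e.g. "FreeR_flag") to the column
    name actually present in your file (e.g. "FreeR").
    """
    unknown = set(columns) - set(program_labels)
    if unknown:
        raise KeyError(f"not a program label: {sorted(unknown)}")
    return " ".join(columns.get(label, "None") for label in program_labels)


labels = label_string({"FP": "FP", "SIGFP": "SIGFP", "FreeR_flag": "FreeR"})
print(f'input_labels="{labels}"')
```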

You can find out all the possible label strings in a data file that you might use by typing:

phenix.autosol display_labels=w1.mtz  # display all labels for w1.mtz

NOTES: if your data files contain a mixture of amplitude and intensity data then only the amplitude data is available. If you have only intensity data in a data file and want to select specific columns, then you need to specify the column names as they are after importing the data and conversion to amplitudes (see below under General Limitations for details).

Specifying other general parameters

You can specify many more parameters as well. See the list of keywords, defaults and descriptions at the end of this page and also general information about running Wizards at Running a Wizard from a GUI, the command-line, or a script for how to do this. Some of the most common parameters are:

data=w1.sca       # data file
model=coords.pdb  # starting model
seq_file=seq.dat  # sequence file
map_file=map_coeffs.mtz # coefficients for a starting map for building
resolution=3     # dmin of 3 A
s_annealing=True  # use simulated annealing refinement at start of each cycle
n_cycle_build_max=5  # max number of build cycles (starting from experimental phases)
n_cycle_rebuild_max=5  # max number of rebuild cycles (starting from a model)

Picking waters in AutoBuild

By default AutoBuild will instruct phenix.refine to pick waters using its standard procedure. This means that if the resolution of the data is high enough (typically 3 A or better) then waters are placed.

You can tell AutoBuild not to have phenix.refine pick waters with the command:

place_waters=False
If you want to place waters at a lower resolution, you will need to reset the low-resolution cutoff for placing waters in phenix.refine. You would do that in a "refinement_params.eff" file containing lines like these (see below for passing parameters to phenix.refine with an ".eff" file):
refinement {
  ordered_solvent {
    low_resolution = 2.8
  }
} 
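
The rule for whether waters are placed can be written as a small check. This is a sketch of the rule as described on this page, not code from AutoBuild or phenix.refine; the function name is hypothetical, and the behavior when the two resolutions are exactly equal is an assumption.

```python
def waters_will_be_placed(place_waters, refinement_resolution,
                          ordered_solvent_low_resolution=3.0):
    """Return True if ordered solvent would be added under the stated rule.

    Waters are not added if place_waters is False, or if the resolution
    used for refinement is (numerically) larger than the cutoff.
    """
    if not place_waters:
        return False
    # Assumption: equality counts as "high enough".
    return refinement_resolution <= ordered_solvent_low_resolution


print(waters_will_be_placed(True, 2.0))       # data to 2.0 A, default cutoff
print(waters_will_be_placed(True, 3.2))       # data to 3.2 A, default cutoff
print(waters_will_be_placed(True, 3.2, 3.5))  # cutoff relaxed to 3.5 A
```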

Keeping waters from your input file in AutoBuild

You can tell AutoBuild to keep the waters in your input file when you are using rebuild_in_place (the default is to toss them and replace them with new ones). You can say,

keep_input_waters=True
place_waters=No
NOTE: If you specify keep_input_waters=True you should also specify either "place_waters=No" or "keep_pdb_atoms=No" . This is because if place_waters=Yes and keep_pdb_atoms=Yes then phenix.refine will add waters and then the wizard will keep the new waters from the new PDB file created by phenix.refine preferentially over the ones in your input file.
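
The keyword combination in the note above can be checked before a run is started. The sketch below is a hypothetical helper, not part of PHENIX; the default values it assumes (keep_input_waters=False, place_waters=True, keep_pdb_atoms=True) are taken from the descriptions on this page.

```python
def check_water_keywords(keep_input_waters=False, place_waters=True,
                         keep_pdb_atoms=True):
    """Return a list of warnings about conflicting water keywords."""
    warnings = []
    if keep_input_waters and place_waters and keep_pdb_atoms:
        warnings.append(
            "keep_input_waters=True with place_waters=True and "
            "keep_pdb_atoms=True: newly placed waters will be kept in "
            "preference to the waters in your input file; set "
            "place_waters=No or keep_pdb_atoms=No"
        )
    return warnings


for message in check_water_keywords(keep_input_waters=True):
    print("WARNING:", message)
```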

Specifying phenix.refine parameters

You can control phenix.refine parameters that are not specified directly by AutoBuild using a refinement parameters (.eff) file:

refine_eff_file=refinement_params.eff    # set any phenix.refine params not set by AutoBuild
This file might contain a twin-law for refinement:
refinement {
  twinning {
    twin_law = "-k, -h, -l"
  }
}

You can put any phenix.refine parameters in this file, but a few parameters that are set directly by AutoBuild override your inputs from the refine_eff_file. These parameters are listed below.
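
If you generate parameter files from a script, the nested-brace layout shown above is easy to produce from a nested dictionary. This is a minimal sketch that only handles the simple "scope { name = value }" form used in the examples on this page; the helper to_eff is hypothetical and is not a replacement for the PHENIX parameter-handling tools.

```python
def to_eff(params, indent=0):
    """Render a nested dict as lines of a parameter (.eff) file."""
    pad = "  " * indent
    lines = []
    for key, value in params.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key} {{")
            lines.extend(to_eff(value, indent + 1))
            lines.append(f"{pad}}}")
        else:
            lines.append(f"{pad}{key} = {value}")
    return lines


params = {"refinement": {"twinning": {"twin_law": '"-k, -h, -l"'}}}
text = "\n".join(to_eff(params))
print(text)
# To save it for use with refine_eff_file=refinement_params.eff:
# with open("refinement_params.eff", "w") as handle:
#     handle.write(text + "\n")
```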

Refinement parameters that must be set using AutoBuild Wizard keywords (overwriting any values provided by the user in refine_eff_file)

Each entry gives the phenix.refine keyword, followed by the Wizard keyword(s) and notes:

  • refinement.main.number_of_macro_cycles: ncycle_refine
  • refinement.main.simulated_annealing: s_annealing (only applies to the first refinement in rebuild; simulated annealing in any other refinements is controlled by refine_eff_file, if any)
  • refinement.ncs.find_automatically: refine_with_ncs=True turns on automatic NCS search
  • refinement.main.ncs: refine_with_ncs=True turns on NCS
  • refinement.ncs.coordinate_sigma: Normally not set by the Wizard. However, if the Wizard keyword ncs_refine_coord_sigma_from_rmsd is True then the NCS coordinate sigma is equal to ncs_refine_coord_sigma_from_rmsd_ratio times the rmsd among NCS copies.
  • refinement.main.random_seed: i_ran_seed sets the random seed at the beginning of a Wizard run. This affects refinement.main.random_seed but does not set it to the value of i_ran_seed (because i_ran_seed gets updated by several different routines).
  • refinement.main.ordered_solvent: place_waters=True will set ordered_solvent to True. Note that this only has an effect if the value of the resolution cutoff for adding waters (refinement.ordered_solvent.low_resolution) is numerically higher than the resolution used for refinement.
  • refinement.main.ordered_solvent: place_waters_in_combine=True will set ordered_solvent to True, applying this only to the final combination step of multiple-model generation. Note that this only has an effect if the value of the resolution cutoff for adding waters (refinement.ordered_solvent.low_resolution) is numerically higher than the resolution used for refinement.
  • refinement.ordered_solvent.low_resolution: ordered_solvent_low_resolution=3.0 (default) will set the resolution cutoff for adding waters (refinement.ordered_solvent.low_resolution) to 3 A. If the resolution used for refinement is numerically larger than the value of ordered_solvent_low_resolution then ordered solvent is not added.
  • refinement.main.use_experimental_phases: use_mlhl=True will set refinement.main.use_experimental_phases to True
  • refinement.refine.strategy: The Wizard keywords refine, refine_b and refine_xyz all affect refinement.refine.strategy. If refine=True then refinement is carried out. If refine_b=True (default) isotropic displacement factors are refined. If refine_xyz=True (default) coordinates are refined.
  • refinement.main.occupancy_max: max_occ=1.0 sets the value of refinement.main.occupancy_max to 1.0. The default is to do nothing and use the default from phenix.refine (1.0).
  • refinement.refine.occupancies.individual: The combination of Wizard keywords semet=True and refine_se_occ=True will add "(name SE)" to the value of refinement.refine.occupancies.individual. You can add other names of atoms to your .eff file to have their occupancies refined as well.
  • refinement.main.high_resolution: Either of the Wizard keywords refinement_resolution and resolution will set the value of refinement.main.high_resolution, with refinement_resolution being used if available.
  • refinement.pdb_interpretation.link_distance_cutoff: link_distance_cutoff

The following parameters controlling phenix.refine output are set directly in AutoBuild and cannot be set by the user

  • refinement.output.write_eff_file
  • refinement.output.write_geo_file
  • refinement.output.write_def_file
  • refinement.output.write_maps
  • refinement.output.write_map_coefficients

Specifying resolve/resolve_pattern parameters

Similarly, you can control resolve and resolve_pattern parameters. For these parameters, your inputs will not be overridden by AutoBuild. The format is a little tricky: you have to put two sets of quotes around the command like this:

resolve_command="'resolution 200 3'"    # NOTE ' and " quotes
This will put the text
resolution 200 3
at the end of every temporary command file created to run resolve. (This is why it is not overridden by AutoBuild commands; they will all come before your commands in the resolve command file.) Note that some commands in resolve may be incompatible with this usage.
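
To see why two sets of quotes are needed, you can inspect how a POSIX shell splits the command line, using the shlex module from the Python standard library. The outer double quotes are consumed by the shell and the inner single quotes are what the program receives. The data file name here is just the one used in the examples on this page.

```python
import shlex

line = '''phenix.autobuild data=w1.sca resolve_command="'resolution 200 3'"'''
args = shlex.split(line)

# The last argument still carries the inner single quotes.
print(args[-1])

# Text that ends up in the resolve command file once those are removed.
resolve_text = args[-1].split("=", 1)[1].strip("'")
print(resolve_text)
```

If you start the Wizard from Python with an argument list instead of a shell, only the inner quotes would be supplied, since no shell is there to strip the outer ones.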

Including ligand coordinates in AutoBuild

If your input PDB file contains ligands (anything other than solvent that is not protein if your chain_type=PROTEIN, for example) then by default these ligands will be kept, used in refinement, and written out to your output PDB file. Any solvent molecules will by default be discarded. You can change this behavior by changing the keywords from these defaults:

keep_input_ligands=True
keep_input_waters=False
The AutoBuild Wizard will use phenix.elbow to generate geometries for any ligands that are not recognized.

You can also tell AutoBuild to add the contents of any PDB files that you wish to supply to the current version of the structure just before refinement, so all the refined models produced contain whatever AutoBuild has built, plus the contents of these PDB files. This can be done through the GUI, the command-line, or a script. In the command-line version you do this with:

input_lig_file_list=my_ligand.pdb

NOTE: The files in input_lig_file_list will be edited to make them all HETATM records to tell AutoBuild to ignore these residues in rebuilding.

NOTE: You may need to tell phenix.refine about the geometry of your ligands. You will get an error message if the ligand is not recognized and an automatic run of phenix.elbow does not succeed in generating your ligand. In that case you will want to run phenix.elbow to create a cif definition file for this ligand:

phenix.elbow my_ligand.pdb --id=LIG
where LIG is the 3-letter ID code that you use in my_ligand.pdb to identify your ligand. If the automatic run does not work you may need to give phenix.elbow additional information to generate your ligand.

Once phenix.elbow has generated your ligand you can use the keyword "cif_def_file_list" to tell AutoBuild about this ligand:

cif_def_file_list=elbow.LIG.my_ligand.pdb.cif

Specifying arbitrary commands and cif files for phenix.refine

You can tell AutoBuild to apply any set of cif definitions to the model during refinement by using a combination of specification files and the commands cif_def_file_list and refine_eff_file_list:

refine_eff_file_list=link.eff cif_def_file_list=link.cif
This example comes from the phenix.refine manual page in which a link is specified in a cif definition file link.cif:
 data_mod_5pho
#
loop_
_chem_mod_atom.mod_id
_chem_mod_atom.function
_chem_mod_atom.atom_id
_chem_mod_atom.new_atom_id
_chem_mod_atom.new_type_symbol
_chem_mod_atom.new_type_energy
_chem_mod_atom.new_partial_charge
 5pho     add      .      O5T    O    OH      .
loop_
_chem_mod_bond.mod_id
_chem_mod_bond.function
_chem_mod_bond.atom_id_1
_chem_mod_bond.atom_id_2
_chem_mod_bond.new_type
_chem_mod_bond.new_value_dist
_chem_mod_bond.new_value_dist_esd
 5pho     add      O5T     P         coval        1.520    0.020

and this is applied with a parameters file link.eff:

 refinement.pdb_interpretation.apply_cif_modification
{
  data_mod = 5pho
  residue_selection = resname GUA and name O5T
}

You can have any number of cif files and parameters files.

Output files from AutoBuild

When you run AutoBuild the output files will be in a subdirectory with your run number:

AutoBuild_run_1_/   # subdirectory with results
  • A summary file listing the results of the run and the other files produced:
    AutoBuild_summary.dat  # overall summary
    
  • A warnings file listing any warnings about the run
    AutoBuild_warnings.dat  # any warnings
    
  • A file that lists all parameters and knowledge accumulated by the Wizard during the run (some parts are binary and are not printed)
    AutoBuild_Facts.dat   # all Facts about the run
    
  • Final refined model
    overall_best.pdb
    

    NOTE: The "overall_best.pdb" file is always the current best model. Similarly "overall_best_denmod_map_coeffs.mtz" is always the best map_coefficients file. The AutoBuild_summary.dat file lists the names of the current best set of files. The contents of "overall_best.pdb" and of the best model listed in AutoBuild_summary.dat will be the same.

  • Final map coefficients used to build refined model. Use FP PHIM FOMM in maps. Normally this is a density-modified map from resolve. See also the map coefficients from phenix.refine below.
    overall_best_denmod_map_coeffs.mtz
    
  • Final sigmaA-weighted 2mFo-DFc and Fo-Fc map coefficients from phenix.refine based on overall_best.pdb final model. The map coefficients are 2FOFCWT PH2FOFCWT for the 2mFo-DFc map and FOFC and PHFOFC for the Fo-Fc difference map. See also the map coefficients from density modification above.
    overall_best_refine_map_coeffs.mtz
    
  • MTZ file with FP, phases and HL coeffs if present, and freeR_flags used in refinement
    exptl_fobs_phases_freeR_flags.mtz
    
  • Final log file for model-building
    overall_best.log
    
  • Final log file for refinement
    overall_best.log_refine
    
  • Evaluation of fit of model to map
    overall_best.log_eval
    
  • Summary of NCS information
    ncs_info.ncs
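
A script that post-processes AutoBuild results may want to confirm which of the files listed above are present. The sketch below is a hypothetical helper that checks a run directory against the file names given in this section. Some files (for example ncs_info.ncs when no NCS is present) may legitimately be absent, so a missing file is assumed here to be a hint rather than an error.

```python
from pathlib import Path

EXPECTED_OUTPUTS = {
    "AutoBuild_summary.dat": "overall summary",
    "AutoBuild_warnings.dat": "any warnings",
    "AutoBuild_Facts.dat": "all Facts about the run",
    "overall_best.pdb": "final refined model",
    "overall_best_denmod_map_coeffs.mtz": "density-modified map coefficients",
    "overall_best_refine_map_coeffs.mtz": "phenix.refine map coefficients",
    "exptl_fobs_phases_freeR_flags.mtz": "data used in refinement",
    "overall_best.log": "model-building log",
    "overall_best.log_refine": "refinement log",
    "overall_best.log_eval": "fit of model to map",
    "ncs_info.ncs": "summary of NCS information",
}


def missing_outputs(run_dir):
    """Return the expected file names that are not present in run_dir."""
    run_dir = Path(run_dir)
    return [name for name in EXPECTED_OUTPUTS
            if not (run_dir / name).exists()]


for name in missing_outputs("AutoBuild_run_1_"):
    print(f"not found: {name} ({EXPECTED_OUTPUTS[name]})")
```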
    

Standard building, rebuild_in_place, and multiple-models

The AutoBuild Wizard has two overall methods for building a model.

The first method (standard build) is to build a model from scratch. This involves identification of where helices (and strands, for proteins) are located, extension using fragment libraries, connection of segments, identification of side-chains, and sequence alignment. These methods are augmented in the standard building procedure by loop-fitting and building model outside of the region that has already been built.

The second method (rebuild_in_place) takes an existing model and rebuilds it without adding or deleting any residues and without changing the connectivity of the chain. A segment of the model is deleted and then filled in again by rebuilding from the remaining ends. This is repeated for overlapping segments covering the entire model.

The multiple-models approach really has two levels of multiple models. At the first level, several (multiple_models_group_number, default is number_of_parallel_models) models are built (using rebuild_in_place) and are then recombined into a single good model. At the next level, this whole process may be done more than once (multiple_models_number times), yielding several very good models. By default, if you ask for rebuild_in_place, then you will get a single very good model, created by running rebuild_in_place several times and recombining the models.

Parallel jobs, nproc, nbatch, number_of_parallel_models and how AutoBuild works in parallel

The AutoBuild Wizard is set up to take advantage of multi-processor machines or batch queues by splitting the work into separate tasks. See Tutorial 4: Iterative model-building, density modification and refinement starting from experimental phases and Tutorial 6: Automatically rebuilding a structure solved by Molecular Replacement for a description of the method used by the AutoBuild Wizard to run build jobs as sub-processes and to combine the results into single models.

Here are the key factors that determine how splitting model-building into batches and running them on one or more processors works:

  • nbatch is the number of batches of work. As long as nbatch is fixed then the results of running the Wizard will be the same, no matter how many processors are used. It is most efficient however to have nbatch be at least as large as nproc, the number of processors. Otherwise some processors may end up doing nothing. The default is nbatch=3. The value of nbatch is used to set other defaults (such as number_of_parallel_models).
  • nproc is the number of processors to split the work among.
  • number_of_parallel_models is the number of models to build at once. The default is to set number_of_parallel_models=nbatch. This affects both standard building (number_of_parallel_models sets how many initial models to build) and rebuild_in_place (number_of_parallel_models determines whether a single model is built or a set of models are built and recombined into a single model).
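
The relationship between these three keywords can be sketched as follows. This is an illustration of the defaults described above, not the Wizard's own logic: the function parallel_settings is hypothetical, and the count of idle processors is a simple estimate that assumes each batch occupies one processor.

```python
def parallel_settings(nproc, nbatch=3, number_of_parallel_models=None):
    """Apply the documented defaults and estimate idle processors."""
    if number_of_parallel_models is None:
        # Default: number_of_parallel_models follows nbatch.
        number_of_parallel_models = nbatch
    # Assumption: one batch keeps one processor busy.
    idle_processors = max(0, nproc - nbatch)
    return {
        "nproc": nproc,
        "nbatch": nbatch,
        "number_of_parallel_models": number_of_parallel_models,
        "idle_processors": idle_processors,
    }


print(parallel_settings(nproc=8))            # default nbatch=3
print(parallel_settings(nproc=8, nbatch=8))  # nbatch matched to nproc
```

Because results depend on nbatch rather than nproc, changing nbatch to match a larger machine will generally change the models produced.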

Model editing during rebuilding with the Coot-PHENIX interface

The AutoBuild Wizard allows you to edit a model and give it back to the Wizard during the iterative model-building, density modification and refinement process. The Wizard will consider the model that you give it along with the models that it generates automatically, and will choose the parts of your model that fit the density better than other models.

You can edit a model using the PHENIX-Coot interface. This interface is accessible through the GUI and via the command-line. Using the GUI, when a model has been produced by the AutoBuild Wizard, you can double-click the button on the GUI labelled View/edit files with coot to start Coot with your current map and model. If you are running from the command-line, you can open a new window and type:

phenix.autobuild coot 
which will do the same (provided the necessary map and model are ready).

When Coot has been loaded, your map and model will be displayed along with a PHENIX-Coot Interface window. You can edit your model and then save it, giving it back to PHENIX with the button labelled something like Save model as COMM/overall_best_coot_7.pdb. This button creates the indicated file and also tells PHENIX to look for this file and to try to include the contents of the model in the building process.

The precise use of the model that you save depends on the type of model-building that is being carried out by the AutoBuild Wizard. If you are using rebuild_in_place then the main-chain and side-chains of the model are considered as replacements for the current working model. Any ligands or unrecognized residues are (by default) not rebuilt but are included in refinement. By default, solvent in the model is ignored. If you are not using rebuild_in_place, only the main-chain conformation is considered, and the side-chains are ignored. Ligands (but not solvent) in the model are (by default) kept and included in refinement.

As the AutoBuild Wizard continues to build new models and create new maps, you can update in the PHENIX-Coot Interface to the current best model and map with the button Update with current files from PHENIX.

Resolution limits in AutoBuild

There are several resolution limits used in AutoBuild. You can leave them all to default, or you can set any of them individually. Here is a list of these limits and how their default values are set:

  • resolution: Overall resolution. Used as high-resolution limit for density modification. Used as default for refinement resolution and model-building resolution if they are not set. Default: resolution of the input datafile. If a hires datafile is provided, the resolution of that data is used.
  • refinement_resolution: Resolution for refinement. Default: value of "resolution".
  • resolution_build: Resolution for model-building. Default: value of "resolution".
  • overall_resolution: Resolution to truncate all data. This should only be used if you need to truncate the data in order to get the Wizard to run. It causes the Wizard to ignore all data at higher resolution than overall_resolution. It is normally better to use the resolution keyword to define the resolution limits, as that will keep all the data in the output and working files. Default: None.
  • multiple_models_starting_resolution: Resolution for the initial rebuilding of a model in the multiple-models procedure. Normally a low resolution to generate diversity. Default: 4 A.
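
The way these defaults depend on one another can be sketched in Python. This is only an illustration of the rules listed above; the function resolve_limits and its argument names data_resolution and hires_resolution are hypothetical, and overall_resolution is simply passed through because its default is None.

```python
def resolve_limits(data_resolution, hires_resolution=None, resolution=None,
                   refinement_resolution=None, resolution_build=None,
                   overall_resolution=None,
                   multiple_models_starting_resolution=4.0):
    """Fill in unset resolution limits using the documented defaults."""
    if resolution is None:
        # Resolution of the hires data if supplied, else of the input data.
        if hires_resolution is not None:
            resolution = hires_resolution
        else:
            resolution = data_resolution
    if refinement_resolution is None:
        refinement_resolution = resolution
    if resolution_build is None:
        resolution_build = resolution
    return {
        "resolution": resolution,
        "refinement_resolution": refinement_resolution,
        "resolution_build": resolution_build,
        "overall_resolution": overall_resolution,
        "multiple_models_starting_resolution":
            multiple_models_starting_resolution,
    }


print(resolve_limits(2.5))
print(resolve_limits(2.5, hires_resolution=1.8, refinement_resolution=2.0))
```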

Examples

Run AutoBuild automatically after AutoSol

phenix.autobuild after_autosol

Run AutoBuild beginning with experimental data

phenix.autobuild data=solve_1.mtz seq_file=seq.dat

Merge in hires data

phenix.autobuild data=solve_2.mtz hires_file=w1.sca  seq_file=seq.dat

Make a SA-omit map around atoms in target.pdb

phenix.autobuild data=data.mtz model=coords.pdb omit_box_pdb=target.pdb   composite_omit_type=sa_omit
Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ . An additional map coefficients file omit_region.mtz will show you the region that has been omitted. (Note: be sure to use the weights in both resolve_composite_map.mtz and omit_region.mtz).

Make a simple composite omit map

phenix.autobuild data=data.mtz model=coords.pdb composite_omit_type=simple_omit
Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ . An additional map coefficients file omit_region.mtz will show you the region that has been omitted. (Note: be sure to use the weights in both resolve_composite_map.mtz and omit_region.mtz).

Make an iterative-build omit map around atoms in target.pdb

phenix.autobuild data=w1.sca model=coords.pdb omit_box_pdb=target.pdb \
   composite_omit_type=iterative_build_omit
Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ . An additional map coefficients file omit_region.mtz will show you the region that has been omitted. (Note: be sure to use the weights in both resolve_composite_map.mtz and omit_region.mtz).

Make a sa-omit map around residues 3 and 4 in chain A of coords.pdb

phenix.autobuild data=w1.sca model=coords.pdb omit_box_pdb=coords.pdb \
   omit_res_start_list=3 omit_res_end_list=4 omit_chain_list=A   \
   composite_omit_type=sa_omit
Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ . An additional map coefficients file omit_region.mtz will show you the region that has been omitted. (Note 1: be sure to use the weights in both resolve_composite_map.mtz and omit_region.mtz). (Note 2: Although the variables are omit_res_start_list, omit_res_end_list and omit_chain_list, you can only specify one region of a molecule to omit, not a list of regions; all others are ignored.)

Create one very good rebuilt model

phenix.autobuild data=data.mtz model=coords.pdb multiple_models=True \
  include_input_model=True  \
  multiple_models_number=1 n_cycle_rebuild_max=5
The final model will be in the file MULTIPLE_MODELS/all_models.pdb (this file will contain just one model).

Note that this procedure will keep the sequence that is present in coords.pdb. If you supply a sequence file it will edit the sequence of coords.pdb to match your sequence file and discard any residues that do not match. (If you want to input a sequence file but not edit the sequence in coords.pdb and not discard any non-matching residues, then specify also edit_pdb=False.)

Note also that if include_input_model=True then no randomization cycle will be carried out and multiple_models_starting_resolution is ignored.

Touch up a model

phenix.autobuild data=data.mtz model=coords.pdb \
touch_up=True worst_percent_res_rebuild=2 min_cc_res_rebuild=0.8
You can rebuild just the worst parts of your model by setting touch_up=True. The parts to rebuild are chosen based on the model-map correlation of each residue. You can control how much is rebuilt with worst_percent_res_rebuild, with min_cc_res_rebuild, or with both.

Create 20 very good rebuilt models that are as different as possible

phenix.autobuild data=data.mtz model=coords.pdb multiple_models=True \
   multiple_models_number=20 n_cycle_rebuild_max=5
The 20 models will be in the file MULTIPLE_MODELS/all_models.pdb. This procedure is useful for generating an ensemble of models that are each individually consistent with the data, and yet are diverse. The variation among these models is an indication of the uncertainty in each of the models. Note that the ensemble of models is not a representation of the ensemble of structures that is truly present in the crystal.

Build an RNA chain

phenix.autobuild data=solve_1.mtz seq_file=seq.dat chain_type=RNA

Build a DNA chain

phenix.autobuild data=solve_1.mtz seq_file=seq.dat chain_type=DNA

Just make maps; don't do any building.

phenix.autobuild data=data.mtz model=coords.pdb maps_only=True     

Just calculate a prime-and-switch map

phenix.autobuild data=data.mtz solvent_fraction=.6 \
   ps_in_rebuild=True model=coords.pdb maps_only=True 
The output prime-and-switch map will be in the file prime_and_switch.mtz.
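
The command lines in the examples above can also be assembled from a script. The following Python sketch only builds the argument list from keyword=value pairs; the helper name autobuild_command is hypothetical, and launching the job requires a PHENIX installation with phenix.autobuild on the PATH.

```python
import shlex


def autobuild_command(**keywords):
    """Return an argv list for phenix.autobuild (hypothetical helper)."""
    argv = ["phenix.autobuild"]
    for name, value in keywords.items():
        # Each keyword is passed in the name=value form used in the examples.
        argv.append("%s=%s" % (name, value))
    return argv


argv = autobuild_command(data="data.mtz", model="coords.pdb", maps_only=True)
# Show the equivalent shell command line.
print(" ".join(shlex.quote(a) for a in argv))
# To actually launch the job (assumes PHENIX is installed):
# import subprocess; subprocess.run(argv, check=True)
```

Passing a list to subprocess.run, rather than a single string, avoids shell quoting problems with values such as input_labels="w2 SIGw2" that contain blanks.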

Possible Problems

General limitations

  • The AutoBuild wizard edits input PDB files to remove multiple conformations. It will also renumber residues if the file contains residues with insertion codes. All references to residue numbers (e.g. rebuild_res_start_list) refer to the edited, renumbered model. This model can be found in the AutoBuild_run_1_ (or appropriate) directory as "edited_pdb.pdb".
  • The AutoBuild wizard expects residue numbers to not decrease along a chain. It will stop if residue 250 in chain B is found between residues 116 and 117 in the same chain, for example. To get around this, use insertion codes (make residue 250 residue 116A instead).
  • The AutoBuild model-building can only build one type of chain at a time (default chain_type='PROTEIN'; other choices are RNA and DNA). If you supply a PDB file containing more than one type of chain for rebuilding, then all the residues that are not that type of chain are treated as ligands and are (by default, keep_input_ligands=True) included in refinement but not in rebuilding. Any input solvent molecules are (by default, keep_input_waters=False) ignored.

    You can include more than one type of chain in rebuilding by supplying one type of chains as ligands with input_lig_file_list and rebuilding another type:

    chain_type=PROTEIN  # build only protein
    input_lig_file_list=MyDNA.pdb  # just read in DNA coordinates and include in refinement
    
    In this case only protein chains will be built, but the DNA coordinates in MyDNA.pdb will be included in all refinements and will be written out to the final coordinate file. You may wish to add the keyword:
    keep_pdb_atoms=False  #keep the ligand atoms if model (pdb) and ligand overlap
    which will tell AutoBuild that the ligand (DNA) atoms are to be kept if the model that is being built (protein) overlaps with it. (The default is to keep the model that is being built and to discard any ligand atoms that overlap).

    This whole process is likely to require substantial editing of the PDB files by hand because when you build DNA, a lot of chains are going to be built into the protein region, and when you build protein, it is going to be accidentally built into the DNA.

  • Any file in input_lig_file_list containing ATOM records will have them replaced with HETATM records. This is so that the rebuild_in_place algorithm does not try to use them in rebuilding.
  • The ligand generation routine in phenix.elbow will not generate heme groups at this point. Most other ligands can be automatically generated.
  • If your input data file contains both intensity data and amplitude data, only the amplitude data is exposed in the AutoBuild Wizard. If you want to use the intensity data then you have to create a file that does not have amplitude data in it.
  • If your input data file has only intensity data and you wish to specify which columns of data the AutoBuild Wizard is to use, then you have to specify the names that the columns will have AFTER importing the data and conversion to amplitudes, not the original column names.

    These column names may not be obvious. Here is how to find out what they will be. Do a quick dummy run like this with XXX as labels:

    phenix.autobuild w2.sca coords.pdb input_labels="XXX XXX"
    
    The Wizard will print out a list of available labels like this:
    Sorry, the label XXX does not exist as an amplitude array in
    the input_data_file ImportRawData_run_8_/w2_PHX.mtz
    ...available labels are: ['w2', 'SIGw2', 'None']
    
    Then you know that the correct command is:
    phenix.autobuild w2.sca coords.pdb input_labels="w2 SIGw2"
    
  • The AutoBuild Wizard cannot build modified residues. If you supply a model with modified residues, these will be taken out of the chain and treated as ligands, and the chain will be broken at that point. By default the modified residues will be added to your model just before refinement and a cif definitions file will be automatically generated for these residues. You can also add these residues with the input_lig_file_list procedure if you want.
  • The AutoBuild Wizard will not build very short chains unless you set the variable group_ca_length (default=4 for building a model from scratch) to a smaller number. The shortest chain that will be built is group_ca_length. If you use rebuild_in_place, then the default shortest chain allowed is 1 residue, so any part of a model you supply is rebuilt.
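
The residue-numbering requirement in the list above can be checked before starting a run. The sketch below is an illustration and not part of PHENIX: the function name find_decreasing_residues is hypothetical, it reads only ATOM records using the fixed PDB columns, and it ignores insertion codes.

```python
def find_decreasing_residues(pdb_lines):
    """Return (chain, previous_resseq, resseq) wherever numbering decreases."""
    last_seen = {}   # chain id -> most recent residue number
    problems = []
    for line in pdb_lines:
        if not line.startswith("ATOM") or len(line) < 26:
            continue
        chain = line[21]            # PDB column 22
        resseq = int(line[22:26])   # PDB columns 23-26
        previous = last_seen.get(chain)
        if previous is not None and resseq < previous:
            problems.append((chain, previous, resseq))
        last_seen[chain] = resseq
    return problems
```

For the case described above (residue 250 of chain B placed between residues 116 and 117), the function reports the step from 250 back to 117 in chain B.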

Specific limitations and problems

  • The size of the asymmetric unit in the SOLVE/RESOLVE portion of the AutoBuild wizard is limited by the memory in your computer and the binaries used. The Wizard is supplied with regular-size ("", size=6), giant ("_giant", size=12), huge ("_huge", size=18) and extra_huge ("_extra_huge", size=36). Larger-size versions can be obtained on request.
  • The AutoBuild Wizard can take most settings of most space groups. However, it can only use the hexagonal setting of rhombohedral space groups (e.g., #146 R3:H or #155 R32:H), and it cannot use space groups 114-119 (not found in macromolecular crystallography) even in the standard setting. These restrictions arise from difficulties with the use of asuset in the version of the ccp4 libraries used in PHENIX for these settings and space groups.

Literature

Iterative model building, structure refinement and density modification with the PHENIX AutoBuild wizard. T. C. Terwilliger, R. W. Grosse-Kunstleve, P. V. Afonine, N. W. Moriarty, P. H. Zwart, L.-W. Hung, R. J. Read, and P. D. Adams. Acta Cryst. D64, 61-69 (2008)
Interpretation of ensembles created by multiple iterative rebuilding of macromolecular models. T. C. Terwilliger, R. W. Grosse-Kunstleve, P. V. Afonine, P. D. Adams, N. W. Moriarty, P. H. Zwart, R. J. Read, D. Turk and L.-W. Hung. Acta Cryst. D63, 597-610 (2007)
Using prime-and-switch phasing to reduce model bias in molecular replacement. T. C. Terwilliger. Acta Cryst. D60, 2144-2149 (2004)
Improving macromolecular atomic models at moderate resolution by automated iterative model building, statistical density modification and refinement. T. C. Terwilliger. Acta Cryst. D59, 1174-1182 (2003)
Statistical density modification using local pattern matching. T. C. Terwilliger. Acta Cryst. D59, 1688-1701 (2003)
Automated side-chain model building and sequence assignment by template matching. T. C. Terwilliger. Acta Cryst. D59, 45-49 (2003)
Automated main-chain model building by template matching and iterative fragment extension. T. C. Terwilliger. Acta Cryst. D59, 38-44 (2003)
Rapid automatic NCS identification using heavy-atom substructures. T. C. Terwilliger. Acta Cryst. D58, 2213-2215 (2002)
Statistical density modification with non-crystallographic symmetry. T. C. Terwilliger. Acta Cryst. D58, 2082-2086 (2002)
Maximum likelihood density modification. T. C. Terwilliger. Acta Cryst. D56, 965-972 (2000)
Maximum-likelihood density modification with pattern recognition of structural motifs. T. C. Terwilliger. Acta Cryst. D57, 1755-1762 (2001)
Map-likelihood phasing. T. C. Terwilliger. Acta Cryst. D57, 1763-1775 (2001)

Additional information

List of all AutoBuild keywords

------------------------------------------------------------------------------- 
Legend: black bold - scope names
        black - parameter names
        red - parameter values
        blue - parameter help
        blue bold - scope help
        Parameter values:
          * means selected parameter (where multiple choices are available)
          False is No
          True is Yes
          None means not provided, not predefined, or left up to the program
          "%3d" is a Python style formatting descriptor
------------------------------------------------------------------------------- 
autobuild
   write_run_directory_to_file= None Writes the full name of a run directory
                                to the specified file. This can be used as a
                                call-back to tell a script where the output is
                                going to go. (Command-line only)
   coot= None Set coot to True and optionally run=[run-number] to run Coot
         with the current model and map for run run-number. In some wizards
         (AutoBuild) you can edit the model and give it back to PHENIX to use
         as part of the model-building process. If you just say coot then the
         facts for the highest-numbered existing run will be shown.
         (Command-line only)
   ignore_blanks= None ignore_blanks allows you to have a command-line keyword
                  with a blank value like "input_lig_file_list="
   stop= None You can stop the current wizard with "stopwizard" or "stop". If
         you type "phenix.autobuild run=3 stop" then this will stop run 3 of
         autobuild. (Command-line only)
   display_facts= None Set display_facts to True and optionally
                  run=[run-number] to display the facts for run run-number. If
                  you just say display_facts then the facts for the
                  highest-numbered existing run will be shown. (Command-line
                  only)
   display_summary= None Set display_summary to True and optionally
                    run=[run-number] to show the summary for run run-number.
                    If you just say display_summary then the summary for the
                    highest-numbered existing run will be shown. (Command-line
                    only)
   carry_on= None Set carry_on to True to carry on with highest-numbered run
             from where you left off. (Command-line only)
   run= None Set run to n to continue with run n where you left off.
        (Command-line only)
   copy_run= None Set copy_run to n to copy run n to a new run and continue
             where you left off. (Command-line only)
   display_runs= None List all runs for this wizard. (Command-line only)
   delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)
   display_labels= None display_labels=test.mtz will list all the labels that
                   identify data in test.mtz. You can use the label strings
                   that are produced in AutoSol to identify which data to use
                   from a datafile like this:
                     peak.data="F+ SIGF+ F- SIGF-"  # the entire string in quotes counts here
                   You can use the individual labels from these strings as
                   identifiers for data columns in AutoSol and AutoBuild like
                   this:
                     input_refinement_labels="FP SIGFP FreeR_flags"  # each individual label counts
   dry_run= False Just read in and check parameter names
   data= None Datafile (alias for input_data_file) This file can be a .sca or
         mtz or other standard file. The Wizard will guess the column
         identification. You can specify the column labels to use with:
         input_labels='FP SIGFP PHIB FOM HLA HLB HLC HLD FreeR_flag'
         Substitute any labels you do not have with None. If you only have
         myFP and mysigFP you can just say input_labels='myFP mysigFP'.
         (Command-line only)
   model= None PDB file with starting model (alias for input_pdb_file) NOTE:
          If your PDB file has been previously refined, then please make sure
          that you provide the free R flags that were used in that refinement.
          These can come from the data file or from the refinement_file.
          (Command-line only).
   seq_file= Auto Sequence file (alias for input_seq_file). The format is
             plain text, with chains separated by >>>:
               >>> any text here ignored, next lines are sequence; any blanks ignored
               galmtdeqr ragwqst
               >>> indicate new chain with '>>>'
               asqrarpt
               >>> input 1 copy of each unique chain.
             (Command-line only)
   map_file= Auto MTZ file containing starting map (alias for input_map_file)
             This file must be a mtz file. The Wizard will guess the column
             identification. You can specify the column labels to use with:
             input_map_labels='FP PHIB FOM' Substitute any labels you do not
             have with None. If you only have myFP and myPHIB you can just say
             input_map_labels='myFP myPHIB'. (Command-line only)
   refinement_file= Auto File for refinement (alias for input_refinement_file)
                    This file can be a .sca or mtz or other standard file.
                    This file will be merged with your data file, with any
                    phase information coming from your data file. If this file
                    has free R flags, they will be used, otherwise if the data
                    file has them, those will be used, otherwise they will be
                    generated. The Wizard will guess the column
                    identification. You can specify the column labels to use
                    with: input_refinement_labels='FP SIGFP FreeR_flag'
                    Substitute any labels you do not have with None. If you
                    only have myFP and mysigFP you can just say
                    input_refinement_labels='myFP mysigFP'. (Command-line
                    only).
   hires_file= Auto File with high-resolution data (alias for
               input_hires_file) This file can be a .sca or mtz or other
               standard file. The Wizard will guess the column identification.
               You can specify the column labels to use with:
               input_hires_labels='FP SIGFP'. (Command-line only)
   acceptable_r= 0.25 Used to decide whether the model is acceptable enough to
                 quit if it is not improving much. A good value is 0.25
   after_autosol= Yes *No True False You can specify that you want to continue
                  on starting with the highest-scoring run of AutoSol.
   allow_negative_residues= Yes *No True False Normally the wizard does not
                            allow negative residue numbers, and all residues
                            with negative numbers are rejected when they are
                            read in. You can allow them if you wish.
   background= *Yes No True False When you specify nproc=nn, you can run the
               jobs in background (default if nproc is greater than 1) or
               foreground (default if nproc=1). If you set run_command=qsub
               (or otherwise submit to a batch queue), then you should set
               background=False, so that the batch queue can keep track of
               your runs. There is no need to use background=True in this case
               because all the runs go as controlled by your batch system. If
               you use run_command=csh (or similar, csh is default) then
               normally you will use background=True so that all the jobs run
               simultaneously.
   background_map= None You can supply an mtz file (REQUIRED LABELS: FP PHIM
                   FOMM) to use as map coefficients to calculate the electron
                   density in all points in an omit map that are not part of
                   any omitted region. (Default="")
   boundary_background_map= None You can supply an mtz file (REQUIRED LABELS:
                            FP PHIM FOMM) to use as map coefficients to
                            calculate the electron density in all points in
                            the boundary map that are not part of any omitted
                            region. (Default="")
   build_outside= *Yes No True False Define whether to use the BuildOutside
                  module in build_model
   build_type= RESOLVE_AND_TEXTAL *RESOLVE TEXTAL You can choose to build
               models with RESOLVE and TEXTAL or either one, and how many
               different models to build with RESOLVE. The more you build, the
               more likely to get a complete model. Note that rebuild_in_place
               can only be carried out with RESOLVE model-building
   cell= 0.0 0.0 0.0 0.0 0.0 0.0 Enter cell parameter a b c alpha beta gamma
   chain_type= *Auto PROTEIN DNA RNA You can specify whether to build protein,
               DNA, or RNA chains. At present you can only build one of these
               in a single run. If you have both DNA and protein, build one
               first, then run AutoBuild again, supplying the prebuilt model
               in the "input_lig_file_list" and build the other. NOTE: default
               for this keyword is Auto, which means "carry out normal process
               to guess this keyword". The process is to look at the sequence
               file and/or input pdb file to see what the chain type is. If
               there is more than one type, the type with the larger number
               of residues is guessed. If you want to force the chain_type,
               then set it to PROTEIN RNA or DNA.
   cif_def_file_list= None You can enter any number of CIF definition files.
                      These are normally used to tell phenix.refine about the
                      geometry of a ligand or unusual residue. You usually
                      will use these in combination with "PDB file with
                      metals/ligands" (keyword "input_lig_file_list" ) which
                      allows you to attach the contents of any PDB file you
                      like to your model just before it gets refined. You can
                      use phenix.elbow to generate these if you do not have a
                      CIF file and one is requested by phenix.refine
   clean_up= Yes *No True False At the end of the entire run the TEMP
             directories will be removed if clean_up is True. The default is
             No, keep these directories. If you want to remove them after your
             run is finished use a command like "phenix.autobuild run=1
             clean_up=True"
   combine_only= Yes *No True False Once you have created a set of initial
                 models you can merge them together into a final set. This
                 option is useful if you have split up the creation of
                 multiple models into different directories, and then you have
                 copied all the initial models to one directory for combining.
   composite_omit_type= *None simple_omit sa_omit iterative_build_omit Your
                        choices of types of OMIT maps are:
                          None - normal operation, no omit
                          simple_omit - omit the atoms in OMIT region in
                            calculating a sigmaA-weighted 2mFo-DFc map with
                            no refinement
                          sa_omit - omit the atoms in OMIT region, carry out
                            simulated-annealing refinement, then calculate a
                            sigmaA-weighted 2mFo-DFc map
                          iterative_build_omit - set occupancy of atoms in
                            OMIT region to 0 throughout an entire iterative
                            model-building, density modification and
                            refinement process (takes a long time)
                        All these omit map types are available as composite
                        omit maps (default) or as omit maps around a region
                        defined by a PDB file (using omit_box_pdb_list). The
                        resulting OMIT map will be in the directory OMIT with
                        file name resolve_composite_map.mtz. This mtz file
                        contains the map coefficients to create the OMIT map.
                        The file "omit_region.mtz" contains the coefficients
                        for a map showing the boundaries of the OMIT region.
   connect= *Yes No True False Define whether to use the connect module in
            build_model. This module tries to connect nearby chains with
            loops, without using the sequence. This is different than
            fit_loops (which uses the sequence to identify the exact number of
            residues in the loop).
   consider_main_chain_list= None This keyword lets you name any number of PDB
                             files to consider as templates for
                             model-building. Every time models are built, the
                             contents of these files will be merged with them
                             and the best parts will be kept. NOTE: this only
                             uses the main-chain atoms of your PDB files.
   coot_name= coot If your version of coot is called something else, then you
              can specify that here.
   d_max_textal= 1000.0 This low-resolution limit is only used for Textal
                 model-building
   d_min_textal= 2.8 Textal has an optimal high-resolution limit of 2.8 A This
                 limit is only used for Textal model-building
   debug= Yes *No True False You can have the wizard stop with error messages
          about the code if you use debug. NOTE: you cannot use Pause with
          debug.
   dist_close= None If main-chain atom rmsd is less than dist_close then
               crossover between chains in different models is allowed at this
               point. If you input a negative number the defaults will be used
   dist_close_overlap= 1.5 Model or ligand coordinates but not both are kept
                       when model and ligand coordinates are within
                       dist_close_overlap and ligands in input_lig_file_list
                       are being added to the current model. NOTE: you might
                       want to decrease this if your ligand atoms get removed
                       by the wizard. Default=1.5 A
   dmax= 500.0 Low-resolution limit
   edit_pdb= *Yes No True False You can choose to edit the input PDB file in
             rebuild_in_place to match the input sequence (default=True).
             NOTE: residues with residue numbers higher than 'highest_resno'
             are assumed to not have a known sequence and will not be edited.
             By default the value of 'highest_resno' is the highest residue
             number from the sequence file, after adding it to the starting
             residue number from start_chains_list. You can also set it
             directly.
   extend_try_list= None You can fill out the list of parallel jobs to match
                    the number of jobs you want to run at one time, as
                    specified with nbatch.
   extensive_build= Yes *No True False You can choose whether to build a new
                    model on every cycle and carry out extra model-building
                    steps every cycle. Default is No (build a new model on
                    first cycle, after that carry out extra steps).
   extra_verbose= Yes *No True False Facts and possible commands will be
                  printed every cycle if Yes
   find_ncs= *Auto Yes No True False This script normally deduces ncs
             information from the NCS in chains of models that are built
             during iterative model-building. The update is done each cycle in
             which an improved model is obtained. Say No to skip this. See
             also "input_ncs_file" which can be used to specify NCS at the
             start of the process. If find_ncs="No" then only this starting
             NCS will be used and it will not be updated. You can use
             find_ncs="No" to specify exactly what residues will be used in NCS
             refinement and exactly what NCS operators to use in density
             modification. You can use the function
             $PHENIX/phenix/phenix/command_line/simple_ncs_from_pdb.py to help
             you set up an input_ncs_file that has your specifications in it.
   fit_loops= *Yes No True False You can fit loops automatically if sequence
              alignment has been done.
   group_ca_length= 4 In resolve building you can specify how short a fragment
                    to keep. Normally 4 or 5 residues should be the minimum.
   group_length= 2 In resolve building you can specify how many fragments must
                 be joined to make a connected group that is kept. Normally 2
                 fragments should be the minimum.
   helices_strands_only= Yes *No True False You can choose to use a quick
                         model-building method that only builds secondary
                         structure. At low resolution this may be both quicker
                         and more accurate than trying to build the entire
                         structure. If you are running the AutoSol Wizard,
                         normally you should choose 'Yes' and use the quick
                         model-building. Then when your structure is solved by
                         AutoSol, go on to AutoBuild and build a more complete
                         model (this time normally using
                         helices_strands_only=False).
   highest_resno= None Highest residue number to be considered "placed" in
                  sequence for rebuild_in_place
   hl= Yes *No True False You can choose whether to calculate hl coeffs when
       doing density modification ('Yes') or not to do so ('No'). Default is
       No.
   i_ran_seed= 289564 Random seed (positive integer) for model-building and
               simulated annealing refinement
   include_input_model= *Yes No True False The keyword include_input_model
                        defines whether the input model (if any) is to be
                        crossed with models that are derived from it, and the
                        best parts of each kept. Note that if
                        multiple_models=True and include_input_model=True then
                        no initial cycle of randomization will be carried out
                        and the keyword multiple_models_starting_resolution is
                        ignored. In most cases you should use
                        include_input_model=True. If you want to generate
                        maximum diversity with multiple-models then you may
                        wish to use include_input_model=False. Also if you
                        want to decrease the amount of bias from your starting
                        model you may wish to use include_input_model=False.
   include_molprobity= Yes *No True False You can choose to include the clash
                       score from MolProbity as one of the scoring criteria in
                       comparing and merging models. The score is combined
                       with the model-map correlation CC by summing in a
                       weighted clashscore. If clashscore for a residue has a
                       value < ok_molp_score then its value is
                       (clashscore-ok_molp_score)*scale_molp_score, otherwise
                       its value is zero.
   input_compare_file= NONE If you are rebuilding a model or already think you
                       know what the model should be, you can include a
                       comparison file in rebuilding. The model is not used
                       for anything except to write out information on
                       coordinate differences in the output log files. NOTE:
                       this feature does not always work correctly.
   input_data_file= None Enter a file with input structure factor data.
                    For structure factor data only (e.g., FP SIGFP) any format
                    is ok. If you have free R flags, phase information or HL
                    coefficients that you want to use then an mtz file is
                    required. If this file contains phase information, this
                    phase information should be experimental (i.e.,
                    MAD/SAD/MIR etc), and should not be density-modified
                    phases (enter any files with density-modified phases as
                    input_map_file instead). If you also specify a hires data
                    file, then FP and SIGFP will come from that data file (and
                    not this one). If an input_refinement_file is specified,
                    then F, Sigma, FreeR_flag (if present) from that file will
                    be used for refinement instead of this one.
   input_ha_file= None If the flag "truncate_ha_sites_in_resolve" is set then
                  density at sites specified with input_ha_file is truncated
                  to improve the density modification procedure.
   input_hires_labels= None Labels for input hires file (FP SIGFP FreeR_flag)
   input_labels= None Labels for input data columns NOTE: Applies to input
                 data file for LigandFit and AutoBuild, but not to AutoMR. For
                 AutoMR use instead 'input_label_string'.
   input_lig_file_list= None This script adds the contents of these PDB files
                        to each model just prior to refinement. Normally you
                        might use this to put in any heavy-atoms that are in
                        the refined structure (for example the heavy atoms
                        that were used in phasing), or to add a ligand to your
                        model. If the atoms in this PDB file are not
                        recognized by phenix.refine, then you can specify
                        their geometries with a cif definitions file using the
                        keyword "cif_def_file_list". You can easily generate
                        cif definitions for many ligands using phenix.elbow in
                        PHENIX. You can put anything you like in the files in
                        input_lig_file_list, but any atoms that fall within
                        1.5 A of any atom in the current model will be tossed
                        (not written to the model).
   input_map_file= Auto Enter an mtz file with coefficients for map (if
                   different file or different coefficients than input
                   structure factor data ). This map will be used in the first
                   cycle of model-building. NOTE: default for this keyword is
                   Auto, which means "carry out normal process to guess this
                   keyword". This means if you specify "after_autosol" in
                   AutoBuild, AutoBuild will automatically take the value from
                   AutoSol. If you do not want this to happen, you can specify
                   None which means "No file"
   input_map_labels= None Labels for input map coefficient columns (FP PHIB
                     FOM) NOTE: FOM is optional (set to None if you wish)
   input_ncs_file= None You can enter NCS information in 3 ways: (1) an
                   ncs_spec file produced by AutoSol or AutoBuild with NCS
                   information (2) a heavy-atom PDB file that contains ncs in
                   the heavy-atom sites (3) a PDB file with a model that
                   contains chains with NCS. The wizard will derive NCS
                   information from any of these if specified. See also
                   "find_ncs" which determines whether the wizard will update
                   NCS from models that are built during iterative building.
   input_pdb_file= None You can enter a PDB file containing a starting model
                   of your structure. NOTE: If you enter a PDB file then the
                   AutoBuild wizard will start right in with rebuild steps,
                   skipping the build process. If the model is very poor then
                   it may be better to leave it out as the build process
                   (which includes pattern recognition and recognition of
                   helical and strand fragments) is optimized for improving
                   poor maps, while the rebuild process is optimized for
                   better maps that can be produced by having a partial model.
   input_refinement_file= Auto Data file to use for refinement. The data in
                          this file should not be corrected for anisotropy. It
                          will be combined with experimental phase information
                          (if any) from input_data_file for refinement. If you
                          leave this blank, then the data in the
                          input_data_file will be used in refinement. If no
                          anisotropy correction is applied to the data you do
                          not need to specify a datafile for refinement. If an
                          anisotropy correction is applied to the data files,
                          then you should enter an uncorrected datafile for
                          refinement. Any standard format is fine; normally
                          only F and sigF will be used. Bijvoet pairs and
                          duplicates will be averaged. If an mtz file is
                          provided then a free R flag can be read in as well.
                          Any HL coeffs and phase information in this file are
                          ignored. NOTE: default for this keyword is Auto,
                          which means "carry out normal process to guess this
                          keyword". This means if you specify "after_autosol"
                          in AutoBuild, AutoBuild will automatically take the
                          value from AutoSol. If you do not want this to
                          happen, you can specify None which means "No file"
   input_refinement_labels= None Labels for input refinement file columns (FP
                            SIGFP FreeR_flag)
   input_seq_file= Auto Enter name of file with 1-letter code of protein
                   sequence NOTES: 1. lines starting with >>> are
                   ignored and separate chains 2. FASTA format is fine 3. If
                   there are multiple copies of a chain, just enter one copy.
                   4. If you enter a PDB file for rebuilding and it has the
                   sequence you want, then the sequence file is not necessary.
                   NOTE: You can also enter the name of a PDB file that
                   contains SEQRES records, and the sequence from the SEQRES
                   records will be read, written to
                   seq_from_seqres_records.dat, and used as your input
                   sequence. NOTE: for AutoBuild you can specify
                   start_chains_list on the first line of your sequence file:
                   >>> start_chains_list 23 11 5 NOTE: default for
                   this keyword is Auto, which means "carry out normal process
                   to guess this keyword". This means if you specify
                   "after_autosol" in AutoBuild, AutoBuild will automatically
                   take the value from AutoSol. If you do not want this to
                   happen, you can specify None which means "No file"
   keep_input_ligands= *Yes No True False You can choose whether to (by
                       default) let the wizard keep ligands by separating them
                       out from the rest of your model and adding them back to
                       your rebuilt model, or alternatively to remove all
                       ligands from your input pdb file before
                       rebuild_in_place.
   keep_input_waters= Yes *No True False You can choose whether to keep input
                      waters (solvent) when using rebuild_in_place. If you
                      keep them, then you should specify either
                      "place_waters=No" or "keep_pdb_atoms=No" because if
                      place_waters=Yes and keep_pdb_atoms=Yes then
                      phenix.refine will add waters and then the wizard will
                      keep the new waters from the new PDB file created by
                      phenix.refine preferentially over the ones in your input
                      file.
   keep_pdb_atoms= *Yes No True False You can choose whether to keep the model
                   coordinates when model and ligand coordinates are within
                   dist_close_overlap and ligands in input_lig_file_list are
                   being added to the current model. Default=Yes
   link_distance_cutoff= 3.0 You can specify the maximum bond distance for
                         linking residues in phenix.refine called from the
                         wizards.
   loop_cc_min= 0.4 You can specify the minimum correlation of density from a
                loop with the map.
   maps_only= Yes *No True False You can choose whether to skip all
              model-building and just calculate maps and write out the
              results. This also runs just 1 cycle and turns on HL
              coefficients.
   mask_type= *histograms probability wang Choose method for obtaining
              probability that a point is in the protein vs solvent region.
              Default is "histograms". If you have a SAD dataset with a heavy
              atom such as Pt or Au then you may wish to choose "wang" because
              the histogram method is sensitive to very high peaks. Options
              are: histograms: compare local rms of map and local skew of map
              to values from a model map and estimate probabilities. This one
              is usually the best. probability: compare local rms of map to
              distribution for all points in this map and estimate
              probabilities. In a few cases this one is much better than
              histograms. wang: take points with highest local rms and define
              as protein.
   max_occ= None You can choose to set the maximum value of occupancy for
            atoms that have their occupancies refined. Default is None (use
            default value of 1.0 from phenix.refine)
   max_wait_time= 100.0 You can specify the length of time (seconds) to wait
                  when testing the run_command. If you have a cluster where
                  jobs do not start right away you may need a longer time to
                  wait.
   min_cc_res_rebuild= 0.5 You can rebuild just the worst parts of your model
                       by setting touch_up=True. You can decide what parts to
                       rebuild based on a minimum model-map correlation (by
                       residue). You can decide how much to rebuild using
                       worst_percent_res_rebuild or with min_cc_res_rebuild,
                       or both.
   min_seq_identity_percent= 50.0 The sequence in your input PDB file will be
                             adjusted to match the sequence in your sequence
                             file (if any). If there are insertions/deletions
                             in your model and the wizard does not seem to
                             identify them, you can split up your PDB file by
                             adding records like this: BREAK You can specify
                             the minimum sequence identity between your
                             sequence file and a segment from your input PDB
                             file to consider the sequences to be matched.
                             Default is 50.0%. You might want a higher number
                             to make sure that deletions in the sequence are
                             noticed.
   min_seq_identity_percent_rebuild_in_place= 50.0 The sequence in your input
                                              PDB file will be adjusted to
                                              match the sequence in your
                                              sequence file (if any). You can
                                              specify the minimum sequence
                                              identity between your sequence
                                              file and a segment from your
                                              input PDB file to consider the
                                              sequences to be matched. Default
                                              is 50.0%. You might want a
                                              higher number to make sure that
                                              deletions in the sequence are
                                              noticed. The value you specify
                                              applies to rebuild_in_place
                                              only. Use
                                              min_seq_identity_percent instead
                                              for non rebuild_in_place runs.
   model_list= None This keyword lets you name any number of PDB files to
               consider as starting models for model-building. NOTE: This
               differs from consider_main_chain_list which will try to add
               your PDB files EVERY cycle of merging models. In contrast
               model_list will only do it on the first cycle. NOTE: this only
               uses the main-chain atoms of your PDB files.
   modify_outside_delta_solvent= 0.05 You can set the initial solvent content
                                 to be a little lower than calculated when you
                                 are running modify_outside_model. Usually 0.05
                                 is fine.
   modify_outside_model= Yes *No True False You can choose whether to modify
                         the density in the "protein" region outside the
                         region specified in your current model by matching
                         histograms with the region that is specified by that
                         model. This can help by raising the density in this
                         protein region up to a value similar to that where
                         atoms are already placed.
   multiple_models= Yes *No True False You can build a set of models, all
                    compatible with your data. You can specify how many models
                    with multiple_models_number. If you are using
                    rebuild_in_place you can specify whether to generate
                    starting models or not with multiple_models_starting.
   multiple_models_first= 1 Specify which model to build first
   multiple_models_group_number= 5 You can build several initial models and
                                 merge them. Normally 5 initial models is
                                 fine.
   multiple_models_last= 20 Specify which model to end with
   multiple_models_number= 20 Specify how many models to build.
   multiple_models_starting= *Yes No True False You can specify how to
                             generate starting models for multiple models. If
                             you are using rebuild_in_place and you specify
                             "Yes" then the Wizard will rebuild your starting
                             model at the resolution specified in
                             multiple_models_starting_resolution. If you are
                             not using rebuild_in_place the Wizard will always
                             build a starting model at the current resolution.
   multiple_models_starting_resolution= 4.0 You can set the resolution for
                                        rebuilding an initial model. A value
                                        of 0.0 will use the resolution of the
                                        dataset.
   n_box_target= None You can tell the Wizard how many omit boxes to try and
                 set up (but it will not necessarily choose your number
                 because it has to be nicely divisible into boxes that fit
                 your asymmetric unit). A suitable number is 24. The larger
                 the number of boxes, the better the map will be, but the
                 longer it will take to calculate the map.
   n_cycle_build= -1 Choose number of cycles (3). This does not apply if
                  TEXTAL is selected for build_type
   n_cycle_build_max= 6 Maximum number of cycles for iterative model-building,
                      starting from experimental phases without a model. Even
                      if a satisfactory model is not found, a maximum of
                      n_cycle_build_max cycles will be carried out.
   n_cycle_build_min= 1 Minimum number of cycles for iterative model-building,
                      starting from experimental phases without a model. Even
                      if a satisfactory model is found, n_cycle_build_min
                      cycles will be carried out.
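
The interplay of n_cycle_build_min and n_cycle_build_max described above can be sketched in Python. This is only an illustration of the stated rule (always run at least the minimum, never more than the maximum, stop early once a satisfactory model is found). The function name and the build_one_cycle and is_satisfactory callbacks are hypothetical and are not part of the AutoBuild code.

```python
def run_build_cycles(build_one_cycle, is_satisfactory,
                     n_cycle_build_min=1, n_cycle_build_max=6):
    """Run between n_cycle_build_min and n_cycle_build_max build cycles.

    build_one_cycle(cycle, model) and is_satisfactory(model) are
    hypothetical callbacks standing in for the real wizard steps.
    """
    model = None
    cycles_run = 0
    for cycle in range(1, n_cycle_build_max + 1):
        model = build_one_cycle(cycle, model)
        cycles_run = cycle
        # Stop early only after the minimum number of cycles is done.
        if cycle >= n_cycle_build_min and is_satisfactory(model):
            break
    return model, cycles_run
```

The same pattern would apply to n_cycle_rebuild_min and n_cycle_rebuild_max when starting from a model.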
   n_cycle_image_min= 3 Pattern recognition (resolve_pattern) and fragment
                      identification ("image based density modification") are
                      used as part of the density modification process. These
                      are normally only useful in the first few cycles of
                      iterative model-building. This script tries
                      model-building both with and without including image
                      information, and proceeds with the most complete model.
                      Once at least n_cycle_image_min cycles have been carried
                      out with image information, if the image-based map
                      results in a less-complete model than the one without
                      image information, image information is no longer
                      included.
   n_cycle_rebuild_in_place= None Number of cycles for rebuild_in_place for
                             multiple models only
   n_cycle_rebuild_max= 10 Maximum number of cycles for iterative
                        model-rebuilding, starting from a model. Even if a
                        satisfactory model is not found, a maximum of
                        n_cycle_rebuild_max cycles will be carried out.
   n_cycle_rebuild_min= 1 Minimum number of cycles for iterative
                        model-rebuilding, starting from a model. Even if a
                        satisfactory model is found, n_cycle_rebuild_min
                        cycles will be carried out.
   n_cycle_rebuild_omit= 10 Model-building is normally carried out using the
                         "best" available map. If omit_on_rebuild is Yes, then
                         every n_cycle_rebuild_omit cycle of model rebuilding,
                         a composite omit map is used instead. If you specify
                         0 and omit_on_rebuild is Yes, omit maps will be used
                         every cycle. Normally every 10th cycle is optimal.
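
As a rough illustration of the schedule described for n_cycle_rebuild_omit, the sketch below decides whether a given rebuild cycle would use a composite omit map. It assumes that cycles are counted from 1 and that "every n_cycle_rebuild_omit cycle" means cycles that are exact multiples of that value; both are assumptions, and the function name is hypothetical.

```python
def use_omit_map(cycle, omit_on_rebuild, n_cycle_rebuild_omit=10):
    """Return True if this rebuild cycle would use a composite omit map.

    Assumes 1-based cycle numbers and omit maps on exact multiples of
    n_cycle_rebuild_omit (an interpretation, not confirmed behavior).
    """
    if not omit_on_rebuild:
        return False
    if n_cycle_rebuild_omit == 0:
        # Documented special case: omit maps are used every cycle.
        return True
    return cycle % n_cycle_rebuild_omit == 0
```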
   n_mini= 10 You can choose how many times to retrace your model in
           "retrace_before_build"
   n_random_frag= 0 In resolve building you can randomize each fragment
                  slightly so as to generate more possibilities for tracing
                  based on extending it.
   n_random_loop= 3 Number of randomized tries from each end for building
                  loops. If 0, then one try. If N, then N additional tries
                  with randomization based on rms_random_loop.
   n_rebuild_in_place= 1 You can choose how many times to rebuild your model
                       in place with rebuild_in_place
   n_try_rebuild= 2 Number of attempts to build each segment of chain
   n_xyz_list= None You can specify the grid to use for map calculations.
   nbatch= 3 You can specify the number of processors to use (nproc) and the
           number of batches to divide the data into for parallel jobs.
           Normally you will set nproc to the number of processors available
           and leave nbatch alone. If you leave nbatch as None it will be set
           automatically, with a value depending on the Wizard. This is
           recommended. The value of nbatch can affect the results that you
           get, as the jobs are not split into exact replicates, but are
           rather run with different random numbers. If you want to get the
           same results, keep the same value of nbatch.
   ncs_copies= None Number of copies of the molecule in the asymmetric unit
               (note: only one type of molecule allowed at present)
   ncs_refine_coord_sigma_from_rmsd= Yes *No True False You can choose to use
                                     the current NCS rmsd as the value of the
                                     sigma for NCS restraints. See also
                                     ncs_refine_coord_sigma_from_rmsd_ratio
   ncs_refine_coord_sigma_from_rmsd_ratio= 1.0 You can choose to multiply the
                                           current NCS rmsd by this value
                                           before using it as the sigma for
                                           NCS restraints. See also
                                           ncs_refine_coord_sigma_from_rmsd
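
The relationship between these two NCS keywords amounts to a single multiplication, sketched below. The function name and the fallback_sigma argument (the sigma used when the rmsd-based option is off) are hypothetical; the actual default sigma is not stated here.

```python
def ncs_coord_sigma(current_ncs_rmsd, fallback_sigma,
                    ncs_refine_coord_sigma_from_rmsd=False,
                    ncs_refine_coord_sigma_from_rmsd_ratio=1.0):
    """Pick the sigma for NCS coordinate restraints.

    fallback_sigma is a hypothetical stand-in for whatever sigma is
    used when the rmsd-based option is switched off.
    """
    if ncs_refine_coord_sigma_from_rmsd:
        # Sigma is the current NCS rmsd scaled by the ratio keyword.
        return current_ncs_rmsd * ncs_refine_coord_sigma_from_rmsd_ratio
    return fallback_sigma
```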
   ncycle_refine= 3 Choose number of refinement cycles (3)
   no_merge_ncs_copies= Yes *No True False Normally False (do merge NCS
                        copies). If True, then do not use each NCS copy to try
                        to build the others.
   nproc= 1 You can specify the number of processors to use (nproc) and the
          number of batches to divide the data into for parallel jobs.
          Normally you will set nproc to the number of processors available
          and leave nbatch alone. If you leave nbatch as None it will be set
          automatically, with a value depending on the Wizard. This is
          recommended. The value of nbatch can affect the results that you
          get, as the jobs are not split into exact replicates, but are rather
          run with different random numbers. If you want to get the same
          results, keep the same value of nbatch.
   number_of_models= -1 This parameter lets you choose how many initial models
                     to build with RESOLVE within a single build cycle. This
                     parameter is now superseded by number_of_parallel_models,
                     which sets the number of models (but now entire build
                     cycles) to carry out in parallel. A zero means set it
                     automatically. That is what you normally should use. The
                     number_of_models is by default set to 1 and
                     number_of_parallel_models is set to the value of nbatch
                     (typically 4).
   number_of_parallel_models= 0 This parameter lets you choose how many models
                              to build in parallel. A zero means set it
                              automatically. That is what you normally should
                              use. This parameter supersedes the old parameter
                              number_of_models. The value of number_of_models
                              is by default set to 1 and
                              number_of_parallel_models is set to the value of
                              nbatch (typically 4).
   offset_boundary= 1.0 Specify the boundary around omit_box_pdb for
                    definition of omit region.
   offset_boundary_background_map= None You can set the offset of the
                                   boundary_background_map.
   offsets_list= 53 7 23 You can specify an offset for the orientation of the
                 helix and strand templates in building. This is used in
                 generating different starting models.
   ok_molp_score= None You can choose to include the clash score from
                  MolProbity as one of the scoring criteria in comparing and
                  merging models. The score is combined with the model-map
                  correlation CC by summing in a weighted clashscore. If
                  clashscore for a residue has a value < ok_molp_score (the
                  threshold defined by ok_molp_score) then its value is
                  (clashscore-ok_molp_score)*scale_molp_score, otherwise its
                  value is zero.
   omit_box_end= 0 To only carry out omit in some of the omit boxes, use
                 omit_box_start and omit_box_end
   omit_box_pdb_list= None This keyword applies if you have set OMIT region
                      specification to "omit_around_pdb". To automatically set
                      an OMIT region specify a PDB file(s) with
                      omit_box_pdb_list. The omit region boundaries will be
                      the limits in x y z of the atoms in this file, plus a
                      border of offset_boundary. To use only some of the atoms
                      in the file, specify values for starting, ending and
                      chain to omit (omit_res_start_list and omit_res_end_list
                      and omit_chain_list). If you specify more than one file
                      (or if you specify more than one segment of a file with
                      omit_chain_list or omit_res_start_list and
                      omit_res_end_list) then a set of omit runs will be
                      carried out and combined into one composite omit.
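
The omit region described for omit_box_pdb_list (the x y z limits of the atoms plus a border of offset_boundary) can be illustrated with a short Python sketch. It works on a plain list of Cartesian coordinates and does not read PDB files; the function name is hypothetical and this is not the wizard's own code.

```python
def omit_box_limits(atom_xyz, offset_boundary=1.0):
    """Return (lower, upper) corners of a box enclosing all atoms.

    atom_xyz is a list of (x, y, z) tuples; the box is padded by
    offset_boundary on every side.
    """
    xs, ys, zs = zip(*atom_xyz)
    lower = tuple(min(v) - offset_boundary for v in (xs, ys, zs))
    upper = tuple(max(v) + offset_boundary for v in (xs, ys, zs))
    return lower, upper
```

For example, atoms at (1, 2, 3) and (4, 0, 5) with offset_boundary=1.0 give a box from (0, -1, 2) to (5, 3, 6).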
   omit_box_start= 0 To only carry out omit in some of the omit boxes, use
                   omit_box_start and omit_box_end
   omit_chain_list= None You can choose to omit just a portion of your model
                    with the keywords omit_res_start_list, omit_res_end_list
                    and omit_chain_list. For example, with omit_res_start_list
                    3, omit_res_end_list 4 and omit_chain_list chain1 (use ""
                    to select all chains), residues 3 to 4 of chain1 will be
                    omitted. You can specify more than one region by using the
                    Parameter Group Options button to add lines. If you specify
                    more than one region, a separate omit run will be carried
                    out for each one and then the maps will be put together
                    afterwards. If there is more than one chain in the input
                    PDB file then only the chain defined by omit_chain_list
                    will be omitted. NOTE: Zero for start and end and "" for
                    chain is the same as choosing everything
   omit_offset_list= 0 0 0 0 0 0 To carry out one iterative build omit, with a
                     region defined in grid units, enter
                     nxs,nxe,nys,nye,nzs,nze in omit_offset_list.
   omit_on_rebuild= Yes *No True False You can specify whether to use an omit
                    map for building the model on rebuild cycles. Default is
                    Yes if you start with a model, No if you are building a
                    model from scratch. The omit map is calculated every
                    n_cycle_rebuild_omit cycles
   omit_region_specification= *composite_omit omit_around_pdb You can specify
                              what region an omit
                              (simple/sa-omit/iterative-build-omit) map is to
                              be calculated for. Composite omit will create a
                              map over the entire asymmetric unit by dividing
                              the asymmetric unit into overlapping boxes,
                              calculating omit maps for each, and splicing all
                              the results together into a single composite
                              omit map. You can tell the Wizard how many omit
                              boxes to try and set up with the keyword
                              "n_box_target" (but it will not necessarily
                              choose your number because it has to be nicely
                              divisible into boxes that fit your asymmetric
                              unit). Omit around PDB will omit around the
                              region defined by the PDB file(s) you enter for
                              omit_box_pdb (or around the residues in that PDB
                              file that you specify). If you specify
                              omit_around_pdb then you must enter a pdb file
                              to omit around.
   omit_res_end_list= None You can choose to omit just a portion of your model
                      with the keywords omit_res_start_list, omit_res_end_list
                      and omit_chain_list. For example, with
                      omit_res_start_list 3, omit_res_end_list 4 and
                      omit_chain_list chain1 (use " " for blank), residues 3
                      to 4 of chain1 will be omitted. You can specify more
                      than one region by using the Parameter Group Options
                      button to add lines. If you specify more than one
                      region, a separate omit run will be carried out for each
                      one and then the maps will be put together afterwards.
                      If there is more than one chain in the input PDB file
                      then only the chain defined by omit_chain_list will be
                      omitted. NOTE: Zero for start and end and "" for chain
                      is the same as choosing everything
   omit_res_start_list= None