Python-based Hierarchical ENvironment for Integrated Xtallography

Automated Model Building and Rebuilding using AutoBuild

Author(s)
Purpose
Purpose of the AutoBuild Wizard
Usage
How the AutoBuild Wizard works
Automation and user control
Core modules in the AutoBuild Wizard
How to run the AutoBuild Wizard
What the AutoBuild wizard needs to run
...and optional files
Specifying which columns of data to use from input data files
Specifying other general parameters
Picking waters in AutoBuild
Keeping waters from your input file in AutoBuild
Specifying phenix.refine parameters
Specifying resolve/resolve_pattern parameters
Including ligand coordinates in AutoBuild
Specifying arbitrary commands and cif files for phenix.refine
Output files from AutoBuild
Standard building, rebuild_in_place, and multiple-models
Parallel jobs, nproc, nbatch, number_of_parallel_models and how AutoBuild works in parallel
Model editing during rebuilding with the Coot-PHENIX interface
Resolution limits in AutoBuild
Examples
Run AutoBuild automatically after AutoSol
Run AutoBuild beginning with experimental data
Merge in hires data
Make a SA-omit map around atoms in target.pdb
Make a simple composite omit map
Make an iterative-build omit map around atoms in target.pdb
Make a sa-omit map around residues 3 and 4 in chain A of coords.pdb
Create one very good rebuilt model
Touch up a model
Create 20 very good rebuilt models that are as different as possible
Morph an MR model and rebuild it
Build an RNA chain
Build a DNA chain
Just make maps; don't do any building.
Just calculate a prime-and-switch map
Possible Problems
General limitations
Specific limitations and problems
Literature
Additional information
List of all AutoBuild keywords

Author(s)

  • AutoBuild Wizard: Tom Terwilliger
  • PHENIX GUI and PDS Server: Nigel W. Moriarty
  • phenix.refine: Ralf W. Grosse-Kunstleve, Peter Zwart and Paul D. Adams
  • RESOLVE: Tom Terwilliger
  • TEXTAL: Kreshna Gopal, Thomas Ioerger, Rita Pai, Tod Romo, James Sacchettini, Erik McKee, Lalji Kanbi
  • phenix.xtriage: Peter Zwart

Purpose

Purpose of the AutoBuild Wizard

The purpose of the AutoBuild Wizard is to provide a highly automated system for model rebuilding and completion. The Wizard design allows the user to specify data files and parameters through an interactive GUI, or alternatively through keyworded scripts. The AutoBuild Wizard begins with data files containing structure factor amplitudes and uncertainties, along with either experimental phase information or a starting model. It carries out cycles of model-building and refinement alternating with model-based density modification, and produces a relatively complete atomic model.

The AutoBuild Wizard uses RESOLVE (and optionally TEXTAL), xtriage and phenix.refine to build an atomic model, refine it, and improve it with iterative density modification, refinement, and model-building.

The Wizard begins with either experimental phases (e.g., from AutoSol) or with an atomic model that can be used to generate calculated phases. The AutoBuild Wizard produces a refined model that can be nearly complete if the data are strong and the resolution is about 2.5 A or better. At lower resolutions (2.5 - 3 A) the model may be less complete, and at resolutions worse than 3 A the model may be quite incomplete and not well refined.

The AutoBuild Wizard can be used to generate OMIT maps (simple omit, SA-omit, iterative-build omit) that can cover the entire unit cell or specific residues in a PDB file.

The AutoBuild Wizard can generate a set of models compatible with experimental data (multiple_models).

Usage

The AutoBuild Wizard can be run from the PHENIX GUI, from the command-line, and from keyworded script files. All three versions are identical except in the way that they take commands from the user. See Running a Wizard from a GUI, the command-line, or a script for details of how to run a Wizard. The command-line version will be described here.

How the AutoBuild Wizard works

The AutoBuild Wizard begins with experimental structure factor amplitudes, along with either experimental or model-based estimates of crystallographic phases. The phase information is improved by using statistical density modification to improve the correlation of NCS-related density in the map (if present) and to improve the match of the distribution of electron densities in the map with those expected from a model map. This improved map is then used to build and refine an atomic model.

In subsequent cycles, the models from previous cycles are used as a source of phase information in statistical density modification, iteratively improving the quality of the map used for model-building.

Additionally, during the first few cycles additional phase information is obtained by detecting and enhancing (1) the presence of commonly-found local patterns of density in the map, and (2) the presence of density in the shape of helices and strands. The final model obtained is analyzed for residue-based map correlation and density at the coordinates of individual atoms, and an analysis including a summary of atoms and residues that are in strong, moderate, or weak density and out of density is provided.

Automation and user control

The AutoBuild Wizard has been designed for ease of use combined with maximal user control, with as many parameters set automatically by the Wizard as possible, while keeping parameters accessible to the user through a GUI and through keyword-based scripts. The Wizard uses the input/output routines of the cctbx library, which accept data files in many different formats, so that the user does not have to convert their data to any particular format before using the Wizard. Use of the phenix.refine refinement package in the AutoBuild Wizard allows a high degree of automation of refinement, so that neither the user nor the Wizard is required to specify parameters for refinement. The phenix.refine package automatically includes a bulk solvent model and automatically places solvent molecules.

Core modules in the AutoBuild Wizard

The five core modules in the AutoBuild Wizard are

  • (1) building a new model into an electron density map
  • (2) rebuilding an existing model
  • (3) refinement
  • (4) iterative model-building beginning from experimental phase information, and
  • (5) iterative model-building beginning from a model.

The standard procedures available in the AutoBuild Wizard that are based on these modules include:

  • (a) model-building and completion starting from experimental phases,
  • (b) rebuilding a model from scratch, with or without experimental phase information, and
  • (c) rebuilding a model in place, maintaining connectivity and sequence register.

Starting from a set of experimental phases and structure factor amplitudes, normally procedure (a) is carried out, and then the resulting model is rebuilt with procedure (b).

Starting from a model (e.g., from molecular replacement) and experimental structure factor amplitudes, procedure (c) is normally carried out if the starting model differs less than about 50% in sequence from the desired model, and otherwise procedure (b) is used.
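
The choice among procedures (a), (b) and (c) described above can be sketched as a small decision function. This is only an illustration of the rule as stated in the text, not the Wizard's own code: the function name choose_procedures is hypothetical, and the cutoff of 0.5 stands in for "about 50%", so the real decision may differ near that boundary.

```python
def choose_procedures(has_starting_model, sequence_difference=None):
    """Return the standard procedures, in order, for a given starting point.

    "a" = model-building and completion from experimental phases
    "b" = rebuilding a model from scratch
    "c" = rebuilding a model in place
    """
    if not has_starting_model:
        # Experimental phases only: build first, then rebuild the result.
        return ["a", "b"]
    if sequence_difference is None:
        raise ValueError("sequence_difference is required when starting from a model")
    # Assumed cutoff: the text says "about 50%", so 0.5 is approximate.
    return ["c"] if sequence_difference < 0.5 else ["b"]


print(choose_procedures(False))      # starting from experimental phases
print(choose_procedures(True, 0.2))  # MR model close in sequence
print(choose_procedures(True, 0.7))  # MR model distant in sequence
```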

How to run the AutoBuild Wizard

Running the AutoBuild Wizard is easy. For example, from the command-line you can type:

phenix.autobuild data=w1.sca seq.dat model=coords.pdb

The AutoBuild Wizard will carry out iterative model-building, density modification and refinement based on the data in w1.sca and the model in coords.pdb, editing the model as necessary to match the sequence in seq.dat.

What the AutoBuild wizard needs to run

  • (1) a data file, optionally with phases and HL coeffs and freeR flag (w1.sca or data=w1.sca)
  • (2) a sequence file (seq.dat or seq_file=seq.dat) or a model (coords.pdb or model=coords.pdb)

...and optional files

  • (3) coefficients for a starting map (map_file=resolve.mtz)
  • (4) a file for refinement (refinement_file=exptl_fobs_freeR_flags.mtz)
  • (5) a high-resolution datafile (hires_file=high_res.sca)

Specifying which columns of data to use from input data files

If one or more of your data files has column names that the Wizard cannot identify automatically, you can specify them yourself. You will need to provide one column "name" for each expected column of data, with "None" for anything that is missing.

For example, if your data file ref.mtz has columns FP SIGFP and FreeR then you might specify

refinement_file=ref.mtz
input_refinement_labels="FP SIGFP None None None None None None FreeR"

The keywords for labels and anticipated input labels (program labels) are:

input_labels (for data file): FP SIGFP PHIB FOM HLA HLB HLC HLD FreeR_flag
input_refinement_labels: FP SIGFP FreeR_flag
input_map_labels: FP PHIB FOM
input_hires_labels: FP SIGFP FreeR_flag
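
Writing a label string by hand makes it easy to miscount the "None" entries. The sketch below shows one way to generate the string for input_labels from the program labels listed above. The helper build_label_string is hypothetical (it is not part of PHENIX), and the column names in the example are made up for illustration.

```python
# Program labels for the data file, in the order listed for input_labels.
INPUT_LABELS = ["FP", "SIGFP", "PHIB", "FOM", "HLA", "HLB", "HLC", "HLD", "FreeR_flag"]


def build_label_string(columns_in_file, program_labels=INPUT_LABELS):
    """Map each program label to a column name in the file, or "None"."""
    unknown = set(columns_in_file) - set(program_labels)
    if unknown:
        raise KeyError("not a program label: %s" % ", ".join(sorted(unknown)))
    return " ".join(columns_in_file.get(label, "None") for label in program_labels)


# A file that only has amplitudes, sigmas and a free R column named FreeR.
labels = build_label_string({"FP": "FP", "SIGFP": "SIGFP", "FreeR_flag": "FreeR"})
print('input_labels="%s"' % labels)
```

The same helper can be used for the shorter label lists by passing a different program_labels list, for example ["FP", "PHIB", "FOM"] for input_map_labels.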

You can find out all the possible label strings in a data file that you might use by typing:

phenix.autosol display_labels=w1.mtz  # display all labels for w1.mtz

NOTES: if your data files contain a mixture of amplitude and intensity data then only the amplitude data is available. If you have only intensity data in a data file and want to select specific columns, then you need to specify the column names as they are after importing the data and conversion to amplitudes (see below under General Limitations for details).

Specifying other general parameters

You can specify many more parameters as well. See the list of keywords, defaults and descriptions at the end of this page and also general information about running Wizards at Running a Wizard from a GUI, the command-line, or a script for how to do this. Some of the most common parameters are:

data=w1.sca       # data file
model=coords.pdb  # starting model
seq_file=seq.dat  # sequence file
map_file=map_coeffs.mtz # coefficients for a starting map for building
resolution=3     # dmin of 3 A
s_annealing=True  # use simulated annealing refinement at start of each cycle
n_cycle_build_max=5  # max number of build cycles (starting from experimental phases)
n_cycle_rebuild_max=5  # max number of rebuild cycles (starting from a model)
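
If you launch AutoBuild from your own scripts, the keyword=value form shown above is easy to assemble programmatically. The following is a minimal sketch, assuming only the keywords listed in this section; the helper build_autobuild_command is hypothetical and the command is printed rather than executed.

```python
import shlex


def build_autobuild_command(**params):
    """Return an argument list of the form phenix.autobuild key=value ..."""
    args = ["phenix.autobuild"]
    for key, value in params.items():
        if value is None:
            continue  # leave unset parameters to the Wizard's defaults
        args.append("%s=%s" % (key, value))
    return args


args = build_autobuild_command(
    data="w1.sca",
    seq_file="seq.dat",
    model="coords.pdb",
    resolution=3,
    s_annealing=True,
    n_cycle_rebuild_max=5,
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(a) for a in args))
```

On a machine where PHENIX is installed, the same list could be passed to subprocess.run(args) instead of being printed.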

Picking waters in AutoBuild

By default AutoBuild will instruct phenix.refine to pick waters using its standard procedure. This means that if the resolution of the data is high enough (typically 3 A or better) then waters are placed.

You can tell AutoBuild not to have phenix.refine pick waters with the command:

place_waters=False
If you want to place waters at a lower resolution, you will need to reset the low-resolution cutoff for placing waters in phenix.refine. You would do that in a "refinement_params.eff" file containing lines like these (see below for passing parameters to phenix.refine with an ".eff" file):
refinement {
  ordered_solvent {
    low_resolution = 2.8
  }
} 
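
The interaction between place_waters and the resolution cutoff can be summarized in a few lines. This sketch follows the rule given in the notes on ordered_solvent_low_resolution on this page (ordered solvent is not added if the resolution used for refinement is larger than the cutoff). The function name is hypothetical, and the behavior when the two values are exactly equal is an assumption.

```python
def waters_will_be_placed(refinement_resolution, place_waters=True,
                          ordered_solvent_low_resolution=3.0):
    """Return True if waters would be added under the stated rule."""
    if not place_waters:
        return False
    # Larger numbers mean lower resolution; waters are skipped beyond the cutoff.
    return refinement_resolution <= ordered_solvent_low_resolution


print(waters_will_be_placed(2.0))   # high-resolution data, default cutoff
print(waters_will_be_placed(3.5))   # low-resolution data, default cutoff
print(waters_will_be_placed(3.2, ordered_solvent_low_resolution=3.5))
```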

Keeping waters from your input file in AutoBuild

You can tell AutoBuild to keep the waters in your input file when you are using rebuild_in_place (the default is to toss them and replace them with new ones). You can say,

keep_input_waters=True
place_waters=No
NOTE: If you specify keep_input_waters=True you should also specify either "place_waters=No" or "keep_pdb_atoms=No" . This is because if place_waters=Yes and keep_pdb_atoms=Yes then phenix.refine will add waters and then the wizard will keep the new waters from the new PDB file created by phenix.refine preferentially over the ones in your input file.
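
The combination of settings described in the NOTE can be checked before starting a run. This is a minimal sketch with a hypothetical function name; it only encodes the condition stated above and does not inspect any files.

```python
def input_waters_at_risk(keep_input_waters, place_waters, keep_pdb_atoms):
    """True if newly placed waters would be preferred over the input waters."""
    return bool(keep_input_waters and place_waters and keep_pdb_atoms)


settings = {"keep_input_waters": True, "place_waters": True, "keep_pdb_atoms": True}
if input_waters_at_risk(**settings):
    print("Set place_waters=No or keep_pdb_atoms=No to keep your input waters")
```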

Specifying phenix.refine parameters

You can control phenix.refine parameters that are not specified directly by AutoBuild using a refinement parameters (.eff) file:

refine_eff_file=refinement_params.eff    # set any phenix.refine params not set by AutoBuild
This file might contain a twin-law for refinement:
refinement {
  twinning {
    twin_law = "-k, -h, -l"
  }
}

You can put any phenix.refine parameters in this file, but a few parameters that are set directly by AutoBuild override your inputs from the refine_eff_file. These parameters are listed below.

Refinement parameters that must be set using AutoBuild Wizard keywords (overwriting any values provided by user in input_eff_file)

Each entry gives the phenix.refine keyword, followed by the Wizard keyword(s) and notes.

  • refinement.main.number_of_macro_cycles: ncycle_refine
  • refinement.main.simulated_annealing: s_annealing (only applies to 1st refinement in rebuild. SA in any other refinements controlled by input_eff_file, if any)
  • refinement.ncs.find_automatically: refine_with_ncs=True turns on automatic ncs search
  • refinement.main.ncs: refine_with_ncs=True turns on ncs
  • refinement.ncs.coordinate_sigma: Normally not set by Wizard. However if the Wizard keyword ncs_refine_coord_sigma_from_rmsd is True then the ncs coordinate sigma is equal to ncs_refine_coord_sigma_from_rmsd_ratio times the rmsd among ncs copies
  • refinement.main.random_seed: i_ran_seed sets the random seed at the beginning of a Wizard... this affects refinement.main.random_seed but does not set it to the value of i_ran_seed (because i_ran_seed gets updated by several different routines)
  • refinement.main.ordered_solvent: place_waters=True will set ordered_solvent to True. Note that this only has an effect if the value of the resolution cutoff for adding waters (refinement.ordered_solvent.low_resolution) is higher than the resolution used for refinement.
  • refinement.main.ordered_solvent: place_waters_in_combine=True will set ordered_solvent to True, only applying this to the final combination step of multiple-model generation. Note that this only has an effect if the value of the resolution cutoff for adding waters (refinement.ordered_solvent.low_resolution) is higher than the resolution used for refinement.
  • refinement.ordered_solvent.low_resolution: ordered_solvent_low_resolution=3.0 (default) will set the resolution cutoff for adding waters (refinement.ordered_solvent.low_resolution) to 3 A. If the resolution used for refinement is larger than the value of ordered_solvent_low_resolution then ordered solvent is not added.
  • refinement.main.use_experimental_phases: use_mlhl=True will set refinement.main.use_experimental_phases to True
  • refinement.refine.strategy: The Wizard keywords refine refine_b refine_xyz all affect refinement.refine.strategy. If refine=True then refinement is carried out. If refine_b=True (default) isotropic displacement factors are refined. If refine_xyz=True (default) coordinates are refined.
  • refinement.main.occupancy_max: max_occ=1.0 sets the value of refinement.main.occupancy_max to 1.0. Default is to do nothing and use the default from phenix.refine (1.0)
  • refinement.refine.occupancies.individual: The combination of Wizard keywords of semet=True and refine_se_occ=True will add "(name SE)" to the value of refinement.refine.occupancies.individual. You can add to your .eff file other names of atoms to have occupancies refined as well.
  • refinement.main.high_resolution: Either of the Wizard keywords refinement_resolution and resolution will set the value of refinement.main.high_resolution, with refinement_resolution being used if available.
  • refinement.pdb_interpretation.link_distance_cutoff: link_distance_cutoff

The following parameters controlling phenix.refine output are set directly in AutoBuild and cannot be set by the user

  • refinement.output.write_eff_file
  • refinement.output.write_geo_file
  • refinement.output.write_def_file
  • refinement.output.write_maps
  • refinement.output.write_map_coefficients

Specifying resolve/resolve_pattern parameters

Similarly, you can control resolve and resolve_pattern parameters. For these parameters, your inputs will not be overridden by AutoBuild. The format is a little tricky: you have to put two sets of quotes around the command like this:

resolve_command="'resolution 200 3'"    # NOTE ' and " quotes
This will put the text
resolution 200 3
at the end of every temporary command file created to run resolve. (This is why it is not overridden by AutoBuild commands; they will all come before your commands in the resolve command file.) Note that some commands in resolve may be incompatible with this usage.
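
The need for two sets of quotes is easier to see by looking at what a POSIX shell hands to the program. The sketch below uses Python's shlex module to mimic shell word splitting. It only illustrates the shell side; how the Wizard's own parser treats the inner single quotes is not shown here and is assumed to be the step that groups the three words into one value.

```python
import shlex

# The command line as typed, with double quotes around single quotes.
command_line = 'phenix.autobuild data=w1.sca resolve_command="\'resolution 200 3\'"'
argv = shlex.split(command_line)  # POSIX-shell-style word splitting
print(argv[-1])                   # the outer double quotes are gone, the inner ones remain

# Without any quotes, the shell splits the value into separate arguments.
unquoted = shlex.split("resolve_command=resolution 200 3")
print(unquoted)
```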

Including ligand coordinates in AutoBuild

If your input PDB file contains ligands (anything other than solvent that is not protein if your chain_type=PROTEIN, for example) then by default these ligands will be kept, used in refinement, and written out to your output PDB file. Any solvent molecules will by default be discarded. You can change this behavior by changing the keywords from these defaults:

keep_input_ligands=True
keep_input_waters=False
The AutoBuild Wizard will use phenix.elbow to generate geometries for any ligands that are not recognized.

You can also tell AutoBuild to add the contents of any PDB files that you wish to supply to the current version of the structure just before refinement, so all the refined models produced contain whatever AutoBuild has built, plus the contents of these PDB files. This can be done through the GUI, the command-line, or a script. In the command-line version you do this with:

input_lig_file_list=my_ligand.pdb

NOTE: The files in input_lig_file_list will be edited to make them all HETATM records to tell AutoBuild to ignore these residues in rebuilding.

NOTE: You may need to tell phenix.refine about the geometry of your ligands. You will get an error message if the ligand is not recognized and an automatic run of phenix.elbow does not succeed in generating your ligand. In that case you will want to run phenix.elbow to create a cif definition file for this ligand:

phenix.elbow my_ligand.pdb --id=LIG
where LIG is the 3-letter ID code that you use in my_ligand.pdb to identify your ligand. If the automatic run does not work you may need to give phenix.elbow additional information to generate your ligand.

Once phenix.elbow has generated your ligand you can use the keyword "cif_def_file_list" to tell AutoBuild about this ligand:

cif_def_file_list=elbow.LIG.my_ligand.pdb.cif

Specifying arbitrary commands and cif files for phenix.refine

You can tell AutoBuild to apply any set of cif definitions to the model during refinement by using a combination of specification files and the commands cif_def_file_list and refine_eff_file_list:

refine_eff_file_list=link.eff cif_def_file_list=link.cif
This example comes from the phenix.refine manual page in which a link is specified in a cif definition file link.cif:
 data_mod_5pho
#
loop_
_chem_mod_atom.mod_id
_chem_mod_atom.function
_chem_mod_atom.atom_id
_chem_mod_atom.new_atom_id
_chem_mod_atom.new_type_symbol
_chem_mod_atom.new_type_energy
_chem_mod_atom.new_partial_charge
 5pho     add      .      O5T    O    OH      .
loop_
_chem_mod_bond.mod_id
_chem_mod_bond.function
_chem_mod_bond.atom_id_1
_chem_mod_bond.atom_id_2
_chem_mod_bond.new_type
_chem_mod_bond.new_value_dist
_chem_mod_bond.new_value_dist_esd
 5pho     add      O5T     P         coval        1.520    0.020

and this is applied with a parameters file link.eff:

 refinement.pdb_interpretation.apply_cif_modification
{
  data_mod = 5pho
  residue_selection = resname GUA and name O5T
}

You can have any number of cif files and parameters files.

Output files from AutoBuild

When you run AutoBuild the output files will be in a subdirectory with your run number:

AutoBuild_run_1_/   # subdirectory with results
  • A summary file listing the results of the run and the other files produced:
    AutoBuild_summary.dat  # overall summary
    
  • A warnings file listing any warnings about the run
    AutoBuild_warnings.dat  # any warnings
    
  • A file that lists all parameters and knowledge accumulated by the Wizard during the run (some parts are binary and are not printed)
    AutoBuild_Facts.dat   # all Facts about the run
    
  • Final refined model
    overall_best.pdb
    
    NOTE: The "overall_best.pdb" file is always the current best model. Similarly "overall_best_denmod_map_coeffs.mtz" is always the best map_coefficients file. The AutoBuild_summary.dat file lists the names of the current best set of files. The contents of "overall_best.pdb" and of the best model listed in AutoBuild_summary.dat will be the same.
  • Final map coefficients used to build refined model. Use FP PHIM FOMM in maps. Normally this is a density-modified map from resolve. See also the map coefficients from phenix.refine below.
    overall_best_denmod_map_coeffs.mtz
    
  • Final sigmaA-weighted 2mFo-DFc and Fo-Fc map coefficients from phenix.refine based on overall_best.pdb final model. The map coefficients are 2FOFCWT PH2FOFCWT for the 2mFo-DFc map and FOFC and PHFOFC for the Fo-Fc difference map. See also the map coefficients from density modification above.
    overall_best_refine_map_coeffs.mtz
    
  • MTZ file with FP, phases and HL coeffs if present, and freeR_flags used in refinement
    exptl_fobs_phases_freeR_flags.mtz
    
  • Final log file for model-building
    overall_best.log
    
  • Final log file for refinement
    overall_best.log_refine
    
  • Evaluation of fit of model to map
    overall_best.log_eval
    
  • Summary of NCS information
    ncs_info.ncs
    

Standard building, rebuild_in_place, and multiple-models

The AutoBuild Wizard has two overall methods for building a model. The first method (standard build) is to build a model from scratch. This involves identification of where helices (and strands, for proteins) are located, extension using fragment libraries, connection of segments, identification of side-chains, and sequence alignment. These methods are augmented in the standard building procedure by loop-fitting and building model outside of the region that has already been built.

The second method (rebuild_in_place) takes an existing model and rebuilds it without adding or deleting any residues and without changing the connectivity of the chain. The way this works is a segment of the model is deleted and then is filled-in again by rebuilding from the remaining ends. This is repeated for overlapping segments covering the entire model.

The multiple-models approach really has two levels of multiple models. At the first level, several (multiple_models_group_number, default is number_of_parallel_models) models are built (using rebuild_in_place) and are then recombined into a single good model. At the next level, this whole process may be done more than once (multiple_models_number times), yielding several very good models. By default, if you ask for rebuild_in_place, then you will get a single very good model, created by running rebuild_in_place several times and recombining the models.
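
The two levels of the multiple-models approach determine how many rebuilt models are produced before recombination. The arithmetic can be sketched as follows. The function name is hypothetical, and the default of 3 for number_of_parallel_models assumes the default nbatch=3 described in the section on parallel jobs.

```python
def count_rebuilt_models(multiple_models_number, number_of_parallel_models=3,
                         multiple_models_group_number=None):
    """Number of rebuild_in_place models built before recombination."""
    if multiple_models_group_number is None:
        # Default stated in the text: group size follows number_of_parallel_models.
        multiple_models_group_number = number_of_parallel_models
    return multiple_models_number * multiple_models_group_number


print(count_rebuilt_models(1))    # one very good model
print(count_rebuilt_models(20))   # twenty very good models
```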

Parallel jobs, nproc, nbatch, number_of_parallel_models and how AutoBuild works in parallel

The AutoBuild Wizard is set up to take advantage of multi-processor machines or batch queues by splitting the work into separate tasks. See Tutorial 4: Iterative model-building, density modification and refinement starting from experimental phases and Tutorial 6: Automatically rebuilding a structure solved by Molecular Replacement for a description of the method used by the AutoBuild Wizard to run build jobs as sub-processes and to combine the results into single models. Here are the key factors that determine how splitting model-building into batches and running them on one or more processors works:

  • nbatch is the number of batches of work. As long as nbatch is fixed then the results of running the Wizard will be the same, no matter how many processors are used. It is most efficient however to have nbatch be at least as large as nproc, the number of processors. Otherwise some processors may end up doing nothing. The default is nbatch=3. The value of nbatch is used to set other defaults (such as number_of_parallel_models).
  • nproc is the number of processors to split the work among
  • number_of_parallel_models is the number of models to build at once. The default is to set number_of_parallel_models=nbatch. This affects both standard building (number_of_parallel_models sets how many initial models to build) and rebuild_in_place (number_of_parallel_models determines whether a single model is built or a set of models are built and recombined into a single model).
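
The relationship among nbatch, nproc and number_of_parallel_models listed above can be sketched as a small planning helper. The function and the "rounds" estimate are hypothetical: the estimate assumes that all batches take the same time and are handed to processors in waves, which the text does not state.

```python
import math


def parallel_plan(nbatch=3, nproc=1, number_of_parallel_models=None):
    """Summarize how nbatch batches spread over nproc processors."""
    if number_of_parallel_models is None:
        number_of_parallel_models = nbatch  # default stated in the text
    busy = min(nbatch, nproc)
    return {
        "number_of_parallel_models": number_of_parallel_models,
        "idle_processors": nproc - busy,        # processors with nothing to do
        "rounds": math.ceil(nbatch / nproc),    # assumed equal-length batches
    }


print(parallel_plan(nbatch=3, nproc=4))  # one processor idle
print(parallel_plan(nbatch=3, nproc=2))  # two rounds of work
```

Because the results depend on nbatch and not on nproc, changing only nproc in this sketch changes the idle count and the number of rounds, but not number_of_parallel_models.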

Model editing during rebuilding with the Coot-PHENIX interface

The AutoBuild Wizard allows you to edit a model and give it back to the Wizard during the iterative model-building, density modification and refinement process. The Wizard will consider the model that you give it along with the models that it generates automatically, and will choose the parts of your model that fit the density better than other models. You can edit a model using the PHENIX-Coot interface. This interface is accessible through the GUI and via the command-line. Using the GUI, when a model has been produced by the AutoBuild Wizard, you can double-click the button on the GUI labelled View/edit files with coot to start Coot with your current map and model. If you are running from the command-line, you can open a new window and type:

phenix.autobuild coot 
This will do the same (provided the necessary map and model are ready). When Coot has been loaded, your map and model will be displayed along with a PHENIX-Coot Interface window. You can edit your model and then save it, giving it back to PHENIX with the button labelled something like Save model as COMM/overall_best_coot_7.pdb. This button creates the indicated file and also tells PHENIX to look for this file and to try and include the contents of the model in the building process.

The precise use of the model that you save depends on the type of model-building that is being carried out by the AutoBuild Wizard. If you are using rebuild_in_place then the main-chain and side-chains of the model are considered as replacements for the current working model. Any ligands or unrecognized residues are (by default) not rebuilt but are included in refinement. By default, solvent in the model is ignored. If you are not using rebuild_in_place, only the main-chain conformation is considered, and the side-chains are ignored. Ligands (but not solvent) in the model are (by default) kept and included in refinement.

As the AutoBuild Wizard continues to build new models and create new maps, you can update in the PHENIX-Coot Interface to the current best model and map with the button Update with current files from PHENIX.

Resolution limits in AutoBuild

There are several resolution limits used in AutoBuild. You can leave them all to default, or you can set any of them individually. Here is a list of these limits and how their default values are set:

Each entry gives the name of the limit, its description, and how its default value is set.

  • resolution: Overall resolution. Used as high-resolution limit for density modification. Used as default for refinement resolution and model-building resolution if they are not set. Default: resolution of input datafile. If a hires datafile is provided, the resolution of that data is used.
  • refinement_resolution: Resolution for refinement. Default: value of "resolution".
  • resolution_build: Resolution for model-building. Default: value of "resolution".
  • overall_resolution: Resolution to truncate all data. This should only be used if you need to truncate the data in order to get the Wizard to run. It causes the Wizard to ignore all data at higher resolution than overall_resolution. It is normally better to use the resolution keyword to define the resolution limits, as that will keep all the data in the output and working files. Default: None.
  • multiple_models_starting_resolution: Resolution for the initial rebuilding of a model in the multiple-models procedure. Normally a low resolution to generate diversity. Default: 4 A.
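
The way these defaults fall back on one another can be sketched in a few lines. This is an illustration of the defaults described above, not the Wizard's code; the function name and argument names other than the AutoBuild keywords themselves are hypothetical.

```python
def resolve_resolution_limits(data_resolution, hires_resolution=None,
                              resolution=None, refinement_resolution=None,
                              resolution_build=None):
    """Fill in unset resolution limits from the defaults described above."""
    if resolution is None:
        # The hires datafile, if provided, sets the overall resolution.
        resolution = hires_resolution if hires_resolution is not None else data_resolution
    if refinement_resolution is None:
        refinement_resolution = resolution
    if resolution_build is None:
        resolution_build = resolution
    return {
        "resolution": resolution,
        "refinement_resolution": refinement_resolution,
        "resolution_build": resolution_build,
        "overall_resolution": None,                  # no truncation by default
        "multiple_models_starting_resolution": 4.0,  # 4 A by default
    }


print(resolve_resolution_limits(2.5))
print(resolve_resolution_limits(2.5, hires_resolution=1.8, refinement_resolution=2.0))
```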

Examples

Run AutoBuild automatically after AutoSol

phenix.autobuild after_autosol

Run AutoBuild beginning with experimental data

phenix.autobuild data=solve_1.mtz seq_file=seq.dat

Merge in hires data

phenix.autobuild data=solve_2.mtz hires_file=w1.sca  seq_file=seq.dat

Make a SA-omit map around atoms in target.pdb

phenix.autobuild data=data.mtz model=coords.pdb omit_box_pdb=target.pdb   composite_omit_type=sa_omit
Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ . An additional map coefficients file omit_region.mtz will show you the region that has been omitted. (Note: be sure to use the weights in both resolve_composite_map.mtz and omit_region.mtz).

Make a simple composite omit map

phenix.autobuild data=data.mtz model=coords.pdb composite_omit_type=simple_omit
Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ . An additional map coefficients file omit_region.mtz will show you the region that has been omitted. (Note: be sure to use the weights in both resolve_composite_map.mtz and omit_region.mtz).

Make an iterative-build omit map around atoms in target.pdb

phenix.autobuild data=w1.sca model=coords.pdb omit_box_pdb=target.pdb \
   composite_omit_type=iterative_build_omit
Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ . An additional map coefficients file omit_region.mtz will show you the region that has been omitted. (Note: be sure to use the weights in both resolve_composite_map.mtz and omit_region.mtz).

Make a sa-omit map around residues 3 and 4 in chain A of coords.pdb

phenix.autobuild data=w1.sca model=coords.pdb omit_box_pdb=coords.pdb \
   omit_res_start_list=3 omit_res_end_list=4 omit_chain_list=A   \
   composite_omit_type=sa_omit
Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ . An additional map coefficients file omit_region.mtz will show you the region that has been omitted. (Note: be sure to use the weights in both resolve_composite_map.mtz and omit_region.mtz).

Create one very good rebuilt model

phenix.autobuild data=data.mtz model=coords.pdb multiple_models=True \
  include_input_model=True  \
  multiple_models_number=1 n_cycle_rebuild_max=5
The final model will be in the file MULTIPLE_MODELS/all_models.pdb (this file will contain just one model). Note that this procedure will keep the sequence that is present in coords.pdb. If you supply a sequence file it will edit the sequence of coords.pdb to match your sequence file and discard any residues that do not match. (If you want to input a sequence file but not edit the sequence in coords.pdb and not discard any non-matching residues, then specify also edit_pdb=False.) Note also that if include_input_model=True then no randomization cycle will be carried out and multiple_models_starting_resolution is ignored.

Touch up a model

phenix.autobuild data=data.mtz model=coords.pdb \
touch_up=True worst_percent_res_rebuild=2 min_cc_res_rebuild=0.8
You can rebuild just the worst parts of your model by setting touch_up=True. You can decide what parts to rebuild based on a minimum model-map correlation (by residue). You can decide how much to rebuild using worst_percent_res_rebuild or with min_cc_res_rebuild, or both.
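
One way to picture how the two touch-up criteria select residues is sketched below. This is not the Wizard's implementation: the function name is hypothetical, the per-residue correlations are made-up numbers, and two details are assumptions (the two criteria are combined as a union, and the percentage is rounded up to a whole number of residues).

```python
import math


def residues_to_rebuild(cc_by_residue, worst_percent_res_rebuild=None,
                        min_cc_res_rebuild=None):
    """Select residues by worst percentage, by minimum correlation, or both."""
    selected = set()
    ranked = sorted(cc_by_residue, key=cc_by_residue.get)  # worst fit first
    if worst_percent_res_rebuild is not None:
        n_worst = math.ceil(len(ranked) * worst_percent_res_rebuild / 100.0)
        selected.update(ranked[:n_worst])
    if min_cc_res_rebuild is not None:
        selected.update(r for r, cc in cc_by_residue.items() if cc < min_cc_res_rebuild)
    return sorted(selected)


# Hypothetical model-map correlations keyed by residue number.
cc = {1: 0.95, 2: 0.60, 3: 0.85, 4: 0.75, 5: 0.90}
print(residues_to_rebuild(cc, worst_percent_res_rebuild=20))
print(residues_to_rebuild(cc, min_cc_res_rebuild=0.8))
print(residues_to_rebuild(cc, worst_percent_res_rebuild=20, min_cc_res_rebuild=0.8))
```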

Create 20 very good rebuilt models that are as different as possible

phenix.autobuild data=data.mtz model=coords.pdb multiple_models=True \
   multiple_models_number=20 n_cycle_rebuild_max=5
The 20 models will be in the file MULTIPLE_MODELS/all_models.pdb. This procedure is useful for generating an ensemble of models that are each individually consistent with the data, and yet are diverse. The variation among these models is an indication of the uncertainty in each of the models. Note that the ensemble of models is not a representation of the ensemble of structures that is truly present in the crystal.

Morph an MR model and rebuild it

phenix.autobuild data=data.mtz model=MR.pdb \
morph=True rebuild_in_place=False seq_file=seq.dat
You can have AutoBuild morph your input model, distorting it to match the density-modified map that is produced from your model and data. This can be used to make an improved starting model in cases where the MR model is very different from the structure that is to be solved. For the morphing to work, the two structures must be topologically similar and differ mostly by movements of domains or motifs such as a group of helices or a sheet. The morphing process consists of identifying a coordinate shift to apply to each N (or P for nucleic acids) atom that maximizes the local density correlation between the model and the map. This is smoothed and applied to the structure to generate a morphed structure.
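
The smoothing step can be illustrated with a simple moving average over the per-residue shifts. The text does not say which smoothing method AutoBuild uses, so the moving average, the window size and the function name here are all assumptions chosen only to show the idea; the shifts in the example are made-up values.

```python
def smooth_shifts(shifts, half_window=2):
    """Moving-average smoothing of per-residue (dx, dy, dz) shifts along a chain."""
    smoothed = []
    n = len(shifts)
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        window = shifts[lo:hi]  # neighbors along the chain, truncated at the ends
        smoothed.append(tuple(sum(s[k] for s in window) / len(window) for k in range(3)))
    return smoothed


# One residue with a large isolated shift between two unshifted neighbors.
raw = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(smooth_shifts(raw, half_window=1))
```

Smoothing of this kind spreads an isolated shift over neighboring residues, which fits the requirement above that the structures differ mostly by movements of whole domains or motifs.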

Build an RNA chain

phenix.autobuild data=solve_1.mtz seq_file=seq.dat chain_type=RNA

Build a DNA chain

phenix.autobuild data=solve_1.mtz seq_file=seq.dat chain_type=DNA

Just make maps; don't do any building.

phenix.autobuild data=data.mtz model=coords.pdb maps_only=True     

Just calculate a prime-and-switch map

phenix.autobuild data=data.mtz solvent_fraction=.6 \
   ps_in_rebuild=True model=coords.pdb maps_only=True 
The output prime-and-switch map will be in the file prime_and_switch.mtz.

Possible Problems

General limitations

  • The AutoBuild wizard edits input PDB files to remove multiple conformations. It will also renumber residues if the file contains residues with insertion codes. All references to residue numbers (e.g. rebuild_res_start_list) refer to the edited, renumbered model. This model can be found in the AutoBuild_run_1_ (or appropriate) directory as "edited_pdb.pdb".
  • The AutoBuild wizard expects residue numbers to not decrease along a chain. It will stop if residue 250 in chain B is found between residues 116 and 117 in the same chain, for example. To get around this, use insertion codes (make residue 250 residue 116A instead).
  • The AutoBuild model-building can only build one type of chain at a time (default chain_type='PROTEIN'; other choices are RNA and DNA). If you supply a PDB file containing more than one type of chain for rebuilding, then all the residues that are not that type of chain are treated as ligands and are (by default, keep_input_ligands=True) included in refinement but not in rebuilding. Any input solvent molecules are (by default, keep_input_waters=False) ignored.

    You can include more than one type of chain in rebuilding by supplying one type of chains as ligands with input_lig_file_list and rebuilding another type:

    chain_type=PROTEIN  # build only protein
    input_lig_file_list=MyDNA.pdb  # just read in DNA coordinates and include in refinement
    
    In this case only protein chains will be built, but the DNA coordinates in MyDNA.pdb will be included in all refinements and will be written out to the final coordinate file. You may wish to add the keyword:
    keep_pdb_atoms=False  #keep the ligand atoms if model (pdb) and ligand overlap
    which will tell AutoBuild that the ligand (DNA) atoms are to be kept if the model that is being built (protein) overlaps with it. (The default is to keep the model that is being built and to discard any ligand atoms that overlap). This whole process is likely to require substantial editing of the PDB files by hand because when you build DNA, a lot of chains are going to be built into the protein region, and when you build protein, it is going to be accidentally built into the DNA.
  • Any file in input_lig_file_list containing ATOM records will have them replaced with HETATM records. This is so that the rebuild_in_place algorithm does not try to use them in rebuilding.
  • The ligand generation routine in phenix.elbow will not generate heme groups at this point. Most other ligands can be automatically generated.
  • If your input data file contains both intensity data and amplitude data, only the amplitude data is exposed in the AutoBuild Wizard. If you want to use the intensity data then you have to create a file that does not have amplitude data in it.
  • If your input data file has only intensity data and you wish to specify which columns of data the AutoBuild Wizard is to use, then you have to specify the names that the columns will have AFTER importing the data and conversion to amplitudes, not the original column names. These column names may not be obvious. Here is how to find out what they will be. Do a quick dummy run like this with XXX as labels:
    phenix.autobuild w2.sca coords.pdb input_labels="XXX XXX"
    
    The Wizard will print out a list of available labels like this:
    Sorry, the label XXX does not exist as an amplitude array in
    the input_data_file ImportRawData_run_8_/w2_PHX.mtz
    ...available labels are: ['w2', 'SIGw2', 'None']
    
    Then you know that the correct command is:
    phenix.autobuild w2.sca coords.pdb input_labels="w2 SIGw2"
    
  • The AutoBuild Wizard cannot build modified residues. If you supply a model with modified residues, these will be taken out of the chain and treated as ligands, and the chain will be broken at that point. By default the modified residues will be added to your model just before refinement and a cif definitions file will be automatically generated for these residues. You can also add these residues with the input_lig_file_list procedure if you want.
  • The AutoBuild Wizard will not build very short chains unless you set the variable group_ca_length (default=4 for building a model from scratch) to a smaller number. The shortest chain that will be built is group_ca_length. If you use rebuild_in_place, then the default shortest chain allowed is 1 residue, so any part of a model you supply is rebuilt.
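
The residue-ordering requirement described above (residue numbers must not decrease along a chain) can be checked before starting a run. The sketch below is a minimal pre-flight check and not part of PHENIX: the function names are hypothetical, it reads only ATOM records using the fixed PDB columns for chain ID and residue number, and it ignores insertion codes.

```python
def atom_line(serial, name, resname, chain, resseq):
    """Build a minimal fixed-column PDB ATOM record (for the example only)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, resname, chain, resseq, 0.0, 0.0, 0.0)


def find_decreasing_residues(pdb_lines):
    """Return (chain, previous_resseq, resseq) wherever numbering decreases."""
    last_seen = {}  # chain ID -> last residue number seen in that chain
    problems = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        chain = line[21]           # PDB column 22: chain ID
        resseq = int(line[22:26])  # PDB columns 23-26: residue number
        previous = last_seen.get(chain)
        if previous is not None and resseq < previous:
            problems.append((chain, previous, resseq))
        last_seen[chain] = resseq
    return problems


# Residue 250 of chain B sits between residues 116 and 117.
lines = [
    atom_line(1, " CA ", "ALA", "B", 116),
    atom_line(2, " CA ", "GLY", "B", 250),
    atom_line(3, " CA ", "SER", "B", 117),
]
print(find_decreasing_residues(lines))  # [('B', 250, 117)]
```

To check a real file, pass the lines of the PDB file instead of the toy list, for example find_decreasing_residues(open("coords.pdb")).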

Specific limitations and problems

  • The size of the asymmetric unit in the SOLVE/RESOLVE portion of the AutoBuild wizard is limited by the memory in your computer and the binaries used. The Wizard is supplied with regular-size ("", size=6), giant ("_giant", size=12), huge ("_huge", size=18) and extra_huge ("_extra_huge", size=36). Larger-size versions can be obtained on request.
  • The AutoBuild Wizard can take most settings of most space groups. However, it can only use the hexagonal setting of rhombohedral space groups (e.g., #146 R3:H or #155 R32:H), and it cannot use space groups 114-119 (not found in macromolecular crystallography) even in the standard setting, because of difficulties with the use of asuset in the version of the ccp4 libraries used in PHENIX for these settings and space groups.

Literature

Iterative model building, structure refinement and density modification with the PHENIX AutoBuild wizard. T. C. Terwilliger, R. W. Grosse-Kunstleve, P. V. Afonine, N. W. Moriarty, P. H. Zwart, L.-W. Hung, R. J. Read and P. D. Adams. Acta Cryst. D64, 61-69 (2008)
Interpretation of ensembles created by multiple iterative rebuilding of macromolecular models. T. C. Terwilliger, R. W. Grosse-Kunstleve, P. V. Afonine, P. D. Adams, N. W. Moriarty, P. H. Zwart, R. J. Read, D. Turk and L.-W. Hung. Acta Cryst. D63, 597-610 (2007)
Using prime-and-switch phasing to reduce model bias in molecular replacement. T. C. Terwilliger. Acta Cryst. D60, 2144-2149 (2004)
Improving macromolecular atomic models at moderate resolution by automated iterative model building, statistical density modification and refinement. T. C. Terwilliger. Acta Cryst. D59, 1174-1182 (2003)
Statistical density modification using local pattern matching. T. C. Terwilliger. Acta Cryst. D59, 1688-1701 (2003)
Automated side-chain model building and sequence assignment by template matching. T. C. Terwilliger. Acta Cryst. D59, 45-49 (2003)
Automated main-chain model building by template matching and iterative fragment extension. T. C. Terwilliger. Acta Cryst. D59, 38-44 (2003)
Rapid automatic NCS identification using heavy-atom substructures. T. C. Terwilliger. Acta Cryst. D58, 2213-2215 (2002)
Statistical density modification with non-crystallographic symmetry. T. C. Terwilliger. Acta Cryst. D58, 2082-2086 (2002)
Maximum likelihood density modification. T. C. Terwilliger. Acta Cryst. D56, 965-972 (2000)
Maximum-likelihood density modification with pattern recognition of structural motifs. T. C. Terwilliger. Acta Cryst. D57, 1755-1762 (2001)
Map-likelihood phasing. T. C. Terwilliger. Acta Cryst. D57, 1763-1775 (2001)

Additional information

List of all AutoBuild keywords

------------------------------------------------------------------------------- 
Legend: black bold - scope names
        black - parameter names
        red - parameter values
        blue - parameter help
        blue bold - scope help
        Parameter values:
          * means selected parameter (where multiple choices are available)
          False is No
          True is Yes
          None means not provided, not predefined, or left up to the program
          "%3d" is a Python style formatting descriptor
------------------------------------------------------------------------------- 
autobuild
   data= None Datafile (alias for input_data_file) This file can be a .sca or
         mtz or other standard file. The Wizard will guess the column
         identification. You can specify the column labels to use with:
         input_labels='FP SIGFP PHIB FOM HLA HLB HLC HLD FreeR_flag'
         Substitute any labels you do not have with None. If you only have
         myFP and mysigFP you can just say input_labels='myFP mysigFP'.
         (Command-line only)
   model= None PDB file with starting model (alias for input_pdb_file) NOTE:
          If your PDB file has been previously refined, then please make sure
          that you provide the free R flags that were used in that refinement.
          These can come from the data file or from the refinement_file.
          (Command-line only).
   seq_file= Auto Sequence file (alias for input_seq_file). The format is
             plain text, with chains separated by a line starting with >.
             Any blanks and unrecognized characters are ignored. You need only
             input 1 copy of each unique chain. (Command-line only)
   map_file= Auto MTZ file containing starting map (alias for input_map_file)
             This file must be a mtz file. The Wizard will guess the column
             identification. You can specify the column labels to use with:
             input_map_labels='FP PHIB FOM' Substitute any labels you do not
             have with None. If you only have myFP and myPHIB you can just say
             input_map_labels='myFP myPHIB'. (Command-line only)
   refinement_file= Auto File for refinement (alias for input_refinement_file)
                    This file can be a .sca or mtz or other standard file.
                    This file will be merged with your data file, with any
                    phase information coming from your data file. If this file
                    has free R flags, they will be used, otherwise if the data
                    file has them, those will be used, otherwise they will be
                    generated. The Wizard will guess the column
                    identification. You can specify the column labels to use
                    with: input_refinement_labels='FP SIGFP FreeR_flag'
                    Substitute any labels you do not have with None. If you
                    only have myFP and mysigFP you can just say
                    input_refinement_labels='myFP mysigFP'. (Command-line
                    only).
   hires_file= Auto File with high-resolution data (alias for
               input_hires_file) This file can be a .sca or mtz or other
               standard file. The Wizard will guess the column identification.
               You can specify the column labels to use with:
               input_hires_labels='FP SIGFP'. (Command-line only)
   special_keywords
      write_run_directory_to_file= None Writes the full name of a run
                                   directory to the specified file. This can
                                   be used as a call-back to tell a script
                                   where the output is going to go.
                                   (Command-line only)
   run_control
      coot= None Set coot to True and optionally run=[run-number] to run Coot
            with the current model and map for run run-number. In some wizards
            (AutoBuild) you can edit the model and give it back to PHENIX to
            use as part of the model-building process. If you just say coot
            then the facts for the highest-numbered existing run will be
            shown. (Command-line only)
      ignore_blanks= None ignore_blanks allows you to have a command-line
                     keyword with a blank value like "input_lig_file_list="
      stop= None You can stop the current wizard with "stopwizard" or "stop".
            If you type "phenix.autobuild run=3 stop" then this will stop run
            3 of autobuild. (Command-line only)
      display_facts= None Set display_facts to True and optionally
                     run=[run-number] to display the facts for run run-number.
                     If you just say display_facts then the facts for the
                     highest-numbered existing run will be shown.
                     (Command-line only)
      display_summary= None Set display_summary to True and optionally
                       run=[run-number] to show the summary for run
                       run-number. If you just say display_summary then the
                       summary for the highest-numbered existing run will be
                       shown. (Command-line only)
      carry_on= None Set carry_on to True to carry on with highest-numbered
                run from where you left off. (Command-line only)
      run= None Set run to n to continue with run n where you left off.
           (Command-line only)
      copy_run= None Set copy_run to n to copy run n to a new run and continue
                where you left off. (Command-line only)
      display_runs= None List all runs for this wizard. (Command-line only)
      delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)
      display_labels= None display_labels=test.mtz will list all the labels
                      that identify data in test.mtz. You can use the label
                      strings that are produced in AutoSol to identify which
                      data to use from a datafile like this:
                      peak.data="F+ SIGF+ F- SIGF-"
                      (the entire string in quotes counts here). You can use
                      the individual labels from these strings as identifiers
                      for data columns in AutoSol and AutoBuild like this:
                      input_refinement_labels="FP SIGFP FreeR_flags"
                      (each individual label counts).
      dry_run= False Just read in and check parameter names
      params_only= False Just read in and return parameter defaults
      display_all= False Just read in and display parameter defaults
   crystal_info
      cell= 0.0 0.0 0.0 0.0 0.0 0.0 Enter cell parameter a b c alpha beta
            gamma
      chain_type= *Auto PROTEIN DNA RNA  You can specify whether to build
                  protein, DNA, or RNA chains. At present you can only build
                  one of these in a single run. If you have both DNA and
                  protein, build one first, then run AutoBuild again,
                  supplying the prebuilt model in the "input_lig_file_list"
                  and build the other. NOTE: default for this keyword is Auto,
                  which means "carry out normal process to guess this
                  keyword". The process is to look at the sequence file and/or
                  input pdb file to see what the chain type is. If there are
                  more than one type, the type with the larger number of
                  residues is guessed. If you want to force the chain_type,
                  then set it to PROTEIN RNA or DNA.
      dmax= 500.0 Low-resolution limit 
      overall_resolution= 0.0 If overall_resolution is set, then all data
                          beyond this is ignored. NOTE: this is only suggested
                          if you have a very big cell and need to truncate the
                          data to allow the wizard to run at all. Normally you
                          should use 'resolution' and 'resolution_build' and
                          'refinement_resolution' to set the high-resolution
                          limit 
      resolution= 0.0 High-resolution limit. Used as resolution limit for
                  density modification and as general default high-resolution
                  limit. If resolution_build or refinement_resolution are set
                  then they override this for model-building or refinement. If
                  overall_resolution is set then data beyond that resolution
                  is ignored completely. 
      sg= None Space Group symbol (e.g., C2221 or C 2 2 21)
      solvent_fraction= None Solvent fraction in crystals (0 to 1).
   decision_making
      acceptable_r= 0.25 Used to decide whether the model is acceptable enough
                    to quit if it is not improving much. A good value is 0.25
      dist_close= None  If main-chain atom rmsd is less than dist_close then
                  crossover between chains in different models is allowed at
                  this point.  If you input a negative number the defaults
                  will be used
      dist_close_overlap= 1.5 Model or ligand coordinates but not both are
                          kept when model and ligand coordinates are within
                          dist_close_overlap and ligands in
                          input_lig_file_list are being added to the current
                          model. NOTE: you might want to decrease this if your
                          ligand atoms get removed by the wizard. Default=1.5
                          A
      group_ca_length= 4 In resolve building you can specify how short a
                       fragment to keep. Normally 4 or 5 residues should be
                       the minimum.
      group_length= 2 In resolve building you can specify how many fragments
                    must be joined to make a connected group that is kept.
                    Normally 2 fragments should be the minimum.
      include_molprobity= False You can choose to include the clash score from
                          MolProbity as one of the scoring criteria in
                          comparing and merging models. The score is combined
                          with the model-map correlation CC by summing in a
                          weighted clashscore. If clashscore for a residue has
                          a value < ok_molp_score then its value is
                          (clashscore-ok_molp_score)*scale_molp_score,
                          otherwise its value is zero.
      loop_cc_min= 0.4 You can specify the minimum correlation of density from
                   a loop with the map.
      min_cc_res_rebuild= 0.5 You can rebuild just the worst parts of your
                          model by setting touch_up=True. You can decide what
                          parts to rebuild based on a minimum model-map
                          correlation (by residue). You can decide how much to
                          rebuild using worst_percent_res_rebuild or with
                          min_cc_res_rebuild, or both.
      min_seq_identity_percent= 50.0  The sequence in your input PDB file will
                                be adjusted to match the sequence in your
                                sequence file (if any).  If there are
                                insertions/deletions in your model and the
                                wizard does not seem to identify them, you can
                                split up your PDB file by adding records like
                                this:  BREAK  You can specify the minimum
                                sequence identity between your sequence file
                                and a segment from your input PDB file to
                                consider the sequences to be matched. Default
                                is 50.0%. You might want a higher number to
                                make sure that deletions in the sequence are
                                noticed.
      ok_molp_score= None You can choose to include the clash score from
                     MolProbity as one of the scoring criteria in comparing
                     and merging models. The score is combined with the
                     model-map correlation CC by summing in a weighted
                     clashscore. If clashscore for a residue has a value <
                     ok_molp_score (the threshold defined by ok_molp_score)
                     then its value is
                     (clashscore-ok_molp_score)*scale_molp_score, otherwise
                     its value is zero.
      r_switch= 0.4 R-value criterion for deciding whether to use R-value or
                residues built. A good value is 0.40
      scale_molp_score= None You can choose to include the clash score from
                        MolProbity as one of the scoring criteria in comparing
                        and merging models. The score is combined with the
                        model-map correlation CC by summing in a weighted
                        clashscore. If clashscore for a residue has a value <
                        ok_molp_score then its value is
                        (clashscore-ok_molp_score)*scale_molp_score, otherwise
                        its value is zero.
      semi_acceptable_r= 0.3 Used to decide whether the model is acceptable
                         enough to skip rebuilding the model from scratch and
                         focus on adding loops and extending it. A good value
                         is 0.3
   density_modification
      hl= False You can choose whether to calculate hl coeffs when doing
          density modification ('Yes') or not to do so ('No'). Default is No.
      mask_type= *histograms probability wang Choose method for obtaining
                 probability that a point is in the protein vs solvent region.
                 Default is "histograms". If you have a SAD dataset with a
                 heavy atom such as Pt or Au then you may wish to choose
                 "wang" because the histogram method is sensitive to very high
                 peaks. Options are: histograms: compare local rms of map and
                 local skew of map to values from a model map and estimate
                 probabilities. This one is usually the best. probability:
                 compare local rms of map to distribution for all points in
                 this map and estimate probabilities. In a few cases this one
                 is much better than histograms. wang: take points with
                 highest local rms and define as protein.
      modify_outside_delta_solvent= 0.05 You can set the initial solvent
                                    content to be a little lower than
                                    calculated when you are running
                                    modify_outside_model Usually 0.05 is fine.
      modify_outside_model= False You can choose whether to modify the density
                            in the "protein" region outside the region
                            specified in your current model by matching
                            histograms with the region that is specified by
                            that model. This can help by raising the density
                            in this protein region up to a value similar to
                            that where atoms are already placed.
      thorough_denmod= *Auto Yes No True False Choose whether you want to go
                       for thorough density modification when no model is used
                       ("No" speeds it up and for a terrible map is sometimes
                       better)
      truncate_ha_sites_in_resolve= *Auto Yes No True False You can choose to
                                    truncate the density near heavy-atom sites
                                    at a maximum of 2.5 sigma. This is useful
                                    in cases where the heavy-atom sites are
                                    very strong, and rarely hurts in cases
                                    where they are not. The heavy-atom sites
                                    are specified with "input_ha_file"
      use_resolve_fragments= True This script normally uses information from
                             fragment identification as part of density
                             modification for the first few cycles of
                             model-building. Fragments are identified during
                             model-building. The fragments are used, with
                             weighting according to the confidence in their
                             placement, in density modification as targets for
                             density values.
      use_resolve_pattern= True Local pattern identification is normally used
                           as part of density modification during the first
                           few cycles of model building.
   general
      after_autosol= False You can specify that you want to continue on
                     starting with the highest-scoring run of AutoSol.
      background= True When you specify nproc=nn, you can run the jobs in
                  background (default if nproc is greater than 1) or
                  foreground (default if nproc=1).  If you set
                  run_command=qsub (or otherwise submit to a batch queue),
                  then you should set background=False, so that the batch
                  queue can keep track of your runs. There is no need to use
                  background=True in this case because all the runs go as
                  controlled by your batch system. If you use run_command=csh
                  (or similar, csh is default) then normally you will use
                  background=True so that all the jobs run simultaneously.
      base_path= None You can specify the base path for files (default is
                 current working directory)
      clean_up= False At the end of the entire run the TEMP directories will
                be removed if clean_up is True. The default is No, keep these
                directories. If you want to remove them after your run is
                finished use a command like "phenix.autobuild run=1
                clean_up=True"
      coot_name= coot If your version of coot is called something else, then
                 you can specify that here.
      debug= False  You can have the wizard stop with error messages about the
             code if you use debug. NOTE: you cannot use Pause with debug.
      extra_verbose= False Facts and possible commands will be printed every
                     cycle if Yes
      i_ran_seed= 289564  Random seed (positive integer) for model-building
                  and simulated annealing refinement
      max_wait_time= 100.0 You can specify the length of time (seconds) to
                     wait when testing the run_command. If you have a cluster
                     where jobs do not start right away you may need a longer
                     time to wait.
      nbatch= 3 You can specify the number of processors to use (nproc) and
              the number of batches to divide the data into for parallel jobs.
              Normally you will set nproc to the number of processors
              available and leave nbatch alone. If you leave nbatch as None it
              will be set automatically, with a value depending on the Wizard.
              This is recommended. The value of nbatch can affect the results
              that you get, as the jobs are not split into exact replicates,
              but are rather run with different random numbers. If you want to
              get the same results, keep the same value of nbatch.
      nproc= 1 You can specify the number of processors to use (nproc) and the
             number of batches to divide the data into for parallel jobs.
             Normally you will set nproc to the number of processors available
             and leave nbatch alone. If you leave nbatch as None it will be
             set automatically, with a value depending on the Wizard. This is
             recommended. The value of nbatch can affect the results that you
             get, as the jobs are not split into exact replicates, but are
             rather run with different random numbers. If you want to get the
             same results, keep the same value of nbatch.
      quick= False Run everything quickly (number_of_parallel_models=1
             n_cycle_build_max=1 n_cycle_rebuild_max=1)
      resolve_command_list= None  Commands for resolve. One per line in the
                            form:  keyword value  value can be optional 
                            Examples:  coarse_grid  resolution 200 2.0  hklin
                            test.mtz  NOTE: for command-line usage you need to
                            enclose the whole set of commands in double quotes
                            (") and each individual command in single quotes
                            (') like this: resolve_command_list="'no_build'
                            'b_overall 23' "
      resolve_pattern_command_list= None  Commands for resolve_pattern. One
                                    per line in the form:  keyword value 
                                    value can be optional  Examples: 
                                    resolution 200 2.0  hklin test.mtz  NOTE:
                                    for command-line usage you need to enclose
                                    the whole set of commands in double quotes
                                    (") and each individual command in single
                                    quotes (') like this:
                                    resolve_pattern_command_list="'resolution
                                    200 20' 'hklin test.mtz' "
      resolve_size= _giant _huge _extra_huge *None Size for solve/resolve
                    ("","_giant","_huge","_extra_huge")
      run_command= csh When you specify nproc=nn, you can run the subprocesses
                   as jobs in background with csh (default) or submit them to
                   a queue with the command of your choice (e.g., qsub). If
                   you have a multi-processor machine, use csh. If you have a
                   cluster, use qsub or the equivalent command for your
                   system.  NOTE: If you set run_command=qsub (or otherwise
                   submit to a batch queue), then you should set
                   background=False, so that the batch queue can keep track of
                   your runs. There is no need to use background=True in this
                   case because all the runs go as controlled by your batch
                   system. If you use run_command=csh (or similar, csh is
                   default) then normally you will use background=True so that
                   all the jobs run simultaneously.
      skip_xtriage= False You can bypass xtriage if you want. This will
                    prevent you from applying anisotropy corrections, however.
      temp_dir= None Define a temporary directory (it must exist)
      title= Run 1 AutoBuild Sun Dec 7 17:46:23 2008  Enter any text you like
             to help identify what you did in this run
      top_output_dir= None This is used in subprocess calls of wizards and to
                      tell the Wizard where to look for the STOPWIZARD file. 
      verbose= False Command files and other verbose output will be printed
   input_files
      cif_def_file_list= None  You can enter any number of CIF definition
                         files.  These are normally used to tell phenix.refine
                         about the geometry of a ligand or unusual residue. 
                         You usually will use these in combination with "PDB
                         file with metals/ligands" (keyword
                         "input_lig_file_list" ) which allows you to attach
                         the contents of any PDB file you like to your model
                         just before it gets refined.  If phenix.refine
                         requests a CIF file and you do not have one, you can
                         generate it with phenix.elbow.
      input_data_file= None Enter a file with input structure factor data.
                       For structure factor data only (e.g., FP SIGFP) any
                       format is ok. If you have free R flags, phase
                       information or HL coefficients that you want to use
                       then an mtz file is required. If this file contains
                       phase information, this phase information should be
                       experimental (i.e., MAD/SAD/MIR etc), and should not be
                       density-modified phases (enter any files with
                       density-modified phases as input_map_file instead). 
                       NOTE: If you supply HL coefficients they will be used
                       in phase recombination. If you supply PHIB or PHIB and
                       FOM and not HL coefficients, then HL coefficients will
                       be derived from your PHIB and FOM and used in phase
                       recombination.  If you also specify a hires data file,
                       then FP and SIGFP will come from that data file (and
                       not this one).  If an input_refinement_file is
                       specified, then F, Sigma, FreeR_flag (if present) from
                       that file will be used for refinement instead of this
                       one.
      input_ha_file= None If the flag "truncate_ha_sites_in_resolve" is set
                     then density at sites specified with input_ha_file is
                     truncated to improve the density modification procedure.
      input_hires_labels= None Labels for input hires file (FP SIGFP
                          FreeR_flag)
      input_labels= None Labels for input data columns NOTE: Applies to input
                    data file for LigandFit and AutoBuild, but not to AutoMR.
                    For AutoMR use instead 'input_label_string'.
      input_lig_file_list= None This script adds the contents of these PDB
                           files to each model just prior to refinement. 
                           Normally you might use this to put in any
                           heavy-atoms that are in the refined structure (for
                           example the heavy atoms that were used in phasing),
                           or to add a ligand to your model.  If the atoms in
                           this PDB file are not recognized by phenix.refine,
                           then you can specify their geometries with a cif
                           definitions file using the keyword
                           "cif_def_file_list". You can easily generate cif
                           definitions for many ligands using phenix.elbow in
                           PHENIX. You can put anything you like in the files
                           in input_lig_file_list, but any atoms that fall
                           within 1.5 A of any atom in the current model will
                           be tossed (not written to the model).
      input_map_file= Auto Enter an mtz file with coefficients for the map
                      (if a different file or different coefficients than the
                      input structure factor data). This map will be used in
                      the first cycle of model-building.  NOTE: default for this
                      keyword is Auto, which means "carry out normal process
                      to guess this keyword". This means if you specify
                      "after_autosol" in AutoBuild, AutoBuild will
                      automatically take the value from AutoSol. If you do not
                      want this to happen, you can specify None which means
                      "No file"
      input_map_labels= None Labels for input map coefficient columns (FP PHIB
                        FOM) NOTE: FOM is optional (set to None if you wish)
      input_pdb_file= None You can enter a PDB file containing a starting
                      model of your structure NOTE: If you enter a PDB file
                      then the AutoBuild wizard will start right in with
                      rebuild steps, skipping the build process. If the model
                      is very poor then it may be better to leave it out as
                      the build process (which includes pattern recognition
                      and recognition of helical and strand fragments) is
                      optimized for improving poor maps, while the rebuild
                      process is optimized for better maps that can be
                      produced by having a partial model.
      input_refinement_file= Auto Data file to use for refinement. The data in
                             this file should not be corrected for anisotropy.
                             It will be combined with experimental phase
                             information (if any) from input_data_file for
                             refinement. If you leave this blank, then the
                             data in the input_data_file will be used in
                             refinement. If no anisotropy correction is
                             applied to the data you do not need to specify a
                             datafile for refinement. If an anisotropy
                             correction is applied to the data files, then you
                             should enter an uncorrected datafile for
                             refinement.  Any standard format is fine;
                             normally only F and sigF will be used. Bijvoet
                             pairs and duplicates will be averaged. If an mtz
                             file is provided then a free R flag can be read
                             in as well. Any HL coeffs and phase information
                             in this file are ignored. NOTE: default for this
                             keyword is Auto, which means "carry out normal
                             process to guess this keyword". This means if you
                             specify "after_autosol" in AutoBuild, AutoBuild
                             will automatically take the value from AutoSol.
                             If you do not want this to happen, you can
                             specify None which means "No file"
      input_refinement_labels= None Labels for input refinement file columns
                               (FP SIGFP FreeR_flag)
      input_seq_file= Auto Enter name of file with 1-letter code of protein
                      sequence NOTES: 1. lines starting with > are ignored
                      and separate chains  2. FASTA format is fine  3. If
                      there are multiple copies of a chain, just enter one
                      copy.  4. If you enter a PDB file for rebuilding and it
                      has the sequence you want, then the sequence file is not
                      necessary.   NOTE: You can also enter the name of a PDB
                      file that contains SEQRES records, and the sequence from
                      the SEQRES records will be read, written to
                      seq_from_seqres_records.dat, and used as your input
                      sequence.  NOTE: for AutoBuild you can specify
                      start_chains_list on the first line of your sequence
                      file: >> start_chains_list 23 11 5 NOTE: default
                      for this keyword is Auto, which means "carry out normal
                      process to guess this keyword". This means if you
                      specify "after_autosol" in AutoBuild, AutoBuild will
                      automatically take the value from AutoSol. If you do not
                      want this to happen, you can specify None which means
                      "No file"
      keep_input_ligands= True By default the wizard keeps ligands by
                          separating them out from the rest of your model and
                          adding them back to your rebuilt model. Alternatively
                          you can have it remove all ligands from your input
                          pdb file before rebuild_in_place.
      keep_input_waters= False You can choose whether to keep input waters
                         (solvent) when using rebuild_in_place. If you keep
                         them, then you should specify either
                         "place_waters=No" or "keep_pdb_atoms=No" because if
                         place_waters=Yes and keep_pdb_atoms=Yes then
                         phenix.refine will add waters and then the wizard
                         will keep the new waters from the new PDB file
                         created by phenix.refine preferentially over the ones
                         in your input file.
      keep_pdb_atoms= True You can choose whether to keep the model
                      coordinates when model and ligand coordinates are within
                      dist_close_overlap and ligands in input_lig_file_list
                      are being added to the current model. Default=Yes
      refine_eff_file_list= None  You can enter any number of refinement
                            parameter files.  These are normally used to tell
                            phenix.refine defaults to apply, as well as
                            creating specialized definitions such as unusual
                            amino acid residues and linkages.  These
                            parameters override the normal phenix.refine
                            defaults. They themselves can be overridden by
                            parameters set by the Wizard and by you,
                            controlling the Wizard. NOTE: Any parameters set
                            by AutoBuild directly (such as
                            number_of_macro_cycles, high_resolution, etc...)
                            will not be taken from this parameters file. This
                            is useful only for adding extra parameters not
                            normally set by AutoBuild.
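
The overlap rule described for input_lig_file_list (ligand atoms within 1.5 A
of any atom in the current model are not written) can be illustrated with a
short Python sketch. It is not the Wizard's own code: the function name, the
plain coordinate tuples, and the choice to treat "within" as "less than or
equal to" are all assumptions, and only the default keep_pdb_atoms=True case
is shown.

```python
import math


def keep_ligand_atoms(ligand_xyz, model_xyz, cutoff=1.5):
    """Return the ligand atoms farther than `cutoff` A from every model atom.

    Hypothetical helper: coordinates are (x, y, z) tuples in Angstroms and
    the model atoms always win, as with keep_pdb_atoms=True.
    """
    kept = []
    for atom in ligand_xyz:
        # An atom is tossed as soon as any model atom lies within the cutoff.
        if all(math.dist(atom, other) > cutoff for other in model_xyz):
            kept.append(atom)
    return kept


model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
ligand = [(0.5, 0.5, 0.5), (10.0, 10.0, 10.0), (3.8, 1.0, 0.0)]
print(keep_ligand_atoms(ligand, model))  # [(10.0, 10.0, 10.0)]
```

A real implementation would use a spatial grid or neighbor search rather than
this all-pairs loop, which is written for clarity only.
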
   maps
      maps_only= False You can choose whether to skip all model-building and
                 just calculate maps and write out the results. This also runs
                 just 1 cycle and turns on HL coefficients.
      n_xyz_list= None You can specify the grid to use for map calculations.
   model_building
      allow_negative_residues= False Normally the wizard does not allow
                               negative residue numbers, and all residues with
                               negative numbers are rejected when they are
                               read in. You can allow them if you wish.
      base_model= None  You can enter a PDB file with coordinates to be used
                  as a starting point for model-building. These coordinates
                  will be included in the same way as fragments placed by
                  searching for helices and strands in initial model-building.
                  Note the difference from the use of models in
                  consider_main_chain_list, which are merged with models after
                  they are built. NOTE: Only use this if you want to keep the
                  input model and just add to it.
      build_type= RESOLVE_AND_TEXTAL *RESOLVE TEXTAL You can choose to build
                  models with RESOLVE and TEXTAL or either one, and how many
                  different models to build with RESOLVE. The more you build,
                  the more likely to get a complete model.  Note that
                  rebuild_in_place can only be carried out with RESOLVE
                  model-building
      cc_helix_min= None Minimum CC of helical density to map at low
                    resolution when using helices_strands_only
      cc_strand_min= None Minimum CC of strand density to map when using
                     helices_strands_only
      consider_main_chain_list= None This keyword lets you name any number of
                                PDB files to consider as templates for
                                model-building. Every time models are built,
                                the contents of these files will be merged
                                with them and the best parts will be kept.
                                NOTE: this only uses the main-chain atoms of
                                your PDB files.
      dist_connect_max_helices= None Set maximum distance between ends of
                                helices and other ends to try and connect them
                                in insert_helices.
      edit_pdb= True You can choose to edit the input PDB file in
                rebuild_in_place to match the input sequence (default=True).
                NOTE: residues with residue numbers higher than
                'highest_resno' are assumed to not have a known sequence and
                will not be edited. By default the value of 'highest_resno' is
                the highest residue number from the sequence file, after
                adding it to the starting residue number from
                start_chains_list. You can also set it directly.
      helices_strands_only= False You can choose to use a quick model-building
                            method that only builds secondary structure. At
                            low resolution this may be both quicker and more
                            accurate than trying to build the entire structure.
                            If you are running the AutoSol Wizard, normally
                            you should choose 'Yes' and use the quick
                            model-building. Then when your structure is solved
                            by AutoSol, go on to AutoBuild and build a more
                            complete model (this time normally using
                            helices_strands_only=False).
      helices_strands_start= False You can choose to use a quick
                             model-building method that builds secondary
                             structure as a way to get started...then model
                             completion is done as usual. (Contrast with
                             helices_strands_only which only does secondary
                             structure)
      highest_resno= None  Highest residue number to be considered "placed" in
                     sequence for rebuild_in_place
      include_input_model= True  The keyword include_input_model defines
                           whether the input model (if any) is to be crossed
                           with models that are derived from it, and the best
                           parts of each kept. Note that if
                           multiple_models=True and include_input_model=True
                           then no initial cycle of randomization will be
                           carried out and the keyword
                           multiple_models_starting_resolution is ignored. In
                           most cases you should use include_input_model=True.
                           If you want to generate maximum diversity with
                           multiple-models then you may wish to use
                           include_input_model=False. Also if you want to
                           decrease the amount of bias from your starting
                           model you may wish to use
                           include_input_model=False.
      input_compare_file= NONE If you are rebuilding a model or already think
                          you know what the model should be, you can include a
                          comparison file in rebuilding. The model is not used
                          for anything except to write out information on
                          coordinate differences in the output log files. 
                          NOTE: this feature does not always work correctly.
      merge_models= False  You can choose to only merge any input models and
                    write out the resulting model. The best parts of each
                    model will be kept based on model-map correlation.
                    Normally used along with number_of_parallel_models=1
      morph= False You can choose whether to distort your input model in order
             to match the current working map. This may be useful for MR
             models that are quite distant from the correct structure.
      morph_cycles= 2 Number of iterations of morphing each time it is run.
      morph_rad= 7.0 Smoothing radius for morphing. The densities from your
                 model and from the map are calculated with the radius
                 morph_rad, then they are adjusted to overlap optimally.
      n_ca_enough_helices= None Set maximum number of CA to add to ends of
                           helices and other ends to try and connect them in
                           insert_helices.
      offsets_list= 53 7 23 You can specify an offset for the orientation of
                    the helix and strand templates in building. This is used
                    in generating different starting models.
      ps_in_rebuild= False You can choose to use a prime-and-switch resolve
                     map in all cycles of rebuilding instead of a
                     density-modified map. This is normally used in
                     combination with maps_only to generate a prime-and-switch
                     map.
      refine= True This script normally refines the model during building. Say
              No to skip refinement.
      resolution_build= 0.0 Enter the high-resolution limit for
                        model-building. If 0.0, the value of resolution is
                        used as a default. 
      restart_cycle_after_morph= 5 Morphing (if morph=True) will go only up to
                                 this cycle, and then the morphed PDB file
                                 will be used as a starting PDB file from then
                                 on, removing all previous models.
      retrace_before_build= False  You can choose to retrace your model n_mini
                            times and use a map based on these retraced models
                            to start off model-building. This is the default
                            for rebuilding models if you are not using
                            rebuild_in_place. You can also specify
                            n_iter_rebuild, the number of cycles of
                            retrace-density-modify-build before starting the
                            main build.
      reuse_chain_prev_cycle= True You can choose whether model-building
                              includes atoms from each cycle in the model
                              for the next cycle.
      richardson_rotamers= *Auto Yes No True False  You can choose to use the
                           rotamer library from SC Lovell, JM Word, JS
                           Richardson and DC Richardson (2000) "The
                           Penultimate Rotamer Library", Proteins: Structure,
                           Function, and Genetics 40, 389-408. Typically this
                           works well in RESOLVE model-building for
                           nearly-final models but not as well earlier in the
                           process. Default (Auto) is to use these rotamers
                           for rebuild_in_place but not otherwise.
      rms_random_frag= None  Rms random position change added to residues on
                       ends of fragments when extending them.  If you enter a
                       negative number, defaults will be used.
      rms_random_loop= None  Rms random position change added to residues on
                       ends of loops when trying to build loops.  If you enter
                       a negative number, defaults will be used.
      semet= False You can specify that the dataset that is used for
             refinement is a selenomethionine dataset, and that the model
             should be the SeMet version of the protein, with all SD of MET
             replaced with Se of MSE.
      start_chains_list= None  You can specify the starting residue number for
                         each of the unique chains in your structure. If you
                         use a sequence file then the unique chains are
                         extracted and the order must match the order of your
                         starting residue numbers. For example, if your
                         sequence file has chains A and B (identical) and
                         chains C and D (identical to each other, but
                         different from A and B) then you can enter 2 numbers,
                         the starting residues for chains A and C. NOTE: you
                         need to specify an input sequence file for
                         start_chains_list to be applied.
      trace_as_lig= False You can specify that in building steps the ends of
                    chains are to be extended using the LigandFit algorithm.
                    This is the default for nucleic acid model-building.
      track_libs= False You can keep track of what libraries each atom in a
                  built structure comes from.
      two_fofc_in_rebuild= False You can choose to use a sigmaa-weighted
                           2Fo-Fc map in all cycles of rebuilding instead of a
                           density-modified map. If the model is poor this can
                           sometimes allow model-building in place to work
                           even when it will not for density-modified maps.
      use_any_side= True  You can choose to have resolve model-building place
                    the best-fitting side chain at each position, even if the
                    sequence is not matched to the map.
      use_cc_in_combine_extend= False You can choose to use the correlation of
                                density rather than density at atomic
                                positions to score models in combine_extend
      use_met_in_align= *Auto Yes No True False You can use the heavy-atom
                        positions in input_ha_file as markers for Met SD
                        positions.
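
The way start_chains_list lines up with the unique chains of a sequence file
can be sketched in Python as follows. This is an illustration under stated
assumptions rather than the Wizard's parser: the function names are
hypothetical, and the last residue number of each chain assumes contiguous
numbering from the starting residue.

```python
def unique_chains(seq_text):
    """Split sequence-file text into chains and return the unique ones in order.

    Lines starting with '>' are ignored and separate chains.
    """
    chains, current = [], []
    for line in seq_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current:  # a header line closes the chain read so far
                chains.append("".join(current))
                current = []
        elif line:
            current.append(line.upper())
    if current:
        chains.append("".join(current))
    unique = []
    for chain in chains:
        if chain not in unique:  # keep only the first copy of each chain
            unique.append(chain)
    return unique


def residue_ranges(seq_text, start_chains_list):
    """Pair each unique chain with its starting residue number."""
    unique = unique_chains(seq_text)
    if len(unique) != len(start_chains_list):
        raise ValueError("need one starting residue number per unique chain")
    # Assumption: residues are numbered contiguously from the start value.
    return [(start, start + len(chain) - 1)
            for chain, start in zip(unique, start_chains_list)]


text = "> A\nMKT\n> B\nMKT\n> C\nGSHMA\n> D\nGSHMA\n"
print(residue_ranges(text, [23, 11]))  # [(23, 25), (11, 15)]
```

Here chains A and B are identical, as are C and D, so two starting residue
numbers are enough, matching the example in the start_chains_list description.
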
   multiple_models
      combine_only= False Once you have created a set of initial models you
                    can merge them together into a final set. This option is
                    useful if you have split up the creation of multiple
                    models into different directories, and then you have
                    copied all the initial models to one directory for
                    combining.
      multiple_models= False You can build a set of models, all compatible
                       with your data. You can specify how many models with
                       multiple_models_number. If you are using
                       rebuild_in_place you can specify whether to generate
                       starting models or not with multiple_models_starting.
      multiple_models_first= 1 Specify which model to build first
      multiple_models_group_number= 5 You can build several initial models and
                                    merge them. Normally 5 initial models is
                                    fine.
      multiple_models_last= 20 Specify which model to end with
      multiple_models_number= 20 Specify how many models to build.
      multiple_models_starting= True You can specify how to generate starting
                                models for multiple models. If you are using
                                rebuild_in_place and you specify "Yes" then
                                the Wizard will rebuild your starting model at
                                the resolution specified in
                                multiple_models_starting_resolution. If you
                                are not using rebuild_in_place the Wizard will
                                always build a starting model at the current
                                resolution.
      multiple_models_starting_resolution= 4.0 You can set the resolution for
                                           rebuilding an initial model. A
                                           value of 0.0 will use the
                                           resolution of the dataset.
      place_waters_in_combine= True You can choose whether phenix.refine
                               automatically places ordered solvent (waters)
                               during the last cycle of multiple-model
                               generation. This is separate from place_waters,
                               which applies to all other cycles.
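
If you split the creation of multiple models across directories, the ranges
passed as multiple_models_first and multiple_models_last can be worked out
with a few lines of Python. This is a sketch only: the helper names and the
batch size are hypothetical, and the exact combination of keywords a split
run needs is an assumption to check against your PHENIX version. The models
would then be copied to one directory and merged with combine_only=True.

```python
def model_batches(n_models=20, batch_size=5):
    """Return (first, last) model numbers for each batch, numbered from 1."""
    batches = []
    first = 1
    while first <= n_models:
        last = min(first + batch_size - 1, n_models)
        batches.append((first, last))
        first = last + 1
    return batches


def batch_keywords(n_models=20, batch_size=5):
    """Format one keyword string per batch (one batch per directory)."""
    template = ("multiple_models=True multiple_models_number=%d "
                "multiple_models_first=%d multiple_models_last=%d")
    return [template % (n_models, first, last)
            for first, last in model_batches(n_models, batch_size)]


for keywords in batch_keywords():
    print(keywords)
```
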
   ncs
      find_ncs=