| Python-based Hierarchical ENvironment for Integrated Xtallography |
Automated Model Building and Rebuilding using AutoBuild
Author(s)
Purpose

Purpose of the AutoBuild Wizard

The purpose of the AutoBuild Wizard is to provide a highly automated system for model rebuilding and completion. The Wizard design allows the user to specify data files and parameters through an interactive GUI, or alternatively through keyworded scripts. The AutoBuild Wizard begins with data files containing structure factor amplitudes and uncertainties, along with either experimental phase information or a starting model, carries out cycles of model-building and refinement alternating with model-based density modification, and produces a relatively complete atomic model.

The AutoBuild Wizard uses RESOLVE (optionally also TEXTAL), xtriage and phenix.refine to build an atomic model, refine it, and improve it with iterative density modification, refinement, and model-building.

The Wizard begins with either experimental phases (e.g., from AutoSol) or with an atomic model that can be used to generate calculated phases.

The AutoBuild Wizard produces a refined model that can be nearly complete if the data are strong and the resolution is about 2.5 A or better. At lower resolutions (2.5 - 3 A) the model may be less complete, and at resolutions > 3 A the model may be quite incomplete and not well refined.

The AutoBuild Wizard can be used to generate OMIT maps (simple omit, SA-omit, iterative-build omit) that can cover the entire unit cell or specific residues in a PDB file.

The AutoBuild Wizard can generate a set of models compatible with experimental data (multiple_models).

Usage

The AutoBuild Wizard can be run from the PHENIX GUI, from the command-line, and from keyworded script files. All three versions are identical except in the way that they take commands from the user. See Running a Wizard from a GUI, the command-line, or a script for details of how to run a Wizard. The command-line version will be described here.
How the AutoBuild Wizard works

The AutoBuild Wizard begins with experimental structure factor amplitudes, along with either experimental or model-based estimates of crystallographic phases. The phase information is improved by using statistical density modification to improve the correlation of NCS-related density in the map (if present) and to improve the match of the distribution of electron densities in the map with those expected from a model map. This improved map is then used to build and refine an atomic model. In subsequent cycles, the models from previous cycles are used as a source of phase information in statistical density modification, iteratively improving the quality of the map used for model-building. Additionally, during the first few cycles additional phase information is obtained by detecting and enhancing (1) the presence of commonly-found local patterns of density in the map, and (2) the presence of density in the shape of helices and strands. The final model obtained is analyzed for residue-based map correlation and density at the coordinates of individual atoms, and an analysis is provided that includes a summary of atoms and residues that are in strong, moderate, or weak density and out of density.

Automation and user control

The AutoBuild Wizard has been designed for ease of use combined with maximal user control, with as many parameters set automatically by the Wizard as possible, while keeping parameters accessible to the user through a GUI and through keyword-based scripts. The Wizard uses the input/output routines of the cctbx library, allowing data files of many different formats, so that users do not have to convert their data to any particular format before using the Wizard. Use of the phenix.refine refinement package in the AutoBuild Wizard allows a high degree of automation of refinement, so that neither the user nor the Wizard is required to specify parameters for refinement.
The phenix.refine package automatically includes a bulk solvent model and automatically places solvent molecules.

Core modules in the AutoBuild Wizard

The five core modules in the AutoBuild Wizard are
The standard procedures available in the AutoBuild Wizard that are based on these modules include:
Starting from a set of experimental phases and structure factor amplitudes, normally procedure (a) is carried out, and then the resulting model is rebuilt with procedure (b). Starting from a model (e.g., from molecular replacement) and experimental structure factor amplitudes, procedure (c) is normally carried out if the starting model differs by less than about 50% in sequence from the desired model, and otherwise procedure (b) is used.

How to run the AutoBuild Wizard

Running the AutoBuild Wizard is easy. For example, from the command-line you can type:

phenix.autobuild data=w1.sca seq.dat model=coords.pdb

The AutoBuild Wizard will carry out iterative model-building, density modification and refinement based on the data in w1.sca and the model in coords.pdb, editing the model as necessary to match the sequence in seq.dat.

What the AutoBuild wizard needs to run
...and optional files
Specifying which columns of data to use from input data files

If one or more of your data files has column names that the Wizard cannot identify automatically, you can specify them yourself. You will need to provide one column "name" for each expected column of data, with "None" for anything that is missing.

For example, if your data file ref.mtz has columns FP SIGFP and FreeR then you might specify

refinement_file=ref.mtz input_refinement_labels="FP SIGFP None None None None None None FreeR"

The keywords for labels and anticipated input labels (program labels) are:

input_labels (for data file): FP SIGFP PHIB FOM HLA HLB HLC HLD FreeR_flag
input_refinement_labels: FP SIGFP FreeR_flag
input_map_labels: FP PHIB FOM
input_hires_labels: FP SIGFP FreeR_flag

You can find out all the possible label strings in a data file that you might use by typing:

phenix.autosol display_labels=w1.mtz # display all labels for w1.mtz

NOTES: if your data files contain a mixture of amplitude and intensity data then only the amplitude data is available. If you have only intensity data in a data file and want to select specific columns, then you need to specify the column names as they are after importing the data and conversion to amplitudes (see below under General Limitations for details).

Specifying other general parameters

You can specify many more parameters as well. See the list of keywords, defaults and descriptions at the end of this page and also general information about running Wizards at Running a Wizard from a GUI, the command-line, or a script for how to do this.
Some of the most common parameters are:

data=w1.sca # data file
model=coords.pdb # starting model
seq_file=seq.dat # sequence file
map_file=map_coeffs.mtz # coefficients for a starting map for building
resolution=3 # dmin of 3 A
s_annealing=True # use simulated annealing refinement at start of each cycle
n_cycle_build_max=5 # max number of build cycles (starting from experimental phases)
n_cycle_rebuild_max=5 # max number of rebuild cycles (starting from a model)

Picking waters in AutoBuild

By default AutoBuild will instruct phenix.refine to pick waters using its standard procedure. This means that if the resolution of the data is high enough (typically 3 A) then waters are placed. You can tell AutoBuild not to have phenix.refine pick waters with the command:

place_waters=False

If you want to place waters at a lower resolution, you will need to reset the low-resolution cutoff for placing waters in phenix.refine. You would do that in a "refinement_params.eff" file containing lines like these (see below for passing parameters to phenix.refine with an ".eff" file):

refinement {
ordered_solvent {
low_resolution = 2.8
}
}
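
If you drive AutoBuild from a script, a parameter file like the one above can be generated rather than edited by hand. The following is a minimal Python sketch, not part of PHENIX: the helper names water_eff_text and write_water_eff are hypothetical, and only the refinement.ordered_solvent.low_resolution parameter and the refine_eff_file keyword are taken from this page.

```python
from pathlib import Path


def water_eff_text(low_resolution):
    """Return parameter-file text setting the low-resolution cutoff for water picking."""
    return (
        "refinement {\n"
        "  ordered_solvent {\n"
        f"    low_resolution = {low_resolution}\n"
        "  }\n"
        "}\n"
    )


def write_water_eff(path, low_resolution):
    """Write the .eff file and return the matching AutoBuild command-line argument."""
    Path(path).write_text(water_eff_text(low_resolution))
    # refine_eff_file is the AutoBuild keyword described on this page
    return f"refine_eff_file={path}"
```

The returned string can be appended to a phenix.autobuild command line alongside your data, model and sequence arguments.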
Keeping waters from your input file in AutoBuild

You can tell AutoBuild to keep the waters in your input file when you are using rebuild_in_place (the default is to toss them and replace them with new ones). You can say:

keep_input_waters=True place_waters=No

NOTE: If you specify keep_input_waters=True you should also specify either "place_waters=No" or "keep_pdb_atoms=No". This is because if place_waters=Yes and keep_pdb_atoms=Yes then phenix.refine will add waters, and the Wizard will then keep the new waters from the new PDB file created by phenix.refine in preference to the ones in your input file.

Specifying phenix.refine parameters

You can control phenix.refine parameters that are not specified directly by AutoBuild using a refinement parameters (.eff) file:

refine_eff_file=refinement_params.eff # set any phenix.refine params not set by AutoBuild

This file might contain a twin-law for refinement:

refinement {
twinning {
twin_law = "-k, -h, -l"
}
}
You can put any phenix.refine parameters in this file, but a few parameters that are set directly by AutoBuild override your inputs from the refine_eff_file. These parameters are listed below.

Refinement parameters that must be set using AutoBuild Wizard keywords (overwriting any values provided by user in input_eff_file)

The following parameters controlling phenix.refine output are set directly in AutoBuild and cannot be set by the user
Specifying resolve/resolve_pattern parameters

Similarly, you can control resolve and resolve_pattern parameters. For these parameters, your inputs will not be overridden by AutoBuild. The format is a little tricky: you have to put two sets of quotes around the command like this:

resolve_command="'resolution 200 3'" # NOTE ' and " quotes

This will put the text

resolution 200 3

at the end of every temporary command file created to run resolve. (This is why it is not overridden by AutoBuild commands; they will all come before your commands in the resolve command file.) Note that some commands in resolve may be incompatible with this usage.

Including ligand coordinates in AutoBuild

If your input PDB file contains ligands (anything other than solvent that is not protein if your chain_type=PROTEIN, for example) then by default these ligands will be kept, used in refinement, and written out to your output PDB file. Any solvent molecules will by default be discarded. You can change this behavior by changing the keywords from these defaults:

keep_input_ligands=True keep_input_waters=False

The AutoBuild Wizard will use phenix.elbow to generate geometries for any ligands that are not recognized.

You can also tell AutoBuild to add the contents of any PDB files that you wish to supply to the current version of the structure just before refinement, so all the refined models produced contain whatever AutoBuild has built, plus the contents of these PDB files. This can be done through the GUI, the command-line, or a script. In the command-line version you do this with:

input_lig_file_list=my_ligand.pdb

NOTE: The files in input_lig_file_list will be edited to make them all HETATM records to tell AutoBuild to ignore these residues in rebuilding.

NOTE: You may need to tell phenix.refine about the geometry of your ligands. You will get an error message if the ligand is not recognized and an automatic run of phenix.elbow does not succeed in generating your ligand.
In that case you will want to run phenix.elbow to create a cif definition file for this ligand:

phenix.elbow my_ligand.pdb --id=LIG

where LIG is the 3-letter ID code that you use in my_ligand.pdb to identify your ligand. If the automatic run does not work you may need to give phenix.elbow additional information to generate your ligand. Once phenix.elbow has generated your ligand you can use the keyword "cif_def_file_list" to tell AutoBuild about this ligand:

cif_def_file_list=elbow.LIG.my_ligand.pdb.cif

Specifying arbitrary commands and cif files for phenix.refine

You can tell AutoBuild to apply any set of cif definitions to the model during refinement by using a combination of specification files and the commands cif_def_file_list and refine_eff_file_list:

refine_eff_file_list=link.eff cif_def_file_list=link.cif

This example comes from the phenix.refine manual page, in which a link is specified in a cif definition file link.cif:

data_mod_5pho
#
loop_
_chem_mod_atom.mod_id
_chem_mod_atom.function
_chem_mod_atom.atom_id
_chem_mod_atom.new_atom_id
_chem_mod_atom.new_type_symbol
_chem_mod_atom.new_type_energy
_chem_mod_atom.new_partial_charge
5pho add . O5T O OH .
loop_
_chem_mod_bond.mod_id
_chem_mod_bond.function
_chem_mod_bond.atom_id_1
_chem_mod_bond.atom_id_2
_chem_mod_bond.new_type
_chem_mod_bond.new_value_dist
_chem_mod_bond.new_value_dist_esd
5pho add O5T P coval 1.520 0.020

and this is applied with a parameters file link.eff:

refinement.pdb_interpretation.apply_cif_modification
{
data_mod = 5pho
residue_selection = resname GUA and name O5T
}
You can have any number of cif files and parameters files.

Output files from AutoBuild

When you run AutoBuild the output files will be in a subdirectory with your run number:

AutoBuild_run_1_/ # subdirectory with results
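
When post-processing results from a script, it can help to locate the most recent run directory automatically. The following Python sketch is illustrative only: the function name latest_run_dir is hypothetical, and it assumes run directories are named AutoBuild_run_<n>_ as in the example above and sit directly under the directory you pass in.

```python
import re
from pathlib import Path

# Directory names of the form AutoBuild_run_<n>_ (pattern assumed from the example above)
RUN_DIR = re.compile(r"^AutoBuild_run_(\d+)_$")


def latest_run_dir(parent="."):
    """Return the highest-numbered AutoBuild_run_<n>_ directory under parent, or None."""
    best = None
    for entry in Path(parent).iterdir():
        match = RUN_DIR.match(entry.name)
        if match and entry.is_dir():
            number = int(match.group(1))
            # Compare numerically so that run 10 sorts after run 2
            if best is None or number > best[0]:
                best = (number, entry)
    return None if best is None else best[1]
```

Comparing run numbers as integers rather than as strings avoids picking AutoBuild_run_9_ over AutoBuild_run_10_.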
Standard building, rebuild_in_place, and multiple-models

The AutoBuild Wizard has two overall methods for building a model. The first method (standard build) is to build a model from scratch. This involves identification of where helices (and strands, for proteins) are located, extension using fragment libraries, connection of segments, identification of side-chains, and sequence alignment. These methods are augmented in the standard building procedure by loop-fitting and building model outside of the region that has already been built.

The second method (rebuild_in_place) takes an existing model and rebuilds it without adding or deleting any residues and without changing the connectivity of the chain. A segment of the model is deleted and then filled in again by rebuilding from the remaining ends. This is repeated for overlapping segments covering the entire model.

The multiple-models approach really has two levels of multiple models. At the first level, several (multiple_models_group_number, default is number_of_parallel_models) models are built (using rebuild_in_place) and are then recombined into a single good model. At the next level, this whole process may be done more than once (multiple_models_number times), yielding several very good models.

By default, if you ask for rebuild_in_place, then you will get a single very good model, created by running rebuild_in_place several times and recombining the models.

Parallel jobs, nproc, nbatch, number_of_parallel_models and how AutoBuild works in parallel

The AutoBuild Wizard is set up to take advantage of multi-processor machines or batch queues by splitting the work into separate tasks.
See Tutorial 4: Iterative model-building, density modification and refinement starting from experimental phases and Tutorial 6: Automatically rebuilding a structure solved by Molecular Replacement for a description of the method used by the AutoBuild Wizard to run build jobs as sub-processes and to combine the results into single models. Here are the key factors that determine how splitting model-building into batches and running them on one or more processors works:
Model editing during rebuilding with the Coot-PHENIX interface

The AutoBuild Wizard allows you to edit a model and give it back to the Wizard during the iterative model-building, density modification and refinement process. The Wizard will consider the model that you give it along with the models that it generates automatically, and will choose the parts of your model that fit the density better than other models.

You can edit a model using the PHENIX-Coot interface. This interface is accessible through the GUI and via the command-line. Using the GUI, when a model has been produced by the AutoBuild Wizard, you can double-click the button on the GUI labelled View/edit files with coot to start Coot with your current map and model. If you are running from the command-line, you can open a new window and type:

phenix.autobuild coot

which will do the same (provided the necessary map and model are ready).

When Coot has been loaded, your map and model will be displayed along with a PHENIX-Coot Interface window. You can edit your model and then save it, giving it back to PHENIX with the button labelled something like Save model as COMM/overall_best_coot_7.pdb. This button creates the indicated file and also tells PHENIX to look for this file and to try to include the contents of the model in the building process.

The precise use of the model that you save depends on the type of model-building that is being carried out by the AutoBuild Wizard. If you are using rebuild_in_place then the main-chain and side-chains of the model are considered as replacements for the current working model. Any ligands or unrecognized residues are (by default) not rebuilt but are included in refinement. By default, solvent in the model is ignored. If you are not using rebuild_in_place, only the main-chain conformation is considered, and the side-chains are ignored. Ligands (but not solvent) in the model are (by default) kept and included in refinement.
As the AutoBuild Wizard continues to build new models and create new maps, you can update in the PHENIX-Coot Interface to the current best model and map with the button Update with current files from PHENIX.

Resolution limits in AutoBuild

There are several resolution limits used in AutoBuild. You can leave them all to default, or you can set any of them individually. Here is a list of these limits and how their default values are set:

Examples

Run AutoBuild automatically after AutoSol

phenix.autobuild after_autosol

Run AutoBuild beginning with experimental data

phenix.autobuild data=solve_1.mtz seq_file=seq.dat

Merge in hires data

phenix.autobuild data=solve_2.mtz hires_file=w1.sca seq_file=seq.dat

Make a SA-omit map around atoms in target.pdb

phenix.autobuild data=data.mtz model=coords.pdb omit_box_pdb=target.pdb composite_omit_type=sa_omit

Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/. An additional map coefficients file omit_region.mtz will show you the region that has been omitted. (Note: be sure to use the weights in both resolve_composite_map.mtz and omit_region.mtz).

Make a simple composite omit map

phenix.autobuild data=data.mtz model=coords.pdb composite_omit_type=simple_omit

Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/. An additional map coefficients file omit_region.mtz will show you the region that has been omitted. (Note: be sure to use the weights in both resolve_composite_map.mtz and omit_region.mtz).

Make an iterative-build omit map around atoms in target.pdb

phenix.autobuild data=w1.sca model=coords.pdb omit_box_pdb=target.pdb \
composite_omit_type=iterative_build_omit

Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/. An additional map coefficients file omit_region.mtz will show you the region that has been omitted.
(Note: be sure to use the weights in both resolve_composite_map.mtz and omit_region.mtz).

Make a sa-omit map around residues 3 and 4 in chain A of coords.pdb

phenix.autobuild data=w1.sca model=coords.pdb omit_box_pdb=coords.pdb \
omit_res_start_list=3 omit_res_end_list=4 omit_chain_list=A \
composite_omit_type=sa_omit

Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/. An additional map coefficients file omit_region.mtz will show you the region that has been omitted. (Note: be sure to use the weights in both resolve_composite_map.mtz and omit_region.mtz).

Create one very good rebuilt model

phenix.autobuild data=data.mtz model=coords.pdb multiple_models=True \
include_input_model=True \
multiple_models_number=1 n_cycle_rebuild_max=5

The final model will be in the file MULTIPLE_MODELS/all_models.pdb (this file will contain just one model). Note that this procedure will keep the sequence that is present in coords.pdb. If you supply a sequence file it will edit the sequence of coords.pdb to match your sequence file and discard any residues that do not match. (If you want to input a sequence file but not edit the sequence in coords.pdb and not discard any non-matching residues, then specify also edit_pdb=False.) Note also that if include_input_model=True then no randomization cycle will be carried out and multiple_models_starting_resolution is ignored.

Touch up a model

phenix.autobuild data=data.mtz model=coords.pdb \
touch_up=True worst_percent_res_rebuild=2 min_cc_res_rebuild=0.8

You can rebuild just the worst parts of your model by setting touch_up=True. You can decide what parts to rebuild based on a minimum model-map correlation (by residue). You can decide how much to rebuild using worst_percent_res_rebuild or with min_cc_res_rebuild, or both.
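
To make the two touch-up criteria concrete, here is a small Python sketch of how a per-residue selection of this kind could work. It is not AutoBuild's implementation: the function name residues_to_rebuild is hypothetical, the input is assumed to be a mapping from residue number to model-map correlation, and taking the union of the two criteria (and rounding the percentage up) are assumptions, since this page does not say how the Wizard combines them.

```python
import math


def residues_to_rebuild(cc_by_residue, worst_percent=None, min_cc=None):
    """Select residues to rebuild from per-residue model-map correlations.

    ASSUMPTION: when both criteria are given, the union is selected.
    """
    selected = set()
    if min_cc is not None:
        # Residues whose correlation falls below the threshold
        selected.update(r for r, cc in cc_by_residue.items() if cc < min_cc)
    if worst_percent is not None and cc_by_residue:
        # The lowest-correlation fraction of residues (count rounded up; an assumption)
        count = math.ceil(len(cc_by_residue) * worst_percent / 100.0)
        ranked = sorted(cc_by_residue, key=cc_by_residue.get)
        selected.update(ranked[:count])
    return sorted(selected)
```

For example, with ten residues, worst_percent=20 picks the two lowest-correlation residues, and adding min_cc=0.8 also picks any other residue with a correlation below 0.8.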
Create 20 very good rebuilt models that are as different as possible

phenix.autobuild data=data.mtz model=coords.pdb multiple_models=True \
multiple_models_number=20 n_cycle_rebuild_max=5

The 20 models will be in the file MULTIPLE_MODELS/all_models.pdb. This procedure is useful for generating an ensemble of models that are each individually consistent with the data, and yet are diverse. The variation among these models is an indication of the uncertainty in each of the models. Note that the ensemble of models is not a representation of the ensemble of structures that is truly present in the crystal.

Morph an MR model and rebuild it

phenix.autobuild data=data.mtz model=MR.pdb \
morph=True rebuild_in_place=False seq_file=seq.dat

You can have AutoBuild morph your input model, distorting it to match the density-modified map that is produced from your model and data. This can be used to make an improved starting model in cases where the MR model is very different from the structure that is to be solved. For the morphing to work, the two structures must be topologically similar and differ mostly by movements of domains or motifs such as a group of helices or a sheet. The morphing process consists of identifying a coordinate shift to apply to each N (or P for nucleic acids) atom that maximizes the local density correlation between the model and the map. This is smoothed and applied to the structure to generate a morphed structure.

Build an RNA chain

phenix.autobuild data=solve_1.mtz seq_file=seq.dat chain_type=RNA

Build a DNA chain

phenix.autobuild data=solve_1.mtz seq_file=seq.dat chain_type=DNA

Just make maps; don't do any building.

phenix.autobuild data=data.mtz model=coords.pdb maps_only=True

Just calculate a prime-and-switch map

phenix.autobuild data=data.mtz solvent_fraction=.6 \
ps_in_rebuild=True model=coords.pdb maps_only=True

The output prime-and-switch map will be in the file prime_and_switch.mtz.

Possible Problems

General limitations
Specific limitations and problems
Literature
Additional information

List of all AutoBuild keywords
-------------------------------------------------------------------------------
Legend: black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help
Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
autobuild
data= None Datafile (alias for input_data_file) This file can be a .sca or
mtz or other standard file. The Wizard will guess the column
identification. You can specify the column labels to use with:
input_labels='FP SIGFP PHIB FOM HLA HLB HLC HLD FreeR_flag'
Substitute any labels you do not have with None. If you only have
myFP and mysigFP you can just say input_labels='myFP mysigFP'.
(Command-line only)
model= None PDB file with starting model (alias for input_pdb_file) NOTE:
If your PDB file has been previously refined, then please make sure
that you provide the free R flags that were used in that refinement.
These can come from the data file or from the refinement_file.
(Command-line only).
seq_file= Auto Sequence file (alias for input_seq_file). The format is
plain text, with chains separated by a line starting with >.
Any blanks and unrecognized characters are ignored. You need only
input 1 copy of each unique chain. (Command-line only)
map_file= Auto MTZ file containing starting map (alias for input_map_file)
This file must be a mtz file. The Wizard will guess the column
identification. You can specify the column labels to use with:
input_map_labels='FP PHIB FOM' Substitute any labels you do not
have with None. If you only have myFP and myPHIB you can just say
input_map_labels='myFP myPHIB'. (Command-line only)
refinement_file= Auto File for refinement (alias for input_refinement_file)
This file can be a .sca or mtz or other standard file.
This file will be merged with your data file, with any
phase information coming from your data file. If this file
has free R flags, they will be used, otherwise if the data
file has them, those will be used, otherwise they will be
generated. The Wizard will guess the column
identification. You can specify the column labels to use
with: input_refinement_labels='FP SIGFP FreeR_flag'
Substitute any labels you do not have with None. If you
only have myFP and mysigFP you can just say
input_refinement_labels='myFP mysigFP'. (Command-line
only).
hires_file= Auto File with high-resolution data (alias for
input_hires_file) This file can be a .sca or mtz or other
standard file. The Wizard will guess the column identification.
You can specify the column labels to use with:
input_hires_labels='FP SIGFP'. (Command-line only)
special_keywords
write_run_directory_to_file= None Writes the full name of a run
directory to the specified file. This can
be used as a call-back to tell a script
where the output is going to go.
(Command-line only)
run_control
coot= None Set coot to True and optionally run=[run-number] to run Coot
with the current model and map for run run-number. In some wizards
(AutoBuild) you can edit the model and give it back to PHENIX to
use as part of the model-building process. If you just say coot
then the facts for the highest-numbered existing run will be
shown. (Command-line only)
ignore_blanks= None ignore_blanks allows you to have a command-line
keyword with a blank value like "input_lig_file_list="
stop= None You can stop the current wizard with "stopwizard" or "stop".
If you type "phenix.autobuild run=3 stop" then this will stop run
3 of autobuild. (Command-line only)
display_facts= None Set display_facts to True and optionally
run=[run-number] to display the facts for run run-number.
If you just say display_facts then the facts for the
highest-numbered existing run will be shown.
(Command-line only)
display_summary= None Set display_summary to True and optionally
run=[run-number] to show the summary for run
run-number. If you just say display_summary then the
summary for the highest-numbered existing run will be
shown. (Command-line only)
carry_on= None Set carry_on to True to carry on with highest-numbered
run from where you left off. (Command-line only)
run= None Set run to n to continue with run n where you left off.
(Command-line only)
copy_run= None Set copy_run to n to copy run n to a new run and continue
where you left off. (Command-line only)
display_runs= None List all runs for this wizard. (Command-line only)
delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)
display_labels= None display_labels=test.mtz will list all the labels
that identify data in test.mtz. You can use the label
strings that are produced in AutoSol to identify which
data to use from a datafile like this: peak.data="F+
SIGF+ F- SIGF-" # the entire string in quotes counts
here You can use the individual labels from these
strings as identifiers for data columns in AutoSol and
AutoBuild like this: input_refinement_labels="FP SIGFP
FreeR_flag" # each individual label counts
dry_run= False Just read in and check parameter names
params_only= False Just read in and return parameter defaults
display_all= False Just read in and display parameter defaults
crystal_info
cell= 0.0 0.0 0.0 0.0 0.0 0.0 Enter cell parameter a b c alpha beta
gamma
chain_type= *Auto PROTEIN DNA RNA You can specify whether to build
protein, DNA, or RNA chains. At present you can only build
one of these in a single run. If you have both DNA and
protein, build one first, then run AutoBuild again,
supplying the prebuilt model in the "input_lig_file_list"
and build the other. NOTE: default for this keyword is Auto,
which means "carry out normal process to guess this
keyword". The process is to look at the sequence file and/or
input pdb file to see what the chain type is. If there is
more than one type, the type with the larger number of
residues is guessed. If you want to force the chain_type,
then set it to PROTEIN RNA or DNA.
dmax= 500.0 Low-resolution limit
overall_resolution= 0.0 If overall_resolution is set, then all data
beyond this is ignored. NOTE: this is only suggested
if you have a very big cell and need to truncate the
data to allow the wizard to run at all. Normally you
should use 'resolution' and 'resolution_build' and
'refinement_resolution' to set the high-resolution
limit
resolution= 0.0 High-resolution limit. Used as resolution limit for
density modification and as general default high-resolution
limit. If resolution_build or refinement_resolution are set
then they override this for model-building or refinement. If
overall_resolution is set then data beyond that resolution
is ignored completely.
sg= None Space Group symbol (e.g., C2221 or C 2 2 21)
solvent_fraction= None Solvent fraction in crystals (0 to 1).
decision_making
acceptable_r= 0.25 Used to decide whether the model is acceptable enough
to quit if it is not improving much. A good value is 0.25
dist_close= None If main-chain atom rmsd is less than dist_close then
crossover between chains in different models is allowed at
this point. If you input a negative number the defaults
will be used
dist_close_overlap= 1.5 Model or ligand coordinates but not both are
kept when model and ligand coordinates are within
dist_close_overlap and ligands in
input_lig_file_list are being added to the current
model. NOTE: you might want to decrease this if your
ligand atoms get removed by the wizard. Default=1.5
A
group_ca_length= 4 In resolve building you can specify how short a
fragment to keep. Normally 4 or 5 residues should be
the minimum.
group_length= 2 In resolve building you can specify how many fragments
must be joined to make a connected group that is kept.
Normally 2 fragments should be the minimum.
include_molprobity= False You can choose to include the clash score from
MolProbity as one of the scoring criteria in
comparing and merging models. The score is combined
with the model-map correlation CC by summing in a
weighted clashscore. If clashscore for a residue has
a value < ok_molp_score then its value is
(clashscore-ok_molp_score)*scale_molp_score,
otherwise its value is zero.
loop_cc_min= 0.4 You can specify the minimum correlation of density from
a loop with the map.
min_cc_res_rebuild= 0.5 You can rebuild just the worst parts of your
model by setting touch_up=True. You can decide what
parts to rebuild based on a minimum model-map
correlation (by residue). You can decide how much to
rebuild using worst_percent_res_rebuild or with
min_cc_res_rebuild, or both.
min_seq_identity_percent= 50.0 The sequence in your input PDB file will
be adjusted to match the sequence in your
sequence file (if any). If there are
insertions/deletions in your model and the
wizard does not seem to identify them, you can
split up your PDB file by adding records like
this: BREAK You can specify the minimum
sequence identity between your sequence file
and a segment from your input PDB file to
consider the sequences to be matched. Default
is 50.0%. You might want a higher number to
make sure that deletions in the sequence are
noticed.
ok_molp_score= None You can choose to include the clash score from
MolProbity as one of the scoring criteria in comparing
and merging models. The score is combined with the
model-map correlation CC by summing in a weighted
clashscore. If clashscore for a residue has a value <
ok_molp_score
then its value is
(clashscore-ok_molp_score)*scale_molp_score, otherwise
its value is zero.
r_switch= 0.4 R-value criterion for deciding whether to use R-value or
residues built. A good value is 0.40.
scale_molp_score= None You can choose to include the clash score from
MolProbity as one of the scoring criteria in comparing
and merging models. The score is combined with the
model-map correlation CC by summing in a weighted
clashscore. If clashscore for a residue has a value <
ok_molp_score then its value is
(clashscore-ok_molp_score)*scale_molp_score, otherwise
its value is zero.
semi_acceptable_r= 0.3 Used to decide whether the model is acceptable
enough to skip rebuilding the model from scratch and
focus on adding loops and extending it. A good value
is 0.3.
density_modification
hl= False You can choose whether to calculate hl coeffs when doing
density modification ('Yes') or not to do so ('No'). Default is No.
mask_type= *histograms probability wang Choose method for obtaining
probability that a point is in the protein vs solvent region.
Default is "histograms". If you have a SAD dataset with a
heavy atom such as Pt or Au then you may wish to choose
"wang" because the histogram method is sensitive to very high
peaks. Options are: histograms: compare local rms of map and
local skew of map to values from a model map and estimate
probabilities. This one is usually the best. probability:
compare local rms of map to distribution for all points in
this map and estimate probabilities. In a few cases this one
is much better than histograms. wang: take points with
highest local rms and define as protein.
modify_outside_delta_solvent= 0.05 You can set the initial solvent
content to be a little lower than
calculated when you are running
modify_outside_model Usually 0.05 is fine.
modify_outside_model= False You can choose whether to modify the density
in the "protein" region outside the region
specified in your current model by matching
histograms with the region that is specified by
that model. This can help by raising the density
in this protein region up to a value similar to
that where atoms are already placed.
thorough_denmod= *Auto Yes No True False Choose whether you want to go
for thorough density modification when no model is used
("No" speeds it up and for a terrible map is sometimes
better)
truncate_ha_sites_in_resolve= *Auto Yes No True False You can choose to
truncate the density near heavy-atom sites
at a maximum of 2.5 sigma. This is useful
in cases where the heavy-atom sites are
very strong, and rarely hurts in cases
where they are not. The heavy-atom sites
are specified with "input_ha_file"
use_resolve_fragments= True This script normally uses information from
fragment identification as part of density
modification for the first few cycles of
model-building. Fragments are identified during
model-building. The fragments are used, with
weighting according to the confidence in their
placement, in density modification as targets for
density values.
use_resolve_pattern= True Local pattern identification is normally used
as part of density modification during the first
few cycles of model building.
general
after_autosol= False You can specify that you want to continue on
starting with the highest-scoring run of AutoSol.
background= True When you specify nproc=nn, you can run the jobs in
background (default if nproc is greater than 1) or
foreground (default if nproc=1). If you set
run_command=qsub (or otherwise submit to a batch queue),
then you should set background=False, so that the batch
queue can keep track of your runs. There is no need to use
background=True in this case because all the runs go as
controlled by your batch system. If you use run_command=csh
(or similar, csh is default) then normally you will use
background=True so that all the jobs run simultaneously.
base_path= None You can specify the base path for files (default is
current working directory)
clean_up= False At the end of the entire run the TEMP directories will
be removed if clean_up is True. The default is No, keep these
directories. If you want to remove them after your run is
finished use a command like "phenix.autobuild run=1
clean_up=True"
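The command quoted above is a list of keyword=value pairs after the program name. A minimal sketch of assembling such a command line from Python follows. The helper name autobuild_command is hypothetical and is not part of PHENIX; only the "phenix.autobuild run=1 clean_up=True" form comes from the text above.

```python
import shlex


def autobuild_command(**keywords):
    """Build a phenix.autobuild command line from keyword=value pairs.

    Hypothetical helper for illustration only: it formats text and does
    not check that the keywords are valid AutoBuild parameters.
    """
    args = ["phenix.autobuild"]
    for name, value in keywords.items():
        args.append(f"{name}={value}")
    # shlex.join quotes any argument that contains shell-special characters
    return shlex.join(args)


print(autobuild_command(run=1, clean_up=True))
# prints: phenix.autobuild run=1 clean_up=True
```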
coot_name= coot If your version of coot is called something else, then
you can specify that here.
debug= False You can have the wizard stop with error messages about the
code if you use debug. NOTE: you cannot use Pause with debug.
extra_verbose= False Facts and possible commands will be printed every
cycle if Yes
i_ran_seed= 289564 Random seed (positive integer) for model-building
and simulated annealing refinement
max_wait_time= 100.0 You can specify the length of time (seconds) to
wait when testing the run_command. If you have a cluster
where jobs do not start right away you may need a longer
time to wait.
nbatch= 3 You can specify the number of processors to use (nproc) and
the number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors
available and leave nbatch alone. If you leave nbatch as None it
will be set automatically, with a value depending on the Wizard.
This is recommended. The value of nbatch can affect the results
that you get, as the jobs are not split into exact replicates,
but are rather run with different random numbers. If you want to
get the same results, keep the same value of nbatch.
nproc= 1 You can specify the number of processors to use (nproc) and the
number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors available
and leave nbatch alone. If you leave nbatch as None it will be
set automatically, with a value depending on the Wizard. This is
recommended. The value of nbatch can affect the results that you
get, as the jobs are not split into exact replicates, but are
rather run with different random numbers. If you want to get the
same results, keep the same value of nbatch.
quick= False Run everything quickly (number_of_parallel_models=1
n_cycle_build_max=1 n_cycle_rebuild_max=1)
resolve_command_list= None Commands for resolve. One per line in the
form: keyword value (value can be optional).
Examples: coarse_grid; resolution 200 2.0; hklin
test.mtz. NOTE: for command-line usage you need to
enclose the whole set of commands in double quotes
(") and each individual command in single quotes
(') like this: resolve_command_list="'no_build'
'b_overall 23' "
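The quoting rule above (each command in single quotes, the whole set in double quotes) is easy to get wrong by hand. The following sketch builds the argument text from a list of commands. The function name quote_command_list is hypothetical; the output format, including the space before the closing double quote, simply copies the example in the text.

```python
def quote_command_list(keyword, commands):
    """Format a list of resolve commands as one command-line argument.

    Hypothetical helper: each command is wrapped in single quotes and the
    whole set in double quotes, as in the documented example.
    """
    for command in commands:
        if "'" in command or '"' in command:
            raise ValueError(f"command must not contain quotes: {command!r}")
    inner = " ".join(f"'{command}'" for command in commands)
    # The trailing space before the closing quote mirrors the documented form.
    return f'{keyword}="{inner} "'


print(quote_command_list("resolve_command_list", ["no_build", "b_overall 23"]))
```

The same helper can be used for resolve_pattern_command_list by passing that keyword name instead.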
resolve_pattern_command_list= None Commands for resolve_pattern. One
per line in the form: keyword value
(value can be optional). Examples:
resolution 200 2.0; hklin test.mtz. NOTE:
for command-line usage you need to enclose
the whole set of commands in double quotes
(") and each individual command in single
quotes (') like this:
resolve_pattern_command_list="'resolution
200 20' 'hklin test.mtz' "
resolve_size= _giant _huge _extra_huge *None Size for solve/resolve
("","_giant","_huge","_extra_huge")
run_command= csh When you specify nproc=nn, you can run the subprocesses
as jobs in background with csh (default) or submit them to
a queue with the command of your choice (e.g., qsub). If
you have a multi-processor machine, use csh. If you have a
cluster, use qsub or the equivalent command for your
system. NOTE: If you set run_command=qsub (or otherwise
submit to a batch queue), then you should set
background=False, so that the batch queue can keep track of
your runs. There is no need to use background=True in this
case because all the runs go as controlled by your batch
system. If you use run_command=csh (or similar, csh is
default) then normally you will use background=True so that
all the jobs run simultaneously.
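The relationship between run_command, nproc and background described above can be summarized as a small decision rule. This is only a sketch of the recommendation in the text, not the Wizard's own logic. The function name suggest_background and the set BATCH_COMMANDS are assumptions; add your own site's submit command to the set.

```python
# Assumption: commands that hand jobs to a batch queue. Only qsub is
# named in the text; extend this set for your own cluster.
BATCH_COMMANDS = {"qsub"}


def suggest_background(run_command="csh", nproc=1):
    """Return the background setting recommended by the text above."""
    if run_command in BATCH_COMMANDS:
        # The batch system keeps track of the runs, so run in foreground.
        return False
    # With csh (or similar), background is the default only if nproc > 1.
    return nproc > 1


print(suggest_background("csh", 4))   # True
print(suggest_background("qsub", 4))  # False
print(suggest_background("csh", 1))   # False
```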
skip_xtriage= False You can bypass xtriage if you want. This will
prevent you from applying anisotropy corrections, however.
temp_dir= None Define a temporary directory (it must exist)
title= Run 1 AutoBuild Sun Dec 7 17:46:23 2008 Enter any text you like
to help identify what you did in this run
top_output_dir= None This is used in subprocess calls of wizards and to
tell the Wizard where to look for the STOPWIZARD file.
verbose= False Command files and other verbose output will be printed
input_files
cif_def_file_list= None You can enter any number of CIF definition
files. These are normally used to tell phenix.refine
about the geometry of a ligand or unusual residue.
You usually will use these in combination with "PDB
file with metals/ligands" (keyword
"input_lig_file_list" ) which allows you to attach
the contents of any PDB file you like to your model
just before it gets refined. You can use
phenix.elbow to generate these if you do not have a
CIF file and one is requested by phenix.refine
input_data_file= None Enter a file with input structure factor data.
For structure factor data only (e.g., FP SIGFP) any
format is ok. If you have free R flags, phase
information or HL coefficients that you want to use
then an mtz file is required. If this file contains
phase information, this phase information should be
experimental (i.e., MAD/SAD/MIR etc), and should not be
density-modified phases (enter any files with
density-modified phases as input_map_file instead).
NOTE: If you supply HL coefficients they will be used
in phase recombination. If you supply PHIB or PHIB and
FOM and not HL coefficients, then HL coefficients will
be derived from your PHIB and FOM and used in phase
recombination. If you also specify a hires data file,
then FP and SIGFP will come from that data file (and
not this one). If an input_refinement_file is
specified, then F, Sigma, FreeR_flag (if present) from
that file will be used for refinement instead of this
one.
input_ha_file= None If the flag "truncate_ha_sites_in_resolve" is set
then density at sites specified with input_ha_file is
truncated to improve the density modification procedure.
input_hires_labels= None Labels for input hires file (FP SIGFP
FreeR_flag)
input_labels= None Labels for input data columns NOTE: Applies to input
data file for LigandFit and AutoBuild, but not to AutoMR.
For AutoMR use instead 'input_label_string'.
input_lig_file_list= None This script adds the contents of these PDB
files to each model just prior to refinement.
Normally you might use this to put in any
heavy-atoms that are in the refined structure (for
example the heavy atoms that were used in phasing),
or to add a ligand to your model. If the atoms in
this PDB file are not recognized by phenix.refine,
then you can specify their geometries with a cif
definitions file using the keyword
"cif_def_file_list". You can easily generate cif
definitions for many ligands using phenix.elbow in
PHENIX. You can put anything you like in the files
in input_lig_file_list, but any atoms that fall
within 1.5 A of any atom in the current model will
be tossed (not written to the model).
input_map_file= Auto Enter an mtz file with coefficients for map (if
different file or different coefficients than input
structure factor data ). This map will be used in the
first cycle of model-building. NOTE: default for this
keyword is Auto, which means "carry out normal process
to guess this keyword". This means if you specify
"after_autosol" in AutoBuild, AutoBuild will
automatically take the value from AutoSol. If you do not
want this to happen, you can specify None which means
"No file"
input_map_labels= None Labels for input map coefficient columns (FP PHIB
FOM) NOTE: FOM is optional (set to None if you wish)
input_pdb_file= None You can enter a PDB file containing a starting
model of your structure NOTE: If you enter a PDB file
then the AutoBuild wizard will start right in with
rebuild steps, skipping the build process. If the model
is very poor then it may be better to leave it out as
the build process (which includes pattern recognition
and recognition of helical and strand fragments) is
optimized for improving poor maps, while the rebuild
process is optimized for better maps that can be
produced by having a partial model.
input_refinement_file= Auto Data file to use for refinement. The data in
this file should not be corrected for anisotropy.
It will be combined with experimental phase
information (if any) from input_data_file for
refinement. If you leave this blank, then the
data in the input_data_file will be used in
refinement. If no anisotropy correction is
applied to the data you do not need to specify a
datafile for refinement. If an anisotropy
correction is applied to the data files, then you
should enter an uncorrected datafile for
refinement. Any standard format is fine;
normally only F and sigF will be used. Bijvoet
pairs and duplicates will be averaged. If an mtz
file is provided then a free R flag can be read
in as well. Any HL coeffs and phase information
in this file is ignored. NOTE: default for this
keyword is Auto, which means "carry out normal
process to guess this keyword". This means if you
specify "after_autosol" in AutoBuild, AutoBuild
will automatically take the value from AutoSol.
If you do not want this to happen, you can
specify None which means "No file"
input_refinement_labels= None Labels for input refinement file columns
(FP SIGFP FreeR_flag)
input_seq_file= Auto Enter name of file with 1-letter code of protein
sequence NOTES: 1. lines starting with > are ignored
and separate chains 2. FASTA format is fine 3. If
there are multiple copies of a chain, just enter one
copy. 4. If you enter a PDB file for rebuilding and it
has the sequence you want, then the sequence file is not
necessary. NOTE: You can also enter the name of a PDB
file that contains SEQRES records, and the sequence from
the SEQRES records will be read, written to
seq_from_seqres_records.dat, and used as your input
sequence. NOTE: for AutoBuild you can specify
start_chains_list on the first line of your sequence
file: >> start_chains_list 23 11 5 NOTE: default
for this keyword is Auto, which means "carry out normal
process to guess this keyword". This means if you
specify "after_autosol" in AutoBuild, AutoBuild will
automatically take the value from AutoSol. If you do not
want this to happen, you can specify None which means
"No file"
keep_input_ligands= True By default the wizard keeps ligands by
separating them out from the rest of your model and
adding them back to your rebuilt model. Alternatively
you can have it remove all ligands from your input pdb
file before rebuild_in_place.
keep_input_waters= False You can choose whether to keep input waters
(solvent) when using rebuild_in_place. If you keep
them, then you should specify either
"place_waters=No" or "keep_pdb_atoms=No" because if
place_waters=Yes and keep_pdb_atoms=Yes then
phenix.refine will add waters and then the wizard
will keep the new waters from the new PDB file
created by phenix.refine preferentially over the ones
in your input file.
keep_pdb_atoms= True You can choose whether to keep the model
coordinates when model and ligand coordinates are within
dist_close_overlap and ligands in input_lig_file_list
are being added to the current model. Default=Yes
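The interaction of keep_pdb_atoms with dist_close_overlap can be illustrated with a short sketch. It assumes atoms are plain (x, y, z) tuples in Angstroms, that overlap is tested atom by atom, and that a distance exactly equal to the cutoff counts as overlapping. The function name resolve_overlaps is hypothetical, and the Wizard's actual procedure may differ in detail.

```python
import math


def resolve_overlaps(model_atoms, ligand_atoms,
                     dist_close_overlap=1.5, keep_pdb_atoms=True):
    """Keep model or ligand coordinates, but not both, where they overlap.

    Illustrative sketch only. Returns (model_kept, ligand_kept).
    """
    def is_close(atom, others):
        return any(math.dist(atom, other) <= dist_close_overlap
                   for other in others)

    if keep_pdb_atoms:
        # Model wins: drop ligand atoms that sit on top of model atoms.
        ligand_kept = [a for a in ligand_atoms
                       if not is_close(a, model_atoms)]
        return list(model_atoms), ligand_kept
    # Ligand wins: drop model atoms that sit on top of ligand atoms.
    model_kept = [a for a in model_atoms if not is_close(a, ligand_atoms)]
    return model_kept, list(ligand_atoms)


model = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ligand = [(1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(resolve_overlaps(model, ligand))
print(resolve_overlaps(model, ligand, keep_pdb_atoms=False))
```

With a smaller dist_close_overlap fewer ligand atoms are removed, which is why the note under dist_close_overlap suggests decreasing it if your ligand atoms disappear.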
refine_eff_file_list= None You can enter any number of refinement
parameter files. These are normally used to tell
phenix.refine defaults to apply, as well as
creating specialized definitions such as unusual
amino acid residues and linkages. These
parameters override the normal phenix.refine
defaults. They themselves can be overridden by
parameters set by the Wizard and by you,
controlling the Wizard. NOTE: Any parameters set
by AutoBuild directly (such as
number_of_macro_cycles, high_resolution, etc...)
will not be taken from this parameters file. This
is useful only for adding extra parameters not
normally set by AutoBuild.
maps
maps_only= False You can choose whether to skip all model-building and
just calculate maps and write out the results. This also runs
just 1 cycle and turns on HL coefficients.
n_xyz_list= None You can specify the grid to use for map calculations.
model_building
allow_negative_residues= False Normally the wizard does not allow
negative residue numbers, and all residues with
negative numbers are rejected when they are
read in. You can allow them if you wish.
base_model= None You can enter a PDB file with coordinates to be used
as a starting point for model-building. These coordinates
will be included in the same way as fragments placed by
searching for helices and strands in initial model-building.
Note the difference from the use of models in
consider_main_chain_list, which are merged with models after
they are built. NOTE: Only use this if you want to keep the
input model and just add to it.
build_type= RESOLVE_AND_TEXTAL *RESOLVE TEXTAL You can choose to build
models with RESOLVE and TEXTAL or either one, and how many
different models to build with RESOLVE. The more you build,
the more likely to get a complete model. Note that
rebuild_in_place can only be carried out with RESOLVE
model-building
cc_helix_min= None Minimum CC of helical density to map at low
resolution when using helices_strands_only
cc_strand_min= None Minimum CC of strand density to map when using
helices_strands_only
consider_main_chain_list= None This keyword lets you name any number of
PDB files to consider as templates for
model-building. Every time models are built,
the contents of these files will be merged
with them and the best parts will be kept.
NOTE: this only uses the main-chain atoms of
your PDB files.
dist_connect_max_helices= None Set maximum distance between ends of
helices and other ends to try and connect them
in insert_helices.
edit_pdb= True You can choose to edit the input PDB file in
rebuild_in_place to match the input sequence (default=True).
NOTE: residues with residue numbers higher than
'highest_resno' are assumed to not have a known sequence and
will not be edited. By default the value of 'highest_resno' is
the highest residue number from the sequence file, after
adding it to the starting residue number from
start_chains_list. You can also set it directly.
helices_strands_only= False You can choose to use a quick model-building
method that only builds secondary structure. At
low resolution this may be both quicker and more
accurate than trying to build the entire structure.
If you are running the AutoSol Wizard, normally
you should choose 'Yes' and use the quick
model-building. Then when your structure is solved
by AutoSol, go on to AutoBuild and build a more
complete model (this time normally using
helices_strands_only=False).
helices_strands_start= False You can choose to use a quick
model-building method that builds secondary
structure as a way to get started...then model
completion is done as usual. (Contrast with
helices_strands_only which only does secondary
structure)
highest_resno= None Highest residue number to be considered "placed" in
sequence for rebuild_in_place
include_input_model= True The keyword include_input_model defines
whether the input model (if any) is to be crossed
with models that are derived from it, and the best
parts of each kept. Note that if
multiple_models=True and include_input_model=True
then no initial cycle of randomization will be
carried out and the keyword
multiple_models_starting_resolution is ignored. In
most cases you should use include_input_model=True
If you want to generate maximum diversity with
multiple-models then you may wish to use
include_input_model=False. Also if you want to
decrease the amount of bias from your starting
model you may wish to use
include_input_model=False.
input_compare_file= NONE If you are rebuilding a model or already think
you know what the model should be, you can include a
comparison file in rebuilding. The model is not used
for anything except to write out information on
coordinate differences in the output log files.
NOTE: this feature does not always work correctly.
merge_models= False You can choose to only merge any input models and
write out the resulting model. The best parts of each
model will be kept based on model-map correlation.
Normally used along with number_of_parallel_models=1
morph= False You can choose whether to distort your input model in order
to match the current working map. This may be useful for MR
models that are quite distant from the correct structure.
morph_cycles= 2 Number of iterations of morphing each time it is run.
morph_rad= 7.0 Smoothing radius for morphing. The density from your
model and from the map are calculated with the radius
morph_rad, then they are adjusted to overlap optimally.
n_ca_enough_helices= None Set maximum number of CA to add to ends of
helices and other ends to try and connect them in
insert_helices.
offsets_list= 53 7 23 You can specify an offset for the orientation of
the helix and strand templates in building. This is used
in generating different starting models.
ps_in_rebuild= False You can choose to use a prime-and-switch resolve
map in all cycles of rebuilding instead of a
density-modified map. This is normally used in
combination with maps_only to generate a prime-and-switch
map.
refine= True This script normally refines the model during building. Say
No to skip refinement
resolution_build= 0.0 Enter the high-resolution limit for
model-building. If 0.0, the value of resolution is
used as a default.
restart_cycle_after_morph= 5 Morphing (if morph=True) will go only up to
this cycle, and then the morphed PDB file
will be used as a starting PDB file from then
on, removing all previous models.
retrace_before_build= False You can choose to retrace your model n_mini
times and use a map based on these retraced models
to start off model-building. This is the default
for rebuilding models if you are not using
rebuild_in_place. You can also specify
n_iter_rebuild, the number of cycles of
retrace-density-modify-build before starting the
main build.
reuse_chain_prev_cycle= True You can choose to allow model-building to
include atoms from each cycle in the model the
next cycle or not
richardson_rotamers= *Auto Yes No True False You can choose to use the
rotamer library from SC Lovell, JM Word, JS
Richardson and DC Richardson (2000) "The
Penultimate Rotamer Library", Proteins: Structure
Function and Genetics 40, 389-408, if you wish.
Typically this works well in RESOLVE model-building
for nearly-final models but not as well earlier in
the process. Default (Auto) is to use these
rotamers for rebuild_in_place but not otherwise.
rms_random_frag= None Rms random position change added to residues on
ends of fragments when extending them. If you enter a
negative number, defaults will be used.
rms_random_loop= None Rms random position change added to residues on
ends of loops in tries for building loops. If you enter
a negative number, defaults will be used.
semet= False You can specify that the dataset that is used for
refinement is a selenomethionine dataset, and that the model
should be the SeMet version of the protein, with all SD of MET
replaced with Se of MSE.
start_chains_list= None You can specify the starting residue number for
each of the unique chains in your structure. If you
use a sequence file then the unique chains are
extracted and the order must match the order of your
starting residue numbers. For example, if your
sequence file has chains A and B (identical) and
chains C and D (identical to each other, but
different than A and B) then you can enter 2 numbers,
the starting residues for chains A and C. NOTE: you
need to specify an input sequence file for
start_chains_list to be applied.
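Because start_chains_list is only applied together with a sequence file, it can be convenient to write both at once, using the ">> start_chains_list" first line described under input_seq_file. The sketch below produces the text of such a file. The function name sequence_file_text and the short sequences are made up for illustration; the use of a bare ">" line between chains follows the note that lines starting with > are ignored and separate chains.

```python
def sequence_file_text(unique_chains, start_chains_list=None):
    """Return sequence-file text with one entry per unique chain.

    Hypothetical helper: unique_chains is a list of 1-letter-code
    sequences, start_chains_list an optional list of starting residue
    numbers in the same order.
    """
    lines = []
    if start_chains_list is not None:
        if len(start_chains_list) != len(unique_chains):
            raise ValueError("need one starting residue number per chain")
        numbers = " ".join(str(n) for n in start_chains_list)
        lines.append(f">> start_chains_list {numbers}")
    for index, sequence in enumerate(unique_chains):
        if index > 0:
            lines.append(">")  # a line starting with > separates chains
        lines.append(sequence)
    return "\n".join(lines) + "\n"


# Made-up sequences for two unique chains starting at residues 23 and 11
print(sequence_file_text(["MKVLA", "GSHMT"], [23, 11]), end="")
```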
trace_as_lig= False You can specify that in building steps the ends of
chains are to be extended using the LigandFit algorithm.
This is default for nucleic acid model-building.
track_libs= False You can keep track of what libraries each atom in a
built structure comes from.
two_fofc_in_rebuild= False You can choose to use a sigmaa-weighted
2Fo-Fc map in all cycles of rebuilding instead of a
density-modified map. If the model is poor this can
sometimes allow model-building in place to work
even when it will not for density-modified maps.
use_any_side= True You can choose to have resolve model-building place
the best-fitting side chain at each position, even if the
sequence is not matched to the map.
use_cc_in_combine_extend= False You can choose to use the correlation of
density rather than density at atomic
positions to score models in combine_extend
use_met_in_align= *Auto Yes No True False You can use the heavy-atom
positions in input_ha_file as markers for Met SD
positions.
multiple_models
combine_only= False Once you have created a set of initial models you
can merge them together into a final set. This option is
useful if you have split up the creation of multiple
models into different directories, and then you have
copied all the initial models to one directory for
combining.
multiple_models= False You can build a set of models, all compatible
with your data. You can specify how many models with
multiple_models_number. If you are using
rebuild_in_place you can specify whether to generate
starting models or not with multiple_models_starting.
multiple_models_first= 1 Specify which model to build first
multiple_models_group_number= 5 You can build several initial models and
merge them. Normally 5 initial models is
fine.
multiple_models_last= 20 Specify which model to end with
multiple_models_number= 20 Specify how many models to build.
multiple_models_starting= True You can specify how to generate starting
models for multiple models. If you are using
rebuild_in_place and you specify "Yes" then
the Wizard will rebuild your starting model at
the resolution specified in
multiple_models_starting_resolution. If you
are not using rebuild_in_place the Wizard will
always build a starting model at the current
resolution.
multiple_models_starting_resolution= 4.0 You can set the resolution for
rebuilding an initial model. A
value of 0.0 will use the
resolution of the dataset.
place_waters_in_combine= True You can choose whether phenix.refine
automatically places ordered solvent (waters)
during the last cycle of multiple-model
generation. This is separate from place_waters,
which applies to all other cycles.
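If you split the creation of multiple models across several directories (as described under combine_only), each run needs its own multiple_models_first and multiple_models_last. A minimal sketch of dividing the models evenly is shown below. The function name split_model_ranges is hypothetical, and how you distribute the models between runs is your choice; this is just one even split.

```python
def split_model_ranges(multiple_models_number, n_jobs):
    """Divide models 1..multiple_models_number into contiguous ranges.

    Returns a list of (multiple_models_first, multiple_models_last)
    pairs, one per job. Jobs that would get no models are omitted.
    """
    if multiple_models_number < 1 or n_jobs < 1:
        raise ValueError("both arguments must be positive")
    base, extra = divmod(multiple_models_number, n_jobs)
    ranges = []
    first = 1
    for job in range(n_jobs):
        size = base + (1 if job < extra else 0)
        if size == 0:
            continue
        ranges.append((first, first + size - 1))
        first += size
    return ranges


for first, last in split_model_ranges(20, 4):
    print(f"multiple_models_first={first} multiple_models_last={last}")
```

Once all runs are finished and the initial models are copied to one directory, combine_only=True merges them into the final set.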
ncs
find_ncs=