Python-based Hierarchical ENvironment for Integrated Xtallography

Automated molecular replacement with AutoMR

Author(s)
Purpose
Purpose of the AutoMR Wizard
Usage
Summary of inputs and outputs for AutoMR
Output files from AutoMR
How to run the AutoMR Wizard
Components, copies, search models, and ensembles
What the AutoMR wizard needs to run
Specifying which columns of data to use from input data files
Examples
Standard AutoMR run with coords.pdb native.sca
Specifying data columns
Specifying a refinement file for AutoBuild
Passing any commands to AutoBuild
AutoMR searching for 2 components
Specifying molecular masses of 2 components
AutoMR searching for 2 components, but specifying the orientation of one of them
Combining MR and SAD phasing information (MRSAD)
Possible Problems
Specific limitations and problems
Literature
Additional information
List of all AutoMR keywords

Author(s)

  • Phaser: Randy J. Read, Airlie J. McCoy and Laurent C. Storoni
  • AutoMR Wizard: Tom Terwilliger, Laurent Storoni, Randy Read, and Airlie McCoy
  • PHENIX GUI and PDS Server: Nigel W. Moriarty
  • phenix.xtriage: Peter Zwart

Purpose

Purpose of the AutoMR Wizard

The AutoMR Wizard provides a convenient interface to Phaser molecular replacement and feeds the results of molecular replacement directly into the AutoBuild Wizard for automated model rebuilding.

The AutoMR Wizard starts from a data file containing structure factor amplitudes (or intensities) and their uncertainties, plus one or more search models. It then identifies placements of the search models that are compatible with the data.

Usage

The AutoMR Wizard can be run from the PHENIX GUI, from the command-line, and from keyworded script files. All three versions are identical except in the way that they take commands from the user. See Running a Wizard from a GUI, the command-line, or a script for details of how to run a Wizard. The command-line version will be described here.

NOTE: You may find it easiest to run the GUI version of AutoMR when you are learning how to use it, and then to move to the command-line or script versions later, as the GUI version will take you through all the necessary steps of organizing your data.

Summary of inputs and outputs for AutoMR

Input data file. This file can be in almost any format, and must contain either amplitudes or intensities, together with their sigmas. You can specify one resolution for molecular replacement and a separate resolution for model rebuilding. If you specify "0.0" for resolution (recommended), defaults are used for molecular replacement: data to 2.5A, if available, are used to solve the structure, and the final solution is then rigid-body refined against all the data. All the data are used for model rebuilding.

Composition of the asymmetric unit. PHASER needs to know what the total mass in the asymmetric unit is (i.e. not just the mass of the search models). You can define this either by specifying one or more protein or nucleic acid sequence files, or by specifying protein or nucleic acid molecular masses, and telling the Wizard how many copies of each are present.
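If you have a sequence but want to give the composition with mass= instead of seq_file=, a rough estimate is usually adequate. The sketch below is a minimal illustration under stated assumptions. It uses about 110 Da per amino-acid residue, which is a common rule of thumb and not PHASER's own calculation. The function name estimate_protein_mass is hypothetical. If you already have the sequence file, passing seq_file= directly is simpler.

```python
def estimate_protein_mass(sequence_text: str, average_residue_mass: float = 110.0) -> float:
    """Rough molecular mass (Da) for the residues in a one-letter sequence.

    Assumptions (not taken from PHASER): ~110 Da per amino-acid residue,
    FASTA-style header lines start with '>', and every letter is one residue.
    If the text holds several chains, all their residues are added together.
    """
    residues = 0
    for line in sequence_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        residues += sum(1 for ch in line if ch.isalpha())
    return residues * average_residue_mass


if __name__ == "__main__":
    example = ">subunit\nMKTAYIAKQR\nQISFVKSHFS\n"  # 20 residues
    print(f"mass={estimate_protein_mass(example):.0f}")  # prints mass=2200
```

The result is the mass of one copy. Give the number of copies separately with copies= or component_copies=.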

Space groups to search. You can request that all space groups with the same point group as the one you start out with be searched, and the best one be chosen. If you select this option then the best space group will be used for model rebuilding in AutoBuild.

Ensembles to search for. AutoMR builds up a model by finding a set of good positions and orientations of one "ensemble", and then using each of those placements as starting points for finding the next ensemble, until all the contents of the asymmetric unit are found and a consistent solution is obtained. You can specify any number of different ensembles to search for, and you can search for any number of copies of each ensemble. The order of searching for ensembles does make a difference. If possible, you want to search for the biggest, best-ordered, most accurate ensemble first. You specify the order when you list the ensembles to search for on the last main window of the AutoMR wizard.

Each ensemble can be specified by a single PDB file or a set of PDB files. All the PDB files in one ensemble must be oriented in the same way (superimposed on each other), because they are always placed together as a group during molecular replacement.

You will need to specify how similar you think each input PDB file that is part of an ensemble is to the structure that is in your crystal. You can specify either sequence identity, or expected rmsd. Note that if you use a homology model, you should give the sequence identity of the template from which the model was constructed, not the 100% identity of the model!
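The sketch below shows roughly how sequence identity and expected rmsd relate. It evaluates the classic Chothia & Lesk (1986) relationship, rms ≈ 0.40·exp(1.87·H), where H is the fraction of non-identical residues. This is an illustration only, and PHASER's own identity-to-RMS conversion may differ. If you know the identity, it is safer to give identity= and let the program convert it. The function name is hypothetical.

```python
import math


def chothia_lesk_rms(percent_identity: float) -> float:
    """Approximate core RMSD (A) from percent sequence identity.

    Uses the Chothia & Lesk (1986) fit rms = 0.40 * exp(1.87 * H), where H is
    the fraction of non-identical residues. Illustration only: PHASER's own
    conversion may use a different formula.
    """
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent_identity must be between 0 and 100")
    h = 1.0 - percent_identity / 100.0  # fraction of mutated residues
    return 0.40 * math.exp(1.87 * h)


if __name__ == "__main__":
    for ident in (100, 75, 50, 30):
        print(f"identity={ident:3d}%  ->  RMS ~ {chothia_lesk_rms(ident):.2f} A")
```

For a homology model, the identity to feed in is still that of the template, as noted above.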

Output of AutoMR

Output files from AutoMR

When you run AutoMR the output files will be in a subdirectory with your run number:

AutoMR_run_1_/   # subdirectory with results

  • A summary file listing the results of the run and the other files produced:
    AutoMR_summary.dat  # overall summary
    

  • A warnings file listing any warnings about the run
    AutoMR_warnings.dat  # any warnings
    

  • A file that lists all parameters and knowledge accumulated by the Wizard during the run (some parts are binary and are not printed)
    AutoMR_Facts.dat   # all Facts about the run
    

  • Molecular replacement model, structure factors, and map coefficients:
    MR.1.pdb
    MR.1.mtz
    MR.MAP_COEFFS.1.mtz
    
    The AutoMR wizard writes out MR.1.pdb, MR.1.mtz and MR.MAP_COEFFS.1.mtz, as well as output log files. The MR.1.pdb file will contain all the components of your MR solution. If there are multiple PDB files in an ensemble, the model with the lowest estimated rmsd is chosen to represent the whole ensemble and is written to MR.1.pdb. If there are multiple copies of a model, the chains are lettered sequentially (A, B, C, ...). The MR.1.mtz file contains the data from your input file to the full resolution available. The MR.MAP_COEFFS.1.mtz file contains sigmaA-weighted 2Fo-Fc map coefficients based on the rigid-body-refined model.
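Each copy is written as a separate chain, so you can check how many copies ended up in MR.1.pdb by collecting the chain identifiers from its ATOM/HETATM records. The following is a minimal standard-library sketch. It assumes a standard fixed-column PDB file with the chain ID in column 22, and it assumes the default location for run 1. Adjust the path for other runs.

```python
from pathlib import Path


def chain_ids(pdb_text: str) -> list[str]:
    """Return chain IDs in order of first appearance (PDB column 22)."""
    seen: list[str] = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")) and len(line) > 21:
            chain = line[21]  # column 22 in 1-based PDB numbering
            if chain not in seen:
                seen.append(chain)
    return seen


if __name__ == "__main__":
    pdb_path = Path("AutoMR_run_1_") / "MR.1.pdb"  # assumed: output of run 1
    if pdb_path.exists():
        ids = chain_ids(pdb_path.read_text())
        print(f"{len(ids)} chain(s): {' '.join(ids)}")
    else:
        print(f"{pdb_path} not found")
```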

Model rebuilding. After PHASER molecular replacement the AutoMR Wizard loads the AutoBuild Wizard and sets the defaults based on the MR solution that has just been found. You can use the default values, or you may choose to use 2Fo-Fc maps instead of density-modified maps for rebuilding, or you may choose to start the model-rebuilding with the map coefficients from MR.MAP_COEFFS.1.mtz.

How to run the AutoMR Wizard

Running the AutoMR Wizard is easy. For example, from the command-line you can type:

phenix.automr native.sca search.pdb RMS=0.8 mass=23000 copies=1

The AutoMR Wizard will find the best location and orientation of the search model search.pdb in the unit cell, based on the data in native.sca. It assumes that the RMSD between search.pdb and the true structure is about 0.8 A, that the molecular mass of the true structure is 23000 Da, and that there is 1 copy of this model in the asymmetric unit. Once the AutoMR Wizard has found a solution, it will automatically call the AutoBuild Wizard and rebuild the model.
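If you drive PHENIX from a Python script, you can assemble the same command as an argument list and launch it with subprocess. This is a sketch that assumes phenix.automr is on your PATH. It uses only the keywords shown in the command above. The helper name automr_command is hypothetical.

```python
import shutil
import subprocess


def automr_command(data: str, model: str, rms: float, mass: int, copies: int) -> list[str]:
    """Argument list for the simple one-search-model AutoMR run shown above."""
    return [
        "phenix.automr",
        data,
        model,
        f"RMS={rms}",
        f"mass={mass}",
        f"copies={copies}",
    ]


if __name__ == "__main__":
    cmd = automr_command("native.sca", "search.pdb", rms=0.8, mass=23000, copies=1)
    print(" ".join(cmd))
    if shutil.which(cmd[0]) is None:
        print("phenix.automr not found on PATH; not running")
    else:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Passing a list instead of a single string avoids shell quoting issues. This matters for values containing spaces, such as the multi-model coords= used below.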

Components, copies, search models, and ensembles

  • Your structure is composed of one or more components, such as a 20 kDa subunit with sequence seq-of-20Kd-subunit.

  • There may be one or more copies of each component in your structure.

  • You can search for the location(s) of a component with a search model that consists of a single structure or an ensemble of structures.

What the AutoMR wizard needs to run

In a simple case where you have one search model and are looking for N copies of this model in your structure, you need:

  • (1) a datafile name (native.sca or data=native.sca)

  • (2) a search model (search_model.pdb or coords=search_model.pdb)

  • (3) how similar the search model is to your structure ( RMS=0.8 or identity=75)

  • (4) information about the contents of the asymmetric unit: (mass=23000 or seq_file=seq.dat) and (copies=1)

It may be advantageous to search using an ensemble of similar structures, rather than a single structure. If you have an ensemble of search models to search for, then specify it as

coords='model_1.pdb model_2.pdb model_3.pdb'

In this case you need to give the RMS or identity for each model: identity='45 40 35'. Each of the models in the ensemble must be in the same orientation as the others, so that the ensemble of models can be placed as a group in the unit cell.
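The coords= and identity= (or RMS=) values are paired by position, so the two lists must have the same length. The hypothetical helper below checks this and produces the two keyword values. It is a sketch for scripted runs, not part of PHENIX.

```python
def ensemble_keywords(models: list[str], identities: list[float]) -> list[str]:
    """Pair each ensemble member with its percent identity, position by position.

    Returns keyword=value strings for an argument list. If you paste them into
    a shell instead, quote the space-separated values yourself.
    """
    if not models:
        raise ValueError("an ensemble needs at least one model")
    if len(models) != len(identities):
        raise ValueError(f"{len(models)} model(s) but {len(identities)} identity value(s)")
    return [
        "coords=" + " ".join(models),
        "identity=" + " ".join(f"{x:g}" for x in identities),
    ]


if __name__ == "__main__":
    print(ensemble_keywords(["model_1.pdb", "model_2.pdb", "model_3.pdb"], [45, 40, 35]))
```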

If you are searching for more than one ensemble, or if there is more than one component in the a.u., then use the full syntax. NOTE: in the full syntax, copies is replaced by copies_to_find (for each ensemble) and component_copies (for each component):

ensemble_1.coords=s1.pdb ensemble_1.RMS=0.8 ensemble_1.copies_to_find=1 \
   component_1.mass=23000 component_1.component_copies=1
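With several ensembles and components, the numbered keywords are easy to mistype. The sketch below generates them from plain Python lists. The keyword names are the ones used on this page: ensemble_N.coords, .RMS, .identity, .copies_to_find and component_N.component_type, .seq_file, .mass, .component_copies. The dictionary layout and the function name are assumptions made for illustration.

```python
ENSEMBLE_KEYS = ("coords", "RMS", "identity", "copies_to_find")
COMPONENT_KEYS = ("component_type", "seq_file", "mass", "component_copies")


def numbered_keywords(ensembles: list[dict], components: list[dict]) -> list[str]:
    """Expand per-ensemble and per-component settings into numbered keywords.

    Only the keys listed above are emitted, in that order; other dict entries
    are ignored. Numbering starts at 1, as in ensemble_1 / component_1.
    """
    args: list[str] = []
    for i, ens in enumerate(ensembles, start=1):
        if "coords" not in ens:
            raise ValueError(f"ensemble_{i} has no coords")
        args += [f"ensemble_{i}.{k}={ens[k]}" for k in ENSEMBLE_KEYS if k in ens]
    for i, comp in enumerate(components, start=1):
        args += [f"component_{i}.{k}={comp[k]}" for k in COMPONENT_KEYS if k in comp]
    return args


if __name__ == "__main__":
    args = numbered_keywords(
        ensembles=[{"coords": "s1.pdb", "RMS": 0.8, "copies_to_find": 1}],
        components=[{"mass": 23000, "component_copies": 1}],
    )
    print(" ".join(args))  # same keywords as the two-line example above
```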

Specifying which columns of data to use from input data files

If one or more of your data files has column names that the Wizard cannot identify automatically, you can specify them yourself. You will need to provide one column "name" for each expected column of data, with "None" for anything that is missing.

For example, if your data file data.mtz has columns F SIGF then you might specify

data=data.mtz
input_label_string="F SIGF"

You can find out all the possible label strings in a data file that you might use by typing:

phenix.autosol display_labels=data.mtz  # display all labels for data.mtz

You can specify many more parameters as well. For how to do this, see the list of keywords, defaults and descriptions at the end of this page, and the general information in "Running a Wizard from a GUI, the command-line, or a script". Some of the most common parameters are:

data=w1.sca       # data file
model=coords.pdb  # starting model
seq_file=seq.dat  # sequence file

Examples

Standard AutoMR run with coords.pdb native.sca

Run AutoMR using coords.pdb as the search model and native.sca as the data. This assumes that the RMS between coords.pdb and the true model is about 0.85 A, that the sequence of the true model is in seq.dat, and that there is 1 copy in the asymmetric unit:

phenix.automr coords.pdb native.sca RMS=0.85 seq.dat copies=1  \
    n_cycle_rebuild_max=2 n_cycle_build_max=2

Specifying data columns

Run AutoMR as above, but specify the data columns explicitly:

phenix.automr coords.pdb RMS=0.85 seq.dat copies=1  \
    data=data.mtz input_label_string="F SIGF"  \
    n_cycle_rebuild_max=2 n_cycle_build_max=2 

Note that the data columns are specified by a single string that includes both F and SIGF: "F SIGF". The string must match one of the sets of data labels that can be extracted automatically from your data file. As described above, you can list the possible values of this string with:

phenix.automr display_labels=data.mtz

Specifying a refinement file for AutoBuild

Run AutoMR as above, but specify a refinement file that is different from the file used for the MR search:

phenix.automr coords.pdb RMS=0.85 seq.dat copies=1  \
    data=data.mtz input_label_string="F SIGF"  \
    autobuild_input_refinement_file=refinement.mtz \
    autobuild_input_refinement_labels="FP SIGFP FreeR_flag"  \
    n_cycle_rebuild_max=2 n_cycle_build_max=2 

Note that the keywords input_refinement_file and input_refinement_labels are prefixed with autobuild_. These keywords, and any others with this prefix, are passed on to AutoBuild.

Passing any commands to AutoBuild

You can pass any AutoBuild keyword on to AutoBuild, even if it is not already defined for you in AutoMR. Use autobuild_input_list_add to declare the keyword, and then set it by adding "autobuild_" to the beginning of the keyword name. For example, to set semet=True and refine=False in AutoBuild:

phenix.automr coords.pdb RMS=0.85 seq.dat copies=1  \
    data=data.mtz input_label_string="F SIGF"  \
    autobuild_input_list_add='semet refine'  \
    autobuild_semet=True \
    autobuild_refine=False

Notes: This applies only to command-line operation of AutoMR. Any keyword that is used by both AutoBuild and AutoMR will apply to both programs if you specify it in autobuild_input_list_add. For example, if you set the resolution in AutoBuild with autobuild_input_list_add=resolution and autobuild_resolution=2.6, then this resolution will apply to both AutoMR and AutoBuild.
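When AutoMR is scripted, the declaration and the prefixed settings can be generated together, so they never get out of step. In the sketch below, the helper name is hypothetical. The output follows the autobuild_input_list_add / autobuild_<name> pattern described above. As list items passed to subprocess, the space-separated value needs no quotes. In a shell you must quote it, as in the command above.

```python
def autobuild_passthrough(settings: dict[str, object]) -> list[str]:
    """Declare AutoBuild keywords with autobuild_input_list_add, then set them.

    Example: {"semet": True, "refine": False} gives
    autobuild_input_list_add=semet refine, autobuild_semet=True, autobuild_refine=False.
    Dict insertion order is preserved (Python 3.7+).
    """
    if not settings:
        return []
    args = ["autobuild_input_list_add=" + " ".join(settings)]
    args += [f"autobuild_{name}={value}" for name, value in settings.items()]
    return args


if __name__ == "__main__":
    print(autobuild_passthrough({"semet": True, "refine": False}))
```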

AutoMR searching for 2 components

Run AutoMR on a structure with 2 components. Define the components of the asymmetric unit with sequence files (beta.seq and blip.seq) and number of copies of each component (1). Define the search models with PDB files and estimated RMS from true structures.

phenix.automr data=beta_blip_P3221.mtz input_label_string="Fobs Sigma"  \
 resolution=0.0 resolution_build=3.0                               \
 component_1.component_type=protein component_1.seq_file=beta.seq  \
 component_1.component_copies=1                                    \
 component_2.component_type=protein component_2.seq_file=blip.seq  \
 component_2.component_copies=1                                    \
 ensemble_1.coords=beta.pdb ensemble_1.RMS=0.85 ensemble_1.copies_to_find=1 \
 ensemble_2.coords=blip.pdb ensemble_2.RMS=0.90 ensemble_2.copies_to_find=1 \
 n_cycle_rebuild_max=1

Specifying molecular masses of 2 components

Run AutoMR as in the previous example, except specify the components of the asymmetric unit with molecular masses (30000 and 20000), and define the search models with PDB files and percent sequence identity with the true structures (50% and 60%).

phenix.automr data=beta_blip_P3221.mtz input_label_string="Fobs Sigma"  \
 resolution=0.0 resolution_build=3.0                               \
 component_1.component_type=protein component_1.mass=30000  \
 component_1.component_copies=1                                    \
 component_2.component_type=protein component_2.mass=20000 \
 component_2.component_copies=1                                    \
 ensemble_1.coords=beta.pdb ensemble_1.identity=50 ensemble_1.copies_to_find=1 \
 ensemble_2.coords=blip.pdb ensemble_2.identity=60 ensemble_2.copies_to_find=1 \
 n_cycle_rebuild_max=1

AutoMR searching for 2 components, but specifying the orientation of one of them

Run AutoMR on a structure with 2 components. Define the components of the asymmetric unit with sequence files (beta.seq and blip.seq) and the number of copies of each component (1). Define the search models with PDB files and estimated RMS from the true structures. Define the orientation and position of one component. Define the number of copies to find for each component (0 for beta, which is fixed, and 1 for blip).

phenix.automr data=beta_blip_P3221.mtz input_label_string="Fobs Sigma"  \
 resolution=0.0 resolution_build=3.0                               \
 component_1.component_type=protein component_1.seq_file=beta.seq  \
 component_1.component_copies=1                                    \
 component_2.component_type=protein component_2.seq_file=blip.seq  \
 component_2.component_copies=1                                    \
 ensemble_1.coords=beta.pdb ensemble_1.RMS=0.85 ensemble_1.copies_to_find=0 \
 ensemble_1.ensembleID="beta" \
 ensemble_2.coords=blip.pdb ensemble_2.RMS=0.90 ensemble_2.copies_to_find=1 \
 ensemble_2.ensembleID="blip" \
 n_cycle_rebuild_max=1 \
 fixed_ensembleID_list="beta" \
 fixed_euler_list="199.84,41.535,184.15" \
 fixed_frac_list="-0.49736,-0.15895,-0.28067"

Note: you have to define an ensemble for the fixed molecule (beta in this example), even though no copies of it are searched for (ensemble_1.copies_to_find=0).
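The fixed-component keywords must stay in step: one ID, three Euler angles and three fractional offsets per fixed ensemble. The hypothetical helper below checks the counts and builds the three values. It follows the comma-separated form used in the example above. That separator is an assumption for the multi-component case, because the keyword list shows space-separated defaults, so check which form your PHENIX version accepts.

```python
from typing import Sequence


def fixed_component_keywords(
    fixed: Sequence[tuple[str, Sequence[float], Sequence[float]]],
) -> list[str]:
    """Build fixed_ensembleID_list, fixed_euler_list and fixed_frac_list values.

    Each entry is (ensembleID, (3 Euler angles), (3 fractional offsets)).
    Numbers are joined with commas, as in the example above (assumption).
    """
    ids: list[str] = []
    eulers: list[float] = []
    fracs: list[float] = []
    for ensemble_id, euler, frac in fixed:
        if len(euler) != 3 or len(frac) != 3:
            raise ValueError(f"{ensemble_id}: need 3 Euler angles and 3 fractional offsets")
        ids.append(ensemble_id)
        eulers.extend(euler)
        fracs.extend(frac)
    return [
        "fixed_ensembleID_list=" + " ".join(ids),
        "fixed_euler_list=" + ",".join(str(x) for x in eulers),
        "fixed_frac_list=" + ",".join(str(x) for x in fracs),
    ]


if __name__ == "__main__":
    beta = ("beta", (199.84, 41.535, 184.15), (-0.49736, -0.15895, -0.28067))
    for arg in fixed_component_keywords([beta]):
        print(arg)
```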

Combining MR and SAD phasing information (MRSAD)

You can combine MR information with SAD phases (see J. P. Schuermann and J. J. Tanner, Acta Cryst. (2003). D59, 1731-1736) conveniently in PHENIX by running the three wizards AutoMR, AutoSol, and AutoBuild one after the other. Here is a set of three simple commands to do that:

First run AutoMR to find the molecular replacement solution, but don't rebuild it yet:

phenix.automr gene-5.pdb infl.sca copies=1 \
  RMS=1.5 mass=9800 rebuild_after_mr=False

Now your MR solution is in AutoMR_run_1_/MR.1.pdb and phases are in AutoMR_run_1_/MR.1.mtz. Use these phases as input to AutoSol, along with some weak SAD data, still not building any new models:

phenix.autosol data=infl.sca \
  input_phase_file=AutoMR_run_1_/MR.1.mtz input_phase_labels="F PHIC FOM" \
  seq_file=sequence.dat build=False

Note that we have specified the data columns for F, PHI and FOM in the input_phase_file. For input_phase_file you should specify all three of these labels. If you leave out FOM, it will be set to zero.

AutoSol will write an MTZ file with experimental phases to phaser_xx.mtz, where xx depends on how many solutions are considered during the run. You will need to edit the next command, which runs AutoBuild, to use the correct value of xx:

 phenix.autobuild data=AutoSol_run_1_/phaser_2.mtz \
  model=AutoMR_run_1_/MR.1.pdb seq_file=sequence.dat rebuild_in_place=False

AutoBuild will now take the phases from your AutoSol run and combine them with model-based information from your AutoMR MR solution, and will carry out iterative density modification, model-building and refinement to rebuild your model.

Note that you may wish to set rebuild_in_place=True, depending on how good your MR model is.
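The xx in phaser_xx.mtz varies from run to run, so it can help to list the candidates before editing the AutoBuild command. This sketch only lists the phaser_<n>.mtz files in numeric order, so phaser_10 sorts after phaser_2. Choosing which one to use is still up to you, based on the AutoSol output. The directory name AutoSol_run_1_ is taken from the commands above, and the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable

_PHASER_MTZ = re.compile(r"^phaser_(\d+)\.mtz$")


def sort_phaser_mtz_names(names: Iterable[str]) -> list[str]:
    """Keep names of the form phaser_<n>.mtz and sort them numerically by n."""
    numbered = []
    for name in names:
        match = _PHASER_MTZ.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def list_phaser_mtz(autosol_dir: str = "AutoSol_run_1_") -> list[str]:
    """List phaser_<n>.mtz files in an AutoSol run directory (empty if missing)."""
    directory = Path(autosol_dir)
    if not directory.is_dir():
        return []
    return sort_phaser_mtz_names(p.name for p in directory.iterdir())


if __name__ == "__main__":
    for name in list_phaser_mtz():
        print(name)
```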

Possible Problems

Specific limitations and problems

  • The AutoBuild Wizard can build PROTEIN, RNA, or DNA, but it can only build one chain type at a time. If your MR model contains more than one type of chain, you will need to run AutoBuild separately from AutoMR. When you run AutoBuild, supply the part of the model that you do not want to build with input_lig_file_list, and specify the type of chain to build with chain_type. For example, to keep the protein part and build DNA:

     
    input_lig_file_list=ProteinPartofMRmodel.pdb
    chain_type=DNA
    

  • If you use an ensemble as a search model, the output structure will contain just the first member of the ensemble, so you may wish to put the member that is likely to be the most similar to the true structure as the first one in your ensemble.

  • If you run AutoMR from the GUI and continue on to AutoBuild, and then select "Start run over (delete everything for this run)" it will delete your AutoBuild and your AutoMR run and start your AutoMR run all over.

  • The AutoMR Wizard can use most settings of most space groups. However, it can only use the hexagonal setting of rhombohedral space groups (e.g., #146 R3:H or #155 R32:H). It also cannot use space groups 114-119 (which are not found in macromolecular crystallography), even in the standard setting. Both limitations are due to difficulties with the use of asuset for these settings and space groups in the version of the CCP4 libraries used in PHENIX.
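You can check for the two space-group restrictions above before a run with a few lines of Python. This sketch encodes only what is stated here: rhombohedral groups must be in the hexagonal setting, and groups 114-119 cannot be used. It adds the standard list of rhombohedral space-group numbers. It knows nothing about any other setting AutoMR might reject, and the function name is hypothetical.

```python
# Rhombohedral space groups by International Tables number:
# R3, R-3, R32, R3m, R3c, R-3m, R-3c
RHOMBOHEDRAL = {146, 148, 155, 160, 161, 166, 167}
# Space groups 114-119, which this page says AutoMR cannot use
UNSUPPORTED = set(range(114, 120))


def automr_space_group_ok(number: int, hexagonal_setting: bool = True) -> bool:
    """True unless the space group hits one of the restrictions listed above."""
    if not 1 <= number <= 230:
        raise ValueError("space group number must be between 1 and 230")
    if number in UNSUPPORTED:
        return False
    if number in RHOMBOHEDRAL and not hexagonal_setting:
        return False  # only the hexagonal (:H) setting can be used
    return True


if __name__ == "__main__":
    for sg, hexagonal in ((19, True), (146, True), (146, False), (117, True)):
        print(sg, "hexagonal" if hexagonal else "rhombohedral",
              automr_space_group_ok(sg, hexagonal))
```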

Literature

Phaser crystallographic software. A. J. McCoy, R. W. Grosse-Kunstleve, P. D. Adams, M. D. Winn, L. C. Storoni and R. J. Read. J. Appl. Cryst. 40, 658-674 (2007)
Likelihood-enhanced fast translation functions. A. J. McCoy, R. W. Grosse-Kunstleve, L. C. Storoni and R. J. Read. Acta Cryst. D61, 458-464 (2005)
Likelihood-enhanced fast rotation functions. L. C. Storoni, A. J. McCoy and R. J. Read. Acta Cryst. D60, 432-438 (2004)

Additional information

List of all AutoMR keywords

------------------------------------------------------------------------------- 
Legend:
        Parameter values:
          * means selected parameter (where multiple choices are available)
          False is No
          True is Yes
          None means not provided, not predefined, or left up to the program
          "%3d" is a Python style formatting descriptor
------------------------------------------------------------------------------- 
automr
   write_run_directory_to_file= None Writes the full name of a run directory
                                to the specified file. This can be used as a
                                call-back to tell a script where the output is
                                going to go. (Command-line only)
   coot= None Set coot to True and optionally run=[run-number] to run Coot
         with the current model and map for run run-number. In some wizards
         (AutoBuild) you can edit the model and give it back to PHENIX to use
         as part of the model-building process. If you just say coot then the
         facts for the highest-numbered existing run will be shown.
         (Command-line only)
   ignore_blanks= None ignore_blanks allows you to have a command-line keyword
                  with a blank value like "input_lig_file_list="
   stop= None You can stop the current wizard with "stopwizard" or "stop". If
         you type "phenix.autobuild run=3 stop" then this will stop run 3 of
         autobuild. (Command-line only)
   display_facts= None Set display_facts to True and optionally
                  run=[run-number] to display the facts for run run-number. If
                  you just say display_facts then the facts for the
                  highest-numbered existing run will be shown. (Command-line
                  only)
   display_summary= None Set display_summary to True and optionally
                    run=[run-number] to show the summary for run run-number.
                    If you just say display_summary then the summary for the
                    highest-numbered existing run will be shown. (Command-line
                    only)
   carry_on= None Set carry_on to True to carry on with highest-numbered run
             from where you left off. (Command-line only)
   run= None Set run to n to continue with run n where you left off.
        (Command-line only)
   copy_run= None Set copy_run to n to copy run n to a new run and continue
             where you left off. (Command-line only)
   display_runs= None List all runs for this wizard. (Command-line only)
   delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)
   display_labels= None display_labels=test.mtz will list all the labels that
                   identify data in test.mtz. You can use the label strings
                   that are produced in AutoSol to identify which data to use
                   from a datafile like this: peak.data="F+ SIGF+ F- SIGF-" #
                   the entire string in quotes counts here You can use the
                   individual labels from these strings as identifiers for
                   data columns in AutoSol and AutoBuild like this:
                   input_refinement_labels="FP SIGFP FreeR_flags" # each
                   individual label counts
   dry_run= False Just read in and check parameter names
   build= True Run AutoBuild immediately after AutoMR (Command-line only)
   data= None Datafile (any standard format) (Command-line only)
   copies= None Set both copies_to_find and component_copies with copies. This
           is the number of copies of this search model to find, and also the
           number of copies of this sequence or mass in the asymmetric unit.
           (Command-line only)
   autobuild_two_fofc_in_rebuild= None Actively sets two_fofc_in_rebuild in
                                  AutoBuild. NOTE: value is not checked
   autobuild_include_input_model= None Actively sets include_input_model in
                                  AutoBuild. NOTE: value is not checked
   autobuild_n_cycle_rebuild_min= None Actively sets n_cycle_rebuild_min in
                                  AutoBuild. NOTE: value is not checked
   autobuild_n_cycle_rebuild_max= None Actively sets n_cycle_rebuild_max in
                                  AutoBuild. NOTE: value is not checked
   autobuild_debug= None Actively sets debug in AutoBuild. NOTE: value is not
                    checked
   autobuild_n_cycle_build_min= None Actively sets n_cycle_build_min in
                                AutoBuild. NOTE: value is not checked
   autobuild_n_cycle_build_max= None Actively sets n_cycle_build_max in
                                AutoBuild. NOTE: value is not checked
   autobuild_rebuild_in_place= None Actively sets rebuild_in_place in
                               AutoBuild. NOTE: value is not checked
   autobuild_thorough_denmod= None Actively sets thorough_denmod in AutoBuild.
                              NOTE: value is not checked
   autobuild_i_ran_seed= None Actively sets i_ran_seed in AutoBuild. NOTE:
                         value is not checked
   autobuild_start_chains_list= None Actively sets start_chains_list in
                                AutoBuild. NOTE: value is not checked
   autobuild_input_refinement_file= None Actively sets input_refinement_file
                                    in AutoBuild. NOTE: value is not checked
   autobuild_input_refinement_labels= None Actively sets
                                      input_refinement_labels in AutoBuild.
                                      NOTE: value is not checked
   autobuild_input_labels= None Actively sets input_labels in AutoBuild. NOTE:
                           value is not checked
   autobuild_resolve_command_list= None Actively sets resolve_command_list in
                                   AutoBuild. NOTE: value is not checked
   autobuild_resolve_pattern_command_list= None Actively sets
                                           resolve_pattern_command_list in
                                           AutoBuild. NOTE: value is not
                                           checked
   autobuild_nproc= None Actively sets nproc in AutoBuild. NOTE: value is not
                    checked
   autobuild_nbatch= None Actively sets nbatch in AutoBuild. NOTE: value is
                     not checked
   ensembleID= ensemble_1 ID for this ensemble. (Command-line only)
   copies_to_find= None Number of copies of this ensemble to find in a.u.
                   (Command-line only)
   coords= None model(s) for this ensemble. (Command-line only)
   identity= None percent identity(ies) of model(s) in this ensemble to
             structure (alternative is RMS). (Command-line only)
   RMS= None RMSD(s) of model(s) to structure (alternative is identity).
        (Command-line only)
   seq_file= None protein seq_file for this component. (Command-line only)
   component_type= *protein nucleic_acid protein or nucleic acid.
                   (Command-line only)
   mass= None molecular mass (Da) of this component. (Command-line only)
   component_copies= None Number of copies of this component in the a.u.
                     (required). (Command-line only)
   all_plausible_sg_list= None Choose which space groups to search
   autobuild_input_list_add= None You can add keywords to those that AutoMR
                             passes on to AutoBuild (command-line only). The
                             format for this command is:
                             autobuild_input_list_add='semet refine'. Then you
                             can set any of the variables you specify by
                             adding the prefix "autobuild_" to the name of
                             your variable: autobuild_semet=False
                             autobuild_refine=True. This will now set
                             semet=False and refine=True in AutoBuild.
   background= *Yes No True False When you specify nproc=nn, you can run the
               jobs in background (default if nproc is greater than 1) or
               foreground (default if nproc=1). If you set run_command=qsub
               (or otherwise submit to a batch queue), then you should set
               background=False, so that the batch queue can keep track of
               your runs. There is no need to use background=True in this case
               because all the runs go as controlled by your batch system. If
               you use run_command=csh (or similar, csh is default) then
               normally you will use background=True so that all the jobs run
               simultaneously.
   build_type= *RESOLVE_AND_TEXTAL RESOLVE TEXTAL You can choose to build
               models with RESOLVE and TEXTAL or either one, and how many
               different models to build with RESOLVE. The more you build, the
               more likely to get a complete model. Note that rebuild_in_place
               can only be carried out with RESOLVE model-building
   cell= 0.0 0.0 0.0 0.0 0.0 0.0 Enter cell parameters a b c alpha beta gamma
   chain_type= *Auto PROTEIN DNA RNA You can specify whether to build protein,
               DNA, or RNA chains. At present you can only build one of these
               in a single run. If you have both DNA and protein, build one
               first, then run AutoBuild again, supplying the prebuilt model
               in the "input_lig_file_list" and build the other. NOTE: default
               for this keyword is Auto, which means "carry out normal process
               to guess this keyword". The process is to look at the sequence
               file and/or input pdb file to see what the chain type is. If
               there are more than one type, the type with the larger number
               of residues is guessed. If you want to force the chain_type,
               then set it to PROTEIN RNA or DNA.
   clean_up= Yes *No True False At the end of the entire run the TEMP
             directories will be removed if clean_up is True. The default is
             No, keep these directories. If you want to remove them after your
             run is finished use a command like "phenix.autobuild run=1
             clean_up=True"
   composition_num_list= 1 Enter number of copies of this component
   coot_name= coot If your version of coot is called something else, then you
              can specify that here.
   debug= Yes *No True False You can have the wizard stop with error messages
          about the code if you use debug. NOTE: you cannot use Pause with
          debug.
   do_anisotropy_correction= *Yes No True False Choose whether you want to
                             apply anisotropy correction
   extra_verbose= Yes *No True False Facts and possible commands will be
                  printed every cycle if Yes
   fixed_ensembleID_list= None Enter the ID (set with ensemble_1.ensembleID or
                          equivalent) of the component that is to be fixed.
                          NOTE 1: Each ensemble in fixed_ensembleID_list must
                          be defined. NOTE 2: you can enter more than one
                          fixed component if you want. If you do, then enter
                          fixed_euler_list in multiples of 3 numbers and also
                          fixed_frac_list in multiples of 3 numbers.
   fixed_euler_list= 0.0 0.0 0.0 Enter Euler angles (from AutoMR or Phaser)
                     for fixed component defined with fixed_ensembleID_list.
                     NOTE 2: you can enter more than one fixed component if
                     you want. If you do, then enter fixed_euler_list in
                     multiples of 3 numbers and also fixed_frac_list in
                     multiples of 3 numbers.
   fixed_frac_list= 0.0 0.0 0.0 Enter fractional offset (location) (from
                     AutoMR or Phaser) for the fixed component defined with
                     fixed_ensembleID_list. NOTE 2: you can enter more than
                     one fixed component if you want. If you do, then enter
                     fixed_euler_list in multiples of 3 numbers and also
                     fixed_frac_list in multiples of 3 numbers.
   i_ran_seed= 84670 Random seed (positive integer) for model-building and
               simulated annealing refinement
   include_input_model= Yes *No True False The keyword include_input_model
                        defines whether the input model (if any) is to be
                        crossed with models that are derived from it, and the
                        best parts of each kept. Note that if
                        multiple_models=True and include_input_model=True then
                        no initial cycle of randomization will be carried out
                        and the keyword multiple_models_starting_resolution is
                         ignored. In most cases you should use
                         include_input_model=True. If you want to generate
                         maximum diversity with multiple_models, or to
                         decrease the amount of bias from your starting model,
                         you may wish to use include_input_model=False.
   input_data_file= None Enter a file with input structure factor data.
                    For structure factor data only (e.g., FP SIGFP) any format
                    is OK. If you have free R flags, phase information or HL
                    coefficients that you want to use, then an mtz file is
                    required. If this file contains phase information, this
                    phase information should be experimental (i.e.,
                    MAD/SAD/MIR etc), and should not be density-modified
                    phases (enter any files with density-modified phases as
                    input_map_file instead). If you also specify a hires data
                    file, then FP and SIGFP will come from that data file (and
                    not this one). If an input_refinement_file is specified,
                    then F, Sigma, FreeR_flag (if present) from that file will
                    be used for refinement instead of this one.
   input_label_string= None Choose the set of labels that represent the data
                       and sigma columns for your data. NOTE: Applies to input
                       data file for AutoMR. See also 'input_labels', which
                       applies to input data file for AutoBuild.
   input_labels= None Labels for input data columns. NOTE: Applies to input
                 data file for LigandFit and AutoBuild, but not to AutoMR. For
                 AutoMR use instead 'input_label_string'.
   input_pdb_file= None You can enter a PDB file containing a starting model
                   of your structure. NOTE: If you enter a PDB file then the
                   AutoBuild wizard will start right in with rebuild steps,
                   skipping the build process. If the model is very poor then
                   it may be better to leave it out, as the build process
                   (which includes pattern recognition and recognition of
                   helical and strand fragments) is optimized for improving
                   poor maps, while the rebuild process is optimized for
                   better maps that can be produced by having a partial model.
   input_refinement_file= None Data file to use for refinement. The data in
                          this file should not be corrected for anisotropy. It
                          will be combined with experimental phase information
                          (if any) from input_data_file for refinement. If you
                          leave this blank, then the data in the
                          input_data_file will be used in refinement. If no
                          anisotropy correction is applied to the data you do
                          not need to specify a datafile for refinement. If an
                          anisotropy correction is applied to the data files,
                          then you should enter an uncorrected datafile for
                          refinement. Any standard format is fine; normally
                          only F and sigF will be used. Bijvoet pairs and
                          duplicates will be averaged. If an mtz file is
                          provided then a free R flag can be read in as well.
                          Any HL coefficients and phase information in this
                          file are ignored. NOTE: The default for this keyword
                          is Auto, which means "carry out the normal process to
                          guess this keyword". This means that if you specify
                          "after_autosol" in AutoBuild, AutoBuild will
                          automatically take the value from AutoSol. If you do
                          not want this to happen, you can specify None, which
                          means "No file".
   input_refinement_labels= None Labels for input refinement file columns (FP
                            SIGFP FreeR_flag)
   input_seq_file= None Enter the name of a file with the 1-letter code of the
                   protein sequence. NOTES: 1. Lines starting with >>> are
                   ignored and separate chains. 2. FASTA format is fine. 3. If
                   there are multiple copies of a chain, just enter one copy.
                   4. If you enter a PDB file for rebuilding and it has the
                   sequence you want, then the sequence file is not necessary.
                   NOTE: You can also enter the name of a PDB file that
                   contains SEQRES records, and the sequence from the SEQRES
                   records will be read, written to
                   seq_from_seqres_records.dat, and used as your input
                   sequence. NOTE: For AutoBuild you can specify
                   start_chains_list on the first line of your sequence file:
                   >>> start_chains_list 23 11 5 NOTE: The default for
                   this keyword is Auto, which means "carry out the normal
                   process to guess this keyword". This means that if you
                   specify "after_autosol" in AutoBuild, AutoBuild will
                   automatically take the value from AutoSol. If you do not
                   want this to happen, you can specify None, which means "No
                   file".
   input_seq_file_list= None The keyword input_seq_file_list is used in AutoMR
                        to specify the molecular masses of the components of
                        the unit cell using a set of sequence files. Usually
                        you should input the sequences of the actual
                        components of the unit cell here (one sequence file
                        for each component). NOTE: If no input_seq_file is
                        specified, then the sequences from input_seq_file_list
                        are used to create a new file "composite_seq.dat" with
                        all their sequences and this is used as the
                        input_seq_file. NOTE: the format of each file in
                        input_seq_file_list is the 1-letter code of the
                        protein sequence (separate chains with >>>)
   max_wait_time= 100.0 You can specify the length of time (seconds) to wait
                  when testing the run_command. If you have a cluster where
                  jobs do not start right away you may need a longer time to
                  wait.
   min_seq_identity_percent= 50.0 The sequence in your input PDB file will be
                             adjusted to match the sequence in your sequence
                             file (if any). If there are insertions/deletions
                             in your model and the wizard does not seem to
                             identify them, you can split up your PDB file by
                              adding BREAK records at those points. You can specify
                             the minimum sequence identity between your
                             sequence file and a segment from your input PDB
                             file to consider the sequences to be matched.
                             Default is 50.0%. You might want a higher number
                             to make sure that deletions in the sequence are
                             noticed.
   n_cycle_build_max= None Maximum number of cycles for iterative
                      model-building, starting from experimental phases
                      without a model. Even if a satisfactory model is not
                      found, a maximum of n_cycle_build_max cycles will be
                      carried out.
   n_cycle_build_min= None Minimum number of cycles for iterative
                      model-building, starting from experimental phases
                      without a model. Even if a satisfactory model is found,
                      n_cycle_build_min cycles will be carried out.
   n_cycle_rebuild_max= None Maximum number of cycles for iterative
                        model-rebuilding, starting from a model. Even if a
                        satisfactory model is not found, a maximum of
                        n_cycle_rebuild_max cycles will be carried out.
   n_cycle_rebuild_min= None Minimum number of cycles for iterative
                        model-rebuilding, starting from a model. Even if a
                        satisfactory model is found, n_cycle_rebuild_min
                        cycles will be carried out.
   nbatch= 1 Number of batches to divide the data into for parallel jobs.
           Normally you will set nproc to the number of processors available
           and leave nbatch alone. If you leave nbatch as None it will be set
           automatically, with a value depending on the Wizard; this is
           recommended. The value of nbatch can affect the results that you
           get, because the jobs are not split into exact replicates but are
           run with different random numbers. If you want to get the same
           results, keep the same value of nbatch.
   nproc= 1 Number of processors to use for parallel jobs. Normally you will
          set nproc to the number of processors available and leave nbatch
          alone (see nbatch for how the work is divided into batches and how
          the value of nbatch affects reproducibility).
   overlap_allowed= None Solutions with no C-alpha clashes will be accepted.
                    If the best packing has some clashes, solutions with that
                    number of clashes will be accepted, as long as this does
                    not exceed the maximum allowed. You can choose to increase
                    the maximum if the packing is tight and your search
                    molecule is not exactly the same as the molecule in the
                    cell. If you leave it blank then Phaser will decide for
                    you.
   rebuild_after_mr= *Yes No True False You can choose to go right on to the
                     AutoBuild wizard with the rebuild-in-place option after
                     running molecular replacement.
   rebuild_in_place= *Auto Yes No True False You can choose to rebuild your
                     model while fixing the sequence alignment by iteratively
                     rebuilding segments within the model. This is done
                     n_rebuild_in_place times, then the models are recombined,
                     taking the best-fitting parts of each. Crossovers are
                     allowed where the main-chain atom rmsd is less than
                     dist_close. Note that the sequence of the input model must
                     match the supplied sequence closely enough to allow a
                     clear alignment. Also, this method does not build any new
                     chain; it just moves the existing model around. Normally
                     this procedure is useful if the model is more than 95%
                     identical to the target sequence. You can include
                     information directly from the starting model if you want
                     with the keyword include_input_model. Then this model
                     will be recombined with the models that are built based
                     on it. Note that this requires that the input model have
                     a sequence that is identical to the model to be rebuilt.
                     You can also rebuild just a portion of the model with the
                     keywords rebuild_res_start_list, rebuild_res_end_list and
                     rebuild_chain_list. For example, rebuild_res_start_list 3
                     rebuild_res_end_list 4 rebuild_chain_list chain1 (use " "
                     for a blank chain ID) rebuilds residues 3 to 4 of chain1.
                     You can specify more than one region by using the
                     Parameter Group Options button to add lines. NOTE: If a
                     region cannot be rebuilt, the original coordinates will
                     be preserved for that region.
   resolution= 0.0 Enter the high-resolution limit for MR search. All the data
               input will be written out regardless of your choice. By
               default, the final rigid-body refinement will use all data.
   resolution_build= 0.0 Enter the high-resolution limit for model-building.
                     If 0.0, the value of resolution is used as a default.
   run_command= csh When you specify nproc=nn, you can run the subprocesses as
                jobs in the background with csh (default) or submit them to a
                queue with the command of your choice (e.g., qsub). If you
                have a multi-processor machine, use csh. If you have a
                cluster, use qsub or the equivalent command for your system.
                NOTE: If you set run_command=qsub (or otherwise submit to a
                batch queue), then you should set background=False, so that
                the batch queue can keep track of your runs. There is no need
                to use background=True in this case because all the runs are
                controlled by your batch system. If you use run_command=csh
                (or similar; csh is the default) then normally you will use
                background=True so that all the jobs run simultaneously.
   selection_criteria_rot= *Percent_of_best Number_of_solutions Z_score All
                           Choose a criterion for keeping rotation solutions at
                           each stage. The choices are: Percent of Best Score:
                           AutoMR looks down the list of LLG scores and keeps
                           only those that exceed the mean by more than the
                           chosen percentage of the difference between the top
                           solution and the mean. Enter your desired
                           percentage into the entry field (default=75%).
                           Number of Solutions: Keep the N top solutions (you
                           can set N; default=1). Z-score: Keep all the
                           solutions with a Z-score greater than X (you can set
                           X; default=6). All: Keep everything and go on
                           holiday while Phaser crunches through it all
                           (definitely not recommended!)
   selection_criteria_rot_value= 75 Choose a value for the criterion selected
                                 with selection_criteria_rot. Percent of Best
                                 Score: the percentage of the difference
                                 between the top LLG score and the mean that a
                                 solution must exceed (default=75%). Number of
                                 Solutions: the number N of top solutions to
                                 keep (default=1). Z-score: the Z-score X that
                                 a solution must exceed (default=6).
   semet= Yes *No True False You can specify that the dataset that is used for
          refinement is a selenomethionine dataset, and that the model should
          be the SeMet version of the protein, with all SD of MET replaced
          with Se of MSE.
   sg= None Space group symbol (e.g., C2221 or C 2 2 21)
   skip_xtriage= Yes *No True False You can bypass xtriage if you want. This
                 will prevent you from applying anisotropy corrections,
                 however.
   start_chains_list= None You can specify the starting residue number for
                      each of the unique chains in your structure. If you use
                      a sequence file then the unique chains are extracted and
                      the order must match the order of your starting residue
                      numbers. For example, if your sequence file has chains A
                      and B (identical) and chains C and D (identical to each
                      other, but different from A and B), then you can enter 2
                      numbers, the starting residues for chains A and C. NOTE:
                      you need to specify an input sequence file for
                      start_chains_list to be applied.
   temp_dir= None Define a temporary directory (it must exist)
   thorough_denmod= *Auto Yes No True False Choose whether you want to go for
                    thorough density modification when no model is used ("No"
                    speeds it up and for a terrible map is sometimes better)
   title= Run 1 AutoMR Mon May 26 12:09:04 2008 Enter any text you like to
          help identify what you did in this run
   top_output_dir= None This is used in subprocess calls of wizards and to
                   tell the Wizard where to look for the STOPWIZARD file.
   two_fofc_in_rebuild= Yes *No True False You can choose to use a
                        sigmaa-weighted 2Fo-Fc map in all cycles of rebuilding
                        instead of a density-modified map. If the model is
                        poor this can sometimes allow model-building in place
                        to work even when it will not for density-modified
                        maps.
   use_all_plausible_sg= Yes *No True False Normally you will want to search
                         all space groups with the same point group, as you may
                         not know which is correct from your data. You can
                         select which of these to choose using 'Choose
                         variable to set' and selecting
                         'all_plausible_sg_list'
   verbose= Yes *No True False Command files and other verbose output will be
            printed
   weight_list= 0.0 Molecular weight of component (Da; e.g. 30000)
   weight_seq_list= None Choose whether to define composition through
                    molecular weight or sequence
   ensemble_1
      ensembleID= ensemble_1 ID for this ensemble. (Command-line only)
      copies_to_find= None Number of copies of this ensemble to find in a.u.
                      (Command-line only)
      coords= None model(s) for this ensemble. (Command-line only)
      identity= None percent identity(ies) of model(s) in this ensemble to
                structure (alternative is RMS). (Command-line only)
      RMS= None RMSD(s) of model(s) to structure (alternative is identity).
           (Command-line only)
   ensemble_2
      ensembleID= ensemble_2 ID for this ensemble. (Command-line only)
      copies_to_find= None Number of copies of this ensemble to find in a.u.
                      (Command-line only)
      coords= None model(s) for this ensemble. (Command-line only)
      identity= None percent identity(ies) of model(s) in this ensemble to
                structure (alternative is RMS). (Command-line only)
      RMS= None RMSD(s) of model(s) to structure (alternative is identity).
           (Command-line only)
   ensemble_3
      ensembleID= ensemble_3 ID for this ensemble. (Command-line only)
      copies_to_find= None Number of copies of this ensemble to find in a.u.
                      (Command-line only)
      coords= None model(s) for this ensemble. (Command-line only)
      identity= None percent identity(ies) of model(s) in this ensemble to
                structure (alternative is RMS). (Command-line only)
      RMS= None RMSD(s) of model(s) to structure (alternative is identity).
           (Command-line only)
   ensemble_4
      ensembleID= ensemble_4 ID for this ensemble. (Command-line only)
      copies_to_find= None Number of copies of this ensemble to find in a.u.
                      (Command-line only)
      coords= None model(s) for this ensemble. (Command-line only)
      identity= None percent identity(ies) of model(s) in this ensemble to
                structure (alternative is RMS). (Command-line only)
      RMS= None RMSD(s) of model(s) to structure (alternative is identity).
           (Command-line only)
   ensemble_5
      ensembleID= ensemble_5 ID for this ensemble. (Command-line only)
      copies_to_find= None Number of copies of this ensemble to find in a.u.
                      (Command-line only)
      coords= None model(s) for this ensemble. (Command-line only)
      identity= None percent identity(ies) of model(s) in this ensemble to
                structure (alternative is RMS). (Command-line only)
      RMS= None RMSD(s) of model(s) to structure (alternative is identity).
           (Command-line only)
   component_1
      seq_file= None protein seq_file for this component. (Command-line only)
      component_type= *protein nucleic_acid protein or nucleic acid.
                      (Command-line only)
      mass= None molecular mass (kDa) of this component. (Command-line only)
      component_copies= None Number of copies of this component in the a.u.
                         (required). (Command-line only)
   component_2
      seq_file= None protein seq_file for this component. (Command-line only)
      component_type= *protein nucleic_acid protein or nucleic acid.
                      (Command-line only)
      mass= None molecular mass (kDa) of this component. (Command-line only)
      component_copies= None Number of copies of this component in the a.u.
                         (required). (Command-line only)
   component_3
      seq_file= None protein seq_file for this component. (Command-line only)
      component_type= *protein nucleic_acid protein or nucleic acid.
                      (Command-line only)
      mass= None molecular mass (kDa) of this component. (Command-line only)
      component_copies= None Number of copies of this component in the a.u.
                         (required). (Command-line only)
   component_4
      seq_file= None protein seq_file for this component. (Command-line only)
      component_type= *protein nucleic_acid protein or nucleic acid.
                      (Command-line only)
      mass= None molecular mass (kDa) of this component. (Command-line only)
      component_copies= None Number of copies of this component in the a.u.
                         (required). (Command-line only)
   component_5
      seq_file= None protein seq_file for this component. (Command-line only)
      component_type= *protein nucleic_acid protein or nucleic acid.
                      (Command-line only)
      mass= None molecular mass (kDa) of this component. (Command-line only)
      component_copies= None Number of copies of this component in the a.u.
                         (required). (Command-line only)
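The fixed_ensembleID_list, fixed_euler_list and fixed_frac_list keywords work as parallel lists. Each fixed component takes one ensemble ID, three Euler angles and three fractional coordinates. The Python sketch below shows that bookkeeping and why the list lengths must agree. The function group_fixed_components is hypothetical and is not part of PHENIX. A list is returned rather than a dictionary because the same ensemble may be fixed more than once.

```python
from typing import List, Tuple

Triple = Tuple[float, float, float]


def group_fixed_components(
    ensemble_ids: List[str],
    euler_flat: List[float],
    frac_flat: List[float],
) -> List[Tuple[str, Triple, Triple]]:
    """Pair each fixed ensemble ID with its Euler angles and fractional offset.

    Hypothetical helper (not a PHENIX function), for illustration only.
    """
    n = len(ensemble_ids)
    # fixed_euler_list and fixed_frac_list must each hold exactly 3 numbers
    # per entry in fixed_ensembleID_list.
    if len(euler_flat) != 3 * n or len(frac_flat) != 3 * n:
        raise ValueError(
            f"{n} fixed component(s) need {3 * n} Euler angles and "
            f"{3 * n} fractional coordinates; got {len(euler_flat)} "
            f"and {len(frac_flat)}"
        )
    fixed = []
    for i, ens_id in enumerate(ensemble_ids):
        # Take the i-th group of 3 numbers from each flat list.
        euler = (euler_flat[3 * i], euler_flat[3 * i + 1], euler_flat[3 * i + 2])
        frac = (frac_flat[3 * i], frac_flat[3 * i + 1], frac_flat[3 * i + 2])
        fixed.append((ens_id, euler, frac))
    return fixed


# Example: two fixed copies of the same ensemble (values are made up).
for ens_id, euler, frac in group_fixed_components(
    ["ensemble_1", "ensemble_1"],
    [10.0, 20.0, 30.0, 45.0, 90.0, 0.0],
    [0.10, 0.20, 0.30, 0.50, 0.00, 0.25],
):
    print(ens_id, euler, frac)
```

With two fixed components, for example, each of fixed_euler_list and fixed_frac_list should contain six numbers. The first three numbers belong to the first ID in fixed_ensembleID_list.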
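The four choices for selection_criteria_rot, combined with the threshold in selection_criteria_rot_value, can be read as simple filters on a ranked list of rotation solutions. The sketch below is an illustration only, under these assumptions:

- The RotationPeak class and select_rotation_peaks function are hypothetical.
- Z-scores are supplied as reported by the search rather than computed here.
- The mean LLG of the search is passed in explicitly.
- The Percent_of_best cutoff is taken as mean + percent/100 * (top - mean).

Phaser's own implementation may differ in detail.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RotationPeak:
    llg: float  # log-likelihood gain (LLG) score of this orientation
    z: float    # Z-score, assumed to be reported by the rotation search


def select_rotation_peaks(
    peaks: List[RotationPeak], criterion: str, value: float, mean_llg: float
) -> List[RotationPeak]:
    """Filter rotation peaks in the spirit of selection_criteria_rot (sketch)."""
    ranked = sorted(peaks, key=lambda p: p.llg, reverse=True)
    if not ranked or criterion == "All":
        return ranked  # "All" keeps everything (not recommended)
    if criterion == "Number_of_solutions":
        return ranked[: int(value)]  # keep the N best peaks
    if criterion == "Z_score":
        return [p for p in ranked if p.z > value]  # Z-score greater than X
    if criterion == "Percent_of_best":
        top = ranked[0].llg
        # Keep peaks lying above the mean by more than `value` percent of the
        # gap between the top peak and the mean.
        cutoff = mean_llg + (value / 100.0) * (top - mean_llg)
        return [p for p in ranked if p.llg > cutoff]
    raise ValueError(f"unknown criterion: {criterion}")


# Made-up scores, for illustration only.
peaks = [RotationPeak(100.0, 8.0), RotationPeak(80.0, 5.5),
         RotationPeak(40.0, 4.0), RotationPeak(20.0, 3.0)]
kept = select_rotation_peaks(peaks, "Percent_of_best", 75, mean_llg=10.0)
print([p.llg for p in kept])
```

Lowering the percentage, or raising N, passes more rotation solutions on to the next stage. This costs more search time in exchange for a lower risk of discarding the correct orientation.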
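Composition can be given either as a molecular mass or through sequence files. The mass options are weight_list in Da, or mass in kDa for one copy of a component. The sequence options are input_seq_file, input_seq_file_list, or seq_file for a component. The Python sketch below illustrates the link between the two for a protein. It splits a sequence file into chains, treating any line that starts with ">" as a separator; this covers both >>> lines and FASTA headers. It then estimates the mass from an assumed average of about 110 Da per amino-acid residue. Both function names are hypothetical, and AutoMR and Phaser may compute composition differently.

```python
from typing import List

AVG_RESIDUE_MASS_DA = 110.0  # assumed average amino-acid residue mass (rough)


def parse_chains(seq_text: str) -> List[str]:
    """Split 1-letter-code sequence text into chains.

    Lines starting with '>' are treated as chain separators and otherwise
    ignored; whitespace, digits and punctuation in sequence lines are dropped.
    """
    chains: List[str] = []
    current: List[str] = []
    for line in seq_text.splitlines():
        if line.lstrip().startswith(">"):
            if current:
                chains.append("".join(current))
                current = []
            continue
        current.extend(ch for ch in line.upper() if ch.isalpha())
    if current:
        chains.append("".join(current))
    return chains


def approx_protein_mass_da(chains: List[str]) -> float:
    """Rough protein mass in Da: residue count times the assumed average."""
    return AVG_RESIDUE_MASS_DA * sum(len(chain) for chain in chains)


# Made-up two-chain sequence, for illustration only.
seq_text = """>>>
MKTAYIAKQR
QISFVKSHFS
>>>
GSHMLEDPVD
"""
chains = parse_chains(seq_text)
mass_da = approx_protein_mass_da(chains)
print(len(chains), mass_da, mass_da / 1000.0)  # Da (weight_list), kDa (mass)
```

Because a sequence file lists only one copy of each chain, an estimate like this corresponds to one copy. The number of copies in the asymmetric unit is given separately with component_copies.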