Python-based Hierarchical ENvironment for Integrated Xtallography

Automated molecular replacement with AutoMR

Author(s)
Purpose
Purpose of the AutoMR Wizard
Usage
Summary of inputs and outputs for AutoMR
Output files from AutoMR
How to run the AutoMR Wizard
Components, copies, search models, and ensembles
What the AutoMR wizard needs to run
Running from a parameters file
Specifying which columns of data to use from input data files
Examples
Standard AutoMR run with coords.pdb native.sca
Specifying data columns
Specifying a refinement file for AutoBuild
AutoMR searching for 2 components
Specifying molecular masses of 2 components
AutoMR searching for 2 components, but specifying the orientation of one of them
Possible Problems
Specific limitations and problems
Literature
Additional information
List of all AutoMR keywords

Author(s)

  • Phaser: Randy J. Read, Airlie J. McCoy and Laurent C. Storoni
  • AutoMR Wizard: Tom Terwilliger, Laurent Storoni, Randy Read, and Airlie McCoy
  • PHENIX GUI: Nathaniel Echols
  • phenix.xtriage: Peter Zwart

Purpose

Purpose of the AutoMR Wizard

The AutoMR Wizard provides a convenient interface to Phaser molecular replacement and feeds the results of molecular replacement directly into the AutoBuild Wizard for automated model rebuilding.

The AutoMR Wizard begins with data files containing structure factor amplitudes and uncertainties, together with one or more search models, and identifies placements of the search models that are compatible with the data.

Usage

The AutoMR Wizard can be run from the PHENIX GUI, from the command-line, and from parameters files. All three versions are identical except in the way that they take commands from the user. See Using the PHENIX Wizards for details of how to run a Wizard. The command-line version will be described here.

NOTE: You may find it easiest to run the GUI version of AutoMR when you are learning how to use it, and then to move to the command-line or script versions later, as the GUI version will take you through all the necessary steps of organizing your data.

Summary of inputs and outputs for AutoMR

Input data file. This file can be in almost any format, and must contain either amplitudes or intensities and sigmas. You can specify what resolution to use for molecular replacement and, separately, what resolution to use for model rebuilding. If you specify "0.0" for resolution (recommended) then defaults will be used for molecular replacement (i.e. data to 2.5 A, if available, are used to solve the structure, and rigid-body refinement of the final solution is then carried out with all data) and all the data will be used for model rebuilding.

Composition of the asymmetric unit. PHASER needs to know what the total mass in the asymmetric unit is (i.e. not just the mass of the search models). You can define this either by specifying one or more protein or nucleic acid sequence files, or by specifying protein or nucleic acid molecular masses, and telling the Wizard how many copies of each are present.
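As a rough illustration of what "total mass in the asymmetric unit" means, the Python sketch below estimates a protein mass from its sequence length and sums masses over copies. The function names and the figure of about 110 Da per residue are assumptions made for this example only; they are not part of AutoMR or Phaser, which work out the composition themselves from seq_file or mass.

```python
# Hypothetical helpers for illustration; not part of AutoMR or Phaser.
# Assumption: an average protein residue weighs roughly 110 Da.
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_protein_mass(sequence: str) -> float:
    """Approximate mass (Da) from the number of letters in a one-letter sequence."""
    residues = [c for c in sequence if c.isalpha()]
    return len(residues) * AVERAGE_RESIDUE_MASS_DA


def total_asu_mass(components) -> float:
    """Sum mass * copies over (mass_in_da, copies) pairs."""
    return sum(mass * copies for mass, copies in components)


if __name__ == "__main__":
    mass = estimate_protein_mass("MKT" * 70)  # a made-up 210-residue sequence
    print(mass, total_asu_mass([(mass, 2), (40000, 1)]))
```

An estimate like this is only useful as a sanity check on the mass= value you give the Wizard; supplying a sequence file avoids the approximation altogether.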

Space groups to search. You can request that all space groups with the same point group as the one you start out with be searched, and the best one be chosen. If you select this option then the best space group will be used for model rebuilding in AutoBuild.

Ensembles to search for. AutoMR builds up a model by finding a set of good positions and orientations of one "ensemble", and then using each of those placements as starting points for finding the next ensemble, until all the contents of the asymmetric unit are found and a consistent solution is obtained. You can specify any number of different ensembles to search for, and you can search for any number of copies of each ensemble. The order of searching for ensembles makes a difference, but Phaser chooses a sensible default search order based on the size and assumed accuracy of the different ensembles. In difficult cases you could try permuting the search order.

Each ensemble can be specified by a single PDB file or a set of PDB files. All the PDB files in one ensemble must be oriented in the same way (already superimposed), because they are always placed together as a group during molecular replacement.

You will need to specify how similar you think each input PDB file that is part of an ensemble is to the structure that is in your crystal. You can specify either sequence identity, or expected rmsd. Note that if you use a homology model, you should give the sequence identity of the template from which the model was constructed, not the 100% identity of the model!
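To give a feel for how sequence identity and expected rmsd relate, here is a minimal Python sketch of a Chothia and Lesk style relation, RMS = max(0.8, 0.4 * exp(1.87 * (1 - f))), where f is the fractional identity. This form is the one quoted in older Phaser documentation; whether your Phaser version uses it is an assumption, and the function name is hypothetical. The Wizard accepts either identity or RMS directly, so you never need to do this conversion yourself.

```python
import math


def rms_from_identity(identity_percent: float, floor: float = 0.8) -> float:
    """Estimate expected rmsd (A) from percent sequence identity.

    Assumption: RMS = max(floor, 0.4 * exp(1.87 * (1 - identity_fraction))).
    """
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be between 0 and 100 percent")
    fraction = identity_percent / 100.0
    return max(floor, 0.4 * math.exp(1.87 * (1.0 - fraction)))


if __name__ == "__main__":
    for identity in (100, 75, 50, 30):
        print(identity, round(rms_from_identity(identity), 2))
```

The floor reflects the idea that even an identical model is not expected to match the crystal structure perfectly.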

Output of AutoMR

Output files from AutoMR

When you run AutoMR the output files will be in a subdirectory with your run number:

AutoMR_run_1_/   # subdirectory with results

  • A summary file listing the results of the run and the other files produced:
    AutoMR_summary.dat  # overall summary
    

  • A warnings file listing any warnings about the run
    AutoMR_warnings.dat  # any warnings
    

  • A file that lists all parameters and knowledge accumulated by the Wizard during the run (some parts are binary and are not printed)
    AutoMR_Facts.dat   # all Facts about the run
    

  • Molecular replacement model, structure factors, and map coefficients:
    MR.1.pdb
    MR.1.mtz
    
    The AutoMR wizard writes out MR.1.pdb and MR.1.mtz as well as output log files. The MR.1.pdb file will contain all the components of your MR solution. If there are multiple PDB files in an ensemble, the model with the lowest estimated rmsd is chosen to represent the whole ensemble and is written to MR.1.pdb. If there are multiple copies of a model, the chains are lettered sequentially A B C... The MR.1.mtz file contains the data from your input file to the full resolution available, as well as sigmaA-weighted 2Fo-Fc map coefficients based on the rigid-body-refined model.
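If you script around AutoMR, a small check like the following Python sketch can confirm that the files listed above are present in a run directory. The helper name is hypothetical, and it is an assumption that every one of these files is written on every run.

```python
from pathlib import Path

# File names as listed in this section.
EXPECTED_FILES = (
    "AutoMR_summary.dat",
    "AutoMR_warnings.dat",
    "AutoMR_Facts.dat",
    "MR.1.pdb",
    "MR.1.mtz",
)


def missing_outputs(run_dir):
    """Return the expected output files that are absent from run_dir."""
    run_dir = Path(run_dir)
    return [name for name in EXPECTED_FILES if not (run_dir / name).is_file()]


if __name__ == "__main__":
    print(missing_outputs("AutoMR_run_1_"))
```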

Model rebuilding. After PHASER molecular replacement the AutoMR Wizard loads the AutoBuild Wizard and sets the defaults based on the MR solution that has just been found. You can use the default values, or you may choose to use 2Fo-Fc maps instead of density-modified maps for rebuilding, or you may choose to start the model-rebuilding with the map coefficients from MR.1.mtz.

How to run the AutoMR Wizard

Running the AutoMR Wizard is easy. For example, from the command-line you can type:

phenix.automr native.sca search.pdb RMS=0.8 mass=23000 copies=1

The AutoMR Wizard will find the best location and orientation of the search model search.pdb in the unit cell based on the data in native.sca, assuming that the RMSD between the correct model and search.pdb is about 0.8 A, that the molecular mass of the true model is 23000 and that there is 1 copy of this model in the asymmetric unit. Once the AutoMR Wizard has found a solution, it will automatically call the AutoBuild Wizard and rebuild the model.
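The same command can be assembled from a script. The Python sketch below only builds the argument list shown above; the function name is hypothetical, and actually launching the job (the commented-out subprocess call) assumes that phenix.automr is on your PATH.

```python
import shlex


def build_automr_command(data, model, rms, mass, copies=1):
    """Return the phenix.automr argument list for a single search model."""
    return [
        "phenix.automr",
        data,
        model,
        f"RMS={rms}",
        f"mass={mass}",
        f"copies={copies}",
    ]


if __name__ == "__main__":
    cmd = build_automr_command("native.sca", "search.pdb", 0.8, 23000)
    print(shlex.join(cmd))
    # To run it (requires a PHENIX installation):
    # import subprocess; subprocess.run(cmd, check=True)
```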

Components, copies, search models, and ensembles

  • Your structure is composed of one or more components such as a 20Kd subunit with sequence seq-of-20Kd-subunit.

  • There may be one or more copies of each component in your structure.

  • You can search for the location(s) of a component with a search model that consists of a single structure or an ensemble of structures.

What the AutoMR wizard needs to run

In a simple case where you have one search model and are looking for N copies of this model in your structure, you need:

  • (1) a datafile name (native.sca or data=native.sca)

  • (2) a search model (search_model.pdb or coords=search_model.pdb)

  • (3) how similar the search model is to your structure ( RMS=0.8 or identity=75)

  • (4) information about the contents of the asymmetric unit: (mass=23000 or seq_file=seq.dat) and (copies=1)

It may be advantageous to search using an ensemble of similar structures, rather than a single structure. If you have an ensemble of search models to search for, then specify it as

coords="model_1.pdb" coords="model_2.pdb" coords="model_3.pdb"

In this case you need to give the RMS or identity for each model: identity='45 40 35'. Each of the models in the ensemble must be in the same orientation as the others, so that the ensemble of models can be placed as a group in the unit cell. You may also use phenix.ensembler to generate a single multi-model PDB file containing the entire ensemble. In this case you should specify a single overall RMS or identity for the ensemble.
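Because each model in an ensemble needs its own identity (or RMS) value, it is easy to supply the wrong number of values. The following Python sketch, with a hypothetical helper name, builds the coords= and identity= arguments and refuses mismatched lists. Passing the result as separate argv elements to phenix.automr is an assumption about how you would launch the job.

```python
def ensemble_arguments(models, identities):
    """Build coords=... and identity=... arguments for one ensemble."""
    if len(models) != len(identities):
        raise ValueError(
            f"{len(models)} models but {len(identities)} identity values"
        )
    args = [f"coords={model}" for model in models]
    # One space-separated list, matching identity='45 40 35' above.
    args.append("identity=" + " ".join(str(value) for value in identities))
    return args


if __name__ == "__main__":
    models = ["model_1.pdb", "model_2.pdb", "model_3.pdb"]
    print(ensemble_arguments(models, [45, 40, 35]))
```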

Running from a parameters file

You can run phenix.automr from a parameters file. This is often convenient because you can generate a default one with:

phenix.automr --show_defaults > my_automr.eff

and then you can just edit this file to match your needs and run it with:

phenix.automr my_automr.eff

If you are searching for more than one ensemble, or if there is more than one component in the a.u., then use a parameters file and specify them like this (put all of this in a file like "my_mr.eff" and run it with "phenix.automr my_mr.eff"):

automr {
  ensemble {
    ensembleID = "mol1"
    copies_to_find = 1
    coords = mol1.pdb
    identity = None
    RMS = "0.85"
  }
  ensemble {
    ensembleID = "mol2"
    copies_to_find = 1
    coords = mol2.pdb
    identity = None
    RMS = "0.90"
  }
  component {
    seq_file = "seq1.dat"
    component_type = *protein nucleic_acid
    mass = None
    component_copies = 1
  }
  component {
    seq_file = "seq2.dat"
    component_type = *protein nucleic_acid
    mass = None
    component_copies = 1
  }
}
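When there are many ensembles and components, a parameters file in the layout above can be generated rather than typed. The Python sketch below is one way to do that; the class and function names are hypothetical, and it covers only the keywords used in the example above (RMS-based ensembles and sequence-based protein components).

```python
from dataclasses import dataclass


@dataclass
class Ensemble:
    ensemble_id: str
    coords: str
    rms: float
    copies_to_find: int = 1


@dataclass
class Component:
    seq_file: str
    copies: int = 1


def render_params(ensembles, components) -> str:
    """Render an automr { ... } block in the layout shown above."""
    lines = ["automr {"]
    for e in ensembles:
        lines += [
            "  ensemble {",
            f'    ensembleID = "{e.ensemble_id}"',
            f"    copies_to_find = {e.copies_to_find}",
            f"    coords = {e.coords}",
            "    identity = None",
            f'    RMS = "{e.rms}"',
            "  }",
        ]
    for c in components:
        lines += [
            "  component {",
            f'    seq_file = "{c.seq_file}"',
            "    component_type = *protein nucleic_acid",
            "    mass = None",
            f"    component_copies = {c.copies}",
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_params(
        [Ensemble("mol1", "mol1.pdb", 0.85), Ensemble("mol2", "mol2.pdb", 0.90)],
        [Component("seq1.dat"), Component("seq2.dat")],
    )
    print(text)  # redirect to my_mr.eff, then run: phenix.automr my_mr.eff
```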

Specifying which columns of data to use from input data files

If one or more of your data files has column names that the Wizard cannot identify automatically, you can specify them yourself. You will need to provide one column "name" for each expected column of data, with "None" for anything that is missing.

For example, if your data file data.mtz has columns F SIGF then you might specify

data=data.mtz
input_label_string="F SIGF"
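The rule of one name per expected column, with "None" for anything missing, can be expressed as a one-line helper. This Python sketch uses a hypothetical function name and assumes that the expected columns are given in the order the Wizard wants them (here F, then SIGF).

```python
def label_string(*columns):
    """Join column names into a label string, writing None for missing columns."""
    return " ".join("None" if name is None else str(name) for name in columns)


if __name__ == "__main__":
    print('input_label_string="%s"' % label_string("F", "SIGF"))
    print('input_label_string="%s"' % label_string("F", None))
```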

You can find out all the possible label strings in a data file that you might use by typing:

phenix.automr display_labels=data.mtz  # display all labels for data.mtz

You can specify many more parameters as well. See the list of keywords, defaults and descriptions at the end of this page and also general information about running Wizards at Using the PHENIX Wizards for how to do this. Some of the most common parameters are:

data=w1.sca       # data file
coords=coords.pdb # search model
seq_file=seq.dat  # sequence file

Examples

Standard AutoMR run with coords.pdb native.sca

Run AutoMR using coords.pdb as the search model and native.sca as the data. Assume that the RMS between coords.pdb and the true model is about 0.85 A, that the sequence of the true model is in seq.dat, and that there is 1 copy in the asymmetric unit:

phenix.automr coords.pdb native.sca RMS=0.85 seq.dat copies=1  \
    n_cycle_rebuild_max=2 n_cycle_build_max=2

Specifying data columns

Run AutoMR as above, but specify the data columns explicitly:

phenix.automr coords.pdb RMS=0.85 seq.dat copies=1  \
    data=data.mtz input_label_string="F SIGF"  \
    n_cycle_rebuild_max=2 n_cycle_build_max=2

Note that the data columns are specified by a string that includes both F and SIGF: "F SIGF". The string must match some set of data labels that can be extracted automatically from your data file. You can find the possible values of this string as described above with:

phenix.automr display_labels=data.mtz

Specifying a refinement file for AutoBuild

Run AutoMR as above, but specify a refinement file that is different from the file used for the MR search:

phenix.automr coords.pdb RMS=0.85 seq.dat copies=1  \
    data=data.mtz input_label_string="F SIGF"  \
    input_refinement_file=refinement.mtz \
    input_refinement_labels="FP SIGFP FreeR_flag"  \
    n_cycle_rebuild_max=2 n_cycle_build_max=2

Note that the keywords input_refinement_file and input_refinement_labels are in the scope "autobuild_variables". These keywords and the others in this scope are passed on to AutoBuild.

AutoMR searching for 2 components

Run AutoMR on a structure with 2 components. Define the components of the asymmetric unit with sequence files (seq1.dat and seq2.dat) and the number of copies of each component (1). Define the search models with PDB files and the estimated RMS from the true structures. This is all done by creating a parameters file with all the control information in it. Put all of this in a file like "my_mr.eff" and run it with "phenix.automr my_mr.eff":

automr {
  data = "w1.sca"
  build = False
  ensemble {
    ensembleID = "mol1"
    copies_to_find = 1
    coords = mol1.pdb
    identity = None
    RMS = "0.85"
  }
  ensemble {
    ensembleID = "mol2"
    copies_to_find = 1
    coords = mol2.pdb
    identity = None
    RMS = "0.90"
  }
  component {
    seq_file = "seq1.dat"
    component_type = *protein nucleic_acid
    mass = None
    component_copies = 1
  }
  component {
    seq_file = "seq2.dat"
    component_type = *protein nucleic_acid
    mass = None
    component_copies = 1
  }
}

Specifying molecular masses of 2 components

Run AutoMR as in the previous example, except specify the components of the asymmetric unit with molecular masses (30000 and 40000), and define the search models with PDB files and percent sequence identity with the true structures (50% and 60%). This is again all done by creating a parameters file with all the control information in it. Put all of this in a file like "my_mr.eff" and run it with "phenix.automr my_mr.eff":

automr {
  data = "w1.sca"
  seq_file = seq.dat
  ensemble {
    ensembleID = "mol1"
    copies_to_find = 1
    coords = mol1.pdb
    identity = 50
  }
  ensemble {
    ensembleID = "mol2"
    copies_to_find = 1
    coords = mol2.pdb
    identity = 60
  }
  component {
    component_type = *protein nucleic_acid
    mass = 30000
    component_copies = 1
  }
  component {
    component_type = *protein nucleic_acid
    mass = 40000
    component_copies = 1
  }
  autobuild_variables{
    n_cycle_rebuild_max = 1
  }
}

AutoMR searching for 2 components, but specifying the orientation of one of them

Run AutoMR on a structure with 2 components. Define the components of the asymmetric unit with molecular masses (30000 and 40000) and the number of copies of each component (1). Define the search models with PDB files and percent sequence identity with the true structures (50% and 60%). Define the orientation and position of one ensemble (mol2). Define the number of copies to find for each ensemble (0 for mol2, which is fixed, and 1 for mol1). This is again all done by creating a parameters file with all the control information in it. Put all of this in a file like "my_mr.eff" and run it with "phenix.automr my_mr.eff":

automr {
  data = "w1.sca"
  seq_file = seq.dat
  ensemble {
    ensembleID = "mol1"
    copies_to_find = 1
    coords = mol1.pdb
    identity = 50
  }
  ensemble {
    ensembleID = "mol2"
    copies_to_find = 0
    coords = mol2.pdb
    identity = 60
  }
  component {
    component_type = *protein nucleic_acid
    mass = 30000
    component_copies = 1
  }
  component {
    component_type = *protein nucleic_acid
    mass = 40000
    component_copies = 1
  }
  autobuild_variables{
    n_cycle_rebuild_max = 1
  }
  fixed_ensembles {
    fixed_ensembleID_list = "mol2"
    fixed_euler_list = 199.84 41.535 184.15
    fixed_frac_list = -0.49736 -0.15895 -0.28067
  }
}

Note: you must define an ensemble for the fixed molecule (mol2 in this example) and search for 0 copies of it.
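When more than one ensemble is fixed, fixed_euler_list and fixed_frac_list must each contain three numbers per fixed ensemble. The Python sketch below checks this and groups the values by ensemble ID. The function name is hypothetical, and it is an assumption that the triples are listed in the same order as the IDs in fixed_ensembleID_list.

```python
def group_fixed_placements(ids, euler_list, frac_list):
    """Map each fixed ensemble ID to its (euler_triple, frac_triple).

    Assumption: triples appear in the same order as the IDs.
    """
    expected = 3 * len(ids)
    if len(euler_list) != expected or len(frac_list) != expected:
        raise ValueError(
            f"need {expected} Euler angles and {expected} fractional "
            f"coordinates for {len(ids)} fixed ensemble(s)"
        )
    placements = {}
    for i, ensemble_id in enumerate(ids):
        start = 3 * i
        placements[ensemble_id] = (
            tuple(euler_list[start:start + 3]),
            tuple(frac_list[start:start + 3]),
        )
    return placements


if __name__ == "__main__":
    print(group_fixed_placements(
        ["mol2"],
        [199.84, 41.535, 184.15],
        [-0.49736, -0.15895, -0.28067],
    ))
```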

Possible Problems

Specific limitations and problems

  • The AutoBuild Wizard can build PROTEIN, RNA, or DNA, but it can only build one at a time. If your MR model contains more than one type of chain, then you will need to run AutoBuild separately from AutoMR and when you run AutoBuild, specify one of them with input_lig_file_list and the type of chain to build with chain_type:

     
    input_lig_file_list=ProteinPartofMRmodel.pdb
    chain_type=DNA
    

  • The keywords "cell" and "sg" have been replaced with "unit_cell" and "space_group" to make the keywords the same as in other phenix applications.

  • The syntax for searches with more than one ensemble and more than one component has changed in PHENIX version 1.4.

  • If you use an ensemble as a search model, the output structure will contain just the first member of the ensemble, so you may wish to put the member that is likely to be the most similar to the true structure as the first one in your ensemble.

  • If you run AutoMR from the GUI and continue on to AutoBuild, and then select "Start run over (delete everything for this run)" it will delete your AutoBuild and your AutoMR run and start your AutoMR run all over.

  • AutoMR no longer is able to pass arbitrary commands to AutoBuild (this was discontinued to allow for a fixed set of inputs for AutoBuild).

  • The AutoMR Wizard can take most settings of most space groups. However, it can only use the hexagonal setting of rhombohedral space groups (e.g., #146 R3:H or #155 R32:H), and it cannot use space groups 114-119 (not found in macromolecular crystallography), even in the standard setting. These restrictions are due to difficulties with the use of asuset in the version of the ccp4 libraries used in PHENIX for these settings and space groups.
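A script that prepares AutoMR jobs could screen out the excluded space group numbers before launching anything. The Python sketch below, with a hypothetical function name, encodes only the 114-119 restriction stated above; it does not check the setting of rhombohedral space groups.

```python
# Space group numbers that this page says AutoMR cannot use.
UNSUPPORTED_SPACE_GROUP_NUMBERS = range(114, 120)  # 114 to 119 inclusive


def is_usable_space_group_number(number: int) -> bool:
    """Return False for space group numbers 114-119, True otherwise."""
    if not 1 <= number <= 230:
        raise ValueError("space group numbers run from 1 to 230")
    return number not in UNSUPPORTED_SPACE_GROUP_NUMBERS


if __name__ == "__main__":
    for number in (19, 114, 119, 146):
        print(number, is_usable_space_group_number(number))
```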

Literature

Phaser crystallographic software. A. J. McCoy, R. W. Grosse-Kunstleve, P. D. Adams, M. D. Winn, L. C. Storoni and R. J. Read. J. Appl. Cryst. 40, 658-674 (2007)

Likelihood-enhanced fast translation functions. A. J. McCoy, R. W. Grosse-Kunstleve, L. C. Storoni and R. J. Read. Acta Cryst. D61, 458-464 (2005)

Likelihood-enhanced fast rotation functions. L. C. Storoni, A. J. McCoy and R. J. Read. Acta Cryst. D60, 432-438 (2004)

Additional information

List of all AutoMR keywords

------------------------------------------------------------------------------- 
Legend: black bold - scope names
        black - parameter names
        red - parameter values
        blue - parameter help
        blue bold - scope help
        Parameter values:
          * means selected parameter (where multiple choices are available)
          False is No
          True is Yes
          None means not provided, not predefined, or left up to the program
          "%3d" is a Python style formatting descriptor
------------------------------------------------------------------------------- 
automr
   build= True Run AutoBuild immediately after AutoMR
   data= None Datafile (any standard format). Structure factor amplitudes
         will be taken from this file.
   copies= None Set both copies_to_find and component_copies with copies. This
           is the number of copies of this search model to find, and also the
           number of copies of this sequence or mass in the asymmetric unit.
   ensembleID= None Optional ID for ensemble
   copies_to_find= None Number of copies of this ensemble to find in a.u.
   coords= None model(s) for this ensemble.
   identity= None Percent identity(ies) of model(s) in this ensemble to
             structure (alternative is RMS). Should be a space- or
             comma-separated list of numbers between 0 and 100.
   RMS= None RMSD(s) of model(s) to structure (alternative is identity).
        Should be a space- or comma-separated list of numbers, typically
        between 0.8 and 2.0, but occasionally higher.
   seq_file= None protein seq_file for this component.
   component_type= *protein nucleic_acid protein or nucleic acid.
   mass= None molecular mass (Da) of this component.
   component_copies= None Number of copies of this component in the a.u.
                     (required). (Command-line only)
   crystal_info
      unit_cell= None Enter cell parameter a b c alpha beta gamma
      chain_type= *Auto PROTEIN DNA RNA You can specify whether to build
                  protein, DNA, or RNA chains. At present you can only build
                  one of these in a single run. If you have both DNA and
                  protein, build one first, then run AutoBuild again,
                  supplying the prebuilt model in the
                  "input_lig_file_list" and build the other. NOTE:
                  default for this keyword is Auto, which means "carry
                  out normal process to guess this keyword". The process
                  is to look at the sequence file and/or input pdb file to see
                  what the chain type is. If there are more than one type, the
                  type with the larger number of residues is guessed. If you
                  want to force the chain_type, then set it to PROTEIN RNA or
                  DNA.
      resolution= 0 Enter the high-resolution limit for MR search. All the
                  data input will be written out regardless of your choice. By
                  default, the final rigid-body refinement will use all data.
      space_group= None Space Group symbol (e.g., C2221 or C 2 2 21)
   input_files
      input_label_string= None Choose the set of labels that represent the F
                          and sigma columns for your data. NOTE: Applies to
                          input data file for AutoMR. See also 'input_labels',
                          which applies to input data file for AutoBuild.
   ensemble An ensemble is one or more models, already superimposed, to be
            used as a search model. You need to specify the coordinates (list
            of PDB files), how many copies to find, and either the percent
            identity of the search model(s) to the actual structure OR
            estimates of the RMS coordinate differences.
      ensembleID= "ensemble_1" ID for this ensemble.
      copies_to_find= None Number of copies of this ensemble to find in a.u.
      coords= None model(s) for this ensemble.
      identity= None percent identity(ies) of model(s) in this ensemble to
                structure (alternative is RMS).
      RMS= None RMSD(s) of model(s) to structure (alternative is identity).
      added_ensemble= False Used internally to flag if this ensemble was added
                      automatically
   component The components are the contents of the asymmetric unit. They help
             Phaser figure out how much scattering is there. You need to
             specify for each component (1) whether it is protein or nucleic
             acid, (2) the molecular mass of the component OR the sequence,
             and (3) how many of them there are.
      componentID= None ID for this component.
      seq_file= None protein seq_file for this component.
      component_type= *protein nucleic_acid protein or nucleic acid.
      mass= None molecular mass (Da) of this component.
      component_copies= None Number of copies of this component in the a.u.
                        (required). (Command-line only)
      added_component= False Used internally to flag if component was added
                       automatically
   decision_making
      min_seq_identity_percent= 50 The sequence in your input PDB file will be
                                adjusted to match the sequence in your
                                sequence file (if any). If there are
                                insertions/deletions in your model and the
                                wizard does not seem to identify them, you can
                                split up your PDB file by adding records like
                                this: BREAK You can specify the minimum
                                sequence identity between your sequence file
                                and a segment from your input PDB file to
                                consider the sequences to be matched. Default
                                is 50.0%. You might want a higher number to
                                make sure that deletions in the sequence are
                                noticed.
      overlap_allowed= None Solutions will be accepted by default if fewer
                       than 5 percent of residues are involved in clashes. You
                       can choose to increase the percent clashes if the
                       packing is tight and your search molecule is not
                       exactly the same as the molecule in the cell.
      selection_criteria_rot= *Percent_of_best Number_of_solutions Z_score All
                              Choose a criterion for keeping rotation
                              solutions at each stage. The choices are:
                              Percent of Best Score: AutoMR looks down the
                              list of LLG scores and only keeps the ones that
                              differ from the mean by more than the chosen
                              percentage, compared to the top solution. Number
                              of Solutions: Keep the N top solutions (you can
                              set N; default=1) Z-score: Keep all the
                              solutions with a Z-score greater than X (you can
                              set X; default=6). All: Keep everything and go
                              on holiday while Phaser crunches through it all
                              (definitely not recommended!)
      selection_criteria_rot_value= 75 Choose a value for your criterion for
                                    keeping rotation solutions at each stage.
                                    Percent of Best Score: AutoMR looks down
                                    the list of LLG scores and only keeps the
                                    ones that differ from the mean by more
                                    than the chosen percentage, compared to
                                    the top solution. Enter your desired
                                    percentage into the entry field
                                    (default=75%) Number of Solutions: Keep
                                    the N top solutions (you can set N;
                                    default=1) Z-score: Keep all the solutions
                                    with a Z-score greater than X (you can set
                                    X; default=6). All: Keep everything and go
                                    on holiday while Phaser crunches through
                                    it all (definitely not recommended!)
      fast_search_mode= None Run phaser with selection_criteria_rot_value and
                        then if no obvious solution, repeat with cutoff
                        lowered by search_down_percent. If None, then use
                        Phaser default.
      search_down_percent= 25 Used if fast_search_mode=True. Run phaser with
                           selection_criteria_rot_value and then if no obvious
                           solution, repeat with cutoff lowered by
                           search_down_percent
      do_anisotropy_correction= True Choose whether you want to apply
                                anisotropy correction
      all_plausible_sg_list= None Choose which space groups are plausible
      use_all_plausible_sg= False Often you will want to search all space
                            groups with the same point group as you may not
                            know which is correct from your data. Choose
                            use_all_plausible_sg=True to do this. NOTE: You
                            can also select which space groups to consider
                            using all_plausible_sg_list.
      check_inverse_hand= True Normally you do not know the hand of your space
                          group. Choosing check_inverse_hand (default) will
                          tell Phaser to try both the space group you specify
                          and its inverse. NOTE: compare with
                          use_all_plausible_sg=True, which further expands the
                          search to all space groups with the same point group.
      permute_search_order= False You can ask Phaser to try all permutations
                            of the order in which ensembles are searched
      number_of_output_models= 1 Number of solutions to output
      disable_check= False You can disable the consistency check for ensemble
                     mass
   fixed_ensembles If you already know the placement of one or more molecules
                   you can specify them as fixed ensembles.
      fixed_ensembleID_list= None Enter the ID (set with ensemble_1.ensembleID
                             or equivalent) of the component that is to be
                             fixed. NOTE 1: Each ensemble in
                             fixed_ensembleID_list must be defined. NOTE 2:
                             you can enter more than one fixed component if
                             you want. If you do, then enter fixed_euler_list
                             in multiples of 3 numbers and also
                             fixed_frac_list in multiples of 3 numbers. As a
                             short-cut if you just want a fixed model you can
                             skip all this and just use fixed_model=xxxx.pdb
      fixed_euler_list= 0.0 0.0 0.0 Enter Euler angles (from AutoMR or Phaser)
                        for fixed component defined with
                        fixed_ensembleID_list. NOTE 2: you can enter more than
                        one fixed component if you want. If you do, then enter
                        fixed_euler_list in multiples of 3 numbers and also
                        fixed_frac_list in multiples of 3 numbers.
      fixed_frac_list= 0.0 0.0 0.0 Enter fractional offset (location) from
                       AutoMR or Phaser for the fixed component defined
                       with fixed_ensembleID_list. NOTE 2:
                       you can enter more than one fixed component if you
                       want. If you do, then enter fixed_euler_list in
                       multiples of 3 numbers and also fixed_frac_list in
                       multiples of 3 numbers.
      fixed_frac_list_is_fractional= True Normally fixed_frac_list is
                                     fractional coordinates. You can say
                                     fixed_frac_list_is_fractional=False to
                                     instead use orthogonal angstroms to
                                     specify the locations of your ensembles.
   model_building If you specify "build=True" then AutoBuild will be
                  run right after molecular replacement. You may wish to set
                  "rebuild_in_place" to True or False if you do not
                  wish for this to be chosen automatically. Rebuild-in-place
                  will not add or delete residues; if you set it to False,
                  AutoBuild will try to rebuild your model from scratch.
      build_type= *RESOLVE RESOLVE_AND_BUCCANEER You can choose to build
                  models with RESOLVE or RESOLVE and BUCCANEER and how many
                  different models to build with RESOLVE. The more you build,
                  the more likely to get a complete model. Note that
                  rebuild_in_place can only be carried out with RESOLVE
                  model-building. For BUCCANEER model building you need CCP4
                  version 6.1.2 or higher and BUCCANEER version 1.3.0 or
                  higher
      resolution_build= 0 Enter the high-resolution limit for model-building.
                        If 0.0, the value of resolution is used as a default.
      semet= False You can specify that the dataset that is used for
             refinement is a selenomethionine dataset, and that the model
             should be the SeMet version of the protein, with all SD of MET
             replaced with Se of MSE.
   autobuild_variables
      two_fofc_in_rebuild= None Actively sets two_fofc_in_rebuild in
                           AutoBuild. NOTE: value is not checked
      include_input_model= None Actively sets include_input_model in
                           AutoBuild. NOTE: value is not checked
      n_cycle_rebuild_min= None Actively sets n_cycle_rebuild_min in
                           AutoBuild. NOTE: value is not checked
      n_cycle_rebuild_max= None Actively sets n_cycle_rebuild_max in
                           AutoBuild. NOTE: value is not checked
      n_cycle_build_min= None Actively sets n_cycle_build_min in AutoBuild.
                         NOTE: value is not checked
      n_cycle_build_max= None Actively sets n_cycle_build_max in AutoBuild.
                         NOTE: value is not checked
      rebuild_in_place= None Actively sets rebuild_in_place in AutoBuild.
                        NOTE: value is not checked
      thorough_denmod= None Actively sets thorough_denmod in AutoBuild. NOTE:
                       value is not checked
      start_chains_list= None Actively sets start_chains_list in AutoBuild.
                         NOTE: value is not checked
      input_refinement_file= None Actively sets input_refinement_file in
                             AutoBuild. NOTE: value is not checked
      input_refinement_labels= None Actively sets input_refinement_labels in
                               AutoBuild. NOTE: value is not checked
      input_labels= None Actively sets input_labels in AutoBuild. NOTE: value
                    is not checked
      resolve_command_list= None Actively sets resolve_command_list in
                            AutoBuild. NOTE: value is not checked
      resolve_pattern_command_list= None Actively sets
                                    resolve_pattern_command_list in AutoBuild.
                                    NOTE: value is not checked
      morph= None Actively sets morph in AutoBuild. NOTE: value is not checked
      morph_rad= None Actively sets morph_rad in AutoBuild. NOTE: value is not
                 checked
   general
      nbatch= 1 You can specify the number of processors to use (nproc) and
              the number of batches to divide the data into for parallel jobs.
              Normally you will set nproc to the number of processors
              available and leave nbatch alone. If you leave nbatch as None it
              will be set automatically, with a value depending on the Wizard.
              This is recommended. The value of nbatch can affect the results
              that you get, as the jobs are not split into exact replicates,
              but are rather run with different random numbers. If you want to
              get the same results, keep the same value of nbatch.
      nproc= 1 You can specify the number of processors to use (nproc) and the
             number of batches to divide the data into for parallel jobs.
             Normally you will set nproc to the number of processors available
             and leave nbatch alone. If you leave nbatch as None it will be
             set automatically, with a value depending on the Wizard. This is
             recommended. The value of nbatch can affect the results that you
             get, as the jobs are not split into exact replicates, but are
             rather run with different random numbers. If you want to get the
             same results, keep the same value of nbatch.
      keep_files= overall_best*.pdb overall_best*.mtz List of files that are
                  not to be cleaned up. Wildcards are permitted.
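
As an illustration of how wildcard patterns such as the keep_files defaults select files, here is a minimal sketch using Python's standard `fnmatch` module. The helper `is_kept` and the example file names are hypothetical, and the wizard's own matching rules are assumed, not confirmed, to behave like shell-style globbing.

```python
from fnmatch import fnmatchcase

def is_kept(filename, keep_patterns):
    """Return True if filename matches any shell-style pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in keep_patterns)

# Default value of keep_files, split into individual patterns
keep_files = "overall_best*.pdb overall_best*.mtz".split()

# Hypothetical file names, for illustration only
candidates = ["overall_best.pdb", "overall_best_refine_data.mtz", "resolve.log"]
kept = [name for name in candidates if is_kept(name, keep_files)]
```

With these inputs, the two `overall_best` files are retained and `resolve.log` is not.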
      coot_name= "coot" If your version of coot is called something else, then
                 you can specify that here.
      i_ran_seed= 72432 Random seed (positive integer) for model-building and
                  simulated annealing refinement
      raise_sorry= False You can have any failure end with a Sorry instead of
                   simply printing a message to the screen
      background= True When you specify nproc=nn, you can run the jobs in
                  background (default if nproc is greater than 1) or
                  foreground (default if nproc=1). If you set run_command=qsub
                  (or otherwise submit to a batch queue), then you should set
                  background=False, so that the batch queue can keep track of
                  your runs. There is no need to use background=True in this
                  case because all the runs go as controlled by your batch
                  system. If you use run_command='sh ' (or similar, sh is
                  default) then normally you will use background=True so that
                  all the jobs run simultaneously.
      check_wait_time= 1.0 You can specify the length of time (seconds) to
                       wait between checking for subprocesses to end
      max_wait_time= 1.0 You can specify the length of time (seconds) to wait
                     when looking for a file. If you have a cluster where jobs
                     do not start right away you may need a longer time to
                     wait. The symptom of too short a wait time is 'File not
                     found'
      wait_between_submit_time= 1.0 You can specify the length of time
                                (seconds) to wait between each job that is
                                submitted when running sub-processes. This can
                                be helpful on NFS-mounted systems when running
                                with multiple processors to avoid file
                                conflicts. The symptom of too short a
                                wait_between_submit_time is File exists:....
      cache_resolve_libs= True Use caching of resolve libraries to speed up
                          resolve
      resolve_size= 12 Size for solve/resolve ("", "_giant", "_huge",
                    "_extra_huge", or a number, where 12=giant and
                    18=huge)
      check_run_command= False You can have the wizard check your run command
                         at startup
      run_command= "sh " When you specify nproc=nn, you can run the
                   subprocesses as jobs in background with sh (default) or
                   submit them to a queue with the command of your choice
                   (e.g., qsub). If you have a multi-processor machine, use
                   sh. If you have a cluster, use qsub or the equivalent
                   command for your system. NOTE: If you set run_command=qsub
                   (or otherwise submit to a batch queue), then you should set
                   background=False, so that the batch queue can keep track of
                   your runs. There is no need to use background=True in this
                   case because all the runs go as controlled by your batch
                   system. If nproc is greater than 1 and you use
                   run_command='sh ' (or similar, sh is default) then normally
                   you will use background=True so that all the jobs run
                   simultaneously.
      queue_commands= None You can add any commands that need to be run for
                      your queueing system. These are written before any other
                      commands in the file that is submitted to your queueing
                      system. For example on a PBS system you might say:
                        queue_commands='#PBS -N mr_rosetta'
                        queue_commands='#PBS -j oe'
                        queue_commands='#PBS -l walltime=03:00:00'
                        queue_commands='#PBS -l nodes=1:ppn=1'
                      NOTE: you can put
                      in the characters '' in any queue_commands line and this
                      will be replaced by a string of characters based on the
                      path to the run directory. The first character and last
                      two characters of each part of the path will be
                      included, separated by '_', up to 15 characters. For
                      example
                      'test_autobuild/WORK_5/AutoBuild_run_1_/TEMP0/RUN_1'
                      would be represented by: 'tld_W_5_A1__TP0_1'
      condor_universe= vanilla The universe for condor is usually vanilla.
                       However you might need to set it to local for your
                       cluster
      add_double_quotes_in_condor= True You might need to turn on or off
                                   double quotes in condor job submission
                                   scripts. These are already default
                                   elsewhere but may interfere with condor
                                   paths.
      condor= None Specifies if the group_run_command is submitting a job to a
              condor cluster. Set by default to True if
              group_run_command=condor_submit, otherwise False. For condor job
              submission mr_rosetta uses a customized script with condor
              commands. Also uses one_subprocess_level=True
      last_process_is_local= True If true, run the last process in a group in
                             background with sh as part of the job that is
                             submitting jobs. This prevents having the job
                             that is submitting jobs sit and wait for all the
                             others while doing nothing
      skip_r_factor= False You can skip R-factor calculation if refinement is
                     not done and maps_only=True
      skip_xtriage= False You can bypass xtriage if you want. This will
                    prevent you from applying anisotropy corrections, however.
      base_path= None You can specify the base path for files (default is
                 current working directory)
      temp_dir= None Define a temporary directory (it must exist)
      clean_up= None At the end of the entire run the TEMP directories will be
                removed if clean_up is True. Files listed in keep_files will
                not be deleted. If you want to remove files after your run is
                finished use a command like "phenix.autobuild run=1
                clean_up=True"
      print_citations= True Print citations at end of run
      solution_output_pickle_file= None At end of run, write solutions to this
                                   file in output directory if defined
      title= None Enter any text you like to help identify what you did in
             this run
      top_output_dir= None This is used in subprocess calls of wizards and to
                      tell the Wizard where to look for the STOPWIZARD file.
      wizard_directory_number= None This is used by the GUI to define the run
                               number for Wizards. It is the same as
                               desired_run_number NOTE: this value can only be
                               specified on the command line, as the directory
                               number is set before parameters files are read.
      verbose= False Command files and other verbose output will be printed
      extra_verbose= False Facts and possible commands will be printed every
                     cycle if True
      debug= False You can have the wizard stop with error messages about the
             code if you use debug. Additionally the output goes to the
             terminal if you specify "debug=True"
      require_nonzero= True Require non-zero values in data columns to
                       consider reading in.
      remove_path_word_list= None List of words identifying paths to remove
                             from PATH. These can be used to shorten your PATH.
                             For example, cns ccp4 coot would remove all
                             paths containing these words except those also
                             containing phenix. Capitalization is ignored.
      fill= False Fill in all missing reflections to resolution res_fill.
            Applies to density modified maps. See also filled_2fofc_maps in
            autobuild.
      res_fill= None Resolution for filling in missing data (default = highest
                resolution of any datafile). Only applies to density modified
                maps. Default is fill to high resolution of data. Ignored if
                fill=False
      check_only= False Just read in and check initial parameters. Not for
                  general use
   special_keywords
      write_run_directory_to_file= None Writes the full name of a run
                                   directory to the specified file. This can
                                   be used as a call-back to tell a script
                                   where the output is going to go.
   run_control
      coot= None Not presently applicable to automr
      ignore_blanks= None ignore_blanks allows you to have a command-line
                     keyword with a blank value like
                     "input_lig_file_list="
      stop= None You can stop the current wizard with "stopwizard"
            or "stop". If you type "phenix.autobuild run=3
            stop" then this will stop run 3 of autobuild.
      display_facts= None Set display_facts to True and optionally
                     run=[run-number] to display the facts for run run-number.
                     If you just say display_facts then the facts for the
                     highest-numbered existing run will be shown.
      display_summary= None Set display_summary to True and optionally
                       run=[run-number] to show the summary for run
                       run-number. If you just say display_summary then the
                       summary for the highest-numbered existing run will be
                       shown.
      carry_on= None Set carry_on to True to carry on with highest-numbered
                run from where you left off.
      run= None Set run to n to continue with run n where you left off.
      copy_run= None Set copy_run to n to copy run n to a new run and continue
                where you left off.
      display_runs= None List all runs for this wizard.
      delete_runs= None List runs to delete: 1 2 3-5 9:12
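
A run list in the style shown for delete_runs could be expanded along these lines. This sketch assumes that both "3-5" and "9:12" denote inclusive ranges; that interpretation, and the function name `parse_run_list`, are assumptions and not taken from the wizard code.

```python
def parse_run_list(text):
    """Expand a string like '1 2 3-5 9:12' into a list of run numbers.

    Assumption: '-' and ':' both denote inclusive ranges.
    """
    runs = []
    for token in text.split():
        token = token.replace(":", "-")
        if "-" in token:
            first, last = (int(part) for part in token.split("-"))
            runs.extend(range(first, last + 1))
        else:
            runs.append(int(token))
    return runs

runs_to_delete = parse_run_list("1 2 3-5 9:12")
```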
      display_labels= None display_labels=test.mtz will list all the labels
                      that identify data in test.mtz. You can use the label
                      strings like this: input_label_string="FP SIGFP "
                      # each individual label counts
      dry_run= False Just read in and check parameter names
      params_only= False Just read in and return parameter defaults
      display_all= False Just read in and display parameter defaults
   non_user_parameters These are obsolete parameters and parameters that the
                       wizards use to communicate among themselves. Not
                       normally for general use.
      gui_output_dir= None Used only by the GUI
      sg= None Obsolete. Use space_group instead
      composition_num_list= 1 Number of copies of this component. Not for
                            general use.
      weight_list= 0.0 Molecular weight of component (Da; e.g. 30000). Not
                   for general use.
      weight_seq_list= Prot_seq_file Choose whether to define composition
                       through molecular weight or sequence. Choices are
                       "MW_protein", "MW_nucleic", "Prot_seq_file",
                       "Nucl_seq_file". Not for general use.
      input_data_file= None Not normally used. Use "data=" instead
      input_pdb_file= None Not normally used. Use "coords=" instead
      input_seq_file= None Not normally used. Use "seq_file=" instead
      input_seq_file_list= None Not normally used. Use
                           "component.seq_file" instead
      rebuild_after_mr= True Not normally used. Use "build=True"
                        instead
      solution_key= MR Prefix for name of pdb files output
      fixed_model_file_name= None Optional fixed model. Shortcut for entering
                             an ensemble and defining it as a fixed ensemble.
      fixed_model_identity= None percent identity of fixed model to structure
                            (alternative is fixed_model_RMS).
      fixed_model_RMS= None RMSD of fixed model (use this or
                       fixed_model_identity)
      build_gui= True If checked, the AutoBuild GUI will automatically launch
                 after AutoMR is finished, with input files pre-loaded.