| Python-based Hierarchical ENvironment for Integrated Xtallography |
Automated molecular replacement with AutoMR
Author(s)
Purpose

Purpose of the AutoMR Wizard

The AutoMR Wizard provides a convenient interface to Phaser molecular replacement and feeds the results of molecular replacement directly into the AutoBuild Wizard for automated model rebuilding. The AutoMR Wizard begins with data files containing structure factor amplitudes and uncertainties, together with a search model or models, and identifies placements of the search models that are compatible with the data.

Usage

The AutoMR Wizard can be run from the PHENIX GUI, from the command-line, and from keyworded script files. All three versions are identical except in the way that they take commands from the user. See Running a Wizard from a GUI, the command-line, or a script for details of how to run a Wizard. The command-line version will be described here.

NOTE: You may find it easiest to run the GUI version of AutoMR when you are learning how to use it, and then to move to the command-line or script versions later, as the GUI version will take you through all the necessary steps of organizing your data.

Summary of inputs and outputs for AutoMR

Input data file. This file can be in almost any format, and must contain either amplitudes or intensities, together with their sigmas. You can specify what resolution to use for molecular replacement and, separately, what resolution to use for model rebuilding. If you specify "0.0" for resolution (recommended) then defaults will be used for molecular replacement (i.e. data to 2.5 A, if available, are used to solve the structure, and rigid-body refinement of the final solution is then carried out with all data) and all the data will be used for model rebuilding.

Composition of the asymmetric unit. PHASER needs to know the total mass in the asymmetric unit (i.e. not just the mass of the search models). You can define this either by specifying one or more protein or nucleic acid sequence files, or by specifying protein or nucleic acid molecular masses, and telling the Wizard how many copies of each are present.

Space groups to search. You can request that all space groups with the same point group as the one you start out with be searched, and the best one be chosen. If you select this option then the best space group will be used for model rebuilding in AutoBuild.

Ensembles to search for. AutoMR builds up a model by finding a set of good positions and orientations of one "ensemble", and then using each of those placements as a starting point for finding the next ensemble, until all the contents of the asymmetric unit are found and a consistent solution is obtained. You can specify any number of different ensembles to search for, and you can search for any number of copies of each ensemble. The order in which ensembles are searched for does make a difference. If possible, search for the biggest, best-ordered, most accurate ensemble first. You specify the order when you list the ensembles to search for on the last main window of the AutoMR wizard.

Each ensemble can be specified by a single PDB file or a set of PDB files. The PDB files that make up one ensemble must all be oriented in the same way, because they are put together and always used as a group in the molecular replacement process. You will need to specify how similar you think each input PDB file in an ensemble is to the structure that is in your crystal. You can specify either sequence identity or expected rmsd. Note that if you use a homology model, you should give the sequence identity of the template from which the model was constructed, not the 100% identity of the model!

Output files from AutoMR

When you run AutoMR the output files will be in a subdirectory with your run number:

AutoMR_run_1_/ # subdirectory with results
Model rebuilding. After PHASER molecular replacement the AutoMR Wizard loads the AutoBuild Wizard and sets the defaults based on the MR solution that has just been found. You can use the default values, or you may choose to use 2Fo-Fc maps instead of density-modified maps for rebuilding, or you may choose to start the model-rebuilding with the map coefficients from MR.MAP_COEFFS.1.mtz.

How to run the AutoMR Wizard

Running the AutoMR Wizard is easy. For example, from the command-line you can type:

phenix.automr native.sca search.pdb RMS=0.8 mass=23000 copies=1

The AutoMR Wizard will find the best location and orientation of the search model search.pdb in the unit cell based on the data in native.sca, assuming that the RMSD between the correct model and search.pdb is about 0.8 A, that the molecular mass of the true model is 23000 and that there is 1 copy of this model in the asymmetric unit. Once the AutoMR Wizard has found a solution, it will automatically call the AutoBuild Wizard and rebuild the model.

Components, copies, search models, and ensembles

What the AutoMR wizard needs to run

In a simple case where you have one search model and are looking for N copies of this model in your structure, you need:

It may be advantageous to search using an ensemble of similar structures, rather than a single structure. If you have an ensemble of search models to search for, then specify it as:

coords='model_1.pdb model_2.pdb model_3.pdb'

In this case you need to give the RMS or identity for each model:

identity='45 40 35'

Each of the models in the ensemble must be in the same orientation as the others, so that the ensemble of models can be placed as a group in the unit cell.

If you are searching for more than one ensemble, or if there is more than one component in the a.u., then use the full syntax and specify them as follows (NOTE: copies becomes copies_to_find or component_copies):

ensemble_1.coords=s1.pdb ensemble_1.RMS=0.8 ensemble_1.copies_to_find=1 \
component_1.mass=23000 component_1.component_copies=1

Specifying which columns of data to use from input data files

If one or more of your data files has column names that the Wizard cannot identify automatically, you can specify them yourself. You will need to provide one column "name" for each expected column of data, with "None" for anything that is missing. For example, if your data file data.mtz has columns F SIGF then you might specify:

data=data.mtz input_label_string="F SIGF"

You can find out all the possible label strings in a data file that you might use by typing:

phenix.automr display_labels=data.mtz # display all labels for data.mtz

You can specify many more parameters as well. See the list of keywords, defaults and descriptions at the end of this page, and the general information about running Wizards at Running a Wizard from a GUI, the command-line, or a script, for how to do this.
Some of the most common parameters are:

data=w1.sca # data file
model=coords.pdb # starting model
seq_file=seq.dat # sequence file

Examples

Standard AutoMR run with coords.pdb native.sca

Run AutoMR using coords.pdb as the search model and native.sca as the data, assuming that the RMS between coords.pdb and the true model is about 0.85 A, that the sequence of the true model is in seq.dat, and that there is 1 copy in the asymmetric unit:

phenix.automr coords.pdb native.sca RMS=0.85 seq.dat copies=1 \
n_cycle_rebuild_max=2 n_cycle_build_max=2
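If you launch simple runs like this one from a script, it can help to check the inputs before calling the Wizard. The following is a minimal sketch, not part of PHENIX: the helper name simple_automr_args is hypothetical, and it assumes that RMS and identity are alternatives (as the keyword list describes them) and that the composition is given either by a sequence file or by a mass. It only assembles the argument list using keywords that appear in the examples on this page.

```python
def simple_automr_args(model, data, copies=1, rms=None, identity=None,
                       seq_file=None, mass=None):
    """Build an argument list for a single-model phenix.automr run.

    Hypothetical helper for illustration; it only formats arguments and
    does not run PHENIX.
    """
    # Assumption: similarity is given as RMS or as percent identity, not both.
    if (rms is None) == (identity is None):
        raise ValueError("give exactly one of rms or identity")
    # Assumption: composition is given as a sequence file or as a mass, not both.
    if (seq_file is None) == (mass is None):
        raise ValueError("give exactly one of seq_file or mass")

    args = ["phenix.automr", model, data]
    args.append(f"RMS={rms}" if rms is not None else f"identity={identity}")
    # A sequence file is passed as a bare file name, as in the example above.
    args.append(seq_file if seq_file is not None else f"mass={mass}")
    args.append(f"copies={copies}")
    return args


args = simple_automr_args("coords.pdb", "native.sca", rms=0.85, seq_file="seq.dat")
print(" ".join(args))
```

The printed line reproduces the first line of the command above; further keywords such as n_cycle_rebuild_max=2 can be appended to the list in the same keyword=value form.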
Specifying data columns

Run AutoMR as above, but specify the data columns explicitly:

phenix.automr coords.pdb RMS=0.85 seq.dat copies=1 \
data=data.mtz input_label_string="F SIGF" \
n_cycle_rebuild_max=2 n_cycle_build_max=2
Note that the data columns are specified by a single string that includes both
F and SIGF: "F SIGF". The string must match a set of data labels
that can be extracted automatically from your data file.
You can find the possible values of this string, as described above,
with:

phenix.automr display_labels=data.mtz

Specifying a refinement file for AutoBuild

Run AutoMR as above, but specify a refinement file that is different from the file used for the MR search:

phenix.automr coords.pdb RMS=0.85 seq.dat copies=1 \
data=data.mtz input_label_string="F SIGF" \
autobuild_input_refinement_file=refinement.mtz \
autobuild_input_refinement_labels="FP SIGFP FreeR_flag" \
n_cycle_rebuild_max=2 n_cycle_build_max=2
Note that the commands input_refinement_file and input_refinement_labels are
preceded by autobuild_. These commands and others with this prefix are
passed on to AutoBuild.
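Label strings such as "F SIGF" contain spaces, so they must reach the Wizard as a single argument. When a run is started from Python rather than typed into a shell, one way to do this is to keep each keyword=value pair as one list element. The sketch below is illustrative only: it uses the arguments from the example above together with the standard-library shlex module to show the equivalent quoted shell command, and it does not run PHENIX.

```python
import shlex

# Arguments taken from the example above. Each list element is one
# command-line argument, so label strings with spaces stay intact.
args = [
    "phenix.automr", "coords.pdb", "RMS=0.85", "seq.dat", "copies=1",
    "data=data.mtz",
    "input_label_string=F SIGF",
    "autobuild_input_refinement_file=refinement.mtz",
    "autobuild_input_refinement_labels=FP SIGFP FreeR_flag",
]

# shlex.join adds shell quoting where an argument contains spaces.
shell_line = shlex.join(args)
print(shell_line)

# To launch the run (requires PHENIX on the PATH), the same list could be
# passed to subprocess.run(args, check=True).
```

Quoting the whole keyword=value pair, as shlex.join does, is equivalent in the shell to quoting only the value as in input_label_string="F SIGF".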
Passing any commands to AutoBuild

You can pass any AutoBuild commands on to AutoBuild, even if they are not already defined for you in AutoMR. Use the command autobuild_input_list_add to add a command, and then apply that command by adding "autobuild_" to the beginning of the command name. For example, to add the commands semet=True and refine=False:

phenix.automr coords.pdb RMS=0.85 seq.dat copies=1 \
data=data.mtz input_label_string="F SIGF" \
autobuild_input_list_add='semet refine' \
autobuild_semet=True \
autobuild_refine=False
Notes. This applies only to command-line operation of AutoMR.
Note that any keywords that are used in both AutoBuild and AutoMR will apply to
both if you specify them in autobuild_input_list_add. For example if you
set the resolution in AutoBuild with autobuild_input_list_add=resolution and
autobuild_resolution=2.6 then this resolution will apply to both AutoMR and
AutoBuild.
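The pass-through rule described above (list the names in autobuild_input_list_add, then set each one with the autobuild_ prefix) is mechanical, so it can be generated from a dictionary of AutoBuild settings. This is a minimal sketch with a hypothetical helper name, autobuild_passthrough. It is intended for keywords that AutoMR does not already define, and it only builds the argument strings.

```python
def autobuild_passthrough(options):
    """Return AutoMR arguments that forward `options` to AutoBuild.

    Hypothetical helper for illustration: `options` maps AutoBuild keyword
    names (without prefix) to values, for example {"semet": True}.
    """
    # One space-separated list of names for autobuild_input_list_add ...
    args = ["autobuild_input_list_add=" + " ".join(options)]
    # ... followed by one autobuild_<name>=<value> argument per keyword.
    args += [f"autobuild_{name}={value}" for name, value in options.items()]
    return args


for arg in autobuild_passthrough({"semet": True, "refine": False}):
    print(arg)
```

For the example above this yields autobuild_input_list_add=semet refine, autobuild_semet=True and autobuild_refine=False. The first argument contains a space and needs quoting if it is typed into a shell.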
AutoMR searching for 2 components

Run AutoMR on a structure with 2 components. Define the components of the asymmetric unit with sequence files (beta.seq and blip.seq) and number of copies of each component (1). Define the search models with PDB files and estimated RMS from true structures.

phenix.automr data=beta_blip_P3221.mtz input_label_string="Fobs Sigma" \
resolution=0.0 resolution_build=3.0 \
component_1.component_type=protein component_1.seq_file=beta.seq \
component_1.component_copies=1 \
component_2.component_type=protein component_2.seq_file=blip.seq \
component_2.component_copies=1 \
ensemble_1.coords=beta.pdb ensemble_1.RMS=0.85 ensemble_1.copies_to_find=1 \
ensemble_2.coords=blip.pdb ensemble_2.RMS=0.90 ensemble_2.copies_to_find=1 \
n_cycle_rebuild_max=1

Specifying molecular masses of 2 components

Run AutoMR as in the previous example, except specify the components of the asymmetric unit with molecular masses (30000 and 20000), and define the search models with PDB files and percent sequence identity with the true structures (50% and 60%).

phenix.automr data=beta_blip_P3221.mtz input_label_string="Fobs Sigma" \
resolution=0.0 resolution_build=3.0 \
component_1.component_type=protein component_1.mass=30000 \
component_1.component_copies=1 \
component_2.component_type=protein component_2.mass=20000 \
component_2.component_copies=1 \
ensemble_1.coords=beta.pdb ensemble_1.identity=50 ensemble_1.copies_to_find=1 \
ensemble_2.coords=blip.pdb ensemble_2.identity=60 ensemble_2.copies_to_find=1 \
n_cycle_rebuild_max=1

AutoMR searching for 2 components, but specifying the orientation of one of them

Run AutoMR on a structure with 2 components. Define the components of the asymmetric unit with sequence files (beta.seq and blip.seq) and number of copies of each component (1). Define the search models with PDB files and estimated RMS from true structures. Define the orientation and position of one component. Define the number of copies to find for each component (0 for beta, which is fixed, 1 for blip).

phenix.automr data=beta_blip_P3221.mtz input_label_string="Fobs Sigma" \
resolution=0.0 resolution_build=3.0 \
component_1.component_type=protein component_1.seq_file=beta.seq \
component_1.component_copies=1 \
component_2.component_type=protein component_2.seq_file=blip.seq \
component_2.component_copies=1 \
ensemble_1.coords=beta.pdb ensemble_1.RMS=0.85 ensemble_1.copies_to_find=0 \
ensemble_1.ensembleID="beta" \
ensemble_2.coords=blip.pdb ensemble_2.RMS=0.90 ensemble_2.copies_to_find=1 \
ensemble_2.ensembleID="blip" \
n_cycle_rebuild_max=1 \
fixed_ensembleID_list="beta" \
fixed_euler_list="199.84,41.535,184.15" \
fixed_frac_list="-0.49736,-0.15895,-0.28067"

Note: you have to define an ensemble for the fixed molecule (beta in this example).

Combining MR and SAD phasing information (MRSAD)

You can conveniently combine MR information with SAD phases in PHENIX (see J. P. Schuermann and J. J. Tanner, Acta Cryst. (2003). D59, 1731-1736) by running the three wizards AutoMR, AutoSol, and AutoBuild one after the other. Here is a set of three simple commands to do that.

First run AutoMR to find the molecular replacement solution, but don't rebuild it yet:

phenix.automr gene-5.pdb infl.sca copies=1 \
RMS=1.5 mass=9800 rebuild_after_mr=False

Now your MR solution is in AutoMR_run_1_/MR.1.pdb and phases are in AutoMR_run_1_/MR.1.mtz. Use these phases as input to AutoSol, along with some weak SAD data, still not building any new models:

phenix.autosol data=infl.sca \
input_phase_file=AutoMR_run_1_/MR.1.mtz input_phase_labels="F PHIC FOM" \
seq_file=sequence.dat build=False

Note that we have specified the data columns for F, PHI and FOM in the input_phase_file. For input_phase_file you must specify all three of these (if you leave out FOM it will be set to zero).
AutoSol will write an MTZ file with experimental phases to phaser_xx.mtz, where xx depends on how many solutions are considered during the run. You will need to edit the next command, which runs AutoBuild, to match the value of xx:

phenix.autobuild data=AutoSol_run_1_/phaser_2.mtz \
model=AutoMR_run_1_/MR.1.pdb seq_file=sequence.dat rebuild_in_place=False

AutoBuild will now take the phases from your AutoSol run and combine them with model-based information from your AutoMR MR solution, and will carry out iterative density modification, model-building and refinement to rebuild your model. Note that you may wish to set rebuild_in_place=True, depending on how good your MR model is.

Possible Problems

Specific limitations and problems
Literature
Additional information

List of all AutoMR keywords
-------------------------------------------------------------------------------
Legend: black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help
Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
automr
write_run_directory_to_file= None Writes the full name of a run directory
to the specified file. This can be used as a
call-back to tell a script where the output is
going to go. (Command-line only)
coot= None Set coot to True and optionally run=[run-number] to run Coot
with the current model and map for run run-number. In some wizards
(AutoBuild) you can edit the model and give it back to PHENIX to use
as part of the model-building process. If you just say coot then the
facts for the highest-numbered existing run will be shown.
(Command-line only)
ignore_blanks= None ignore_blanks allows you to have a command-line keyword
with a blank value like "input_lig_file_list="
stop= None You can stop the current wizard with "stopwizard" or "stop". If
you type "phenix.autobuild run=3 stop" then this will stop run 3 of
autobuild. (Command-line only)
display_facts= None Set display_facts to True and optionally
run=[run-number] to display the facts for run run-number. If
you just say display_facts then the facts for the
highest-numbered existing run will be shown. (Command-line
only)
display_summary= None Set display_summary to True and optionally
run=[run-number] to show the summary for run run-number.
If you just say display_summary then the summary for the
highest-numbered existing run will be shown. (Command-line
only)
carry_on= None Set carry_on to True to carry on with highest-numbered run
from where you left off. (Command-line only)
run= None Set run to n to continue with run n where you left off.
(Command-line only)
copy_run= None Set copy_run to n to copy run n to a new run and continue
where you left off. (Command-line only)
display_runs= None List all runs for this wizard. (Command-line only)
delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)
display_labels= None display_labels=test.mtz will list all the labels that
identify data in test.mtz. You can use the label strings
that are produced in AutoSol to identify which data to use
from a datafile like this: peak.data="F+ SIGF+ F- SIGF-" #
the entire string in quotes counts here You can use the
individual labels from these strings as identifiers for
data columns in AutoSol and AutoBuild like this:
input_refinement_labels="FP SIGFP FreeR_flags" # each
individual label counts
dry_run= False Just read in and check parameter names
build= True Run AutoBuild immediately after AutoMR (Command-line only)
data= None Datafile (any standard format) (Command-line only)
copies= None Set both copies_to_find and component_copies with copies. This
is the number of copies of this search model to find, and also the
number of copies of this sequence or mass in the asymmetric unit.
(Command-line only)
autobuild_two_fofc_in_rebuild= None Actively sets two_fofc_in_rebuild in
AutoBuild. NOTE: value is not checked
autobuild_include_input_model= None Actively sets include_input_model in
AutoBuild. NOTE: value is not checked
autobuild_n_cycle_rebuild_min= None Actively sets n_cycle_rebuild_min in
AutoBuild. NOTE: value is not checked
autobuild_n_cycle_rebuild_max= None Actively sets n_cycle_rebuild_max in
AutoBuild. NOTE: value is not checked
autobuild_debug= None Actively sets debug in AutoBuild. NOTE: value is not
checked
autobuild_n_cycle_build_min= None Actively sets n_cycle_build_min in
AutoBuild. NOTE: value is not checked
autobuild_n_cycle_build_max= None Actively sets n_cycle_build_max in
AutoBuild. NOTE: value is not checked
autobuild_rebuild_in_place= None Actively sets rebuild_in_place in
AutoBuild. NOTE: value is not checked
autobuild_thorough_denmod= None Actively sets thorough_denmod in AutoBuild.
NOTE: value is not checked
autobuild_i_ran_seed= None Actively sets i_ran_seed in AutoBuild. NOTE:
value is not checked
autobuild_start_chains_list= None Actively sets start_chains_list in
AutoBuild. NOTE: value is not checked
autobuild_input_refinement_file= None Actively sets input_refinement_file
in AutoBuild. NOTE: value is not checked
autobuild_input_refinement_labels= None Actively sets
input_refinement_labels in AutoBuild.
NOTE: value is not checked
autobuild_input_labels= None Actively sets input_labels in AutoBuild. NOTE:
value is not checked
autobuild_resolve_command_list= None Actively sets resolve_command_list in
AutoBuild. NOTE: value is not checked
autobuild_resolve_pattern_command_list= None Actively sets
resolve_pattern_command_list in
AutoBuild. NOTE: value is not
checked
autobuild_nproc= None Actively sets nproc in AutoBuild. NOTE: value is not
checked
autobuild_nbatch= None Actively sets nbatch in AutoBuild. NOTE: value is
not checked
ensembleID= ensemble_1 ID for this ensemble. (Command-line only)
copies_to_find= None Number of copies of this ensemble to find in a.u.
(Command-line only)
coords= None model(s) for this ensemble. (Command-line only)
identity= None percent identity(ies) of model(s) in this ensemble to
structure (alternative is RMS). (Command-line only)
RMS= None RMSD(s) of model(s) to structure (alternative is identity).
(Command-line only)
seq_file= None protein seq_file for this component. (Command-line only)
component_type= *protein nucleic_acid protein or nucleic acid.
(Command-line only)
mass= None molecular mass (Da) of this component. (Command-line only)
component_copies= None Number of copies of this component in the a.u.
(required). (Command-line only)
all_plausible_sg_list= None Choose which space groups to search
autobuild_input_list_add= None You can add keywords to those that AutoMR
passes on to AutoBuild (command-line only) The
format for this command is:
autobuild_input_list_add='semet refine' Then you
can set any of the variables you specify by
adding the prefix "autobuild_" to the name of
your variable: autobuild_semet=False
autobuild_refine=True This will now set
'semet'=False and refine=True in AutoBuild
background= *Yes No True False When you specify nproc=nn, you can run the
jobs in background (default if nproc is greater than 1) or
foreground (default if nproc=1). If you set run_command=qsub
(or otherwise submit to a batch queue), then you should set
background=False, so that the batch queue can keep track of
your runs. There is no need to use background=True in this case
because all the runs go as controlled by your batch system. If
you use run_command=csh (or similar, csh is default) then
normally you will use background=True so that all the jobs run
simultaneously.
build_type= *RESOLVE_AND_TEXTAL RESOLVE TEXTAL You can choose to build
models with RESOLVE and TEXTAL or either one, and how many
different models to build with RESOLVE. The more you build, the
more likely to get a complete model. Note that rebuild_in_place
can only be carried out with RESOLVE model-building
cell= 0.0 0.0 0.0 0.0 0.0 0.0 Enter cell parameter a b c alpha beta gamma
chain_type= *Auto PROTEIN DNA RNA You can specify whether to build protein,
DNA, or RNA chains. At present you can only build one of these
in a single run. If you have both DNA and protein, build one
first, then run AutoBuild again, supplying the prebuilt model
in the "input_lig_file_list" and build the other. NOTE: default
for this keyword is Auto, which means "carry out normal process
to guess this keyword". The process is to look at the sequence
file and/or input pdb file to see what the chain type is. If
there is more than one type, the type with the larger number
of residues is guessed. If you want to force the chain_type,
then set it to PROTEIN RNA or DNA.
clean_up= Yes *No True False At the end of the entire run the TEMP
directories will be removed if clean_up is True. The default is
No, keep these directories. If you want to remove them after your
run is finished use a command like "phenix.autobuild run=1
clean_up=True"
composition_num_list= 1 Enter number of copies of this component
coot_name= coot If your version of coot is called something else, then you
can specify that here.
debug= Yes *No True False You can have the wizard stop with error messages
about the code if you use debug. NOTE: you cannot use Pause with
debug.
do_anisotropy_correction= *Yes No True False Choose whether you want to
apply anisotropy correction
extra_verbose= Yes *No True False Facts and possible commands will be
printed every cycle if Yes
fixed_ensembleID_list= None Enter the ID (set with ensemble_1.ensembleID or
equivalent) of the component that is to be fixed.
NOTE 1: Each ensemble in fixed_ensembleID_list must
be defined. NOTE 2: you can enter more than one
fixed component if you want. If you do, then enter
fixed_euler_list in multiples of 3 numbers and also
fixed_frac_list in multiples of 3 numbers.
fixed_euler_list= 0.0 0.0 0.0 Enter Euler angles (from AutoMR or Phaser)
for fixed component defined with fixed_ensembleID_list.
NOTE 2: you can enter more than one fixed component if
you want. If you do, then enter fixed_euler_list in
multiples of 3 numbers and also fixed_frac_list in
multiples of 3 numbers.
fixed_frac_list= 0.0 0.0 0.0 Enter fractional offset (location) for fixed
component (from AutoMR or Phaser) for fixed component
defined with fixed_ensembleID_list. NOTE 2: you can enter
more than one fixed component if you want. If you do, then
enter fixed_euler_list in multiples of 3 numbers and also
fixed_frac_list in multiples of 3 numbers.
i_ran_seed= 84670 Random seed (positive integer) for model-building and
simulated annealing refinement
include_input_model= Yes *No True False The keyword include_input_model
defines whether the input model (if any) is to be
crossed with models that are derived from it, and the
best parts of each kept. Note that if
multiple_models=True and include_input_model=True then
no initial cycle of randomization will be carried out
and the keyword multiple_models_starting_resolution is
ignored. In most cases you should use
include_input_model=True If you want to generate
maximum diversity with multiple-models then you may
wish to use include_input_model=False. Also if you
want to decrease the amount of bias from your starting
model you may wish to use include_input_model=False.
input_data_file= None Enter a file with input structure factor data.
For structure factor data only (e.g., FP SIGFP) any format
is ok. If you have free R flags, phase information or HL
coefficients that you want to use then an mtz file is
required. If this file contains phase information, this
phase information should be experimental (i.e.,
MAD/SAD/MIR etc), and should not be density-modified
phases (enter any files with density-modified phases as
input_map_file instead). If you also specify a hires data
file, then FP and SIGFP will come from that data file (and
not this one) If an input_refinement_file is specified,
then F, Sigma, FreeR_flag (if present) from that file will
be used for refinement instead of this one.
input_label_string= None Choose the set of labels that represent the data
and sigma columns for your data. NOTE: Applies to input
data file for AutoMR. See also 'input_labels', which
applies to input data file for AutoBuild.
input_labels= None Labels for input data columns NOTE: Applies to input
data file for LigandFit and AutoBuild, but not to AutoMR. For
AutoMR use instead 'input_label_string'.
input_pdb_file= None You can enter a PDB file containing a starting model
of your structure NOTE: If you enter a PDB file then the
AutoBuild wizard will start right in with rebuild steps,
skipping the build process. If the model is very poor then
it may be better to leave it out as the build process
(which includes pattern recognition and recognition of
helical and strand fragments) is optimized for improving
poor maps, while the rebuild process is optimized for
better maps that can be produced by having a partial model.
input_refinement_file= None Data file to use for refinement. The data in
this file should not be corrected for anisotropy. It
will be combined with experimental phase information
(if any) from input_data_file for refinement. If you
leave this blank, then the data in the
input_data_file will be used in refinement. If no
anisotropy correction is applied to the data you do
not need to specify a datafile for refinement. If an
anisotropy correction is applied to the data files,
then you should enter an uncorrected datafile for
refinement. Any standard format is fine; normally
only F and sigF will be used. Bijvoet pairs and
duplicates will be averaged. If an mtz file is
provided then a free R flag can be read in as well.
Any HL coeffs and phase information in this file is
ignored. NOTE: default for this keyword is Auto,
which means "carry out normal process to guess this
keyword". This means if you specify "after_autosol"
in AutoBuild, AutoBuild will automatically take the
value from AutoSol. If you do not want this to
happen, you can specify None which means "No file"
input_refinement_labels= None Labels for input refinement file columns (FP
SIGFP FreeR_flag)
input_seq_file= None Enter name of file with 1-letter code of protein
sequence NOTES: 1. lines starting with >>> are
ignored and separate chains 2. FASTA format is fine 3. If
there are multiple copies of a chain, just enter one copy.
4. If you enter a PDB file for rebuilding and it has the
sequence you want, then the sequence file is not necessary.
NOTE: You can also enter the name of a PDB file that
contains SEQRES records, and the sequence from the SEQRES
records will be read, written to
seq_from_seqres_records.dat, and used as your input
sequence. NOTE: for AutoBuild you can specify
start_chains_list on the first line of your sequence file:
>>> start_chains_list 23 11 5 NOTE: default for
this keyword is Auto, which means "carry out normal process
to guess this keyword". This means if you specify
"after_autosol" in AutoBuild, AutoBuild will automatically
take the value from AutoSol. If you do not want this to
happen, you can specify None which means "No file"
input_seq_file_list= None The keyword input_seq_file_list is used in AutoMR
to specify the molecular masses of the components of
the unit cell using a set of sequence files. Usually
you should input the sequences of the actual
components of the unit cell here (one sequence file
for each component). NOTE: If no input_seq_file is
specified, then the sequences from input_seq_file_list
are used to create a new file "composite_seq.dat" with
all their sequences and this is used as the
input_seq_file. NOTE: the format of each file in
input_seq_file_list is the 1-letter code of the
protein sequence (separate chains with >>>)
max_wait_time= 100.0 You can specify the length of time (seconds) to wait
when testing the run_command. If you have a cluster where
jobs do not start right away you may need a longer time to
wait.
min_seq_identity_percent= 50.0 The sequence in your input PDB file will be
adjusted to match the sequence in your sequence
file (if any). If there are insertions/deletions
in your model and the wizard does not seem to
identify them, you can split up your PDB file by
adding records like this: BREAK You can specify
the minimum sequence identity between your
sequence file and a segment from your input PDB
file to consider the sequences to be matched.
Default is 50.0%. You might want a higher number
to make sure that deletions in the sequence are
noticed.
n_cycle_build_max= None Maximum number of cycles for iterative
model-building, starting from experimental phases
without a model. Even if a satisfactory model is not
found, a maximum of n_cycle_build_max cycles will be
carried out.
n_cycle_build_min= None Minimum number of cycles for iterative
model-building, starting from experimental phases
without a model. Even if a satisfactory model is found,
n_cycle_build_min cycles will be carried out.
n_cycle_rebuild_max= None Maximum number of cycles for iterative
model-rebuilding, starting from a model. Even if a
satisfactory model is not found, a maximum of
n_cycle_rebuild_max cycles will be carried out.
n_cycle_rebuild_min= None Minimum number of cycles for iterative
model-rebuilding, starting from a model. Even if a
satisfactory model is found, n_cycle_rebuild_min
cycles will be carried out.
nbatch= 1 You can specify the number of processors to use (nproc) and the
number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors available
and leave nbatch alone. If you leave nbatch as None it will be set
automatically, with a value depending on the Wizard. This is
recommended. The value of nbatch can affect the results that you
get, as the jobs are not split into exact replicates, but are
rather run with different random numbers. If you want to get the
same results, keep the same value of nbatch.
nproc= 1 You can specify the number of processors to use (nproc) and the
number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors available
and leave nbatch alone. If you leave nbatch as None it will be set
automatically, with a value depending on the Wizard. This is
recommended. The value of nbatch can affect the results that you
get, as the jobs are not split into exact replicates, but are rather
run with different random numbers. If you want to get the same
results, keep the same value of nbatch.
overlap_allowed= None Solutions with no C-alpha clashes will be accepted.
If the best packing has some clashes, solutions with that
number of clashes will be accepted, as long as this does
not exceed the maximum allowed. You can choose to increase
the maximum if the packing is tight and your search
molecule is not exactly the same as the molecule in the
cell. If you leave it blank then Phaser will decide for
you.
rebuild_after_mr= *Yes No True False You can choose to go right on to the
AutoBuild wizard with the rebuild-in-place option after
running molecular replacement.
rebuild_in_place= *Auto Yes No True False You can choose to rebuild your
model while fixing the sequence alignment by iteratively
rebuilding segments within the model. This is done
n_rebuild_in_place times, then the models are recombined,
taking the best-fitting parts of each. Crossovers are allowed
where the main-chain atom rmsd is less than dist_close. Note
that the sequence of the input model must match the
supplied sequence closely enough to allow a clear
alignment. Also this method does not build any new chain,
it just moves the existing model around. Normally this
procedure is useful if the model is greater than 95%
identical with the target sequence. You can include
information directly from the starting model if you want
with the keyword include_input_model. Then this model
will be recombined with the models that are built based
on it. Note that this requires that the input model have
a sequence that is identical to the model to be rebuilt.
You can also rebuild just a portion of the model with the
keywords rebuild_res_start_list 3
rebuild_res_end_list 4 rebuild_chain_list chain1 (use " "
for blank). The residues from 3 to 4 of chain1 will be
rebuilt. You can specify more than one region by using
the Parameter Group Options button to add lines. NOTE: if
a region cannot be rebuilt the original coordinates will
be preserved for that region.
resolution= 0.0 Enter the high-resolution limit for MR search. All the data
input will be written out regardless of your choice. By
default, the final rigid-body refinement will use all data.
resolution_build= 0.0 Enter the high-resolution limit for model-building.
If 0.0, the value of resolution is used as a default.
run_command= csh When you specify nproc=nn, you can run the subprocesses as
jobs in background with csh (default) or submit them to a
queue with the command of your choice (e.g., qsub). If you
have a multi-processor machine, use csh. If you have a
cluster, use qsub or the equivalent command for your system.
NOTE: If you set run_command=qsub (or otherwise submit to a
batch queue), then you should set background=False, so that
the batch queue can keep track of your runs. There is no need
to use background=True in this case because all the runs go as
controlled by your batch system. If you use run_command=csh
(or similar, csh is default) then normally you will use
background=True so that all the jobs run simultaneously.
selection_criteria_rot= *Percent_of_best Number_of_solutions Z_score All
Choose a criterion for keeping rotation solutions at
each stage. The choices are:
Percent of Best Score: AutoMR looks down the list of
LLG scores and only keeps the ones that differ from
the mean by more than the chosen percentage, compared
to the top solution. Enter your desired percentage
into the entry field (default=75%).
Number of Solutions: Keep the N top solutions (you can
set N; default=1).
Z-score: Keep all the solutions with a Z-score greater
than X (you can set X; default=6).
All: Keep everything and go on holiday while Phaser
crunches through it all (definitely not recommended!).
selection_criteria_rot_value= 75 Choose a value for your criterion for
keeping rotation solutions at each stage.
Percent of Best Score: AutoMR looks down the
list of LLG scores and only keeps the ones
that differ from the mean by more than the
chosen percentage, compared to the top
solution. Enter your desired percentage into
the entry field (default=75%).
Number of Solutions: Keep the N top solutions
(you can set N; default=1).
Z-score: Keep all the solutions with a Z-score
greater than X (you can set X; default=6).
All: Keep everything and go on holiday while
Phaser crunches through it all (definitely not
recommended!).
semet= Yes *No True False You can specify that the dataset that is used for
refinement is a selenomethionine dataset, and that the model should
be the SeMet version of the protein, with all SD of MET replaced
with Se of MSE.
sg= None Space Group symbol (e.g., C2221 or C 2 2 21)
skip_xtriage= Yes *No True False You can bypass xtriage if you want. This
will prevent you from applying anisotropy corrections,
however.
start_chains_list= None You can specify the starting residue number for
each of the unique chains in your structure. If you use
a sequence file then the unique chains are extracted and
the order must match the order of your starting residue
numbers. For example, if your sequence file has chains A
and B (identical) and chains C and D (identical to each
other, but different than A and B) then you can enter 2
numbers, the starting residues for chains A and C. NOTE:
you need to specify an input sequence file for
start_chains_list to be applied.
temp_dir= None Define a temporary directory (it must exist)
thorough_denmod= *Auto Yes No True False Choose whether you want to go for
thorough density modification when no model is used ("No"
speeds it up and for a terrible map is sometimes better)
title= Run 1 AutoMR Mon May 26 12:09:04 2008 Enter any text you like to
help identify what you did in this run
top_output_dir= None This is used in subprocess calls of wizards and to
tell the Wizard where to look for the STOPWIZARD file.
two_fofc_in_rebuild= Yes *No True False You can choose to use a
sigmaa-weighted 2Fo-Fc map in all cycles of rebuilding
instead of a density-modified map. If the model is
poor this can sometimes allow model-building in place
to work even when it will not for density-modified
maps.
use_all_plausible_sg= Yes *No True False Normally you will want to search
all space groups with the same point group as you may
not know which is correct from your data. You can
select which of these to choose using 'Choose
variable to set' and selecting
'all_plausible_sg_list'
verbose= Yes *No True False Command files and other verbose output will be
printed
weight_list= 0.0 Molecular weight of component (Da; e.g. 30000)
weight_seq_list= None Choose whether to define composition through
molecular weight or sequence
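The rotation-solution criteria described under selection_criteria_rot above can be illustrated with a small Python sketch. This is not Phaser or AutoMR code: the function name select_solutions and the example scores are hypothetical, "Percent of Best" is read here as a scale on which the top score is 100% and the mean is 0%, and the Z-score is assumed to be computed from the mean and standard deviation of the scores supplied, which may differ from how Phaser derives it.

```python
from statistics import mean, pstdev


def select_solutions(scores, criterion="Percent_of_best", value=75.0):
    """Return the scores kept under the chosen criterion, best first.

    Hypothetical helper for illustration only; it is not part of PHENIX.
    """
    ranked = sorted(scores, reverse=True)
    if not ranked or criterion == "All":
        # "All" keeps every solution.
        return ranked
    if criterion == "Number_of_solutions":
        # Keep the N top solutions.
        return ranked[: int(value)]

    mu = mean(ranked)
    if criterion == "Percent_of_best":
        # Assumption: top score = 100%, mean = 0%; keep scores at or
        # above the chosen percentage on that scale.
        cutoff = mu + (value / 100.0) * (ranked[0] - mu)
        return [s for s in ranked if s >= cutoff]
    if criterion == "Z_score":
        # Assumption: Z = (score - mean) / population standard deviation
        # of the scores passed in.
        sigma = pstdev(ranked)
        if sigma == 0:
            return ranked
        return [s for s in ranked if (s - mu) / sigma > value]
    raise ValueError("unknown criterion: %s" % criterion)


# Made-up LLG-like scores, for illustration only.
example_scores = [100, 90, 40, 30, 20, 20]
kept = select_solutions(example_scores, "Percent_of_best", 75)
```

With these made-up scores the mean is 50, so a 75% cutoff falls at 87.5 and only the two highest scores are kept. Lowering the percentage keeps more rotation solutions for the later search stages, at the cost of run time.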
ensemble_1
ensembleID= ensemble_1 ID for this ensemble. (Command-line only)
copies_to_find= None Number of copies of this ensemble to find in a.u.
(Command-line only)
coords= None model(s) for this ensemble. (Command-line only)
identity= None percent identity(ies) of model(s) in this ensemble to
structure (alternative is RMS). (Command-line only)
RMS= None RMSD(s) of model(s) to structure (alternative is identity).
(Command-line only)
ensemble_2
ensembleID= ensemble_2 ID for this ensemble. (Command-line only)
copies_to_find= None Number of copies of this ensemble to find in a.u.
(Command-line only)
coords= None model(s) for this ensemble. (Command-line only)
identity= None percent identity(ies) of model(s) in this ensemble to
structure (alternative is RMS). (Command-line only)
RMS= None RMSD(s) of model(s) to structure (alternative is identity).
(Command-line only)
ensemble_3
ensembleID= ensemble_3 ID for this ensemble. (Command-line only)
copies_to_find= None Number of copies of this ensemble to find in a.u.
(Command-line only)
coords= None model(s) for this ensemble. (Command-line only)
identity= None percent identity(ies) of model(s) in this ensemble to
structure (alternative is RMS). (Command-line only)
RMS= None RMSD(s) of model(s) to structure (alternative is identity).
(Command-line only)
ensemble_4
ensembleID= ensemble_4 ID for this ensemble. (Command-line only)
copies_to_find= None Number of copies of this ensemble to find in a.u.
(Command-line only)
coords= None model(s) for this ensemble. (Command-line only)
identity= None percent identity(ies) of model(s) in this ensemble to
structure (alternative is RMS). (Command-line only)
RMS= None RMSD(s) of model(s) to structure (alternative is identity).
(Command-line only)
ensemble_5
ensembleID= ensemble_5 ID for this ensemble. (Command-line only)
copies_to_find= None Number of copies of this ensemble to find in a.u.
(Command-line only)
coords= None model(s) for this ensemble. (Command-line only)
identity= None percent identity(ies) of model(s) in this ensemble to
structure (alternative is RMS). (Command-line only)
RMS= None RMSD(s) of model(s) to structure (alternative is identity).
(Command-line only)
component_1
seq_file= None protein seq_file for this component. (Command-line only)
component_type= *protein nucleic_acid protein or nucleic acid.
(Command-line only)
mass= None molecular mass (kDa) of this component. (Command-line only)
component_copies= None Number of copies of this component in the a.u.
(required). (Command-line only)
component_2
seq_file= None protein seq_file for this component. (Command-line only)
component_type= *protein nucleic_acid protein or nucleic acid.
(Command-line only)
mass= None molecular mass (kDa) of this component. (Command-line only)
component_copies= None Number of copies of this component in the a.u.
(required). (Command-line only)
component_3
seq_file= None protein seq_file for this component. (Command-line only)
component_type= *protein nucleic_acid protein or nucleic acid.
(Command-line only)
mass= None molecular mass (kDa) of this component. (Command-line only)
component_copies= None Number of copies of this component in the a.u.
(required). (Command-line only)
component_4
seq_file= None protein seq_file for this component. (Command-line only)
component_type= *protein nucleic_acid protein or nucleic acid.
(Command-line only)
mass= None molecular mass (kDa) of this component. (Command-line only)
component_copies= None Number of copies of this component in the a.u.
(required). (Command-line only)
component_5
seq_file= None protein seq_file for this component. (Command-line only)
component_type= *protein nucleic_acid protein or nucleic acid.
(Command-line only)
mass= None molecular mass (kDa) of this component. (Command-line only)
component_copies= None Number of copies of this component in the a.u.
(required). (Command-line only)
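The component entries above together describe the total contents of the asymmetric unit: each component contributes its mass multiplied by its number of copies. A minimal Python sketch of that arithmetic follows. The Component class and total_mass_da function are hypothetical names, not part of PHENIX, and the sketch takes the kDa unit given for mass in this listing at face value (weight_list, by contrast, is given in Da).

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One component of the asymmetric unit (hypothetical container)."""
    mass_kda: float  # molecular mass, assumed to be in kDa as listed above
    copies: int      # number of copies in the asymmetric unit


def total_mass_da(components):
    """Sum mass x copies over all components and return the total in Da."""
    return sum(c.mass_kda * 1000.0 * c.copies for c in components)


# Made-up example: two copies of a 23 kDa protein plus one 7.5 kDa component.
contents = [Component(mass_kda=23.0, copies=2), Component(mass_kda=7.5, copies=1)]
total = total_mass_da(contents)
```

The total covers everything in the asymmetric unit, not just the search models, so a component with no search ensemble still has to be counted.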