Python-based Hierarchical ENvironment for Integrated Xtallography
Automated molecular replacement with AutoMR
Author(s)
Purpose
Purpose of the AutoMR Wizard
The AutoMR Wizard provides a convenient interface to Phaser molecular replacement and feeds the results of molecular replacement directly into the AutoBuild Wizard for automated model rebuilding. The AutoMR Wizard starts from a data file with structure factor amplitudes and uncertainties, plus one or more search models, and identifies placements of the search models that are compatible with the data.
Usage
The AutoMR Wizard can be run from the PHENIX GUI, from the command line, and from keyworded script files. All three versions are identical except in the way that they take commands from the user. See Running a Wizard from a GUI, the command-line, or a script for details of how to run a Wizard. The command-line version is described here.
NOTE: You may find it easiest to run the GUI version of AutoMR while you are learning how to use it, and then move to the command-line or script versions later, as the GUI version takes you through all the necessary steps of organizing your data.
Summary of inputs and outputs for AutoMR
Input data file. This file can be in almost any format, and must contain either amplitudes or intensities, together with their sigmas. You can specify what resolution to use for molecular replacement and, separately, what resolution to use for model rebuilding. If you specify "0.0" for resolution (recommended), then defaults will be used for molecular replacement (i.e., use data to 2.5 A, if available, to solve the structure, then carry out rigid-body refinement of the final solution with all data) and all the data will be used for model rebuilding.
Composition of the asymmetric unit. PHASER needs to know the total mass in the asymmetric unit (i.e., not just the mass of the search models). You can define this either by specifying one or more protein or nucleic acid sequence files, or by specifying protein or nucleic acid molecular masses, and telling the Wizard how many copies of each are present.
Space groups to search. You can request that all space groups with the same point group as the one you start with be searched and the best one chosen. If you select this option, then the best space group will also be used for model rebuilding in AutoBuild.
Ensembles to search for. AutoMR builds up a model by finding a set of good positions and orientations of one "ensemble", and then using each of those placements as a starting point for finding the next ensemble, until all the contents of the asymmetric unit are found and a consistent solution is obtained. You can specify any number of different ensembles to search for, and you can search for any number of copies of each ensemble. The order in which ensembles are searched for does make a difference: if possible, search for the biggest, best-ordered, most accurate ensemble first. You specify the order when you list the ensembles to search for on the last main window of the AutoMR Wizard.
Each ensemble can be specified by a single PDB file or a set of PDB files. All the PDB files in one ensemble must be oriented in the same way, as they are always put together and used as a group in the molecular replacement process. You will need to specify how similar you think each input PDB file in an ensemble is to the structure in your crystal, either as sequence identity or as expected RMSD. Note that if you use a homology model, you should give the sequence identity of the template from which the model was constructed, not the 100% identity of the model!
Output files from AutoMR
When you run AutoMR, the output files will be in a subdirectory named with your run number:
AutoMR_run_1_/ # subdirectory with results
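If you prefer to describe a component by its molecular mass (mass=, or component_N.mass=) rather than by a sequence file, you need an estimate of that mass. The short Python sketch below is one way to get a rough value; it is not part of PHENIX. It assumes a protein sequence in 1-letter code, treats lines starting with > as chain separators, and uses standard average residue masses in daltons (the examples on this page give masses such as 23000, i.e. in Da). Enter each distinct chain once: the result is the mass of one copy, and the number of copies is given separately with copies or component_copies. Nucleic acids are not handled.

```python
# Approximate average masses (Da) of the 20 standard amino-acid residues,
# i.e. the mass each residue adds to a chain (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02    # added once per chain for the free N and C termini
UNKNOWN = 110.0  # rough mass used for any other letter (e.g. X)


def read_chains(text):
    """Split 1-letter-code sequence text into chains.

    Lines starting with '>' separate chains; whitespace inside
    sequence lines is ignored.
    """
    chains, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current:
                chains.append("".join(current))
                current = []
        elif line:
            current.append("".join(line.split()).upper())
    if current:
        chains.append("".join(current))
    return chains


def protein_mass(text):
    """Approximate molecular mass (Da) of all chains in the sequence text."""
    total = 0.0
    for chain in read_chains(text):
        total += sum(RESIDUE_MASS.get(aa, UNKNOWN) for aa in chain) + WATER
    return total
```

For example, round(protein_mass(open("seq.dat").read())) gives a value you could pass as component_1.mass; seq.dat is a placeholder file name. A sequence file is still the more direct route, since AutoMR can read it itself.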
Model rebuilding. After PHASER molecular replacement, the AutoMR Wizard loads the AutoBuild Wizard and sets its defaults based on the MR solution that has just been found. You can use the default values, or you may choose to use 2Fo-Fc maps instead of density-modified maps for rebuilding, or to start the model rebuilding with the map coefficients from MR.MAP_COEFFS.1.mtz.
How to run the AutoMR Wizard
Running the AutoMR Wizard is easy. For example, from the command line you can type:
phenix.automr native.sca search.pdb RMS=0.8 mass=23000 copies=1
The AutoMR Wizard will find the best location and orientation of the search model search.pdb in the unit cell based on the data in native.sca, assuming that the RMSD between the correct model and search.pdb is about 0.8 A, that the molecular mass of the true model is 23000 Da, and that there is 1 copy of this model in the asymmetric unit. Once the AutoMR Wizard has found a solution, it will automatically call the AutoBuild Wizard and rebuild the model.
Components, copies, search models, and ensembles
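What the AutoMR Wizard needs to run
In a simple case where you have one search model and are looking for N copies of this model in your structure, you need: a data file (data), a search model (coords), the expected similarity of the search model to your structure (RMS or identity), the composition of the asymmetric unit (seq_file or mass), and the number of copies, N (copies).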
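For the simple case of one search model (or one ensemble of aligned models) and one component, the keywords can be assembled programmatically. The Python sketch below is a hypothetical helper, not part of PHENIX: it uses only keyword names from the automr scope in the keyword list at the end of this page (data, coords, RMS, identity, seq_file, mass, copies), checks that exactly one similarity measure and exactly one composition measure are given, and checks that there is one RMS or identity value per model.

```python
import shlex


def as_list(value):
    """Wrap a single value in a list; convert lists and tuples to lists."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def automr_args(data, coords, copies, rms=None, identity=None,
                seq_file=None, mass=None):
    """Build phenix.automr arguments for one ensemble and one component.

    coords may be one PDB file or several (an ensemble of aligned models);
    give one RMS or identity value per model. Exactly one of rms/identity
    and exactly one of seq_file/mass must be supplied.
    """
    if (rms is None) == (identity is None):
        raise ValueError("give exactly one of rms or identity")
    if (seq_file is None) == (mass is None):
        raise ValueError("give exactly one of seq_file or mass")
    models = as_list(coords)
    if rms is not None:
        key, values = "RMS", as_list(rms)
    else:
        key, values = "identity", as_list(identity)
    if len(values) != len(models):
        raise ValueError("need one %s value per model in the ensemble" % key)
    composition = ("seq_file=%s" % seq_file if seq_file is not None
                   else "mass=%s" % mass)
    return [
        "phenix.automr",
        "data=%s" % data,
        "coords=%s" % " ".join(models),
        "%s=%s" % (key, " ".join(str(v) for v in values)),
        composition,
        "copies=%d" % copies,
    ]


print(shlex.join(automr_args("native.sca", "search.pdb", 1, rms=0.8, mass=23000)))
```

This prints phenix.automr data=native.sca coords=search.pdb RMS=0.8 mass=23000 copies=1, presumably the same search as the positional example above written with explicit keywords. The list can also be passed directly to subprocess.run, which avoids shell quoting of multi-model values such as coords=m1.pdb m2.pdb.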
It may be advantageous to search using an ensemble of similar structures rather than a single structure. If you have an ensemble of search models to search for, specify it as coords='model_1.pdb model_2.pdb model_3.pdb'. In this case you need to give the RMS or identity for each model, for example identity='45 40 35'. Each of the models in the ensemble must be in the same orientation as the others, so that the ensemble of models can be placed as a group in the unit cell.
If you are searching for more than one ensemble, or if there is more than one component in the a.u., then use the full syntax and specify them as follows (NOTE: copies becomes copies_to_find for an ensemble and component_copies for a component):
ensemble_1.coords=s1.pdb ensemble_1.RMS=0.8 ensemble_1.copies_to_find=1 \
component_1.mass=23000 component_1.component_copies=1
Specifying which columns of data to use from input data files
If one or more of your data files has column names that the Wizard cannot identify automatically, you can specify them yourself. You will need to provide one column "name" for each expected column of data, with "None" for anything that is missing. For example, if your data file data.mtz has columns F SIGF, then you might specify
data=data.mtz input_label_string="F SIGF"
You can find out all the possible label strings in a data file that you might use by typing:
phenix.automr display_labels=data.mtz # display all labels for data.mtz
You can specify many more parameters as well. See the list of keywords, defaults and descriptions at the end of this page, and also Running a Wizard from a GUI, the command-line, or a script for general information about running Wizards.
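Before a long run, it can help to check that every label in input_label_string (or input_refinement_labels) actually exists in your file. The Python sketch below is a hypothetical pre-check, not part of PHENIX: you supply the column labels yourself, for example copied from the display_labels output, and it reports any requested label that is absent, skipping None placeholders. Passing this check does not guarantee that the Wizard will accept the string, since the string must also match a label set the Wizard can extract from the file.

```python
def missing_labels(label_string, available_labels):
    """List the labels in a label string that are not present in the data file.

    label_string: e.g. "F SIGF"; the word None marks a column you do not have.
    available_labels: column labels present in the file (for example, copied
    from the output of display_labels).
    """
    present = set(available_labels)
    return [label for label in label_string.split()
            if label != "None" and label not in present]


print(missing_labels("FP SIGFP", ["H", "K", "L", "F", "SIGF"]))
```

Here the requested FP and SIGFP are both reported as missing, which suggests using input_label_string="F SIGF" for this file instead.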
Some of the most common parameters are:
data=w1.sca # data file
model=coords.pdb # starting model
seq_file=seq.dat # sequence file
Examples
Standard AutoMR run with coords.pdb native.sca
Run AutoMR using coords.pdb as the search model and native.sca as data, assuming that the RMS between coords.pdb and the true model is about 0.85 A, that the sequence of the true model is in seq.dat, and that there is 1 copy in the asymmetric unit:
phenix.automr coords.pdb native.sca RMS=0.85 seq.dat copies=1 \
n_cycle_rebuild_max=2 n_cycle_build_max=2
Specifying data columns
Run AutoMR as above, but specify the data columns explicitly:
phenix.automr coords.pdb RMS=0.85 seq.dat copies=1 \
data=data.mtz input_label_string="F SIGF" \
n_cycle_rebuild_max=2 n_cycle_build_max=2
Note that the data columns are specified by a string that includes both F and SIGF: "F SIGF". The string must match some set of data labels that can be extracted automatically from your data file. You can find the possible values of this string, as described above, with
phenix.automr display_labels=data.mtz
Specifying a refinement file for AutoBuild
Run AutoMR as above, but specify a refinement file that is different from the file used for the MR search:
phenix.automr coords.pdb RMS=0.85 seq.dat copies=1 \
data=data.mtz input_label_string="F SIGF" \
input_refinement_file=refinement.mtz \
input_refinement_labels="FP SIGFP FreeR_flag" \
n_cycle_rebuild_max=2 n_cycle_build_max=2
Note that the keywords input_refinement_file and input_refinement_labels are in the scope "autobuild_variables". These keywords, and the others in this scope, are passed on to AutoBuild.
Passing any commands to AutoBuild
You can pass any AutoBuild command on to AutoBuild, even if it is not already defined for you in AutoMR. Use the keyword autobuild_input_list_add to list the command, and then set its value by adding "autobuild_" to the beginning of the command name. For example, to add the commands semet=True and refine=False:
phenix.automr coords.pdb RMS=0.85 seq.dat copies=1 \
data=data.mtz input_label_string="F SIGF" \
autobuild_input_list_add='semet refine' \
autobuild_semet=True \
autobuild_refine=False
Notes. This applies only to command-line operation of AutoMR.
Note that any keywords that are used in both AutoBuild and AutoMR will apply to
both if you specify them in autobuild_input_list_add. For example if you
set the resolution in AutoBuild with autobuild_input_list_add=resolution and
resolution=2.6 then this resolution will apply to both AutoMR and
AutoBuild.
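The two-step pattern for passing extra keywords to AutoBuild (list the names in autobuild_input_list_add, then set each one with the autobuild_ prefix) is easy to get out of step by hand. The hypothetical Python helper below, not part of PHENIX, generates both parts from one dictionary so that the listed names and the prefixed settings always agree.

```python
import shlex


def autobuild_passthrough(settings):
    """Turn {AutoBuild keyword: value} into AutoMR command-line arguments.

    Each keyword is listed once in autobuild_input_list_add and then set
    with an "autobuild_" prefix on its name.
    """
    if not settings:
        raise ValueError("no AutoBuild keywords given")
    names = list(settings)  # dicts keep insertion order (Python 3.7+)
    args = ["autobuild_input_list_add=" + " ".join(names)]
    for name in names:
        args.append("autobuild_%s=%s" % (name, settings[name]))
    return args


print(shlex.join(autobuild_passthrough({"semet": True, "refine": False})))
```

shlex.join quotes the first argument as 'autobuild_input_list_add=semet refine', which the shell passes as the same single argument as autobuild_input_list_add='semet refine'; the other two arguments are autobuild_semet=True and autobuild_refine=False.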
AutoMR searching for 2 components
Run AutoMR on a structure with 2 components. Define the components of the asymmetric unit with sequence files (beta.seq and blip.seq) and the number of copies of each component (1). Define the search models with PDB files and the estimated RMS from the true structures.
phenix.automr data=beta_blip_P3221.mtz input_label_string="Fobs Sigma" \
resolution=0.0 resolution_build=3.0 \
component_1.component_type=protein component_1.seq_file=beta.seq \
component_1.component_copies=1 \
component_2.component_type=protein component_2.seq_file=blip.seq \
component_2.component_copies=1 \
ensemble_1.coords=beta.pdb ensemble_1.RMS=0.85 ensemble_1.copies_to_find=1 \
ensemble_2.coords=blip.pdb ensemble_2.RMS=0.90 ensemble_2.copies_to_find=1 \
n_cycle_rebuild_max=1
Specifying molecular masses of 2 components
Run AutoMR as in the previous example, except specify the components of the asymmetric unit with molecular masses (30000 and 20000), and define the search models with PDB files and percent sequence identity with the true structures (50% and 60%).
phenix.automr data=beta_blip_P3221.mtz input_label_string="Fobs Sigma" \
resolution=0.0 resolution_build=3.0 \
component_1.component_type=protein component_1.mass=30000 \
component_1.component_copies=1 \
component_2.component_type=protein component_2.mass=20000 \
component_2.component_copies=1 \
ensemble_1.coords=beta.pdb ensemble_1.identity=50 ensemble_1.copies_to_find=1 \
ensemble_2.coords=blip.pdb ensemble_2.identity=60 ensemble_2.copies_to_find=1 \
n_cycle_rebuild_max=1
AutoMR searching for 2 components, but specifying the orientation of one of them
Run AutoMR on a structure with 2 components. Define the components of the asymmetric unit with sequence files (beta.seq and blip.seq) and the number of copies of each component (1). Define the search models with PDB files and the estimated RMS from the true structures. Define the orientation and position of one component.
Define the number of copies to find for each ensemble (0 for beta, which is fixed; 1 for blip).
phenix.automr data=beta_blip_P3221.mtz input_label_string="Fobs Sigma" \
resolution=0.0 resolution_build=3.0 \
component_1.component_type=protein component_1.seq_file=beta.seq \
component_1.component_copies=1 \
component_2.component_type=protein component_2.seq_file=blip.seq \
component_2.component_copies=1 \
ensemble_1.coords=beta.pdb ensemble_1.RMS=0.85 ensemble_1.copies_to_find=0 \
ensemble_1.ensembleID="beta" \
ensemble_2.coords=blip.pdb ensemble_2.RMS=0.90 ensemble_2.copies_to_find=1 \
ensemble_2.ensembleID="blip" \
n_cycle_rebuild_max=1 \
fixed_ensembleID_list="beta" \
fixed_euler_list="199.84,41.535,184.15" \
fixed_frac_list="-0.49736,-0.15895,-0.28067"
Note: you have to define an ensemble for the fixed molecule (beta in this example).
Possible Problems
Specific limitations and problems
Literature
Additional information
List of all AutoMR keywords
-------------------------------------------------------------------------------
Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
automr
build= True Run AutoBuild immediately after AutoMR (Command-line only)
data= None Datafile (any standard format) (Command-line only)
copies= None Set both copies_to_find and component_copies with copies. This
is the number of copies of this search model to find, and also the
number of copies of this sequence or mass in the asymmetric unit.
(Command-line only)
ensembleID= ensemble_1 ID for this ensemble. (Command-line only)
copies_to_find= None Number of copies of this ensemble to find in a.u.
(Command-line only)
coords= None model(s) for this ensemble. (Command-line only)
identity= None percent identity(ies) of model(s) in this ensemble to
structure (alternative is RMS). (Command-line only)
RMS= None RMSD(s) of model(s) to structure (alternative is identity).
(Command-line only)
seq_file= None protein seq_file for this component. (Command-line only)
component_type= *protein nucleic_acid protein or nucleic acid.
(Command-line only)
mass= None molecular mass (Da) of this component. (Command-line only)
component_copies= None Number of copies of this component in the a.u.
(required). (Command-line only)
special_keywords
write_run_directory_to_file= None Writes the full name of a run
directory to the specified file. This can
be used as a call-back to tell a script
where the output is going to go.
(Command-line only)
run_control
coot= None Set coot to True and optionally run=[run-number] to run Coot
with the current model and map for run run-number. In some wizards
(AutoBuild) you can edit the model and give it back to PHENIX to
use as part of the model-building process. If you just say coot
then the facts for the highest-numbered existing run will be
shown. (Command-line only)
ignore_blanks= None ignore_blanks allows you to have a command-line
keyword with a blank value like "input_lig_file_list="
stop= None You can stop the current wizard with "stopwizard" or "stop".
If you type "phenix.autobuild run=3 stop" then this will stop run
3 of autobuild. (Command-line only)
display_facts= None Set display_facts to True and optionally
run=[run-number] to display the facts for run run-number.
If you just say display_facts then the facts for the
highest-numbered existing run will be shown.
(Command-line only)
display_summary= None Set display_summary to True and optionally
run=[run-number] to show the summary for run
run-number. If you just say display_summary then the
summary for the highest-numbered existing run will be
shown. (Command-line only)
carry_on= None Set carry_on to True to carry on with highest-numbered
run from where you left off. (Command-line only)
run= None Set run to n to continue with run n where you left off.
(Command-line only)
copy_run= None Set copy_run to n to copy run n to a new run and continue
where you left off. (Command-line only)
display_runs= None List all runs for this wizard. (Command-line only)
delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)
display_labels= None display_labels=test.mtz will list all the labels
that identify data in test.mtz. You can use the label
strings that are produced in AutoSol to identify which
data to use from a datafile like this: peak.data="F+
SIGF+ F- SIGF-" # the entire string in quotes counts
here. You can use the individual labels from these
strings as identifiers for data columns in AutoSol and
AutoBuild like this: input_refinement_labels="FP SIGFP
FreeR_flags" # each individual label counts
dry_run= False Just read in and check parameter names
params_only= False Just read in and return parameter defaults
display_all= False Just read in and display parameter defaults
autobuild_variables
two_fofc_in_rebuild= None Actively sets two_fofc_in_rebuild in
AutoBuild. NOTE: value is not checked
include_input_model= None Actively sets include_input_model in
AutoBuild. NOTE: value is not checked
n_cycle_rebuild_min= None Actively sets n_cycle_rebuild_min in
AutoBuild. NOTE: value is not checked
n_cycle_rebuild_max= None Actively sets n_cycle_rebuild_max in
AutoBuild. NOTE: value is not checked
n_cycle_build_min= None Actively sets n_cycle_build_min in AutoBuild.
NOTE: value is not checked
n_cycle_build_max= None Actively sets n_cycle_build_max in AutoBuild.
NOTE: value is not checked
rebuild_in_place= None Actively sets rebuild_in_place in AutoBuild.
NOTE: value is not checked
thorough_denmod= None Actively sets thorough_denmod in AutoBuild. NOTE:
value is not checked
i_ran_seed= None Actively sets i_ran_seed in AutoBuild. NOTE: value is
not checked
start_chains_list= None Actively sets start_chains_list in AutoBuild.
NOTE: value is not checked
input_refinement_file= None Actively sets input_refinement_file in
AutoBuild. NOTE: value is not checked
input_refinement_labels= None Actively sets input_refinement_labels in
AutoBuild. NOTE: value is not checked
input_labels= None Actively sets input_labels in AutoBuild. NOTE: value
is not checked
resolve_command_list= None Actively sets resolve_command_list in
AutoBuild. NOTE: value is not checked
resolve_pattern_command_list= None Actively sets
resolve_pattern_command_list in AutoBuild.
NOTE: value is not checked
morph= None Actively sets morph in AutoBuild. NOTE: value is not checked
morph_rad= None Actively sets morph_rad in AutoBuild. NOTE: value is not
checked
ensemble_1
ensembleID= ensemble_1 ID for this ensemble. (Command-line only)
copies_to_find= None Number of copies of this ensemble to find in a.u.
(Command-line only)
coords= None model(s) for this ensemble. (Command-line only)
identity= None percent identity(ies) of model(s) in this ensemble to
structure (alternative is RMS). (Command-line only)
RMS= None RMSD(s) of model(s) to structure (alternative is identity).
(Command-line only)
ensemble_2
ensembleID= ensemble_2 ID for this ensemble. (Command-line only)
copies_to_find= None Number of copies of this ensemble to find in a.u.
(Command-line only)
coords= None model(s) for this ensemble. (Command-line only)
identity= None percent identity(ies) of model(s) in this ensemble to
structure (alternative is RMS). (Command-line only)
RMS= None RMSD(s) of model(s) to structure (alternative is identity).
(Command-line only)
ensemble_3
ensembleID= ensemble_3 ID for this ensemble. (Command-line only)
copies_to_find= None Number of copies of this ensemble to find in a.u.
(Command-line only)
coords= None model(s) for this ensemble. (Command-line only)
identity= None percent identity(ies) of model(s) in this ensemble to
structure (alternative is RMS). (Command-line only)
RMS= None RMSD(s) of model(s) to structure (alternative is identity).
(Command-line only)
ensemble_4
ensembleID= ensemble_4 ID for this ensemble. (Command-line only)
copies_to_find= None Number of copies of this ensemble to find in a.u.
(Command-line only)
coords= None model(s) for this ensemble. (Command-line only)
identity= None percent identity(ies) of model(s) in this ensemble to
structure (alternative is RMS). (Command-line only)
RMS= None RMSD(s) of model(s) to structure (alternative is identity).
(Command-line only)
ensemble_5
ensembleID= ensemble_5 ID for this ensemble. (Command-line only)
copies_to_find= None Number of copies of this ensemble to find in a.u.
(Command-line only)
coords= None model(s) for this ensemble. (Command-line only)
identity= None percent identity(ies) of model(s) in this ensemble to
structure (alternative is RMS). (Command-line only)
RMS= None RMSD(s) of model(s) to structure (alternative is identity).
(Command-line only)
component_1
seq_file= None protein seq_file for this component. (Command-line only)
component_type= *protein nucleic_acid protein or nucleic acid.
(Command-line only)
mass= None molecular mass (Da) of this component. (Command-line only)
component_copies= None Number of copies of this component in the a.u.
(required). (Command-line only)
component_2
seq_file= None protein seq_file for this component. (Command-line only)
component_type= *protein nucleic_acid protein or nucleic acid.
(Command-line only)
mass= None molecular mass (Da) of this component. (Command-line only)
component_copies= None Number of copies of this component in the a.u.
(required). (Command-line only)
component_3
seq_file= None protein seq_file for this component. (Command-line only)
component_type= *protein nucleic_acid protein or nucleic acid.
(Command-line only)
mass= None molecular mass (Da) of this component. (Command-line only)
component_copies= None Number of copies of this component in the a.u.
(required). (Command-line only)
component_4
seq_file= None protein seq_file for this component. (Command-line only)
component_type= *protein nucleic_acid protein or nucleic acid.
(Command-line only)
mass= None molecular mass (Da) of this component. (Command-line only)
component_copies= None Number of copies of this component in the a.u.
(required). (Command-line only)
component_5
seq_file= None protein seq_file for this component. (Command-line only)
component_type= *protein nucleic_acid protein or nucleic acid.
(Command-line only)
mass= None molecular mass (Da) of this component. (Command-line only)
component_copies= None Number of copies of this component in the a.u.
(required). (Command-line only)
crystal_info
cell= 0.0 0.0 0.0 0.0 0.0 0.0 Enter cell parameters a b c alpha beta
gamma
chain_type= *Auto PROTEIN DNA RNA You can specify whether to build
protein, DNA, or RNA chains. At present you can only build
one of these in a single run. If you have both DNA and
protein, build one first, then run AutoBuild again,
supplying the prebuilt model in the "input_lig_file_list"
and build the other. NOTE: default for this keyword is Auto,
which means "carry out normal process to guess this
keyword". The process is to look at the sequence file and/or
input pdb file to see what the chain type is. If there is
more than one type, the type with the larger number of
residues is guessed. If you want to force the chain_type,
then set it to PROTEIN, RNA, or DNA.
resolution= 0.0 Enter the high-resolution limit for MR search. All the
data input will be written out regardless of your choice. By
default, the final rigid-body refinement will use all data.
sg= None Space Group symbol (e.g., C2221 or C 2 2 21)
decision_making
min_seq_identity_percent= 50.0 The sequence in your input PDB file will
be adjusted to match the sequence in your
sequence file (if any). If there are
insertions/deletions in your model and the
wizard does not seem to identify them, you can
split up your PDB file by adding records like
this: BREAK You can specify the minimum
sequence identity between your sequence file
and a segment from your input PDB file to
consider the sequences to be matched. Default
is 50.0%. You might want a higher number to
make sure that deletions in the sequence are
noticed.
overlap_allowed= None Solutions with no C-alpha clashes will be
accepted. If the best packing has some clashes,
solutions with that number of clashes will be accepted,
as long as this does not exceed the maximum allowed.
You can choose to increase the maximum if the packing
is tight and your search molecule is not exactly the
same as the molecule in the cell. If you leave it blank
then Phaser will decide for you.
selection_criteria_rot= *Percent_of_best Number_of_solutions Z_score All
Choose a criterion for keeping rotation
solutions at each stage. The choices are:
Percent of Best Score: AutoMR looks down the list
of LLG scores and only keeps the ones that
differ from the mean by more than the chosen
percentage, compared to the top solution. Enter
your desired percentage into the entry field
(default=75%) Number of Solutions: Keep the N
top solutions (you can set N; default=1)
Z-score: Keep all the solutions with a Z-score
greater than X (you can set X; default=6). All:
Keep everything and go on holiday while Phaser
crunches through it all (definitely not
recommended!)
selection_criteria_rot_value= 75 Choose a value for your criterion for
keeping rotation solutions at each stage.
Percent of Best Score: AutoMR looks down
the list of LLG scores and only keeps the
ones that differ from the mean by more
than the chosen percentage, compared to
the top solution. Enter your desired
percentage into the entry field
(default=75%) Number of Solutions: Keep
the N top solutions (you can set N;
default=1) Z-score: Keep all the solutions
with a Z-score greater than X (you can set
X; default=6). All: Keep everything and go
on holiday while Phaser crunches through
it all (definitely not recommended!)
fixed_ensembles
fixed_ensembleID_list= None Enter the ID (set with ensemble_1.ensembleID
or equivalent) of the component that is to be
fixed. NOTE 1: Each ensemble in
fixed_ensembleID_list must be defined. NOTE 2:
you can enter more than one fixed component if
you want. If you do, then enter fixed_euler_list
in multiples of 3 numbers and also
fixed_frac_list in multiples of 3 numbers.
fixed_euler_list= 0.0 0.0 0.0 Enter Euler angles (from AutoMR or Phaser)
for fixed component defined with
fixed_ensembleID_list. NOTE 2: you can enter more than
one fixed component if you want. If you do, then enter
fixed_euler_list in multiples of 3 numbers and also
fixed_frac_list in multiples of 3 numbers.
fixed_frac_list= 0.0 0.0 0.0 Enter fractional offset (location) for
fixed component (from AutoMR or Phaser) for fixed
component defined with fixed_ensembleID_list. NOTE 2:
you can enter more than one fixed component if you
want. If you do, then enter fixed_euler_list in
multiples of 3 numbers and also fixed_frac_list in
multiples of 3 numbers.
general
all_plausible_sg_list= None Choose which space groups to search
autobuild_input_list_add= None You can add keywords to those that AutoMR
passes on to AutoBuild (command-line only). The
format for this command is:
autobuild_input_list_add='semet refine' Then
you can set any of the variables you specify
by adding the prefix "autobuild_" to the name
of your variable: autobuild_semet=False
autobuild_refine=True This will now set
semet=False and refine=True in AutoBuild.
background= True When you specify nproc=nn, you can run the jobs in
background (default if nproc is greater than 1) or
foreground (default if nproc=1). If you set
run_command=qsub (or otherwise submit to a batch queue),
then you should set background=False, so that the batch
queue can keep track of your runs. There is no need to use
background=True in this case because all the runs go as
controlled by your batch system. If you use run_command=csh
(or similar, csh is default) then normally you will use
background=True so that all the jobs run simultaneously.
base_path= None You can specify the base path for files (default is
current working directory)
clean_up= False At the end of the entire run the TEMP directories will
be removed if clean_up is True. The default is No, keep these
directories. If you want to remove them after your run is
finished use a command like "phenix.autobuild run=1
clean_up=True"
coot_name= coot If your version of coot is called something else, then
you can specify that here.
debug= False You can have the wizard stop with error messages about the
code if you use debug. NOTE: you cannot use Pause with debug.
do_anisotropy_correction= True Choose whether you want to apply
anisotropy correction
extra_verbose= False Facts and possible commands will be printed every
cycle if Yes
max_wait_time= 100.0 You can specify the length of time (seconds) to
wait when testing the run_command. If you have a cluster
where jobs do not start right away you may need a longer
time to wait.
nbatch= 1 You can specify the number of processors to use (nproc) and
the number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors
available and leave nbatch alone. If you leave nbatch as None it
will be set automatically, with a value depending on the Wizard.
This is recommended. The value of nbatch can affect the results
that you get, as the jobs are not split into exact replicates,
but are rather run with different random numbers. If you want to
get the same results, keep the same value of nbatch.
nproc= 1 You can specify the number of processors to use (nproc) and the
number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors available
and leave nbatch alone. If you leave nbatch as None it will be
set automatically, with a value depending on the Wizard. This is
recommended. The value of nbatch can affect the results that you
get, as the jobs are not split into exact replicates, but are
rather run with different random numbers. If you want to get the
same results, keep the same value of nbatch.
run_command= csh When you specify nproc=nn, you can run the subprocesses
as jobs in background with csh (default) or submit them to
a queue with the command of your choice (e.g., qsub). If
you have a multi-processor machine, use csh. If you have a
cluster, use qsub or the equivalent command for your
system. NOTE: If you set run_command=qsub (or otherwise
submit to a batch queue), then you should set
background=False, so that the batch queue can keep track of
your runs. There is no need to use background=True in this
case because all the runs go as controlled by your batch
system. If you use run_command=csh (or similar, csh is
default) then normally you will use background=True so that
all the jobs run simultaneously.
skip_xtriage= False You can bypass xtriage if you want. This will
prevent you from applying anisotropy corrections, however.
temp_dir= None Define a temporary directory (it must exist)
title= Run 1 AutoMR Sun Dec 7 17:46:24 2008 Enter any text you like to
help identify what you did in this run
top_output_dir= None This is used in subprocess calls of wizards and to
tell the Wizard where to look for the STOPWIZARD file.
use_all_plausible_sg= False Normally you will want to search all space
groups with the same point group as you may not
know which is correct from your data. You can
select which of these to choose using 'Choose
variable to set' and selecting
'all_plausible_sg_list'
verbose= False Command files and other verbose output will be printed
input_files
input_data_file= None Enter a file with input structure factor data.
For structure factor data only (e.g., FP SIGFP) any
format is ok. If you have free R flags, phase
information or HL coefficients that you want to use
then an mtz file is required. If this file contains
phase information, this phase information should be
experimental (i.e., MAD/SAD/MIR etc), and should not be
density-modified phases (enter any files with
density-modified phases as input_map_file instead).
NOTE: If you supply HL coefficients they will be used
in phase recombination. If you supply PHIB or PHIB and
FOM and not HL coefficients, then HL coefficients will
be derived from your PHIB and FOM and used in phase
recombination. If you also specify a hires data file,
then FP and SIGFP will come from that data file (and
not this one). If an input_refinement_file is
specified, then F, Sigma, FreeR_flag (if present) from
that file will be used for refinement instead of this
one.
input_label_string= None Choose the set of labels that represent the
data and sigma columns for your data. NOTE: Applies
to input data file for AutoMR. See also
'input_labels', which applies to input data file for
AutoBuild.
input_pdb_file= None You can enter a PDB file containing a starting
model of your structure. NOTE: If you enter a PDB file
then the AutoBuild wizard will start right in with
rebuild steps, skipping the build process. If the model
is very poor than it may be better to leave it out as
the build process (which includes pattern recognition
and recognition of helical and strand fragments) is
optimized for improving poor maps, while the rebuild
process is optimized for better maps that can be
produced by having a partial model.
input_seq_file= None Enter name of file with 1-letter code of protein
sequence. NOTES: 1. Lines starting with > are ignored
and separate chains. 2. FASTA format is fine. 3. If
there are multiple copies of a chain, just enter one
copy. 4. If you enter a PDB file for rebuilding and it
has the sequence you want, then the sequence file is not
necessary. NOTE: You can also enter the name of a PDB
file that contains SEQRES records, and the sequence from
the SEQRES records will be read, written to
seq_from_seqres_records.dat, and used as your input
sequence. NOTE: For AutoBuild you can specify
start_chains_list on the first line of your sequence
file: >> start_chains_list 23 11 5 NOTE: The default
for this keyword is Auto, which means "carry out normal
process to guess this keyword". This means that if you
specify "after_autosol" in AutoBuild, AutoBuild will
automatically take the value from AutoSol. If you do not
want this to happen, you can specify None, which means
"No file".
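The sequence-file rules above can be illustrated with a small reader. A line starting with > is ignored, but it ends the current chain. Residues are the 1-letter codes on the remaining lines. This is a minimal sketch with a hypothetical function name, not the wizard's own parser. A ">> start_chains_list ..." line is simply skipped, because it also starts with >.

```python
def read_chains(text: str) -> list:
    """Split sequence-file text into chains of 1-letter codes (sketch).

    Lines starting with '>' are ignored except that they start a new chain;
    blank lines and whitespace inside sequence lines are dropped.
    """
    chains, current = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            if current:  # close the chain collected so far
                chains.append("".join(current))
                current = []
            continue
        if line:
            current.append("".join(line.split()).upper())
    if current:
        chains.append("".join(current))
    return chains


example = ">> start_chains_list 23 11 5\n>chain A\nMKV LL\nAAG\n>chain B\nGGS\n"
print(read_chains(example))
```

Following note 3, a homodimer is entered as a single chain here. The copy number is specified separately.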
input_seq_file_list= None The keyword input_seq_file_list is used in
AutoMR to specify the molecular masses of the
components of the unit cell using a set of sequence
files. Usually you should input the sequences of
the actual components of the unit cell here (one
sequence file for each component). NOTE: If no
input_seq_file is specified, then the sequences
from input_seq_file_list are used to create a new
file "composite_seq.dat" with all their sequences
and this is used as the input_seq_file. NOTE: The
format of each file in input_seq_file_list is the
1-letter code of the protein sequence (separate
chains with >>>>).
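When no input_seq_file is given, the text above says the sequences from input_seq_file_list are merged into "composite_seq.dat". The sketch below shows one plausible way to do such a merge. It assumes that each component is introduced by a header line starting with > (so the input_seq_file rule "lines starting with > separate chains" still applies). The function name and header text are hypothetical, and the wizard's actual file layout may differ.

```python
def build_composite(sequences):
    """Merge (file_name, file_text) pairs into one composite sequence text.

    Hypothetical sketch: a component whose text does not already start with
    a '>' line gets a '> file_name' header so that chains stay separated.
    """
    blocks = []
    for name, text in sequences:
        body = text.strip()
        if not body.startswith(">"):
            body = f"> {name}\n{body}"
        blocks.append(body)
    return "\n".join(blocks) + "\n"


print(build_composite([("a.seq", "MKV"), ("b.seq", ">chainB\nGGS")]))
```

Writing the returned text to composite_seq.dat is then a single `pathlib.Path.write_text` call.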
model_building
build_type= *RESOLVE_AND_TEXTAL RESOLVE TEXTAL You can choose to build
models with RESOLVE and TEXTAL or either one, and how many
different models to build with RESOLVE. The more models you build,
the more likely you are to get a complete model. Note that
rebuild_in_place can only be carried out with RESOLVE
model-building.
rebuild_after_mr= True You can choose to go right on to the AutoBuild
wizard with the rebuild-in-place option after running
molecular replacement.
resolution_build= 0.0 Enter the high-resolution limit for
model-building. If 0.0, the value of resolution is
used as a default.
semet= False You can specify that the dataset that is used for
refinement is a selenomethionine dataset, and that the model
should be the SeMet version of the protein, with all SD of MET
replaced with Se of MSE.
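To illustrate what "the SeMet version of the protein" means at the level of coordinates, here is a minimal sketch that converts one fixed-column PDB record of a MET residue to MSE. It renames the residue, replaces atom SD with SE, and sets the element to Se. The function name is hypothetical. Writing MSE as HETATM follows the wwPDB convention for modified residues, and the wizard's own conversion may differ in detail.

```python
def met_to_mse(pdb_line: str) -> str:
    """Convert one ATOM/HETATM record of a MET residue to MSE (sketch)."""
    if not pdb_line.startswith(("ATOM  ", "HETATM")) or pdb_line[17:20] != "MET":
        return pdb_line  # leave non-MET records untouched
    line = pdb_line.rstrip("\n").ljust(80)
    # Columns 1-6 record name, 18-20 residue name (0-based slices below).
    line = "HETATM" + line[6:17] + "MSE" + line[20:]
    if line[12:16] == " SD ":
        # Se has a two-letter element symbol, so its atom name starts in
        # column 13; the element symbol occupies columns 77-78.
        line = line[:12] + "SE  " + line[16:76] + "SE" + line[78:]
    return line.rstrip()
```

Applying this to every line of a model and leaving all other records unchanged gives the SeMet model described above.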
non_user_parameters
composition_num_list= 1 Enter number of copies of this component
weight_list= 0.0 Molecular weight of component (Da; e.g. 30000)
weight_seq_list= None Choose whether to define composition through
molecular weight or sequence
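When composition is defined through molecular weight, the total mass in the asymmetric unit is the sum over components of the copy number times the mass of that component. The sketch below mirrors weight_list and composition_num_list as plain Python lists. The helper for estimating a protein mass from its length uses the common rule of thumb of about 110 Da per residue. That helper is an approximation added for illustration and is not part of the wizard.

```python
def total_mass_da(weight_list, composition_num_list):
    """Total asymmetric-unit mass: sum of copies x mass for each component."""
    if len(weight_list) != len(composition_num_list):
        raise ValueError("need one copy count per component")
    return sum(w * n for w, n in zip(weight_list, composition_num_list))


def approx_mass_from_residues(n_residues, mean_residue_da=110.0):
    """Rough protein mass, assuming an average residue mass of ~110 Da."""
    return n_residues * mean_residue_da


# Two copies of a 30 kDa protein plus one copy of a 12 kDa partner.
print(total_mass_da([30000.0, 12000.0], [2, 1]))
```

The same total is what the sequence-based route estimates, only from residue content instead of stated masses.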
refinement
link_distance_cutoff= 3.0 You can specify the maximum bond distance for
linking residues in phenix.refine called from the
wizards.
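link_distance_cutoff acts as a simple distance threshold. Two residues are linked only if the bonding atoms are no farther apart than the cutoff. This section does not say which atom pairs phenix.refine tests. The sketch assumes the backbone C of residue i and N of residue i+1 (a peptide bond is about 1.33 Å long). The function name is hypothetical.

```python
import math


def should_link(c_xyz, n_xyz, link_distance_cutoff=3.0):
    """True if C(i) and N(i+1) are close enough to be treated as bonded."""
    return math.dist(c_xyz, n_xyz) <= link_distance_cutoff


print(should_link((0.0, 0.0, 0.0), (1.33, 0.0, 0.0)))  # normal peptide geometry
print(should_link((0.0, 0.0, 0.0), (4.0, 0.0, 0.0)))   # chain break
```

Raising the cutoff links residues across larger gaps, which can hide genuine chain breaks in a partly built model.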
r_free_flags_fraction= 0.1 Maximum fraction of reflections in the free R
set. You can choose the maximum fraction of
reflections in the free R set and the maximum
number of reflections in the free R set. The
number of reflections in the free R set will be
up to the lower of the values defined by these two
parameters.
r_free_flags_lattice_symmetry_max_delta= 5.0 You can set the maximum
deviation of distances in the
lattice that are to be
considered the same for
purposes of generating a
lattice-symmetry-unique set of
free R flags.
r_free_flags_max_free= 2000 Maximum number of reflections in the free R
set. You can choose the maximum fraction of
reflections in the free R set and the maximum
number of reflections in the free R set. The
number of reflections in the free R set will be
up to the lower of the values defined by these two
parameters.
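r_free_flags_fraction and r_free_flags_max_free together cap the size of the free R set, and the smaller of the two limits wins. The arithmetic is shown below with the default values. Rounding down the fractional count is an assumption of this sketch, and the function name is hypothetical.

```python
def free_set_size(n_reflections, r_free_flags_fraction=0.1,
                  r_free_flags_max_free=2000):
    """Upper limit on free R reflections: the lower of the two limits."""
    return min(int(n_reflections * r_free_flags_fraction),
               r_free_flags_max_free)


print(free_set_size(10000))  # fraction limit applies
print(free_set_size(50000))  # max_free limit applies
```

With the defaults, the fraction limit governs data sets below 20000 reflections, and the 2000-reflection cap governs larger ones.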
r_free_flags_use_lattice_symmetry= True When generating r_free_flags you
can decide whether to include lattice
symmetry (good in general, necessary
if there is twinning).