Python-based Hierarchical ENvironment for Integrated Xtallography
Automated ligand fitting with LigandFit
Author(s)
Purpose
Purpose of the LigandFit Wizard
The LigandFit Wizard carries out fitting of flexible ligands to electron density maps.
Usage
The LigandFit Wizard can be run from the PHENIX GUI, from the command-line, and from keyworded script files. All three versions are identical except in the way that they take commands from the user. See Running a Wizard from a GUI, the command-line, or a script for details of how to run a Wizard. The command-line version will be described here.
How the LigandFit Wizard works
The LigandFit wizard provides a command-line and graphical user interface allowing the user to identify a datafile containing crystallographic structure factor information, an optional PDB file with a partial model of the structure without the ligand, and a PDB file containing the ligand to be fit (in an allowed but arbitrary conformation). The wizard checks the data files for consistency and then calls RESOLVE to carry out the fitting of the ligand into the electron-density map.
The map used is normally a difference map, with F=FP-FC. It can also be an Fobs map (calculated from FP with phases PHIC from the input partial model), or an arbitrary map, calculated with FP PHI and FOM. If you supply an input partial model, then the region occupied by the partial model is flattened in the map used to fit the ligand, so that the ligand will normally not get placed in this region.
The ligand fitting is done by RESOLVE in a three-stage process. First, the largest contiguous region of density in the map not already occupied by the model is identified. The ligand will be placed in this density. (If desired, the location of the ligand can instead be defined by the user as near a certain residue or near specified coordinates.) Next, many possible placements of the largest rigid sub-fragments of the ligand are found within this region of high density. Third, each of these placements is taken as a starting point for fitting the remainder of the ligand.
All these ligand fits are scored based on the fit to the density, and the best-fitting placement is written out. The output of the wizard consists of a fitted ligand in PDB format and a summary of the quality of the fit. Multiple copies of a ligand can also be fit to a single map in an automated fashion using the LigandFit wizard.
How to run the LigandFit Wizard
Running the LigandFit Wizard is easy. For example, from the command-line you can type:
    phenix.ligandfit data=datafile.mtz model=partial_model.pdb ligand=ligand.pdb
The LigandFit Wizard will fit the ligand in ligand.pdb based on the structure factor amplitudes in datafile.mtz, calculating phases from partial_model.pdb. All rotatable bonds will be identified and allowed to take stereochemically reasonable orientations.
What the LigandFit wizard needs to run
Examples
Sample command_line inputs
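As a rough sketch only (it is not part of PHENIX), a command line like the one shown under "How to run the LigandFit Wizard" can be assembled from a script. The helper name build_ligandfit_command is hypothetical and the file names are placeholders. The keywords data, ligand, model, number_of_ligands and ligand_cc_min are the ones described in the keyword list on this page.

```python
import shlex


def build_ligandfit_command(data, ligand, model=None, **keywords):
    """Return the argument list for a phenix.ligandfit run.

    Hypothetical helper for illustration: it only joins keyword=value
    pairs and does not check them against the wizard's keyword list.
    """
    args = ["phenix.ligandfit", f"data={data}", f"ligand={ligand}"]
    if model is not None:
        # The partial model is optional (see input_partial_model_file)
        args.append(f"model={model}")
    # Sort the extra keywords so the command is reproducible
    for name, value in sorted(keywords.items()):
        args.append(f"{name}={value}")
    return args


cmd = build_ligandfit_command(
    "datafile.mtz",
    "ligand.pdb",
    model="partial_model.pdb",
    number_of_ligands=2,
    ligand_cc_min=0.75,
)
print(shlex.join(cmd))
```

On a machine where phenix.ligandfit is on the PATH, the list could then be passed to subprocess.run(cmd, check=True). Building a list rather than a single string avoids shell quoting problems for file names that contain spaces.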
Possible Problems
Specific limitations and problems
Literature
Additional information
List of all LigandFit keywords
-------------------------------------------------------------------------------
Legend: black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help
Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
ligandfit
data= None Datafile (alias for input_data_file). This can be any format if
only FP is to be read in. If phases are to be read in then MTZ format
is required. The Wizard will guess the column identification. If you
want to specify it you can say input_labels="FP", or
input_labels="FP PHIB FOM". (Command-line only)
ligand= None File containing information about the ligand (PDB or SMILES)
(alias for input_lig_file) (Command-line only)
model= None PDB file with model for everything but the ligand (alias for
input_partial_model_file). (Command-line only)
quick= False Run as quickly as possible. (Command-line only)
special_keywords
write_run_directory_to_file= None Writes the full name of a run
directory to the specified file. This can
be used as a call-back to tell a script
where the output is going to go.
(Command-line only)
run_control
coot= None Set coot to True and optionally run=[run-number] to run Coot
with the current model and map for run run-number. In some wizards
(AutoBuild) you can edit the model and give it back to PHENIX to
use as part of the model-building process. If you just say coot
then the facts for the highest-numbered existing run will be
shown. (Command-line only)
ignore_blanks= None ignore_blanks allows you to have a command-line
keyword with a blank value like "input_lig_file_list="
stop= None You can stop the current wizard with "stopwizard" or "stop".
If you type "phenix.autobuild run=3 stop" then this will stop run
3 of autobuild. (Command-line only)
display_facts= None Set display_facts to True and optionally
run=[run-number] to display the facts for run run-number.
If you just say display_facts then the facts for the
highest-numbered existing run will be shown.
(Command-line only)
display_summary= None Set display_summary to True and optionally
run=[run-number] to show the summary for run
run-number. If you just say display_summary then the
summary for the highest-numbered existing run will be
shown. (Command-line only)
carry_on= None Set carry_on to True to carry on with highest-numbered
run from where you left off. (Command-line only)
run= None Set run to n to continue with run n where you left off.
(Command-line only)
copy_run= None Set copy_run to n to copy run n to a new run and continue
where you left off. (Command-line only)
display_runs= None List all runs for this wizard. (Command-line only)
delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)
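The delete_runs example above mixes single run numbers with ranges. The sketch below expands such a string into individual run numbers. It assumes that both "3-5" and "9:12" denote inclusive ranges, which the text does not state explicitly, and the function name parse_run_list is hypothetical, not a PHENIX function.

```python
import re


def parse_run_list(spec):
    """Expand a run list such as "1 2 3-5 9:12" into run numbers.

    Assumption: "a-b" and "a:b" both mean the inclusive range a..b.
    """
    runs = []
    for token in spec.split():
        match = re.fullmatch(r"(\d+)(?:[-:](\d+))?", token)
        if match is None:
            raise ValueError(f"unrecognised run token: {token!r}")
        first = int(match.group(1))
        # A single number is treated as a one-element range
        last = int(match.group(2)) if match.group(2) else first
        if last < first:
            raise ValueError(f"descending range: {token!r}")
        runs.extend(range(first, last + 1))
    return runs


print(parse_run_list("1 2 3-5 9:12"))
```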
display_labels= None display_labels=test.mtz will list all the labels
that identify data in test.mtz. You can use the label
strings that are produced in AutoSol to identify which
data to use from a datafile, like this:
peak.data="F+ SIGF+ F- SIGF-" (the entire string in quotes
counts here). You can use the individual labels from these
strings as identifiers for data columns in AutoSol and
AutoBuild, like this:
input_refinement_labels="FP SIGFP FreeR_flags" (each
individual label counts).
dry_run= False Just read in and check parameter names
params_only= False Just read in and return parameter defaults
display_all= False Just read in and display parameter defaults
crystal_info
cell= 0.0 0.0 0.0 0.0 0.0 0.0 Enter cell parameter a b c alpha beta
gamma
resolution= 0.0 High-resolution limit. Used as resolution limit for
density modification and as general default high-resolution
limit. If resolution_build or refinement_resolution are set
then they override this for model-building or refinement. If
overall_resolution is set then data beyond that resolution
is ignored completely.
sg= None Space Group symbol (e.g., C2221 or C 2 2 21)
display
number_of_solutions_to_display= None Number of solutions to put on
screen and to write out
solution_to_display= 1 Solution number of the solution to display and
write out (use 0 to let the wizard display the top
solution)
file_info
file_or_file_list= *single_file file_with_list_of_files Choose if you
want to input a single file with PDB or other
information about the ligand or if you want to input
a file containing a list of files with this
information for a list of ligands
input_labels= None Labels for input data columns NOTE: Applies to input
data file for LigandFit and AutoBuild, but not to AutoMR.
For AutoMR use instead 'input_label_string'.
lig_map_type= *fo-fc_difference_map fobs_map pre_calculated_map_coeffs
Enter the type of map to use in ligand fitting.
fo-fc_difference_map: Fo-Fc difference map phased on
partial model.
fobs_map: Fo map phased on partial model.
pre_calculated_map_coeffs: map calculated from FP PHIB
[FOM] coefficients in input data file.
ligand_format= *PDB SMILES Enter whether the files contain SMILES
strings or PDB formatted information
general
background= True When you specify nproc=nn, you can run the jobs in
background (default if nproc is greater than 1) or
foreground (default if nproc=1). If you set
run_command=qsub (or otherwise submit to a batch queue),
then you should set background=False, so that the batch
queue can keep track of your runs. There is no need to use
background=True in this case because all the runs go as
controlled by your batch system. If you use run_command=csh
(or similar, csh is default) then normally you will use
background=True so that all the jobs run simultaneously.
base_path= None You can specify the base path for files (default is
current working directory)
clean_up= False At the end of the entire run the TEMP directories will
be removed if clean_up is True. The default is No, keep these
directories. If you want to remove them after your run is
finished use a command like "phenix.autobuild run=1
clean_up=True"
coot_name= coot If your version of coot is called something else, then
you can specify that here.
debug= False You can have the wizard stop with error messages about the
code if you use debug. NOTE: you cannot use Pause with debug.
extend_try_list= False You can fill out the list of parallel jobs to
match the number of jobs you want to run at one time,
as specified with nbatch.
extra_verbose= False Facts and possible commands will be printed every
cycle if Yes
i_ran_seed= 289564 Random seed (positive integer) for model-building
and simulated annealing refinement
ligand_id= None You can specify an integer value for the ID of a
ligand... This number will be added to whatever residue
number the ligand search model in input_lig_file has. The
keyword is only valid if a single copy of the ligand is to be
found.
max_wait_time= 100.0 You can specify the length of time (seconds) to
wait when testing the run_command. If you have a cluster
where jobs do not start right away you may need a longer
time to wait.
nbatch= 5 You can specify the number of processors to use (nproc) and
the number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors
available and leave nbatch alone. If you leave nbatch as None it
will be set automatically, with a value depending on the Wizard.
This is recommended. The value of nbatch can affect the results
that you get, as the jobs are not split into exact replicates,
but are rather run with different random numbers. If you want to
get the same results, keep the same value of nbatch.
nproc= 1 You can specify the number of processors to use (nproc) and the
number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors available
and leave nbatch alone. If you leave nbatch as None it will be
set automatically, with a value depending on the Wizard. This is
recommended. The value of nbatch can affect the results that you
get, as the jobs are not split into exact replicates, but are
rather run with different random numbers. If you want to get the
same results, keep the same value of nbatch.
resolve_command_list= None Commands for resolve. One per line in the
form: keyword value (the value can be optional).
Examples:
coarse_grid
resolution 200 2.0
hklin test.mtz
NOTE: for command-line usage you need to
enclose the whole set of commands in double quotes
(") and each individual command in single quotes
(') like this:
resolve_command_list="'no_build' 'b_overall 23' "
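The nested quoting described for resolve_command_list is easy to get wrong by hand. The sketch below builds the keyword string from a list of commands. The helper name format_resolve_command_list is hypothetical, and the blank before the closing double quote simply mirrors the example above; whether it is required is not stated.

```python
def format_resolve_command_list(commands):
    """Quote RESOLVE commands the way the command-line note describes.

    Each command goes in single quotes and the whole set in double quotes.
    """
    for command in commands:
        if "'" in command or '"' in command:
            # Nested quotes cannot be represented in this simple scheme
            raise ValueError(f"command contains a quote: {command!r}")
    inner = " ".join(f"'{command}'" for command in commands)
    # Trailing blank kept only to match the documented example
    return f'resolve_command_list="{inner} "'


print(format_resolve_command_list(["no_build", "b_overall 23"]))
```

The result is the text you would type at a shell prompt after phenix.ligandfit and the other keywords.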
resolve_size= _giant _huge _extra_huge *None Size for solve/resolve
("","_giant","_huge","_extra_huge")
run_command= csh When you specify nproc=nn, you can run the subprocesses
as jobs in background with csh (default) or submit them to
a queue with the command of your choice (e.g., qsub). If
you have a multi-processor machine, use csh. If you have a
cluster, use qsub or the equivalent command for your
system. NOTE: If you set run_command=qsub (or otherwise
submit to a batch queue), then you should set
background=False, so that the batch queue can keep track of
your runs. There is no need to use background=True in this
case because all the runs go as controlled by your batch
system. If you use run_command=csh (or similar, csh is
default) then normally you will use background=True so that
all the jobs run simultaneously.
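The interplay of nproc, run_command and background described above can be summarised as a small rule. This is only a sketch of the advice in the text, not code from the wizard. The function name suggested_background is hypothetical, and the set of commands treated as local shells is an assumption; anything else is treated as a batch-queue submission command such as qsub.

```python
# Assumption: these commands run jobs locally in a shell. Any other
# run_command is treated as a batch-queue submission command.
LOCAL_SHELLS = {"csh", "tcsh", "sh", "bash"}


def suggested_background(nproc, run_command="csh"):
    """Return the background setting suggested by the keyword help."""
    if nproc <= 1:
        # A single job runs in the foreground by default
        return False
    # Local shells: run jobs in the background so they run simultaneously.
    # Batch queues: background=False so the queue can track the runs.
    return run_command in LOCAL_SHELLS


print(suggested_background(4, "csh"), suggested_background(4, "qsub"))
```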
skip_xtriage= False You can bypass xtriage if you want. This will
prevent you from applying anisotropy corrections, however.
temp_dir= None Define a temporary directory (it must exist)
title= Run 1 LigandFit Sun Dec 7 17:46:24 2008 Enter any text you like
to help identify what you did in this run
top_output_dir= None This is used in subprocess calls of wizards and to
tell the Wizard where to look for the STOPWIZARD file.
verbose= False Command files and other verbose output will be printed
input_files
existing_ligand_file_list= None You can enter a list of files with
ligands you have already fit. These will be
used to exclude that region from
consideration.
input_data_file= None Enter the file with input structure factor data
(files other than MTZ will be converted to mtz and
intensities to amplitudes)
input_lig_file= None Enter either a single file with PDB information or
a SMILES string or a file containing a list of files
with this information for a list of ligands. If you
enter a file containing a list of files you need also to
specify
"file_or_file_list=file_with_list_of_files".
If the format is not PDB, then ELBOW will generate a PDB
file.
input_ligand_compare_file= None If you enter a PDB file with a ligand in
it, the coordinates of the newly-built ligand
will be compared with the coordinates in this
file.
input_partial_model_file= None Enter a PDB file containing a model of
your structure without the ligand. This is
used to calculate phases. If you are providing
phases in your data file and have selected
"pre_calculated_map_coeffs" for lig_map_type, this
file may be left out.
non_user_parameters
get_lig_volume= False You can ask to get the volume of the ligand and
to then stop
offsets_list= 7 53 29 You can specify an offset for the orientation of
the helix and strand templates in building. This is used
in generating different starting models.
refinement
link_distance_cutoff= 3.0 You can specify the maximum bond distance for
linking residues in phenix.refine called from the
wizards.
r_free_flags_fraction= 0.1 Maximum fraction of reflections in the free R
set. You can choose the maximum fraction of
reflections in the free R set and the maximum
number of reflections in the free R set. The
number of reflections in the free R set will be
up to the lower of the values defined by these two
parameters.
r_free_flags_lattice_symmetry_max_delta= 5.0 You can set the maximum
deviation of distances in the
lattice that are to be
considered the same for
purposes of generating a
lattice-symmetry-unique set of
free R flags.
r_free_flags_max_free= 2000 Maximum number of reflections in the free R
set. You can choose the maximum fraction of
reflections in the free R set and the maximum
number of reflections in the free R set. The
number of reflections in the free R set will be
up to the lower of the values defined by these two
parameters.
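The way r_free_flags_fraction and r_free_flags_max_free combine can be shown with a short calculation. The sketch below returns the upper bound on the free R set size for a given number of reflections, using the defaults listed above. The function name max_free_set_size is hypothetical, and rounding the fractional limit down to a whole reflection is an assumption.

```python
def max_free_set_size(n_reflections, fraction=0.1, max_free=2000):
    """Upper bound on the number of free R reflections.

    The bound is the lower of the fraction-based limit and max_free.
    """
    # Assumption: a fractional count is rounded down
    by_fraction = int(n_reflections * fraction)
    return min(by_fraction, max_free)


# Small data set: the fraction is the limiting value
print(max_free_set_size(10000))
# Large data set: the absolute cap is the limiting value
print(max_free_set_size(50000))
```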
r_free_flags_use_lattice_symmetry= True When generating r_free_flags you
can decide whether to include lattice
symmetry (good in general, necessary
if there is twinning).
search_parameters
conformers= 1 Enter how many conformers to create. If greater than 1,
then ELBOW will always be used to generate them. If 1 then
ELBOW will be used if a PDB file is not specified. These
conformers are used to identify allowed torsion angles for
your ligand. The alternative is to use the empirical rules
in RESOLVE. ELBOW takes longer but is more accurate.
delta_phi_ligand= 40.0 Specify the angle (degrees) between successive
tries in FFT search for fragments
fit_phi_inc= 20 Specify the angle (degrees) between rotations around
bonds
fit_phi_range= -180 180 Range of bond rotation angles to search
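To give a feel for what fit_phi_inc and fit_phi_range imply, the sketch below lists the rotation angles they define for one bond and counts the combinations for several bonds. It illustrates the grid only and says nothing about how RESOLVE actually samples it. The function name torsion_grid is hypothetical. Treating the range as half-open (so that -180 and 180, the same orientation, are not both counted) and using whole degrees are assumptions.

```python
def torsion_grid(phi_range=(-180, 180), phi_inc=20):
    """List the bond rotation angles (degrees) implied by the defaults."""
    start, stop = phi_range
    if phi_inc <= 0:
        raise ValueError("phi_inc must be a positive number of degrees")
    # Half-open interval: the upper limit itself is not included
    return list(range(start, stop, phi_inc))


angles = torsion_grid()
n_rotatable_bonds = 3  # placeholder value for illustration
# An exhaustive grid grows as (angles per bond) ** (number of bonds)
n_combinations = len(angles) ** n_rotatable_bonds
print(len(angles), n_combinations)
```

The rapid growth of the count with the number of rotatable bonds is one reason a smaller fit_phi_inc makes the search slower.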
group_search= 0 Enter the ID number of the group from the ligand to use
to seed the search for conformations
ligand_cc_min= 0.75 Enter the minimum correlation coefficient of the
ligand to the map to quit searching for more
conformations
ligand_completeness_min= 1.0 Enter the minimum completeness of the
ligand to the map to quit searching for more
conformations
local_search= True If local_search is Yes, then only the region within
search_dist of the point in the map with the highest local
rmsd will be searched in the FFT search for fragments
n_group_search= 3 Enter the number of different fragments of the ligand
that will be looked for in FFT search of the map
n_indiv_tries_max= 10 If 0 is specified, all fragments are searched at
once; otherwise all are first searched at once, then
individually up to the number specified
n_indiv_tries_min= 5 If 0 is specified, all placements of a fragment are
tested at once; otherwise all are first tested at once,
then individually up to the number specified
number_of_ligands= 1 Number of copies of the ligand expected in the
asymmetric unit
search_dist= 10.0 If local_search is Yes, then only the region within
this distance of the point in the map with the highest
local rmsd will be searched in the FFT search for fragments
use_cc_local= False You can specify the use of a local correlation
coefficient for scoring ligand fits to the map. If you do
not do this, then the region over which the ligand is
scored consists of all points within 2.5 A of the atoms in
the ligand. If you do specify use_cc_local, then the region
over which the ligand is scored consists of all these points,
plus all the contiguous points that have density greater than
0.5 * sigma.
search_target
ligand_near_chain= None You can specify where to search for the ligand
either with search_center or with ligand_near_res and
ligand_near_chain. If you set
ligand_near_chain="None" or leave it blank or do not
set it, then all chains will be included. The
keywords ligand_near_res and ligand_near_chain refer
to residue/chain in the file defined by
input_partial_model_file (or model if running from
command line).
ligand_near_pdb= None You can specify where LigandFit should look for
your ligands by providing a PDB file containing one or
more copies of the ligand. If you want you can provide
a PDB file with ligand + macromolecule and specify the
ligand name with name_of_ligand_near_pdb.
ligand_near_res= None You can specify where to search for the ligand
either with search_center or with ligand_near_res and
ligand_near_chain. The keywords ligand_near_res and
ligand_near_chain refer to residue/chain in the file
defined by input_partial_model_file (or model if
running from command line).
name_of_ligand_near_pdb= None You can specify where LigandFit should
look for your ligands by providing a PDB file
containing one or more copies of the ligand. If
you want you can provide a PDB file with
ligand + macromolecule and specify the ligand
name with name_of_ligand_near_pdb.
search_center= 0.0 0.0 0.0 Enter coordinates for center of search region
(ignored if [0,0,0])