Python-based Hierarchical ENvironment for Integrated Xtallography
Automated ligand fitting with LigandFit
Author(s)
Purpose
Purpose of the LigandFit Wizard
The LigandFit Wizard fits flexible ligands to electron density maps.

Usage
The LigandFit Wizard can be run from the PHENIX GUI, from the command line, and from keyworded script files. All three versions are identical except in the way that they take commands from the user. See "Running a Wizard from a GUI, the command-line, or a script" for details of how to run a Wizard. The command-line version is described here.

How the LigandFit Wizard works
The LigandFit Wizard provides a command-line and graphical user interface that lets the user identify a data file containing crystallographic structure factor information, an optional PDB file with a partial model of the structure without the ligand, and a PDB file containing the ligand to be fit (in an allowed but arbitrary conformation). The Wizard checks the data files for consistency and then calls RESOLVE to fit the ligand into the electron density map.
The map used is normally a difference map, with F=FP-FC. It can also be an Fobs map (calculated from FP with phases PHIC from the input partial model), or an arbitrary map calculated with FP, PHI, and FOM. If you supply an input partial model, the region occupied by the partial model is flattened in the map used to fit the ligand, so that the ligand will normally not be placed in this region.
RESOLVE fits the ligand in a three-stage process. First, the largest contiguous region of density in the map not already occupied by the model is identified. The ligand will be placed in this density. (If desired, the user can instead define the location of the ligand as near a certain residue or near specified coordinates.) Next, many possible placements of the largest rigid sub-fragments of the ligand are found within this region of high density. Third, each of these placements is taken as a starting point for fitting the remainder of the ligand. All of these ligand fits are scored based on their fit to the density, and the best-fitting placement is written out.
The output of the Wizard consists of a fitted ligand in PDB format and a summary of the quality of the fit. The LigandFit Wizard can also fit multiple copies of a ligand to a single map in an automated fashion.

How to run the LigandFit Wizard
Running the LigandFit Wizard is easy. For example, from the command line you can type:
phenix.ligandfit data=datafile.mtz model=partial_model.pdb ligand=ligand.pdb
The LigandFit Wizard will fit the ligand in ligand.pdb based on the structure factor amplitudes in datafile.mtz, calculating phases from partial_model.pdb. All rotatable bonds will be identified and allowed to take stereochemically reasonable orientations.

What the LigandFit wizard needs to run
Examples
Sample command_line inputs
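The command shown under "How to run the LigandFit Wizard" can also be assembled from a script. The following is a minimal sketch, not part of PHENIX: the helper name build_ligandfit_command is hypothetical, the file names are the placeholders used in this document, and the extra keyword values (number_of_ligands=2, nproc=4) are illustrative only. It uses only keywords that appear in the keyword list on this page and does not run the program.

```python
import shlex


def build_ligandfit_command(data, ligand, model=None, **keywords):
    """Return an argument list for phenix.ligandfit (hypothetical helper).

    data, ligand and model map to the command-line aliases described in
    this document; any other keyword is passed through as name=value.
    """
    args = ["phenix.ligandfit", f"data={data}"]
    if model is not None:
        # The partial model is optional when phases come from the data file.
        args.append(f"model={model}")
    args.append(f"ligand={ligand}")
    for name, value in keywords.items():
        # Multi-valued keywords (e.g. search_center) become one
        # space-separated value inside a single argument.
        if isinstance(value, (list, tuple)):
            value = " ".join(str(v) for v in value)
        args.append(f"{name}={value}")
    return args


# The basic run from this document.
basic = build_ligandfit_command(
    "datafile.mtz", "ligand.pdb", model="partial_model.pdb"
)
print(shlex.join(basic))

# Illustrative values only: two ligand copies, four processors.
two_copies = build_ligandfit_command(
    "datafile.mtz",
    "ligand.pdb",
    model="partial_model.pdb",
    number_of_ligands=2,
    nproc=4,
)
print(shlex.join(two_copies))
```

The returned list can be handed to subprocess.run on a machine where PHENIX is installed; shlex.join (Python 3.8 or later) is used here only to show the equivalent shell command.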
Possible Problems
Specific limitations and problems
Literature
Additional information
List of all LigandFit keywords
-------------------------------------------------------------------------------
Legend: black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help
Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
ligandfit
write_run_directory_to_file= None Writes the full name of a run directory
to the specified file. This can be used as a
call-back to tell a script where the output is
going to go. (Command-line only)
coot= None Set coot to True and optionally run=[run-number] to run Coot
with the current model and map for run run-number. In some wizards
(AutoBuild) you can edit the model and give it back to PHENIX to use
as part of the model-building process. If you just say coot then the
facts for the highest-numbered existing run will be shown.
(Command-line only)
ignore_blanks= None ignore_blanks allows you to have a command-line keyword
with a blank value like "input_lig_file_list="
stop= None You can stop the current wizard with "stopwizard" or "stop". If
you type "phenix.autobuild run=3 stop" then this will stop run 3 of
autobuild. (Command-line only)
display_facts= None Set display_facts to True and optionally
run=[run-number] to display the facts for run run-number. If
you just say display_facts then the facts for the
highest-numbered existing run will be shown. (Command-line
only)
display_summary= None Set display_summary to True and optionally
run=[run-number] to show the summary for run run-number.
If you just say display_summary then the summary for the
highest-numbered existing run will be shown. (Command-line
only)
carry_on= None Set carry_on to True to carry on with highest-numbered run
from where you left off. (Command-line only)
run= None Set run to n to continue with run n where you left off.
(Command-line only)
copy_run= None Set copy_run to n to copy run n to a new run and continue
where you left off. (Command-line only)
display_runs= None List all runs for this wizard. (Command-line only)
delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)
display_labels= None display_labels=test.mtz will list all the labels that
identify data in test.mtz. You can use the label strings
that are produced in AutoSol to identify which data to use
from a datafile like this:
peak.data="F+ SIGF+ F- SIGF-"
(the entire string in quotes counts here). You can use the
individual labels from these strings as identifiers for
data columns in AutoSol and AutoBuild like this:
input_refinement_labels="FP SIGFP FreeR_flags"
(each individual label counts).
dry_run= False Just read in and check parameter names
data= None Datafile (alias for input_data_file). This can be any format if
only FP is to be read in. If phases are to be read in then MTZ format
is required. The Wizard will guess the column identification. If you
want to specify it you can say input_labels="FP" , or
input_labels="FP PHIB FOM". (Command-line only)
ligand= None File containing information about the ligand (PDB or SMILES)
(alias for input_lig_file) (Command-line only)
model= None PDB file with model for everything but the ligand (alias for
input_partial_model_file). (Command-line only)
quick= True *False Yes No Run as quickly as possible. (Command-line only)
background= *Yes No True False When you specify nproc=nn, you can run the
jobs in background (default if nproc is greater than 1) or
foreground (default if nproc=1). If you set run_command=qsub
(or otherwise submit to a batch queue), then you should set
background=False, so that the batch queue can keep track of
your runs. There is no need to use background=True in this case
because all the runs go as controlled by your batch system. If
you use run_command=csh (or similar, csh is default) then
normally you will use background=True so that all the jobs run
simultaneously.
cell= 0.0 0.0 0.0 0.0 0.0 0.0 Enter cell parameters a b c alpha beta gamma
clean_up= Yes *No True False At the end of the entire run the TEMP
directories will be removed if clean_up is True. The default is
No, keep these directories. If you want to remove them after your
run is finished use a command like "phenix.autobuild run=1
clean_up=True"
conformers= 1 Enter how many conformers to create. If greater than 1, then
ELBOW will always be used to generate them. If 1 then ELBOW
will be used if a PDB file is not specified. These conformers
are used to identify allowed torsion angles for your ligand.
The alternative is to use the empirical rules in RESOLVE. ELBOW
takes longer but is more accurate.
coot_name= coot If your version of coot is called something else, then you
can specify that here.
debug= Yes *No True False You can have the wizard stop with error messages
about the code if you use debug. NOTE: you cannot use Pause with
debug.
delta_phi_ligand= 40.0 Specify the angle (degrees) between successive tries
in FFT search for fragments
extend_try_list= None You can fill out the list of parallel jobs to match
the number of jobs you want to run at one time, as
specified with nbatch.
extra_verbose= Yes *No True False Facts and possible commands will be
printed every cycle if Yes
file_or_file_list= *single_file file_with_list_of_files Choose if you want
to input a single file with PDB or other information
about the ligand or if you want to input a file
containing a list of files with this information for a
list of ligands
fit_phi_inc= 20 Specify the angle (degrees) between rotations around bonds
fit_phi_range= -180 180 Range of bond rotation angles to search
group_search= 0 Enter the ID number of the group from the ligand to use to
seed the search for conformations
i_ran_seed= 289564 Random seed (positive integer) for model-building and
simulated annealing refinement
input_data_file= None Enter the file with input structure factor data
(files other than MTZ will be converted to mtz and
intensities to amplitudes)
input_labels= None Labels for input data columns NOTE: Applies to input
data file for LigandFit and AutoBuild, but not to AutoMR. For
AutoMR use instead 'input_label_string'.
input_lig_file= None Enter either a single file with PDB information or a
SMILES string or a file containing a list of files with
this information for a list of ligands. If you enter a file
containing a list of files you need also to specify
"file_or_file_list=file_with_list_of_files". If
the format is not PDB, then ELBOW will generate a PDB file.
input_ligand_compare_file= None If you enter a PDB file with a ligand in
it, the coordinates of the newly-built ligand
will be compared with the coordinates in this
file.
input_partial_model_file= None Enter a PDB file containing a model of your
structure without the ligand. This is used to
calculate phases. If you are providing phases in
your data file and have selected "map" for
map_type this file may be left out.
lig_map_type= *fo-fc_difference_map fobs_map pre_calculated_map_coeffs
Enter the type of map to use in ligand fitting:
fo-fc_difference_map: Fo-Fc difference map phased on partial model
fobs_map: Fo map phased on partial model
pre_calculated_map_coeffs: map calculated from FP PHIB [FOM]
coefficients in input data file
ligand_cc_min= 0.75 Enter the minimum correlation coefficient of the ligand
to the map to quit searching for more conformations
ligand_completeness_min= 1.0 Enter the minimum completeness of the ligand
to the map to quit searching for more
conformations
ligand_format= *PDB SMILES Enter whether the files contain SMILES strings
or PDB formatted information
ligand_id= None You can specify an integer value for the ID of a ligand...
This number will be added to whatever residue number the ligand
search model in input_lig_file has. The keyword is only valid if
a single copy of the ligand is to be found.
ligand_near_chain= None You can specify where to search for the ligand
either with search_center or with ligand_near_res and
ligand_near_chain. If you set ligand_near_chain="None"
or leave it blank or do not set it, then all chains will
be included. The keywords ligand_near_res and
ligand_near_chain refer to residue/chain in the file
defined by input_partial_model_file (or model if running
from command line).
ligand_near_pdb= None You can specify where LigandFit should look for your
ligands by providing a PDB file containing one or more
copies of the ligand. If you want, you can provide a PDB
file with ligand + macromolecule and specify the ligand
name with name_of_ligand_near_pdb.
ligand_near_res= None You can specify where to search for the ligand either
with search_center or with ligand_near_res and
ligand_near_chain. The keywords ligand_near_res and
ligand_near_chain refer to residue/chain in the file
defined by input_partial_model_file (or model if running
from command line).
link_distance_cutoff= 3.0 You can specify the maximum bond distance for
linking residues in phenix.refine called from the
wizards.
local_search= *Yes No True False If local_search is Yes, then only the
region within search_dist of the point in the map with the
highest local rmsd will be searched in the FFT search for
fragments
max_wait_time= 100.0 You can specify the length of time (seconds) to wait
when testing the run_command. If you have a cluster where
jobs do not start right away you may need a longer time to
wait.
n_group_search= 3 Enter the number of different fragments of the ligand
that will be looked for in FFT search of the map
n_indiv_tries_max= 10 If 0 is specified, all fragments are searched at once;
otherwise all are first searched at once, then
individually up to the number specified.
n_indiv_tries_min= 5 If 0 is specified, all placements of a fragment are
tested at once; otherwise all are first tested at once,
then individually up to the number specified.
name_of_ligand_near_pdb= None You can specify where LigandFit should look
for your ligands by providing a PDB file
containing one or more copies of the ligand. If
you want, you can provide a PDB file with ligand +
macromolecule and specify the ligand name with
name_of_ligand_near_pdb.
nbatch= 5 You can specify the number of processors to use (nproc) and the
number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors available
and leave nbatch alone. If you leave nbatch as None it will be set
automatically, with a value depending on the Wizard. This is
recommended. The value of nbatch can affect the results that you
get, as the jobs are not split into exact replicates, but are
rather run with different random numbers. If you want to get the
same results, keep the same value of nbatch.
nproc= 1 You can specify the number of processors to use (nproc) and the
number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors available
and leave nbatch alone. If you leave nbatch as None it will be set
automatically, with a value depending on the Wizard. This is
recommended. The value of nbatch can affect the results that you
get, as the jobs are not split into exact replicates, but are rather
run with different random numbers. If you want to get the same
results, keep the same value of nbatch.
number_of_ligands= 1 Number of copies of the ligand expected in the
asymmetric unit
number_of_solutions_to_display= None Number of solutions to put on screen
and to write out
offsets_list= 7 53 29 You can specify an offset for the orientation of the
helix and strand templates in building. This is used in
generating different starting models.
r_free_flags_fraction= 0.1 Maximum fraction of reflections in the free R
set. You can choose the maximum fraction of
reflections in the free R set and the maximum number
of reflections in the free R set. The number of
reflections in the free R set will be up to the lower
of the values defined by these two parameters.
r_free_flags_lattice_symmetry_max_delta= 5.0 You can set the maximum
deviation of distances in the
lattice that are to be considered
the same for purposes of
generating a
lattice-symmetry-unique set of
free R flags.
r_free_flags_max_free= 2000 Maximum number of reflections in the free R
set. You can choose the maximum fraction of
reflections in the free R set and the maximum number
of reflections in the free R set. The number of
reflections in the free R set will be up to the lower
of the values defined by these two parameters.
r_free_flags_use_lattice_symmetry= *Yes No True False When generating
r_free_flags you can decide whether to
include lattice symmetry (good in
general, necessary if there is
twinning).
resolution= 0.0 High-resolution limit. Used as resolution limit for density
modification and as general default high-resolution limit. If
resolution_build or refinement_resolution are set then they
override this for model-building or refinement. If
overall_resolution is set then data beyond that resolution is
ignored completely.
resolve_command_list= None Commands for resolve. One per line in the form:
keyword value
(value can be optional). Examples:
coarse_grid
resolution 200 2.0
hklin test.mtz
NOTE: for command-line usage you need to enclose the whole
set of commands in double quotes (") and each
individual command in single quotes (') like this:
resolve_command_list="'no_build' 'b_overall 23' "
resolve_size= _giant _huge _extra_huge *None Size for solve/resolve
("","_giant","_huge","_extra_huge")
resolve_wait_time= 1 You can specify the length of time (seconds) to wait
when running solve, resolve, and resolve_pattern before
looking for their log files. If you have NFS-mounted
disks you may need to increase this beyond the default
(1 second).
run_command= csh When you specify nproc=nn, you can run the subprocesses as
jobs in background with csh (default) or submit them to a
queue with the command of your choice (e.g., qsub). If you
have a multi-processor machine, use csh. If you have a
cluster, use qsub or the equivalent command for your system.
NOTE: If you set run_command=qsub (or otherwise submit to a
batch queue), then you should set background=False, so that
the batch queue can keep track of your runs. There is no need
to use background=True in this case because all the runs go as
controlled by your batch system. If you use run_command=csh
(or similar, csh is default) then normally you will use
background=True so that all the jobs run simultaneously.
search_center= 0.0 0.0 0.0 Enter coordinates for center of search region
(ignored if [0,0,0])
search_dist= 10.0 If local_search is Yes, then only the region within this
distance of the point in the map with the highest local rmsd
will be searched in the FFT search for fragments
sg= None Space Group symbol (e.g., C2221 or C 2 2 21)
skip_xtriage= Yes *No True False You can bypass xtriage if you want. This
will prevent you from applying anisotropy corrections,
however.
solution_to_display= 1 Solution number of the solution to display and write
out ( use 0 to let the wizard display the top
solution)
temp_dir= None Define a temporary directory (it must exist)
title= Run 1 LigandFit Sun Aug 10 23:28:07 2008 Enter any text you like to
help identify what you did in this run
top_output_dir= None This is used in subprocess calls of wizards and to
tell the Wizard where to look for the STOPWIZARD file.
use_cc_local= Yes *No True False You can specify the use of a local
correlation coefficient for scoring ligand fits to the map.
If you do not do this, then the region over which the ligand
is scored is all points within 2.5 A of the atoms in the
ligand. If you do specify use_cc_local, then the region over
which the ligand is scored is all these points, plus all the
contiguous points that have density greater than 0.5 * sigma.
verbose= Yes *No True False Command files and other verbose output will be
printed
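The keyword list states that the free R set holds up to the lower of the two limits set by r_free_flags_fraction and r_free_flags_max_free. The arithmetic can be sketched as follows. This is an illustration of that rule only, not the wizard's code: the function name is hypothetical, the reflection counts are made-up examples, and truncating the fractional count to an integer is an assumption.

```python
def max_free_r_reflections(n_reflections, fraction=0.1, max_free=2000):
    """Upper limit on the free R set size (hypothetical helper).

    fraction and max_free default to the documented defaults of
    r_free_flags_fraction and r_free_flags_max_free.
    """
    # Assumption: the fractional limit is truncated to a whole number.
    by_fraction = int(n_reflections * fraction)
    # The documented rule: the lower of the two limits applies.
    return min(by_fraction, max_free)


# Illustrative reflection counts only.
for n in (8000, 20000, 60000):
    print(n, max_free_r_reflections(n))
```

With the default values, the fractional limit applies to data sets of fewer than 20000 reflections, and the absolute limit of 2000 applies to larger ones.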
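The quoting rule given for resolve_command_list in the keyword list (each command in single quotes, the whole set passed as one value) can be sketched as a small formatter. The function name is hypothetical and not part of PHENIX; the two commands are the ones from the documented example. The documented example has a trailing space before the closing double quote, which is assumed here not to be significant.

```python
def format_resolve_command_list(commands):
    """Build the resolve_command_list argument (hypothetical helper).

    Each RESOLVE command is wrapped in single quotes, as the keyword
    description requires, and the commands are joined with spaces.
    """
    for command in commands:
        # Embedded quotes would break the documented quoting scheme.
        if "'" in command or '"' in command:
            raise ValueError(f"command contains a quote character: {command!r}")
    value = " ".join(f"'{command}'" for command in commands)
    return f"resolve_command_list={value}"


argument = format_resolve_command_list(["no_build", "b_overall 23"])
print(argument)
```

The result is meant to be passed as a single argument, for example as one element of an argument list, which corresponds to enclosing the whole set of commands in double quotes when typing the command in a shell.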