Python-based Hierarchical ENvironment for Integrated Xtallography

Automated ligand fitting with LigandFit

Author(s)
Purpose
Purpose of the LigandFit Wizard
Usage
How the LigandFit Wizard works
How to run the LigandFit Wizard
What the LigandFit wizard needs to run
Specifying which columns of data to use from input data files
Output files from LigandFit
Examples
Sample command_line inputs
Possible Problems
Specific limitations and problems
Literature
Additional information
List of all LigandFit keywords

Author(s)

  • LigandFit Wizard: Tom Terwilliger
  • PHENIX GUI and PDS Server: Nigel W. Moriarty
  • RESOLVE: Tom Terwilliger

Purpose

Purpose of the LigandFit Wizard

The LigandFit Wizard carries out fitting of flexible ligands to electron density maps.

Usage

The LigandFit Wizard can be run from the PHENIX GUI, from the command-line, and from keyworded script files. All three versions are identical except in the way that they take commands from the user. See Running a Wizard from a GUI, the command-line, or a script for details of how to run a Wizard. The command-line version will be described here.

How the LigandFit Wizard works

The LigandFit wizard provides a command-line and graphical user interface allowing the user to identify a datafile containing crystallographic structure factor information, an optional PDB file with a partial model of the structure without the ligand, and a PDB file containing the ligand to be fit (in an allowed but arbitrary conformation).

The wizard checks the data files for consistency and then calls RESOLVE to carry out the fitting of the ligand into the electron-density map. The map used is normally a difference map, with F=FP-FC. It can also be an Fobs map (calculated from FP with phases PHIC from the input partial model), or an arbitrary map calculated with FP, PHI and FOM. If you supply an input partial model, then the region occupied by the partial model is flattened in the map used to fit the ligand, so that the ligand will normally not get placed in this region.

The ligand fitting is done by RESOLVE in a three-stage process. First, the largest contiguous region of density in the map not already occupied by the model is identified. The ligand will be placed in this density. (If desired, the location of the ligand can instead be defined by the user as near a certain residue or near specified coordinates.) Next, many possible placements of the largest rigid sub-fragments of the ligand are found within this region of high density. Third, each of these placements is taken as a starting point for fitting the remainder of the ligand. All these ligand fits are scored based on their fit to the density, and the best-fitting placement is written out.
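The first of these stages, finding the largest contiguous region of density not occupied by the model, can be illustrated with a toy flood fill over grid points. This is only a sketch of the general idea and is not RESOLVE's implementation. The function name, the dictionary-based map, the threshold and the 6-connected neighbour rule are all assumptions made for the illustration.

```python
from collections import deque

# Face-sharing (6-connected) neighbours on a cubic grid; an assumption
# of this sketch, not a statement about how RESOLVE defines contiguity.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def largest_contiguous_region(density, threshold, occupied=frozenset()):
    """Return the largest connected set of grid points at or above threshold.

    density:  dict mapping (i, j, k) grid indices to density values
    occupied: grid points already covered by the partial model (skipped)
    """
    candidates = {
        point for point, rho in density.items()
        if rho >= threshold and point not in occupied
    }
    seen = set()
    best = set()
    for start in candidates:
        if start in seen:
            continue
        # Breadth-first flood fill from this unvisited point.
        region = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            i, j, k = queue.popleft()
            for di, dj, dk in NEIGHBORS:
                neighbor = (i + di, j + dj, k + dk)
                if neighbor in candidates and neighbor not in seen:
                    seen.add(neighbor)
                    region.add(neighbor)
                    queue.append(neighbor)
        if len(region) > len(best):
            best = region
    return best


# Toy map: a 3-point blob, a 2-point blob and one weak point.
toy_density = {
    (0, 0, 0): 2.0, (1, 0, 0): 2.5, (2, 0, 0): 1.8,
    (5, 5, 5): 3.0, (5, 5, 6): 3.1,
    (9, 9, 9): 0.2,
}
region = largest_contiguous_region(toy_density, threshold=1.5)
print(sorted(region))
```

Passing the grid points covered by the partial model as `occupied` mimics the flattening of the model region, so that the search moves on to the next-largest blob.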

The output of the wizard consists of a fitted ligand in PDB format and a summary of the quality of the fit. Multiple copies of a ligand can be fit to a single map in an automated fashion using the LigandFit wizard as well.

How to run the LigandFit Wizard

Running the LigandFit Wizard is easy. For example, from the command-line you can type:

phenix.ligandfit data=datafile.mtz model=partial_model.pdb ligand=ligand.pdb

The LigandFit Wizard will carry out ligand fitting of the ligand in ligand.pdb based on the structure factor amplitudes in datafile.mtz, calculating phases based on partial_model.pdb. All rotatable bonds will be identified and allowed to take stereochemically reasonable orientations.
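If you drive the Wizard from a Python script, the same command can be assembled as an argument list. The sketch below is one possible way to do that. The helper name `build_ligandfit_command` is hypothetical and not part of PHENIX; only the keywords (`data`, `ligand`, `model`, `quick`, `number_of_ligands`) come from this page.

```python
import shlex


def build_ligandfit_command(data, ligand, model=None, **keywords):
    """Assemble an argument list for phenix.ligandfit (hypothetical helper).

    Extra keyword=value pairs are appended in sorted order so that the
    resulting command is reproducible.
    """
    args = ["phenix.ligandfit", f"data={data}", f"ligand={ligand}"]
    if model is not None:
        args.append(f"model={model}")
    for name, value in sorted(keywords.items()):
        args.append(f"{name}={value}")
    return args


command = build_ligandfit_command(
    "datafile.mtz",
    "ligand.pdb",
    model="partial_model.pdb",
    quick=True,
    number_of_ligands=3,
)

# Show the equivalent shell command line.
print(shlex.join(command))

# With PHENIX installed and on the PATH, the list could be run with:
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing a list to `subprocess.run` avoids a shell, so no quoting of file names is needed.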

What the LigandFit wizard needs to run

The ligandfit wizard needs:

  • (1) a datafile (w1.sca or data=w1.sca); this can be any format

  • (2) a PDB file with your model without ligand (model=partial.pdb; optional if your datafile contains map coefficients)

  • (3) a file with information about your ligand (ligand=side.pdb)

    The ligand file can be a PDB file with one stereochemically acceptable conformation of your ligand. It can alternatively be a file containing a SMILES string, in which case the starting ligand conformation will be generated with the PHENIX ELBOW routine.

    The command_line ligandfit interpreter will guess which file is your data file but you have to tell it which file is the model and which is the ligand.

    Specifying which columns of data to use from input data files

    If one or more of your data files has column names that the Wizard cannot identify automatically, you can specify them yourself. You will need to provide one column "name" for each expected column of data, with "None" for anything that is missing.

    For example, if your data file data.mtz has columns FP SIGFP then you might specify

    data=data.mtz
    input_labels="FP SIGFP"
    

    You can find out all the possible label strings in a data file that you might use by typing:

    phenix.autosol display_labels=data.mtz  # display all labels for data.mtz
    

    You can specify many more parameters as well. See the list of keywords, defaults and descriptions at the end of this page and also general information about running Wizards at Running a Wizard from a GUI, the command-line, or a script for how to do this. Some of the most common parameters are:

    data=w1.sca       # data file
    model=coords.pdb  # starting model without ligand
    ligand=ligand.pdb # any stereochemically allowed conformation of your ligand
    resolution=3     # dmin of 3 A
    quick=False      # specify if you want to look hard for a good conformation
    ligand_cc_min=0.75   # quit if the CC of ligand to map is 0.75 or better
    number_of_ligands=3  # find 3 copies of the ligand
    n_group_search=3     # try 3 different fragments of the ligand in initial search
    resolve_command="'ligand_start side.pdb'" # build ligand superimposing on side.pdb
    

    Output files from LigandFit

    When you run LigandFit the output files will be in a subdirectory with your run number:

    LigandFit_run_1_/   # subdirectory with results
    

  • A summary file listing the results of the run and the other files produced:
    LigandFit_summary.dat  # overall summary
    

  • A file that lists all parameters and knowledge accumulated by the Wizard during the run (some parts are binary and are not printed)
    LigandFit_Facts.dat   # all Facts about the run
    

  • A warnings file listing any warnings about the run
    LigandFit_warnings.dat  # any warnings
    

  • A PDB file with the fitted ligand (in this case the first copy of ligand number 1):
    ligand_fit_1_1.pdb
    

  • A log file with the fitting of the ligand:
    ligand_1_1.log
    

  • A log file with the fit of the ligand to the map:
    ligand_cc_1_1.log
    

  • Map coefficients for the map used for fitting:
    resolve_map.mtz
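
    After a run it can be convenient to check from a script which of these files were actually written. The following is a minimal sketch, assuming that the files sit directly inside the run subdirectory and that one copy of one ligand was fitted (hence the `_1_1` suffixes). The function name `check_run_outputs` is hypothetical.

```python
from pathlib import Path

# File names as listed above, for the first copy of ligand number 1.
EXPECTED_FILES = [
    "LigandFit_summary.dat",
    "LigandFit_Facts.dat",
    "LigandFit_warnings.dat",
    "ligand_fit_1_1.pdb",
    "ligand_1_1.log",
    "ligand_cc_1_1.log",
    "resolve_map.mtz",
]


def check_run_outputs(run_dir, expected=EXPECTED_FILES):
    """Return (found, missing) lists of expected output file names."""
    run_dir = Path(run_dir)
    found = [name for name in expected if (run_dir / name).is_file()]
    missing = [name for name in expected if name not in found]
    return found, missing


found, missing = check_run_outputs("LigandFit_run_1_")
print("found:  ", found)
print("missing:", missing)
```

    If the run directory does not exist, every file is simply reported as missing.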
    

Examples

Sample command_line inputs

  • Standard run of ligandfit (generate map from model and data file)
    phenix.ligandfit w1.sca model=partial.pdb ligand=side.pdb
    

  • Build into a map from pre-determined coefficients
    phenix.ligandfit data=perfect.mtz \
     lig_map_type=pre_calculated_map_coeffs   \
       model=partial.pdb ligand=side.pdb
    

  • Quick run of ligandfit
    phenix.ligandfit w1.sca model=partial.pdb ligand=side.pdb quick=True
    

  • Run ligandfit on a series of ligands specified in ligand_list.dat
    phenix.ligandfit w1.sca model=partial.pdb \
      ligand=ligand_list.dat file_or_file_list=file_with_list_of_files
    
    Note that you have to specify
    file_or_file_list=file_with_list_of_files
    
    or else the Wizard will try to interpret the contents of ligand_list.dat as a SMILES string. Here the "file_with_list_of_files" is a flag, not something you substitute with an actual file name. You use it just as listed above.

  • Place ligand near residue 92 of chain "A" from partial.pdb
    phenix.ligandfit w1.sca model=partial.pdb ligand=side.pdb \
       ligand_near_chain="A" ligand_near_res=92
    

  • Use start.pdb as a template for some of the atoms in the ligand; build the remainder of the ligand, fixing the coordinates of the corresponding atoms:
    phenix.ligandfit w1.sca model=partial.pdb ligand=side.pdb \
       resolve_command="'ligand_start start.pdb'"  # NOTE ' and " quotes necessary
    
    Note that the formatting is slightly tricky and requires the two different quotation marks on either end of the command. This is an example of passing a specific keyword to RESOLVE.
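
The effect of the two layers of quotation marks can be checked without running PHENIX. The sketch below uses Python's `shlex.split`, which follows POSIX shell quoting rules (an approximation of what your own shell does), to show which arguments the program would receive in each case.

```python
import shlex

base = "phenix.ligandfit w1.sca model=partial.pdb ligand=side.pdb "

# Both quote layers, as in the example above: the shell removes the
# outer double quotes and the single quotes reach the program intact.
argv = shlex.split(base + "resolve_command=\"'ligand_start start.pdb'\"")

# Single quotes only: the shell strips them, so the program never
# sees the quotation marks around the RESOLVE command.
single_only = shlex.split(base + "resolve_command='ligand_start start.pdb'")

# No quotes at all: the RESOLVE command is split into two arguments.
unquoted = shlex.split(base + "resolve_command=ligand_start start.pdb")

for label, args in (
    ("both quotes:", argv),
    ("single only:", single_only),
    ("no quotes:  ", unquoted),
):
    # Skip the four leading arguments that are the same in every case.
    print(label, args[4:])
```

Only the first form delivers the RESOLVE command as a single argument with its single quotes preserved, which is the form shown in the example.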

Possible Problems

Specific limitations and problems

  • The ligand to be searched for must have at least 3 atoms.

  • The partial-model file must not have any atoms in the position where the ligand is to be built. If this file contains solvent waters, then you may wish to remove them before building the ligand.

  • If a ring in the ligand can have more than one conformation (e.g., chair or boat conformation) then you need to do separate runs for each conformation of the ring (rings are taken as fixed units in LigandFit).

  • LigandFit ignores insertion codes, so if you specify a residue with ligand_near_res, only the residue number is used.

  • The size of the asymmetric unit in the SOLVE/RESOLVE portion of the LigandFit wizard is limited by the memory in your computer and the binaries used. The Wizard is supplied with regular-size ("", size=6), giant ("_giant", size=12), huge ("_huge", size=18) and extra_huge ("_extra_huge", size=36). Larger-size versions can be obtained on request.

  • The LigandFit Wizard can take most settings of most space groups. However, it can only use the hexagonal setting of rhombohedral space groups (e.g., #146 R3:H or #155 R32:H), and it cannot use space groups 114-119 (not found in macromolecular crystallography), even in the standard setting, because of difficulties with the use of asuset in the version of the ccp4 libraries used in PHENIX for these settings and space groups.
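
Two of the limitations above (removing solvent waters from the partial model, and the minimum of 3 atoms in the ligand) are easy to check before a run. The following is a minimal sketch that works on the text lines of a PDB file. The function names are hypothetical, and the water residue names `HOH` and `WAT` are an assumption about how waters are labelled in your file.

```python
def is_atom_record(line):
    """True for ATOM and HETATM records of a PDB file."""
    return line.startswith(("ATOM", "HETATM"))


def strip_waters(pdb_lines, water_names=("HOH", "WAT")):
    """Return the lines with water atoms removed.

    The residue name is read from columns 18-20 of each atom record,
    as defined by the PDB format.
    """
    kept = []
    for line in pdb_lines:
        if is_atom_record(line) and line[17:20].strip() in water_names:
            continue
        kept.append(line)
    return kept


def count_atoms(pdb_lines):
    """Count ATOM and HETATM records."""
    return sum(1 for line in pdb_lines if is_atom_record(line))


def has_enough_atoms(pdb_lines, minimum=3):
    """Check the minimum ligand size stated in the list above."""
    return count_atoms(pdb_lines) >= minimum
```

For example, reading partial.pdb with `open("partial.pdb").read().splitlines()`, passing the lines through `strip_waters` and writing them to a new file gives a water-free model to supply with `model=`.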

Literature

Ligand identification using electron-density map correlations. T. C. Terwilliger, P. D. Adams, N. W. Moriarty and J. D. Cohn. Acta Cryst. D63, 101-107 (2007).

Automated ligand fitting by core-fragment fitting and extension into density. T. C. Terwilliger, H. Klei, P. D. Adams, N. W. Moriarty and J. D. Cohn. Acta Cryst. D62, 915-922 (2006).

Additional information

List of all LigandFit keywords

------------------------------------------------------------------------------- 
Legend: black bold - scope names
        black - parameter names
        red - parameter values
        blue - parameter help
        blue bold - scope help
        Parameter values:
          * means selected parameter (where multiple choices are available)
          False is No
          True is Yes
          None means not provided, not predefined, or left up to the program
          "%3d" is a Python style formatting descriptor
------------------------------------------------------------------------------- 
ligandfit
   write_run_directory_to_file= None Writes the full name of a run directory
                                to the specified file. This can be used as a
                                call-back to tell a script where the output is
                                going to go. (Command-line only)
   coot= None Set coot to True and optionally run=[run-number] to run Coot
         with the current model and map for run run-number. In some wizards
         (AutoBuild) you can edit the model and give it back to PHENIX to use
         as part of the model-building process. If you just say coot then the
         facts for the highest-numbered existing run will be shown.
         (Command-line only)
   ignore_blanks= None ignore_blanks allows you to have a command-line keyword
                  with a blank value like "input_lig_file_list="
   stop= None You can stop the current wizard with "stopwizard" or "stop". If
         you type "phenix.autobuild run=3 stop" then this will stop run 3 of
         autobuild. (Command-line only)
   display_facts= None Set display_facts to True and optionally
                  run=[run-number] to display the facts for run run-number. If
                  you just say display_facts then the facts for the
                  highest-numbered existing run will be shown. (Command-line
                  only)
   display_summary= None Set display_summary to True and optionally
                    run=[run-number] to show the summary for run run-number.
                    If you just say display_summary then the summary for the
                    highest-numbered existing run will be shown. (Command-line
                    only)
   carry_on= None Set carry_on to True to carry on with highest-numbered run
             from where you left off. (Command-line only)
   run= None Set run to n to continue with run n where you left off.
        (Command-line only)
   copy_run= None Set copy_run to n to copy run n to a new run and continue
             where you left off. (Command-line only)
   display_runs= None List all runs for this wizard. (Command-line only)
   delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)
   display_labels= None display_labels=test.mtz will list all the labels that
                   identify data in test.mtz. You can use the label strings
                   that are produced in AutoSol to identify which data to use
                   from a datafile like this: peak.data="F+ SIGF+ F- SIGF-"
                   (the entire string in quotes counts here). You can use the
                   individual labels from these strings as identifiers for
                   data columns in AutoSol and AutoBuild like this:
                   input_refinement_labels="FP SIGFP FreeR_flags" (each
                   individual label counts).
   dry_run= False Just read in and check parameter names
   data= None Datafile (alias for input_data_file). This can be any format if
         only FP is to be read in. If phases are to be read in then MTZ format
         is required. The Wizard will guess the column identification. If you
         want to specify it you can say input_labels="FP" , or
         input_labels="FP PHIB FOM". (Command-line only)
   ligand= None File containing information about the ligand (PDB or SMILES)
           (alias for input_lig_file) (Command-line only)
   model= None PDB file with model for everything but the ligand (alias for
          input_partial_model_file). (Command-line only)
   quick= True *False Yes No Run as quickly as possible. (Command-line only)
   background= *Yes No True False When you specify nproc=nn, you can run the
               jobs in background (default if nproc is greater than 1) or
               foreground (default if nproc=1). If you set run_command=qsub
               (or otherwise submit to a batch queue), then you should set
               background=False, so that the batch queue can keep track of
               your runs. There is no need to use background=True in this case
               because all the runs go as controlled by your batch system. If
               you use run_command=csh (or similar, csh is default) then
               normally you will use background=True so that all the jobs run
               simultaneously.
   cell= 0.0 0.0 0.0 0.0 0.0 0.0 Enter cell parameter a b c alpha beta gamma
   clean_up= Yes *No True False At the end of the entire run the TEMP
             directories will be removed if clean_up is True. The default is
             No, keep these directories. If you want to remove them after your
             run is finished use a command like "phenix.autobuild run=1
             clean_up=True"
   conformers= 1 Enter how many conformers to create. If greater than 1, then
               ELBOW will always be used to generate them. If 1 then ELBOW
               will be used if a PDB file is not specified. These conformers
               are used to identify allowed torsion angles for your ligand.
               The alternative is to use the empirical rules in RESOLVE. ELBOW
               takes longer but is more accurate.
   coot_name= coot If your version of coot is called something else, then you
              can specify that here.
   debug= Yes *No True False You can have the wizard stop with error messages
          about the code if you use debug. NOTE: you cannot use Pause with
          debug.
   delta_phi_ligand= 40.0 Specify the angle (degrees) between successive tries
                     in FFT search for fragments
   extend_try_list= None You can fill out the list of parallel jobs to match
                    the number of jobs you want to run at one time, as
                    specified with nbatch.
   extra_verbose= Yes *No True False Facts and possible commands will be
                  printed every cycle if Yes
   file_or_file_list= *single_file file_with_list_of_files Choose if you want
                      to input a single file with PDB or other information
                      about the ligand or if you want to input a file
                      containing a list of files with this information for a
                      list of ligands
   fit_phi_inc= 20 Specify the angle (degrees) between rotations around bonds
   fit_phi_range= -180 180 Range of bond rotation angles to search
   group_search= 0 Enter the ID number of the group from the ligand to use to
                 seed the search for conformations
   i_ran_seed= 289564 Random seed (positive integer) for model-building and
               simulated annealing refinement
   input_data_file= None Enter the file with input structure factor data
                    (files other than MTZ will be converted to mtz and
                    intensities to amplitudes)
   input_labels= None Labels for input data columns NOTE: Applies to input
                 data file for LigandFit and AutoBuild, but not to AutoMR. For
                 AutoMR use instead 'input_label_string'.
   input_lig_file= None Enter either a single file with PDB information or a
                   SMILES string or a file containing a list of files with
                   this information for a list of ligands. If you enter a file
                   containing a list of files you need also to specify
                   "file_or_file_list=file_with_list_of_files". If
                   the format is not PDB, then ELBOW will generate a PDB file.
   input_ligand_compare_file= None If you enter a PDB file with a ligand in
                              it, the coordinates of the newly-built ligand
                              will be compared with the coordinates in this
                              file.
   input_partial_model_file= None Enter a PDB file containing a model of your
                             structure without the ligand. This is used to
                             calculate phases. If you are providing phases in
                             your data file and have selected "map" for
                             map_type this file may be left out.
   lig_map_type= *fo-fc_difference_map fobs_map pre_calculated_map_coeffs 
                Enter the type of map to use in ligand fitting
                 fo-fc_difference_map: Fo-Fc difference map phased on partial
                 model
                 fobs_map: Fo map phased on partial model
                 pre_calculated_map_coeffs: map calculated from FP PHIB [FOM]
                 coefficients in input data file
   ligand_cc_min= 0.75 Enter the minimum correlation coefficient of the ligand
                  to the map to quit searching for more conformations
   ligand_completeness_min= 1.0 Enter the minimum completeness of the ligand
                            to the map to quit searching for more
                            conformations
   ligand_format= *PDB SMILES Enter whether the files contain SMILES strings
                  or PDB formatted information
   ligand_id= None You can specify an integer value for the ID of a ligand...
              This number will be added to whatever residue number the ligand
              search model in input_lig_file has. The keyword is only valid if
              a single copy of the ligand is to be found. 
   ligand_near_chain= None You can specify where to search for the ligand
                      either with search_center or with ligand_near_res and
                      ligand_near_chain. If you set ligand_near_chain="None"
                      or leave it blank or do not set it, then all chains will
                      be included. The keywords ligand_near_res and
                      ligand_near_chain refer to residue/chain in the file
                      defined by input_partial_model_file (or model if running
                      from command line). 
   ligand_near_pdb= None You can specify where LigandFit should look for your
                    ligands by providing a PDB file containing one or more
                    copies of the ligand. If you want you can provide a PDB
                    file with ligand+ macromolecule and specify the ligand
                    name with name_of_ligand_near_pdb.
   ligand_near_res= None You can specify where to search for the ligand either
                    with search_center or with ligand_near_res and
                    ligand_near_chain The keywords ligand_near_res and
                    ligand_near_chain refer to residue/chain in the file
                    defined by input_partial_model_file (or model if running
                    from command line). 
   link_distance_cutoff= 3.0 You can specify the maximum bond distance for
                         linking residues in phenix.refine called from the
                         wizards.
   local_search= *Yes No True False If local_search is Yes, then only the
                 region within search_dist of the point in the map with the
                 highest local rmsd will be searched in the FFT search for
                 fragments
   max_wait_time= 100.0 You can specify the length of time (seconds) to wait
                  when testing the run_command. If you have a cluster where
                  jobs do not start right away you may need a longer time to
                  wait.
   n_group_search= 3 Enter the number of different fragments of the ligand
                   that will be looked for in FFT search of the map
   n_indiv_tries_max= 10 If 0 is specified, all fragments are searched at once
                      otherwise all are first searched at once then
                      individually up to the number specified
   n_indiv_tries_min= 5 If 0 is specified, all placements of a fragment are
                      tested at once otherwise all are first tested at once
                      then individually up to the number specified
   name_of_ligand_near_pdb= None You can specify where LigandFit should look
                            for your ligands by providing a PDB file
                            containing one or more copies of the ligand. If
                            you want you can provide a PDB file with ligand+
                            macromolecule and specify the ligand name with
                            name_of_ligand_near_pdb.
   nbatch= 5 You can specify the number of processors to use (nproc) and the
           number of batches to divide the data into for parallel jobs.
           Normally you will set nproc to the number of processors available
           and leave nbatch alone. If you leave nbatch as None it will be set
           automatically, with a value depending on the Wizard. This is
           recommended. The value of nbatch can affect the results that you
           get, as the jobs are not split into exact replicates, but are
           rather run with different random numbers. If you want to get the
           same results, keep the same value of nbatch.
   nproc= 1 You can specify the number of processors to use (nproc) and the
          number of batches to divide the data into for parallel jobs.
          Normally you will set nproc to the number of processors available
          and leave nbatch alone. If you leave nbatch as None it will be set
          automatically, with a value depending on the Wizard. This is
          recommended. The value of nbatch can affect the results that you
          get, as the jobs are not split into exact replicates, but are rather
          run with different random numbers. If you want to get the same
          results, keep the same value of nbatch.
   number_of_ligands= 1 Number of copies of the ligand expected in the
                      asymmetric unit
   number_of_solutions_to_display= None Number of solutions to put on screen
                                   and to write out
   offsets_list= 7 53 29 You can specify an offset for the orientation of the
                 helix and strand templates in building. This is used in
                 generating different starting models.
   r_free_flags_fraction= 0.1 Maximum fraction of reflections in the free R
                          set. You can choose the maximum fraction of
                          reflections in the free R set and the maximum number
                          of reflections in the free R set. The number of
                          reflections in the free R set will be up to the lower
                          of the values defined by these two parameters.
   r_free_flags_lattice_symmetry_max_delta= 5.0 You can set the maximum
                                            deviation of distances in the
                                            lattice that are to be considered
                                            the same for purposes of
                                            generating a
                                            lattice-symmetry-unique set of
                                            free R flags.
   r_free_flags_max_free= 2000 Maximum number of reflections in the free R
                          set. You can choose the maximum fraction of
                          reflections in the free R set and the maximum number
                          of reflections in the free R set. The number of
                          reflections in the free R set will be up to the lower
                          of the values defined by these two parameters.
   r_free_flags_use_lattice_symmetry= *Yes No True False When generating
                                      r_free_flags you can decide whether to
                                      include lattice symmetry (good in
                                      general, necessary if there is
                                      twinning).
   resolution= 0.0 High-resolution limit. Used as resolution limit for density
               modification and as general default high-resolution limit. If
               resolution_build or refinement_resolution are set then they
               override this for model-building or refinement. If
               overall_resolution is set then data beyond that resolution is
               ignored completely.
   resolve_command_list= None Commands for resolve. One per line in the form:
                         keyword value (the value can be optional). Examples:
                         coarse_grid; resolution 200 2.0; hklin test.mtz. NOTE:
                         for command-line usage you need to enclose the whole
                         set of commands in double quotes (") and each
                         individual command in single quotes (') like this:
                         resolve_command_list="'no_build' 'b_overall 23' "
   resolve_size= _giant _huge _extra_huge *None Size for solve/resolve
                 ("","_giant","_huge","_extra_huge")
   resolve_wait_time= 1 You can specify the length of time (seconds) to wait
                      when running solve resolve and resolve_pattern before
                      looking for their log files. If you have NFS-mounted
                      disks you may need to increase this beyond the default
                      (1 second).
   run_command= csh When you specify nproc=nn, you can run the subprocesses as
                jobs in background with csh (default) or submit them to a
                queue with the command of your choice (e.g., qsub). If you
                have a multi-processor machine, use csh. If you have a
                cluster, use qsub or the equivalent command for your system.
                NOTE: If you set run_command=qsub (or otherwise submit to a
                batch queue), then you should set background=False, so that
                the batch queue can keep track of your runs. There is no need
                to use background=True in this case because all the runs go as
                controlled by your batch system. If you use run_command=csh
                (or similar, csh is default) then normally you will use
                background=True so that all the jobs run simultaneously.
   search_center= 0.0 0.0 0.0 Enter coordinates for center of search region
                  (ignored if [0,0,0])
   search_dist= 10.0 If local_search is Yes, then only the region within this
                distance of the point in the map with the highest local rmsd
                will be searched in the FFT search for fragments
   sg= None Space Group symbol (e.g., C2221 or C 2 2 21)
   skip_xtriage= Yes *No True False You can bypass xtriage if you want. This
                 will prevent you from applying anisotropy corrections,
                 however.
   solution_to_display= 1 Solution number of the solution to display and write
                        out ( use 0 to let the wizard display the top
                        solution)
   temp_dir= None Define a temporary directory (it must exist)
   title= Run 1 LigandFit Sun Aug 10 23:28:07 2008 Enter any text you like to
          help identify what you did in this run
   top_output_dir= None This is used in subprocess calls of wizards and to
                   tell the Wizard where to look for the STOPWIZARD file.
   use_cc_local= Yes *No True False You can specify the use of a local
                 correlation coefficient for scoring ligand fits to the map.
                 If you do not do this, then the region over which the ligand
                 is scored consists of all points within 2.5 A of the atoms in
                 the ligand. If you do specify use_cc_local, then the region
                 over which the ligand is scored consists of all these points,
                 plus all the contiguous points that have density greater than
                 0.5 * sigma.
   verbose= Yes *No True False Command files and other verbose output will be
            printed