Python-based Hierarchical ENvironment for Integrated Xtallography

Automated ligand fitting with LigandFit

Author(s)
Purpose
Purpose of the LigandFit Wizard
Usage
How the LigandFit Wizard works
How to run the LigandFit Wizard
What the LigandFit wizard needs to run
Specifying which columns of data to use from input data files
Output files from LigandFit
Examples
Sample command_line inputs
Possible Problems
Specific limitations and problems
Literature
Additional information
List of all LigandFit keywords

Author(s)

  • LigandFit Wizard: Tom Terwilliger
  • PHENIX GUI and PDS Server: Nigel W. Moriarty
  • RESOLVE: Tom Terwilliger

Purpose

Purpose of the LigandFit Wizard

The LigandFit Wizard carries out fitting of flexible ligands to electron density maps.

Usage

The LigandFit Wizard can be run from the PHENIX GUI, from the command-line, and from keyworded script files. All three versions are identical except in the way that they take commands from the user. See Running a Wizard from a GUI, the command-line, or a script for details of how to run a Wizard. The command-line version will be described here.

How the LigandFit Wizard works

The LigandFit wizard provides a command-line and graphical user interface allowing the user to identify a datafile containing crystallographic structure factor information, an optional PDB file with a partial model of the structure without the ligand, and a PDB file containing the ligand to be fit (in an allowed but arbitrary conformation).

The wizard checks the data files for consistency and then calls RESOLVE to carry out the fitting of the ligand into the electron-density map. The map used is normally a difference map, with F=FP-FC. It can also be an Fobs map (calculated from FP with phases PHIC from the input partial model), or an arbitrary map, calculated with FP, PHI and FOM. If you supply an input partial model, then the region occupied by the partial model is flattened in the map used to fit the ligand, so that the ligand will normally not get placed in this region.

The ligand fitting is done by RESOLVE in a three-stage process. First, the largest contiguous region of density in the map not already occupied by the model is identified. The ligand will be placed in this density. (If desired, the location of the ligand can instead be defined by the user as near a certain residue or near specified coordinates.) Next, many possible placements of the largest rigid sub-fragments of the ligand are found within this region of high density. Third, each of these placements is taken as a starting point for fitting the remainder of the ligand. All these ligand fits are scored based on the fit to the density, and the best-fitting placement is written out.
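
The final scoring step can be illustrated with a toy sketch. It assumes the fit is measured by a Pearson correlation coefficient between model density and map density sampled at the same points. The function names and density values below are hypothetical, and this is not RESOLVE's actual implementation.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat density profile carries no information
    return cov / sqrt(var_x * var_y)


def best_placement(map_values, placements):
    """Return (name, cc) for the placement that best matches the map."""
    scored = [
        (name, correlation(map_values, model))
        for name, model in placements.items()
    ]
    return max(scored, key=lambda item: item[1])


# Hypothetical density samples at the same grid points (illustrative only).
observed = [0.1, 0.9, 1.2, 0.8, 0.2]
candidates = {
    "placement_a": [0.0, 1.0, 1.0, 1.0, 0.0],
    "placement_b": [1.0, 0.0, 0.0, 0.0, 1.0],
}
best_name, best_cc = best_placement(observed, candidates)
print(best_name, round(best_cc, 2))
```

Here placement_a follows the observed density and placement_b is its inverse, so placement_a is selected.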

The output of the wizard consists of a fitted ligand in PDB format and a summary of the quality of the fit. Multiple copies of a ligand can be fit to a single map in an automated fashion using the LigandFit wizard as well.

How to run the LigandFit Wizard

Running the LigandFit Wizard is easy. For example, from the command-line you can type:

phenix.ligandfit data=datafile.mtz model=partial_model.pdb ligand=ligand.pdb

The LigandFit Wizard will carry out ligand fitting of the ligand in ligand.pdb based on the structure factor amplitudes in datafile.mtz, calculating phases based on partial_model.pdb. All rotatable bonds will be identified and allowed to take stereochemically reasonable orientations.
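
If you drive the Wizard from a script, the same command line can be assembled programmatically. The helper below is a hypothetical convenience function, not part of PHENIX. It only builds the argument list from keyword=value pairs documented on this page and does not run anything.

```python
def ligandfit_command(data, ligand, model=None, **keywords):
    """Build an argument list for phenix.ligandfit from keyword=value pairs."""
    args = ["phenix.ligandfit", f"data={data}"]
    if model is not None:
        # The model is optional when the data file holds map coefficients.
        args.append(f"model={model}")
    args.append(f"ligand={ligand}")
    for name, value in keywords.items():
        args.append(f"{name}={value}")
    return args


command = ligandfit_command(
    "datafile.mtz", "ligand.pdb", model="partial_model.pdb", quick=True
)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command) on a machine where PHENIX is installed and on the PATH.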

What the LigandFit wizard needs to run

The ligandfit wizard needs:

  • (1) a datafile (w1.sca or data=w1.sca); this can be any format if only FP is read from it, but MTZ format is required if phases are to be read in

  • (2) a PDB file with your model without ligand (model=partial.pdb; optional if your datafile contains map coefficients)

  • (3) a file with information about your ligand (ligand=side.pdb)

    The ligand file can be a PDB file with one stereochemically acceptable conformation of your ligand. It can alternatively be a file containing a SMILES string, in which case the starting ligand conformation will be generated with the PHENIX elbow routine.

    The command-line ligandfit interpreter will guess which file is your data file, but you have to tell it which file is the model and which is the ligand.

    Specifying which columns of data to use from input data files

    If one or more of your data files has column names that the Wizard cannot identify automatically, you can specify them yourself. You will need to provide one column "name" for each expected column of data, with "None" for anything that is missing.

    For example, if your data file data.mtz has columns FP SIGFP then you might specify

    data=data.mtz
    input_labels="FP SIGFP"
    

    You can find out all the possible label strings in a data file that you might use by typing:

    phenix.autosol display_labels=data.mtz  # display all labels for data.mtz
    

    You can specify many more parameters as well. See the list of keywords, defaults and descriptions at the end of this page and also general information about running Wizards at Running a Wizard from a GUI, the command-line, or a script for how to do this. Some of the most common parameters are:

    data=w1.sca       # data file
    model=coords.pdb  # starting model without ligand
    ligand=ligand.pdb # any stereochemically allowed conformation of your ligand
    resolution=3     # dmin of 3 A
    quick=False      # specify if you want to look hard for a good conformation
    ligand_cc_min=0.75   # quit if the CC of ligand to map is 0.75 or better
    number_of_ligands=3  # find 3 copies of the ligand
    n_group_search=3     # try 3 different fragments of the ligand in initial search
    resolve_command="'ligand_start side.pdb'" # build ligand superimposing on side.pdb
    

    Output files from LigandFit

    When you run LigandFit the output files will be in a subdirectory with your run number:

    LigandFit_run_1_/   # subdirectory with results
    

  • A summary file listing the results of the run and the other files produced:
    LigandFit_summary.dat  # overall summary
    

  • A file that lists all parameters and knowledge accumulated by the Wizard during the run (some parts are binary and are not printed)
    LigandFit_Facts.dat   # all Facts about the run
    

  • A warnings file listing any warnings about the run
    LigandFit_warnings.dat  # any warnings
    

  • A PDB file with the fitted ligand (in this case the first copy of ligand number 1):
    ligand_fit_1_1.pdb
    

  • A log file with the fitting of the ligand:
    ligand_1_1.log
    

  • A log file with the fit of the ligand to the map:
    ligand_cc_1_1.log
    

  • Map coefficients for the map used for fitting:
    resolve_map.mtz
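
A script that post-processes a run may want to check which of these files are present. The sketch below assumes all of the files listed above sit directly in the run directory and uses the names for the first copy of ligand number 1. The function name is hypothetical, and a missing file is not necessarily an error.

```python
from pathlib import Path

# File names as listed above for the first copy of ligand number 1.
EXPECTED_FILES = [
    "LigandFit_summary.dat",
    "LigandFit_Facts.dat",
    "LigandFit_warnings.dat",
    "ligand_fit_1_1.pdb",
    "ligand_1_1.log",
    "ligand_cc_1_1.log",
    "resolve_map.mtz",
]


def missing_outputs(run_dir):
    """Return the expected output files that are absent from run_dir."""
    run_path = Path(run_dir)
    return [name for name in EXPECTED_FILES if not (run_path / name).is_file()]
```

For example, missing_outputs("LigandFit_run_1_") returns an empty list when every file listed above exists in that directory.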
    

Examples

Sample command_line inputs

  • Standard run of ligandfit (generate map from model and data file)
    phenix.ligandfit w1.sca model=partial.pdb ligand=side.pdb
    

  • Build into a map from pre-determined coefficients
    phenix.ligandfit data=perfect.mtz \
     lig_map_type=pre_calculated_map_coeffs \
       model=partial.pdb ligand=side.pdb
    

  • Quick run of ligandfit
    phenix.ligandfit w1.sca model=partial.pdb ligand=side.pdb quick=True
    

  • Run ligandfit on a series of ligands specified in ligand_list.dat
    phenix.ligandfit w1.sca model=partial.pdb \
      ligand=ligand_list.dat file_or_file_list=file_with_list_of_files
    
    Note that you have to specify
    file_or_file_list=file_with_list_of_files
    
    or else the Wizard will try to interpret the contents of ligand_list.dat as a SMILES string. Here the "file_with_list_of_files" is a flag, not something you substitute with an actual file name. You use it just as listed above.

  • Place ligand near residue 92 of chain "A" from partial.pdb
    phenix.ligandfit w1.sca model=partial.pdb ligand=side.pdb \
       ligand_near_chain="A" ligand_near_res=92
    

  • Use start.pdb as a template for some of the atoms in the ligand; build the remainder of the ligand, fixing the coordinates of the corresponding atoms:
    phenix.ligandfit w1.sca model=partial.pdb ligand=side.pdb \
       resolve_command="'ligand_start start.pdb'"  # NOTE ' and " quotes necessary
    
    Note that the formatting is slightly tricky and requires the two different quotation marks on either end of the command. This is an example of passing a specific keyword to RESOLVE.
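
The effect of the nested quotes can be checked with Python's standard shlex module, which follows POSIX shell quoting rules. This is only a sketch of what the shell hands to the program. How the Wizard then interprets the inner single quotes is not shown here.

```python
import shlex

# The command as it would be typed at a POSIX shell prompt.
typed = (
    "phenix.ligandfit w1.sca model=partial.pdb ligand=side.pdb "
    "resolve_command=\"'ligand_start start.pdb'\""
)

# The outer double quotes are consumed by the shell; the inner single
# quotes survive as literal characters inside one argument.
args = shlex.split(typed)
print(args[-1])
```

The last argument still contains the single quotes and the embedded space, so the RESOLVE keyword and its value arrive together as one item.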

Possible Problems

Specific limitations and problems

  • The ligand to be searched for must have at least 3 atoms.

  • The partial-model file should not contain any atoms (other than waters, which are automatically removed) in the position where the ligand is to be built. If it does, remove those atoms from the file before building the ligand.

  • If a ring in the ligand can have more than one conformation (e.g., chair or boat conformation) then you need to do separate runs for each conformation of the ring (rings are taken as fixed units in LigandFit).

  • LigandFit ignores insertion codes, so if you specify a residue with ligand_near_res, only the residue number is used.

  • The size of the asymmetric unit in the SOLVE/RESOLVE portion of the LigandFit wizard is limited by the memory in your computer and the binaries used. The Wizard is supplied with regular-size ("", size=6), giant ("_giant", size=12), huge ("_huge", size=18) and extra_huge ("_extra_huge", size=36). Larger-size versions can be obtained on request.

  • The LigandFit Wizard can take most settings of most space groups. However, it can only use the hexagonal setting of rhombohedral space groups (e.g., #146 R3:H or #155 R32:H), and it cannot use space groups 114-119 (not found in macromolecular crystallography), even in the standard setting. These restrictions are due to difficulties with the use of asuset in the version of the ccp4 libraries used in PHENIX for these settings and space groups.
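
Two of the limitations above are easy to test before starting a run. The sketch below is a hypothetical pre-flight check, not part of PHENIX. It assumes the ligand is supplied as PDB text whose atoms appear as ATOM or HETATM records, and that the space group is given by its number.

```python
def count_atoms(pdb_text):
    """Count ATOM and HETATM records in PDB-formatted text."""
    return sum(
        1
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    )


def preflight_problems(ligand_pdb_text, space_group_number):
    """Return a list of problems based on the limitations listed above."""
    problems = []
    if count_atoms(ligand_pdb_text) < 3:
        problems.append("ligand has fewer than 3 atoms")
    if 114 <= space_group_number <= 119:
        problems.append("space groups 114-119 are not supported")
    return problems


# Hypothetical two-atom ligand (illustrative coordinates only).
two_atoms = (
    "HETATM    1  C1  LIG A   1       0.000   0.000   0.000  1.00 20.00           C\n"
    "HETATM    2  O1  LIG A   1       1.200   0.000   0.000  1.00 20.00           O\n"
)
print(preflight_problems(two_atoms, 115))
```

The other limitations (atoms already occupying the ligand site, ring conformations, rhombohedral settings) need information this check does not look at.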

Literature

Ligand identification using electron-density map correlations. T. C. Terwilliger, P. D. Adams, N. W. Moriarty and J. D. Cohn. Acta Cryst. D63, 101-107 (2007).

Automated ligand fitting by core-fragment fitting and extension into density. T. C. Terwilliger, H. Klei, P. D. Adams, N. W. Moriarty and J. D. Cohn. Acta Cryst. D62, 915-922 (2006).

Additional information

List of all LigandFit keywords

------------------------------------------------------------------------------- 
Legend:
        Parameter values:
          * means selected parameter (where multiple choices are available)
          False is No
          True is Yes
          None means not provided, not predefined, or left up to the program
          "%3d" is a Python style formatting descriptor
------------------------------------------------------------------------------- 
ligandfit
   data= None Datafile (alias for input_data_file). This can be any format if
         only FP is to be read in. If phases are to be read in then MTZ format
         is required. The Wizard will guess the column identification. If you
         want to specify it you can say input_labels="FP" , or
         input_labels="FP PHIB FOM". (Command-line only)
   ligand= None File containing information about the ligand (PDB or SMILES)
           (alias for input_lig_file) (Command-line only)
   model= None PDB file with model for everything but the ligand (alias for
          input_partial_model_file). (Command-line only)
   quick= False Run as quickly as possible. (Command-line only)
   special_keywords
      write_run_directory_to_file= None Writes the full name of a run
                                   directory to the specified file. This can
                                   be used as a call-back to tell a script
                                   where the output is going to go.
                                   (Command-line only)
   run_control
      coot= None Set coot to True and optionally run=[run-number] to run Coot
            with the current model and map for run run-number. In some wizards
            (AutoBuild) you can edit the model and give it back to PHENIX to
            use as part of the model-building process. If you just say coot
            then the facts for the highest-numbered existing run will be
            shown. (Command-line only)
      ignore_blanks= None ignore_blanks allows you to have a command-line
                     keyword with a blank value like "input_lig_file_list="
      stop= None You can stop the current wizard with "stopwizard" or "stop".
            If you type "phenix.autobuild run=3 stop" then this will stop run
            3 of autobuild. (Command-line only)
      display_facts= None Set display_facts to True and optionally
                     run=[run-number] to display the facts for run run-number.
                     If you just say display_facts then the facts for the
                     highest-numbered existing run will be shown.
                     (Command-line only)
      display_summary= None Set display_summary to True and optionally
                       run=[run-number] to show the summary for run
                       run-number. If you just say display_summary then the
                       summary for the highest-numbered existing run will be
                       shown. (Command-line only)
      carry_on= None Set carry_on to True to carry on with highest-numbered
                run from where you left off. (Command-line only)
      run= None Set run to n to continue with run n where you left off.
           (Command-line only)
      copy_run= None Set copy_run to n to copy run n to a new run and continue
                where you left off. (Command-line only)
      display_runs= None List all runs for this wizard. (Command-line only)
      delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)
      display_labels= None display_labels=test.mtz will list all the labels
                      that identify data in test.mtz. You can use the label
                      strings that are produced in AutoSol to identify which
                      data to use from a datafile like this: peak.data="F+
                      SIGF+ F- SIGF-" # the entire string in quotes counts
                      here You can use the individual labels from these
                      strings as identifiers for data columns in AutoSol and
                      AutoBuild like this: input_refinement_labels="FP SIGFP
                      FreeR_flags" # each individual label counts
      dry_run= False Just read in and check parameter names
      params_only= False Just read in and return parameter defaults
      display_all= False Just read in and display parameter defaults
   crystal_info
      cell= 0.0 0.0 0.0 0.0 0.0 0.0 Enter cell parameter a b c alpha beta
            gamma
      resolution= 0.0 High-resolution limit. Used as resolution limit for
                  density modification and as general default high-resolution
                  limit. If resolution_build or refinement_resolution are set
                  then they override this for model-building or refinement. If
                  overall_resolution is set then data beyond that resolution
                  is ignored completely. 
      sg= None Space Group symbol (e.g., C2221 or C 2 2 21)
   display
      number_of_solutions_to_display= None Number of solutions to put on
                                      screen and to write out
      solution_to_display= 1 Solution number of the solution to display and
                           write out ( use 0 to let the wizard display the top
                           solution)
   file_info
      file_or_file_list= *single_file file_with_list_of_files Choose if you
                         want to input a single file with PDB or other
                         information about the ligand or if you want to input
                         a file containing a list of files with this
                         information for a list of ligands
      input_labels= None Labels for input data columns NOTE: Applies to input
                    data file for LigandFit and AutoBuild, but not to AutoMR.
                    For AutoMR use instead 'input_label_string'.
      lig_map_type= *fo-fc_difference_map fobs_map pre_calculated_map_coeffs 
                   Enter the type of map to use in ligand fitting 
                   fo-fc_difference_map: Fo-Fc difference map phased on
                    partial model fobs_map: Fo map phased on partial model  
                   pre_calculated_map_coeffs: map calculated from FP PHIB
                    [FOM] coefficients in input data file
      ligand_format= *PDB SMILES Enter whether the files contain SMILES
                     strings or PDB formatted information 
   general
      background= True When you specify nproc=nn, you can run the jobs in
                  background (default if nproc is greater than 1) or
                  foreground (default if nproc=1).  If you set
                  run_command=qsub (or otherwise submit to a batch queue),
                  then you should set background=False, so that the batch
                  queue can keep track of your runs. There is no need to use
                  background=True in this case because all the runs go as
                  controlled by your batch system. If you use run_command=csh
                  (or similar, csh is default) then normally you will use
                  background=True so that all the jobs run simultaneously.
      base_path= None You can specify the base path for files (default is
                 current working directory)
      clean_up= False At the end of the entire run the TEMP directories will
                be removed if clean_up is True. The default is No, keep these
                directories. If you want to remove them after your run is
                finished use a command like "phenix.autobuild run=1
                clean_up=True"
      coot_name= coot If your version of coot is called something else, then
                 you can specify that here.
      debug= False  You can have the wizard stop with error messages about the
             code if you use debug. NOTE: you cannot use Pause with debug.
      extend_try_list= False  You can fill out the list of parallel jobs to
                       match the number of jobs you want to run at one time,
                       as specified with nbatch.
      extra_verbose= False Facts and possible commands will be printed every
                     cycle if Yes
      i_ran_seed= 289564  Random seed (positive integer) for model-building
                  and simulated annealing refinement
      ligand_id= None  You can specify an integer value for the ID of a
                 ligand... This number will be added to whatever residue
                 number the ligand search model in input_lig_file has. The
                 keyword is only valid if a single copy of the ligand is to be
                 found. 
      max_wait_time= 100.0 You can specify the length of time (seconds) to
                     wait when testing the run_command. If you have a cluster
                     where jobs do not start right away you may need a longer
                     time to wait.
      nbatch= 5 You can specify the number of processors to use (nproc) and
              the number of batches to divide the data into for parallel jobs.
              Normally you will set nproc to the number of processors
              available and leave nbatch alone. If you leave nbatch as None it
              will be set automatically, with a value depending on the Wizard.
              This is recommended. The value of nbatch can affect the results
              that you get, as the jobs are not split into exact replicates,
              but are rather run with different random numbers. If you want to
              get the same results, keep the same value of nbatch.
      nproc= 1 You can specify the number of processors to use (nproc) and the
             number of batches to divide the data into for parallel jobs.
             Normally you will set nproc to the number of processors available
             and leave nbatch alone. If you leave nbatch as None it will be
             set automatically, with a value depending on the Wizard. This is
             recommended. The value of nbatch can affect the results that you
             get, as the jobs are not split into exact replicates, but are
             rather run with different random numbers. If you want to get the
             same results, keep the same value of nbatch.
      resolve_command_list= None  Commands for resolve. One per line in the
                            form:  keyword value  value can be optional 
                            Examples:  coarse_grid  resolution 200 2.0  hklin
                            test.mtz  NOTE: for command-line usage you need to
                            enclose the whole set of commands in double quotes
                            (") and each individual command in single quotes
                            (') like this: resolve_command_list="'no_build'
                            'b_overall 23' "
      resolve_size= _giant _huge _extra_huge *None Size for solve/resolve
                    ("","_giant","_huge","_extra_huge")
      run_command= csh When you specify nproc=nn, you can run the subprocesses
                   as jobs in background with csh (default) or submit them to
                   a queue with the command of your choice (e.g., qsub). If
                   you have a multi-processor machine, use csh. If you have a
                   cluster, use qsub or the equivalent command for your
                   system.  NOTE: If you set run_command=qsub (or otherwise
                   submit to a batch queue), then you should set
                   background=False, so that the batch queue can keep track of
                   your runs. There is no need to use background=True in this
                   case because all the runs go as controlled by your batch
                   system. If you use run_command=csh (or similar, csh is
                   default) then normally you will use background=True so that
                   all the jobs run simultaneously.
      skip_xtriage= False You can bypass xtriage if you want. This will
                    prevent you from applying anisotropy corrections, however.
      temp_dir= None Define a temporary directory (it must exist)
      title= Run 1 LigandFit Sun Dec 7 17:46:24 2008  Enter any text you like
             to help identify what you did in this run
      top_output_dir= None This is used in subprocess calls of wizards and to
                      tell the Wizard where to look for the STOPWIZARD file. 
      verbose= False Command files and other verbose output will be printed
   input_files
      existing_ligand_file_list= None You can enter a list of files with
                                 ligands you have already fit. These will be
                                 used to exclude that region from
                                 consideration.
      input_data_file= None Enter the file with input structure factor data
                       (files other than MTZ will be converted to mtz and
                       intensities to amplitudes) 
      input_lig_file= None Enter either a single file with PDB information or
                      a SMILES string or a file containing a list of files
                      with this information for a list of ligands. If you
                      enter a file containing a list of files you need also to
                      specify
                      "file_or_file_list=file_with_list_of_files". 
                     If the format is not PDB, then ELBOW will generate a PDB
                      file.
      input_ligand_compare_file= None If you enter a PDB file with a ligand in
                                 it, the coordinates of the newly-built ligand
                                 will be compared with the coordinates in this
                                 file.
      input_partial_model_file= None Enter a PDB file containing a model of
                                your structure without the ligand. This is
                                used to calculate phases. If you are providing
                                phases in your data file and have selected
                                "pre_calculated_map_coeffs" for lig_map_type
                                this file may be left out.
   non_user_parameters
      get_lig_volume= False  You can ask to get the volume of the ligand and
                      to then stop
      offsets_list= 7 53 29 You can specify an offset for the orientation of
                    the helix and strand templates in building. This is used
                    in generating different starting models.
   refinement
      link_distance_cutoff= 3.0 You can specify the maximum bond distance for
                            linking residues in phenix.refine called from the
                            wizards.
      r_free_flags_fraction= 0.1 Maximum fraction of reflections in the free R
                             set. You can choose the maximum fraction of
                             reflections in the free R set and the maximum
                             number of reflections in the free R set. The
                             number of reflections in the free R set will be
                             up to the lower of the values defined by these two
                             parameters.
      r_free_flags_lattice_symmetry_max_delta= 5.0 You can set the maximum
                                               deviation of distances in the
                                               lattice that are to be
                                               considered the same for
                                               purposes of generating a
                                               lattice-symmetry-unique set of
                                               free R flags.
      r_free_flags_max_free= 2000 Maximum number of reflections in the free R
                             set. You can choose the maximum fraction of
                             reflections in the free R set and the maximum
                             number of reflections in the free R set. The
                             number of reflections in the free R set will be
                             up to the lower of the values defined by these two
                             parameters.
      r_free_flags_use_lattice_symmetry= True When generating r_free_flags you
                                         can decide whether to include lattice
                                         symmetry (good in general, necessary
                                         if there is twinning).
   search_parameters
      conformers= 1 Enter how many conformers to create. If greater than 1,
                  then ELBOW will always be used to generate them. If 1 then
                  ELBOW will be used if a PDB file is not specified. These
                  conformers are used to identify allowed torsion angles for
                  your ligand. The alternative is to use the empirical rules
                  in RESOLVE. ELBOW takes longer but is more accurate. 
      delta_phi_ligand= 40.0 Specify the angle (degrees) between successive
                        tries in FFT search for fragments
      fit_phi_inc= 20 Specify the angle (degrees) between rotations around
                   bonds
      fit_phi_range= -180 180 Range of bond rotation angles to search
      group_search= 0 Enter the ID number of the group from the ligand to use
                    to seed the search for conformations
      ligand_cc_min= 0.75 Enter the minimum correlation coefficient of the
                     ligand to the map to quit searching for more
                     conformations
      ligand_completeness_min= 1.0 Enter the minimum completeness of the
                               ligand to the map to quit searching for more
                               conformations
      local_search= True If local_search is Yes, then only the region within
                    search_dist of the point in the map with the highest local
                    rmsd will be searched in the FFT search for fragments
      n_group_search= 3 Enter the number of different fragments of the ligand
                      that will be looked for in FFT search of the map
      n_indiv_tries_max= 10 If 0 is specified, all fragments are searched at
                         once otherwise all are first searched at once then
                         individually up to the number specified 
      n_indiv_tries_min= 5 If 0 is specified, all placements of a fragment are
                         tested at once otherwise all are first tested at once
                         then individually up to the number specified 
      number_of_ligands= 1 Number of copies of the ligand expected in the
                         asymmetric unit
      search_dist= 10.0 If local_search is Yes, then only the region within
                   this distance of the point in the map with the highest
                   local rmsd will be searched in the FFT search for fragments
      use_cc_local= False  You can specify the use of a local correlation
                    coefficient for scoring ligand fits to the map. If you do
                    not do this, then the region over which the ligand is
                    scored is all points within 2.5 A of the atoms in the
                    ligand.  If you do specify use_cc_local, then the region
                    over which the ligand is scored is all these points, plus
                    all the contiguous points that have density greater than
                    0.5 * sigma .
   search_target
      ligand_near_chain= None  You can specify where to search for the ligand
                         either with search_center or with ligand_near_res and
                         ligand_near_chain. If you set
                         ligand_near_chain="None" or leave it blank or do not
                         set it, then all chains will be included. The
                         keywords ligand_near_res and ligand_near_chain refer
                         to residue/chain in the file defined by
                         input_partial_model_file (or model if running from
                         command line). 
      ligand_near_pdb= None You can specify where LigandFit should look for
                       your ligands by providing a PDB file containing one or
                       more copies of the ligand. If you want you can provide
                       a PDB file with ligand+ macromolecule and specify the
                       ligand name with name_of_ligand_near_pdb. 
      ligand_near_res= None  You can specify where to search for the ligand
                       either with search_center or with ligand_near_res and
                       ligand_near_chain. The keywords ligand_near_res and
                       ligand_near_chain refer to residue/chain in the file
                       defined by input_partial_model_file (or model if
                       running from command line). 
      name_of_ligand_near_pdb= None You can specify where LigandFit should
                               look for your ligands by providing a PDB file
                               containing one or more copies of the ligand. If
                               you want you can provide a PDB file with
                               ligand+ macromolecule and specify the ligand
                               name with name_of_ligand_near_pdb. 
      search_center= 0.0 0.0 0.0 Enter coordinates for center of search region
                     (ignored if [0,0,0])
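
As a worked example of how two of the keywords above interact, the free R set is limited to the lower of the values given by r_free_flags_fraction and r_free_flags_max_free. The sketch below uses the defaults from the list (0.1 and 2000). The function name is hypothetical, and truncating the fractional count to an integer is an assumption, since the rounding used by PHENIX is not stated here.

```python
def free_r_set_size(n_reflections, fraction=0.1, max_free=2000):
    """Upper limit on the free R set size: the lower of the two limits."""
    by_fraction = int(n_reflections * fraction)  # assumption: truncate
    return min(by_fraction, max_free)


for n in (15000, 50000):
    print(n, free_r_set_size(n))
```

With 15000 reflections the fraction is the limiting value. With 50000 reflections the cap of 2000 applies instead.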