Python-based Hierarchical ENvironment for Integrated Xtallography

phenix.find_alt_orig_sym_mate: Identify equivalent MR solutions of the same dataset irrespective of origin and symmetry operation

Author(s)
Purpose
Usage
Examples of PHIL input
Output
Do the MR solutions match?
Algorithm
Also move HETATM (hetero atoms)
Debug mode
Caveats
Changes
Literature
Additional information
List of all find_alt_orig_sym_mate keywords

Author(s)

  • phenix.find_alt_orig_sym_mate: Robert D. Oeffner
  • PHENIX GUI: Nat Echols

Purpose

Given several molecular replacement (MR) solutions for the same dataset, phenix.find_alt_orig_sym_mate finds, for each chain in the moving solution, the copy closest to the chains in the fixed solution, taking into account all symmetry operations and alternative origin shifts permitted by the space group of the crystal.

Usage

phenix.find_alt_orig_sym_mate can be run from the command line with PHIL input given either as keywords or in a text file, for example

phenix.find_alt_orig_sym_mate moving_pdb=pdbfile1 fixed_pdb=pdbfile2
or
phenix.find_alt_orig_sym_mate my_phil_input.txt
The PHIL input specifies the MR solution files either as "moving_pdb" and "fixed_pdb" or as the scopes "moving" and "fixed". Both scopes must hold valid content. The two scopes hold the parameter "xyzfname" and the sub-scopes "mrsolution" and "pickle_solution". For a scope to be valid, one and only one of the methods (1), (2) and (3) must be followed:
  1. Assign the "moving_pdb" or "fixed_pdb" parameter to the PDB file from one of the MR solutions in question. This is useful for simple cases, such as when the solution comes from the PHENIX AutoMR GUI or from an MR program other than Phaser.
  2. Specify a Phaser MR solution by assigning the file name of the solution file to the "moving.mrsolution.solfname" or "fixed.mrsolution.solfname" parameter. Also assign the IDs of the ensembles to "moving.mrsolution.ensembles.name" or "fixed.mrsolution.ensembles.name", and the corresponding MR models to "moving.mrsolution.ensembles.xyzfname" or "fixed.mrsolution.ensembles.xyzfname". Multiple components are specified as multiple ensembles. This method is useful for solution files produced when Phaser is run from the command line.
  3. Use a solution from the PhaserMR GUI in PHENIX. In that case the parameter "moving.pickle_solution.pklfname" or "fixed.pickle_solution.pklfname" is assigned to the solution file that is produced by PHENIX after an MR calculation. The "moving.pickle_solution.philfname" or "fixed.pickle_solution.philfname" is then assigned to the input file for the MR calculation. This way of specifying solutions is useful when MR solutions are available from having run Phaser through the PHENIX interface.
The method used for the fixed scope does not restrict the method used for the moving scope, and vice versa. If the space group and unit cell dimensions are not available, for instance when (2) is used for both the moving and the fixed scopes, they must be specified by assigning the parameter "spacegroupfname" to a PDB file with a CRYST1 record or to an MTZ file with that information. Typically this would be the data file used for the molecular replacement calculation.
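
The rule that each scope must follow exactly one of the three methods can be illustrated with a small Python check. This is only a sketch: the function name chosen_method and its arguments are hypothetical and are not part of phenix.find_alt_orig_sym_mate, which performs its own validation of the PHIL input.

```python
def chosen_method(pdb=None, solfname=None, pklfname=None):
    """Return which of the three input methods a scope uses.

    pdb      -- value of moving_pdb or fixed_pdb            (method 1)
    solfname -- value of <scope>.mrsolution.solfname        (method 2)
    pklfname -- value of <scope>.pickle_solution.pklfname   (method 3)

    Hypothetical helper: raises ValueError unless exactly one is set.
    """
    candidates = (("pdb", pdb),
                  ("mrsolution", solfname),
                  ("pickle_solution", pklfname))
    given = [name for name, value in candidates if value is not None]
    if len(given) != 1:
        raise ValueError(
            "exactly one input method is required, got: %s" % (given or "none"))
    return given[0]


print(chosen_method(pdb="testdata/2z0d.pdb"))
```

The fixed and moving scopes would each be checked separately, since the method chosen for one does not constrain the other.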

Examples of PHIL input

Unless the input consists of just two PDB files, a PHIL file is the easiest way to enter input. A few examples of PHIL input for phenix.find_alt_orig_sym_mate are given below. Testing a command-line Phaser MR solution file against a solution specified as a PDB file:

    AltOrigSymMates.fixed.mrsolution
    {
      solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"
      ensembles
      {
        name = "MR_2P82_A0"
        xyzfname = "testdata/sculpt_2P82_A0.pdb"
      }
      ensembles
      {
        name = "MR_3ECI_A0"
        xyzfname = "testdata/sculpt_3ECI_A0.pdb"
      }
    }
    AltOrigSymMates.moving_pdb="testdata/2z0d.pdb"
Testing two sets of solution files from the PHENIX Phaser-MR GUI against one another:
    AltOrigSymMates.fixed.pickle_solution
    {
      philfname = "testdata/phaser_mr_13.eff"
      pklfname = "testdata/phaser_mr_13.pkl"
    }
    
    AltOrigSymMates.moving.pickle_solution
    {
      philfname = "testdata/phaser_mr_11.eff"
      pklfname = "testdata/phaser_mr_11.pkl"
    }
Testing two solution files from the command-line version of Phaser against one another:
    AltOrigSymMates.fixed.mrsolution
    {
      solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"
      ensembles
      {
        name = "MR_2P82_A0"
        xyzfname = "testdata/sculpt_2P82_A0.pdb"
      }
      ensembles
      {
        name = "MR_3ECI_A0"
        xyzfname = "testdata/sculpt_3ECI_A0.pdb"
      }
    }
    
    AltOrigSymMates.moving.mrsolution
    {
      solfname = "testdata/MR_2ZPN_A0_2P82_A0.sol"
      ensembles
      {
        name = "MR_2P82_A0"
        xyzfname = "testdata/sculpt_2P82_A0.pdb"
      }
      ensembles
      {
        name = "MR_2ZPN_A0"
        xyzfname = "testdata/sculpt_2ZPN_A0.pdb"
      }
    }
    AltOrigSymMates.spacegroupfname = "testdata/2Z0D.mtz"

For more information on the PHIL input see the bottom of this page.
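
When many solutions need to be compared, PHIL blocks like the mrsolution examples above can be written out by a script. The following Python sketch only formats text. The helper mrsolution_phil is hypothetical, and the file names are the ones used in the examples above.

```python
def mrsolution_phil(scope, solfname, ensembles):
    """Format an AltOrigSymMates.<scope>.mrsolution block as PHIL text.

    scope     -- "fixed" or "moving"
    solfname  -- path to the Phaser .sol file
    ensembles -- list of (name, xyzfname) pairs, one per ensemble
    """
    if scope not in ("fixed", "moving"):
        raise ValueError("scope must be 'fixed' or 'moving'")
    lines = ["AltOrigSymMates.%s.mrsolution" % scope,
             "{",
             '  solfname = "%s"' % solfname]
    for name, xyzfname in ensembles:
        # One "ensembles" sub-scope per component of the MR solution.
        lines += ["  ensembles",
                  "  {",
                  '    name = "%s"' % name,
                  '    xyzfname = "%s"' % xyzfname,
                  "  }"]
    lines.append("}")
    return "\n".join(lines)


text = mrsolution_phil(
    "fixed",
    "testdata/MR_3ECI_A0_2P82_A0.sol",
    [("MR_2P82_A0", "testdata/sculpt_2P82_A0.pdb"),
     ("MR_3ECI_A0", "testdata/sculpt_3ECI_A0.pdb")])
print(text)
```

The returned text can be saved to a file and passed to phenix.find_alt_orig_sym_mate as the PHIL input file.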

Output

The closest match between chains in the moving scope and chains in the fixed scope is saved to a file named after the PDB file in the fixed scope, concatenated with the name of the PDB file in the moving scope and prefixed with "MinMLAD_". A log file containing the standard output is written with the same concatenated name but prefixed with "AltOrigSymMLAD_". All files are saved in the current working directory. If the MR solution files specified in the PHIL input contain multiple solutions, phenix.find_alt_orig_sym_mate writes multiple log files, one for each MR solution.
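
Because the output files are recognisable by their prefixes, they can be collected after a run with a few lines of Python. This sketch relies only on the two prefixes mentioned above and makes no assumption about the rest of the file names. The helper find_outputs is hypothetical.

```python
import glob
import os


def find_outputs(directory="."):
    """Collect files whose names start with the two documented prefixes."""
    found = {}
    for prefix in ("MinMLAD_", "AltOrigSymMLAD_"):
        # glob.escape protects directory names containing wildcard characters.
        pattern = os.path.join(glob.escape(directory), prefix + "*")
        found[prefix] = sorted(glob.glob(pattern))
    return found


# List what a run has left in the current working directory.
for prefix, paths in find_outputs().items():
    print(prefix, len(paths), "file(s)")
```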

Do the MR solutions match?

A good match between two chains usually has an MLAD value below 1.5, whereas a bad match usually has a value above 2.0. This is a rule of thumb and exceptions do occur. It is advisable to check visually in a molecular graphics program that the structures superpose on one another. If the structures contain several chains and phenix.find_alt_orig_sym_mate did not place them all on the same origin, a warning is printed.
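
The rule of thumb can be written as a small Python helper for screening many MLAD values, for example when reading them from log files. The function classify_mlad and its labels are hypothetical. Values between the two thresholds are reported as inconclusive, since the text above does not say how to interpret them.

```python
def classify_mlad(mlad, good_below=1.5, bad_above=2.0):
    """Apply the rule-of-thumb thresholds to a single MLAD value."""
    if mlad < good_below:
        return "likely match"
    if mlad > bad_above:
        return "likely mismatch"
    return "inconclusive"


for value in (0.8, 1.7, 3.1):
    print(value, classify_mlad(value))
```

Whatever the label, visual inspection remains the final check.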

Algorithm

phenix.find_alt_orig_sym_mate computes configurations by looping over all symmetry operations and alternative origin shifts. An alignment between C-alpha atoms from moving_pdb and fixed_pdb is computed using secondary structure matching (SSM). If that fails, or if the MLAD score achieved with SSM is larger than 2.0, an alignment is computed using the MMTBX alignment functions, which are part of the CCTBX. To estimate the best match, a distance measure between the aligned C-alpha atoms is computed for each configuration. The mean log absolute deviation (MLAD) is defined as:

MLAD(dR) = Σ( log(dr·dr/(|dr| + 0.9) + max(0.9, min(dr·dr,1)))),

where dr is the difference vector between a pair of aligned C-alpha atoms and the sum is taken over all atom pairs in the alignment. MLAD is used as the distance measure. It downplays the contributions of spatially distant C-alpha atom pairs, which may otherwise lead to incorrect matches when the space group has a floating origin and the chains tested against one another consist of large unaligned domains. For each chain in the fixed scope, phenix.find_alt_orig_sym_mate finds the smallest MLAD with a copy of each chain in the moving scope for a given symmetry operation and alternative origin. When all chains in the fixed scope have been tested, these copies are saved to a file. For space groups with a floating origin, the minimum MLAD is found by a golden-section minimization along the polar axis for each copy of the chains in moving_pdb.
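
The formula and the one-dimensional minimization described above can be sketched in Python. This is an illustration under stated assumptions, not the program's implementation: mlad follows the formula exactly as written (a sum over pairs), golden_section_min is a textbook golden-section search, the coordinates are made up, and the search runs along the Cartesian z axis rather than along a fractional polar axis.

```python
import math


def mlad(fixed, moving):
    """Sum the log term of the formula above over aligned C-alpha pairs."""
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(fixed, moving):
        d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2  # dr.dr
        d = math.sqrt(d2)                                       # |dr|
        total += math.log(d2 / (d + 0.9) + max(0.9, min(d2, 1.0)))
    return total


def golden_section_min(f, lo, hi, tol=1e-4):
    """Minimise a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # The minimum lies in [lo, d]; reuse c as the new upper probe.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            # The minimum lies in [c, hi]; reuse d as the new lower probe.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2.0


# Toy data (made up): three C-alpha positions and a copy displaced along z.
fixed = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
moving = [(x, y, z + 3.2) for x, y, z in fixed]


def score(shift):
    """MLAD after sliding the moving copy by `shift` along z."""
    shifted = [(x, y, z + shift) for x, y, z in moving]
    return mlad(fixed, shifted)


best = golden_section_min(score, -10.0, 10.0)
print("best shift along z: %.3f, MLAD: %.3f" % (best, score(best)))
```

Golden-section search needs only function values and assumes a single minimum in the search interval, which holds for this toy case because every atom pair has the same displacement.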

Also move HETATM (hetero atoms)

Invoking this flag moves hetero atoms (ligands, waters, metals, etc.) together with their associated peptide chain. The program first invokes phenix.sort_hetatms to associate hetero atoms sensibly with adjacent peptide chains. Having identified the transformation that yields the smallest MLAD score for a chain, it then applies the same transformation to the associated hetero atoms.

Debug mode

Invoking the debug flag produces individual pdb files of the C-alpha atoms used for each SSM alignment of the moving scope for each permitted symmetry operation and alternative origin. A gold atom is placed at the centroid of the C-alpha atoms. Similar files are produced for the fixed scope. These files are stored in a subfolder named "AltOrigSymMatesFiles".

If a floating origin is present in the space group a table of MLAD values is produced by sliding a copy of the chains in the moving scope along the polar axis in the fractional interval [-0.5, 0.5] for each permitted symmetry operation and alternative origin.

Caveats

The program tests all solutions present in solution files entered as fixed scope against all solutions present in solution files entered as moving scope. Consequently the execution time is proportional to the number of solutions in fixed scope times the number of solutions in moving scope.

The execution time is proportional to the number of SSM alignments being tested; if SSM identifies 12 alignments the program will take 12 times as long.
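
The two scaling statements above can be combined into a rough estimate of relative run time. The sketch below assumes that the factors simply multiply, which is an interpretation of the text rather than a measured property of the program. The function relative_cost is hypothetical.

```python
def relative_cost(n_fixed, n_moving, n_ssm_alignments=1):
    """Number of pairwise tests, used as a proxy for relative run time."""
    return n_fixed * n_moving * n_ssm_alignments


# 3 fixed solutions against 4 moving solutions, best SSM alignment only.
print(relative_cost(3, 4))
# The same comparison when 12 SSM alignments are tested.
print(relative_cost(3, 4, n_ssm_alignments=12))
```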

Changes

The command-line syntax mentioned in the Computational Crystallography Newsletter 2012 January has been replaced by PHIL syntax.

Literature

Algorithms for deriving crystallographic space-group information. R. W. Große-Kunstleve. Acta Cryst. A55, 383-395 (1999)
phenix.find_alt_orig_sym_mate. Robert D. Oeffner, Gábor Bunkóczi and Randy J. Read. Computational Crystallography Newsletter 2012 January, 5-10 (2012)
Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. E. Krissinel and K. Henrick. Acta Cryst. D60, 2256-2268 (2004)

Additional information

List of all find_alt_orig_sym_mate keywords

------------------------------------------------------------------------------- 
Legend: black bold - scope names
        black - parameter names
        red - parameter values
        blue - parameter help
        blue bold - scope help
        Parameter values:
          * means selected parameter (where multiple choices are available)
          False is No
          True is Yes
          None means not provided, not predefined, or left up to the program
          "%3d" is a Python style formatting descriptor
------------------------------------------------------------------------------- 
AltOrigSymMates
   moving_pdb= None path to a PDB file from an MR solution
   fixed_pdb= None path to a PDB file from an MR solution
   spacegroupfname= None A PDB file with a CRYST1 record or an MTZ file
                    providing the space group and cell dimension for this
                    computation.
   chosenorigin= "Any" By default all allowed origin shifts are probed.
                 Specifying "2/3, 1/3, 1/2" tells the program to
                 only explore symmetry operations for the unit cells displaced
                 at fractional positions (2/3, 1/3, 1/2)
   use_all_SSM= False By default only the best sequence alignment from SSM is
                used. Setting this value to True will include other suboptimal
                alignments for testing.
   no_symmetry_operations= False When set to False all symmetry operations are
                           probed during the computation.
   movehetatms= False Move hetatoms close to a chain in conjunction with that
                chain. Hetatoms are first associated with individual chains
                with phenix.sort_hetatms before undergoing the same spatial
                transformation as the chains.
   debug= False
   verbose= False
   gui_output_dir= None For PHENIX GUI only.
   job_title= None Job title in PHENIX GUI, not used on command line
   moving Specify either mrsolution or pickle_solution.
      mrsolution MR solution produced by command-line version of Phaser
         solfname= None path to the solution file, i.e. the .sol file
         ensembles Ensembles constituent in this MR solution. Typically these
                   are chains in the PDB file.
            name= None ID for ensemble in the MR solution, i.e. the name after
                  the "SOLU 6DIM ENSE" string in the .sol file.
            xyzfname= None Path to the PDB file with atoms representing this
                      ensemble. This is the search model used by Phaser for
                      this ensemble prior to solving the structure. It is not
                      the solved structure.
      pickle_solution Phaser MR solution from PHENIX GUI
         pklfname= None PHENIX GUI solution file from Phaser-MR, i.e. the .pkl
                   file
         philfname= None PHENIX GUI configuration file used as input prior to
                    Phaser-MR, i.e. the .eff file
   fixed Specify either mrsolution or pickle_solution.
      mrsolution MR solution produced by command-line version of Phaser
         solfname= None path to the solution file, i.e. the .sol file
         ensembles Ensembles constituent in this MR solution. Typically these
                   are chains in the PDB file.
            name= None ID for ensemble in the MR solution, i.e. the name after
                  the "SOLU 6DIM ENSE" string in the .sol file.
            xyzfname= None Path to the PDB file with atoms representing this
                      ensemble. This is the search model used by Phaser for
                      this ensemble prior to solving the structure. It is not
                      the solved structure.
      pickle_solution Phaser MR solution from PHENIX GUI
         pklfname= None PHENIX GUI solution file from Phaser-MR, i.e. the .pkl
                   file
         philfname= None PHENIX GUI configuration file used as input prior to
                    Phaser-MR, i.e. the .eff file