Python-based Hierarchical ENvironment for Integrated Xtallography

Finding NCS in chains from a PDB file with simple_ncs_from_pdb

Author(s)
Purpose
Usage
How simple_ncs_from_pdb works:
Additional notes on how simple_ncs_from_pdb works:
Output files from simple_ncs_from_pdb
Examples
Standard run of simple_ncs_from_pdb:
Possible Problems
Specific limitations and problems:
Literature
Additional information
List of all simple_ncs_from_pdb keywords

Author(s)

  • simple_ncs_from_pdb : Tom Terwilliger
  • Phil command interpreter: Ralf W. Grosse-Kunstleve
  • find_domain: Peter Zwart

Purpose

The simple_ncs_from_pdb method identifies NCS in the chains in a PDB file and writes out the NCS operators in forms suitable for phenix.refine, resolve, and the AutoSol and AutoBuild Wizards.

Usage

How simple_ncs_from_pdb works:

The basic steps that simple_ncs_from_pdb carries out are:

  • (1) Identify sets of chains in the PDB file that have the same sequences. These are potential NCS-related chains.

  • (2) Determine which chains in a group actually are related by NCS within a given tolerance (max_rmsd, typically 2 A)

  • (3) Determine which residues in each chain are related by NCS, and break the chains into domains that do follow NCS if necessary.

  • (4) Determine the NCS operators for all chains in each NCS group or domain
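
Step (1) can be illustrated with a minimal Python sketch. It assumes the chains have already been read into a dictionary mapping chain ID to one-letter sequence. The function name group_chains_by_sequence and the input data are hypothetical and are not part of simple_ncs_from_pdb. The sketch requires exact sequence equality, whereas the program itself works with a percent identity threshold (min_percent).

```python
from collections import defaultdict

def group_chains_by_sequence(sequences):
    """Group chain IDs whose sequences are exactly identical.

    sequences: dict mapping chain ID -> one-letter sequence (hypothetical input).
    Returns a list of groups (lists of chain IDs) with two or more members.
    """
    by_sequence = defaultdict(list)
    for chain_id, seq in sequences.items():
        by_sequence[seq].append(chain_id)
    # Only sets of two or more chains are potential NCS groups.
    return [ids for ids in by_sequence.values() if len(ids) > 1]

# Illustrative data only: chains A and B share a sequence, C does not.
example = {"A": "MKTAYIAK", "B": "MKTAYIAK", "C": "GSHMLEDP"}
print(group_chains_by_sequence(example))
```

Chains grouped this way are only candidates. Steps (2) to (4) still decide whether they are actually related by NCS.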

Additional notes on how simple_ncs_from_pdb works:

The matching of chains is done in a first quick pass by calling simple_ncs_from_pdb recursively, using only every 10th residue in the analysis (njump_recursion). This checks whether chains that have the same sequence really have the same structure, or whether some of them should be in separate NCS groups. Using only every 10th residue keeps an all-against-all matching of chains fast enough to be practical.
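
The idea behind the quick pass can be sketched in a few lines of Python. This is an illustration only: the chains are represented by hypothetical, contiguous residue-number lists, the helper name subsample is an assumption, and the actual structural comparison is not shown.

```python
from itertools import combinations

def subsample(residue_numbers, njump=10):
    """Keep every njump-th residue, starting with the first one."""
    return residue_numbers[::njump]

# Hypothetical chains, described only by their residue numbers.
chains = {
    "A": list(range(2, 526)),
    "N": list(range(2, 526)),
    "B": list(range(20, 526)),
}
reduced = {cid: subsample(nums) for cid, nums in chains.items()}

# All-against-all: every unordered pair of chains is compared once.
pairs = list(combinations(chains, 2))
print(pairs)
print({cid: len(nums) for cid, nums in reduced.items()})
```

With one residue in ten retained, each pairwise comparison handles roughly a tenth of the residues, which is what makes comparing every pair of chains affordable.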

If residue numbers are not the same for corresponding chains, but they are simply offset by a constant for each chain, this will be recognized and the chains will be aligned.

An assumption in simple_ncs_from_pdb is that residue numbers are consistent among chains. They do not have to be the same: chain A can be residues 1-100 and chain B 211-300. However, chain A cannot be residues 1-10 and 20-50 while matching chain B residues 1-10 and 21-51, because no single offset aligns both segments.
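
A minimal sketch of how a constant residue-number offset between two chains could be found is given here. It is not the program's own code: the function name best_constant_offset and the dictionary representation of a chain (residue number to residue name) are assumptions made for illustration.

```python
def best_constant_offset(chain_a, chain_b):
    """Return (offset, identities) for the offset giving the most identities.

    chain_a, chain_b: dicts mapping residue number -> residue name.
    Residue n of chain_a is compared with residue n + offset of chain_b.
    """
    best = (0, -1)
    lowest = min(chain_b) - max(chain_a)
    highest = max(chain_b) - min(chain_a)
    for offset in range(lowest, highest + 1):
        identities = sum(
            1 for num, name in chain_a.items()
            if chain_b.get(num + offset) == name
        )
        if identities > best[1]:
            best = (offset, identities)
    return best

# Illustrative data: the same four residues, numbered from 1 and from 211.
chain_a = {1: "MET", 2: "LYS", 3: "THR", 4: "ALA"}
chain_b = {211: "MET", 212: "LYS", 213: "THR", 214: "ALA"}
print(best_constant_offset(chain_a, chain_b))
```

When the numbering of one chain has an extra gap that the other lacks, a single offset can align at most one of the two segments, which is the situation the consistency assumption excludes.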

Residue numbers are used to align pairs of chains, maximizing identities of matching pairs of residues. Pairs of chains that can match are identified.

Groupings of chains are chosen that maximize the number of matching residues between each member of a group and the first (reference) member of the group.

For a pair of chains, some segments may match and others not. Each pair of segments must have a length at least as long as min_length and a percent identity at least as high as min_percent. A pair of segments may not end in a mismatch. An overall pair of chains must have an rmsd of CA atoms of less than or equal to max_rmsd.
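
The segment criteria can be written down as a short Python sketch. The function segment_is_acceptable is hypothetical; it assumes a pair of segments is represented as a list of booleans (True where the aligned residues are identical), treats min_length as the segment length, and uses the default values from the keyword list. The CA rmsd test against max_rmsd applies to the pair of chains as a whole and is not shown.

```python
def segment_is_acceptable(matches, min_length=10, min_percent=95.0):
    """Apply the length, end-mismatch and percent-identity criteria.

    matches: list of bools, True where the aligned residues are identical.
    """
    if len(matches) < min_length:
        return False  # segment too short
    if not matches[-1]:
        return False  # segment ends in a mismatch
    percent_identity = 100.0 * sum(matches) / len(matches)
    return percent_identity >= min_percent

print(segment_is_acceptable([True] * 20))            # long, fully identical
print(segment_is_acceptable([True] * 5))             # shorter than min_length
print(segment_is_acceptable([True] * 19 + [False]))  # ends in a mismatch
```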

If find_invariant_domains is specified, then once all chains that can be matched with the above algorithm are identified, all remaining chains are matched, allowing the break-up of chains into invariant domains. The invariant domains each get a separate NCS group.

Output files from simple_ncs_from_pdb

The output files that are produced are:

  • NCS operators written in format for phenix.refine
    simple_ncs_from_pdb.ncs

  • NCS operators written in format for the PHENIX Wizards
    simple_ncs_from_pdb.ncs_spec

  • NCS operators written in format for resolve
    simple_ncs_from_pdb.resolve

Examples

Standard run of simple_ncs_from_pdb:

Running simple_ncs_from_pdb is easy. For example, you can type:

phenix.simple_ncs_from_pdb anb.pdb

Simple_ncs_from_pdb will analyze the chains in anb.pdb and identify any NCS that exists. For this sample run the following output is produced:

Chains in this PDB file:  ['A', 'N', 'B']
GROUPS BASED ON QUICK COMPARISON: [['A', 'B']]
Looking for invariant domains for ...: ['A', 'N', 'B'] [[[2, 525]], 
[[2, 259], [290, 525]], [[20, 525]]]

There were 3 chains in the PDB file: A, N and B. Chains A and B were very similar and clearly related by NCS; this relationship was found in the quick comparison. Chain N had the same sequence as A and B, but was not grouped with them in the quick comparison. Searching for domains that did have NCS among all three chains produced three domains, represented below by 4 NCS groups:

GROUP 1
Summary of NCS group with 3 operators:
ID of chain/residue where these apply: [['A', 'N', 'B'], [[[2, 5], [20, 35], 
[60, 76], [78, 107], [110, 137], [401, 431], [433, 483], [485, 516], 
[520, 525]], [[2, 5], [20, 35], [60, 76], [78, 107], [110, 137], 
[401, 431], [433, 483], [485, 516], [520, 525]], [[2, 5], [20, 35], 
[60, 76], [78, 107], [110, 137], [401, 431], [433, 483], [485, 516], 
[520, 525]]]]
RMSD (A) from chain A:  0.0  1.09  0.07
Number of residues matching chain A:[215, 215, 194]
Source of NCS info: anb.pdb

The residues in chains A, N, and B in this group are 2-5, 20-35, 60-76, 78-107, 110-137, 401-431, 433-483, 485-516 and 520-525. Note that these are not all contiguous. These are the residues that share the same NCS relationships among all 3 chains. The RMSD of CA atoms between chains A and N is 1.09 A and between chains A and B is 0.07 A.
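
The list printed after "ID of chain/residue where these apply:" has the form of a Python literal, so for this sample output it can be read back with the standard library. The sketch below is an illustration, not part of simple_ncs_from_pdb. The helper name residues_per_chain is an assumption, and the ranges are taken to be inclusive, as the residue list above suggests.

```python
import ast

def residues_per_chain(spec_text):
    """Count the residues spanned by each chain's inclusive [start, end] ranges.

    spec_text: the list printed after 'ID of chain/residue where these apply:'.
    """
    chain_ids, ranges_per_chain = ast.literal_eval(spec_text)
    return {
        chain_id: sum(end - start + 1 for start, end in ranges)
        for chain_id, ranges in zip(chain_ids, ranges_per_chain)
    }

# GROUP 1 from the sample output, with the wrapped lines joined.
group_1 = (
    "[['A', 'N', 'B'], [[[2, 5], [20, 35], "
    "[60, 76], [78, 107], [110, 137], [401, 431], [433, 483], [485, 516], "
    "[520, 525]], [[2, 5], [20, 35], [60, 76], [78, 107], [110, 137], "
    "[401, 431], [433, 483], [485, 516], [520, 525]], [[2, 5], [20, 35], "
    "[60, 76], [78, 107], [110, 137], [401, 431], [433, 483], [485, 516], "
    "[520, 525]]]]"
)
print(residues_per_chain(group_1))  # {'A': 215, 'N': 215, 'B': 215}
```

The value 215 agrees with the number of residues reported for chain A. The program's own count for chain B in this run is 194, so the span of the ranges and the "Number of residues matching chain A" line should not be assumed to be the same quantity.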

The NCS operators relating these domains are given below.

OPERATOR 1
CENTER:   29.9208  -53.3304  -13.4779

ROTA 1:    1.0000    0.0000    0.0000
ROTA 2:    0.0000    1.0000    0.0000
ROTA 3:    0.0000    0.0000    1.0000
TRANS:     0.0000    0.0000    0.0000

OPERATOR 2
CENTER:   32.5410  -35.4227   20.2768

ROTA 1:    0.9370   -0.2825    0.2053
ROTA 2:   -0.3285   -0.9125    0.2439
ROTA 3:    0.1184   -0.2960   -0.9478
TRANS:   -14.7410  -79.9073   -8.5967

OPERATOR 3
CENTER:   50.0256  -91.8920  -13.6461

ROTA 1:    0.6257    0.7800   -0.0037
ROTA 2:   -0.7800    0.6257   -0.0010
ROTA 3:    0.0015    0.0035    1.0000
TRANS:    70.3889   42.4760    0.3937

GROUP 2
Summary of NCS group with 3 operators:
ID of chain/residue where these apply: [['A', 'N', 'B'], [[[6, 9], 
[56, 59], [517, 519]], [[6, 9], [56, 59], [517, 519]], [[6, 9], 
[56, 59], [517, 519]]]]
RMSD (A) from chain A:  0.0  0.48  0.03
Number of residues matching chain A:[11, 11, 11]
Source of NCS info: anb.pdb

OPERATOR 1
CENTER:   47.5037  -61.5641  -11.2751

ROTA 1:    1.0000    0.0000    0.0000
ROTA 2:    0.0000    1.0000    0.0000
ROTA 3:    0.0000    0.0000    1.0000
TRANS:     0.0000    0.0000    0.0000

OPERATOR 2
CENTER:   51.8984  -33.6038   20.9877

ROTA 1:    0.9367   -0.2981    0.1836
ROTA 2:   -0.3113   -0.9492    0.0469
ROTA 3:    0.1603   -0.1011   -0.9819
TRANS:   -14.9810  -78.2888   -2.3823

OPERATOR 3
CENTER:   66.8308  -82.9508  -11.4633

ROTA 1:    0.6255    0.7802   -0.0016
ROTA 2:   -0.7802    0.6255   -0.0025
ROTA 3:   -0.0009    0.0028    1.0000
TRANS:    70.3999   42.4366    0.4815

GROUP 3
Summary of NCS group with 3 operators:
ID of chain/residue where these apply: [['A', 'N', 'B'], [[[193, 255], 
[257, 259], [290, 355], [357, 374]], [[193, 255], [257, 259], 
[290, 355], [357, 374]], [[193, 255], [257, 259], [290, 355], [357, 374]]]]
RMSD (A) from chain A:  0.0  0.61  0.01
Number of residues matching chain A:[150, 150, 150]
Source of NCS info: anb.pdb

OPERATOR 1
CENTER:   36.1219  -37.6124  -62.1437

ROTA 1:    1.0000    0.0000    0.0000
ROTA 2:    0.0000    1.0000    0.0000
ROTA 3:    0.0000    0.0000    1.0000
TRANS:     0.0000    0.0000    0.0000

OPERATOR 2
CENTER:   39.1403  -33.0801   60.7270

ROTA 1:    0.7650    0.3808   -0.5194
ROTA 2:    0.0664   -0.8488   -0.5245
ROTA 3:   -0.6406    0.3668   -0.6746
TRANS:    50.3180  -36.4383   16.0299

OPERATOR 3
CENTER:   40.9347  -76.7723  -62.2004

ROTA 1:    0.5942    0.8043   -0.0007
ROTA 2:   -0.8043    0.5942   -0.0064
ROTA 3:   -0.0047    0.0043    1.0000
TRANS:    73.5084   40.5311    0.5807

GROUP 4
Summary of NCS group with 3 operators:
ID of chain/residue where these apply: [['A', 'N', 'B'], [[[36, 41]], 
[[36, 41]], [[36, 41]]]]
RMSD (A) from chain A:  0.0  0.22  0.03
Number of residues matching chain A:[6, 6, 6]
Source of NCS info: anb.pdb

OPERATOR 1
CENTER:   45.4522  -37.4720  -14.4660

ROTA 1:    1.0000    0.0000    0.0000
ROTA 2:    0.0000    1.0000    0.0000
ROTA 3:    0.0000    0.0000    1.0000
TRANS:     0.0000    0.0000    0.0000

OPERATOR 2
CENTER:   42.1483  -55.6520   24.0535

ROTA 1:    0.9444   -0.3074    0.1171
ROTA 2:   -0.2975   -0.9501   -0.0940
ROTA 3:    0.1402    0.0540   -0.9887
TRANS:   -14.2728  -75.5420    6.4099

OPERATOR 3
CENTER:   46.7900  -69.5227  -14.6653

ROTA 1:    0.6247    0.7809   -0.0013
ROTA 2:   -0.7809    0.6247    0.0028
ROTA 3:    0.0030   -0.0008    1.0000
TRANS:    70.4964   42.5349    0.0067

NCS operators written in format for resolve to: simple_ncs_from_pdb.resolve
NCS operators written in format for phenix.refine to: simple_ncs_from_pdb.ncs
NCS written as ncs object information to: simple_ncs_from_pdb.ncs_spec
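
As a quick sanity check on the operators printed above, the ROTA rows of each operator should form a proper rotation matrix: the rows are orthonormal and the determinant is +1, within the rounding of the four printed decimals. The following pure-Python sketch checks this for OPERATOR 2 of GROUP 1. The function names and the tolerance are assumptions, and the sketch does not show how the program applies CENTER and TRANS.

```python
def times_transpose(r):
    """Return R * R^T for a 3x3 matrix given as a list of rows."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in r]
            for row_i in r]

def determinant(r):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = r
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def is_proper_rotation(r, tol=1e-3):
    """True if R * R^T is the identity and det(R) is +1, within tol."""
    product = times_transpose(r)
    orthonormal = all(
        abs(product[i][j] - (1.0 if i == j else 0.0)) <= tol
        for i in range(3) for j in range(3)
    )
    return orthonormal and abs(determinant(r) - 1.0) <= tol

# ROTA 1-3 of OPERATOR 2 in GROUP 1, copied from the output above.
rota = [
    [0.9370, -0.2825, 0.2053],
    [-0.3285, -0.9125, 0.2439],
    [0.1184, -0.2960, -0.9478],
]
print(is_proper_rotation(rota))
```

A matrix that fails this test usually points to a copying or parsing mistake rather than to a problem with the NCS itself.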

Possible Problems

Specific limitations and problems:

  • If the user specifies chains to be in a suggested NCS group, but they are too dissimilar as a whole (rmsd > max_rmsd_user), then the group is rejected even if some fragments of the chains are similar.

  • Chain specifications from suggested_ncs_groups could in principle have more than one chain in one group, but simple_ncs_from_pdb can only use suggested groups that consist of N copies of single chains.

Literature

Additional information

List of all simple_ncs_from_pdb keywords

------------------------------------------------------------------------------- 
Legend: scope names appear on their own line
        each parameter line reads: name= value help text
        Parameter values:
          * means selected parameter (where multiple choices are available)
          False is No
          True is Yes
          None means not provided, not predefined, or left up to the program
          "%3d" is a Python style formatting descriptor
------------------------------------------------------------------------------- 
find_ncs
   temp_dir= "" temporary directory (it must exist if you define it)
   min_length= 10 minimum number of matching residues in a segment
   njump= 1 Take every njumpth residue instead of each 1
   njump_recursion= 10 Take every njump_recursion residue instead of each 1 on
                    recursive call
   min_length_recursion= 50 minimum number of matching residues in a segment
                         for recursive call
   min_percent= 95. min percent identity of matching residues
   max_rmsd= 2. max rmsd of 2 chains. If 0, then only search for domains
   quick= True If quick is set and all chains match, just look for 1 NCS group
   max_rmsd_user= 3. max rmsd of chains suggested by user (i.e., if called
                  from phenix.refine with suggested ncs groups)
   verbose= False Verbose output
   domain_finding_parameters
      find_invariant_domains= True Find the parts of a set of chains that
                              follow NCS
      initial_rms= 0.5 Guess of RMS among chains
      match_radius= 2.0 Keep atoms that are within match_radius of NCS-related
                    atoms
      similarity_threshold= 0.75 Threshold for similarity between segments
      smooth_length= 0 two segments separated by smooth_length or less get
                     connected
      min_contig_length= 3 segments < min_contig_length rejected
      min_fraction_domain= 0.2 domain must be this fraction of a chain
      max_rmsd_domain= 2. max rmsd of domains