Finding NCS in chains from a PDB file with simple_ncs_from_pdb

Author(s)

Purpose

The simple_ncs_from_pdb method identifies NCS in the chains in a PDB file and writes out the NCS operators in forms suitable for phenix.refine, resolve, and the AutoSol and AutoBuild Wizards.

Usage

How simple_ncs_from_pdb works:

The basic steps that simple_ncs_from_pdb carries out are:

  1. Remove the part of the structure specified in the exclude_selection parameter.
  2. Identify sets of matching segments in chains in the PDB file by sequence, using the value of chain_similarity_threshold as the cutoff. These are potential NCS-related chains.
  3. Determine which matching segments have an overall root-mean-square distance (RMSD) within the given tolerance (chain_max_rmsd, typically 2 A).
  4. Remove residues from matching segments if they are locally spatially misaligned (residue_match_radius).
  5. Remove atoms from matching residues if they are not present in both residues (for example, if a side chain is missing in one of the matching residues).
  6. Check the order of atoms in the matching residues.

Additional notes on how simple_ncs_from_pdb works:

Chain matching is done using dynamic programming alignment of residues and atoms. The first pass requires a minimal similarity, set by min_percent, where similarity is (number of matching residues)/(number of residues in longer chain).
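
As a rough illustration of the first-pass similarity test, the sketch below aligns two lists of residue names with a longest-common-subsequence dynamic program and divides the number of matched residues by the length of the longer chain. The function names and the 0.8 cutoff are assumptions made for this example; the alignment inside simple_ncs_from_pdb also considers atoms and may score matches differently.

```python
def count_aligned_residues(chain_a, chain_b):
    """Length of the longest common subsequence of two residue-name lists."""
    rows, cols = len(chain_a) + 1, len(chain_b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if chain_a[i - 1] == chain_b[j - 1]:
                # Residue names agree: extend the best alignment so far.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Skip a residue in one chain or the other.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


def chain_similarity(chain_a, chain_b):
    """(number of matching residues) / (number of residues in longer chain)."""
    longer = max(len(chain_a), len(chain_b))
    if longer == 0:
        return 0.0
    return count_aligned_residues(chain_a, chain_b) / longer


# Hypothetical chains; the second lacks two residues present in the first.
chain_1 = ["MET", "ALA", "GLY", "SER", "LYS", "LEU"]
chain_2 = ["ALA", "GLY", "SER", "LEU"]

similarity = chain_similarity(chain_1, chain_2)
min_similarity = 0.8  # assumed cutoff, for illustration only
is_candidate = similarity >= min_similarity
print(similarity, is_candidate)
```

Dividing by the longer chain means a short fragment cannot score as highly similar to a much longer chain, even when every one of its residues aligns.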

From the matching chains list, we remove chain pairs where the matching segments exceed the RMSD limit.
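
The RMSD filter can be sketched as follows, assuming the copy has already been superposed on the master and that the two coordinate lists hold matching atoms in the same order. The function name and coordinates are made up for the example; the 2 A limit is the typical chain_max_rmsd value mentioned above.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between two equally long coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical, already superposed coordinates of matching atoms.
master_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
copy_atoms = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

chain_max_rmsd = 2.0
pair_rmsd = rmsd(master_atoms, copy_atoms)
keep_pair = pair_rmsd <= chain_max_rmsd
print(pair_rmsd, keep_pair)
```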

Matching segments are scanned for local residue misalignment. Residues where (max_atom_distance - min_atom_distance) > match_radius are excluded from the matching segment. This allows local differences between matching chains.
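
A minimal sketch of this per-residue test is shown below. It assumes the atom distances are measured between matching atoms of the master residue and the superposed copy residue; the function names, coordinates, and the 4.0 radius are assumptions for illustration and are not taken from the program.

```python
import math


def atom_distance(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def residue_is_misaligned(atoms_master, atoms_copy, match_radius):
    """True when (max_atom_distance - min_atom_distance) > match_radius."""
    distances = [atom_distance(a, b) for a, b in zip(atoms_master, atoms_copy)]
    return (max(distances) - min(distances)) > match_radius


match_radius = 4.0  # assumed value, for illustration only

# Both atoms shifted by a similar amount: the residue is kept.
well_aligned = residue_is_misaligned(
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.5, 0.0, 0.0), (1.6, 0.0, 0.0)],
    match_radius,
)

# One atom far from its partner while the other is close: the residue is dropped.
badly_aligned = residue_is_misaligned(
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.5, 0.0, 0.0), (7.0, 0.0, 0.0)],
    match_radius,
)
print(well_aligned, badly_aligned)
```

Because the test uses the spread of distances rather than their size, a residue that is uniformly displaced still passes, while one that is locally distorted does not.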

If matching residues have different numbers of atoms (for example, if one contains the side chain while the other does not), only the matching atoms are included.
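
The idea can be illustrated with a short sketch that keeps only the atom names present in both residues. It assumes atoms are matched by name; the helper name and the example residues are hypothetical.

```python
def common_atom_names(residue_a, residue_b):
    """Atom names found in both residues, in the order they occur in residue_a."""
    names_b = set(residue_b)
    return [name for name in residue_a if name in names_b]


# A complete serine and one whose side chain was not modelled.
full_serine = ["N", "CA", "C", "O", "CB", "OG"]
truncated_serine = ["N", "CA", "C", "O"]

shared = common_atom_names(full_serine, truncated_serine)
print(shared)
```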

Grouping of chains is not performed.

Alternative conformations are excluded from matching segments.

The matching is done by the residue name strings, not by the residue numbers; this allows handling of insertions in the PDB file.

The result of the NCS search is a combination of NCS-related groups and invariant or non-NCS-related regions, down to the atom level. In every NCS group all copies have the same number of atoms and can be reproduced by applying the appropriate rotation and translation to the master copy.

Examples

When running

phenix.simple_ncs_from_pdb 2h50.pdb

the following files will be produced:

2h50_simple_ncs_from_pdb.phil
2h50_simple_ncs_from_pdb.ncs_spec
2h50_simple_ncs_from_pdb.resolve

The file that should be used for refinement is 2h50_simple_ncs_from_pdb.phil. This file can also be modified if particular NCS relations need to be changed. The content of that file is the exact selection string of the atoms in the NCS groups.

The content of 2h50_simple_ncs_from_pdb.ncs is

ncs_group {
  reference = chain 'A'
  selection = chain 'C'
  selection = chain 'E'
  selection = chain 'G'
  selection = chain 'I'
  selection = chain 'K'
  selection = chain 'M'
  selection = chain 'O'
  selection = chain 'Q'
  selection = chain 'S'
  selection = chain 'U'
  selection = chain 'W'
}

Other outputs

To get output in other formats:

phenix.simple_ncs_from_pdb 4boz.pdb write_spec_files=True

Simple_ncs_from_pdb will analyze the chains in 4boz.pdb and identify any NCS that exists. For this sample run the following output is produced:

GROUP 1
Summary of NCS group with 2 operators:
ID of chain/residue where these apply: [['A', 'D'], [[[147, 150], [152, 211],
[213, 275], [280, 305], [307, 308]], [[147, 150], [152, 211], [213, 275], [280, 305], [307, 308]]]]
RMSD (A) from chain A:  0.0  0.72
Number of residues matching chain A:[155, 155]

OPERATOR 1
CENTER:   24.4880  -13.3177  -20.1848

ROTA 1:    1.0000    0.0000    0.0000
ROTA 2:    0.0000    1.0000    0.0000
ROTA 3:    0.0000    0.0000    1.0000
TRANS:     0.0000    0.0000    0.0000

OPERATOR 2
CENTER:   15.9430   11.8822    0.6609

ROTA 1:    0.7964   -0.5651    0.2152
ROTA 2:   -0.5503   -0.8249   -0.1295
ROTA 3:    0.2507   -0.0152   -0.9680
TRANS:    18.3631    5.3425  -23.3604


GROUP 2
Summary of NCS group with 3 operators:
ID of chain/residue where these apply: [['B', 'C', 'E'], [[[1, 41], [43, 71]],
[[1, 41], [43, 71]], [[1, 41], [43, 71]]]]
RMSD (A) from chain B:  0.0  0.8  0.77
Number of residues matching chain B:[70, 70, 70]

OPERATOR 1
CENTER:   47.5581   -9.5652  -26.8403

ROTA 1:    1.0000    0.0000    0.0000
ROTA 2:    0.0000    1.0000    0.0000
ROTA 3:    0.0000    0.0000    1.0000
TRANS:     0.0000    0.0000    0.0000

OPERATOR 2
CENTER:   27.7410  -11.0237  -49.7361

ROTA 1:   -0.6661    0.2615   -0.6986
ROTA 2:    0.2866   -0.7749   -0.5634
ROTA 3:   -0.6886   -0.5755    0.4412
TRANS:    34.1743  -54.0788    7.8610

OPERATOR 3
CENTER:   29.3812   -4.6379   12.2671

ROTA 1:    0.7539   -0.5901    0.2888
ROTA 2:   -0.5860   -0.8027   -0.1106
ROTA 3:    0.2971   -0.0858   -0.9510
TRANS:    19.1286    5.2864  -24.3021

Another way to view the results is

phenix.simple_ncs_from_pdb 4boz.pdb show_summary=true

Chains in model:
---------------------------------------------------
A    B    C    D    E
. . . . . . . . . . . . . . . . . . . . . . . . . .

NCS summary:
---------------------------------------------------
Number of NCS groups     :   2
Group #                  :   1
Number of copies         :   2
Chains in master         :   'A'
Chains in copies         :   'D'
Group #                  :   2
Number of copies         :   3
Chains in master         :   'B'
Chains in copies         :   'C', 'E'
. . . . . . . . . . . . . . . . . . . . . . . . . .

Transforms:
---------------------------------------------------
Group #                  :   1
Transform #              :   1
RMSD                     :   0
ROTA   0    1.0000    0.0000    0.0000
ROTA   1    0.0000    1.0000    0.0000
ROTA   2    0.0000    0.0000    1.0000
TRANS       0.0000    0.0000    0.0000
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Transform #              :   2
RMSD                     :   0.720792956817
ROTA   0    0.7964   -0.5503    0.2507
ROTA   1   -0.5651   -0.8249   -0.0152
ROTA   2    0.2152   -0.1295   -0.9680
TRANS      -5.8295   14.4282  -25.8708
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Group #                  :   2
Transform #              :   1
RMSD                     :   0
ROTA   0    1.0000    0.0000    0.0000
ROTA   1    0.0000    1.0000    0.0000
ROTA   2    0.0000    0.0000    1.0000
TRANS       0.0000    0.0000    0.0000
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Transform #              :   2
RMSD                     :   0.796197645664
ROTA   0   -0.6661    0.2866   -0.6886
ROTA   1    0.2615   -0.7749   -0.5755
ROTA   2   -0.6986   -0.5634    0.4412
TRANS      43.6766  -46.3175  -10.0616
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Transform #              :   3
RMSD                     :   0.772963874518
ROTA   0    0.7539   -0.5860    0.2971
ROTA   1   -0.5901   -0.8027   -0.0858
ROTA   2    0.2888   -0.1106   -0.9510
TRANS      -4.1022   13.4462  -28.0502

There are 5 chains in the PDB file (A, B, C, D, E). In the first group the master is A and the copy is D; in the second group the master is B and the copies are C and E. Every chain belongs to one of the two groups.

RMSD (A) from chain A:  0.0  0.72

shows the RMSD of matching atoms between the master and every other copy. The list of numbers is the list of matching residues by residue number. Note that this is not the exact selection as it appears in 4boz_simple_ncs_from_pdb.ncs.
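
As a quick consistency check, the residue ranges printed for each group can be summed and compared with the reported "Number of residues matching" value. The sketch below uses the ranges from the sample output above and assumes every range is inclusive at both ends; the helper name is hypothetical.

```python
def count_residues(ranges):
    """Total number of residues covered by inclusive [first, last] ranges."""
    return sum(last - first + 1 for first, last in ranges)


# Ranges copied from the sample output for chains A (group 1) and B (group 2).
chain_a_ranges = [[147, 150], [152, 211], [213, 275], [280, 305], [307, 308]]
chain_b_ranges = [[1, 41], [43, 71]]

print(count_residues(chain_a_ranges))  # 155, as reported for group 1
print(count_residues(chain_b_ranges))  # 70, as reported for group 2
```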

ROTA and TRANS are the rotation and translation information. In the .ncs_spec file and the default on-screen representation of the results, the rotation and translation are:

Master = ROT x Copy + TRANS

While in the summary and in the CCTBX implementation the common use is:

Copy = ROT x Master + TRANS

So the rotations and translations in the two formats are the inverse of each other.
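
The relation between the two conventions can be checked numerically. If Master = ROT x Copy + TRANS, then, because a rotation matrix is orthogonal, Copy = ROT^T x Master - ROT^T x TRANS. The sketch below applies this to operator 2 of group 1 from the sample output and compares the result with transform 2 of group 1 in the summary; the helper names are assumptions made for this example.

```python
def transpose(matrix):
    """Transpose of a 3x3 matrix given as a list of rows."""
    return [list(column) for column in zip(*matrix)]


def mat_vec(matrix, vector):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def invert_operator(rot, trans):
    """Turn (ROT, TRANS) of y = ROT x + TRANS into the operator for x = R' y + T'."""
    inv_rot = transpose(rot)  # the inverse of a rotation is its transpose
    inv_trans = [-t for t in mat_vec(inv_rot, trans)]
    return inv_rot, inv_trans


# OPERATOR 2 of GROUP 1 (Master = ROT x Copy + TRANS).
rot = [
    [0.7964, -0.5651, 0.2152],
    [-0.5503, -0.8249, -0.1295],
    [0.2507, -0.0152, -0.9680],
]
trans = [18.3631, 5.3425, -23.3604]

inv_rot, inv_trans = invert_operator(rot, trans)

# Transform 2 of group 1 in the summary (Copy = ROT x Master + TRANS).
summary_trans = [-5.8295, 14.4282, -25.8708]

for computed, reported in zip(inv_trans, summary_trans):
    print(round(computed, 4), reported)
```

The computed translation agrees with the summary values to within the rounding of the four-decimal printout, and the transposed rotation matches the summary ROTA rows exactly.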

A portion of the contents of the 4boz_simple_ncs_from_pdb.ncs_spec file, which you can edit if you want and which you can use in the AutoBuild Wizard, is shown below. NOTE: The NCS operators describe how to map the N'th NCS-related copy onto the first copy.

Summary of NCS information
Thu Apr  2 15:44:03 2015
/net/cci-filer2/raid1/home/...

new_ncs_group
new_operator

rota_matrix    1.0000    0.0000    0.0000
rota_matrix    0.0000    1.0000    0.0000
rota_matrix    0.0000    0.0000    1.0000
tran_orth     0.0000    0.0000    0.0000

center_orth   24.4880  -13.3177  -20.1848
CHAIN A
RMSD 0
MATCHING 155
  RESSEQ 147:150
  RESSEQ 152:211
  RESSEQ 213:275
  RESSEQ 280:305
  RESSEQ 307:308

new_operator

rota_matrix    0.7955   -0.5660    0.2164
rota_matrix   -0.5511   -0.8242   -0.1300
rota_matrix    0.2520   -0.0159   -0.9676
tran_orth    18.4021    5.3569  -23.3621

center_orth   15.9430   11.8822    0.6609
CHAIN D
RMSD 0.817
MATCHING 155
  RESSEQ 147:150
  RESSEQ 152:211
  RESSEQ 213:275
  RESSEQ 280:305
  RESSEQ 307:308
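
If the operators need to be read back into a script, a small parser along the following lines may be enough. It is only a sketch: it handles just the keywords visible in the excerpt above (new_ncs_group, new_operator, rota_matrix, tran_orth, center_orth, CHAIN, RMSD, MATCHING, RESSEQ), the function and dictionary key names are assumptions, and real files may contain records it does not cover.

```python
def parse_ncs_spec(text):
    """Parse .ncs_spec text into a list of groups, each a list of operator dicts."""
    groups = []
    operator = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        key = fields[0]
        if key == "new_ncs_group":
            groups.append([])
            operator = None
        elif key == "new_operator":
            if not groups:
                groups.append([])  # tolerate a missing new_ncs_group record
            operator = {"rota_matrix": [], "resseq": []}
            groups[-1].append(operator)
        elif operator is None:
            continue  # header lines before the first operator
        elif key == "rota_matrix":
            operator["rota_matrix"].append([float(x) for x in fields[1:4]])
        elif key in ("tran_orth", "center_orth"):
            operator[key] = [float(x) for x in fields[1:4]]
        elif key == "CHAIN":
            operator["chain"] = fields[1]
        elif key == "RMSD":
            operator["rmsd"] = float(fields[1])
        elif key == "MATCHING":
            operator["matching"] = int(fields[1])
        elif key == "RESSEQ":
            first, last = fields[1].split(":")
            operator["resseq"].append((int(first), int(last)))
    return groups


# The second operator from the excerpt above.
sample = """Summary of NCS information

new_ncs_group
new_operator

rota_matrix    0.7955   -0.5660    0.2164
rota_matrix   -0.5511   -0.8242   -0.1300
rota_matrix    0.2520   -0.0159   -0.9676
tran_orth    18.4021    5.3569  -23.3621

center_orth   15.9430   11.8822    0.6609
CHAIN D
RMSD 0.817
MATCHING 155
  RESSEQ 147:150
  RESSEQ 152:211
  RESSEQ 213:275
  RESSEQ 280:305
  RESSEQ 307:308
"""

parsed = parse_ncs_spec(sample)
first_operator = parsed[0][0]
print(first_operator["chain"], first_operator["rmsd"], first_operator["tran_orth"])
```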

Possible Problems

Specific limitations and problems:

The rotation and translation written to the .ncs_spec file are defined by:

Master = ROT x Copy + TRANS

and not:

Copy = ROT x Master + TRANS

They are the inverse of the rotation and translation that are used in the implementation of the NCS relation.

Literature

Additional information

List of all available keywords