Finding NCS in chains from a PDB file with simple_ncs_from_pdb

Author(s)

Purpose

The simple_ncs_from_pdb method identifies NCS in the chains in a PDB file and writes out the NCS operators in forms suitable for phenix.refine, resolve, and the AutoSol and AutoBuild Wizards.

Usage

How simple_ncs_from_pdb works:

The basic steps that the simple_ncs_from_pdb carries out are:

Additional notes on how simple_ncs_from_pdb works:

The matching of chains is done in a first quick pass by calling simple_ncs_from_pdb recursively and only using every 10th residue in the analysis. This allows a check of whether chains that have the same sequence really have the same structure or whether some such chains should be in separate NCS groups. The use of only every 10th residue allows time for an all-against all matching of chains.

If residue numbers are not the same for corresponding chains, but they are simply offset by a constant for each chain, this will be recognized and the chains will be aligned.

An assumption in simple_ncs_from_pdb is that residue numbers are consistent among chains. They do not have to be the same: chain A can be residues 1-100 and chain B 211-300. However chain A cannot be residues 1-10 and 20-50, matching to chain B residues 1-10 and 21-51.

Residue numbers are used to align pairs of chains, maximizing identities of matching pairs of residues. Pairs of chains that can match are identified.

Groupings of chains are chosen that maximize the number of matching residues between each member of a group and the first (reference) member of the group.

For a pair of chains, some segments may match and others not. Each pair of segments must have a length at least as long as min_length and a percent identity at least as high as min_percent. A pair of segments may not end in a mismatch. An overall pair of chains must have an rmsd of CA atoms of less than or equal to rmsd_max.

If find_invariant_domain is specified then once all chains that can be matched with the above algorithm are identified, all remaining chains are matched, allowing the break-up of chains into invariant domains. The invariant domains each get a separate NCS group.

Output files from simple_ncs_from_pdb

The output files that are produced are:

simple_ncs_from_pdb.ncs
simple_ncs_from_pdb.ncs_spec

Examples

Standard run of simple_ncs_from_pdb:

Running simple_ncs_from pdb is easy. For example, you can type:

phenix.simple_ncs_from_pdb anb.pdb

Simple_ncs_from_pdb will analyze the chains in anb.pdb and identify any NCS that exists. For this sample run the following output is produced:

Chains in this PDB file:  ['A', 'N', 'B']
GROUPS BASED ON QUICK COMPARISON: [['A', 'B']]
Looking for invariant domains for ...: ['A', 'N', 'B'] [[[2, 525]],
[[2, 259], [290, 525]], [[20, 525]]]

There were 3 chains in the PDB file A, N and B. Chains A and B were very similar and clearly related by NCS. This relationship was found in a quick comparison. Chain N had the same sequence as A and B, but was not in the identical comparison. Searching for domains that did have NCS among all three chains produced three domains, represented below by 4 NCS groups:

GROUP 1
Summary of NCS group with 3 operators:
ID of chain/residue where these apply: [['A', 'N', 'B'], [[[2, 5], [20, 35],
[60, 76], [78, 107], [110, 137], [401, 431], [433, 483], [485, 516],
[520, 525]], [[2, 5], [20, 35], [60, 76], [78, 107], [110, 137],
[401, 431], [433, 483], [485, 516], [520, 525]], [[2, 5], [20, 35],
[60, 76], [78, 107], [110, 137], [401, 431], [433, 483], [485, 516],
[520, 525]]]]
RMSD (A) from chain A:  0.0  1.09  0.07
Number of residues matching chain A:[215, 215, 194]
Source of NCS info: anb.pdb

The residues in chains A, B, and N in this group are 2-5, 20-35, 60-76, 78-107, 110-137, 401-431, 433-483, 485-516 and 520-525. Note that these are not all contiguous. These are all the residues that all have the same relationships among the 3 chains. The RMSD of CA atoms between chains A and N is 1.09 A and between A and B is 0.07 A.

The NCS information is written in three formats:

NCS operators written in format for resolve to: simple_ncs_from_pdb.resolve
NCS operators written in format for phenix.refine to: simple_ncs_from_pdb.ncs
NCS written as ncs object information to: simple_ncs_from_pdb.ncs_spec

The contents of the simple_ncs_from_pdb.ncs_spec file, which you can edit if you want and which you can use in the AutoBuild Wizard, are shown below. NOTE: The ncs operators describe how to map the N'th ncs-related copy on to the first copy.

Summary of NCS information
Thu Apr  9 08:39:48 2009
/Users/Shared/unix/transfer/test





new_ncs_group
new_operator

rota_matrix    1.0000    0.0000    0.0000
rota_matrix    0.0000    1.0000    0.0000
rota_matrix    0.0000    0.0000    1.0000
tran_orth     0.0000    0.0000    0.0000

center_orth   29.9208  -53.3304  -13.4779
CHAIN A
RMSD 0.0
MATCHING 215
  RESSEQ 2:5
  RESSEQ 20:35
  RESSEQ 60:76
  RESSEQ 78:107
  RESSEQ 110:137
  RESSEQ 401:431
  RESSEQ 433:483
  RESSEQ 485:516
  RESSEQ 520:525

new_operator

rota_matrix    0.9370   -0.2825    0.2053
rota_matrix   -0.3285   -0.9125    0.2439
rota_matrix    0.1184   -0.2960   -0.9478
tran_orth   -14.7410  -79.9073   -8.5967

center_orth   32.5410  -35.4227   20.2768
CHAIN N
RMSD 1.09447914951
MATCHING 215
  RESSEQ 2:5
  RESSEQ 20:35
  RESSEQ 60:76
  RESSEQ 78:107
  RESSEQ 110:137
  RESSEQ 401:431
  RESSEQ 433:483
  RESSEQ 485:516
  RESSEQ 520:525

new_operator

rota_matrix    0.6257    0.7800   -0.0037
rota_matrix   -0.7800    0.6257   -0.0010
rota_matrix    0.0015    0.0035    1.0000
tran_orth    70.3889   42.4760    0.3937

center_orth   50.0256  -91.8920  -13.6461
CHAIN B
RMSD 0.0715099139994
MATCHING 194
  RESSEQ 2:5
  RESSEQ 20:35
  RESSEQ 60:76
  RESSEQ 78:107
  RESSEQ 110:137
  RESSEQ 401:431
  RESSEQ 433:483
  RESSEQ 485:516
  RESSEQ 520:525





new_ncs_group
new_operator

rota_matrix    1.0000    0.0000    0.0000
rota_matrix    0.0000    1.0000    0.0000
rota_matrix    0.0000    0.0000    1.0000
tran_orth     0.0000    0.0000    0.0000

center_orth   47.5037  -61.5641  -11.2751
CHAIN A
RMSD 0.0
MATCHING 11
  RESSEQ 6:9
  RESSEQ 56:59
  RESSEQ 517:519

new_operator

rota_matrix    0.9367   -0.2981    0.1836
rota_matrix   -0.3113   -0.9492    0.0469
rota_matrix    0.1603   -0.1011   -0.9819
tran_orth   -14.9810  -78.2888   -2.3823

center_orth   51.8984  -33.6038   20.9877
CHAIN N
RMSD 0.479682710546
MATCHING 11
  RESSEQ 6:9
  RESSEQ 56:59
  RESSEQ 517:519

new_operator

rota_matrix    0.6255    0.7802   -0.0016
rota_matrix   -0.7802    0.6255   -0.0025
rota_matrix   -0.0009    0.0028    1.0000
tran_orth    70.3999   42.4366    0.4815

center_orth   66.8308  -82.9508  -11.4633
CHAIN B
RMSD 0.034689065899
MATCHING 11
  RESSEQ 6:9
  RESSEQ 56:59
  RESSEQ 517:519





new_ncs_group
new_operator

rota_matrix    1.0000    0.0000    0.0000
rota_matrix    0.0000    1.0000    0.0000
rota_matrix    0.0000    0.0000    1.0000
tran_orth     0.0000    0.0000    0.0000

center_orth   36.1219  -37.6124  -62.1437
CHAIN A
RMSD 0.0
MATCHING 150
  RESSEQ 193:255
  RESSEQ 257:259
  RESSEQ 290:355
  RESSEQ 357:374

new_operator

rota_matrix    0.7650    0.3808   -0.5194
rota_matrix    0.0664   -0.8488   -0.5245
rota_matrix   -0.6406    0.3668   -0.6746
tran_orth    50.3180  -36.4383   16.0299

center_orth   39.1403  -33.0801   60.7270
CHAIN N
RMSD 0.610762530957
MATCHING 150
  RESSEQ 193:255
  RESSEQ 257:259
  RESSEQ 290:355
  RESSEQ 357:374

new_operator

rota_matrix    0.5942    0.8043   -0.0007
rota_matrix   -0.8043    0.5942   -0.0064
rota_matrix   -0.0047    0.0043    1.0000
tran_orth    73.5084   40.5311    0.5807

center_orth   40.9347  -76.7723  -62.2004
CHAIN B
RMSD 0.0137641203481
MATCHING 150
  RESSEQ 193:255
  RESSEQ 257:259
  RESSEQ 290:355
  RESSEQ 357:374





new_ncs_group
new_operator

rota_matrix    1.0000    0.0000    0.0000
rota_matrix    0.0000    1.0000    0.0000
rota_matrix    0.0000    0.0000    1.0000
tran_orth     0.0000    0.0000    0.0000

center_orth   45.4522  -37.4720  -14.4660
CHAIN A
RMSD 0.0
MATCHING 6
  RESSEQ 36:41

new_operator

rota_matrix    0.9444   -0.3074    0.1171
rota_matrix   -0.2975   -0.9501   -0.0940
rota_matrix    0.1402    0.0540   -0.9887
tran_orth   -14.2728  -75.5420    6.4099

center_orth   42.1483  -55.6520   24.0535
CHAIN N
RMSD 0.215718434967
MATCHING 6
  RESSEQ 36:41

new_operator

rota_matrix    0.6247    0.7809   -0.0013
rota_matrix   -0.7809    0.6247    0.0028
rota_matrix    0.0030   -0.0008    1.0000
tran_orth    70.4964   42.5349    0.0067

center_orth   46.7900  -69.5227  -14.6653
CHAIN B
RMSD 0.0340725565251
MATCHING 6
  RESSEQ 36:41

Possible Problems

Specific limitations and problems:

Literature

Additional information

List of all available keywords