Python-based Hierarchical ENvironment for Integrated Xtallography
Finding NCS in chains from a PDB file with simple_ncs_from_pdb
Author(s)
Purpose
The simple_ncs_from_pdb method identifies NCS (non-crystallographic symmetry) in the chains in a PDB file and writes out the NCS operators in forms suitable for phenix.refine, resolve, and the AutoSol and AutoBuild Wizards.
Usage
How simple_ncs_from_pdb works:
The basic steps that simple_ncs_from_pdb carries out are:
Additional notes on how simple_ncs_from_pdb works:
The matching of chains is done in a first quick pass by calling simple_ncs_from_pdb recursively and using only every 10th residue in the analysis (see njump_recursion). This allows a check of whether chains that have the same sequence really have the same structure, or whether some such chains should be in separate NCS groups. Using only every 10th residue keeps the all-against-all matching of chains fast enough to be practical.

If residue numbers are not the same for corresponding chains, but are simply offset by a constant for each chain, this will be recognized and the chains will be aligned.

An assumption in simple_ncs_from_pdb is that residue numbers are consistent among chains. They do not have to be the same: chain A can be residues 1-100 and chain B 211-300. However, chain A cannot be residues 1-10 and 20-50, matching to chain B residues 1-10 and 21-51.

Residue numbers are used to align pairs of chains, maximizing identities of matching pairs of residues. Pairs of chains that can match are identified. Groupings of chains are chosen that maximize the number of matching residues between each member of a group and the first (reference) member of the group.

For a pair of chains, some segments may match and others not. Each pair of segments must have a length of at least min_length and a percent identity of at least min_percent. A pair of segments may not end in a mismatch. An overall pair of chains must have an rmsd of CA atoms less than or equal to max_rmsd.

If find_invariant_domains is specified, then once all chains that can be matched with the above algorithm are identified, all remaining chains are matched, allowing the break-up of chains into invariant domains. The invariant domains each get a separate NCS group.

Output files from simple_ncs_from_pdb
The output files that are produced are:
Examples
Standard run of simple_ncs_from_pdb:
Running simple_ncs_from_pdb is straightforward. For example, you can type:

    phenix.simple_ncs_from_pdb anb.pdb

simple_ncs_from_pdb will analyze the chains in anb.pdb and identify any NCS that exists. For this sample run the following output is produced:

    Chains in this PDB file: ['A', 'N', 'B']
    GROUPS BASED ON QUICK COMPARISON: [['A', 'B']]
    Looking for invariant domains for ...: ['A', 'N', 'B'] [[[2, 525]], [[2, 259], [290, 525]], [[20, 525]]]

There were 3 chains in the PDB file: A, N, and B. Chains A and B were very similar and clearly related by NCS. This relationship was found in the quick comparison. Chain N had the same sequence as A and B, but was not grouped with them in the quick comparison. Searching for domains that did have NCS among all three chains produced three domains, represented below by 4 NCS groups:

    GROUP 1
    Summary of NCS group with 3 operators:
    ID of chain/residue where these apply: [['A', 'N', 'B'], [[[2, 5], [20, 35], [60, 76], [78, 107], [110, 137], [401, 431], [433, 483], [485, 516], [520, 525]], [[2, 5], [20, 35], [60, 76], [78, 107], [110, 137], [401, 431], [433, 483], [485, 516], [520, 525]], [[2, 5], [20, 35], [60, 76], [78, 107], [110, 137], [401, 431], [433, 483], [485, 516], [520, 525]]]]
    RMSD (A) from chain A: 0.0 1.09 0.07
    Number of residues matching chain A:[215, 215, 194]
    Source of NCS info: anb.pdb

The residues in chains A, B, and N in this group are 2-5, 20-35, 60-76, 78-107, 110-137, 401-431, 433-483, 485-516 and 520-525. Note that these are not all contiguous. These are the residues that share the same NCS relationships among all 3 chains. The RMSD of CA atoms between chains A and N is 1.09 A and between A and B is 0.07 A. The NCS operators relating these domains are given below.
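As a quick consistency check on the GROUP 1 summary, the listed residue ranges can be tallied and compared with the reported number of matching residues. The following is a minimal sketch, not part of simple_ncs_from_pdb: the helper names `count_residues` and `format_ranges` are hypothetical, and the ranges are copied by hand from the sample output rather than parsed from a file. It assumes the ranges are inclusive.

```python
# Residue ranges reported for GROUP 1 in the sample output
# (the same list is printed for chains A, N and B).
GROUP_1_RANGES = [
    (2, 5), (20, 35), (60, 76), (78, 107), (110, 137),
    (401, 431), (433, 483), (485, 516), (520, 525),
]


def count_residues(ranges):
    """Count the residue numbers covered by inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in ranges)


def format_ranges(ranges):
    """Render ranges in the '2-5, 20-35, ...' style used in the text."""
    return ", ".join(f"{start}-{end}" for start, end in ranges)


if __name__ == "__main__":
    print(format_ranges(GROUP_1_RANGES))
    print(count_residues(GROUP_1_RANGES))  # 215
```

The total of 215 agrees with the value reported for chains A and N in "Number of residues matching chain A". The value reported for chain B (194) is lower; the sample output does not say which residues account for the difference.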
    OPERATOR 1
    CENTER: 29.9208 -53.3304 -13.4779
    ROTA 1: 1.0000 0.0000 0.0000
    ROTA 2: 0.0000 1.0000 0.0000
    ROTA 3: 0.0000 0.0000 1.0000
    TRANS: 0.0000 0.0000 0.0000

    OPERATOR 2
    CENTER: 32.5410 -35.4227 20.2768
    ROTA 1: 0.9370 -0.2825 0.2053
    ROTA 2: -0.3285 -0.9125 0.2439
    ROTA 3: 0.1184 -0.2960 -0.9478
    TRANS: -14.7410 -79.9073 -8.5967

    OPERATOR 3
    CENTER: 50.0256 -91.8920 -13.6461
    ROTA 1: 0.6257 0.7800 -0.0037
    ROTA 2: -0.7800 0.6257 -0.0010
    ROTA 3: 0.0015 0.0035 1.0000
    TRANS: 70.3889 42.4760 0.3937

    GROUP 2
    Summary of NCS group with 3 operators:
    ID of chain/residue where these apply: [['A', 'N', 'B'], [[[6, 9], [56, 59], [517, 519]], [[6, 9], [56, 59], [517, 519]], [[6, 9], [56, 59], [517, 519]]]]
    RMSD (A) from chain A: 0.0 0.48 0.03
    Number of residues matching chain A:[11, 11, 11]
    Source of NCS info: anb.pdb

    OPERATOR 1
    CENTER: 47.5037 -61.5641 -11.2751
    ROTA 1: 1.0000 0.0000 0.0000
    ROTA 2: 0.0000 1.0000 0.0000
    ROTA 3: 0.0000 0.0000 1.0000
    TRANS: 0.0000 0.0000 0.0000

    OPERATOR 2
    CENTER: 51.8984 -33.6038 20.9877
    ROTA 1: 0.9367 -0.2981 0.1836
    ROTA 2: -0.3113 -0.9492 0.0469
    ROTA 3: 0.1603 -0.1011 -0.9819
    TRANS: -14.9810 -78.2888 -2.3823

    OPERATOR 3
    CENTER: 66.8308 -82.9508 -11.4633
    ROTA 1: 0.6255 0.7802 -0.0016
    ROTA 2: -0.7802 0.6255 -0.0025
    ROTA 3: -0.0009 0.0028 1.0000
    TRANS: 70.3999 42.4366 0.4815

    GROUP 3
    Summary of NCS group with 3 operators:
    ID of chain/residue where these apply: [['A', 'N', 'B'], [[[193, 255], [257, 259], [290, 355], [357, 374]], [[193, 255], [257, 259], [290, 355], [357, 374]], [[193, 255], [257, 259], [290, 355], [357, 374]]]]
    RMSD (A) from chain A: 0.0 0.61 0.01
    Number of residues matching chain A:[150, 150, 150]
    Source of NCS info: anb.pdb

    OPERATOR 1
    CENTER: 36.1219 -37.6124 -62.1437
    ROTA 1: 1.0000 0.0000 0.0000
    ROTA 2: 0.0000 1.0000 0.0000
    ROTA 3: 0.0000 0.0000 1.0000
    TRANS: 0.0000 0.0000 0.0000

    OPERATOR 2
    CENTER: 39.1403 -33.0801 60.7270
    ROTA 1: 0.7650 0.3808 -0.5194
    ROTA 2: 0.0664 -0.8488 -0.5245
    ROTA 3: -0.6406 0.3668 -0.6746
    TRANS: 50.3180 -36.4383 16.0299

    OPERATOR 3
    CENTER: 40.9347 -76.7723 -62.2004
    ROTA 1: 0.5942 0.8043 -0.0007
    ROTA 2: -0.8043 0.5942 -0.0064
    ROTA 3: -0.0047 0.0043 1.0000
    TRANS: 73.5084 40.5311 0.5807

    GROUP 4
    Summary of NCS group with 3 operators:
    ID of chain/residue where these apply: [['A', 'N', 'B'], [[[36, 41]], [[36, 41]], [[36, 41]]]]
    RMSD (A) from chain A: 0.0 0.22 0.03
    Number of residues matching chain A:[6, 6, 6]
    Source of NCS info: anb.pdb

    OPERATOR 1
    CENTER: 45.4522 -37.4720 -14.4660
    ROTA 1: 1.0000 0.0000 0.0000
    ROTA 2: 0.0000 1.0000 0.0000
    ROTA 3: 0.0000 0.0000 1.0000
    TRANS: 0.0000 0.0000 0.0000

    OPERATOR 2
    CENTER: 42.1483 -55.6520 24.0535
    ROTA 1: 0.9444 -0.3074 0.1171
    ROTA 2: -0.2975 -0.9501 -0.0940
    ROTA 3: 0.1402 0.0540 -0.9887
    TRANS: -14.2728 -75.5420 6.4099

    OPERATOR 3
    CENTER: 46.7900 -69.5227 -14.6653
    ROTA 1: 0.6247 0.7809 -0.0013
    ROTA 2: -0.7809 0.6247 0.0028
    ROTA 3: 0.0030 -0.0008 1.0000
    TRANS: 70.4964 42.5349 0.0067

    NCS operators written in format for resolve to: simple_ncs_from_pdb.resolve
    NCS operators written in format for phenix.refine to: simple_ncs_from_pdb.ncs
    NCS written as ncs object information to: simple_ncs_from_pdb.ncs_spec

Possible Problems
Specific limitations and problems:
Literature
Additional information
List of all simple_ncs_from_pdb keywords
-------------------------------------------------------------------------------
Legend: black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help
Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
simple_ncs_from_pdb
  pdb_in= None Input PDB file to be used to identify ncs
  temp_dir= "" temporary directory (ncs_domain_pdb will be written there)
  min_length= 10 minimum number of matching residues in a segment
  njump= 1 Take every njumpth residue instead of each 1
  njump_recursion= 10 Take every njump_recursion residue instead of each 1 on recursive call
  min_length_recursion= 50 minimum number of matching residues in a segment for recursive call
  min_percent= 95. min percent identity of matching residues
  max_rmsd= 2. max rmsd of 2 chains. If 0, then only search for domains
  quick= True If quick is set and all chains match, just look for 1 NCS group
  max_rmsd_user= 3. max rmsd of chains suggested by user (i.e., if called from phenix.refine with suggested ncs groups)
  maximize_size_of_groups= False You can request that the scoring be set up to maximize the number of members in NCS groups
  ncs_domain_pdb_stem= None NCS domains will be written to ncs_domain_pdb_stem+"group_"+nn
  write_ncs_domain_pdb= False You can write out PDB files representing NCS domains for density modification if you want
  verbose= False Verbose output
  debug= False Debugging output
  dry_run= False Just read in and check parameter names
  domain_finding_parameters
    find_invariant_domains= True Find the parts of a set of chains that follow NCS
    initial_rms= 0.5 Guess of RMS among chains
    match_radius= 2.0 Keep atoms that are within match_radius of NCS-related atoms
    similarity_threshold= 0.75 Threshold for similarity between segments
    smooth_length= 0 two segments separated by smooth_length or less get connected
    min_contig_length= 3 segments < min_contig_length rejected
    min_fraction_domain= 0.2 domain must be this fraction of a chain
    max_rmsd_domain= 2. max rmsd of domains
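
To illustrate how min_length and min_percent interact when a pair of segments is tested, here is a minimal sketch. It is not the code used by simple_ncs_from_pdb: the function names are hypothetical, segments are represented as strings of one-letter residue codes already aligned position by position, and the rule that a pair of segments may not end in a mismatch is interpreted (as an assumption) by trimming mismatched pairs from both ends before the other two tests are applied.

```python
def trim_mismatched_ends(seq_a, seq_b):
    """Drop mismatching residue pairs from both ends of an aligned segment.

    Assumption: "may not end in a mismatch" is applied to both ends.
    """
    start, end = 0, min(len(seq_a), len(seq_b))
    while start < end and seq_a[start] != seq_b[start]:
        start += 1
    while end > start and seq_a[end - 1] != seq_b[end - 1]:
        end -= 1
    return seq_a[start:end], seq_b[start:end]


def percent_identity(seq_a, seq_b):
    """Percent of aligned positions that carry the same residue type."""
    if not seq_a:
        return 0.0
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def segment_is_acceptable(seq_a, seq_b, min_length=10, min_percent=95.0):
    """Apply the min_length and min_percent tests to a trimmed segment pair.

    The defaults mirror the min_length and min_percent values in the
    keyword list; everything else here is illustrative.
    """
    seq_a, seq_b = trim_mismatched_ends(seq_a, seq_b)
    return (len(seq_a) >= min_length
            and percent_identity(seq_a, seq_b) >= min_percent)


if __name__ == "__main__":
    reference = "ACDEFGHIKLMNPQRSTVWY"
    one_mismatch = "ACDEFGHIKLANPQRSTVWY"   # 19 of 20 identical
    two_mismatches = "ACDEFGAIKLANPQRSTVWY"  # 18 of 20 identical
    print(segment_is_acceptable(reference, one_mismatch))
    print(segment_is_acceptable(reference, two_mismatches))
```

With the default thresholds, a 20-residue segment tolerates one internal mismatch (95 percent identity) but not two (90 percent). The CA rmsd test against max_rmsd is a separate check on coordinates and is not covered by this sketch.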