This is a customized version of TEXTAL for molecular replacement applications. If you have phased your dataset by molecular replacement, using phases calculated from a homologous model, you can give the coordinates of the transformed MR solution to TEXTAL, along with an alignment to the sequence of your new protein, and it will automatically build your new model, with the correct amino acids substituted in.
Of course, the quality of the model built depends on the quality of the phases, and that depends on the degree of homology. It is also important to supply an alignment that is as accurate as possible; errors in the alignment (e.g. gap placement) will produce errors in the model (incorrect amino acid side-chains).
The program currently makes only a limited attempt to build divergent regions (e.g. loops that differ between the two structures) and to bridge gaps between broken chains. At the moment, it only attempts to bridge gaps of 5 residues or fewer. (Methods for handling longer gaps will be added in the future.) It attempts to apply the following three strategies in sequence:
The residues in the output model will be numbered from 0 to N-1, where N is the number of amino acids (non-gap characters) in the sequence of the protein given in the alignment file.
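The numbering rule above can be sketched in a few lines of Python. This is only an illustration of the 0 to N-1 convention for a single aligned sequence; the function name `number_target_residues` and the toy sequence are made up for this example, and it is an assumption that '-' is the only gap character.

```python
def number_target_residues(aligned_target, gap="-"):
    """Pair each non-gap character of the new protein's aligned
    sequence with its 0-based residue number."""
    residues = [aa for aa in aligned_target if aa != gap]
    return list(enumerate(residues))


# Toy aligned sequence (hypothetical, not taken from a real alignment file).
numbering = number_target_residues("-VQLK-GR")
n_residues = len(numbering)  # N = number of non-gap characters
for number, aa in numbering:
    print(number, aa)
```

Gap characters do not consume a residue number, so an alignment column index and an output residue number generally differ.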
> textal.build_mr --symmetry=human-otc.inp human-otc.hkl a1s_mr_solution.pdb alignment
making map covering model
scaling map
output: human-otc-mr-scaled.xplor
tracing map
output: human-otc-mr-trace.pdb
running Capra
output: human-otc-mr-chains.pdb
output: human-otc-mr-waypts.pdb
output: human-otc-mr-links.pdb
using molecular-replacement model to identify side-chains
gap between chains 0 (end=LYS-4) and 1 (start=ARG-6): G
failed to construct bridge via candidate CA's
failed to construct bridge via fragment matching
failed to connect
gap between chains 1 (end=ARG-32) and 2 (start=LYS-34): I
constructed bridge via candidate CAs
# ATOM 17 CA ILE 33 30.350 44.389 52.411 1.81 0.38
gap between chains 2 (end=ASP-81) and 3 (start=VAL-86): IHLG
constructed bridge via candidate CAs
# ATOM 1352 CA ILE 82 56.850 64.389 49.911 1.31 0.87
# ATOM 1357 CA HIS 83 56.350 67.889 48.911 1.94 2.16
# ATOM 1688 CA LEU 84 55.850 70.389 50.411 1.50 1.76
# ATOM 1693 CA GLY 85 56.350 73.389 51.411 1.96 2.37
...
output: human-otc-mr-aa.pdb
patching small breaks
output: human-otc-mr-patch.pdb
output: human-otc-mr-patch.log
the following residues were connected by 'patch' and should be inspected:
4..6 109..111 145..147 178..180 261..263
(an attempt was made to bridge these gaps, but they are not necessarily consistent with the alignment)
refining C-alpha coords
output: human-otc-mr-capra.pdb
summary: #chains=3, #CAs=294, expected=310, percent built=94.0%
building side-chains
original_clookup_main: Amino acid corrections will be applied.
original_clookup_main: Partial assignments detected. Chain direction override disabled!
original_clookup_main: Lookup is 3% complete, estimate 184m10s remaining for 265 residues...
original_clookup_main: Lookup is 7% complete, estimate 120m03s remaining for 255 residues...
...
original_clookup_main: Lookup is 98% complete, estimate 1m17s remaining for 5 residues...
original_clookup_main: Lookup took 71m13s for the first phase.
output: human-otc-mr-lookup.pdb
applying real-space refinement
output: human-otc-mr-model.pdb

If you want to run a quick test (e.g. 5 minutes vs. an hour), you can give the program a flag '-c' to build backbone only (C-alpha chains) but not the side chains. This will allow you to see what parts of the model will be built and to inspect the assignment of amino acid identities (residue names of C-alpha atoms in <prot>-mr-capra.pdb).
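To inspect the amino acid assignments after a '-c' run, one option is to list the residue names of the C-alpha atoms. The following is a minimal sketch, assuming the C-alpha file uses the standard fixed-column PDB ATOM record layout; `ca_residue_names` and `_demo_atom` are hypothetical helpers written for this example and are not part of TEXTAL. The demo records reuse two C-alpha positions from the log above (with the last two log columns placed in the occupancy and B-factor fields, which is an assumption), plus one made-up non-CA atom to show the filtering.

```python
def ca_residue_names(pdb_lines):
    """Return (residue_number, residue_name) for each C-alpha ATOM record,
    reading the standard fixed PDB columns."""
    found = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":   # atom name, columns 13-16
            continue
        res_name = line[17:20].strip()    # residue name, columns 18-20
        res_seq = int(line[22:26])        # residue number, columns 23-26
        found.append((res_seq, res_name))
    return found


def _demo_atom(serial, name, res_name, res_seq, x, y, z, occ, b):
    # Hypothetical helper: builds a correctly spaced ATOM record for the demo.
    return ("ATOM  {:5d} {:<4s} {:>3s}  {:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
            .format(serial, name, res_name, res_seq, x, y, z, occ, b))


demo = [
    _demo_atom(17, " CA ", "ILE", 33, 30.350, 44.389, 52.411, 1.81, 0.38),
    _demo_atom(18, " N  ", "LYS", 34, 0.0, 0.0, 0.0, 1.0, 0.0),  # made-up atom
    _demo_atom(1352, " CA ", "ILE", 82, 56.850, 64.389, 49.911, 1.31, 0.87),
]
assignments = ca_residue_names(demo)
for res_seq, res_name in assignments:
    print(res_seq, res_name)
```

For a real run, the lines would come from reading the C-alpha output file, e.g. `ca_residue_names(open(path))`, where `path` points to your own <prot>-mr-capra.pdb.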
Variations:
// as above...
> textal.build_mr --symmetry=human-otc.inp human-otc.hkl a1s_mr_solution.pdb alignment

// build backbone (C-alpha chains) only (much faster)
> textal.build_mr -c --symmetry=human-otc.inp human-otc.hkl a1s_mr_solution.pdb alignment

// specify specific columns to use
> textal.build_mr --amplitudes=FULL_MOD --phases=PA_MOD --FOM=FOM_MOD --symmetry=human-otc.inp human-otc.hkl a1s_mr_solution.pdb alignment

// suppress running patch, if you are sure your alignment is accurate (output may be more disconnected but more veridical)
> textal.build_mr --connectivity=conservative --symmetry=human-otc.inp human-otc.hkl a1s_mr_solution.pdb alignment
There is also a new task for the Phenix GUI (listed under tasks/textal/MR_Build.py) that serves as an interface to the program, with inputs that parallel those for the command line program.
There must be exactly one amino acid in the sequence for each residue in the homology model (MR solution structure). If not, the program will complain. Note that this means removing amino acids from the normal sequence of the protein if they are absent in the model (such as a disordered terminus or loop region). You can extract the sequences using textal.pdb2seq, but trim off the header and any non-peptide characters ('X', e.g. for water or ligands).
If there are multiple chains in the structure, then the alignment file should provide a pair of lines for each, with the sequence of the new protein first and the sequence of the chain in the homology model second. Thus, if there are N chains in the homology model (even if they are NCS symmetry copies or otherwise identical, as in homo-oligomers), then there should be 2N lines in the alignment file.
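Before a long run, it can help to check these rules on your own alignment file. The sketch below is one way to do that under stated assumptions: blank lines are ignored, '-' is the gap character, and the two rows of a pair have equal length (as in any alignment). The function names and the toy sequences are hypothetical, and this is not the check that textal.build_mr itself performs.

```python
def read_alignment_pairs(lines):
    """Group the non-blank lines into (new_protein, model_chain) pairs."""
    rows = [ln.strip() for ln in lines if ln.strip()]
    if len(rows) % 2 != 0:
        raise ValueError("expected 2 lines per chain, got %d lines" % len(rows))
    pairs = []
    for i in range(0, len(rows), 2):
        target, model = rows[i], rows[i + 1]
        if len(target) != len(model):
            raise ValueError("pair %d: aligned rows differ in length" % (i // 2))
        pairs.append((target, model))
    return pairs


def check_against_model(pairs, model_chain_lengths, gap="-"):
    """Check one pair per model chain, and one amino acid in the model
    row for each residue actually present in that chain."""
    if len(pairs) != len(model_chain_lengths):
        raise ValueError("need %d pairs, found %d"
                         % (len(model_chain_lengths), len(pairs)))
    for idx, ((_, model), n_res) in enumerate(zip(pairs, model_chain_lengths)):
        n_aligned = sum(1 for c in model if c != gap)
        if n_aligned != n_res:
            raise ValueError("chain %d: alignment has %d model residues, "
                             "structure has %d" % (idx, n_aligned, n_res))


# Toy two-chain example (hypothetical sequences): 2 chains, so 4 lines.
toy_lines = ["AC-DE", "ACQD-", "", "AC-DE", "ACQD-"]
toy_pairs = read_alignment_pairs(toy_lines)
check_against_model(toy_pairs, [4, 4])  # passes silently
```

The per-chain residue counts passed as `model_chain_lengths` would have to come from your MR solution structure, for example by counting C-alpha atoms per chain.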
To generate alignments, I suggest the following web server, which runs Bill Pearson's LALIGN. Be sure to set the 'global' alignment option, which forces it to include all residues and not drop any at the ends. For 'real' runs of this program, I recommend paying careful attention to parameters such as the gap penalties and substitution matrix, to generate as accurate an alignment as possible, since the quality of the resulting model depends in part on the alignment being right.
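To illustrate what the 'global' option means, here is a minimal Needleman-Wunsch sketch in Python. It is not LALIGN's algorithm or scoring: it assumes a simple match/mismatch score and a linear gap penalty (all three values are arbitrary choices for this example), whereas a real run should use a substitution matrix and tuned gap penalties. The point is only that a global alignment covers every residue of both sequences, end to end.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Returns the two aligned strings, padded with '-'."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner so that both
    # sequences are consumed completely (nothing dropped at the ends).
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


# Short fragments used purely as toy input.
top, bottom = global_align("VQLKGRDLL", "VVSLAGRDLL")
print(top)
print(bottom)
```

Both output rows always have the same length, and removing the gap characters recovers each input sequence in full, which is the property the alignment file needs.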
Here is an example of the alignment file for nitrite-reduct with kbv (from the internal Phenix database of test structures; use nitrite-reduct/model/kbv_mr_solution.pdb as an input model to textal.build_mr) (note: for this example, CE was used to make a structural alignment, to remove the uncertainty about the alignment itself inherent to pure sequence-based methods like LALIGN):
ATAAEIAALPRQKVELVDPPFVHAHSQVAEGGPKVVEFTMVIEEKKIVIDDAGTEVHAMAFNGTVPGPLMVVHQDDYLELTLINPETNTLMHNIDFHAATGALGGGGLTEINPGEKTILRFKATKPGVFVYHCAPPGMVPWHVVSGMNGAIMVLPREGLHDGKGKALTYDKIYYVGEQDFYVPRDENGKYKKYEAPGDAYED---------TVKVMRTLTPTHVVFNGAVGALTGDKAMTAAVGEKVLIVHSQAN--RDTRPDLIGGHGDYVWATGKFNTPPDVDQETWFIPGGAAGAAFYTFQQPGIYAYVNHNLIEAFELGAAAHFKVTGEWNDDLMTSVLAPSGTIE
-------ELPVIDAVTTHAPEVPPAI--DRDYPAKVRVKMETVEKTMKMDD-GVEYRYWTFDGDVPGRMIRVREGDTVEVEFSNNPSSTVPHNVDFHAATGQGGGAAATFTAPGRTSTFSFKALQPGLYIYHCAV-APVGMHIANGMYGLILVEPKEGL-------PKVDKEFYIVQGDFYTKG-----------------KKGAQGLQPFDMDKAVAEQPEYVVFNGHVGALTGDNALKAKAGETVRMYVGNGGPNLVSSFHVIGEIFDKVYVEGG--KLINENVQSTIVPAGGSAIVEFKVDIPGNYTLVDHSIFRAFNKGALGQLKVEGAENPEIM-----------

Here is an example of the alignment file for human-otc (with a1s):
-VQLKGRDLLTLKNFTGEEIKYMLWLSADLKFRIKQKGEYLPLLQGKSLGMIFEKRSTRTRLSTETGFALLGGHPCFLTTQDIHLGVNESLTDTARVLSSMADAVLARVYKQSDLDTLAKEASIPIINGLSDLYHPIQILADYLTLQEHYSSLKGLTLSWIGDGNNILHSIMMSAAKFGMHLQAATPKGYEPDASVTKLAEQYAKENGTKLLLTNDPLEAAHGGNVLITDTWISMGREEEKKKRLQAFQGYQVTMKTAKVAASDWTFLHCLPRKP-EEVDDEVFYSPRSLVFPEAENRKWTIMAVMVSLLTD--
VVSLAGRDLLCLQDYTAEEIWTILETAKMFKIW-QKIGKPHRLLEGKTLAMIFQKPSTRTRVSFEVAMAHLGGHALYLNAQDLQLRRGETIADTARVLSRYVDAIMARVYDHKDVEDLAKYATVPVINGLSDFSHPCQALADYMTIWEKKGTIKGVKVVYVGDGNNVAHSLMIAGTKLGADVVVATPEGYEPDEKVIKWAEQNAAESGGSFELLHDPVKAVKDADVIYTDVWASMGQEAEAEERRKIFRPFQVNKDLVKHAKPDYMFMHCLPAHRGEEVTDDVIDSPNSVVWDQAENRLHAQKAVLALVMGGIK

Here is an example of the alignment file for a2u-globulin (with mup); note that there are 4 chains (homotetramer in the ASU), hence 4 pairs of lines:
EEASSTRGNLDVAKLNGDWFSIVVASNKREKIEENGSMRVFMQHIDVLENSLGFKFRIKENGECRELYLVAYKTPEDGEYFVEYDGGNTFTILKTDYDRYVMFHLINFKNGETFQLMVLYGRTKDLSSDIKEKFAKLCEAHGITRDNIIDLTKTDRCL EEASSTGRNFNVEKINGEWHTIILASDKREKIEDNGNFRLFLEQIHVLEKSLVLKFHTVRDEECSELSMVADKTEKAGEYSVTYDGFNTFTIPKTDYDNFLMAHLINEKDGETFQLMGLYGREPDLSSDIKERFAQLCEEHGILRENIIDLSNANRC- EEASSTRGNLDVAKLNGDWFSIVVASNKREKIEENGSMRVFMQHIDVLENSLGFKFRIKENGECRELYLVAYKTPEDGEYFVEYDGGNTFTILKTDYDRYVMFHLINFKNGETFQLMVLYGRTKDLSSDIKEKFAKLCEAHGITRDNIIDLTKTDRCL EEASSTGRNFNVEKINGEWHTIILASDKREKIEDNGNFRLFLEQIHVLEKSLVLKFHTVRDEECSELSMVADKTEKAGEYSVTYDGFNTFTIPKTDYDNFLMAHLINEKDGETFQLMGLYGREPDLSSDIKERFAQLCEEHGILRENIIDLSNANRC- EEASSTRGNLDVAKLNGDWFSIVVASNKREKIEENGSMRVFMQHIDVLENSLGFKFRIKENGECRELYLVAYKTPEDGEYFVEYDGGNTFTILKTDYDRYVMFHLINFKNGETFQLMVLYGRTKDLSSDIKEKFAKLCEAHGITRDNIIDLTKTDRCL EEASSTGRNFNVEKINGEWHTIILASDKREKIEDNGNFRLFLEQIHVLEKSLVLKFHTVRDEECSELSMVADKTEKAGEYSVTYDGFNTFTIPKTDYDNFLMAHLINEKDGETFQLMGLYGREPDLSSDIKERFAQLCEEHGILRENIIDLSNANRC- EEASSTRGNLDVAKLNGDWFSIVVASNKREKIEENGSMRVFMQHIDVLENSLGFKFRIKENGECRELYLVAYKTPEDGEYFVEYDGGNTFTILKTDYDRYVMFHLINFKNGETFQLMVLYGRTKDLSSDIKEKFAKLCEAHGITRDNIIDLTKTDRCL EEASSTGRNFNVEKINGEWHTIILASDKREKIEDNGNFRLFLEQIHVLEKSLVLKFHTVRDEECSELSMVADKTEKAGEYSVTYDGFNTFTIPKTDYDNFLMAHLINEKDGETFQLMGLYGREPDLSSDIKERFAQLCEEHGILRENIIDLSNANRC-