|Python-based Hierarchical ENvironment for Integrated Xtallography|
Rapid helix fitting to a map with find_helices_strands
find_helices_strands is a command-line tool for finding helices and strands in a map and building a model of the parts of a structure that have regular secondary structure. It can be used for protein, RNA, and DNA. Optionally, it can use a rapid chain-tracing algorithm to build the CA atoms of a protein, followed by reconstruction of a full model.
How find_helices_strands finds helices and strands in maps:
find_helices_strands first identifies helical segments as rods of density at 5-8 A. Then it identifies helices at higher resolution, keeping the overall locations of the helices fixed. Next it identifies the directions and CA positions of the helices by noting the helical pattern of high-density points offset slightly along the helix axis from the main helical density (as used in "O" to identify helix direction). Finally, model helices are fit to the density using the positions and orientations identified in the earlier steps. A similar procedure is used to identify strands. The helices and strands are then combined into a single model.
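As a rough illustration of what a "model helix" looks like before it is fit to density, the sketch below generates CA coordinates for an ideal alpha helix along the z axis. The geometry values (1.5 A rise, 100 degrees twist, 2.3 A CA radius) are textbook alpha-helix parameters assumed for this example; the function names are hypothetical and nothing here is taken from the find_helices_strands source.

```python
import math

# Textbook alpha-helix geometry (assumed for illustration only).
RISE_PER_RESIDUE = 1.5     # Angstrom advance along the helix axis per residue
TWIST_PER_RESIDUE = 100.0  # degrees of rotation per residue (3.6 residues/turn)
CA_RADIUS = 2.3            # Angstrom from the helix axis to each CA atom


def ideal_helix_ca(n_residues):
    """Return a list of (x, y, z) CA coordinates for an ideal helix along z."""
    coords = []
    for i in range(n_residues):
        angle = math.radians(i * TWIST_PER_RESIDUE)
        coords.append((CA_RADIUS * math.cos(angle),
                       CA_RADIUS * math.sin(angle),
                       i * RISE_PER_RESIDUE))
    return coords


def ca_ca_distance(a, b):
    """Euclidean distance between two CA positions."""
    return math.dist(a, b)
```

With these parameters, consecutive CA atoms from ideal_helix_ca come out close to 3.8 A apart, which is the usual CA-CA spacing in proteins.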
How find_helices_strands finds RNA and DNA helices in maps:
find_helices_strands finds RNA and DNA helices differently from the way it finds helices in proteins. It uses a convolution search to find places in the asymmetric unit where an A-form RNA or B-form DNA helix can be placed. These are assembled into contiguous helical segments if possible. The resolution of this search is 4.5 A if your data extend beyond 4.5 A, and the resolution of your data otherwise.
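The resolution rule for the nucleic acid search can be written as a one-line function. This is a minimal sketch of the rule as stated above, not the program's own code; the function name is hypothetical, and the default of 4.5 mirrors the res_convolution keyword.

```python
def convolution_resolution(data_resolution, res_convolution=4.5):
    """Return the resolution (in A) used for the convolution search.

    Resolution values get smaller as resolution gets higher, so data that
    extend beyond res_convolution are truncated to res_convolution, and
    lower-resolution data are used at their own limit.
    """
    return max(data_resolution, res_convolution)
```

For example, data to 2.0 A would be searched at 4.5 A, while data to 6.0 A would be searched at 6.0 A.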
How trace_chain finds CA positions in maps:
The RESOLVE trace_chain algorithm places dummy atoms down the middle of all the tubes of density in a map. It then attempts to find sets of these atoms that may be CA atoms, where the atoms are spaced by 3.8 A and where there is strong density between each pair. This yields segments represented by CA atoms. Next, PULCHRA (Rotkiewicz P., Skolnick J., Fast procedure for reconstruction of full-atom protein models from reduced representations. J Comput Chem. 2008 Jul 15;29(9):1460-5) is used to reconstruct a full main-chain model. Finally, RESOLVE is used to assemble all the resulting fragments into a model.
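The distance criterion in the CA search can be sketched as follows. This is a simplified illustration, not the RESOLVE implementation: it only tests the 3.8 A spacing between pairs of dummy atoms and leaves out the density check between each pair. The function name and the default tolerance of 0.5 A are assumptions made for the example.

```python
import math
from itertools import combinations

CA_CA_DISTANCE = 3.8  # Angstrom, spacing between consecutive CA atoms


def candidate_ca_links(points, tolerance=0.5):
    """Return index pairs (i, j) of points spaced like consecutive CA atoms.

    points is a sequence of (x, y, z) dummy-atom coordinates. A pair is kept
    when its distance is within `tolerance` of 3.8 A. The real algorithm also
    requires strong density between the two atoms; that test is omitted here.
    """
    links = []
    for (i, a), (j, b) in combinations(enumerate(points), 2):
        if abs(math.dist(a, b) - CA_CA_DISTANCE) <= tolerance:
            links.append((i, j))
    return links
```

Chaining such links end to end would give candidate CA segments; a larger tolerance admits more pairs and so corresponds to a more thorough but noisier search.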
Output files from find_helices_strands
Standard run of find_helices_strands:
phenix.find_helices_strands map_coeffs.mtz quick=True
If you want a more thorough run, then omit the "quick=True" flag. If you want (or need) to specify the column names from your mtz file, you will need to tell find_helices_strands what FP and PHIB are, in this format:
phenix.find_helices_strands map_coeffs.mtz \
  labin="LABIN FP=2FOFCWT PHIB=PH2FOFCWT"
If you want to specify a sequence file, then in the last step find_helices_strands will try to align your sequence with the map and model:
phenix.find_helices_strands map_coeffs.mtz seq_file=seq.dat
If you want to use the trace_chain algorithm, then specify:
phenix.find_helices_strands map_coeffs.mtz seq_file=seq.dat trace_chain=True
Here is an example using data from the PHENIX examples library:
phenix.find_helices_strands $PHENIX/phenix_examples/p9-build/p9-resolve.mtz \
  labin="FP=FP PHIB=PHIM FOM=FOMM" trace_chain=True
That should build a model using sample data in a few seconds. Now try:
phenix.find_helices_strands $PHENIX/phenix_examples/p9-build/p9-resolve.mtz \
  trace_chain=True --gui
which should set up the GUI and do the same thing.
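If you drive these runs from a script, the command lines above can be assembled programmatically. The helper below is a hypothetical convenience sketch, not part of PHENIX: it only builds the argument list, using the keywords shown in this section, and writes the labin value in the "LABIN FP=... PHIB=..." form. It does not run the program.

```python
import shlex


def build_command(mtz_file, labin=None, seq_file=None,
                  trace_chain=False, quick=False):
    """Return the argument list for a phenix.find_helices_strands run.

    labin, if given, is a dict such as {"FP": "2FOFCWT", "PHIB": "PH2FOFCWT"}
    mapping program labels to MTZ column names.
    """
    args = ["phenix.find_helices_strands", mtz_file]
    if labin:
        columns = " ".join(f"{key}={value}" for key, value in labin.items())
        args.append(f"labin=LABIN {columns}")
    if seq_file:
        args.append(f"seq_file={seq_file}")
    if trace_chain:
        args.append("trace_chain=True")
    if quick:
        args.append("quick=True")
    return args


def as_shell_string(args):
    """Quote the argument list so it can be pasted into a shell."""
    return shlex.join(args)
```

The returned list can be passed to subprocess.run directly, which avoids shell quoting problems with the space-containing labin value.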
Using find_helices_strands to bootstrap phenix.autobuild:
If you run phenix.autobuild at low resolution (3.5 A or lower), your model may have strands built where there should be helices. You can use find_helices_strands to help bootstrap autobuild model-building by providing the helical model from find_helices_strands to phenix.autobuild. Just run phenix.find_helices_strands with your best map coefficients file (map_coeffs.mtz in this example). Then take the helical model map_coeffs.mtz_helices.pdb and pass it to phenix.autobuild with the keyword (in addition to your usual keywords for autobuild):
consider_main_chain_list=map_coeffs.mtz_helices.pdb
Then the AutoBuild wizard will treat your helical model just like one of the models that it builds, and merge it into the model as it is being assembled.
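The naming pattern in this example (input file name plus "_helices.pdb") can be captured in a small helper when scripting the two steps. This is a sketch under assumptions: the function names are hypothetical, and it assumes the helical model is written to the current working directory under the input file's base name, which you should confirm against your own run.

```python
import os


def helices_model_name(mtz_file):
    """Return the helical model file name expected for a given MTZ file.

    Follows the pattern in the example above:
    map_coeffs.mtz -> map_coeffs.mtz_helices.pdb
    Assumes the file is written to the current working directory.
    """
    return os.path.basename(mtz_file) + "_helices.pdb"


def autobuild_keyword(mtz_file):
    """Return the extra keyword to add to a phenix.autobuild command line."""
    return "consider_main_chain_list=" + helices_model_name(mtz_file)
```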
Specific limitations and problems:
List of all find_helices_strands keywords
-------------------------------------------------------------------------------
Legend:
  black bold - scope names
  black - parameter names
  red - parameter values
  blue - parameter help
  blue bold - scope help
Parameter values:
  * means selected parameter (where multiple choices are available)
  False is No
  True is Yes
  None means not provided, not predefined, or left up to the program
  "%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
find_helices_strands
  mtz_in= None  MTZ file with coefficients for a map
  pdb_in= None  Optional PDB file to be used for seeding trace_chain with CA
  use_pdb_in_directly= False  Use PDB input model directly
  output_model= None  Output PDB file
  output_log= None  Output log file name. If you want to specify a directory to put this file in then please use "output_dir=myoutput_dir"
  output_dir= None  Output directory
  seq_file= None  Sequence file for sequence alignment
  compare_file= None  PDB file for comparison only
  labin= ""  Labin line for MTZ file with map coefficients. This is optional if find_helices_strands can guess the correct coefficients for FP PHI and FOM. Otherwise specify: LABIN FP=myFP PHIB=myPHI FOM=myFOM where myFP is your column label for FP
  resolution= None  high-resolution limit for map calculation
  res_convolution= 4.5  high-resolution limit for convolution calculation. (Applies to nucleic acids only)
  chain_type= *PROTEIN DNA RNA  Chain type (for identifying main-chain and side-chain atoms)
  temp_dir= "temp_dir"  Optional temporary work directory
  helices_only= False  Find only helices (Applies only if trace_chain=False)
  strands_only= False  Find only strands (Applies only if trace_chain=False)
  trace_chain= True  Use resolve trace_chain algorithm
  helices_before_trace= None  Find helices before running resolve trace_chain algorithm
  strands_before_trace= None  Find strands before running resolve trace_chain algorithm
  pulchra= False  Use PULCHRA to create all-atom model from CA model
  rho_cut_min= None  Minimum rho/sigma at potential CA positions
  dist_ca_tol_max= None  Maximum tolerance for CA-CA distances. Normally 0.8 A for medium and 1.3 A for quick
  dist_ca_start= None  Guess of tolerance for CA-CA distances. Set automatically by default.
  dist_ca_tol= None  Tolerance for CA-CA distances. Normally set automatically. Typical values are 0.3-1.0 A. A high number can be used to force a more thorough search. Compare with target_p_ratio which adjusts CA-CA tolerance to achieve a targeted ratio of nonamers to atoms.
  cutoff_trace= 0.000  The top cutoff_trace fraction of peaks in trace_chain will be assumed to be non-protein, and all peaks near them will be ignored. (Default = 0.00)
  ncut_trace_min= 0  The top ncut_trace_min peaks in trace_chain will be assumed to be non-protein, and all peaks near them will be ignored. (Default = 0)
  target_p_ratio= None  Target ratio of nonamers found to atoms in a.u. Default=3 for quick, 4 otherwise. This can be used as an alternate method to adjust the thoroughness of trace_chain searches. Differs from setting dist_ca_tol by adjusting CA-CA tolerance to achieve the desired target ratio, while dist_ca_tol=tol sets the tolerance directly to tol
  trace_ratio_long= 0.5  When setting tolerances for CA-CA distances in trace_chain the upper bound will increase trace_ratio_long as fast as the lower bound. (Default=0.5)
  ratio_trace_extra= None  Minimum ratio of dist between extra atoms added in trace_chain to rad_sep_trace. Default is 1.25 for standard and 1.5 for quick
  rad_sep_trace= None  Dummy atom separation in trace_chain. Default is 0.6 A for standard and 0.75 for quick. Increased if resolution is greater than 3 A. Value of rad_mask_trace in resolve will be rad_sep_trace*2
  fill_gaps= True  Try to fill in gaps in trace_chain
  use_any_side= False  Use any side chain that fits density in assembly
  cc_helix_min= None  Minimum CC of low-res helical density to map to keep.
  group_ca_length= None  Minimum length of a segment of helix or strand to keep. (only applies if trace_chain=False)
  cc_strand_min= None  Minimum CC of strand density to map to keep. (only applies if trace_chain=False)
  quick= False  Try to find structure quickly
  recycle= False  Recycle CA positions in trace_chain
  optimize= True  Try to optimize CA-CA tolerance in trace_chain to obtain nonamer ratio equal to target_p_ratio
  assemble= False  Assemble model with resolve after trace_chain
  resolve_size= 12  Size of resolve to use. You may need a bigger size than in other resolve applications
  coarse_grid= False  Coarse_grid allows the use of a smaller resolve size
  verbose= False  Verbose output
  raise_sorry= False  Raise sorry if problems
  debug= False  Debugging output
  dry_run= False  Just read in and check parameter names
  mtz_label_prefix= None  Prefix for column names, used by Coot. The GUI will detect and set this automatically. It should not be changed directly by users.
  delete_tmp_dir= False  GUI setting, does not apply to command-line version
  job_title= None  Job title in PHENIX GUI, not used on command line