|Python-based Hierarchical ENvironment for Integrated Xtallography|
Rapid phase improvement and model-building using phase_and_build
phase_and_build is a new and rapid method for improving the quality of your map and building a model. The approach is to carry out an iterative process of building a model as rapidly as possible and using this model in density modification to improve the map. This approach is related to the older phenix.autobuild approach. The difference is that in phenix.autobuild much effort was spent on building the best possible model at each stage before carrying out density modification, while in phenix.phase_and_build speed of model-building is optimized. The result is that phenix.phase_and_build is 10 times faster than phenix.autobuild, yet it produces nearly as good a model in the end. The phenix.phase_and_build approach will also find NCS from your starting map and apply it during density modification.
How phase_and_build works:
Output files from phase_and_build
Parameters files in phase_and_build
When you run phenix.phase_and_build it will write out a phase_and_build_params.eff parameter file that can be used to re-run phenix.phase_and_build (just as for essentially all PHENIX methods). In addition, phenix.phase_and_build will write out the parameters files for the intermediate methods used as part of phenix.phase_and_build to the temporary directory used in building. You can run these with:
    phenix.find_ncs temp_dir/find_ncs_params.eff  # runs NCS identification
    phenix.autobuild temp_dir/AutoBuild_run_1_/autobuild.eff  # runs first cycle of density modification
    phenix.build_one_model temp_dir/build_one_model_params.eff  # runs most recent model-building
    phenix.assign_sequence temp_dir/assign_sequence_params.eff  # runs sequence assignment and filling short gaps
    phenix.fit_loops temp_dir/fit_loops_params.eff  # runs loop fitting

This gives you control of all the steps in map improvement and model-building, in addition to letting you run them all together with phenix.phase_and_build.
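As an illustration of running from a parameters file, the sketch below writes a small hand-edited file and passes it to phenix.phase_and_build. It assumes the usual PHENIX nested-scope syntax (a scope name followed by braces); the scope and parameter names are taken from the keyword list in this document, while the file name my_phase_and_build.eff is hypothetical. The phase_and_build_params.eff file written by a real run remains the reliable template to copy from.

```shell
#!/bin/sh
# Sketch: write a minimal parameters file and hand it to phenix.phase_and_build.
# Assumption: nested-scope syntax mirroring the keyword list in this document.
params=my_phase_and_build.eff   # hypothetical file name

cat > "$params" <<'EOF'
phase_and_build {
  input_files {
    data = exptl_fobs_phases_freeR_flags.mtz
    seq_file = sequence.dat
  }
  cycles {
    ncycle = 2
  }
  control {
    dry_run = True
  }
}
EOF

# Run only if PHENIX is set up in this shell; otherwise just report.
if command -v phenix.phase_and_build >/dev/null 2>&1; then
  phenix.phase_and_build "$params" || echo "run failed; check $params"
else
  echo "phenix.phase_and_build not found; wrote $params only"
fi
```

With dry_run = True the program only reads and checks the parameter names, so this is a low-cost way to validate an edited file before a full run.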
Standard run of phase_and_build:
    phenix.phase_and_build exptl_fobs_phases_freeR_flags.mtz sequence.dat

If you want to supply a file with anisotropy-corrected data to use in density modification, you can do so:
    phenix.phase_and_build data=exptl_fobs_phases_freeR_flags.mtz \
      seq_file=sequence.dat \
      aniso_corrected_data=solve_1.mtz

where solve_1.mtz is anisotropy-corrected (the amplitudes are not measured amplitudes, but rather are corrected with an anisotropic B-factor), and exptl_fobs_phases_freeR_flags.mtz contains experimental amplitudes. These two files normally contain the same phase information. (Usually these files will come from phenix.autosol.)
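Because aniso_corrected_data is optional (the keyword list states that the file given as data is used when none is supplied), a wrapper script can add it only when the file is present. The following is a minimal sketch in POSIX shell; the helper name phase_and_build_cmd is hypothetical and not part of PHENIX, and it only prints the command line rather than running it.

```shell
#!/bin/sh
# Print a phenix.phase_and_build command line, adding aniso_corrected_data
# only when that file exists. (Hypothetical helper, not part of PHENIX.)
phase_and_build_cmd() {
  data=$1          # MTZ file with experimental amplitudes and phases
  seq=$2           # sequence file
  aniso=${3:-}     # optional anisotropy-corrected MTZ file
  cmd="phenix.phase_and_build data=$data seq_file=$seq"
  if [ -n "$aniso" ] && [ -f "$aniso" ]; then
    cmd="$cmd aniso_corrected_data=$aniso"
  fi
  printf '%s\n' "$cmd"
}

phase_and_build_cmd exptl_fobs_phases_freeR_flags.mtz sequence.dat solve_1.mtz
```

The printed line can be inspected and then pasted into the shell once the input files are in place.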
Specific limitations and problems:
phenix.phase_and_build does not have the full flexibility of phenix.autobuild, so you may want to get a nearly-complete model with phenix.phase_and_build and then use phenix.autobuild to increase the completeness and quality.
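The two-stage approach described above might look like the following sketch. It assumes that phenix.autobuild accepts data=, seq_file= and model= keywords (check the phenix.autobuild documentation for your version); the output name quick_model.pdb is hypothetical and is set through the pdb_out parameter from the keyword list. The run_step helper is also hypothetical: it prints each command and runs it only if the program is found.

```shell
#!/bin/sh
# Sketch of a two-stage run: quick build first, then phenix.autobuild.
# Assumption: phenix.autobuild accepts data=, seq_file= and model= keywords.

run_step() {
  # Print the command, then run it only if the program is on PATH.
  echo "+ $*"
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  else
    echo "  (skipped: $1 not found on PATH)"
  fi
}

data=exptl_fobs_phases_freeR_flags.mtz
seq=sequence.dat
model=quick_model.pdb   # hypothetical output name, set via pdb_out

# Stage 1: rapid map improvement and model-building.
run_step phenix.phase_and_build data="$data" seq_file="$seq" pdb_out="$model"

# Stage 2: start phenix.autobuild from the nearly-complete model.
run_step phenix.autobuild data="$data" seq_file="$seq" model="$model"
```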
List of all phase_and_build keywords
-------------------------------------------------------------------------------
Legend: black bold - scope names
        black - parameter names
        red - parameter values
        blue - parameter help
        blue bold - scope help

Parameter values:
  * means selected parameter (where multiple choices are available)
  False is No
  True is Yes
  None means not provided, not predefined, or left up to the program
  "%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
phase_and_build
  input_files
    data= None
      MTZ file containing FP SIGFP PHIB FOM HLA HLB HLC HLD FreeR_flags. Used as source of FP SIGFP freeR information in refinement and as source of experimental phase information for density modification. A suitable file is exptl_fobs_phases_freeR_flags.mtz from autosol or autobuild. NOTE: This is a temporary requirement. You can also supply any other format of file if the data columns can be identified automatically.
    labin= None
      Labin line for MTZ file with FreeR_flags. This is optional if phase_and_build can guess the labels. Otherwise specify a line like: FP=FP SIGFP=SIGFP PHIB=PHIB FOM=FOM HLA=HLA HLB=HLB HLC=HLC HLD=HLD FreeR_flags=myFreeR_flags
    aniso_corrected_data= None
      Optional MTZ file containing anisotropy-corrected data with FP SIGFP PHIB FOM HLA HLB HLC HLD. Used as source of FP SIGFP information for density modification. A suitable file is solve_1.mtz or phaser_1.mtz. If none is supplied, the MTZ file specified as data will be used.
    labin_aniso_corrected_data= None
      Labin line for the aniso_corrected_data MTZ file. This is optional if phase_and_build can guess the labels.
    map_file_fom= None
      You can specify the FOM of the map_coeffs file (useful in cases where the map file has only FWT PHFWT and no FOM column). This FOM is used to set the default smoothing radius for the density modification solvent boundary.
    map_file_is_density_modified= False
      You can specify that the input_map_file has been density modified.
      (This changes the assumptions on statistics of the map.)
    ha_file= None
      Heavy atom sites to be used to find NCS and to remove high peaks of density in initial density modification
    seq_file= None
      File with 1-letter code sequence of molecule. Chains separated by blank line or greater-than sign
    pdb_in= None
      Optional starting PDB file (ends will be extended if present)
    map_coeffs= None
      MTZ file with coefficients for a map
    labin_map_coeffs= None
      Labin line for MTZ file with map coefficients. This is optional if build_one_model can guess the correct coefficients for FP PHI and FOM. Otherwise specify: LABIN FP=myFP PHIB=myPHI FOM=myFOM where myFP is your column label for FP
    ncs_info_file= None
      ncs_spec file with NCS information (written by simple_ncs_from_pdb or find_ncs)
    remove_free
      NOTE: remove_free params are only used in build_one_model, not phase_and_build
      free_in= None
        MTZ file containing FreeR_flags. NOTE: free_in is only used in build_one_model and is ignored by phase_and_build. Used as source of freeR information for real_space refinement. Note other columns of data may be present and can be used in reciprocal-space refinement. A suitable file is exptl_fobs_phases_freeR_flags.mtz from autosol/autobuild or my_model_refine_data.mtz from phenix.refine
      labin_free= None
        Labin line for MTZ file with FreeR_flags. This is optional if build_one_model can guess the correct coefficients for FreeR_flags. Otherwise specify: FreeR_flags=myFreeR_flags
      map_coeffs_no_free= None
        Optional MTZ file with coefficients for a map with freeR set removed. Use instead of free_in. This map will be used for real-space refinement
      labin_no_free= None
        Labin line for MTZ file with map coefficients and freeR set removed. This is optional if build_one_model can guess the correct coefficients for FP PHI and FOM.
        Otherwise specify: LABIN FP=myFP PHIB=myPHI FOM=myFOM where myFP is your column label for FP
  output_files
    mtz_out= 'phase_and_build_map_coeffs.mtz'
      Output MTZ file with map coeffs
    pdb_out= build_one_model.pdb
      Output PDB file
    log= build_one_model.log
      Output log file
    params_out= phase_and_build_params.eff
      Parameters file to rerun phase_and_build
    job_title= None
      Job title in PHENIX GUI, not used on command line
  cycles
    ncycle= 2
      Number of initial cycles of model-building, refinement and density modification
    nmodels= 1
      Number of models to build with map from initial cycles
  ncs
    find_ncs= True
      Find NCS from input_ha_file or density or chains in the model
    update_ncs= True
      Update NCS as new information becomes available
    use_ha_in_ncs= True
      Use ha_file as source of NCS information
    optimize_ncs= True
      Try to map NCS operators close together
    minimum_ncs_cc= None
      Minimum CC for NCS (default unless extreme denmod)
  density_modification
    truncate_ha_sites_in_resolve= True
      You can choose to truncate the density near heavy-atom sites at a maximum of 2.5 sigma. This is useful in cases where the heavy-atom sites are very strong, and rarely hurts in cases where they are not. The heavy-atom sites are specified with "ha_file"
    use_hl_anom_in_denmod= False
      You can choose to use HL coefficients not including model information (HLanom) in density modification. They must be present in your data file
    use_hl_anom_in_denmod_with_model= False
      You can choose to use HL coefficients not including model information (HLanom) in density modification when model information is used.
      They must be present in your data file
    fom_for_extreme_dm= 0.35
      If the FOM of phasing is up to fom_for_extreme_dm then defaults for density modification become: mask_type=wang wang_radius=20 mask_cycles=1 minor_cycles=4
  refinement
    refine= True
      Refine with standard reciprocal-space refinement
    refine_pdb_in= False
      Refine input model (if any) before using it
    use_hl_anom_in_refinement= False
      You can choose to use HL coefficients not including model information (HLanom) in refinement. They must be present in your data file
    include_ha_in_refinement= True
      You can choose to include your heavy-atom sites in the model for refinement. This is a good idea if your structure includes these heavy-atom sites (i.e., for SAD or MAD structures where you are not using a native dataset). Heavy-atom sites that overlap an atom in your model will be ignored.
    refine_se_occ= True
      You can choose to refine the occupancy of SE atoms in a SEMET structure (default=True). This only applies if semet=true
    ordered_solvent= True
      You can add waters during refinement
    flood_with_waters= False
      You can use the parameters file in $PHENIX/phenix/phenix/autosol/flood.par to add lots of waters during the phase improvement stage
    macro_cycles= None
      You can set the number of macro_cycles in refinement. Default (None) will use the phenix.refine default
    add_free_r_if_needed= True
      If your input data file has no FreeR_flag then it will be added
    allow_overlapping= True
      You can allow atoms in your ligand files to overlap atoms in your protein/nucleic acid model. This overrides 'keep_pdb_atoms'. Useful in early stages of model-building and refinement. The ligand atoms get the altloc indicator 'L'. NOTE: the ligand occupancy gets refined by default. You can turn this off with fix_ligand_occupancy=True
    fix_ligand_occupancy= False
      If allow_overlapping=True then ligand occupancies are refined as a group.
      You can turn this off with fix_ligand_occupancy=true. NOTE: has no effect if allow_overlapping=False
    skip_hexdigest= False
      You may wish to ignore the hexdigest of the free R flags in your input PDB file if the dataset you provide is not identical to the one that you refined with (but has the same free R flags).
    ncs_in_refinement= *torsion cartesian None
      Use torsion_angle refinement of NCS. Alternative is cartesian or None (None will use phenix.refine default)
    correct_special_position_tolerance= None
      Adjust tolerance for special position check. If 0., then check for clashes near special positions is not carried out. This sometimes allows phenix.refine to continue even if an atom is near a special position. If 1., then checks within 1 A of special positions. If None, then uses phenix.refine default. (1)
    rs_refine= True
      You can run real-space refinement after model-building. NOTE: real_space refinement requires a source of FreeR_flag, and standard refinement requires Fobs SigFobs and a source of FreeR_flag. For real-space refinement you can supply either an mtz file with a FreeR_flag column or an mtz map file that has all the FreeR reflections removed
  model_building
    fit_loops= True
      Include loop fitting in full model-building. At lower resolution (3.5 A) it may be best to skip this step
    trace_loops= False
      Use trace_loops algorithm in loop fitting
    standard_loops= True
      Use standard_loops algorithm in loop fitting
    loop_lib= False
      Use loop_lib algorithm in loop fitting
    assign_sequence= True
      Include sequence assignment and short loop joining in full model-building. At lower resolution (3.5 A) it may be best to skip this step. Only applicable for chain_type=PROTEIN
    min_percent_assigned_for_assign_sequence= 50
      Skip assign_sequence if initial percentage sequence assigned is lower than min_percent_assigned_for_assign_sequence
    quick= False
      You can run quickly (superquick_build/delta_phi=30.) or more thoroughly (default, thorough_build/delta_phi=20.)
    insert_helices= False
      You can find helices and use them as a starting point for model-building. This is useful if your resolution is worse than 3 A.
    i_ran_seed= 712341
      Random seed for model-building
  directories
    temp_dir= "temp_dir"
      Optional temporary work directory
    output_dir= ""
      Output directory where files are to be written
    top_output_dir= ""
      Top output directory for control files
    base_gui_dir= None
      Base output path for Phenix GUI only.
  crystal_info
    ncs_copies= none
      Number of NCS copies (defines solvent_fraction with sequence). Normally determined automatically
    resolution= 0.
      high-resolution limit for map calculation
    solvent_fraction= None
      You can specify the solvent fraction. Normally it is set automatically
    chain_type= *PROTEIN DNA RNA
      Chain type (for identifying main-chain and side-chain atoms)
    semet= False
      You can specify that your protein contains selenomethionine
  control
    verbose= False
      Verbose output
    raise_sorry= False
      Raise sorry if problems
    debug= False
      Debugging output
    dry_run= False
      Just read in and check parameter names
    write_run_directory_to_file= None
      The working directory name is written to this file
    resolve_command_list= None
      You can supply any resolve command here. NOTE: for command-line usage you need to enclose the whole set of commands in double quotes (") and each individual command in single quotes (') like this: resolve_command_list="'no_build' 'b_overall 23' "
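The quoting rule for resolve_command_list can be checked without running PHENIX at all. The sketch below uses a hypothetical helper, show_args, to display how a POSIX shell splits the arguments before any program sees them; the resolve commands are the ones from the note above.

```shell
#!/bin/sh
# Show how the shell splits arguments before PHENIX ever sees them.
# (show_args is a hypothetical helper for illustration only.)
show_args() {
  echo "$# argument(s):"
  for a in "$@"; do
    printf '  [%s]\n' "$a"
  done
}

# Quoted as recommended: one argument carrying both resolve commands.
show_args resolve_command_list="'no_build' 'b_overall 23' "

# Outer double quotes omitted: the shell strips the single quotes and
# splits the value into two separate arguments.
show_args resolve_command_list='no_build' 'b_overall 23'
```

In the first call the inner single quotes survive inside one argument, which is the form the note asks for; in the second they are consumed by the shell and the second resolve command becomes a stray argument.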