Python-based Hierarchical ENvironment for Integrated Xtallography

Rebuilding an RNA structure with ERRASER

Authors
Purpose
How erraser works
Installation notes
Input files preparation
Examples
Output files
Possible Problems
Specific limitations:
List of all erraser keywords

Authors

erraser: Fang-Chieh Chou and Rhiju Das, Stanford University

Purpose

ERRASER (Enumerative Real-space Refinement ASsisted by Electron-density under Rosetta) is an application for improving RNA crystal structures, based on Rosetta and Phenix.

How erraser works

By supplementing the Rosetta RNA scoring function with an electron-density restraint, ERRASER can confidently reduce errors in RNA crystallographic models while retaining a good fit to the diffraction data. Two ERRASER applications are currently available. The standard ERRASER application automatically remodels all potentially problematic nucleotides in an RNA model and outputs one final model. The ERRASER single residue rebuilding application applies the ERRASER algorithm to a residue specified by the user and returns up to 10 top-scoring models plus the minimized starting model. The first application is useful for automatically improving the entire model; the second is useful when the user wants to explore all possible alternative conformations of a problematic nucleotide and examine each one manually.

Installation notes

To run ERRASER you need Rosetta version 3.5 or later installed on your machine. Rosetta can be downloaded at https://www.rosettacommons.org/software/. For Rosetta installation notes, and for setting the environment variable PHENIX_ROSETTA_PATH to tell Phenix where to find Rosetta, refer to the phenix.mr_rosetta documentation under "Installing Rosetta for use with mr_rosetta" (the same Rosetta installation will work for mr_rosetta and ERRASER).
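
Before launching a run, it can help to confirm that the variable is visible to the process. The following is a minimal Python sketch, not part of Phenix: the helper name find_rosetta_path is hypothetical, and it only checks that PHENIX_ROSETTA_PATH names an existing directory, not that the directory holds a working Rosetta build.

```python
import os


def find_rosetta_path(environ=None):
    """Return the directory named by PHENIX_ROSETTA_PATH, or None.

    Hypothetical helper for illustration; `environ` defaults to the
    environment of the current process.
    """
    env = os.environ if environ is None else environ
    path = env.get("PHENIX_ROSETTA_PATH", "")
    # An unset or empty variable, or a path that is not a directory,
    # is reported as "not found".
    if path and os.path.isdir(path):
        return path
    return None
```

If the function returns None, set the variable as described in the mr_rosetta documentation, or pass rosetta_path explicitly.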

Input files preparation

input_model.pdb: A PDB file with your starting RNA model. Please ensure it is in the standard PDB format; otherwise ERRASER might not run properly.

mapfile.ccp4: A 2mFo-DFc density map for your model, in CCP4 format, that covers the entire unit cell. Rfree reflections should be excluded when the map is created, to avoid overfitting. We also suggest filling in missing data with calculated data to avoid Fourier truncation errors. If the PHENIX map calculation utility (GUI) is used to create the map, the following options are suggested: "Kicked", "Fill missing f obs", and "Exclude free r reflections" set to True. Also, under "Map region", select "Unit Cell".
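
A rough check of the map region is possible from the CCP4 header alone. The sketch below is an illustration, not a Phenix utility: the function name map_covers_unit_cell is hypothetical, it assumes a little-endian file, and it only compares the stored grid extent with the unit-cell sampling along each axis.

```python
import struct


def map_covers_unit_cell(header: bytes) -> bool:
    """Heuristic: does the stored grid span a full unit cell on every axis?

    `header` is the start of a CCP4 map file (at least 212 bytes).
    Assumes little-endian byte order (an assumption of this sketch).
    """
    if len(header) < 212 or header[208:212] != b"MAP ":
        raise ValueError("not a CCP4 map header")
    # Words 1-3: number of columns, rows, sections stored in the file.
    n_cols, n_rows, n_secs = struct.unpack("<3i", header[0:12])
    # Words 8-10: sampling intervals along the cell axes X, Y, Z.
    mx, my, mz = struct.unpack("<3i", header[28:40])
    # Words 17-19: which cell axis (1=X, 2=Y, 3=Z) is column/row/section.
    axis_order = struct.unpack("<3i", header[64:76])
    if sorted(axis_order) != [1, 2, 3]:
        raise ValueError("invalid axis order in header")
    extent = dict(zip(axis_order, (n_cols, n_rows, n_secs)))
    return extent[1] >= mx and extent[2] >= my and extent[3] >= mz
```

To use it, pass the first 1024 bytes of the map file, read in binary mode. A False result suggests the map was cut out around the model rather than written for the whole unit cell.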

Examples

Running ERRASER is easy and can be done either through the Phenix GUI or from the command line. In the GUI, the program is listed under the Refinement category. Most configuration options are shown in the main window:

images/erraser_input.png

Toolbar buttons to launch phenix.maps and the FFT utility are also displayed.

From the command line you can type:

phenix.erraser model.pdb mapfile.ccp4

This command will have erraser rebuild your entire model.

Here are some commonly used options:

  • "map_reso=2.2" gives the map resolution (2.2 angstrom here). Including it is highly recommended, as it usually gives better performance.
  • "n_iterate=2" makes ERRASER iterate twice before writing the output model. This takes much longer but might give further improvement.
  • "fixed_res=A20 fixed_res=B15" gives residues to be kept fixed during remodeling. The format is chain ID followed by residue number.

ERRASER currently does not model ligands (any record starting with "HETATM" in the PDB file, including modified bases), proteins, or crystal contacts in a structure. We therefore suggest fixing the positions of nucleotides that are in close contact with protein or ligand components.
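
When ERRASER runs are scripted, the options above can be assembled into an argument list. This is a minimal Python sketch under stated assumptions: the function erraser_command is a hypothetical helper, not a Phenix API, and fixed_res is repeated once per residue, following the example given in the keyword list.

```python
def erraser_command(model, map_file, map_reso=None, n_iterate=None,
                    fixed_res=()):
    """Build the argument list for a standard phenix.erraser run.

    Hypothetical helper: it only formats keyword=value arguments and
    does not launch or validate anything.
    """
    args = ["phenix.erraser", model, map_file]
    if map_reso is not None:
        args.append(f"map_reso={map_reso}")
    if n_iterate is not None:
        args.append(f"n_iterate={n_iterate}")
    # One fixed_res=... argument per residue or residue range.
    args.extend(f"fixed_res={res}" for res in fixed_res)
    return args
```

The resulting list can be passed to subprocess.run(args, check=True) on a machine where Phenix and Rosetta are installed.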

You can also run ERRASER in single residue rebuilding mode:

phenix.erraser model.pdb mapfile.ccp4 single_res_mode=True rebuild_res_pdb=A30

This will rebuild just residue 30 in chain A and output up to 10 models for manual inspection. (In the GUI, this is equivalent to checking the box labeled "Single residue rebuilding mode" and entering the residue ID in the field labeled "Residue to be rebuilt".)
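
For scripted use, the single residue command can be built with a basic check of the residue ID. The Python sketch below is illustrative only: single_residue_command is a hypothetical helper, and the pattern assumes a one-letter chain ID followed by a positive residue number, which is narrower than what ERRASER itself may accept.

```python
import re


def single_residue_command(model, map_file, residue):
    """Build the argument list for single residue rebuilding mode.

    `residue` is a chain ID followed by a residue number, e.g. "A30".
    """
    # Assumed format for this sketch: one letter, then digits.
    if not re.fullmatch(r"[A-Za-z]\d+", residue):
        raise ValueError(f"unexpected residue ID: {residue!r}")
    return [
        "phenix.erraser",
        model,
        map_file,
        "single_res_mode=True",
        f"rebuild_res_pdb={residue}",
    ]
```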

Output files

For standard ERRASER:

model_erraser.pdb: A PDB file with your rebuilt RNA model.

For single residue rebuilding mode:

model_0.pdb, model_1.pdb...: Up to 10 different models with the specified residue rebuilt.
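
To load the single residue models in order, for example into a viewer, the numbered files can be collected as follows. This is a small Python sketch: single_res_models is a hypothetical helper, and it assumes the files sit in the directory you pass in and follow the model_N.pdb naming shown above.

```python
import re
from pathlib import Path

# Matches model_0.pdb, model_1.pdb, ... but not model_erraser.pdb.
MODEL_NAME = re.compile(r"model_(\d+)\.pdb")


def single_res_models(directory):
    """Return the model_N.pdb paths in `directory`, sorted by N."""
    numbered = []
    for path in Path(directory).iterdir():
        match = MODEL_NAME.fullmatch(path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    # Sort numerically so that model_10.pdb would follow model_9.pdb.
    return [path for _, path in sorted(numbered)]
```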

In both modes, after the models are generated, the application will analyze the models and output a detailed comparison of the changes introduced by ERRASER. The GUI displays a list of output files and a summary of validation statistics:

images/erraser_result.png

For the standard rebuilding mode, a subset of the MolProbity analyses will be displayed in additional tabs. Lists of outliers are interactive with Coot (if installed); see the validation documentation for more information. In single-residue mode, a summary of the validation criteria for the specific residue being rebuilt is shown instead:

images/erraser_single_res.png

Possible Problems

Specific limitations:

ERRASER currently works only for RNA. Other parts of the crystallographic model, including proteins, modified bases, and ligands, are not modeled. Remodeling of RNA residues that are in close contact with these components may be problematic.

Crystal contacts are currently not modeled, which is known to cause problems in a few test cases where the RNA interacts strongly with its crystal-packing partner (e.g. through base-pairing or base-stacking). For now, this problem can be resolved by manually adding the crystal-packing partner to the starting PDB file, or by holding these residues fixed with "fixed_res" during the run.

Literature

  • Correcting pervasive errors in RNA crystallography through enumerative structure prediction. F.C. Chou, P. Sripakdeevong, S.M. Dibrov, T. Hermann, and R. Das. Nature Methods 10, 74-76 (2012).

List of all erraser keywords

------------------------------------------------------------------------------- 
Legend: black bold - scope names
        black - parameter names
        red - parameter values
        blue - parameter help
        blue bold - scope help
        Parameter values:
          * means selected parameter (where multiple choices are available)
          False is No
          True is Yes
          None means not provided, not predefined, or left up to the program
          "%3d" is a Python style formatting descriptor
------------------------------------------------------------------------------- 
erraser
   input_files
      pdb_in= None PDB file with starting model
      map_file= None Map file (CCP4 format) 2mFo-DFc map file in CCP4 format.
                Rfree should be excluded.
      map_reso= 2.5 The resolution of the input density map. It is highly
                recommended to input the map resolution whenever possible for
                better results.
      map_coeffs= None
      map_labels= None
      r_free_flags
         label= None
         test_flag_value= None
         disable_suitability_test= False
   output_files
      pdb_out= None Output pdb file name (optional)
      log= erraser.log Output log file
      params_out= erraser_params.eff Parameters file to rerun erraser
   directories
      temp_dir= "" Optional temporary work directory
      output_dir= "" Output directory where params files are to be written
      gui_output_dir= None
      rosetta_path= "" Location of rosetta directories. If you have set
                    PHENIX_ROSETTA_PATH then this can be blank. All rosetta
                    files are located relative to this path. You can set the
                    environment variable 'PHENIX_ROSETTA_PATH' to indicate
                    where rosetta is to be found. In csh/tcsh use something
                    like: setenv PHENIX_ROSETTA_PATH
                    /Users/Shared/unix/rosetta In bash/sh use: export
                    PHENIX_ROSETTA_PATH=/Users/Shared/unix/rosetta
      rosetta_erraser_dir= "" Directory with Rosetta tools for erraser. Path is
                           relative to rosetta_path
   erraser_control
      single_res_mode= False When True, ERRASER rebuilds only the one residue
                       specified by the rebuild_res_pdb option and outputs up
                       to 10 models for manual inspection. Overrides the
                       standard ERRASER protocol. Required option:
                       rebuild_res_pdb. All other erraser_control options
                       except native_screen_rmsd have no effect in this mode.
      n_iterate= 1 The number of rebuild-minimization iterations in ERRASER.
                 The user can increase the number to achieve better
                 performance. Usually 2-3 rounds will be enough.
                 Alternatively, the user can take an ERRASER-refined model as
                 the input for another ERRASER run to iterate manually.
      native_screen_rmsd= 3.0 In default ERRASER rebuilding, only
                          conformations within 3.0 A RMSD of the starting
                          model (which is the 'native' here) are sampled. The
                          user can modify the RMSD cutoff. If the value of
                          native_screen_rmsd is larger than 10.0, the RMSD
                          screening will be turned off.
      rebuild_all= False When True, ERRASER will rebuild all the residues
                   instead of just the erroneous ones. Residues listed in
                   fixed_res (see below) are still kept fixed during
                   rebuilding. This is more time-consuming but does not
                   necessarily lead to better results. Standard rebuilding
                   with more iteration cycles is usually preferred.
      fixed_res= None (Example: fixed_res=A1 fixed_res=A14-19 fixed_res=B9
                 fixed_res=B10-13; Format is chain ID followed by residue
                 numbers). This allows users to fix selected RNA residues
                 during ERRASER. For example, because proteins and ligands are
                 not modeled in ERRASER, we recommend fixing RNA residues that
                 interact strongly with these unmodeled atoms. ERRASER will
                 automatically detect residues covalently bonded to removed
                 atoms and hold them fixed during the rebuild, but users need
                 to manually specify residues that have non-covalent
                 interactions with removed atoms.
      extra_res= None (Example: extra_res=A1 extra_res=A14-19 extra_res=B9
                 extra_res=B10-13; Format is chain ID followed by residue
                 numbers). This allows users to specify extra residues and
                 force ERRASER to rebuild them. ERRASER will automatically
                 pick out incorrect residues, but the user may find some
                 particular residues that were not corrected after one
                 ERRASER run. The user can then re-run ERRASER with the
                 extra_res argument to force ERRASER to remodel these residues.
      constrain_chi= True When True, ERRASER will apply a weak constraint
                     that keeps the chi angle near the input conformer. Only
                     new chi conformers with a large energy bonus will be
                     accepted.
      search_syn_pyrimidine_only_when_native_syn= True When True, ERRASER
                                                  will only sample syn-chi
                                                  conformers for pyrimidines if
                                                  the input residue is in the
                                                  syn conformation.
      rebuild_res_pdb= None (Example: rebuild_res_pdb=B21; Format is chain ID
                       followed by residue number.) Residue to be rebuilt.
                       Required input for single-residue rebuilding mode;
                       otherwise it is ignored.
   control
      debug= False Debugging output
      job_title= None Job title in PHENIX GUI, not used on command line
   non_user_params
      print_citations= True Print citation information at end of run
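
The residue format used by fixed_res and extra_res in the listing above (for example A1 or A14-19) can be expanded into individual residues when preparing or checking a run. The Python sketch below is illustrative: expand_residues is a hypothetical helper, and it assumes a one-letter chain ID and positive residue numbers.

```python
import re

# Chain letter, first residue number, optional "-last" for a range.
RESIDUE_SPEC = re.compile(r"([A-Za-z])(\d+)(?:-(\d+))?")


def expand_residues(spec):
    """Expand "A14-19" into ["A14", ..., "A19"]; "B9" into ["B9"]."""
    match = RESIDUE_SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"unexpected residue specification: {spec!r}")
    chain = match.group(1)
    first = int(match.group(2))
    # A single residue is treated as a range of length one.
    last = int(match.group(3)) if match.group(3) else first
    if last < first:
        raise ValueError(f"empty residue range: {spec!r}")
    return [f"{chain}{number}" for number in range(first, last + 1)]
```

Expanding the ranges this way makes it easy to confirm that a residue chosen for rebuild_res_pdb or extra_res is not also listed in fixed_res.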