Tutorial 5: Solving a structure with MR data


	Python-based Hierarchical ENvironment for Integrated Xtallography

Introduction
Setting up to run PHENIX
Running the demo a2u-globulin-mr data with AutoMR
    Where are my files?
    What parameters did I use?
    Reading the log files for your AutoMR run file
    Summary of the command-line arguments
    Running Phaser molecular replacement
    The AutoMR_summary.dat summary file
    How do I know if I have a good solution?
    What to do next
Additional information

Introduction

This tutorial uses the a2u-globulin-mr example, in which the a2u-globulin structure is solved using a search model with 63% sequence identity, to show how to solve a molecular replacement (MR) dataset with the AutoMR Wizard. It is designed to be read all the way through, giving you pointers along the way. Once you have read it all, run the example data, and looked at the output files, you will be in a good position to run your own data through AutoMR.

Setting up to run PHENIX

If PHENIX is already installed and your environment is all set, then if you type:

echo $PHENIX

then you should get back something like this:

/xtal/phenix-1.3

If instead you get:

PHENIX: undefined variable

then you need to set up your PHENIX environment. See the PHENIX installation page for details of how to do this. If you are using the C-shell environment (csh) then all you will need to do is add one line to your .cshrc (or equivalent) file that looks like this:

source /xtal/phenix-1.3/phenix_env

(except that the path in this statement will be where your PHENIX is installed). Then the next time you log in $PHENIX will be defined.
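
The same check can be made from a script. The following is a minimal Python sketch, not part of PHENIX: the function name phenix_root is made up for illustration, and the only thing it assumes is that the PHENIX environment variable is defined once phenix_env has been sourced, as described above.

```python
import os


def phenix_root(environ=os.environ):
    """Return the PHENIX installation path, or None if PHENIX is unset or empty."""
    value = environ.get("PHENIX", "")
    return value or None


if __name__ == "__main__":
    root = phenix_root()
    if root is None:
        print("PHENIX is not defined; source phenix_env first")
    else:
        print("PHENIX is installed at", root)
```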

Running the demo a2u-globulin-mr data with AutoMR

To run AutoMR on the demo a2u-globulin-mr data, make yourself a tutorials directory and cd into that directory:

mkdir tutorials
cd tutorials

Now type the phenix command:

phenix.run_example --help

to list the available examples. Choosing a2u-globulin-mr for this tutorial, you can now use the phenix command:

phenix.run_example a2u-globulin-mr

to solve the a2u-globulin-mr structure with AutoMR. This command will copy the directory $PHENIX/examples/a2u-globulin-mr to your current directory (tutorials) and call it tutorials/a2u-globulin-mr/ . Then it will run AutoMR using the command file run.sh that is present in this tutorials/a2u-globulin-mr/ directory. This command file run.sh is simple. It says:

#!/bin/sh
echo "Running AutoMR on a2u-globulin data without building..."
phenix.automr mup_search.pdb scale.mtz mass=18000. resolution=2.5 \
    component_type=protein RMS=1.0 sequence.dat copies=4  \
    space_group=p212121 unit_cell="106.820   62.340  114.190  90.00  90.00  90.00" \
    build=False

The first line (#!/bin/sh) tells the system to interpret the remainder of the file using the Bourne shell (sh). The command phenix.automr runs the command-line version of AutoMR (see Automated Molecular Replacement using AutoMR for all the details about AutoMR, including a full list of keywords).

The arguments on the command line tell AutoMR about the search model (mup_search.pdb), the data file with structure factors (scale.mtz), the molecular mass of the molecule that we are searching for (mass=18000.), and the resolution (resolution=2.5). The command then tells AutoMR that the component we are searching for is protein (component_type=protein) and that the search model has an estimated RMS difference from the true structure of about 1.0 A (RMS=1.0). Next the sequence file (sequence.dat) is specified, along with the number of copies of the search model to look for (copies=4). Then the space group and cell dimensions are specified (these could also have been read from the data file). Finally, the Wizard is told not to rebuild the model after MR with build=False (recorded in the log file as rebuild_after_mr=False).

Note that each keyword is specified with an = sign, and that there are no spaces around the = sign. Note also the backslash "\" at the end of some of the lines in the phenix.automr command. This tells the shell (sh, which interprets everything in this file) that the next line is a continuation of the current line. There must be no characters (not even a space) after the backslash for this to work.

Although the phenix.run_example a2u-globulin-mr command has just run AutoMR from a script (run.sh), you can run AutoMR yourself from the command line with the same phenix.automr ... command. You can also run AutoMR from a GUI, or by putting commands in another type of script file. All these possibilities are described in Using the PHENIX Wizards.
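
If you would rather drive AutoMR from Python than from a shell script, the keyword=value convention can be sketched as follows. This is an illustration only: build_command and RUN_SH_ARGUMENTS are hypothetical names, the argument values are copied from run.sh above, and nothing here is a PHENIX API.

```python
import shlex

# Arguments copied from run.sh, in the same order. Bare strings are file
# names; (keyword, value) pairs become keyword=value with no spaces around "=".
RUN_SH_ARGUMENTS = [
    "mup_search.pdb",
    "scale.mtz",
    ("mass", "18000."),
    ("resolution", "2.5"),
    ("component_type", "protein"),
    ("RMS", "1.0"),
    "sequence.dat",
    ("copies", "4"),
    ("space_group", "p212121"),
    ("unit_cell", "106.820   62.340  114.190  90.00  90.00  90.00"),
    ("build", "False"),
]


def build_command(arguments, program="phenix.automr"):
    """Return the command as a list of tokens (hypothetical helper)."""
    tokens = [program]
    for item in arguments:
        if isinstance(item, tuple):
            keyword, value = item
            tokens.append(f"{keyword}={value}")
        else:
            tokens.append(item)
    return tokens


command = build_command(RUN_SH_ARGUMENTS)
# shlex.join quotes the unit_cell token because it contains spaces.
print(shlex.join(command))
```

Passing the token list to subprocess.run(command) would start the run without going through a shell, so no backslashes or quoting would be needed.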

Where are my files?

Once you have started AutoMR or another Wizard, an output directory will be created in your current (working) directory. The first time you run AutoMR in this directory, this output directory will be called AutoMR_run_1_ (or AutoMR_run_1_/, where the slash at the end just indicates that this is a directory). All of the output from run 1 of AutoMR will be in this directory. If you run AutoMR again, a new output directory called AutoMR_run_2_ will be created. Inside the directory AutoMR_run_1_ there will be one or more temporary directories, such as TEMP0, created while the Wizard is running. The files in these temporary directories are sometimes useful for figuring out what the Wizard is doing (or not doing!). By default these directories are emptied when the Wizard finishes (but you can keep their contents with the keyword clean_up=False if you want).
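
If you script around the Wizard, you may want to locate the most recent output directory. The following is a minimal sketch that assumes only the AutoMR_run_N_ naming described above; latest_run is a hypothetical helper, not a PHENIX function.

```python
import os
import re

RUN_DIR = re.compile(r"^AutoMR_run_(\d+)_$")


def latest_run(names):
    """Return the AutoMR_run_N_ name with the largest N, or None if absent."""
    best_number, best_name = -1, None
    for name in names:
        match = RUN_DIR.match(name)
        # Compare run numbers as integers so that run 10 comes after run 2.
        if match and int(match.group(1)) > best_number:
            best_number, best_name = int(match.group(1)), name
    return best_name


# Names in the current directory; the helper checks names only, not file types.
print(latest_run(os.listdir(".")))
```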

What parameters did I use?

Once the AutoMR Wizard has started (when run from the command line), a parameters file called automr.eff will be created in your output directory (e.g., AutoMR_run_1_/automr.eff). This parameters file has a header that says what command you used to run AutoMR, and it contains the starting values of all parameters for this run (including the defaults for all the parameters that you did not set). The automr.eff file is good for more than just looking at the values of parameters, though. If you copy this file to a new one (for example automr_lores.eff) and edit it to change the values of some of the parameters (for example, resolution=3.0), then you can re-run AutoMR with the new values of your parameters like this:

phenix.automr automr_lores.eff

This command will do everything just the same as in your first run, but it will use only the data to 3.0 A.
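
Editing the copy can also be scripted. The sketch below rests on an assumption: that parameters appear in the .eff file one per line in the form name = value. The sample text is invented for the example and is not taken from a real automr.eff, and set_parameter is a hypothetical helper.

```python
import re


def set_parameter(eff_text, name, value):
    """Return eff_text with every 'name = ...' line set to the new value."""
    pattern = re.compile(
        rf"^([ \t]*{re.escape(name)}[ \t]*=[ \t]*).*$", re.MULTILINE
    )
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value), eff_text)
    if count == 0:
        raise KeyError(f"parameter {name!r} not found")
    return new_text


# Invented sample in the assumed "name = value" layout.
SAMPLE = "automr {\n  resolution = 2.5\n  copies = 4\n}\n"
edited = set_parameter(SAMPLE, "resolution", 3.0)
print(edited)
```

Writing the result to a new .eff file and passing that file to phenix.automr would then repeat the run with the edited value.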

Reading the log files for your AutoMR run file

While the AutoMR wizard is running, there are several places you can look to see what is going on. The most important one is the overall log file for the AutoMR run. This log file is located in:

AutoMR_run_1_/AutoMR_run_1_1.log

for run 1 of AutoMR. (The second 1 in this log file name will be incremented if you stop this run in the middle and restart it with a command like phenix.automr run=1.) The AutoMR_run_1_1.log file is a running summary of what the AutoMR Wizard is doing. Here are a few of the key sections of the log file produced for the a2u-globulin-mr MR dataset.

Summary of the command-line arguments

Near the top of the log file you will find:

------------------------------------------------------------
Starting AutoMR with the command:
phenix.automr coords=mup_search.pdb data=scale.mtz mass=18000. resolution=2.5 \
component_type=protein RMS=1.0 seq_file=sequence.dat   \
input_seq_file=sequence.dat copies=4 space_group=p212121   \
unit_cell='106.820   62.340  114.190  90.00  90.00  90.00' rebuild_after_mr=False

This is just a repeat of how you ran AutoMR; you can copy it and paste it into the command line to repeat this run.

Running Phaser molecular replacement

The AutoMR Wizard will take the information you have input and use it to run the Phaser molecular replacement algorithm. The log file shows:

 AutoMR_auto_MR  AutoMR  Run 1 Tue Jul  3 10:40:54 2007

This is followed by a summary of some of the input information:

CRITERIA FOR PHASER MR RUN:

sg : p212121
selection_criteria_rot : Percent_of_best
selection_criteria_rot_value : 75
all_plausible_sg_list : ['P 2 2 2', 'P 2 2 21', 'P 21 2 2', 'P 2 21 2', 'P 21 21 2', 'P 2 21 21', 'P 21 2 21', 'P 21 21 21']
use_all_plausible_sg : No
overlap_allowed :
DICTENS: {'ensemble_1': [['mup_search.pdb', 'RMS', 1.0]]}
PDBList entry:  ensemble_1 mup_search.pdb RMS 1.0
...
HALL:   P 2ac 2ab
CELL:  (106.81999999999999, 62.340000000000003, 114.19, 90.0, 90.0, 90.0)
ENSEMBLE  0 : ensemble_1
ENSEMBLE  1 : ensemble_1
ENSEMBLE  2 : ensemble_1
ENSEMBLE  3 : ensemble_1

ENSEMBLE: ensemble_1 , 1 PDB file(s)

Here the list of all plausible space groups contains those with the same symmetry in reciprocal space as the one you have input; any of these might be the correct space group. (If you are not sure which one is correct, you can tell AutoMR to try all of them with use_all_plausible_sg=Yes.) The AutoMR Wizard then runs Phaser, and the log file for this is in MR.log. The summary is written to your AutoMR log file. It starts out with a list of steps to be carried out:

   Steps:
      Anisotropy correction
      Cell Content Analysis
      Fast Rotation Function
      Fast Translation Function
      Packing
      Refinement (if data higher resolution than search resolution)
   Number of search ensembles = 4
      #1: Ensemble ensemble_1
      #2: Ensemble ensemble_1
      #3: Ensemble ensemble_1
      #4: Ensemble ensemble_1
   Number of permutations of search ensembles = 1

   One test spacegroup
     P 21 21 21

Phaser then carries out each of these steps. The final summary is:

OUTPUT FILES
------------

   No script files output
   /net/cci-filer1/vol1/tmp/terwill/from_firebird/PHENIX/structure_lib_tests/MR/a2u-g
   lobulin/run_070307_new/AutoMR_run_1_/MR.1.pdb
   /net/cci-filer1/vol1/tmp/terwill/from_firebird/PHENIX/structure_lib_tests/MR/a2u-g
   lobulin/run_070307_new/AutoMR_run_1_/MR.1.mtz

followed by the final log likelihood gain (positive is good, anything over 100 is fine, and a very strong solution will be over 1000) and the orientation (EULER) and position (FRAC) of each of the 4 molecules:


Solution #1:  Likelihood Gain 1410.51
ENSE ensemble_1 - EULER  295.533,  59.329, 229.980 - FRAC    0.096,  -0.302,  -0.115
ENSE ensemble_1 - EULER  166.424, 152.963, 316.988 - FRAC   -0.241,  -0.440,   0.021
ENSE ensemble_1 - EULER  183.783,  14.133, 133.534 - FRAC   -0.217,  -0.221,   0.085
ENSE ensemble_1 - EULER   68.417, 114.921,  35.347 - FRAC    0.073,  -0.100,  -0.037
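
To use these numbers in your own scripts, the solution block can be read with a few lines of Python. This is a sketch that assumes the exact layout shown above (one "Solution #" line followed by ENSE lines) and handles a single solution; parse_solution is a hypothetical helper, not a Phaser or PHENIX function.

```python
SOLUTION = """\
Solution #1:  Likelihood Gain 1410.51
ENSE ensemble_1 - EULER  295.533,  59.329, 229.980 - FRAC    0.096,  -0.302,  -0.115
ENSE ensemble_1 - EULER  166.424, 152.963, 316.988 - FRAC   -0.241,  -0.440,   0.021
ENSE ensemble_1 - EULER  183.783,  14.133, 133.534 - FRAC   -0.217,  -0.221,   0.085
ENSE ensemble_1 - EULER   68.417, 114.921,  35.347 - FRAC    0.073,  -0.100,  -0.037
"""


def _triple(field):
    """Turn 'EULER  295.533,  59.329, 229.980' into a tuple of three floats."""
    values = field.split(None, 1)[1]
    return tuple(float(v) for v in values.split(","))


def parse_solution(text):
    """Return (likelihood_gain, placements) for a single-solution block."""
    gain, placements = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Solution #"):
            gain = float(line.split("Likelihood Gain")[1])
        elif line.startswith("ENSE "):
            # The three fields are separated by " - "; negative numbers are
            # written as "-0.302" with no space after the sign, so they survive.
            ensemble, euler, frac = line.split(" - ")
            placements.append({
                "ensemble": ensemble.split()[1],
                "euler": _triple(euler),
                "frac": _triple(frac),
            })
    return gain, placements


gain, placements = parse_solution(SOLUTION)
print(gain, len(placements))
```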

The AutoMR_summary.dat summary file

A quick summary of the results of your AutoMR run is in the AutoMR_summary.dat file in your output directory. This file lists the key files that were produced in your run of AutoMR (all these are in the output directory) and some of the key statistics for the run, including the overall log-likelihood gain. Here is the summary for this a2u-globulin-mr MR dataset:

 
**************** SOLUTION MR *******

 Log likelihood gain for solution MR: 1410.5121
 Output PDB files for solution MR: MR.1.pdb
 Output MTZ files for solution MR: MR.1.mtz
 Output log file for solution MR: MR.log
 Output summary file for solution MR: MR.sum
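
Because each entry in this file is a "label: value" line, the summary is easy to read from a script. The following sketch assumes the layout shown above; parse_summary is a hypothetical helper, and the sample text is the summary reproduced from this tutorial.

```python
SUMMARY = """\
**************** SOLUTION MR *******

 Log likelihood gain for solution MR: 1410.5121
 Output PDB files for solution MR: MR.1.pdb
 Output MTZ files for solution MR: MR.1.mtz
 Output log file for solution MR: MR.log
 Output summary file for solution MR: MR.sum
"""


def parse_summary(text):
    """Return {label: value} for every 'label: value' line in the summary."""
    entries = {}
    for line in text.splitlines():
        label, colon, value = line.partition(":")
        # Skip the banner and blank lines, which have no "label: value" pair.
        if colon and value.strip():
            entries[label.strip()] = value.strip()
    return entries


summary = parse_summary(SUMMARY)
print(float(summary["Log likelihood gain for solution MR"]))
```

For a real run, the text would come from reading AutoMR_summary.dat in the output directory instead of the SUMMARY string.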

How do I know if I have a good solution?

Here are some of the things to look for to tell if you have obtained a correct solution:

Was a solution found? If not, then check first if you have asked for the correct number of molecules.
What is the log likelihood gain of the final solution? You want a high positive number: anything over 100 is fine, and over 1000 is very strong.
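
The rule of thumb for the log likelihood gain can be written as a small function. The thresholds (0, 100 and 1000) come from this tutorial; the function name rate_llg and the wording of the labels are made up for illustration, and the result is a rough guide rather than a validation of the solution.

```python
def rate_llg(llg):
    """Classify a log likelihood gain using the tutorial's rules of thumb."""
    if llg > 1000:
        return "very strong"
    if llg > 100:
        return "fine"
    if llg > 0:
        return "positive, below 100"
    return "not positive"


# The a2u-globulin-mr example run reported a gain of 1410.51.
print(rate_llg(1410.51))
```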

What to do next

Once you have run AutoMR and have obtained a good solution and model, the next thing to do is to run the AutoBuild Wizard. If you run it in the same directory where you ran AutoMR, the AutoBuild Wizard will pick up where the AutoMR Wizard left off and carry out iterative model-building, density modification and refinement to improve your model and map. See the web page Automated Model Building and Rebuilding with AutoBuild for details on how to run AutoBuild.

If you do not obtain a good solution, it is not time to give up yet. There are a number of standard things to try that may improve the structure determination. Here is one that you should always try:

Have a careful look at all the output files. Work your way through the main log file (e.g., AutoMR_run_1_1.log) and the Phaser log file (MR.log). Is there anything strange or unusual in any of them that may give you a clue as to what to try next?

Additional information

For details about the AutoMR Wizard, see Automated molecular replacement with AutoMR. For help on running Wizards, see Using the PHENIX Wizards.