|Python-based Hierarchical ENvironment for Integrated Xtallography|
Tutorial 6: solving a structure with Phaser-MR
This tutorial shows how to use the Phaser-MR GUIs to solve structures by molecular replacement, using example data included with the Phenix distribution. We recommend reading the MR overview before running these tutorials (and for adventurous users, the primary Phaser reference as well).
A brief word about terminology: the search models used in Phaser are customarily named "ensembles", and in fact Phaser will often work best in difficult cases when a set of superposed models (generated using Ensembler or similar tools) is used. However, this is not required and for easily solved structures, such as the tutorial datasets used here, a single PDB file will be sufficient. Furthermore, while Phaser can search for multiple alternative ensembles and output only the best solution, here we will use just one candidate per component. Also note that the search models used here are single proteins, but this is not a requirement if the relative orientation of multiple chains are known (such as physiological dimers, multi-subunit complexes, or DNA helices).
An essential feature of Phaser is that it scores possible solutions based on the expected crystal composition (that is, the scattering mass). Therefore, for the examples shown here, we will be supplying sequence files for the crystallized proteins. We could, however, specify the molecular weight or number of residues instead. Importantly, the manner in which the composition is specified is completely decoupled from the "ensemble" (search model) input; where multiple proteins have been crystallized, we could supply separate sequence files (as is done here for the beta-blip example) or a single file with multiple sequences, and the result will be identical. While it is technically possible to run Phaser with just a search model and extrapolate the molecular weight based on the desired number of copies, this is not recommended.
For homogeneous structures where only one component needs to be placed, a streamlined interface is available. The tutorial data for this example is the A2U-globulin structure, which includes one each of data, PDB, and sequence files. In this interface, it is assumed that multiple PDB files are part of a single ensemble; if you instead want to try multiple equally likely ensembles and let Phaser pick the best one, use the advanced interface (described below).
Start the GUI by clicking "Phaser-MR (simple interface)", the first item in the list under "Molecular replacement." This interface has a single input field for all file types, and on most operating systems you can simply drag the three files from the project folder into the input field. Alternatively, click "Add file" to open a file dialog.
The symmetry information and labels list will automatically be populated when the reflections are loaded; you do not need to change the defaults. You must enter either the sequence identity or RMSD for each model in the ensemble; in this case an RMSD of 1.0 is appropriate for the single model used. If you have no specific idea of the accuracy of the model, it is best to simply provide the sequence identity to the target and let Phaser provide the corresponding RMSD. Because we already entered a sequence, no molecular weight is required, but the value for "Component copies" should be set to 4.
The next tab contains basic search options; you should change "Number of copies to search for" to 4. (While it may seem inefficient to specify the number of component copies and the number of search copies separately, this would allow us to search for a tetramer, while still only supplying the sequence of one chain.)
Usually it is best not to specify a resolution limit. Phaser chooses a resolution limit based on the particular problem (accuracy and completeness of the model, number of reflections). If the problem is expected to be straightforward, the resolution will automatically be limited to reduce runtime while preserving an excellent chance of solving the structure on the first pass.
The other controls on this tab do not need to modified, but the option for alternative space groups deserves special mention. If there is any ambiguity about the precise space group, for instance, if the data were processed as P222 but the exact combination of screw axes is unknown, or the space group could be either P6(1) or P6(5), Phaser can try multiple space groups and identify the best one at the translation function stage. The default is to try only the enantiomeric space group; this will have no effect for the examples used here (the MTZ files already contain the correct symmetry), but it would be appropriate for the P6(1) versus P6(5) case. The most permissive setting is "All possible in the same pointgroup".
Now we are ready to run Phaser: click the Run button on the toolbar. A new tab will appear with a status panel, a list of warnings, and summary output (basically excerpted from the log file). For many structures (including this one) there will be one or more warnings, but these do not necessarily mean that Phaser will not be successful. This example should finish in several minutes.
Once the run is complete, a results tab will appear, showing a summary of the solution and output files, with buttons to launch other programs. The two statistics reported are the log-likelihood gain (LLG), which expresses the probability of this solution compared to a random-atom model, and the translation function Z-score, which denotes the signal-to-noise ratio of the solution. In this case both are excellent, indicating that Phaser found an unambiguous solution.
At this point we recommend inspecting the maps and crystal packing in Coot. Because Phaser only performs rigid-body and group B-factor refinement, the map quality will most likely be relatively poor compared to a refined map, with many unexplained difference peaks, but they should still be interpretable for a correct solution.
The appropriate next step would be either rebuilding or refinement, both of which can be launched directly from the results tab. For this example we could go straight to refinement; if the structure was instead solved using homologous proteins, running AutoBuild would be a better choice.
The interface for multi-component searches is considerably more complicated, especially since the ensemble and composition inputs are split up. The tutorial data for this example is "TEM-1 Beta-lactamase/beta-lactamaseinhibitor complex" (or "beta-blip"), and includes five files: an MTZ file and two each of search models and sequence files.
Start the standard Phaser-MR GUI by clicking "Phaser-MR" under the "Molecular replacement" section. The first tab that appears defines the experimental data; click "Browse..." next to the "Data file" field and select the file named "beta_blip_P3221.mtz". The unit cell and space group will be automatically filled in, and the "Data labels" menu will be populated with suitable choices (only one in this case).
The second tab, "Ensembles", is where the search models are entered. First we will load the beta-lactamase model: for "Model ID", enter "beta", then click "Add file" below the list widget and select "beta.pdb". This will add it to the list with the "Variance" field blank. Click the "Change variance" button, and enter "100" for the sequence identity and click "OK".
At the bottom of the window, click "Add PDB ensemble". A second ensemble sub-tab will appear. Now repeat the procedure using "beta.pdb" instead. (The order in which these ensembles are input is technically not important, because Phaser will normally determine the appropriate search order automatically. However, in this case Phaser will indeed search for beta first, because it is a significantly larger protein.)
The third configuration tab, "Composition", is where we tell Phaser what scattering mass to expect in the crystal. If we know the exact molecular weight (or number of residues) of the entire structure, we could simply enter that here, but we will use sequences instead. The files "beta.seq" and "blip.seq" should be input in the "Sequence file" field in successive tabs; click the button "Add macromolecule(s)" at the bottom of the window to add a tab. The number of copies is 1 in both cases. (Here the order in which the composition is input really is unimportant.)
The final configuration tab, "Search procedure", is where we tell Phaser how to use the ensembles we input, along with several of the most important search options. Each search step has its own tab; assuming the ensembles have been entered appropriately, the tab will initially look like this:
Check "beta", leaving the number of copies set to 1, then click "Add search", and in the new tab, check "blip" (still with 1 copy). This interface allows you to check multiple ensembles in a single search, but this would instead try both as alternatives for the same component and pick just one, which is not what we want here. Each unique ensemble (or set of equally likely candidates) should get its own separate search tab.