Python-based Hierarchical ENvironment for Integrated Xtallography

Automated structure solution with AutoSol

Author(s)
Purpose
Usage
How the AutoSol Wizard works
Setting up inputs
Datasets and Solutions in AutoSol
Analyzing and scaling the data
Finding heavy-atom (anomalously-scattering atom) sites
Running AutoSol separately in each possible space group
Scoring of heavy-atom solutions
Phasing
Density modification (including NCS averaging)
Preliminary model-building and refinement
Resolution limits in AutoSol
Output files from AutoSol
How to run the AutoSol Wizard
Model viewing during model-building with the Coot-PHENIX interface
Examples
SAD dataset
SAD dataset specifying solvent fraction
SAD dataset without model-building
SAD dataset, building RNA instead of protein
SAD dataset, selecting a particular dataset from an MTZ file
SAD dataset, reading heavy-atom sites from a PDB file written by phenix.hyss
MAD dataset
MAD dataset, selecting particular datasets from an MTZ file
SIR dataset
SAD with more than one anomalously-scattering atom
MIR dataset
SIR + SAD datasets
Possible Problems
General limitations
Specific limitations and problems
Literature
Additional information
List of all AutoSol keywords

Author(s)

  • AutoSol Wizard: Tom Terwilliger
  • PHENIX GUI and PDS Server: Nigel W. Moriarty
  • HYSS: Ralf W. Grosse-Kunstleve and Paul D. Adams
  • Phaser: Randy J. Read, Airlie J. McCoy and Laurent C. Storoni
  • SOLVE: Tom Terwilliger
  • RESOLVE: Tom Terwilliger
  • TEXTAL: K. Gopal, T.R. Ioerger, R.K. Pai, T.D. Romo, J.C. Sacchettini
  • phenix.refine: Ralf W. Grosse-Kunstleve, Peter Zwart and Paul D. Adams
  • phenix.xtriage: Peter Zwart

Purpose

The AutoSol Wizard uses HYSS, SOLVE, Phaser, RESOLVE, TEXTAL, xtriage and phenix.refine to solve a structure and generate experimental phases with the MAD, MIR, SIR, or SAD methods. The Wizard begins with datafiles (.sca, .hkl, etc.) containing amplitudes of structure factors, identifies heavy-atom sites, calculates phases, carries out density modification and NCS identification, and builds and refines a preliminary model.

Usage

The AutoSol Wizard can be run from the PHENIX GUI, from the command-line, and from keyworded script files. All three versions are identical except in the way that they take commands from the user. See Running a Wizard from a GUI, the command-line, or a script for details of how to run a Wizard. The command-line version will be described here, except for MIR and multiple datasets, which can only be run with the GUI or with a script.

How the AutoSol Wizard works

The basic steps that the AutoSol Wizard carries out are described below. They are: Setting up inputs, Analyzing and scaling the data, Finding heavy-atom (anomalously-scattering atom) sites, Scoring of heavy-atom solutions, Phasing, Density modification (including NCS averaging), and Preliminary model-building and refinement. The data for structure solution are grouped into Datasets and solutions are stored in Solution objects.

Setting up inputs

The AutoSol Wizard expects the following basic information:

(1) a datafile name (w1.sca or data=w1.sca)

(2) a sequence file (seq.dat or seq_file=seq.dat)

(3) how many sites to look for (2 or sites=2)

(4) what the anomalously-scattering atom is (Se or atom_type=Se)

(5) If you have SAD or MAD data, then it is helpful to add f_prime and f_double_prime for each wavelength.

You can also specify many other parameters, including resolution, number of sites, whether to search in a thorough or quick fashion, how thoroughly to build a model, etc. If you have a heavy-atom solution from a previous run or another approach, you can read it in directly as well.

Datasets and Solutions in AutoSol

AutoSol breaks down the data for a structure solution into datasets, where a dataset is a set of data that corresponds to a single set of heavy-atom sites. An entire MAD dataset is a single dataset. An MIR structure solution consists of several datasets (one for each native-derivative combination). A MAD + SIR structure has one dataset for the MAD data and a second dataset for the SIR data. The heavy-atom sites for each dataset are found separately (but using difference Fouriers from any previously-solved datasets to help). In the phasing step all the information from all datasets is merged into a single set of phases.

The AutoSol wizard uses a "Solution" object to keep track of heavy-atom solutions and the phased datasets that go with them. There are two types of Solutions: those which consist of a single dataset (Primary Solutions) and those that are combinations of datasets (Composite Solutions). Primary Solutions have information on the datafiles that were part of the dataset and on the heavy-atom sites for this dataset. Composite Solutions are simply sets of Primary Solutions, with associated origin shifts. The hand of the heavy-atom or anomalously-scattering atom substructure is part of a Solution, so if you have two datasets, each with two Solutions related by inversion, then AutoSol would normally construct four different Composite Solutions from these and score each one as described below.
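The counting in the last sentence above can be illustrated with a minimal Python sketch. It is not the Wizard's own code: the function name and the solution labels are hypothetical, and the origin shifts that accompany real Composite Solutions are ignored.

```python
from itertools import product

def composite_solutions(primary_by_dataset):
    """Return every combination that takes one Primary Solution per dataset."""
    return [tuple(combo) for combo in product(*primary_by_dataset)]

# Hypothetical labels: each dataset has one solution and its inverse hand.
dataset_1 = ["d1_hand_a", "d1_hand_b"]
dataset_2 = ["d2_hand_a", "d2_hand_b"]

composites = composite_solutions([dataset_1, dataset_2])
for combo in composites:
    print(combo)
```

Two datasets with two hands each give 2 x 2 = 4 candidate Composite Solutions, each of which would then be scored.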

Analyzing and scaling the data

The AutoSol Wizard analyzes input datasets with phenix.xtriage to identify twinning and other conditions that may require special care. The data is scaled with SOLVE. For MAD data, FA values are calculated as well.

Note on anisotropy corrections:

The AutoSol wizard will apply an anisotropy correction to all the raw experimental data if any of the files in the first dataset read in have a very strong anisotropy. You can control whether the correction is applied, and how much anisotropy there must be before it is applied by default, using the keywords

correct_aniso=True  # (if True or False then always or never apply correction)

delta_b_for_auto_correct_aniso=20  # correct if range of anisotropic B 
                                   #is greater than 20

ratio_b_for_auto_correct_aniso=1.5  #correct if the ratio of the largest 
                                  #to smallest anisotropic B is greater than 1.5

If an anisotropy correction is applied then a separate refinement file must be specified if refinement is to be carried out. This is because it is best to refine against data that have not been corrected for anisotropy (instead applying the correction as part of refinement).
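As a rough illustration of how the three keywords above interact, here is a minimal Python sketch. It is not the Wizard's implementation. It assumes that leaving correct_aniso unset (None here) means "decide automatically", and that both the range and the ratio thresholds must be exceeded before the correction is applied. The function name and the example B values are hypothetical.

```python
def should_correct_aniso(b_values, correct_aniso=None,
                         delta_b=20.0, ratio_b=1.5):
    """Decide whether to apply an anisotropy correction.

    b_values: the anisotropic B values along the principal axes.
    correct_aniso: True/False forces the decision; None means automatic.
    """
    if correct_aniso is not None:
        return bool(correct_aniso)
    b_min, b_max = min(b_values), max(b_values)
    large_range = (b_max - b_min) > delta_b
    # The ratio is only meaningful for a positive smallest B (assumption).
    large_ratio = b_min > 0 and (b_max / b_min) > ratio_b
    return large_range and large_ratio

print(should_correct_aniso([20.0, 25.0, 60.0]))  # strongly anisotropic
print(should_correct_aniso([30.0, 35.0, 40.0]))  # nearly isotropic
```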

Finding heavy-atom (anomalously-scattering atom) sites

The AutoSol Wizard uses HYSS to find heavy-atom sites. The result of this step is a list of possible heavy-atom solutions for a dataset. For SIR or SAD data, the isomorphous or anomalous differences, respectively, are used as input to HYSS. For MAD data, the anomalous differences at each wavelength, and the FA estimates of complete heavy-atom structure factors from SOLVE, are each used as separate inputs to HYSS. Each heavy-atom substructure obtained from HYSS corresponds to a potential solution. In space groups where the heavy-atom structure can be either hand, a pair of enantiomorphic solutions is saved for each run of HYSS.

Running AutoSol separately in each possible space group

AutoSol will check for the opposite hand of the heavy-atom solution, but it will not check for the opposite hand of your space group. If you have a space group that is enantiomorphic (e.g., P61), then you will need to run AutoSol once using each of the two possible space groups (P61 and P65 in this example). If there are more possibilities for your space group, then you should test them all. For example, if you were not able to measure 00l reflections in a hexagonal space group, your space group might be P6, P61, P62, P63, P64 or P65. In this case you should run AutoSol in all these space groups. Normally only one of these will give a plausible solution.
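If you need to test several space groups, a small script can write out one command line per candidate. The Python sketch below only builds the command strings and does not run anything. The space_group= keyword is an assumption that does not appear elsewhere on this page, so check the keyword list for your version before using it. The function name is hypothetical.

```python
import shlex

def autosol_commands(space_groups, base_args):
    """Build one phenix.autosol command line per candidate space group."""
    commands = []
    for sg in space_groups:
        # "space_group=" is assumed here; it is not shown in the text above.
        args = ["phenix.autosol"] + list(base_args) + ["space_group=" + sg]
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands

base = ["data=w1.sca", "seq_file=seq.dat", "sites=2", "atom_type=Se"]
candidates = ["P6", "P61", "P62", "P63", "P64", "P65"]
commands = autosol_commands(candidates, base)
for line in commands:
    print(line)
```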

Scoring of heavy-atom solutions

Potential heavy-atom solutions are scored based on a set of criteria (CC, RFACTOR, SKEW, FOM, NCS_OVERLAP, TRUNCATION, REGIONS, SD; described below), using either a Bayesian estimate, a linear regression, or a Z-score system to put all the scores on a common scale and to combine them into a single overall score.

The overall scoring method chosen (BAYES-CC, CC-SCORE, or Z-SCORE) is determined by the value of the keyword overall_score_method. The default is BAYES-CC.

Note that for all scoring methods, the map that is being evaluated, and the estimates of map-perfect-model correlation, refer to the experimental electron density map, not the density-modified map.

Bayesian CC scores (BAYES-CC). Bayesian estimates of the quality of experimental electron density maps are obtained using data from a set of previously-solved datasets. The standard scoring criteria were evaluated for 1905 potential solutions in a set of 246 MAD, SAD, and MIR datasets. As each dataset had previously been solved, the correlation between the refined model and each experimental map (CC_PERFECT) could be calculated for each solution (after offsetting the maps to account for origin differences). Histograms were tabulated of the number of instances that a scoring criterion (e.g., SKEW) had various possible values, as a function of the CC_PERFECT of the corresponding experimental map to the refined model. These histograms yield the relative probability of measuring a particular value of that scoring criterion (SKEW), given the value of CC_PERFECT. Using Bayes' rule, these probabilities can be used to estimate the relative probabilities of values of CC_PERFECT given the value of each scoring criterion for a particular electron density map. The mean estimate (BAYES-CC) is reported (multiplied by 100), with a +/-2SD estimate of the uncertainty in this estimate of CC_PERFECT.
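The Bayes' rule step above can be sketched for a single scoring criterion as follows. This is a minimal illustration, not the Wizard's code: the histogram values are invented, the prior over CC_PERFECT is assumed to be flat, and the function name is hypothetical.

```python
import math

def bayes_cc(likelihood_by_cc, observed_bin):
    """Estimate CC_PERFECT from one binned scoring criterion.

    likelihood_by_cc maps a CC_PERFECT bin centre to a list of
    P(criterion falls in bin i | CC_PERFECT), one entry per criterion bin.
    Returns (100 * mean, 100 * 2 * SD) of the posterior over CC_PERFECT.
    """
    # Flat prior (assumption): the posterior is proportional to the likelihood.
    weights = {cc: probs[observed_bin] for cc, probs in likelihood_by_cc.items()}
    total = sum(weights.values())
    posterior = {cc: w / total for cc, w in weights.items()}
    mean = sum(cc * p for cc, p in posterior.items())
    variance = sum((cc - mean) ** 2 * p for cc, p in posterior.items())
    return 100.0 * mean, 100.0 * 2.0 * math.sqrt(variance)

# Invented histogram: rows are CC_PERFECT bins, columns are low/medium/high SKEW.
toy_histogram = {
    0.2: [0.7, 0.2, 0.1],
    0.5: [0.3, 0.4, 0.3],
    0.8: [0.1, 0.2, 0.7],
}

estimate, two_sd = bayes_cc(toy_histogram, observed_bin=2)
print("BAYES-CC estimate: %.1f +/- %.1f" % (estimate, two_sd))
```

A high observed SKEW shifts the estimate toward the high-CC_PERFECT bins, and the width of the posterior supplies the +/-2SD figure.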

The BAYES-CC values are estimated independently for each scoring criterion used, and also from all those selected with the keyword score_type_list and not selected with the keyword skip_score_list.

Regression-based CC-scores (CC-SCORE). A linear regression is used to estimate the value of the square of the correlation of the experimental map and the (unknown) perfect map. The regression is based on values of the scoring criteria obtained in test runs of the AutoSol Wizard, along with the square of the actual correlation of the maps to model maps. These data were generated from 52 MAD, SAD, and MIR experimental datasets run through the AutoSol Wizard, and include 489 sets of values of scoring criteria with an associated model-map correlation. The resulting coefficients can then be used to generate an estimate of the square of the correlation of any experimental map to the correct map for this structure. The square of the correlation is estimated in this process because it is more closely linearly related to the scoring criteria than is the correlation itself. Taking the square root gives an estimate of the correlation of the experimental map to the correct map for the structure. When multiplied by 100, this estimated CC is referred to as the CC-SCORE. The mean estimate (CC-SCORE) is reported, with a +/-2SD estimate of the uncertainty in this estimate of CC_PERFECT.

The CC-SCORE values are estimated independently for each scoring criterion used, and also from all those selected with the keyword score_type_list and not selected with the keyword skip_score_list. Coefficients for these regressions can be entered with the keywords score_weight_list, score_overall_scale, score_overall_offset, score_overall_sd, and score_individual_scale_list, score_individual_offset_list, score_individual_sd_list.
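The arithmetic of the regression-based estimate can be sketched in Python as below. The weights and offset are invented for illustration and are not the coefficients used by the Wizard. The function name is hypothetical, and the mapping of its arguments onto the keywords listed above is not implied.

```python
import math

def cc_score(criteria, weights, offset=0.0):
    """Estimate 100 * CC from a linear model of CC squared.

    criteria and weights are dicts keyed by criterion name.
    """
    cc_squared = offset + sum(weights[name] * value
                              for name, value in criteria.items())
    # Keep the estimate in the physically meaningful range before the root.
    cc_squared = min(1.0, max(0.0, cc_squared))
    return 100.0 * math.sqrt(cc_squared)

# Invented values for two criteria and their regression weights.
criteria = {"CC": 0.5, "SKEW": 0.2}
weights = {"CC": 0.4, "SKEW": 0.5}

score = cc_score(criteria, weights, offset=0.05)
print("CC-SCORE: %.1f" % score)
```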

Z-scores (Z-SCORE). The Z-score for one criterion for a particular solution is given by,

Z= (Score - mean_random_solution_score)/(SD_of_random_solution_scores)
where Score is the score for this solution, mean_random_solution_score is the mean score for a solution with randomized phases, and SD_of_random_solution_scores is the standard deviation of the scores of solutions with randomized phases.

To create a total score based on Z-scores, the Z-scores for each criterion are simply summed.
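The Z-score formula and the summation can be written out as a short Python sketch. The scores below are invented, the function names are hypothetical, and the sample standard deviation is used because the text does not say which estimator the Wizard applies.

```python
from statistics import mean, stdev

def z_score(score, random_scores):
    """Z = (Score - mean of random-phase scores) / SD of random-phase scores."""
    return (score - mean(random_scores)) / stdev(random_scores)

def total_z_score(scores, random_scores_by_criterion):
    """Sum the per-criterion Z-scores for one solution."""
    return sum(z_score(scores[name], random_scores_by_criterion[name])
               for name in scores)

# Invented scores for one solution and for three randomized-phase solutions.
solution = {"SKEW": 0.5, "CC": 0.6}
randomized = {"SKEW": [0.1, 0.2, 0.3], "CC": [0.2, 0.3, 0.4]}

total = total_z_score(solution, randomized)
print("Total Z-score: %.2f" % total)
```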

The principal scoring criteria are:

(1) Correlation of map-phased electron density map with experimentally-phased map (CC). The statistical density modification in RESOLVE allows the calculation of map-based phases that are (mostly) independent of the experimental phases. The phase information in statistical density modification comes from two sources: your experimental phases and maximization of the agreement of the map with expectations (such as a flat solvent region). Normally the phase probabilities from these two sources are merged together, yielding your density-modified phases. This score is calculated based on the correlation of the phase information from these two sources before combining them, and is a good indication of the quality of the experimental phases. This criterion is used in scoring by default.

(2) The R-factor for density modification (R-Factor). Statistical density modification provides an estimate of structure factors that is (mostly) independent of the measured structure factors, so the R-factor between FC and Fobs is a good measure of the quality of experimental phases. This criterion is used in scoring by default.

(3) The skew (third moment or normalized <rho**3>) of the density in an electron density map is a good measure of its quality, because a random map has a skew of zero (density histograms look like a Gaussian), while a good map has a very positive skew (density histograms very strong near zero, but many points with very high density). This criterion is used in scoring by default.

(4) Non-crystallographic symmetry (NCS overlap). The presence of NCS in a map is a nearly-positive indication that the map is good, or has some correct features. The AutoSol Wizard uses symmetry in heavy-atom sites to suggest NCS, and RESOLVE identifies the actual correlation of NCS-related density for the NCS overlap score. This score is used by default if NCS is present in the Z-score method of scoring. By default it is printed out but not used in scoring in the CC-SCORE method.

(5) Figure of merit (FOM). The figure of merit of phasing is a good indicator of the internal consistency of a solution. This score is not normalized by the SD of randomized phase sets (as that has no meaning; rather a standard SD=0.05 is used). This score is used by default if NCS is present in the Z-score method of scoring and in the Bayesian CC estimate method. By default it is printed out but not used in scoring in the CC-SCORE method.

(6) Map correlation after truncation (TRUNCATION). Dummy atoms (the same number as estimated non-hydrogen atoms in the structure) are placed in positions of high density of the map, and a new map is calculated based on these atomic positions. The correlation of these maps is calculated after adjusting an overall B-value for the dummy atoms to maximize the correlation. A good map will show a high correlation of these maps. This score is by default not used.

(7) Number of contiguous regions per 100 A**3 comprising top 5% of density in map (REGIONS). The top 5% of points in the map are marked, and the number of contiguous regions that result are counted, and divided by the volume of the asymmetric unit, then multiplied by 100. A good map will have just a few contiguous regions at a high contour level, a poor map will have many isolated peaks. This score is by default not used.

(8) Standard deviation of local rms density (SD). The local rms density in the map is calculated using a smoothing radius of 3 times the high-resolution cutoff (or 6 A, if less than 6 A). Then the standard deviation of the local rms, normalized to the mean value of the local rms, is reported. This criterion will be high if there are regions of high local rms (the macromolecule) and separate regions of low local rms (the solvent) and low if the map is random. This score is by default not used.
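To make the skew criterion (3) above concrete, here is a minimal Python sketch that computes the skew of a list of density values. It assumes one common normalization (third central moment divided by the 1.5 power of the second central moment), which the text does not spell out. The function name and density values are invented.

```python
def density_skew(rho):
    """Third central moment of the density, normalized by variance**1.5."""
    n = len(rho)
    mean = sum(rho) / n
    centred = [r - mean for r in rho]
    m2 = sum(c ** 2 for c in centred) / n
    m3 = sum(c ** 3 for c in centred) / n
    return m3 / m2 ** 1.5

# A symmetric histogram (like a random map) and one with mostly flat
# density plus a single strong peak (like a good map).
symmetric_map = [-2.0, -1.0, 0.0, 1.0, 2.0]
peaked_map = [0.0] * 9 + [10.0]

print("symmetric: %.3f" % density_skew(symmetric_map))
print("peaked:    %.3f" % density_skew(peaked_map))
```

The symmetric values give a skew of zero, while the map dominated by near-zero density with a few high points gives a clearly positive skew.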

Phasing

The AutoSol Wizard uses Phaser to calculate experimental phases from SAD data, and SOLVE to calculate phases from MIR, MAD, and multiple-dataset cases.

Density modification (including NCS averaging)

The AutoSol Wizard uses RESOLVE to carry out density modification. It identifies NCS from symmetries in heavy-atom sites with RESOLVE and applies this NCS if it is present in the electron density map.

Preliminary model-building and refinement

The AutoSol Wizard carries out one cycle of model-building and refinement after obtaining density-modified phases. The model-building can be with RESOLVE or with TEXTAL. The refinement is carried out with phenix.refine.

Resolution limits in AutoSol

There are several resolution limits used in AutoSol. You can leave them all to default, or you can set any of them individually. Here is a list of these limits and how their default values are set:

Name                   Description                                    How default value is set
resolution             Overall resolution for a dataset               Highest resolution for any datafile in this dataset. For multiple datasets, the highest resolution for any dataset.
refinement_resolution  Resolution for refinement                      Value of "resolution".
resolution_build       Resolution for model-building                  Value of "resolution".
res_phase              Resolution for phasing for a dataset           If phase_full_resolution=True then use value of "resolution". Otherwise, use value of "recommended_resolution" based on analysis of signal-to-noise in dataset.
res_eval               Resolution for evaluation of solution quality  Value of "resolution" or 2.5 A, whichever is lower resolution.
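The default rules listed above can be summarized in a short Python sketch. It is an illustration only: the function name is hypothetical and recommended_resolution is passed in rather than derived from signal-to-noise. Note that "highest resolution" means the smallest d-spacing in Angstroms, and "lower resolution" means the larger number.

```python
def default_resolution_limits(datafile_resolutions, recommended_resolution,
                              phase_full_resolution=False):
    """Apply the default rules for the AutoSol resolution limits."""
    # Highest resolution corresponds to the smallest value in Angstroms.
    resolution = min(datafile_resolutions)
    return {
        "resolution": resolution,
        "refinement_resolution": resolution,
        "resolution_build": resolution,
        "res_phase": (resolution if phase_full_resolution
                      else recommended_resolution),
        # Whichever is lower resolution, i.e. the larger number.
        "res_eval": max(resolution, 2.5),
    }

limits = default_resolution_limits([2.0, 2.8], recommended_resolution=2.6)
for name, value in limits.items():
    print(name, value)
```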

Output files from AutoSol

When you run AutoSol the output files will be in a subdirectory with your run number:

AutoSol_run_1_/

The key output files that are produced are:

  • A summary file listing the results of the run and the other files produced:
    AutoSol_summary.dat  # overall summary
    

  • A warnings file listing any warnings about the run
    AutoSol_warnings.dat  # any warnings
    

  • A file that lists all parameters and knowledge accumulated by the Wizard during the run (some parts are binary and are not printed)
    AutoSol_Facts.dat   # all Facts about the run
    

  • NCS information (if any)
    AutoSol_15.ncs_spec   # NCS information. The number is the solution number
    

  • Experimental phases and HL coefficients
    solve_15.mtz  # either solve or phaser depending on which was run
    phaser_15.mtz
    

  • Density-modified phases from RESOLVE
    current_cycle_map_coeffs.mtz  # map coefficients (density modified phases)
    resolve_15.mtz   # density-modified phases; same as above
    
    For either of these, use FP PHIM FOMM for F PHI FOM.

  • An mtz file for use in refinement
    exptl_fobs_phases_freeR_flags_15.mtz  # F Sigma HL coeffs, freeR-flags for refinement
    

  • Heavy atom sites in PDB format
    ha_15.pdb_formatted.pdb
    

  • Current preliminary model and evaluation of model
    current_cycle.pdb
    current_cycle_eval.log
    

How to run the AutoSol Wizard

Running the AutoSol Wizard is easy. From the command-line you can type:

phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5

The AutoSol Wizard will assume that w1.sca is a datafile (because it ends in .sca and is a file) and that seq.dat is a sequence file, that there are 2 heavy-atom sites, and that the heavy-atom is Se. The f_prime and f_double_prime values are set explicitly.

You can also specify each of these things directly:

phenix.autosol data=w1.sca seq_file=seq.dat sites=2 \
   atom_type=Se f_prime=-8 f_double_prime=4.5

You can specify many more parameters as well. See the list of keywords, defaults and descriptions at the end of this page and also general information about running Wizards at Running a Wizard from a GUI, the command-line, or a script for how to do this. Some of the most common parameters are:

sites=3     # 3 sites
sites_file=sites.pdb  # ha sites in PDB or fractional xyz format
atom_type=Se   # Se is the heavy-atom
seq_file=seq.dat   # sequence file (1-aa code, separate chains with >>>>)
quick=True  # try to find sites quickly
data=w1.sca  # input datafile
f_prime=-5  # f-prime value for SAD
f_double_prime=4.5  # f-double-prime value for SAD

Model viewing during model-building with the Coot-PHENIX interface

The AutoSol Wizard allows you to view the current best model that is produced by the automated model-building process. This capability is identical to the view/edit model procedure available in the AutoBuild Wizard. Normally you would use it just to view the model in AutoSol, and to view and edit a model in AutoBuild.

The PHENIX-Coot interface is accessible through the GUI and via the command-line. Using the GUI, when a model has been produced by the AutoSol Wizard, you can double-click the button on the GUI labelled View/edit files with coot to start Coot with your current map and model. If you are running from the command-line, you can open a new window and type:

phenix.autobuild coot 
which will do the same (provided the necessary map and model are ready).

When Coot has been loaded, your map and model will be displayed along with a PHENIX-Coot Interface window. If you want, you can edit your model and then save it, giving it back to PHENIX with the button labelled something like Save model as COMM/overall_best_coot_7.pdb. This button creates the indicated file and also tells PHENIX to look for this file and to try and include the contents of the model in the building process. In AutoSol, only the main-chain atoms of the model you save are considered, and the side-chains are ignored. Ligands and solvent in the model are ignored as well.

As the AutoSol Wizard continues to build new models and create new maps, you can update in the PHENIX-Coot Interface to the current best model and map with the button Update with current files from PHENIX.

Examples

SAD dataset

phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5
The sequence file is used to estimate the solvent content of the crystal and for model-building. Note that for a SAD dataset the values of f_prime and f_double_prime are not critical. If you are off by a factor of 2 on f_double_prime, the refined occupancies of heavy-atom sites might be 1/2 their correct values.

SAD dataset specifying solvent fraction

phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5 \
    solvent_fraction=0.45
This will force the solvent fraction to be 0.45. This illustrates a general feature of the Wizards: they will try to estimate values of parameters, but if you input them directly, they will use your input values.

SAD dataset without model-building

phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5 \
    build=False
This will carry out the usual structure solution, but will skip model-building.

SAD dataset, building RNA instead of protein

phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5 \
    chain_type=RNA
This will carry out the usual structure solution, but will build an RNA chain. For DNA, specify chain_type=DNA. You can only build one type of chain at a time in the AutoSol Wizard. To build protein and DNA, use the AutoBuild Wizard and run it first with chain_type=PROTEIN, then run it again specifying the protein model as input_lig_file_list=proteinmodel.pdb and with chain_type=DNA.

SAD dataset, selecting a particular dataset from an MTZ file

If you have an input MTZ file with more than one anomalous dataset, you can type something like:

phenix.autosol w1.mtz seq.dat 2 Se f_prime=-8 f_double_prime=4.5 \
labels='F SIGF DANO SIGDANO'
This will carry out the usual structure solution, but will choose the input data columns based on the labels: 'F SIGF DANO SIGDANO'. If you run the AutoSol Wizard with SAD data and an MTZ file containing more than one anomalous dataset and don't tell it which one to use, all possible values of labels are printed out for you so that you can just paste the one you want in.

You can also find out all the possible label strings to use by typing:

phenix.autosol display_labels=w1.mtz  # display all labels for w1.mtz

SAD dataset, reading heavy-atom sites from a PDB file written by phenix.hyss

phenix.autosol 11 Pb data=deriv.sca seq_file=seq.dat \
  sites_file=deriv_hyss_consensus_model.pdb 
This will carry out the usual structure solution process, but will read sites from deriv_hyss_consensus_model.pdb, try both hands, and carry on from there. If you know the hand of the substructure, you can fix it with have_hand=True.

MAD dataset

The inputs for a MAD dataset need to specify f_prime and f_double_prime for each wavelength. It also must be clear what datafile goes with which wavelength. If you input an MTZ file with multiple datasets, then the order of those datasets is assumed to be the same as the order of the wavelengths. You may want to either select particular datasets from your MTZ file (see below) or split such an MTZ file into separate files for each dataset if this does not work in the way you expect.

phenix.autosol  seq_file=seq.dat sites=2 atom_type=Se  \
peak.data=w1.sca   peak.f_prime=-8   peak.f_double_prime=4.5 \
infl.data=w2.sca   infl.f_prime=-9   infl.f_double_prime=1.9 \
high.data=w3.sca   high.f_prime=-5   high.f_double_prime=3.0 
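When the wavelengths are kept in a table or spreadsheet, the MAD command line above can be assembled programmatically so that each datafile stays paired with its own f_prime and f_double_prime. The Python sketch below only builds the string and does not run AutoSol. The function name and dictionary layout are hypothetical, while the keywords are the ones shown in the command above.

```python
def mad_command(wavelengths, seq_file, sites, atom_type):
    """Build a phenix.autosol MAD command from per-wavelength settings."""
    parts = ["phenix.autosol", "seq_file=%s" % seq_file,
             "sites=%s" % sites, "atom_type=%s" % atom_type]
    for name, settings in wavelengths.items():
        # One group of keywords per wavelength: peak.*, infl.*, high.*
        parts.append("%s.data=%s" % (name, settings["data"]))
        parts.append("%s.f_prime=%s" % (name, settings["f_prime"]))
        parts.append("%s.f_double_prime=%s" % (name, settings["f_double_prime"]))
    return " ".join(parts)

wavelengths = {
    "peak": {"data": "w1.sca", "f_prime": -8, "f_double_prime": 4.5},
    "infl": {"data": "w2.sca", "f_prime": -9, "f_double_prime": 1.9},
    "high": {"data": "w3.sca", "f_prime": -5, "f_double_prime": 3.0},
}

command = mad_command(wavelengths, "seq.dat", 2, "Se")
print(command)
```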

MAD dataset, selecting particular datasets from an MTZ file

This is similar to the case for SAD data. If you have an input MTZ file with more than one anomalous dataset, you can type something like:

phenix.autosol  seq_file=seq.dat sites=2 atom_type=Se  \
peak.data=all_data.mtz   peak.f_prime=-8   peak.f_double_prime=4.5 \
high.data=all_data.mtz   high.f_prime=-5   high.f_double_prime=3.0 \
peak.labels='Fpeak SIGFpeak DANOpeak SIGDANOpeak' \
high.labels='Fhigh SIGFhigh DANOhigh SIGDANOhigh' 
This will carry out the usual structure solution, but will choose the input peak data columns based on the labels: 'Fpeak SIGFpeak DANOpeak SIGDANOpeak', and the high data from the ones labelled 'Fhigh SIGFhigh DANOhigh SIGDANOhigh'.

As in the SAD case, you can find out all the possible label strings to use by typing:

phenix.autosol display_labels=all_data.mtz  # display all labels for all_data.mtz

SIR dataset

The standard inputs for an SIR dataset are the native and derivative, the sequence file, the heavy-atom type, and the number of sites, as well as whether to use anomalous differences (or just isomorphous differences):

phenix.autosol native.data=native.sca deriv.data=deriv.sca \
   deriv.atom_type=I deriv.sites=2 deriv.inano=inano
This will set the heavy-atom type to Iodine, look for 2 sites, and include anomalous differences.

SAD with more than one anomalously-scattering atom

You can tell the AutoSol wizard to look for more than one anomalously-scattering atom. Specify one atom type (Se) in the usual way. Then specify any additional ones like this if you are running AutoSol from the command line:

mad_ha_add_list="Br Pt"
mad_ha_add_f_prime_list=" -7 -10"
mad_ha_add_f_double_prime_list=" 4.2 12"
There must be the same number of entries in each of these three keyword lists. During phasing Phaser will try to add whichever atom types best fit the scattering from each new site. This option is available for SAD phasing only.

MIR dataset

An MIR dataset consists of more than one dataset. This cannot be readily expressed in the command-line inputs, but you can specify it easily with the PHENIX AutoSol GUI or with a script. In a script file you can say:

cell 93.796  79.849  43.108  90.000  90.000  90.00   # cell params
resolution 2.8                             #  Resolution 
expt_type       sir                        # MIR dataset is set of SIR datasets
input_seq_file sequence.dat
############## DATASET 1 ################
input_file_list  rt_rd_1.sca auki_rd_1.sca #  Native  and deriv 1
nat_der_list    Native  Au                 # identify files by ha type
inano_list      noinano inano              # say if ano diffs to be used 
n_ha_list       0    5                     # number of heavy-atoms 
run_list        start                      # read in datafiles for dataset
run_list        read_another_dataset       # about to start a new dataset here
############## DATASET 2 ################
input_file_list  rt_rd_1.sca hgki_rd_1.sca # Native and deriv 2
nat_der_list    Native Hg                  
inano_list      noinano inano              
n_ha_list       0    5  
#########################################

The script file carries out steps in the order that they are input. This allows us to read in one entire dataset, save it, then read in another one. The AutoSol Wizard will solve each dataset and then combine them and phase the combined dataset with SOLVE Bayesian correlated phasing, taking into account any correlations among the non-isomorphism and heavy-atom sites for the various derivatives.
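Because the script is order-dependent, it can be convenient to generate it from a list of derivatives. The Python sketch below writes the same keywords used in the script above, placing "run_list start" and "run_list read_another_dataset" after every dataset except the last. The function name and argument layout are hypothetical, and the sketch assumes every derivative uses anomalous differences, as in the example.

```python
def mir_script(native_file, derivatives, header_lines):
    """Return the text of an AutoSol script for a set of SIR datasets.

    derivatives: list of (datafile, heavy_atom_type, number_of_sites).
    header_lines: lines such as cell, resolution, expt_type, input_seq_file.
    """
    lines = list(header_lines)
    for index, (deriv_file, atom_type, n_sites) in enumerate(derivatives):
        lines.append("input_file_list %s %s" % (native_file, deriv_file))
        lines.append("nat_der_list Native %s" % atom_type)
        lines.append("inano_list noinano inano")
        lines.append("n_ha_list 0 %d" % n_sites)
        if index < len(derivatives) - 1:
            # Read this dataset in, then announce that another one follows.
            lines.append("run_list start")
            lines.append("run_list read_another_dataset")
    return "\n".join(lines) + "\n"

header = ["resolution 2.8", "expt_type sir", "input_seq_file sequence.dat"]
derivatives = [("auki_rd_1.sca", "Au", 5), ("hgki_rd_1.sca", "Hg", 5)]

script = mir_script("rt_rd_1.sca", derivatives, header)
print(script)
```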

SIR + SAD datasets

A combination of SIR and SAD datasets is almost the same as an MIR dataset in the AutoSol Wizard. You specify each dataset separately, and put "start" and "read_another_dataset" between the datasets:

cell 93.796  79.849  43.108  90.000  90.000  90.00   # cell params
resolution 2.8                             #  Resolution 
input_seq_file sequence.dat
############## DATASET 1 ################
expt_type       sir                        # our first dataset is SIR
input_file_list  rt_rd_1.sca auki_rd_1.sca #  Native  and deriv 1
nat_der_list    Native  Au                 # identify files by ha type
inano_list      noinano inano              # say if ano diffs to be used 
n_ha_list       0    5                     # number of heavy-atoms 
run_list        start                      # read in datafiles for dataset
run_list        read_another_dataset       # about to start a new dataset here
############## DATASET 2 ################
expt_type       sad                        # our second dataset is SAD
input_file_list  hgki_rd_1.sca             # anom diffs for SAD dataset
mad_ha_n  5                                # 5 sites
#########################################

The SIR and SAD datasets will be solved separately (but whichever one is solved first will use difference Fouriers or anomalous difference Fouriers to locate sites for the other). Then phases will be combined by addition of Hendrickson-Lattman coefficients and the combined phases will be density modified.
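Combining phases by adding Hendrickson-Lattman (HL) coefficients can be illustrated for a single reflection with the Python sketch below. It uses the standard HL form of the phase probability, P(phi) proportional to exp(A cos phi + B sin phi + C cos 2phi + D sin 2phi), evaluated on a 1-degree grid. The coefficient values and function names are invented, and this is not the code used by SOLVE or Phaser.

```python
import math

def combine_hl(*coefficient_sets):
    """Add HL coefficients (A, B, C, D) from independent phase sources."""
    return tuple(sum(values) for values in zip(*coefficient_sets))

def best_phase_and_fom(hl, step_degrees=1):
    """Return the centroid phase (degrees) and figure of merit for one reflection."""
    a, b, c, d = hl
    angles = [math.radians(k) for k in range(0, 360, step_degrees)]
    log_p = [a * math.cos(t) + b * math.sin(t)
             + c * math.cos(2 * t) + d * math.sin(2 * t) for t in angles]
    top = max(log_p)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(v - top) for v in log_p]
    total = sum(weights)
    re = sum(w * math.cos(t) for w, t in zip(weights, angles)) / total
    im = sum(w * math.sin(t) for w, t in zip(weights, angles)) / total
    return math.degrees(math.atan2(im, re)) % 360.0, math.hypot(re, im)

# Invented HL coefficients for the same reflection from two datasets.
sir_hl = (1.0, 0.0, 0.0, 0.0)
sad_hl = (1.0, 2.0, 0.0, 0.0)

combined = combine_hl(sir_hl, sad_hl)
phase, fom = best_phase_and_fom(combined)
print("combined HL:", combined)
print("phase = %.1f degrees, FOM = %.2f" % (phase, fom))
```

Adding the coefficients is equivalent to multiplying the two phase probability distributions, which is why the combined figure of merit in this example is higher than that of the first source alone.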

Possible Problems

General limitations

Specific limitations and problems

  • The size of the asymmetric unit in the SOLVE/RESOLVE portion of the AutoSol wizard is limited by the memory in your computer and the binaries used. The Wizard is supplied with regular-size ("", size=6), giant ("_giant", size=12), huge ("_huge", size=18) and extra_huge ("_extra_huge", size=36). Larger-size versions can be obtained on request.

  • The command-line version of AutoSol cannot be used for MIR or for combining multiple datasets. The script and GUI versions can be used instead for these cases.

  • The AutoSol Wizard can take a maximum of 6 derivatives for MIR.

  • The AutoSol Wizard can take most settings of most space groups; however, it can only use the hexagonal setting of rhombohedral space groups (e.g., #146 R3:H or #155 R32:H), and it cannot use space groups 114-119 (not found in macromolecular crystallography) even in the standard setting, due to difficulties with the use of asuset in the version of ccp4 libraries used in PHENIX for these settings and space groups.

Literature

Simple algorithm for a maximum-likelihood SAD function. A.J. McCoy, L.C. Storoni and R.J. Read. Acta Cryst. D60, 1220-1228 (2004)
Substructure search procedures for macromolecular structures. R.W. Grosse-Kunstleve and P.D. Adams. Acta Cryst. D59, 1966-1973 (2003)
MAD phasing: Bayesian estimates of FA. T.C. Terwilliger. Acta Cryst. D50, 11-16 (1994)

Additional information

List of all AutoSol keywords

------------------------------------------------------------------------------- 
Legend for parameter values:
          * means selected parameter (where multiple choices are available)
          False is No
          True is Yes
          None means not provided, not predefined, or left up to the program
          "%3d" is a Python style formatting descriptor
------------------------------------------------------------------------------- 
autosol
   write_run_directory_to_file= None Writes the full name of a run directory
                                to the specified file. This can be used as a
                                call-back to tell a script where the output is
                                going to go. (Command-line only)
   coot= None Set coot to True and optionally run=[run-number] to run Coot
         with the current model and map for run run-number. In some wizards
         (AutoBuild) you can edit the model and give it back to PHENIX to use
         as part of the model-building process. If you just say coot then the
         facts for the highest-numbered existing run will be shown.
         (Command-line only)
   ignore_blanks= None ignore_blanks allows you to have a command-line keyword
                  with a blank value like "input_lig_file_list="
   stop= None You can stop the current wizard with "stopwizard" or "stop". If
         you type "phenix.autobuild run=3 stop" then this will stop run 3 of
         autobuild. (Command-line only)
   display_facts= None Set display_facts to True and optionally
                  run=[run-number] to display the facts for run run-number. If
                  you just say display_facts then the facts for the
                  highest-numbered existing run will be shown. (Command-line
                  only)
   display_summary= None Set display_summary to True and optionally
                    run=[run-number] to show the summary for run run-number.
                    If you just say display_summary then the summary for the
                    highest-numbered existing run will be shown. (Command-line
                    only)
   carry_on= None Set carry_on to True to carry on with highest-numbered run
             from where you left off. (Command-line only)
   run= None Set run to n to continue with run n where you left off.
        (Command-line only)
   copy_run= None Set copy_run to n to copy run n to a new run and continue
             where you left off. (Command-line only)
   display_runs= None List all runs for this wizard. (Command-line only)
   delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)
   display_labels= None display_labels=test.mtz will list all the labels that
                   identify data in test.mtz. You can use the label strings
                   that are produced in AutoSol to identify which data to use
                   from a datafile like this:
                     peak.data="F+ SIGF+ F- SIGF-"
                   (the entire string in quotes counts here). You can use the
                   individual labels from these strings as identifiers for
                   data columns in AutoSol and AutoBuild like this:
                     input_refinement_labels="FP SIGFP FreeR_flags"
                   (each individual label counts).
   dry_run= False Just read in and check parameter names
   sites= None Number of heavy-atom sites. This is an alias for the keyword
          mad_ha_n. (Command-line only)
   sites_file= None PDB or plain-text file with ha sites. This is an alias for
               the keyword ha_sites_file. (Command-line only)
   atom_type= None Anomalously-scattering atom type. This is an alias for the
              keyword mad_ha_type. (Command-line only)
   seq_file= Auto Sequence file. This is an alias for the keyword
             input_seq_file. (Command-line only)
   quick= *None True False Run everything quickly (thoroughness=quick)
          (Command-line only)
   data= None Datafile. For command-line input it is easiest if each
         wavelength of data is in a separate data file with obvious data
         columns. File types that are easy to read include Scalepack sca files,
         CNS hkl files, mtz files with just one wavelength of data, or just
         native or just derivative. In this case the Wizard can read your data
         without further information. If you have a datafile with many
         columns, you can use the "labels" keyword to specify which data
         columns to read. (It may be easier in some cases to use the GUI or to
         split it with phenix.reflection_file_converter first, however.)
         (Command-line only)
   labels= None Specification string for data labels (Command-line only). To
           find out what the appropriate strings are, type
           "phenix.autosol display_labels=your-datafile-here.mtz"
   f_prime= None F-prime value for any wavelength. (Command-line only)
   f_double_prime= None F-doubleprime value for any wavelength. (Command-line
                   only)
   acceptable_quality= 40.0 You can specify the minimum overall quality of a
                       model (as defined by overall_score_method) to be
                       considered acceptable
   acceptable_secondary_structure_cc= 0.35 You can specify the minimum
                                      correlation of density from a secondary
                                      structure model to be considered
                                      acceptable
   add_sidechains= *Yes No True False Add side chains on to main-chain in
                   Textal model-building. This requires a sequence file
   b_overall= None If an anisotropy correction is applied, you can choose to
              set the overall B of the data to a specific value with
              b_overall. See also "correct_aniso"
   background= *Yes No True False When you specify nproc=nn, you can run the
               jobs in background (default if nproc is greater than 1) or
               foreground (default if nproc=1). If you set run_command=qsub
               (or otherwise submit to a batch queue), then you should set
               background=False, so that the batch queue can keep track of
               your runs. There is no need to use background=True in this case
               because all the runs go as controlled by your batch system. If
               you use run_command=csh (or similar, csh is default) then
               normally you will use background=True so that all the jobs run
               simultaneously.
   build= *Yes No True False Build model after density modification?
   build_type= RESOLVE_AND_TEXTAL *RESOLVE TEXTAL You can choose to build
               models with RESOLVE and TEXTAL or either one, and how many
               different models to build with RESOLVE. The more you build, the
               more likely to get a complete model. Note that rebuild_in_place
               can only be carried out with RESOLVE model-building
   capra= *Yes No True False CAPRA is used to place CA atoms
   cell= 0.0 0.0 0.0 0.0 0.0 0.0 Enter cell parameter a b c alpha beta gamma
   chain_type= *Auto PROTEIN DNA RNA You can specify whether to build protein,
               DNA, or RNA chains. At present you can only build one of these
               in a single run. If you have both DNA and protein, build one
               first, then run AutoBuild again, supplying the prebuilt model
               in the "input_lig_file_list" and build the other. NOTE: default
               for this keyword is Auto, which means "carry out normal process
               to guess this keyword". The process is to look at the sequence
               file and/or input pdb file to see what the chain type is. If
               there is more than one type, the type with the larger number
               of residues is guessed. If you want to force the chain_type,
               then set it to PROTEIN, RNA, or DNA.
   change_sg= Yes *No True False You can change the space group. In AutoSol
              the Wizard will use ImportRawData and let you specify the sg and
              cell. In AutoMR the wizard will give you an entry form to
              specify them. NOTE: This only applies when reading in new
              datasets. It does nothing when changed after datasets are read
              in.
   cif_def_file_list= None You can enter any number of CIF definition files.
                      These are normally used to tell phenix.refine about the
                      geometry of a ligand or unusual residue. You usually
                      will use these in combination with "PDB file with
                      metals/ligands" (keyword "input_lig_file_list" ) which
                      allows you to attach the contents of any PDB file you
                      like to your model just before it gets refined. You can
                      use phenix.elbow to generate these if you do not have a
                      CIF file and one is requested by phenix.refine
   clean_up= Yes *No True False At the end of the entire run the TEMP
             directories will be removed if clean_up is True. The default is
             No, keep these directories. If you want to remove them after your
             run is finished use a command like "phenix.autobuild run=1
             clean_up=True"
   coot_name= coot If your version of coot is called something else, then you
              can specify that here.
   correct_aniso= *Auto Yes No True False Choose if you want to apply a
                  correction for anisotropy to the data. Yes means always
                  apply correction, No means never apply it, Auto means apply
                  it if the data is severely anisotropic (recommended=Auto).
                  If you set correct_aniso=Auto then if the range of
                  anisotropic B-factors is greater than
                  delta_b_for_auto_correct_aniso and the ratio of the largest
                  to the smallest is less than ratio_b_for_auto_correct_aniso
                  then the correction will be applied. Anisotropy correction
                  will be applied to all input data before scaling. The
                  default overall B factor will be the minimum of the
                  b-factors in any direction of the original data. To set this
                  to another value, use "b_overall"
   create_scoring_table= Yes *No True False Choose whether you want a scoring
                         table for solutions. A scoring table is slower but
                         better.
   d_max_textal= 1000.0 This low-resolution limit is only used for Textal
                 model-building
   d_min_textal= 2.8 Textal has an optimal high-resolution limit of 2.8 A.
                 This limit is only used for Textal model-building
   data_quality= *moderate strong weak The defaults are set for you depending
                 on the anticipated data quality. You can choose "moderate" if
                 you are unsure.
   debug= Yes *No True False You can have the wizard stop with error messages
          about the code if you use debug. NOTE: you cannot use Pause with
          debug.
   delta_b_for_auto_correct_aniso= 20.0 Choose what range of aniso B values is
                                   so big that you want to correct for
                                   anisotropy by default. Both ratio_b and
                                   delta_b must be large to correct. See also
                                   ratio_b_for_auto_correct_aniso and
                                   "correct_aniso", which overrides this
                                   default if set to "Yes"
   desired_coverage= 0.8 Choose what probability you want to have that the
                     correct solution is in your current list of top
                     solutions. A good value is 0.80. If you set a low value
                     (0.01) then only one solution will be kept at any time;
                     if you set a high value, then many solutions will be kept
                     (and it will take longer).
   do_madbst= *Yes No True False Choose whether you want to skip FA
              calculation (speeds it up)
   expt_type= *Auto mad sir sad Experiment type (MAD SIR SAD) NOTE: Please
              treat MIR experiments as a set of SIR experiments. NOTE: The
              default for this keyword is Auto which means "carry out normal
              process to guess this keyword". If you have a single file, then
              it is assumed to be SAD. If you specify native.data and
              deriv.data it is SIR; if you specify peak.data and infl.data it
              is MAD. If the Wizard does not guess correctly, you can set it
              with this keyword.
   extra_verbose= Yes *No True False Facts and possible commands will be
                  printed every cycle if Yes
   f_doubleprime_list= None Enter f" for the heavy-atom for this dataset
   f_prime_list= None Enter f' for the heavy-atom for this dataset
   find_ncs= Auto *Yes No True False This script normally deduces ncs
             information from the NCS in chains of models that are built
             during iterative model-building. The update is done each cycle in
             which an improved model is obtained. Say No to skip this. See
             also "input_ncs_file" which can be used to specify NCS at the
             start of the process. If find_ncs="No" then only this starting
             NCS will be used and it will not be updated. You can use
             find_ncs="No" to specify exactly what residues will be used in NCS
             refinement and exactly what NCS operators to use in density
             modification. You can use the function
             $PHENIX/phenix/phenix/command_line/simple_ncs_from_pdb.py to help
             you set up an input_ncs_file that has your specifications in it.
   fit_loops= *Yes No True False You can fit loops automatically if sequence
              alignment has been done.
   fix_xyz= Yes *No True False You can choose to not refine coordinates, and
            instead to fix them to the values found by the heavy-atom search.
   fix_xyz_after_denmod= Yes *No True False When sites are found after density
                         modification you can choose whether you want to fix
                         the coordinates to the values found in that map.
   fixscattfactors= *Yes No True False For SOLVE phasing and MAD data you can
                    choose whether scattering factors are to be fixed by
                    choosing 'Yes' to fix them or 'No' to refine them.
                    Normally choose 'Yes' (fix) if the data are weak and 'No'
                    (refine) if the data are strong.
   group_ca_length= 4 In resolve building you can specify how short a fragment
                    to keep. Normally 4 or 5 residues should be the minimum.
   group_labels_list= None For command-line and script running of AutoSol, you
                      may wish to use keywords to specify which set of data
                      columns to be used from an MTZ or other file type with
                      multiple datasets. (From the GUI, it is easy because you
                      are prompted with the column labels). You can do this by
                      specifying a string that identifies which dataset to
                      include. All allowed values of this identification
                      string will be written out any time AutoSol is run on
                      this dataset like this:
                        NOTE: To specify a particular set of data you can
                        specify one of the following (this example is for MAD
                        data, specifying data for peak wavelength): ...:
                        peak.labels='F SIGF DANO SIGDANO'
                        peak.labels='F(+) SIGF(+) F(-) SIGF(-)'
                      You can then use one of the above commands on the
                      command-line to identify the dataset of interest. If you
                      want to use a script instead, you can specify N files in
                      your input_data_file_list, and then specify N values for
                      group_labels_list like this:
                        group_labels_list 'F,SIGF,DANO,SIGDANO'
                        'F(+),SIGF(+),F(-),SIGF(-)'
                      This will take 'F,SIGF,DANO,SIGDANO' as the data for
                      datafile 1 and 'F(+),SIGF(+),F(-),SIGF(-)' for datafile
                      2. You can identify one dataset from each input file in
                      this way. If you want more than one, then please use
                      phenix.reflection_file_converter to split your input
                      file, or else use the GUI version of AutoSol in which
                      you can select any subset of the data that you wish.
   group_length= 2 In resolve building you can specify how many fragments must
                 be joined to make a connected group that is kept. Normally 2
                 fragments should be the minimum.
   ha_iteration= Yes *No True False Choose whether you want to iterate the
                 heavy-atom search. With iteration, sites are found with HYSS,
                 then used to phase and carry out quick density-modification,
                 then difference Fourier is used to find sites again and
                 improve their accuracy.
   ha_sites_file= None Input sites file... with xyz in fractional coordinates
                  or a PDB file with coordinates NOTE: This file is optional
                  if you specify a partial model file
   have_hand= Yes *No True False Normally you will not know the hand of the
              heavy-atom substructure, so have_hand=False. However if you do
              know it (you got the sites from a difference Fourier or you know
              the answer another way) you can specify that the hand is known.
   helices_strands_only= *Yes No True False You can choose to use a quick
                         model-building method that only builds secondary
                         structure. At low resolution this may be both quicker
                         and more accurate than trying to build the entire
                         structure. Normally you should choose 'Yes' and use
                         the quick model-building. Then when your structure is
                         solved by AutoSol, go on to AutoBuild and build a
                         more complete model.
   hklperfect= None Enter an mtz file with idealized coefficients for map. This
               will be compared with all maps calculated during structure
               solution
   hl_in_resolve= Yes *No True False AutoSol normally does not write out HL
                  coefficients in the resolve.mtz file with density-modified
                  phases. You can turn them on with hl_in_resolve=True
   hyss_enable_early_termination= *Yes No True False You can specify whether
                                  to stop HYSS as soon as it finds a
                                  convincing solution (Yes, default) or to
                                  keep trying...
   hyss_general_positions_only= *Yes No True False Select Yes if you want HYSS
                                only to consider general positions and ignore
                                sites on special positions. This is
                                appropriate for SeMet or S-Met solutions, not
                                so appropriate for heavy-atom soaks
   hyss_min_distance= 3.5 Enter the minimum distance between heavy-atom sites
                      to keep them in HYSS
   hyss_n_fragments= 3 Enter the number of fragments in HYSS
   hyss_n_patterson_vectors= 33 Enter the number of Patterson vectors to
                             consider in HYSS
   hyss_random_seed= 792341 Enter an integer as random seed for HYSS
   i_ran_seed= 588459 Random seed (positive integer) for model-building and
               simulated annealing refinement
   id_scale_ref= None By default the datafile with the highest resolution is
                 used for the first step in scaling of MAD data. You can
                 choose to use any of the datafiles in your MAD dataset.
   ikeepflag= 1 You can choose to keep all reflections in merging steps. This
              is separate from rejecting reflections with high iso or ano
              diffs. Default=1 (keep them)
   inano_list= None Choose 'inano' for including anomalous differences and
               'noinano' not to include them and 'anoonly' for just anomalous
               differences (no isomorphous differences)
   input_compare_file= None If you are rebuilding a model or already think you
                       know what the model should be, you can include a
                       comparison file in rebuilding. The model is not used
                       for anything except to write out information on
                       coordinate differences in the output log files. NOTE:
                       this feature does not always work correctly.
   input_file_list= None Input data files: Any standard format is fine. If all
                    files are Scalepack premerged or all are Scalepack
                    unmerged original index then they will be used as is. In
                    all other cases all files are converted next to Scalepack
                    premerged.
   input_ha_file= None If the flag "truncate_ha_sites_in_resolve" is set then
                  density at sites specified with input_ha_file is truncated
                  to improve the density modification procedure.
   input_partpdb_file= None You can enter a PDB file (usually from molecular
                       replacement) for use in identifying heavy-atom sites
                       and phasing. NOTE 1: This procedure works best if the
                       model is refined. NOTE 2: This file is only used in SAD
                       phasing with Phaser on a single dataset. In all other
                       cases it is ignored. NOTE 3: The output phases in
                       phaser_xx.mtz will contain both SAD and model
                       information. They are not completely suitable for use
                       with AutoBuild or other iterative model-building
                       procedures because the phases are not entirely
                       experimental (but they may work).
   input_phase_file= None MTZ data file with FC PHIC or equivalent to use for
                     finding heavy-atom sites with difference Fourier methods.
   input_phase_labels= None Labels for FC and PHIC for data file with FC PHIC
                       or equivalent to use for finding heavy-atom sites with
                       difference Fourier methods.
   input_refinement_file= None Data file to use for refinement. The data in
                          this file should not be corrected for anisotropy. It
                          will be combined with experimental phase information
                          for refinement. If you leave this blank, then the
                          output of phasing will be used in refinement (see
                          below). If no anisotropy correction is applied to
                          the data you do not need to specify a datafile for
                          refinement. If an anisotropy correction is applied
                          to the data files, then you must enter a datafile
                          for refinement if you want to refine your model.
                          (See "correct_aniso" for specifying whether an
                          anisotropy correction is applied. In most cases it
                          is not.) If an anisotropy correction is applied and
                          no refinement datafile is supplied, then no
                          refinement will be carried out in the model-building
                          step. You can choose any of your datafiles to be the
                          refinement file, or a native that is not part of the
                          datasets for structure solution. If there is more
                          than one dataset you will be asked each time for a
                          refinement file, but only the last one will be used.
                          Any standard format is fine; normally only F and
                          sigF will be used. Bijvoet pairs and duplicates will
                          be averaged. If an mtz file is provided then a free
                          R flag can be read in as well. If you do not provide
                          a refinement file then the structure factors from
                          the phasing step will be used in refinement. This is
                          normally satisfactory for SAD data and MIR data. For
                          MAD data you may wish to supply a refinement file
                          because the structure factors from phasing are a
                          combination of data from different wavelengths of
                          data. It is better if you choose your best
                          wavelength of data for refinement.
   input_refinement_labels= None Labels for input refinement file columns (FP
                            SIGFP FreeR_flag)
   input_seq_file= Auto Enter name of file with 1-letter code of protein
                   sequence. NOTES:
                   1. Lines starting with >>> are ignored and separate chains.
                   2. FASTA format is fine.
                   3. If there are multiple copies of a chain, just enter one
                      copy.
                   4. If you enter a PDB file for rebuilding and it has the
                      sequence you want, then the sequence file is not
                      necessary.
                   NOTE: You can also enter the name of a PDB file that
                   contains SEQRES records, and the sequence from the SEQRES
                   records will be read, written to
                   seq_from_seqres_records.dat, and used as your input
                   sequence. NOTE: for AutoBuild you can specify
                   start_chains_list on the first line of your sequence file:
                   >>> start_chains_list 23 11 5 NOTE: default for
                   this keyword is Auto, which means "carry out normal process
                   to guess this keyword". This means if you specify
                   "after_autosol" in AutoBuild, AutoBuild will automatically
                   take the value from AutoSol. If you do not want this to
                   happen, you can specify None which means "No file"
   loop_cc_min= 0.4 You can specify the minimum correlation of density from a
                loop with the map.
   mad_ha_add_f_double_prime_list= None F-double_prime values of additional
                                   heavy-atom types. You must specify the same
                                   number of entries of
                                   mad_ha_add_f_double_prime_list as you do
                                   for mad_ha_add_f_prime_list and for
                                   mad_ha_add_list.
   mad_ha_add_f_prime_list= None F-prime values of additional heavy-atom
                            types. You must specify the same number of entries
                            of mad_ha_add_f_prime_list as you do for
                            mad_ha_add_f_double_prime_list and for
                            mad_ha_add_list.
   mad_ha_add_list= None You can specify heavy atom types in addition to the
                    one you named in mad_ha_type. The heavy-atoms found in
                    initial HySS searches will be given the type of
                    mad_ha_type, and Phaser (if used for phasing) will try to
                    find additional heavy atoms of both the type mad_ha_type
                    and any listed in mad_ha_add_list. You must also specify
                    the same number of mad_ha_add_f_prime_list entries and of
                    mad_ha_add_f_double_prime_list entries.
   mad_ha_n= None Number of heavy atoms (anomalously-scattering atoms) in the
             au
   mad_ha_type= Se Enter the anomalously-scattering or heavy atom type. For
                example, Se or Au. NOTE: if you want Phaser to add additional
                heavy-atoms of other types, you can specify them with
                mad_ha_add_list.
   mask_cycles= 5 Number of mask cycles in density modification (5 is usual
                for thorough density modification)
   mask_type= *histograms probability wang Choose method for obtaining
              probability that a point is in the protein vs solvent region.
              Default is "histograms". If you have a SAD dataset with a heavy
              atom such as Pt or Au then you may wish to choose "wang" because
              the histogram method is sensitive to very high peaks. Options
              are:
                histograms: compare local rms of map and local skew of map
                  to values from a model map and estimate probabilities. This
                  one is usually the best.
                probability: compare local rms of map to distribution for all
                  points in this map and estimate probabilities. In a few
                  cases this one is much better than histograms.
                wang: take points with highest local rms and define as
                  protein.
   max_cc_extra_unique_solutions= 0.5 Specify the maximum value of CC between
                                  experimental maps for two solutions to
                                  consider them substantially different.
                                  Solutions that are within the range for
                                  consideration based on desired_coverage, but
                                  are outside of the number of allowed
                                  max_choices, will be considered, up to
                                  max_extra_unique_solutions, if they have a
                                  correlation of no more than
                                  max_cc_extra_unique_solutions with all other
                                  solutions to be tested.
   max_choices= 3 Number of choices for solutions to put on screen
   max_composite_choices= 8 Number of choices for composite solutions to
                          consider
   max_extra_unique_solutions= 2 Specify the maximum number of solutions to
                               consider based on their uniqueness as well as
                               their high scores. Solutions that are within
                               the range for consideration based on
                               desired_coverage, but are outside of the number
                               of allowed max_choices, will be considered, up
                               to max_extra_unique_solutions, if they have a
                               correlation of no more than
                               max_cc_extra_unique_solutions with all other
                               solutions to be tested.
   max_ha_iterations= 2 Number of iterations of difference Fouriers in
                      searching for heavy-atom sites
   max_single_sites= 5 In sites_from_denmod a core set of sites that are
                     strong is identified. If the hand of the solution is
                     known then additional sites are added all at once up to
                     the expected number of sites. Otherwise sites are added
                     one at a time, up to a maximum number of tries of
                     max_single_sites
   max_wait_time= 100.0 You can specify the length of time (seconds) to wait
                  when testing the run_command. If you have a cluster where
                  jobs do not start right away you may need a longer time to
                  wait.
   min_fom= 0.05 Minimum fom of a solution to keep it at all
   min_fom_for_dm= 0.2 Minimum fom of a solution to density modify (otherwise
                   just copy over phases). This is useful in cases where the
                   phasing is so weak that density modification does nothing
                   or makes the phases worse.
   min_hyss_cc= 0.05 Minimum CC of a heavy-atom solution in HYSS to keep it at
                all
   minimum_improvement= 0.0 Minimum improvement in score to continue ha
                        iteration
   minor_cycles= 10 Number of minor cycles in density modification for each
                 mask cycle (10 is usual for thorough density modification)
   n_cycle_build= 3 Choose number of cycles (3). This does not apply if TEXTAL
                  is selected for build_type
   n_ha_list= None Enter a guess of number of HA sites
   n_random= 6 Number of random solutions to generate when setting up scoring
             table
   n_random_frag= 0 In resolve building you can randomize each fragment
                  slightly so as to generate more possibilities for tracing
                  based on extending it.
   n_random_loop= 3 Number of randomized tries from each end for building
                  loops. If 0, then one try. If N, then N additional tries with
                  randomization based on rms_random_loop.
   nat_der_list= None Enter 'Native' or a heavy-atom symbol (Pt, Se)
   nbatch= 1 You can specify the number of processors to use (nproc) and the
           number of batches to divide the data into for parallel jobs.
           Normally you will set nproc to the number of processors available
           and leave nbatch alone. If you leave nbatch as None it will be set
           automatically, with a value depending on the Wizard. This is
           recommended. The value of nbatch can affect the results that you
           get, as the jobs are not split into exact replicates, but are
           rather run with different random numbers. If you want to get the
           same results, keep the same value of nbatch.
   ncs_copies= None Number of copies of the molecule in the au (note: only one
               type of molecule allowed at present)
   ncs_refine_coord_sigma_from_rmsd= Yes *No True False You can choose to use
                                     the current NCS rmsd as the value of the
                                     sigma for NCS restraints. See also
                                     ncs_refine_coord_sigma_from_rmsd_ratio
   ncs_refine_coord_sigma_from_rmsd_ratio= 1.0 You can choose to multiply the
                                           current NCS rmsd by this value
                                           before using it as the sigma for
                                           NCS restraints. See also
                                           ncs_refine_coord_sigma_from_rmsd
   ncycle_refine= 3 Choose number of refinement cycles (3)
   nproc= 1 You can specify the number of processors to use (nproc) and the
          number of batches to divide the data into for parallel jobs.
          Normally you will set nproc to the number of processors available
          and leave nbatch alone. If you leave nbatch as None it will be set
          automatically, with a value depending on the Wizard. This is
          recommended. The value of nbatch can affect the results that you
          get, as the jobs are not split into exact replicates, but are rather
          run with different random numbers. If you want to get the same
          results, keep the same value of nbatch.
   number_of_builds= 2 Number of different solutions to build models for
   number_of_models= 3 This parameter lets you choose how many initial models
                     to build with RESOLVE within a single build cycle. This
                     parameter is now superseded by number_of_parallel_models,
                     which sets the number of models (but now entire build
                     cycles) to carry out in parallel. A zero means set it
                     automatically. That is what you normally should use. The
                     number_of_models is by default set to 1 and
                     number_of_parallel_models is set to the value of nbatch
                     (typically 4).
   number_of_solutions_to_display= 1 Number of solutions to put on screen and
                                   to write out
   offsets_list= 53 7 23 You can specify an offset for the orientation of the
                 helix and strand templates in building. This is used in
                 generating different starting models.
   optimize_ncs= *Yes No True False This script normally deduces ncs
                 information from the NCS in chains of models that are built
                 during iterative model-building. Optimize NCS adds a step to
                 try and make the molecule formed by NCS as compact as
                 possible, without losing any point-group symmetry.
   ordered_solvent_low_resolution= None You can choose what resolution cutoff
                                   to use for placing ordered solvent in
                                   phenix.refine. If the resolution of
                                   refinement is greater than this cutoff,
                                   then no ordered solvent will be placed,
                                   even if
                                   refinement.main.ordered_solvent=True.
   overall_score_method= *BAYES-CC CC-SCORE Z-SCORE You have 3 choices for an
                         overall scoring method: (1) Sum of individual
                         Z-scores (Z-SCORE) (2) Estimated CC of map to perfect
                         model (CC-SCORE) (3) Bayesian estimate of CC of map
                         to perfect model (BAYES-CC) You can specify which
                         scoring criteria to include with score_type_list
                         (default is CC RFACTOR SKEW for CC-SCORE, and CC
                         RFACTOR SKEW FOM for BAYES-CC and Z-SCORE.
                         Additionally, if NCS is present, NCS_OVERLAP is used
                         by default in the Z-SCORE method).
   overallscale= Yes *No True False You can choose to have only an overall
                 scale factor for this dataset (no local scaling applied). Use
                 this if your data is already fully scaled.
   partpdb_rms= 1.0
   perfect_labels= None Labels for input data columns for hklperfect. Typical
                   value: "FP PHIC FOM"
   phase_full_resolution= *Yes No True False You can choose to use the full
                          resolution of the data in phasing, instead of using
                          the recommended_resolution. This is always a good
                          idea with Phaser phases.
   phaser_completion= *Yes No True False You can choose to use phaser
                      log-likelihood gradients to complete your heavy-atom
                      sites. This can be used with or without the ha_iteration
                      option.
   phasing_method= SOLVE *PHASER You can choose to phase with SOLVE or with
                   Phaser. (Only applies to SAD phasing at present)
   place_waters= *Yes No True False You can choose whether phenix.refine
                 automatically places ordered solvent (waters) during the
                 refinement process.
   quick_build= Yes *No True False Choose whether you want to go for quick
                model-building (speeds it up, and for poor maps, is sometimes
                better)
   r_switch= 0.4 R-value criterion for deciding whether to use R-value or
             residues built. A good value is 0.40
   ratio_b_for_auto_correct_aniso= 1.5 Choose the ratio of aniso B values above
                                   which you want to correct for anisotropy by
                                   default. Both ratio_b and delta_b must be
                                   large to correct. See also
                                   delta_b_for_auto_correct_aniso. See also
                                   "correct_aniso", which overrides this
                                   default if set to "Yes"
   ratio_out= 3.0 You can choose the ratio of del ano or del iso to the rms in
              the shell for rejection of a reflection. Default = 4.
   read_sites= Yes *No True False Choose if you want to enter ha sites from a
               file. The name of the file will be requested after scaling is
               finished. The file can have sites in fractional coordinates or
               be a PDB file.
   rebuild_side_chains= Yes *No True False You can choose to replace side
                        chains (with extend_only) before rebuilding the model
                        (not normally used)
   refine= Yes *No True False This script normally refines the model during
           building. Say No to skip refinement
   refine_b= *Yes No True False You can choose whether phenix.refine is to
             refine individual atomic displacement parameters (B values)
   refine_eff_file_list= None You can enter any number of refinement parameter
                         files. These are normally used to tell phenix.refine
                         defaults to apply, as well as creating specialized
                         definitions such as unusual amino acid residues and
                         linkages. These parameters override the normal
                         phenix.refine defaults. They themselves can be
                         overridden by parameters set by the Wizard and by
                         you, controlling the Wizard. NOTE: Any parameters set
                         by AutoBuild directly (such as
                         number_of_macro_cycles, high_resolution, etc...) will
                         not be taken from this parameters file. This is
                         useful only for adding extra parameters not normally
                         set by AutoBuild.
   refine_se_occ= *Yes No True False You can choose to refine the occupancy of
                  SE atoms in a SEMET structure (default=Yes). This only
                  applies if semet=true
   refine_with_ncs= *Yes No True False This script can allow phenix.refine to
                    automatically identify NCS and use it in refinement. NOTE:
                    ncs refinement and placing waters automatically are
                    mutually exclusive at present.
   refinement_resolution= 0.0 Enter the high-resolution limit for refinement
                          only. This high-resolution limit can be different
                          than the high-resolution limit for other steps. The
                          default ("None" or 0.0) is to use the overall
                          high-resolution limit for this run (as set by
                          'resolution')
   require_nat= *Yes No True False Choose yes to skip any reflection with no
                native (for SIR) or no data (MAD/SAD) or where anom difference
                is very large. This keyword (default=Yes) allows the routines
                in SOLVE to remove reflections with an implausibly large
                anomalous difference (greater than ratio_out times the rms
                anomalous difference).
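
The rejection test described for require_nat and ratio_out (drop a reflection whose anomalous difference exceeds ratio_out times the rms in its shell) can be sketched in Python. This is a minimal sketch, not SOLVE's code: the function name reject_outliers and the sample differences are hypothetical, and computing the rms over all reflections in the shell, outliers included, is an assumption.

```python
import math


def reject_outliers(diffs, ratio_out=3.0):
    """Split anomalous differences from one resolution shell into kept/rejected."""
    # Assumption: the rms is taken over every reflection in the shell.
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    kept = [d for d in diffs if abs(d) <= ratio_out * rms]
    rejected = [d for d in diffs if abs(d) > ratio_out * rms]
    return rms, kept, rejected


# Hypothetical shell: twelve ordinary differences and one implausibly large one.
shell = [1.0, -2.0, 1.5, -1.0, 2.0, -1.5, 1.0, -2.0, 1.5, -1.0, 2.0, -1.5, 40.0]
rms, kept, rejected = reject_outliers(shell)
```

With the rms computed this way, a single reflection can only exceed the cutoff when the shell holds more than ratio_out squared reflections, so very sparse shells never reject anything.
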
   res_eval= 0.0 Resolution for running resolve evaluation (usually 2.5 A)
   res_hyss= None Resolution for running HYSS (usually 3.5 A is fine)
   res_phase= 0.0 Enter the high-resolution limit for phasing
   residues= None Number of amino acid residues in the au (or equivalent)
   resolution= 0.0 High-resolution limit. Used as resolution limit for density
               modification and as general default high-resolution limit. If
               resolution_build or refinement_resolution are set then they
               override this for model-building or refinement. If
               overall_resolution is set then data beyond that resolution is
               ignored completely.
   resolution_build= 0.0 Enter the high-resolution limit for model-building.
                     If 0.0, the value of resolution is used as a default.
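
The fallback between the step-specific limits (resolution_build, refinement_resolution) and the general resolution keyword can be sketched in Python. The helper name effective_resolution and the sample values are hypothetical; the sketch covers only the "None or 0.0 means not set" rule stated in the text and ignores overall_resolution.

```python
def effective_resolution(step_limit, overall_limit):
    """Return the high-resolution limit to use for one step.

    A step-specific limit of None or 0.0 means "not set", so the overall
    'resolution' value is used instead.
    """
    if step_limit is None or step_limit == 0.0:
        return overall_limit
    return step_limit


resolution = 2.5                                        # hypothetical overall limit
build_limit = effective_resolution(0.0, resolution)     # not set: falls back
refine_limit = effective_resolution(3.0, resolution)    # explicit value is used
```
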
   resolve_command_list= None Commands for resolve. One per line in the form:
                         keyword value (value can be optional). Examples:
                         coarse_grid
                         resolution 200 2.0
                         hklin test.mtz
                         NOTE: for command-line usage you need to enclose the
                         whole set of commands in double quotes (") and each
                         individual command in single quotes (') like this:
                         resolve_command_list="'no_build' 'b_overall 23' "
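
The quoting convention for resolve_command_list on the command line can be produced with a small Python helper. This is a sketch only: format_resolve_command_list is a hypothetical name, not part of PHENIX, and the trailing space before the closing double quote simply mirrors the example in the text.

```python
def format_resolve_command_list(commands):
    """Build a command-line argument for resolve_command_list.

    Each command goes in single quotes; the whole set goes in double quotes.
    """
    for command in commands:
        if "'" in command or '"' in command:
            raise ValueError("commands must not contain quote characters")
    quoted = " ".join("'%s'" % command for command in commands)
    # Trailing space before the closing quote mirrors the documented example.
    return 'resolve_command_list="%s "' % quoted


arg = format_resolve_command_list(["no_build", "b_overall 23"])
```
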
   resolve_size= _giant _huge extra_huge *None Size for solve/resolve
                 ("","_giant","_huge","extra_huge")
   retrace_before_build= Yes *No True False You can choose to retrace your
                         model n_mini times and use a map based on these
                         retraced models to start off model-building. This is
                         the default for rebuilding models if you are not
                         using rebuild_in_place. You can also specify
                         n_iter_rebuild, the number of cycles of
                         retrace-density-modify-build before starting the main
                         build.
   rms_random_frag= None Rms random position change added to residues on ends
                    of fragments when extending them. If you enter a negative
                    number, defaults will be used.
   rms_random_loop= None Rms random position change added to residues on ends
                    of loops in tries for building loops. If you enter a
                    negative number, defaults will be used.
   run_command= csh When you specify nproc=nn, you can run the subprocesses as
                jobs in background with csh (default) or submit them to a
                queue with the command of your choice (e.g., qsub). If you
                have a multi-processor machine, use csh. If you have a
                cluster, use qsub or the equivalent command for your system.
                NOTE: If you set run_command=qsub (or otherwise submit to a
                batch queue), then you should set background=False, so that
                the batch queue can keep track of your runs. There is no need
                to use background=True in this case because all the runs go as
                controlled by your batch system. If you use run_command=csh
                (or similar, csh is default) then normally you will use
                background=True so that all the jobs run simultaneously.
   score_individual_offset_list= -0.161239009144 0.916006476685
                                 0.0294415756257 0.0 0.0 Offsets for
                                 individual scores in CC-scoring. Each score
                                 will be multiplied by the
                                 score_individual_scale_list value, then
                                 score_individual_offset_list value is added,
                                 to estimate the CC**2 value using this score
                                 by itself. The uncertainty in the CC**2 value
                                 is given by score_individual_sd_list. NOTE:
                                 These scores are not used in calculation of
                                 the overall score. They are for information
                                 only
   score_individual_scale_list= 0.854120469905 -1.30027385877 0.952184960354
                                0.0 0.0 Scale factors for individual scores in
                                CC-scoring. Each score will be multiplied by
                                the score_individual_scale_list value, then
                                score_individual_offset_list value is added,
                                to estimate the CC**2 value using this score
                                by itself. The uncertainty in the CC**2 value
                                is given by score_individual_sd_list. NOTE:
                                These scores are not used in calculation of
                                the overall score. They are for information
                                only
   score_individual_sd_list= 0.0865028903052 0.110782528071 0.0648061559968
                             0.0 0.0 Uncertainties for individual scores in
                             CC-scoring. Each score will be multiplied by the
                             score_individual_scale_list value, then
                             score_individual_offset_list value is added, to
                             estimate the CC**2 value using this score by
                             itself. The uncertainty in the CC**2 value is
                             given by score_individual_sd_list. NOTE: These
                             scores are not used in calculation of the overall
                             score. They are for information only
   score_overall_offset= 0.13714164894 Overall offset for scores in
                         CC-scoring. The weighted scores will be summed, then
                         all multiplied by score_overall_scale, then
                         score_overall_offset will be added.
   score_overall_scale= 0.49665776012 Overall scale factor for scores in
                        CC-scoring. The weighted scores will be summed, then
                        all multiplied by score_overall_scale, then
                        score_overall_offset will be added.
   score_overall_sd= 0.0534393526258 Overall SD of CC**2 estimate for scores
                     in CC-scoring. The weighted scores will be summed, then
                     all multiplied by score_overall_scale, then
                     score_overall_offset will be added. This is an estimate
                     of CC**2, with uncertainty about score_overall_sd. Then
                     the square root is taken to estimate CC and SD(CC), where
                     SD(CC) now depends on CC due to the square root.
   score_type_list= CC RFACTOR SKEW FOM You can choose what scoring methods to
                    include in scoring of solutions in AutoSol. (The choices
                    available are: CC RFACTOR SKEW NCS_COPIES NCS_IN_GROUP
                    TRUNCATE REGIONS SD FOM ) NOTE: If you are using Z-SCORE
                    or BAYES-CC scoring, the default is CC RFACTOR SKEW FOM
                    (and NCS_OVERLAP if ncs_copies >1). NOTE 2: If you are
                    using CC-SCORE (regression-based estimated CC of map to
                    perfect model) scoring the default is CC RFACTOR SKEW. If
                    you are using CC-scores and you set score_type_list, you
                    must also set score_weight_list, score_overall_scale, and
                    score_overall_offset.
   score_weight_list= 0.481920468212 -0.570545661763 1.36694455397 0.0 0.0 
                     Weights on scores for CC-scoring. Enter the weight on
                      each score in score_type_list. The weighted scores will
                      be summed, then all multiplied by score_overall_scale,
                      then score_overall_offset will be added.
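
The CC-scoring arithmetic described for score_weight_list, score_overall_scale, score_overall_offset and score_overall_sd can be sketched in Python. The constants are the default values listed for those keywords; the input scores are made up, the text does not say how the individual scores are normalised, and the first-order propagation used for SD(CC) is an assumption rather than the documented method.

```python
import math

# Defaults from the keyword list: the first three of the five listed weights
# (CC, RFACTOR, SKEW); the remaining two are 0.0.
WEIGHTS = [0.481920468212, -0.570545661763, 1.36694455397]
OVERALL_SCALE = 0.49665776012
OVERALL_OFFSET = 0.13714164894
OVERALL_SD = 0.0534393526258   # uncertainty in the CC**2 estimate


def estimate_cc(scores):
    """Return (CC**2 estimate, CC estimate, SD of CC) for one solution."""
    weighted_sum = sum(w * s for w, s in zip(WEIGHTS, scores))
    cc2 = weighted_sum * OVERALL_SCALE + OVERALL_OFFSET
    cc = math.sqrt(max(cc2, 0.0))
    # Assumption: first-order error propagation through the square root,
    # which makes SD(CC) depend on CC.
    sd_cc = OVERALL_SD / (2.0 * cc) if cc > 0.0 else None
    return cc2, cc, sd_cc


# Hypothetical CC, RFACTOR and SKEW scores for one solution.
cc2, cc, sd_cc = estimate_cc([0.5, 0.3, 0.2])
```
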
   semet= Yes *No True False You can specify that the dataset that is used for
          refinement is a selenomethionine dataset, and that the model should
          be the SeMet version of the protein, with all SD of MET replaced
          with Se of MSE.
   sg= None Space Group symbol (e.g., C2221 or C 2 2 21)
   skip_score_list= NCS_OVERLAP You can evaluate some scores but not use them.
                    Include the ones you do not want to use in the final score
                    in skip_score_list.
   skip_xtriage= Yes *No True False You can bypass xtriage if you want. This
                 will prevent you from applying anisotropy corrections,
                 however.
   solution_to_display= 0 Solution number of the solution to display and write
                        out ( use 0 to let the wizard display the top
                        solution)
   solve_command_list= None Commands for solve. One per line in the form:
                       keyword value (value can be optional). Examples:
                       verbose
                       resolution 200 2.0
   solvent_fraction= None Solvent fraction (typically 0.4 - 0.6)
   start_chains_list= None You can specify the starting residue number for
                      each of the unique chains in your structure. If you use
                      a sequence file then the unique chains are extracted and
                      the order must match the order of your starting residue
                      numbers. For example, if your sequence file has chains A
                      and B (identical) and chains C and D (identical to each
                      other, but different than A and B) then you can enter 2
                      numbers, the starting residues for chains A and C. NOTE:
                      you need to specify an input sequence file for
                      start_chains_list to be applied.
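
The pairing of start_chains_list values with unique chains can be sketched in Python. This is only an illustration: assign_start_residues and the short sequences are hypothetical, and giving identical copies (B and D here) the same starting number as their first occurrence is an assumption, since the text only says one number is entered per unique chain.

```python
def assign_start_residues(chains, starts):
    """Map each unique sequence (in order of first appearance) to a start number.

    chains: list of (chain_id, sequence) pairs in sequence-file order
    starts: start_chains_list values, one per unique sequence
    """
    unique = []                       # unique sequences, first-seen order
    for _, seq in chains:
        if seq not in unique:
            unique.append(seq)
    if len(unique) != len(starts):
        raise ValueError("need one starting residue per unique chain")
    start_for = dict(zip(unique, starts))
    return {chain_id: start_for[seq] for chain_id, seq in chains}


# Chains A and B are identical; C and D are identical to each other.
chains = [("A", "MKTAY"), ("B", "MKTAY"), ("C", "GSHLV"), ("D", "GSHLV")]
starts = assign_start_residues(chains, [1, 101])
```
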
   temp_dir= None Define a temporary directory (it must exist)
   test_correct_aniso= *Yes No True False Choose whether you want to try
                       applying or not applying an anisotropy correction if
                       the run fails. First your original selection for
                       applying or not will be tried, and then the opposite
                       will be tried if the run fails.
   test_mask_type= *Yes No True False You can choose to have AutoSol test
                   histograms/wang methods for identifying solvent region
                   based on the final density modification r-factor.
   thorough_denmod= Yes *No True False Choose whether you want to go for quick
                    density modification (speeds it up and for a terrible map
                    is sometimes better)
   thorough_loop_fit= *Yes No True False Try many conformations and accept
                      them even if the fit is not perfect? If you say Yes the
                      parameters for thorough loop fitting are:
                      n_random_loop=100 rms_random_loop=0.3 rho_min_main=0.5
                      while if you say No those for quick loop fitting are:
                      n_random_loop=20 rms_random_loop=0.3 rho_min_main=1.0
   thoroughness= quick *thorough You can try to run quickly and see if you can
                 get a solution ("quick") or more thoroughly to get the best
                 possible solution ("thorough").
   title= Run 1 AutoSol Mon May 26 12:09:02 2008 Enter any text you like to
          help identify what you did in this run
   top_output_dir= None This is used in subprocess calls of wizards and to
                   tell the Wizard where to look for the STOPWIZARD file.
   trace_as_lig= Yes *No True False You can specify that in building steps the
                 ends of chains are to be extended using the LigandFit
                 algorithm. This is default for nucleic acid model-building.
   truncate_ha_sites_in_resolve= Auto *Yes No True False You can choose to
                                 truncate the density near heavy-atom sites at
                                 a maximum of 2.5 sigma. This is useful in
                                 cases where the heavy-atom sites are very
                                 strong, and rarely hurts in cases where they
                                 are not. The heavy-atom sites are specified
                                 with "input_ha_file"
   use_any_side= Yes *No True False You can choose to have resolve
                 model-building place the best-fitting side chain at each
                 position, even if the sequence is not matched to the map.
   use_met_in_align= Auto *Yes No True False You can use the heavy-atom
                     positions in input_ha_file as markers for Met SD
                     positions.
   use_mlhl= *Yes No True False This script normally uses information from the
             input file (HLA HLB HLC HLD) in refinement. Say No to only refine
             on Fobs
   use_ncs_in_denmod= *Yes No True False This script normally uses available
                      ncs information in density modification. Say No to skip
                      this. See also find_ncs
   use_perfect= Yes *No True False You can use the CC between each solution
                and hklperfect in scoring. This is only for methods
                development purposes.
   use_phaser_hklstart= *Yes No True False You can choose to start density
                        modification with FWT PHWT from Phaser (Only applies
                        to SAD phasing at present)
   verbose= Yes *No True False Command files and other verbose output will be
            printed
   wavelength_list= None Enter wavelength of x-ray data (A)
   peak
      data= None Datafile for peak wavelength. (Command_line only)
      labels= None Specification string for data labels for peak wavelength.
              (Command_line only). To find out what the appropriate strings
              are, type " phenix.autosol
              display_labels=your-datafile-here.mtz "
      f_prime= None F-prime value for peak wavelength. (Command_line only)
      f_double_prime= None F-doubleprime value for peak wavelength.
                      (Command_line only)
   infl
      data= None Datafile for infl wavelength. (Command_line only)
      labels= None Specification string for data labels for infl wavelength.
              (Command_line only). To find out what the appropriate strings
              are, type " phenix.autosol
              display_labels=your-datafile-here.mtz "
      f_prime= None F-prime value for infl wavelength. (Command_line only)
      f_double_prime= None F-doubleprime value for infl wavelength.
                      (Command_line only)
   high
      data= None Datafile for high wavelength. (Command_line only)
      labels= None Specification string for data labels for high wavelength.
              (Command_line only). To find out what the appropriate strings
              are, type " phenix.autosol
              display_labels=your-datafile-here.mtz "
      f_prime= None F-prime value for high wavelength. (Command_line only)
      f_double_prime= None F-doubleprime value for high wavelength.
                      (Command_line only)
   low
      data= None Datafile for low wavelength. (Command_line only)
      labels= None Specification string for data labels for low wavelength.
              (Command_line only). To find out what the appropriate strings
              are, type " phenix.autosol
              display_labels=your-datafile-here.mtz "
      f_prime= None F-prime value for low wavel