Python-based Hierarchical ENvironment for Integrated Xtallography

Automated structure solution with AutoSol

Author(s)
Purpose
Usage
How the AutoSol Wizard works
Setting up inputs
Datasets and Solutions in AutoSol
Analyzing and scaling the data
Finding heavy-atom (anomalously-scattering atom) sites
Running AutoSol separately in related space groups
Scoring of heavy-atom solutions
Phasing
Density modification (including NCS averaging)
Preliminary model-building and refinement
Resolution limits in AutoSol
Output files from AutoSol
How to run the AutoSol Wizard
Model viewing during model-building with the Coot-PHENIX interface
Examples
SAD dataset
SAD dataset specifying solvent fraction
SAD dataset without model-building
SAD dataset, building RNA instead of protein
SAD dataset, selecting a particular dataset from an MTZ file
MRSAD -- SAD dataset with an MR model; Phaser SAD phasing including the model
Using an MR model to find sites and as a source of phase information (method #2 for MRSAD)
SAD dataset, reading heavy-atom sites from a PDB file written by phenix.hyss
MAD dataset
MAD dataset, selecting particular datasets from an MTZ file
SIR dataset
SAD with more than one anomalously-scattering atom
MIR dataset
SIR + SAD datasets
Possible Problems
General limitations
Specific limitations and problems
Literature
Additional information
List of all AutoSol keywords

Author(s)

  • AutoSol Wizard: Tom Terwilliger
  • PHENIX GUI and PDS Server: Nigel W. Moriarty
  • HYSS: Ralf W. Grosse-Kunstleve and Paul D. Adams
  • Phaser: Randy J. Read, Airlie J. McCoy and Laurent C. Storoni
  • SOLVE: Tom Terwilliger
  • RESOLVE: Tom Terwilliger
  • TEXTAL: K. Gopal, T.R. Ioerger, R.K. Pai, T.D. Romo, J.C. Sacchettini
  • phenix.refine: Ralf W. Grosse-Kunstleve, Peter Zwart and Paul D. Adams
  • phenix.xtriage: Peter Zwart

Purpose

The AutoSol Wizard uses HYSS, SOLVE, Phaser, RESOLVE, TEXTAL, xtriage and phenix.refine to solve a structure and generate experimental phases with the MAD, MIR, SIR, or SAD methods. The Wizard begins with datafiles (.sca, .hkl, etc.) containing amplitudes of structure factors, identifies heavy-atom sites, calculates phases, carries out density modification and NCS identification, and builds and refines a preliminary model.

Usage

The AutoSol Wizard can be run from the PHENIX GUI, from the command-line, and from keyworded script files. All three versions are identical except in the way that they take commands from the user. See Running a Wizard from a GUI, the command-line, or a script for details of how to run a Wizard. The command-line version is described here, except for MIR and other multiple-dataset cases, which can only be run with the GUI or with a script.

How the AutoSol Wizard works

The basic steps that the AutoSol Wizard carries out are described below. They are: Setting up inputs, Analyzing and scaling the data, Finding heavy-atom (anomalously-scattering atom) sites, Scoring of heavy-atom solutions, Phasing, Density modification (including NCS averaging), and Preliminary model-building and refinement. The data for structure solution are grouped into Datasets and solutions are stored in Solution objects.

Setting up inputs

The AutoSol Wizard expects the following basic information:

(1) a datafile name (w1.sca or data=w1.sca)

(2) a sequence file (seq.dat or seq_file=seq.dat)

(3) how many sites to look for (2 or sites=2)

(4) what the anomalously-scattering atom is (Se or atom_type=Se)

(5) If you have SAD or MAD data, then it is helpful to add f_prime and f_double_prime for each wavelength.

You can also specify many other parameters, including resolution, number of sites, whether to search in a thorough or quick fashion, how thoroughly to build a model, etc. If you have a heavy-atom solution from a previous run or another approach, you can read it in directly as well.

Datasets and Solutions in AutoSol

AutoSol breaks down the data for a structure solution into datasets, where a dataset is a set of data that corresponds to a single set of heavy-atom sites. An entire MAD dataset is a single dataset. An MIR structure solution consists of several datasets (one for each native-derivative combination). A MAD + SIR structure has one dataset for the MAD data and a second dataset for the SIR data. The heavy-atom sites for each dataset are found separately (but using difference Fouriers from any previously-solved datasets to help). In the phasing step all the information from all datasets is merged into a single set of phases.

The AutoSol wizard uses a "Solution" object to keep track of heavy-atom solutions and the phased datasets that go with them. There are two types of Solutions: those which consist of a single dataset (Primary Solutions) and those that are combinations of datasets (Composite Solutions). Primary Solutions have information on the datafiles that were part of the dataset and on the heavy-atom sites for this dataset. Composite Solutions are simply sets of Primary Solutions, with associated origin shifts. The hand of the heavy-atom or anomalously-scattering atom substructure is part of a Solution, so if you have two datasets, each with two Solutions related by inversion, then AutoSol would normally construct four different Composite Solutions from these and score each one as described below.
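The way Composite Solutions multiply can be illustrated with a minimal Python sketch. It assumes each dataset contributes a list of Primary Solutions (here just labels for the two hands) and enumerates one choice per dataset. The function name composite_solutions is hypothetical, and the origin shifts that AutoSol also tracks are left out.

```python
from itertools import product


def composite_solutions(primary_by_dataset):
    """Return every combination that takes one Primary Solution from each dataset.

    Hypothetical illustration: origin shifts between datasets are ignored.
    """
    return list(product(*primary_by_dataset))


# Two datasets, each with a pair of solutions related by inversion.
datasets = [
    ["dataset1_hand_a", "dataset1_hand_b"],
    ["dataset2_hand_a", "dataset2_hand_b"],
]
for combo in composite_solutions(datasets):
    print(combo)
```

With two datasets this gives four combinations; in general the count is the product of the number of Primary Solutions per dataset.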

Analyzing and scaling the data

The AutoSol Wizard analyzes input datasets with phenix.xtriage to identify twinning and other conditions that may require special care. The data is scaled with SOLVE. For MAD data, FA values are calculated as well.

Note on anisotropy corrections:

The AutoSol wizard will apply an anisotropy correction to all the raw experimental data if any of the files in the first dataset read in have very strong anisotropy. You can control how much anisotropy there must be before this correction is applied by default using the keywords

correct_aniso=True  # if True (or False), always (or never) apply the correction

delta_b_for_auto_correct_aniso=20   # correct if the range of anisotropic B
                                    # is greater than 20

ratio_b_for_auto_correct_aniso=1.5  # correct if the ratio of the largest
                                    # to smallest anisotropic B is greater than 1.5

If an anisotropy correction is applied then a separate refinement file must be specified if refinement is to be carried out. This is because it is best to refine against data that have not been corrected for anisotropy (instead applying the correction as part of refinement).
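A minimal Python sketch of the decision described by these keywords follows. It assumes that the anisotropic B values for each file are available as a tuple of principal values, that either criterion alone is enough to trigger the correction, and that the ratio test is only meaningful when all B values are positive. The function name is hypothetical; this is not the AutoSol source.

```python
def should_correct_aniso(b_values_per_file, correct_aniso=None,
                         delta_b=20.0, ratio_b=1.5):
    """Decide whether to apply an anisotropy correction (illustrative sketch).

    b_values_per_file: one tuple of principal anisotropic B values per file
    in the first dataset (assumed input format).
    """
    # An explicit True/False overrides the automatic test.
    if correct_aniso is not None:
        return correct_aniso
    for b_values in b_values_per_file:
        lowest, highest = min(b_values), max(b_values)
        # delta_b_for_auto_correct_aniso: range of anisotropic B
        if highest - lowest > delta_b:
            return True
        # ratio_b_for_auto_correct_aniso: largest / smallest B (assumes B > 0)
        if lowest > 0 and highest / lowest > ratio_b:
            return True
    return False


print(should_correct_aniso([(20.0, 22.0, 25.0)]))  # mild anisotropy: False
print(should_correct_aniso([(10.0, 12.0, 40.0)]))  # strong anisotropy: True
```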

Finding heavy-atom (anomalously-scattering atom) sites

The AutoSol Wizard uses HYSS to find heavy-atom sites. The result of this step is a list of possible heavy-atom solutions for a dataset. For SIR or SAD data, the isomorphous or anomalous differences, respectively, are used as input to HYSS. For MAD data, the anomalous differences at each wavelength, and the FA estimates of complete heavy-atom structure factors from SOLVE are each used as separate inputs to HYSS. Each heavy-atom substructure obtained from HYSS corresponds to a potential solution. In space groups where the heavy-atom structure can be either hand, a pair of enantiomorphic solutions is saved for each run of HYSS.
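For most space groups the enantiomorph of a substructure is obtained by inverting every site through the origin, (x, y, z) to (-x, -y, -z). The short Python sketch below shows only that operation on fractional coordinates. It is an illustration, not HYSS code, and it does not handle the few space groups (for example I41, I4122 and F4132) in which the inversion also needs an origin shift, nor the change of space group to its enantiomorph (e.g., P41 to P43) that accompanies the inversion in chiral space groups.

```python
def invert_sites(sites):
    """Return the inverted (enantiomorphic) substructure in fractional coordinates.

    Each site (x, y, z) becomes (-x, -y, -z), wrapped back into [0, 1).
    Assumes inversion through the origin is appropriate for the space group.
    """
    return [tuple((-coord) % 1.0 for coord in site) for site in sites]


sites = [(0.10, 0.25, 0.50), (0.75, 0.00, 0.30)]
print(invert_sites(sites))
```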

Running AutoSol separately in related space groups

AutoSol will check for the opposite hand of the heavy-atom solution, and at the same time it will check for the opposite hand of your space group (it inverts the heavy-atom solution from HYSS and the hand of the space group together). Therefore you do not need to run AutoSol twice for space groups that are chiral (for example P41): the corresponding inverse space group (P43) will be checked automatically. If there are possibilities for your space group other than the inverse hand of the space group, then you should test them all, one at a time. For example, if you were not able to measure 00l reflections in a hexagonal space group, your space group might be P6, P61, P62, P63, P64 or P65. In this case you would have to run AutoSol in P6, P61, P62 and P63 (P65 and P64 will then be done automatically as the inverses of P61 and P62). Normally only one of these will give a plausible solution.
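The bookkeeping in this example can be expressed as a small Python sketch: given the candidate space groups, keep one member of each enantiomorphic pair, because AutoSol tests the other member automatically when it inverts the hand. The table lists the 11 enantiomorphic pairs of space groups in the compact symbols used on this page; the function name is hypothetical.

```python
# The 11 enantiomorphic pairs of space groups (compact Hermann-Mauguin symbols).
ENANTIOMORPHIC_PAIRS = [
    ("P31", "P32"), ("P3112", "P3212"), ("P3121", "P3221"),
    ("P41", "P43"), ("P4122", "P4322"), ("P41212", "P43212"),
    ("P61", "P65"), ("P62", "P64"), ("P6122", "P6522"), ("P6222", "P6422"),
    ("P4132", "P4332"),
]
PARTNER = {}
for a, b in ENANTIOMORPHIC_PAIRS:
    PARTNER[a] = b
    PARTNER[b] = a


def space_groups_to_run(candidates):
    """Return the candidates that need their own AutoSol run.

    A candidate is skipped if its enantiomorph is already in the list,
    because checking the opposite hand covers it.
    """
    to_run = []
    for space_group in candidates:
        if PARTNER.get(space_group) in to_run:
            continue
        to_run.append(space_group)
    return to_run


print(space_groups_to_run(["P6", "P61", "P62", "P63", "P64", "P65"]))
# ['P6', 'P61', 'P62', 'P63']
```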

Scoring of heavy-atom solutions

Potential heavy-atom solutions are scored based on a set of criteria (CC, RFACTOR, SKEW, FOM, NCS_OVERLAP, TRUNCATION, REGIONS, SD; described below), using either a Bayesian estimate, a linear regression, or a Z-score system to put all the scores on a common scale and to combine them into a single overall score. The overall scoring method chosen (BAYES-CC or Z-SCORE) is determined by the value of the keyword overall_score_method. The default is BAYES-CC. Note that for all scoring methods, the map that is being evaluated, and the estimates of map-perfect-model correlation, refer to the experimental electron density map, not the density-modified map.

Bayesian CC scores (BAYES-CC). Bayesian estimates of the quality of experimental electron density maps are obtained using data from a set of previously-solved datasets. The standard scoring criteria were evaluated for 1905 potential solutions in a set of 246 MAD, SAD, and MIR datasets. As each dataset had previously been solved, the correlation between the refined model and each experimental map (CC_PERFECT) could be calculated for each solution (after offsetting the maps to account for origin differences). Histograms were tabulated of the number of instances that a scoring criterion (e.g., SKEW) had various possible values, as a function of the CC_PERFECT of the corresponding experimental map to the refined model. These histograms yield the relative probability of measuring a particular value of that scoring criterion (SKEW), given the value of CC_PERFECT. Using Bayes' rule, these probabilities can be used to estimate the relative probabilities of values of CC_PERFECT given the value of each scoring criterion for a particular electron density map. The mean estimate (BAYES-CC) is reported (multiplied by 100), with a +/-2 SD estimate of the uncertainty in this estimate of CC_PERFECT. The BAYES-CC values are estimated independently for each scoring criterion used, and also for the combination of all criteria selected with the keyword score_type_list and not excluded with the keyword skip_score_list.
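The Bayes' rule step can be sketched in Python as follows. The sketch assumes the CC_PERFECT axis is divided into bins, that for each scoring criterion you already have the probability of the observed score in each CC_PERFECT bin (read from the tabulated histograms), and that criteria are combined as if independent. Those inputs, the flat prior and the function name are assumptions for illustration; the numbers are made up and are not the AutoSol calibration data.

```python
import math


def bayes_cc(cc_bins, likelihoods, prior=None):
    """Posterior mean and SD of CC_PERFECT from per-criterion likelihoods.

    cc_bins: centers of the CC_PERFECT bins.
    likelihoods: one list per criterion, giving P(observed score | CC bin).
    prior: optional prior over the bins (flat if omitted).
    """
    n = len(cc_bins)
    posterior = list(prior) if prior is not None else [1.0 / n] * n
    # Bayes' rule, treating the criteria as independent: multiply likelihoods.
    for lik in likelihoods:
        posterior = [p * l for p, l in zip(posterior, lik)]
    total = sum(posterior)
    posterior = [p / total for p in posterior]
    mean = sum(c * p for c, p in zip(cc_bins, posterior))
    var = sum((c - mean) ** 2 * p for c, p in zip(cc_bins, posterior))
    return mean, math.sqrt(var)


cc_bins = [0.1, 0.3, 0.5, 0.7]
skew_lik = [0.1, 0.2, 0.4, 0.3]   # made-up P(observed SKEW | CC bin)
cc_lik = [0.05, 0.15, 0.5, 0.3]   # made-up P(observed CC | CC bin)
mean, sd = bayes_cc(cc_bins, [skew_lik, cc_lik])
print(f"BAYES-CC = {100 * mean:.1f} +/- {200 * sd:.1f}")
```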

Z-scores (Z-SCORE). The Z-score for one criterion for a particular solution is given by,

Z= (Score - mean_random_solution_score)/(SD_of_random_solution_scores)
where Score is the score for this solution, mean_random_solution_score is the mean score for a solution with randomized phases, and SD_of_random_solution_scores is the standard deviation of the scores of solutions with randomized phases.

To create a total score based on Z-scores, the Z-scores for each criterion are simply summed.
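A minimal Python sketch of the Z-score combination follows. The random-phase scores are inputs you would have to supply. The optional fixed_sd argument mirrors the fixed SD of 0.05 used for the FOM score, and the higher_is_better flag (for a criterion such as the R-factor, where lower values are better) is an assumption of this sketch, since the page does not say how AutoSol orients each criterion.

```python
from statistics import mean, stdev


def z_score(score, random_scores, fixed_sd=None, higher_is_better=True):
    """Z = (score - mean of random-phase scores) / SD of random-phase scores."""
    sd = fixed_sd if fixed_sd is not None else stdev(random_scores)
    z = (score - mean(random_scores)) / sd
    # Assumption: flip the sign for criteria where a lower value is better.
    return z if higher_is_better else -z


# Made-up example values for three criteria.
z_values = [
    z_score(0.45, [0.10, 0.12, 0.08]),                          # CC
    z_score(0.38, [0.50, 0.48, 0.52], higher_is_better=False),  # R-factor
    z_score(0.40, [0.25, 0.25, 0.25], fixed_sd=0.05),           # FOM
]
total = sum(z_values)  # the overall Z-score is the simple sum
print(round(total, 2))
```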

The principal scoring criteria are:

(1) Correlation of map-phased electron density map with experimentally-phased map (CC). The statistical density modification in RESOLVE allows the calculation of map-based phases that are (mostly) independent of the experimental phases. The phase information in statistical density modification comes from two sources: your experimental phases and maximization of the agreement of the map with expectations (such as a flat solvent region). Normally the phase probabilities from these two sources are merged together, yielding your density-modified phases. This score is calculated based on the correlation of the phase information from these two sources before combining them, and is a good indication of the quality of the experimental phases. This criterion is used in scoring by default.

(2) The R-factor for density modification (R-Factor). Statistical density modification provides an estimate of structure factors that is (mostly) independent of the measured structure factors, so the R-factor between FC and Fobs is a good measure of the quality of experimental phases. This criterion is used in scoring by default.

(3) The skew (third moment or normalized <rho**3>) of the density in an electron density map is a good measure of its quality, because a random map has a skew of zero (density histograms look like a Gaussian), while a good map has a very positive skew (density histograms very strong near zero, but many points with very high density). This criterion is used in scoring by default.
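The skew can be computed from the density values at the map grid points as the third central moment divided by the 3/2 power of the second. The Python sketch below applies that standard definition to a plain list of values; whether AutoSol uses exactly this normalization is not stated here, so treat it as an illustration.

```python
def map_skew(rho):
    """Skewness of map density values: <(rho - mean)^3> / <(rho - mean)^2>^(3/2)."""
    n = len(rho)
    mu = sum(rho) / n
    m2 = sum((r - mu) ** 2 for r in rho) / n
    m3 = sum((r - mu) ** 3 for r in rho) / n
    return m3 / m2 ** 1.5


print(map_skew([-1.0, 0.0, 1.0]))           # symmetric histogram: 0.0
print(map_skew([0.0, 0.0, 0.0, 0.0, 4.0]))  # mostly near zero, a few high points: about 1.5
```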

(4) Non-crystallographic symmetry (NCS overlap). The presence of NCS in a map is a strong indication that the map is good, or at least has some correct features. The AutoSol Wizard uses symmetry in heavy-atom sites to suggest NCS, and RESOLVE measures the actual correlation of NCS-related density to obtain the NCS overlap score. In the Z-score method of scoring, this score is used by default if NCS is present.

(5) Figure of merit (FOM). The figure of merit of phasing is a good indicator of the internal consistency of a solution. This score is not normalized by the SD of randomized phase sets (as that has no meaning); a standard SD of 0.05 is used instead. This score is used by default in the Z-score method of scoring if NCS is present, and in the Bayesian CC estimate method.

(6) Map correlation after truncation (TRUNCATION). Dummy atoms (the same number as estimated non-hydrogen atoms in the structure) are placed in positions of high density of the map, and a new map is calculated based on these atomic positions. The correlation of these maps is calculated after adjusting an overall B-value for the dummy atoms to maximize the correlation. A good map will show a high correlation of these maps. This score is by default not used.

(7) Number of contiguous regions per 100 A**3 comprising top 5% of density in map (REGIONS). The top 5% of points in the map are marked, and the number of contiguous regions that result is counted, divided by the volume of the asymmetric unit, and multiplied by 100. A good map will have just a few contiguous regions at a high contour level; a poor map will have many isolated peaks. This score is by default not used.

(8) Standard deviation of local rms density (SD). The local rms density in the map is calculated using a smoothing radius of 3 times the high-resolution cutoff (or 6 A, if that is less than 6 A). Then the standard deviation of the local rms, normalized to the mean value of the local rms, is reported. This criterion will be high if there are regions of high local rms (the macromolecule) and separate regions of low local rms (the solvent), and low if the map is random. This score is by default not used.
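The REGIONS count in (7) is essentially a connected-component count over the grid points above a density cutoff. The Python sketch below marks the top 5% of points on a small 3D grid, counts face-connected regions with a breadth-first search (wrapping around the grid edges as for a periodic cell), and scales by the volume. The 6-connectivity, the periodic wrapping and the nested-list input format are assumptions of this sketch, not details given on this page.

```python
from collections import deque


def count_regions(grid, fraction=0.05):
    """Count face-connected regions formed by the highest-density grid points."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    values = sorted((v for plane in grid for row in plane for v in row), reverse=True)
    n_top = max(1, round(fraction * len(values)))
    cutoff = values[n_top - 1]
    marked = {(i, j, k)
              for i in range(nx) for j in range(ny) for k in range(nz)
              if grid[i][j][k] >= cutoff}
    neighbors = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen = set()
    regions = 0
    for start in marked:
        if start in seen:
            continue
        regions += 1  # found a point not yet assigned to any region
        seen.add(start)
        queue = deque([start])
        while queue:
            i, j, k = queue.popleft()
            for di, dj, dk in neighbors:
                nb = ((i + di) % nx, (j + dj) % ny, (k + dk) % nz)
                if nb in marked and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    return regions


def regions_per_100_a3(grid, volume_a3, fraction=0.05):
    """REGIONS score: number of regions divided by the volume, times 100."""
    return count_regions(grid, fraction) / volume_a3 * 100.0


# 4 x 4 x 4 grid with two touching high points and one isolated high point.
grid = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
grid[0][0][0], grid[0][0][1], grid[2][2][2] = 10.0, 9.0, 8.0
print(count_regions(grid))  # 2
```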

Phasing

The AutoSol Wizard uses Phaser to calculate experimental phases from SAD data, and SOLVE to calculate phases from MIR, MAD, and multiple-dataset cases.

Density modification (including NCS averaging)

The AutoSol Wizard uses RESOLVE to carry out density modification. It identifies NCS from symmetries in heavy-atom sites with RESOLVE and applies this NCS if it is present in the electron density map.

Preliminary model-building and refinement

The AutoSol Wizard carries out one cycle of model-building and refinement after obtaining density-modified phases. The model-building can be with RESOLVE or with TEXTAL. The refinement is carried out with phenix.refine.

Resolution limits in AutoSol

There are several resolution limits used in AutoSol. You can leave them all to default, or you can set any of them individually. Here is a list of these limits and how their default values are set:

| Name | Description | How default value is set |
|---|---|---|
| resolution | Overall resolution for a dataset | Highest resolution for any datafile in this dataset. For multiple datasets, the highest resolution for any dataset |
| refinement_resolution | Resolution for refinement | Value of "resolution" |
| resolution_build | Resolution for model-building | Value of "resolution" |
| res_phase | Resolution for phasing for a dataset | If phase_full_resolution=True, the value of "resolution". Otherwise, the value of "recommended_resolution", based on analysis of signal-to-noise in the dataset |
| res_eval | Resolution for evaluation of solution quality | Value of "resolution" or 2.5 A, whichever is lower resolution |
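How the defaults in this table fit together can be sketched in Python. Resolutions are in A, so the "highest" resolution is the smallest number and "whichever is lower resolution" is the larger number. The function, its arguments and the assumption that recommended_resolution has already been computed from the signal-to-noise analysis are illustrative only.

```python
def default_resolution_limits(datafile_resolutions, phase_full_resolution,
                              recommended_resolution):
    """Illustrative defaults for the AutoSol resolution limits (values in A)."""
    resolution = min(datafile_resolutions)  # highest resolution = smallest d
    return {
        "resolution": resolution,
        "refinement_resolution": resolution,
        "resolution_build": resolution,
        "res_phase": resolution if phase_full_resolution else recommended_resolution,
        "res_eval": max(resolution, 2.5),  # lower resolution = larger d
    }


print(default_resolution_limits([2.1, 1.8], phase_full_resolution=False,
                                recommended_resolution=2.3))
```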

Output files from AutoSol

When you run AutoSol the output files will be in a subdirectory with your run number:

AutoSol_run_1_/

The key output files that are produced are:

  • A summary file listing the results of the run and the other files produced:
    AutoSol_summary.dat  # overall summary
    

  • A warnings file listing any warnings about the run
    AutoSol_warnings.dat  # any warnings
    

  • A file that lists all parameters and knowledge accumulated by the Wizard during the run (some parts are binary and are not printed)
    AutoSol_Facts.dat   # all Facts about the run
    

  • NCS information (if any)
    AutoSol_15.ncs_spec   # NCS information. The number is the solution number
    

  • Experimental phases and HL coefficients
    solve_15.mtz  # either solve or phaser depending on which was run
    phaser_15.mtz
    

  • Density-modified phases from RESOLVE
    current_cycle_map_coeffs.mtz  # map coefficients (density modified phases)
    resolve_15.mtz   # density-modified phases; same as above
    
    For either of these files, use the columns FP, PHIM and FOMM for F, PHI and FOM.

  • An mtz file for use in refinement
    exptl_fobs_phases_freeR_flags_15.mtz  # F Sigma HL coeffs, freeR-flags for refinement
    

  • Heavy atom sites in PDB format
    ha_15.pdb_formatted.pdb
    

  • Current preliminary model and evaluation of model
    current_cycle.pdb
    current_cycle_eval.log
    

How to run the AutoSol Wizard

Running the AutoSol Wizard is easy. From the command-line you can type:

phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5

The AutoSol Wizard will assume that w1.sca is a datafile (because it ends in .sca and is a file) and that seq.dat is a sequence file, that there are 2 heavy-atom sites, and that the heavy-atom is Se. The f_prime and f_double_prime values are set explicitly.

You can also specify each of these things directly:

phenix.autosol data=w1.sca seq_file=seq.dat sites=2 \
   atom_type=Se f_prime=-8 f_double_prime=4.5
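The way the short form maps onto the keyword form can be illustrated with the Python sketch below. Only the rule for data files (a name ending in .sca that is an existing file) comes from the description above; the other rules (any other existing file is a sequence file, an integer is the number of sites, a one- or two-letter word is an atom type), the extra data-file suffixes and the function name are assumptions made for this illustration, not the actual PHENIX argument parser.

```python
import os

DATA_EXTENSIONS = (".sca", ".hkl", ".mtz")  # assumed set of data-file suffixes


def to_keywords(args, is_file=os.path.isfile):
    """Translate short-form phenix.autosol arguments to keyword=value form (sketch)."""
    keywords = []
    for arg in args:
        if "=" in arg:                                    # already a keyword
            keywords.append(arg)
        elif arg.lower().endswith(DATA_EXTENSIONS) and is_file(arg):
            keywords.append(f"data={arg}")
        elif is_file(arg):                                # assumption
            keywords.append(f"seq_file={arg}")
        elif arg.isdigit():                               # assumption
            keywords.append(f"sites={arg}")
        elif arg.isalpha() and len(arg) <= 2:             # assumption
            keywords.append(f"atom_type={arg}")
        else:
            raise ValueError(f"cannot interpret argument: {arg}")
    return keywords


example = "w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5".split()
print(to_keywords(example, is_file=lambda path: path in {"w1.sca", "seq.dat"}))
# ['data=w1.sca', 'seq_file=seq.dat', 'sites=2', 'atom_type=Se', 'f_prime=-8', 'f_double_prime=4.5']
```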

You can specify many more parameters as well. See the list of keywords, defaults and descriptions at the end of this page and also general information about running Wizards at Running a Wizard from a GUI, the command-line, or a script for how to do this. Some of the most common parameters are:

sites=3     # 3 sites
sites_file=sites.pdb  # ha sites in PDB or fractional xyz format
atom_type=Se   # Se is the heavy-atom
seq_file=seq.dat   # sequence file (1-aa code, separate chains with >>>>)
quick=True  # try to find sites quickly
data=w1.sca  # input datafile
f_prime=-5  # f-prime value for SAD
f_double_prime=4.5  # f-double-prime value for SAD

Model viewing during model-building with the Coot-PHENIX interface

The AutoSol Wizard allows you to view the current best model that is produced by the automated model-building process. This capability is identical to the view/edit model procedure available in the AutoBuild Wizard. Normally you would use it just to view the model in AutoSol, and to view and edit a model in AutoBuild. The PHENIX-Coot interface is accessible through the GUI and via the command-line. Using the GUI, when a model has been produced by the AutoSol Wizard, you can double-click the button on the GUI labelled View/edit files with coot to start Coot with your current map and model. If you are running from the command-line, you can open a new window and type:

phenix.autobuild coot 
which will do the same (provided the necessary map and model are ready). When Coot has been loaded, your map and model will be displayed along with a PHENIX-Coot Interface window. If you want, you can edit your model and then save it, giving it back to PHENIX with the button labelled something like Save model as COMM/overall_best_coot_7.pdb. This button creates the indicated file and also tells PHENIX to look for this file and to try to include the contents of the model in the building process. In AutoSol, only the main-chain atoms of the model you save are considered, and the side-chains are ignored. Ligands and solvent in the model are ignored as well. As the AutoSol Wizard continues to build new models and create new maps, you can update the PHENIX-Coot Interface to the current best model and map with the button Update with current files from PHENIX.

Examples

SAD dataset

phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5
The sequence file is used to estimate the solvent content of the crystal and for model-building. Note that for a SAD dataset the values of f_prime and f_double_prime are not critical. If f_double_prime is too large by a factor of 2, the refined occupancies of heavy-atom sites might be 1/2 their correct values.
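The factor-of-2 remark follows from a simplifying assumption: in SAD data the anomalous signal from a site depends on the product of its occupancy and f_double_prime, so if f_double_prime is overestimated, refinement compensates by lowering the occupancy. The arithmetic is shown below as a tiny Python sketch; the function name is illustrative.

```python
def apparent_occupancy(true_occupancy, true_f_double_prime, assumed_f_double_prime):
    """Occupancy that keeps occupancy * f'' constant (simplified SAD model)."""
    return true_occupancy * true_f_double_prime / assumed_f_double_prime


# Se site with true f'' = 4.5, but f_double_prime=9 given by mistake.
print(apparent_occupancy(1.0, 4.5, 9.0))  # 0.5
```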

SAD dataset specifying solvent fraction

phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5 \
    solvent_fraction=0.45
This will force the solvent fraction to be 0.45. This illustrates a general feature of the Wizards: they will try to estimate values of parameters, but if you input them directly, they will use your input values.

SAD dataset without model-building

phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5 \
    build=False
This will carry out the usual structure solution, but will skip model-building.

SAD dataset, building RNA instead of protein

phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5 \
    chain_type=RNA
This will carry out the usual structure solution, but will build an RNA chain. For DNA, specify chain_type=DNA. You can only build one type of chain at a time in the AutoSol Wizard. To build protein and DNA, use the AutoBuild Wizard and run it first with chain_type=PROTEIN, then run it again specifying the protein model as input_lig_file_list=proteinmodel.pdb and with chain_type=DNA.

SAD dataset, selecting a particular dataset from an MTZ file

If you have an input MTZ file with more than one anomalous dataset, you can type something like:

phenix.autosol w1.mtz seq.dat 2 Se f_prime=-8 f_double_prime=4.5 \
labels='F SIGF DANO SIGDANO'
This will carry out the usual structure solution, but will choose the input data columns based on the labels: 'F SIGF DANO SIGDANO'. If you run the AutoSol Wizard with SAD data and an MTZ file containing more than one anomalous dataset and don't tell it which one to use, all possible values of labels are printed out for you so that you can just paste the one you want in.

You can also find out all the possible label strings to use by typing:

phenix.autosol display_labels=w1.mtz  # display all labels for w1.mtz

MRSAD -- SAD dataset with an MR model; Phaser SAD phasing including the model

If you are carrying out SAD phasing with Phaser, you can carry out a combination of molecular replacement phasing and SAD phasing (MRSAD) by adding a single new keyword to your AutoSol run:

input_partpdb_file=MR.pdb
In this case the MR.pdb file will be used as a partial model in a maximum-likelihood SAD phasing calculation with Phaser to calculate phases and identify sites in Phaser, and the combined MR+SAD phases will be written out. NOTE: At the moment the AutoBuild Wizard is not equipped to use these combined phases optimally in iterative model-building, density modification and refinement, because they contain both experimental phase information and model information. It is therefore possible that the resulting phases are biased by your MR model, and that this bias will not go away during iterative model-building because it is continually fed back in.

Using an MR model to find sites and as a source of phase information (method #2 for MRSAD)

You can also combine MR information with SAD phases (see J. P. Schuermann and J. J. Tanner, Acta Cryst. (2003). D59, 1731-1736) in PHENIX by running the three wizards AutoMR, AutoSol, and AutoBuild one after the other. This method does not use the partial model and the anomalous information in the SAD dataset simultaneously, as the Phaser maximum-likelihood method above does. On the other hand, the phases obtained in this method are independent of the model, so that combining them afterwards does not introduce model bias. (It is not yet clear which is the better approach, so you may wish to try both.) Additionally, this approach can be used with any method for phasing. Here is a set of three simple commands to do this. First, run AutoMR to find the molecular replacement solution, but don't rebuild it yet:

phenix.automr gene-5.pdb infl.sca copies=1 \
  RMS=1.5 mass=9800 rebuild_after_mr=False
Now your MR solution is in AutoMR_run_1_/MR.1.pdb and phases are in AutoMR_run_1_/MR.1.mtz. Use these phases as input to AutoSol, along with some weak SAD data, still not building any new models:
phenix.autosol data=infl.sca \
  input_phase_file=AutoMR_run_1_/MR.1.mtz input_phase_labels="F PHIC FOM" \
  seq_file=sequence.dat build=False
Note that we have specified the data columns for F, PHI and FOM in the input_phase_file. For input_phase_file you must specify all three of these (if you leave out FOM, it will be set to zero). AutoSol will write an MTZ file with experimental phases to phaser_xx.mtz, where xx depends on how many solutions are considered during the run. You will need to edit the next command, which runs AutoBuild, to match the value of xx:
phenix.autobuild data=AutoSol_run_1_/phaser_2.mtz \
  model=AutoMR_run_1_/MR.1.pdb seq_file=sequence.dat rebuild_in_place=False
AutoBuild will now take the phases from your AutoSol run and combine them with model-based information from your AutoMR MR solution, and will carry out iterative density modification, model-building and refinement to rebuild your model. Note that you may wish to set rebuild_in_place=True, depending on how good your MR model is.

SAD dataset, reading heavy-atom sites from a PDB file written by phenix.hyss

phenix.autosol 11 Pb data=deriv.sca seq_file=seq.dat \
  sites_file=deriv_hyss_consensus_model.pdb 
This will carry out the usual structure solution process, but will read sites from deriv_hyss_consensus_model.pdb, try both hands, and carry on from there. If you know the hand of the substructure, you can fix it with have_hand=True.

MAD dataset

The inputs for a MAD dataset need to specify f_prime and f_double_prime for each wavelength. It also must be clear what datafile goes with which wavelength. If you input an MTZ file with multiple datasets, then the order of those datasets is assumed to be the same as the order of the wavelengths. You may want to either select particular datasets from your MTZ file (see below) or split such an MTZ file into separate files for each dataset if this does not work in the way you expect.

phenix.autosol  seq_file=seq.dat sites=2 atom_type=Se  \
peak.data=w1.sca   peak.f_prime=-8   peak.f_double_prime=4.5 \
infl.data=w2.sca   infl.f_prime=-9   infl.f_double_prime=1.9 \
high.data=w3.sca   high.f_prime=-5   high.f_double_prime=3.0 

MAD dataset, selecting particular datasets from an MTZ file

This is similar to the case for SAD data. If you have an input MTZ file with more than one anomalous dataset, you can type something like:

phenix.autosol  seq_file=seq.dat sites=2 atom_type=Se  \
peak.data=all_data.mtz   peak.f_prime=-8   peak.f_double_prime=4.5 \
high.data=all_data.mtz   high.f_prime=-5   high.f_double_prime=3.0 \
peak.labels='Fpeak SIGFpeak DANOpeak SIGDANOpeak' \
high.labels='Fhigh SIGFhigh DANOhigh SIGDANOhigh' 
This will carry out the usual structure solution, but will choose the input peak data columns based on the labels: 'Fpeak SIGFpeak DANOpeak SIGDANOpeak', and the high data from the ones labelled 'Fhigh SIGFhigh DANOhigh SIGDANOhigh'.

As in the SAD case, you can find out all the possible label strings to use by typing:

phenix.autosol display_labels=w1.mtz  # display all labels for w1.mtz

SIR dataset

The standard inputs for an SIR dataset are the native and derivative, the sequence file, the heavy-atom type, and the number of sites, as well as whether to use anomalous differences (or just isomorphous differences):

phenix.autosol native.data=native.sca deriv.data=deriv.sca \
   deriv.atom_type=I deriv.sites=2 deriv.inano=inano
This will set the heavy-atom type to iodine, look for 2 sites, and include anomalous differences.

SAD with more than one anomalously-scattering atom

You can tell the AutoSol wizard to look for more than one anomalously-scattering atom. Specify one atom type (Se) in the usual way. Then, if you are running AutoSol from the command line, specify any additional ones like this:

mad_ha_add_list="Br Pt"
mad_ha_add_f_prime_list=" -7 -10"
mad_ha_add_f_double_prime_list=" 4.2 12"
There must be the same number of entries in each of these three keyword lists. During phasing Phaser will try to add whichever atom types best fit the scattering from each new site. This option is available for SAD phasing only.
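Because the three lists are matched up entry by entry, it can help to check them before starting a run. The Python sketch below parses the space-separated strings from this example and pairs each atom type with its f_prime and f_double_prime; the function name is hypothetical and the length check simply mirrors the rule stated above.

```python
def parse_extra_atoms(types, f_primes, f_double_primes):
    """Pair up mad_ha_add_list, mad_ha_add_f_prime_list and mad_ha_add_f_double_prime_list."""
    atom_types = types.split()
    fp = [float(x) for x in f_primes.split()]
    fpp = [float(x) for x in f_double_primes.split()]
    if not (len(atom_types) == len(fp) == len(fpp)):
        raise ValueError("the three lists must have the same number of entries")
    return list(zip(atom_types, fp, fpp))


print(parse_extra_atoms("Br Pt", " -7 -10", " 4.2 12"))
# [('Br', -7.0, 4.2), ('Pt', -10.0, 12.0)]
```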

MIR dataset

An MIR dataset is a set of more than one dataset. This cannot readily be expressed in the command-line inputs, but you can specify it easily with the PHENIX AutoSol GUI or with a script. In a script file you can say:

cell 93.796  79.849  43.108  90.000  90.000  90.00   # cell params
thoroughness thorough                      # best to use thorough for MIR
resolution 2.8                             #  Resolution 
expt_type       sir                        # MIR dataset is set of SIR datasets
input_seq_file sequence.dat
############## DATASET 1 ################
input_file_list  rt_rd_1.sca auki_rd_1.sca #  Native  and deriv 1
nat_der_list    Native  Au                 # identify files by ha type
inano_list      noinano inano              # say if ano diffs to be used 
n_ha_list       0    5                     # number of heavy-atoms 
run_list        start                      # read in datafiles for dataset
run_list        read_another_dataset       # about to start a new dataset here
############## DATASET 2 ################
input_file_list  rt_rd_1.sca hgki_rd_1.sca # Native and deriv 2
nat_der_list    Native Hg                  
inano_list      noinano inano              
n_ha_list       0    5  
#########################################

The script file carries out steps in the order that they are input. This allows us to read in one entire dataset, save it, then read in another one. The AutoSol Wizard will solve each dataset and then combine them and phase the combined dataset with SOLVE Bayesian correlated phasing, taking into account any correlations among the non-isomorphism and heavy-atom sites for the various derivatives.
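If you have several derivatives, the repetitive part of such a script can be generated. The Python sketch below writes a script in the same layout as the example: run_list start and run_list read_another_dataset follow every dataset except the last, anomalous differences are used for each derivative, and the limit of 6 derivatives for MIR noted on this page is enforced. The function name and argument layout are hypothetical, and the generated script should be checked against your own data before use.

```python
MAX_DERIVATIVES = 6  # limit for MIR stated on this page


def mir_script(native, derivatives, seq_file, resolution, cell):
    """Build an AutoSol MIR script; derivatives is a list of (file, atom, n_sites)."""
    if not 1 <= len(derivatives) <= MAX_DERIVATIVES:
        raise ValueError(f"need 1 to {MAX_DERIVATIVES} derivatives")
    lines = [
        "cell " + " ".join(str(c) for c in cell),
        "thoroughness thorough",
        f"resolution {resolution}",
        "expt_type sir",
        f"input_seq_file {seq_file}",
    ]
    for index, (deriv_file, atom, n_sites) in enumerate(derivatives, start=1):
        lines.append(f"############## DATASET {index} ################")
        lines += [
            f"input_file_list {native} {deriv_file}",
            f"nat_der_list Native {atom}",
            "inano_list noinano inano",   # assumption: use anomalous diffs
            f"n_ha_list 0 {n_sites}",
        ]
        if index < len(derivatives):      # more datasets follow
            lines += ["run_list start", "run_list read_another_dataset"]
    return "\n".join(lines) + "\n"


print(mir_script("rt_rd_1.sca",
                 [("auki_rd_1.sca", "Au", 5), ("hgki_rd_1.sca", "Hg", 5)],
                 "sequence.dat", 2.8,
                 (93.796, 79.849, 43.108, 90.0, 90.0, 90.0)), end="")
```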

SIR + SAD datasets

A combination of SIR and SAD datasets is almost the same as an MIR dataset in the AutoSol Wizard. You specify each dataset separately, and put "start" and "read_another_dataset" between the datasets:

cell 93.796  79.849  43.108  90.000  90.000  90.00   # cell params
resolution 2.8                             #  Resolution 
input_seq_file sequence.dat
############## DATASET 1 ################
expt_type       sir                        # MIR dataset is set of SIR datasets
input_file_list  rt_rd_1.sca auki_rd_1.sca #  Native  and deriv 1
nat_der_list    Native  Au                 # identify files by ha type
inano_list      noinano inano              # say if ano diffs to be used 
n_ha_list       0    5                     # number of heavy-atoms 
run_list        start                      # read in datafiles for dataset
run_list        read_another_dataset       # about to start a new dataset here
############## DATASET 2 ################
expt_type       sad                        # our second dataset is SAD
input_file_list  hgki_rd_1.sca             # anom diffs for SAD dataset
mad_ha_n  5                                # 5 sites
#########################################

The SIR and SAD datasets will be solved separately (but whichever one is solved first will be used to locate sites for the other, with difference Fouriers or anomalous difference Fouriers). Then phases will be combined by addition of Hendrickson-Lattman coefficients and the combined phases will be density modified.
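Combining phase information by adding Hendrickson-Lattman coefficients works because each set (A, B, C, D) defines a phase probability P(phi) proportional to exp(A cos phi + B sin phi + C cos 2phi + D sin 2phi), so adding the coefficients multiplies the probabilities. The Python sketch below adds two sets for a single reflection and computes the centroid phase and figure of merit by numerical integration over phi. It illustrates the principle only and is not the SOLVE/RESOLVE code; the coefficient values are made up.

```python
import cmath
import math


def add_hl(*coefficient_sets):
    """Combine independent phase information: sum the (A, B, C, D) coefficients."""
    return tuple(sum(values) for values in zip(*coefficient_sets))


def centroid_phase_and_fom(hl, n_steps=360):
    """Centroid phase (degrees) and FOM = |<exp(i phi)>| for one reflection."""
    a, b, c, d = hl
    total = 0.0
    weighted = 0j
    for step in range(n_steps):
        phi = 2.0 * math.pi * step / n_steps
        weight = math.exp(a * math.cos(phi) + b * math.sin(phi)
                          + c * math.cos(2 * phi) + d * math.sin(2 * phi))
        total += weight
        weighted += weight * cmath.exp(1j * phi)
    mean_vector = weighted / total
    return math.degrees(cmath.phase(mean_vector)) % 360.0, abs(mean_vector)


sir_hl = (1.0, 0.5, 0.0, 0.0)   # made-up SIR coefficients
sad_hl = (0.8, 1.2, 0.0, 0.0)   # made-up SAD coefficients
for label, hl in [("SIR", sir_hl), ("SAD", sad_hl), ("combined", add_hl(sir_hl, sad_hl))]:
    phase, fom = centroid_phase_and_fom(hl)
    print(f"{label:9s} phase = {phase:6.1f}  FOM = {fom:.2f}")
```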

Possible Problems

General limitations

Specific limitations and problems

  • The size of the asymmetric unit that the SOLVE/RESOLVE portion of the AutoSol Wizard can handle is limited by the memory in your computer and by the binaries used. The Wizard is supplied with regular-size ("", size=6), giant ("_giant", size=12), huge ("_huge", size=18) and extra-huge ("_extra_huge", size=36) versions. Larger versions can be obtained on request.

  • The command-line version of AutoSol cannot be used for MIR or for combining multiple datasets. The script and GUI versions can be used instead for these cases.

  • The AutoSol Wizard can take a maximum of 6 derivatives for MIR.

  • The AutoSol Wizard can take most settings of most space groups. However, it can only use the hexagonal setting of rhombohedral space groups (e.g., #146 R3:H or #155 R32:H), and it cannot use space groups 114-119 (which are not found in macromolecular crystallography), even in the standard setting, because of difficulties with the use of asuset in the version of the CCP4 libraries used in PHENIX for these settings and space groups.
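If your data were indexed in the rhombohedral (R:R) setting, you will need the corresponding hexagonal (R:H) cell. A minimal Python sketch of the standard conversion follows; the function name is hypothetical, and reindexing the reflections themselves is a separate step that is not shown here.

```python
import math
from typing import Tuple

Cell = Tuple[float, float, float, float, float, float]


def rhombohedral_to_hexagonal(a_r: float, alpha_deg: float) -> Cell:
    """Hexagonal (R:H) cell for a rhombohedral (R:R) cell a=b=c=a_r, alpha=beta=gamma.

    Uses a_H = 2 a_R sin(alpha/2) and c_H = a_R sqrt(3 (1 + 2 cos alpha)).
    """
    alpha = math.radians(alpha_deg)
    a_h = 2.0 * a_r * math.sin(alpha / 2.0)
    c_h = a_r * math.sqrt(3.0 * (1.0 + 2.0 * math.cos(alpha)))
    # The hexagonal setting always has alpha = beta = 90 and gamma = 120 degrees.
    return (a_h, a_h, c_h, 90.0, 90.0, 120.0)
```

A quick sanity check is that the hexagonal cell has three times the volume of the rhombohedral cell, since it contains three lattice points.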

Literature

Simple algorithm for a maximum-likelihood SAD function. A.J. McCoy, L.C. Storoni and R.J. Read. Acta Cryst. D60, 1220-1228 (2004)
Substructure search procedures for macromolecular structures. R.W. Grosse-Kunstleve and P.D. Adams. Acta Cryst. D59, 1966-1973 (2003)
MAD phasing: Bayesian estimates of FA. T.C. Terwilliger. Acta Cryst. D50, 11-16 (1994)

Additional information
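The CC-scoring keywords in the decision_making scope of the keyword list (score_weight_list, score_overall_scale, score_overall_offset, score_overall_sd and the score_individual_* lists) describe a short calculation. The Python sketch below follows that description. The function names are hypothetical, and the way SD(CC) is derived from SD(CC**2) (half the spread of the square root over CC**2 plus or minus its SD, clipped at zero) is an assumption, since the keyword help says only that SD(CC) depends on CC because of the square root.

```python
import math
from typing import Sequence, Tuple


def individual_cc_sq(score: float, scale: float, offset: float) -> float:
    """CC**2 estimated from one score by itself: score * scale + offset."""
    return score * scale + offset


def overall_cc(scores: Sequence[float], weights: Sequence[float],
               overall_scale: float, overall_offset: float,
               overall_sd: float) -> Tuple[float, float]:
    """Estimate CC and SD(CC) from weighted scores.

    CC**2 = (sum of weight * score) * overall_scale + overall_offset, with
    uncertainty overall_sd. SD(CC) is taken here (an assumption) as half
    the width of sqrt() applied to CC**2 +/- overall_sd, clipped at zero.
    """
    if len(scores) != len(weights):
        raise ValueError("need one weight per score")
    weighted = sum(w * s for w, s in zip(weights, scores))
    cc_sq = weighted * overall_scale + overall_offset
    cc = math.sqrt(max(cc_sq, 0.0))
    hi = math.sqrt(max(cc_sq + overall_sd, 0.0))
    lo = math.sqrt(max(cc_sq - overall_sd, 0.0))
    return cc, (hi - lo) / 2.0
```

For the same uncertainty in CC**2, a lower estimated CC gives a larger SD(CC), which is the CC dependence the keyword help refers to.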

List of all AutoSol keywords
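Several heavy_atom_search keywords in the list below define a resolution schedule for HYSS: it is run at up to n_add_res_max+1 resolutions, starting at res_hyss and adding increments of add_res_max/n_add_res_max, and no more resolutions are tried once the best CC exceeds acceptable_cc_hyss. The Python sketch below models only that loop. run_hyss is a hypothetical stand-in that returns a CC for a given resolution, the default used for n_add_res_max is illustrative rather than an AutoSol default, and the best_of_n_hyss repeats at each resolution are not modelled.

```python
from typing import Callable, List, Optional, Tuple


def hyss_resolutions(res_hyss: float, add_res_max: float,
                     n_add_res_max: int) -> List[float]:
    """Up to n_add_res_max + 1 resolutions, starting at res_hyss and
    increasing in steps of add_res_max / n_add_res_max."""
    if n_add_res_max <= 0:
        return [res_hyss]
    step = add_res_max / n_add_res_max
    return [res_hyss + i * step for i in range(n_add_res_max + 1)]


def search_resolutions(run_hyss: Callable[[float], float], res_hyss: float,
                       add_res_max: float = 2.0, n_add_res_max: int = 2,
                       acceptable_cc_hyss: float = 0.2
                       ) -> Optional[Tuple[float, float]]:
    """Return (resolution, CC) of the best HYSS run, stopping early once the
    best CC exceeds acceptable_cc_hyss. run_hyss is a stand-in callable."""
    best: Optional[Tuple[float, float]] = None
    for d_min in hyss_resolutions(res_hyss, add_res_max, n_add_res_max):
        cc = run_hyss(d_min)
        if best is None or cc > best[1]:
            best = (d_min, cc)
        if best[1] > acceptable_cc_hyss:
            break  # good enough: no more resolutions are tried
    return best
```

Because the increments are added to res_hyss, each later trial uses a lower resolution cutoff, where anomalous or isomorphous differences are often more reliable.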

------------------------------------------------------------------------------- 
Legend:
        Parameter values:
          * means selected parameter (where multiple choices are available)
          False is No
          True is Yes
          None means not provided, not predefined, or left up to the program
          "%3d" is a Python style formatting descriptor
------------------------------------------------------------------------------- 
autosol
   sites= None Number of heavy-atom sites. This is an alias for the keyword
          mad_ha_n. (Command-line only)
   sites_file= None PDB or plain-text file with ha sites. This is an alias for
               the keyword ha_sites_file. (Command-line only)
   atom_type= None Anomalously-scattering atom type. This is an alias for the
              keyword mad_ha_type. (Command-line only)
   seq_file= Auto Sequence file. This is an alias for the keyword
             input_seq_file. (Command-line only)
   quick= None Run everything quickly (thoroughness=quick) (Command-line only)
   data= None Datafile. For command-line input it is easiest if each
         wavelength of data is in a separate data file with obvious data
         columns. File types that are easy to read include Scalepack sca
         files, CNS hkl files, and mtz files with just one wavelength of
         data, or with just native or just derivative data. In these cases
         the Wizard can read your data without further information. If you
         have a datafile with many columns, you can use the "labels"
         keyword to specify which data columns to read. (In some cases it
         may be easier, however, to use the GUI or to split the file with
         phenix.reflection_file_converter first.) (Command-line only)
   labels= None Specification string for data labels (Command_line only). To
           find out what the appropriate strings are, type "phenix.autosol
           display_labels=your-datafile-here.mtz"
   f_prime= None F-prime value for any wavelength. (Command-line only)
   f_double_prime= None F-doubleprime value for any wavelength. (Command_line
                   only)
   special_keywords
      write_run_directory_to_file= None Writes the full name of a run
                                   directory to the specified file. This can
                                   be used as a call-back to tell a script
                                   where the output is going to go.
                                   (Command-line only)
   run_control
      coot= None Set coot to True and optionally run=[run-number] to run Coot
            with the current model and map for run run-number. In some wizards
            (AutoBuild) you can edit the model and give it back to PHENIX to
            use as part of the model-building process. If you just say coot,
            then Coot will be run with the model and map for the
            highest-numbered existing run. (Command-line only)
      ignore_blanks= None ignore_blanks allows you to have a command-line
                     keyword with a blank value like "input_lig_file_list="
      stop= None You can stop the current wizard with "stopwizard" or "stop".
            If you type "phenix.autobuild run=3 stop" then this will stop run
            3 of autobuild. (Command-line only)
      display_facts= None Set display_facts to True and optionally
                     run=[run-number] to display the facts for run run-number.
                     If you just say display_facts then the facts for the
                     highest-numbered existing run will be shown.
                     (Command-line only)
      display_summary= None Set display_summary to True and optionally
                       run=[run-number] to show the summary for run
                       run-number. If you just say display_summary then the
                       summary for the highest-numbered existing run will be
                       shown. (Command-line only)
      carry_on= None Set carry_on to True to carry on with highest-numbered
                run from where you left off. (Command-line only)
      run= None Set run to n to continue with run n where you left off.
           (Command-line only)
      copy_run= None Set copy_run to n to copy run n to a new run and continue
                where you left off. (Command-line only)
      display_runs= None List all runs for this wizard. (Command-line only)
      delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)
      display_labels= None display_labels=test.mtz will list all the labels
                      that identify data in test.mtz. You can use the label
                      strings that are produced in AutoSol to identify which
                      data to use from a datafile like this: peak.labels="F+
                      SIGF+ F- SIGF-" (the entire string in quotes counts
                      here). You can use the individual labels from these
                      strings as identifiers for data columns in AutoSol and
                      AutoBuild like this: input_refinement_labels="FP SIGFP
                      FreeR_flags" (each individual label counts).
      dry_run= False Just read in and check parameter names
      params_only= False Just read in and return parameter defaults
      display_all= False Just read in and display parameter defaults
   peak
      data= None Datafile for peak wavelength. (Command_line only)
      labels= None Specification string for data labels for peak wavelength.
              (Command_line only). To find out what the appropriate strings
              are, type "phenix.autosol display_labels=your-datafile-here.mtz"
      f_prime= None F-prime value for peak wavelength. (Command_line only)
      f_double_prime= None F-doubleprime value for peak wavelength.
                      (Command_line only)
   infl
      data= None Datafile for infl wavelength. (Command_line only)
      labels= None Specification string for data labels for infl wavelength.
              (Command_line only). To find out what the appropriate strings
              are, type "phenix.autosol display_labels=your-datafile-here.mtz"
      f_prime= None F-prime value for infl wavelength. (Command_line only)
      f_double_prime= None F-doubleprime value for infl wavelength.
                      (Command_line only)
   high
      data= None Datafile for high wavelength. (Command_line only)
      labels= None Specification string for data labels for high wavelength.
              (Command_line only). To find out what the appropriate strings
              are, type "phenix.autosol display_labels=your-datafile-here.mtz"
      f_prime= None F-prime value for high wavelength. (Command_line only)
      f_double_prime= None F-doubleprime value for high wavelength.
                      (Command_line only)
   low
      data= None Datafile for low wavelength. (Command_line only)
      labels= None Specification string for data labels for low wavelength.
              (Command_line only). To find out what the appropriate strings
              are, type "phenix.autosol display_labels=your-datafile-here.mtz"
      f_prime= None F-prime value for low wavelength. (Command_line only)
      f_double_prime= None F-doubleprime value for low wavelength.
                      (Command_line only)
   remote
      data= None Datafile for remote wavelength. (Command_line only)
      labels= None Specification string for data labels for remote wavelength.
              (Command_line only). To find out what the appropriate strings
              are, type "phenix.autosol display_labels=your-datafile-here.mtz"
      f_prime= None F-prime value for remote wavelength. (Command_line only)
      f_double_prime= None F-doubleprime value for remote wavelength.
                      (Command_line only)
   native
      data= None Datafile for native. (Command_line only)
      labels= None Specification string for data labels for native.
              (Command_line only). To find out what the appropriate strings
              are, type "phenix.autosol display_labels=your-datafile-here.mtz"
      atom_type= Native Heavy-atom type for native. (Command_line only)
      sites= 0 Number of heavy-atom sites for native. (Command_line only)
      inano= *noinano inano anoonly Use anomalous differences for native.
             (Command_line only)
   deriv
      data= None Datafile for deriv. (Command_line only)
      labels= None Specification string for data labels for deriv.
              (Command_line only). To find out what the appropriate strings
              are, type "phenix.autosol display_labels=your-datafile-here.mtz"
      atom_type= I Heavy-atom type for deriv. (Command_line only)
      sites= 2 Number of heavy-atom sites for deriv. (Command_line only)
      inano= noinano *inano anoonly Use anomalous differences for deriv.
             (Command_line only)
   crystal_info
      cell= 0.0 0.0 0.0 0.0 0.0 0.0 Enter cell parameters a b c alpha beta
            gamma
      chain_type= *Auto PROTEIN DNA RNA  You can specify whether to build
                  protein, DNA, or RNA chains. At present you can only build
                  one of these in a single run. If you have both DNA and
                  protein, build one first, then run AutoBuild again,
                  supplying the prebuilt model in the "input_lig_file_list",
                  and build the other. NOTE: the default for this keyword is
                  Auto, which means "carry out the normal process to guess
                  this keyword". The process is to look at the sequence file
                  and/or input PDB file to see what the chain type is. If
                  there is more than one type, the type with the larger number
                  of residues is guessed. If you want to force the chain_type,
                  set it to PROTEIN, RNA, or DNA.
      change_sg= False You can change the space group. In AutoSol the Wizard
                 will use ImportRawData and let you specify the sg and cell.
                 In AutoMR the wizard will give you an entry form to specify
                 them. NOTE: This only applies when reading in new datasets.
                 It does nothing when changed after datasets are read in.
      residues= None Number of amino acid residues in the asymmetric unit (or
                equivalent)
      resolution= 0.0 High-resolution limit. Used as the resolution limit for
                  density modification and as the general default
                  high-resolution limit. If resolution_build or
                  refinement_resolution is set, it overrides this for
                  model-building or refinement. If overall_resolution is set,
                  data beyond that resolution are ignored completely.
      sg= None Space group symbol (e.g., C2221 or C 2 2 21)
      solvent_fraction= None Solvent fraction (typically 0.4 - 0.6)
   decision_making
      acceptable_quality= 40.0 You can specify the minimum overall quality of
                          a model (as defined by overall_score_method) to be
                          considered acceptable
      acceptable_secondary_structure_cc= 0.35 You can specify the minimum
                                         correlation of density from a
                                         secondary structure model to be
                                         considered acceptable
      create_scoring_table= False Choose whether you want a scoring table for
                            solutions. A scoring table is slower but better.
      desired_coverage= 0.8 Choose what probability you want to have that the
                        correct solution is in your current list of top
                        solutions.  A good value is 0.80.  If you set a low
                        value (0.01) then only one solution will be kept at
                        any time; if you set a high value, then many solutions
                        will be kept (and it will take longer).
      ha_iteration= False Choose whether you want to iterate the heavy-atom
                    search. With iteration, sites are found with HYSS, then
                    used to phase and carry out quick density-modification,
                    then difference Fourier is used to find sites again and
                    improve their accuracy.
      hklperfect= None Enter an mtz file with idealized coefficients for a
                  map. This will be compared with all maps calculated during
                  structure solution.
      max_cc_extra_unique_solutions= 0.5 Specify the maximum value of CC
                                     between experimental maps for two
                                     solutions to consider them substantially
                                     different. Solutions that are within the
                                     range for consideration based on
                                     desired_coverage, but are outside of the
                                     number of allowed max_choices, will be
                                     considered, up to
                                     max_extra_unique_solutions, if they have
                                     a correlation of no more than
                                     max_cc_extra_unique_solutions with all
                                     other solutions to be tested.
      max_choices= 3 Number of choices for solutions to put on screen
      max_composite_choices= 8 Number of choices for composite solutions to
                             consider
      max_extra_unique_solutions= 2 Specify the maximum number of solutions to
                                  consider based on their uniqueness as well
                                  as their high scores. Solutions that are
                                  within the range for consideration based on
                                  desired_coverage, but are outside of the
                                  number of allowed max_choices, will be
                                  considered, up to
                                  max_extra_unique_solutions, if they have a
                                  correlation of no more than
                                  max_cc_extra_unique_solutions with all other
                                  solutions to be tested.
      max_ha_iterations= 2 Number of iterations of difference Fouriers in
                         searching for heavy-atom sites
      max_range_to_keep= 4.0 The range of solutions to be kept is
                         range_to_keep * SD of the group of solutions. This
                         sets the maximum of range_to_keep
      min_fom= 0.05 Minimum fom of a solution to keep it at all
      min_fom_for_dm= 0.0 Minimum fom of a solution to density modify
                      (otherwise just copy over phases). This is useful in
                      cases where the phasing is so weak that density
                      modification does nothing or makes the phases worse.
      min_phased_each_deriv= 1 You can require that the wizard phase at least
                             this number of solutions from each derivative,
                             even if they are poor solutions. Usually at least
                             1 is a good idea so that one derivative does not
                             dominate the solutions.
      minimum_improvement= 0.0 Minimum improvement in score to continue ha
                           iteration
      n_random= 6 Number of random solutions to generate when setting up
                scoring table
      overall_score_method= *BAYES-CC Z-SCORE You have two choices for an
                            overall scoring method: (1) sum of individual
                            Z-scores (Z-SCORE) or (2) Bayesian estimate of the
                            CC of the map to a perfect model (BAYES-CC). You
                            can specify which scoring criteria to include with
                            score_type_list (the default is SKEW CORR_RMS for
                            BAYES-CC and CC RFACTOR SKEW FOM for Z-SCORE;
                            additionally, if NCS is present, NCS_OVERLAP is
                            used by default in the Z-SCORE method).
      perfect_labels= None Labels for input data columns for hklperfect.
                      Typical value: "FP PHIC FOM"
      r_switch= 0.4 R-value criterion for deciding whether to use R-value or
                residues built. A good value is 0.40.
      random_scoring= False For testing purposes you can generate random
                      scores
      res_eval= 0.0 Resolution for running resolve evaluation (usually 2.5 A)
      score_individual_offset_list= None Offsets for individual scores in
                                    CC-scoring. Each score will be multiplied
                                    by the score_individual_scale_list value,
                                    then score_individual_offset_list value is
                                    added, to estimate the CC**2 value using
                                    this score by itself. The uncertainty in
                                    the CC**2 value is given by
                                    score_individual_sd_list. NOTE: These
                                    scores are not used in calculation of the
                                    overall score. They are for information
                                    only
      score_individual_scale_list= None Scale factors for individual scores in
                                   CC-scoring. Each score will be multiplied
                                   by the score_individual_scale_list value,
                                   then score_individual_offset_list value is
                                   added, to estimate the CC**2 value using
                                   this score by itself. The uncertainty in
                                   the CC**2 value is given by
                                   score_individual_sd_list. NOTE: These
                                   scores are not used in calculation of the
                                   overall score. They are for information
                                   only
      score_individual_sd_list= None Uncertainties for individual scores in
                                CC-scoring. Each score will be multiplied by
                                the score_individual_scale_list value, then
                                score_individual_offset_list value is added,
                                to estimate the CC**2 value using this score
                                by itself. The uncertainty in the CC**2 value
                                is given by score_individual_sd_list. NOTE:
                                These scores are not used in calculation of
                                the overall score. They are for information
                                only
      score_overall_offset= None Overall offset for scores in CC-scoring. The
                            weighted scores will be summed, then all
                            multiplied by score_overall_scale, then
                            score_overall_offset will be added.
      score_overall_scale= None Overall scale factor for scores in CC-scoring.
                           The weighted scores will be summed, then all
                           multiplied by score_overall_scale, then
                           score_overall_offset will be added.
      score_overall_sd= None Overall SD of CC**2 estimate for scores in
                        CC-scoring. The weighted scores will be summed, then
                        all multiplied by score_overall_scale, then
                        score_overall_offset will be added. This is an
                        estimate of CC**2, with uncertainty about
                        score_overall_sd. Then the square root is taken to
                        estimate CC and SD(CC), where SD(CC) now depends on CC
                        due to the square root.
      score_type_list= SKEW CORR_RMS You can choose which scoring methods to
                       include in scoring of solutions in AutoSol. (The
                       choices available are: CC_DENMOD RFACTOR SKEW
                       NCS_COPIES NCS_IN_GROUP TRUNCATE FLATNESS CORR_RMS
                       REGIONS CONTRAST FOM.) NOTE: The default is SKEW
                       CORR_RMS for BAYES-CC scoring and CC RFACTOR SKEW FOM
                       for Z-SCORE scoring (plus NCS_OVERLAP if ncs_copies > 1).
      score_weight_list= None Weights on scores for CC-scoring. Enter the
                         weight on each score in score_type_list. The weighted
                         scores will be summed, then all multiplied by
                         score_overall_scale, then score_overall_offset will
                         be added.
      skip_score_list= NCS_OVERLAP You can evaluate some scores but not use
                       them. Include the ones you do not want to use in the
                       final score in skip_score_list.
      use_perfect= False  You can use the CC between each solution and
                   hklperfect in scoring. This is only for methods development
                   purposes.
   density_modification
      fix_xyz= False You can choose to not refine coordinates, and instead to
               fix them to the values found by the heavy-atom search.
      fix_xyz_after_denmod= False When sites are found after density
                            modification you can choose whether you want to
                            fix the coordinates to the values found in that
                            map.
      hl_in_resolve= False AutoSol normally does not write out HL coefficients
                     in the resolve.mtz file with density-modified phases. You
                     can turn them on with hl_in_resolve=True
      mask_cycles= 5 Number of mask cycles in density modification (5 is usual
                   for thorough density modification)
      mask_type= *histograms probability wang Choose method for obtaining
                 probability that a point is in the protein vs solvent region.
                 Default is "histograms". If you have a SAD dataset with a
                 heavy atom such as Pt or Au then you may wish to choose
                 "wang" because the histogram method is sensitive to very high
                 peaks. Options are: histograms: compare local rms of map and
                 local skew of map to values from a model map and estimate
                 probabilities. This one is usually the best. probability:
                 compare local rms of map to distribution for all points in
                 this map and estimate probabilities. In a few cases this one
                 is much better than histograms. wang: take points with
                 highest local rms and define as protein.
      minor_cycles= 10 Number of minor cycles in density modification for each
                    mask cycle (10 is usual for thorough density modification)
      test_mask_type= True You can choose to have AutoSol test histograms/wang
                      methods for identifying solvent region based on the
                      final density modification r-factor.
      thorough_denmod= False Choose whether you want thorough density
                       modification. If False, quick density modification is
                       used, which is faster and is sometimes better for a
                       very poor map.
      truncate_ha_sites_in_resolve= Auto *Yes No True False You can choose to
                                    truncate the density near heavy-atom sites
                                    at a maximum of 2.5 sigma. This is useful
                                    in cases where the heavy-atom sites are
                                    very strong, and rarely hurts in cases
                                    where they are not. The heavy-atom sites
                                    are specified with "input_ha_file"
      use_ncs_in_denmod= True This script normally uses available ncs
                         information in density modification. Say No to skip
                         this. See also find_ncs
   display
      number_of_solutions_to_display= 1 Number of solutions to put on screen
                                      and to write out
      solution_to_display= 0 Solution number of the solution to display and
                           write out (use 0 to let the wizard display the top
                           solution)
   general
      background= True When you specify nproc=nn, you can run the jobs in
                  background (default if nproc is greater than 1) or
                  foreground (default if nproc=1).  If you set
                  run_command=qsub (or otherwise submit to a batch queue),
                  then you should set background=False, so that the batch
                  queue can keep track of your runs. There is no need to use
                  background=True in this case because all the runs go as
                  controlled by your batch system. If you use run_command=csh
                  (or similar, csh is default) then normally you will use
                  background=True so that all the jobs run simultaneously.
      base_path= None You can specify the base path for files (default is
                 current working directory)
      clean_up= False At the end of the entire run the TEMP directories will
                be removed if clean_up is True. The default is No, keep these
                directories. If you want to remove them after your run is
                finished use a command like "phenix.autobuild run=1
                clean_up=True"
      coot_name= coot If your version of coot is called something else, then
                 you can specify that here.
      data_quality= *moderate strong weak The defaults are set for you
                    depending on the anticipated data quality. You can choose
                    "moderate" if you are unsure.
      debug= False  You can have the wizard stop with error messages about the
             code if you use debug. NOTE: you cannot use Pause with debug.
      expt_type= *Auto mad sir sad Experiment type (MAD SIR SAD) NOTE: Please
                 treat MIR experiments as a set of SIR experiments. NOTE: The
                 default for this keyword is Auto which means "carry out
                 normal process to guess this keyword". If you have a single
                 file, then it is assumed to be SAD. If you specify
                 native.data and deriv.data it is SIR, if you specify
                 peak.data and infl.data it is MAD. If the Wizard does not
                 guess correctly, you can set it with this keyword.
      extra_verbose= False Facts and possible commands will be printed every
                     cycle if Yes
      i_ran_seed= 588459  Random seed (positive integer) for model-building
                  and simulated annealing refinement
      max_wait_time= 100.0 You can specify the length of time (seconds) to
                     wait when testing the run_command. If you have a cluster
                     where jobs do not start right away you may need a longer
                     time to wait.
      nbatch= 1 You can specify the number of processors to use (nproc) and
              the number of batches to divide the data into for parallel jobs.
              Normally you will set nproc to the number of processors
              available and leave nbatch alone. If you leave nbatch as None it
              will be set automatically, with a value depending on the Wizard.
              This is recommended. The value of nbatch can affect the results
              that you get, as the jobs are not split into exact replicates,
              but are rather run with different random numbers. If you want to
              get the same results, keep the same value of nbatch.
      nproc= 1 You can specify the number of processors to use (nproc) and the
             number of batches to divide the data into for parallel jobs.
             Normally you will set nproc to the number of processors available
             and leave nbatch alone. If you leave nbatch as None it will be
             set automatically, with a value depending on the Wizard. This is
             recommended. The value of nbatch can affect the results that you
             get, as the jobs are not split into exact replicates, but are
             rather run with different random numbers. If you want to get the
             same results, keep the same value of nbatch.
      resolve_size= _giant _huge _extra_huge *None Size for solve/resolve
                    ("","_giant","_huge","_extra_huge")
      run_command= csh When you specify nproc=nn, you can run the subprocesses
                   as jobs in background with csh (default) or submit them to
                   a queue with the command of your choice (e.g., qsub). If
                   you have a multi-processor machine, use csh. If you have a
                   cluster, use qsub or the equivalent command for your
                   system.  NOTE: If you set run_command=qsub (or otherwise
                   submit to a batch queue), then you should set
                   background=False, so that the batch queue can keep track of
                   your runs. There is no need to use background=True in this
                   case because all the runs go as controlled by your batch
                   system. If you use run_command=csh (or similar, csh is
                   default) then normally you will use background=True so that
                   all the jobs run simultaneously.
      skip_xtriage= False You can bypass xtriage if you want. This will
                    prevent you from applying anisotropy corrections, however.
      temp_dir= None Define a temporary directory (it must exist)
      thoroughness= *quick thorough You can try to run quickly and see if you
                    can get a solution ("quick") or more thoroughly to get the
                    best possible solution ("thorough").
      title= Run 1 AutoSol Sun Dec 7 17:46:23 2008  Enter any text you like to
             help identify what you did in this run
      top_output_dir= None This is used in subprocess calls of wizards and to
                      tell the Wizard where to look for the STOPWIZARD file. 
      verbose= False Command files and other verbose output will be printed
   heavy_atom_search
      acceptable_cc_hyss= 0.2 Hyss will be run at up to n_add_res_max+1
                          resolutions starting with res_hyss and adding
                          increments of add_res_max/n_add_res_max. If the best
                          CC value is greater than acceptable_cc_hyss then no
                          more resolutions are tried.
      add_res_max= 2.0 Hyss will be run at up to n_add_res_max+1 resolutions
                   starting with res_hyss and adding increments of
                   add_res_max/n_add_res_max. If the best CC value is greater
                   than acceptable_cc_hyss then no more resolutions are tried.
      best_of_n_hyss= 1 Hyss will be run up to best_of_n_hyss_always times at
                      a given resolution. If the best CC value is greater than
                      good_cc_hyss and the number of sites found is at least
                      min_fraction_of_sites_found times the number expected
                      and Hyss was tried at least best_of_n_hyss times, then
                      the search is ended.
      best_of_n_hyss_always= 10 Hyss will be run up to best_of_n_hyss_always
                             times at a given resolution. If the best CC value
                             is greater than good_cc_hyss and the number of
                             sites found is at least
                             min_fraction_of_sites_found times the number
                             expected and Hyss was tried at least
                             best_of_n_hyss times, then the search is ended.
      good_cc_hyss= 0.3 Hyss will be run up to best_of_n_hyss_always times at
                    a given resolution. If the best CC value is greater than
                    good_cc_hyss and the number of sites found is at least
                    min_fraction_of_sites_found times the number expected and
                    Hyss was tried at least best_of_n_hyss times, then the
                    search is ended.
      hyss_enable_early_termination= True You can specify whether to stop HYSS
                                     as soon as it finds a convincing solution
                                     (Yes, default) or to keep trying...
      hyss_general_positions_only= True Select Yes if you want HYSS only to
                                   consider general positions and ignore sites
                                   on special positions. This is appropriate
                                   for SeMet or S-Met solutions but less
                                   appropriate for heavy-atom soaks.
      hyss_min_distance= 3.5 Enter the minimum distance between heavy-atom
                         sites to keep them in HYSS
      hyss_n_fragments= 3 Enter the number of fragments in HYSS
      hyss_n_patterson_vectors= 33 Enter the number of Patterson vectors to
                                consider in HYSS
      hyss_random_seed= 792341 Enter an integer as random seed for HYSS
      mad_ha_n= None Number of heavy atoms (anomalously-scattering atoms) in
                the au
      mad_ha_type= Se Enter the anomalously-scattering or heavy atom type. For
                   example, Se or Au. NOTE: if you want Phaser to add
                   additional heavy-atoms of other types, you can specify them
                   with mad_ha_add_list.
      max_single_sites= 5 In sites_from_denmod a core set of sites that are
                        strong is identified. If the hand of the solution is
                        known then additional sites are added all at once up
                        to the expected number of sites. Otherwise sites are
                        added one at a time, up to a maximum number of tries
                        of max_single_sites
      min_fraction_of_sites_found= 1.0 Hyss will be run up to
                                   best_of_n_hyss_always times at a given
                                   resolution. If the best CC value is greater
                                   than good_cc_hyss and the number of sites
                                   found is at least
                                   min_fraction_of_sites_found times the
                                   number expected and Hyss was tried at least
                                   best_of_n_hyss times, then the search is
                                   ended.
      min_hyss_cc= 0.05 Minimum CC of a heavy-atom solution in HYSS to keep it
                   at all
      n_add_res_max= 2 Hyss will be run at up to n_add_res_max+1 resolutions
                     starting with res_hyss and adding increments of
                     add_res_max/n_add_res_max. If the best CC value is
                     greater than acceptable_cc_hyss then no more resolutions
                     are tried.
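The resolution schedule shared by acceptable_cc_hyss, add_res_max and n_add_res_max can be sketched in Python as follows. `run_hyss_at` stands in for a HYSS run and is a hypothetical callback that returns the best CC at a given resolution. The function names and the handling of n_add_res_max=0 (a single resolution) are assumptions made for this sketch. With the defaults (add_res_max=2.0, n_add_res_max=2) and res_hyss=3.0, the resolutions tried are 3.0, 4.0 and 5.0 A.

```python
def hyss_resolutions(res_hyss, add_res_max=2.0, n_add_res_max=2):
    """Up to n_add_res_max+1 resolutions (A), starting at res_hyss."""
    if n_add_res_max <= 0:
        return [res_hyss]  # assumption: no increments means one resolution
    step = add_res_max / n_add_res_max
    return [res_hyss + i * step for i in range(n_add_res_max + 1)]


def run_hyss_schedule(res_hyss, run_hyss_at, acceptable_cc_hyss=0.2,
                      add_res_max=2.0, n_add_res_max=2):
    """Try each resolution in turn; stop once the best CC exceeds
    acceptable_cc_hyss. Returns (resolution, cc) of the best result.

    run_hyss_at is a hypothetical callback: resolution -> best CC.
    """
    best = None
    for d_min in hyss_resolutions(res_hyss, add_res_max, n_add_res_max):
        cc = run_hyss_at(d_min)
        if best is None or cc > best[1]:
            best = (d_min, cc)
        if best[1] > acceptable_cc_hyss:
            break  # no more resolutions are tried
    return best
```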
   input_files
      cif_def_file_list= None  You can enter any number of CIF definition
                         files.  These are normally used to tell phenix.refine
                         about the geometry of a ligand or unusual residue. 
                         You usually will use these in combination with "PDB
                         file with metals/ligands" (keyword
                         "input_lig_file_list" ) which allows you to attach
                         the contents of any PDB file you like to your model
                         just before it gets refined.  You can use
                         phenix.elbow to generate these if you do not have a
                         CIF file and one is requested by phenix.refine.
      group_labels_list= None For command-line and script running of AutoSol,
                         you may wish to use keywords to specify which set of
                         data columns to use from an MTZ or other file type
                         with multiple datasets. (From the GUI this is easy,
                         because you are prompted with the column labels.)
                         You can do this by specifying a string that
                         identifies which dataset to include. All allowed
                         values of this identification string are written
                         out any time AutoSol is run on this dataset, like
                         this (this example is for MAD data, specifying data
                         for the peak wavelength): ...:
                           peak.labels='F SIGF DANO SIGDANO'
                           peak.labels='F(+) SIGF(+) F(-) SIGF(-)'
                         You can then use one of the above commands on the
                         command line to identify the dataset of interest.
                         If you want to use a script instead, you can specify
                         N files in your input_data_file_list, and then
                         specify N values for group_labels_list like this:
                           group_labels_list 'F,SIGF,DANO,SIGDANO' 'F(+),SIGF(+),F(-),SIGF(-)'
                         This will take 'F,SIGF,DANO,SIGDANO' as the data for
                         datafile 1 and 'F(+),SIGF(+),F(-),SIGF(-)' for
                         datafile 2. You can identify one dataset from each
                         input file in this way. If you want more than one,
                         then please use phenix.reflection_file_converter to
                         split your input file, or else use the GUI version
                         of AutoSol, in which you can select any subset of
                         the data that you wish.
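The one-to-one, in-order pairing of N data files with N group_labels_list entries can be sketched in Python as follows. The helper `pair_files_with_labels` and the file names in the usage note are hypothetical. The sketch only shows the pairing and the comma-separated label format, not how AutoSol reads the files.

```python
def pair_files_with_labels(data_files, group_labels):
    """Pair data file i with group_labels_list entry i (hypothetical sketch).

    Each entry is a comma-separated list of column labels.
    """
    if len(data_files) != len(group_labels):
        raise ValueError("need exactly one group_labels_list entry per data file")
    return [(f, labels.split(",")) for f, labels in zip(data_files, group_labels)]
```

For example, `pair_files_with_labels(["peak.mtz", "infl.mtz"], ["F,SIGF,DANO,SIGDANO", "F(+),SIGF(+),F(-),SIGF(-)"])` assigns the F/DANO columns to the first file and the Bijvoet-pair columns to the second.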
      input_file_list= None Input data files: any standard format is fine. If
                       all files are Scalepack premerged or all are Scalepack
                       unmerged original index, they will be used as is. In
                       all other cases, all files are converted to Scalepack
                       premerged format.
      input_ha_file= None If the flag "truncate_ha_sites_in_resolve" is set
                     then density at sites specified with input_ha_file is
                     truncated to improve the density modification procedure.
      input_phase_file= None MTZ data file with FC PHIC or equivalent to use
                        for finding heavy-atom sites with difference Fourier
                        methods.
      input_refinement_file= None Data file to use for refinement. The data in
                             this file should not be corrected for anisotropy.
                             It will be combined with experimental phase
                             information for refinement. If you leave this
                             blank, then the output of phasing will be used in
                             refinement (see below). If no anisotropy
                             correction is applied to the data you do not need
                             to specify a datafile for refinement. If an
                             anisotropy correction is applied to the data
                             files, then you must enter a datafile for
                             refinement if you want to refine your model. (See
                             "correct_aniso" for specifying whether an
                             anisotropy correction is applied. In most cases
                             it is not.)  If an anisotropy correction is
                             applied and no refinement datafile is supplied,
                             then no refinement will be carried out in the
                             model-building step.  You can choose any of your
                             datafiles to be the refinement file, or a native
                             that is not part of the datasets for structure
                             solution. If there is more than one dataset, you
                             will be asked each time for a refinement file,
                             but only the last one will be used. Any
                             standard format is fine; normally only F and sigF
                             will be used. Bijvoet pairs and duplicates will
                             be averaged. If an mtz file is provided then a
                             free R flag can be read in as well.  If you do
                             not provide a refinement file then the structure
                             factors from the phasing step will be used in
                             refinement. This is normally satisfactory for SAD
                             data and MIR data. For MAD data you may wish to
                             supply a refinement file because the structure
                             factors from phasing are a combination of data
                             from different wavelengths of data. It is better
                             if you choose your best wavelength of data for
                             refinement.
      input_refinement_labels= None Labels for input refinement file columns
                               (FP SIGFP FreeR_flag)
      input_seq_file= Auto Enter the name of a file with the 1-letter code of
                      the protein sequence. NOTES:
                        1. Lines starting with > are ignored and separate
                           chains.
                        2. FASTA format is fine.
                        3. If there are multiple copies of a chain, just enter
                           one copy.
                        4. If you enter a PDB file for rebuilding and it has
                           the sequence you want, then the sequence file is
                           not necessary.
                      NOTE: You can also enter the name of a PDB file that
                      contains SEQRES records, and the sequence from the
                      SEQRES records will be read, written to
                      seq_from_seqres_records.dat, and used as your input
                      sequence.
                      NOTE: For AutoBuild you can specify start_chains_list on
                      the first line of your sequence file:
                        >> start_chains_list 23 11 5
                      NOTE: The default for this keyword is Auto, which means
                      "carry out the normal process to guess this keyword".
                      This means that if you specify "after_autosol" in
                      AutoBuild, AutoBuild will automatically take the value
                      from AutoSol. If you do not want this to happen, you
                      can specify None, which means "No file".
      refine_eff_file_list= None  You can enter any number of refinement
                            parameter files. These are normally used to tell
                            phenix.refine which defaults to apply, as well as
                            creating specialized definitions such as unusual
                            amino acid residues and linkages.  These
                            parameters override the normal phenix.refine
                            defaults. They themselves can be overridden by
                            parameters set by the Wizard and by you,
                            controlling the Wizard. NOTE: Any parameters set
                            by AutoBuild directly (such as
                            number_of_macro_cycles, high_resolution, etc...)
                            will not be taken from this parameters file. This
                            is useful only for adding extra parameters not
                            normally set by AutoBuild.
   model_building
      add_sidechains= True Add side chains on to main-chain in Textal
                      model-building. This requires a sequence file
      build= True Build model after density modification?
      build_type= RESOLVE_AND_TEXTAL *RESOLVE TEXTAL You can choose to build
                  models with RESOLVE and TEXTAL or either one, and how many
                  different models to build with RESOLVE. The more you build,
                  the more likely you are to get a complete model. Note that
                  rebuild_in_place can only be carried out with RESOLVE
                  model-building.
      capra= True CAPRA is used to place CA atoms
      cc_helix_min= None Minimum CC of helical density to map at low
                    resolution when using helices_strands_only
      cc_strand_min= None Minimum CC of strand density to map when using
                     helices_strands_only
      d_max_textal= 1000.0 This low-resolution limit is only used for Textal
                    model-building
      d_min_textal= 2.8 Textal has an optimal high-resolution limit of 2.8 A.
                    This limit is only used for Textal model-building.
      fit_loops= True You can fit loops automatically if sequence alignment
                 has been done.
      group_ca_length= 4 In resolve building you can specify how short a
                       fragment to keep. Normally 4 or 5 residues should be
                       the minimum.
      group_length= 2 In resolve building you can specify how many fragments
                    must be joined to make a connected group that is kept.
                    Normally 2 fragments should be the minimum.
      helices_strands_only= False You can choose to use a quick model-building
                            method that only builds secondary structure. At
                            low resolution this may be both quicker and more
                            accurate than trying to build the entire
                            structure. If you are running the AutoSol Wizard,
                            normally you should choose 'Yes' and use the quick
                            model-building. Then, when your structure is
                            solved by AutoSol, go on to AutoBuild and build a
                            more complete model (this time normally using
                            helices_strands_only=False).
      helices_strands_start= True You can choose to use a quick model-building
                             method that builds secondary structure as a way
                             to get started...then model completion is done as
                             usual. (Contrast with helices_strands_only which
                             only does secondary structure)
      input_compare_file= None If you are rebuilding a model or already think
                          you know what the model should be, you can include a
                          comparison file in rebuilding. The model is not used
                          for anything except to write out information on
                          coordinate differences in the output log files. 
                         NOTE: this feature does not always work correctly.
      loop_cc_min= 0.4 You can specify the minimum correlation of density from
                   a loop with the map.
      n_cycle_build= 3 Choose number of cycles (3). This does not apply if
                     TEXTAL is selected for build_type
      n_random_frag= 0 In resolve building you can randomize each fragment
                     slightly so as to generate more possibilities for tracing
                     based on extending it.
      n_random_loop= 3 Number of randomized tries from each end for building
                     loops. If 0, then one try. If N, then N additional tries
                     with randomization based on rms_random_loop.
      ncycle_refine= 3 Choose number of refinement cycles (3)
      number_of_builds= 2 Number of different solutions to build models for
      number_of_models= 3 This parameter lets you choose how many initial
                        models to build with RESOLVE within a single build
                        cycle. This parameter is now superseded by
                        number_of_parallel_models, which sets the number of
                        models (but now entire build cycles) to carry out in
                        parallel. A zero means set it automatically. That is
                        what you normally should use. The number_of_models is
                        by default set to 1 and number_of_parallel_models is
                        set to the value of nbatch (typically 4).
      offsets_list= 53 7 23 You can specify an offset for the orientation of
                    the helix and strand templates in building. This is used
                    in generating different starting models.
      quick_build= False Choose whether you want to go for quick
                   model-building (speeds it up, and for poor maps, is
                   sometimes better)
      rebuild_side_chains= False  You can choose to replace side chains (with
                           extend_only) before rebuilding the model (not
                           normally used)
      refine= False This script normally refines the model during building.
              Say No to skip refinement
      resolution_build= 0.0 Enter the high-resolution limit for
                        model-building. If 0.0, the value of resolution is
                        used as a default. 
      resolve_command_list= None Commands for resolve, one per line, in the
                            form:
                              keyword value
                            (the value can be optional). Examples:
                              coarse_grid
                              resolution 200 2.0
                              hklin test.mtz
                            NOTE: For command-line usage you need to enclose
                            the whole set of commands in double quotes (") and
                            each individual command in single quotes (') like
                            this:
                              resolve_command_list="'no_build' 'b_overall 23' "
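The quoting rule for resolve_command_list on the command line can be checked with a short Python sketch. `resolve_command_list_arg` is a hypothetical helper. Rejecting commands that themselves contain quote characters is an assumption made to keep the quoting unambiguous. Python's standard `shlex.split` is used only to show how a POSIX-style shell would tokenize the result. The double quotes keep all the single-quoted commands together in one argument.

```python
import shlex


def resolve_command_list_arg(commands):
    """Build a resolve_command_list=... argument (hypothetical helper).

    Each command goes in single quotes and the whole set in double quotes,
    as required for command-line usage.
    """
    for cmd in commands:
        if "'" in cmd or '"' in cmd:
            raise ValueError("command may not contain quotes: %r" % cmd)
    return 'resolve_command_list="%s"' % " ".join("'%s'" % c for c in commands)


if __name__ == "__main__":
    arg = resolve_command_list_arg(["no_build", "b_overall 23"])
    print(arg)
    # How a POSIX shell would split the command line into arguments.
    print(shlex.split("phenix.autosol " + arg))
```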
      retrace_before_build= False  You can choose to retrace your model n_mini
                            times and use a map based on these retraced models
                            to start off model-building. This is the default
                            for rebuilding models if you are not using
                            rebuild_in_place. You can also specify
                            n_iter_rebuild, the number of cycles of
                            retrace-density-modify-build before starting the
                            main build.
      rms_random_frag= None Rms random position change added to residues on
                       the ends of fragments when extending them. If you enter
                       a negative number, defaults will be used.
      rms_random_loop= None Rms random position change added to residues on
                       the ends of loops in tries for building loops. If you
                       enter a negative number, defaults will be used.
      semet= False You can specify that the dataset that is used for
             refinement is a selenomethionine dataset, and that the model
             should be the SeMet version of the protein, with all SD of MET
             replaced with Se of MSE.
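As an illustration of the MET to MSE substitution described for semet, the following Python sketch edits fixed-column PDB ATOM/HETATM records. It is hypothetical (`met_to_mse` is not a PHENIX function) and assumes standard PDB column positions: atom name in columns 13-16, residue name in 18-20, and element in 77-78. It writes MSE atoms as HETATM records, following the usual PDB convention, but that choice is an assumption here. A real workflow would normally use a structure library rather than string slicing.

```python
def met_to_mse(pdb_lines):
    """Convert MET residues to MSE in PDB-format lines (hypothetical sketch).

    Renames residue MET -> MSE, atom SD -> SE (element SE), and writes the
    residue's atoms as HETATM records. Other lines are returned unchanged.
    """
    out = []
    for line in pdb_lines:
        if line[:6] in ("ATOM  ", "HETATM") and line[17:20] == "MET":
            line = line.ljust(80)  # pad so fixed columns can be sliced safely
            line = "HETATM" + line[6:17] + "MSE" + line[20:]
            if line[12:16].strip() == "SD":
                # Two-letter element names start in column 13.
                line = line[:12] + "SE  " + line[16:76] + "SE" + line[78:]
            line = line.rstrip()
        out.append(line)
    return out
```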
      solve_command_list= None Commands for solve, one per line, in the form:
                            keyword value
                          (the value can be optional). Examples:
                            verbose
                            resolution 200 2.0
      start_chains_list= None  You can specify the starting residue number for
                         each of the unique chains in your structure. If you
                         use a sequence file then the unique chains are
                         extracted and the order must match the order of your
                         starting residue numbers. For example, if your
                         sequence file has chains A and B (identical) and
                         chains C and D (identical to each other, but
                         different from A and B), then you can enter 2 numbers,
                         the starting residues for chains A and C. NOTE: you
                         need to specify an input sequence file for
                         start_chains_list to be applied.
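Here is a Python sketch of how start_chains_list lines up with the unique chains, using the A/B and C/D example above. `start_residues_for_chains` is a hypothetical name. Treating identical sequences as one unique chain, in order of first appearance, is an assumption about how the unique chains are extracted.

```python
def start_residues_for_chains(chain_sequences, start_chains_list):
    """Map every chain to its starting residue number (hypothetical sketch).

    chain_sequences: (chain_id, sequence) pairs in file order.
    start_chains_list: one number per unique sequence, in order of first
    appearance.
    """
    unique = []
    for _, seq in chain_sequences:
        if seq not in unique:
            unique.append(seq)
    if len(unique) != len(start_chains_list):
        raise ValueError("need one starting residue per unique chain")
    start_for_seq = dict(zip(unique, start_chains_list))
    return {cid: start_for_seq[seq] for cid, seq in chain_sequences}
```

With chains A and B identical and C and D identical, `start_chains_list` needs only two numbers, for example 23 for A/B and 11 for C/D.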
      thorough_loop_fit= True Try many conformations and accept them even if
                         the fit is not perfect? If you say Yes the parameters
                         for thorough loop fitting are: n_random_loop=100
                         rms_random_loop=0.3 rho_min_main=0.5 while if you say
                         No those for quick loop fitting are: n_random_loop=20
                         rms_random_loop=0.3 rho_min_main=1.0
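The two parameter sets quoted for thorough_loop_fit can be written as a small lookup table. This Python sketch is illustrative only (`LOOP_FIT_PRESETS` and `loop_fit_params` are hypothetical names). It makes the difference explicit: the thorough setting uses five times as many randomized tries (100 instead of 20) and a lower rho_min_main (0.5 instead of 1.0), with the same rms_random_loop.

```python
# Parameter sets quoted above for thorough_loop_fit = Yes (True) / No (False).
LOOP_FIT_PRESETS = {
    True: {"n_random_loop": 100, "rms_random_loop": 0.3, "rho_min_main": 0.5},
    False: {"n_random_loop": 20, "rms_random_loop": 0.3, "rho_min_main": 1.0},
}


def loop_fit_params(thorough_loop_fit=True):
    """Return a copy of the loop-fitting parameters for the chosen mode."""
    return dict(LOOP_FIT_PRESETS[bool(thorough_loop_fit)])
```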
      trace_as_lig= False You can specify that in building steps the ends of
                    chains are to be extended using the LigandFit algorithm.
                    This is default for nucleic acid model-building.
      use_any_side= False  You can choose to have resolve model-building place
                    the best-fitting side chain at each position, even if the
                    sequence is not matched to the map.
      use_met_in_align= Auto *Yes No True False You can use the heavy-atom
                        positions in input_ha_file as markers for Met SD
                        positions.
   ncs
      find_ncs= Auto *Yes No True False This script normally deduces ncs
                information from the NCS in chains of models that are built
                during iterative model-building. The update is done each cycle
                in which an improved model is obtained. Say No to skip this. 
                See also "input_ncs_file" which can be used to specify NCS at
                the start of the process. If find_ncs="No" then only this
                starting NCS will be used and it will not be updated. You can
                use find_ncs="No" to specify exactly which residues will be
                used in NCS refinement and exactly which NCS operators to use
                in density modification. You can use the function
                $PHENIX/phenix/phenix/command_line/simple_ncs_from_pdb.py to
                help you set up an input_ncs_file that has your specifications
                in it.
      ncs_copies= None Number of copies of the molecule in the au (note: only
                  one type of molecule allowed at present)
      ncs_refine_coord_sigma_from_rmsd= False You can choose to use the
                                        current NCS rmsd as the value of the
                                        sigma for NCS restraints. See also
                                        ncs_refine_coord_sigma_from_rmsd_ratio.
      ncs_refine_coord_sigma_from_rmsd_ratio= 1.0 You can choose to multiply
                                              the current NCS rmsd by this
                                              value before using it as the
                                              sigma for NCS restraints. See
                                              also
                                              ncs_refine_coord_sigma_from_rmsd.
      optimize_ncs= True This script normally deduces ncs information from the
                    NCS in chains of models that are built during iterative
                    model-building. Optimize NCS adds a step to try and make
                    the molecule formed by NCS as compact as possible, without
                    losing any point-group symmetry.
      refine_with_ncs= True This script can allow phenix.refine to
                       automatically identify NCS and use it in refinement. 
                       NOTE: ncs refinement and placing waters automatically
                       are mutually exclusive at present.
   phasing
      do_madbst= True Choose whether you want to skip FA calculation (speeds
                 it up)
      f_doubleprime_list= None Enter f" for the heavy-atom for this dataset
      f_prime_list= None Enter f' for the heavy-atom for this dataset
      fixscattfactors= True For SOLVE phasing and MAD data you can choose
                       whether scattering factors are to be fixed by choosing
                       'Yes' to fix them or 'No' to refine them. Normally
                       choose 'Yes' (fix) if the data are weak and 'No'
                       (refine) if the data are strong.
      ha_sites_file= None Input sites file... with xyz in fractional
                     coordinates or a PDB file with coordinates. NOTE: This
                     file is optional if you specify a partial model file.
      have_hand= False Normally you will not know the hand of the heavy-atom
                 substructure, so have_hand=False. H