Python-based Hierarchical ENvironment for Integrated Xtallography

Automated structure solution with AutoSol

Author(s)
Purpose
Usage
How the AutoSol Wizard works
Setting up inputs
Datafile formats in AutoSol
Datasets and Solutions in AutoSol
Analyzing and scaling the data
Finding heavy-atom (anomalously-scattering atom) sites
Running AutoSol separately in related space groups
Scoring of heavy-atom solutions
Phasing
Density modification (including NCS averaging)
Preliminary model-building and refinement
Resolution limits in AutoSol
Output files from AutoSol
How to run the AutoSol Wizard
Running from a parameters file
Model viewing during model-building with the Coot-PHENIX interface
Examples
SAD dataset
SAD dataset specifying solvent fraction
SAD dataset without model-building
SAD dataset, building RNA instead of protein
SAD dataset, selecting a particular dataset from an MTZ file
MRSAD -- SAD dataset with an MR model; Phaser SAD phasing including the model
Using an MR model to find sites and as a source of phase information (method #2 for MRSAD)
SAD dataset, reading heavy-atom sites from a PDB file written by phenix.hyss
MAD dataset
MAD dataset, selecting particular datasets from an MTZ file
SIR dataset
SAD with more than one anomalously-scattering atom
MIR dataset
SIR + SAD datasets
Possible Problems
General limitations
Specific limitations and problems
Literature
Additional information
List of all AutoSol keywords

Author(s)

  • AutoSol Wizard: Tom Terwilliger
  • PHENIX GUI: Nathaniel Echols
  • HYSS: Ralf W. Grosse-Kunstleve and Paul D. Adams
  • Phaser: Randy J. Read, Airlie J. McCoy and Laurent C. Storoni
  • SOLVE: Tom Terwilliger
  • RESOLVE: Tom Terwilliger
  • phenix.refine: Ralf W. Grosse-Kunstleve, Peter Zwart and Paul D. Adams
  • phenix.xtriage: Peter Zwart

Purpose

The AutoSol Wizard uses HYSS, SOLVE, Phaser, RESOLVE, xtriage and phenix.refine to solve a structure and generate experimental phases with the MAD, MIR, SIR, or SAD methods. The Wizard begins with datafiles (.sca, .hkl, etc.) containing amplitudes (or intensities) of structure factors, identifies heavy-atom sites, calculates phases, carries out density modification and NCS identification, and builds and refines a preliminary model.

Usage

The AutoSol Wizard can be run from the PHENIX GUI, from the command-line, and from parameters files. All three versions are identical except in the way that they take commands from the user. See Using the PHENIX Wizards for details of how to run a Wizard. The command-line version will be described here, except for MIR and multiple datasets, which can only be run with the GUI or with a parameters file. The GUI is documented separately.

How the AutoSol Wizard works

The basic steps that the AutoSol Wizard carries out are described below. They are: Setting up inputs, Analyzing and scaling the data, Finding heavy-atom (anomalously-scattering atom) sites, Scoring of heavy-atom solutions, Phasing, Density modification (including NCS averaging), and Preliminary model-building and refinement. The data for structure solution are grouped into Datasets and solutions are stored in Solution objects.

Setting up inputs

The AutoSol Wizard expects the following basic information:

(1) a datafile name (w1.sca or data=w1.sca)

(2) a sequence file (seq.dat or seq_file=seq.dat)

(3) how many sites to look for (2 or sites=2)

(4) what the anomalously-scattering atom is (Se or atom_type=Se)

(5) optionally, the wavelength and the f_prime and f_double_prime values for each wavelength or derivative that you have (these are helpful if available)

You can also specify many other parameters, including resolution, number of sites, whether to search in a thorough or quick fashion, how thoroughly to build a model, etc. If you have a heavy-atom solution from a previous run or another approach, you can read it in directly as well.

Your parameters can be specified on the command-line, using a GUI, or by editing a parameters file (examples below).

Datafile formats in AutoSol

AutoSol will accept the following formats of data:

  • scalepack unmerged original index: I,SIGI
  • scalepack premerged: I+,SIGI+,I-,SIGI-
  • mtz unmerged: I, SIGI, M_ISYM
  • mtz premerged: I(+), SIGI(+), I(-), SIGI(-)
  • d*trek
  • CNS

The data from any of these formats will be converted to amplitudes (F+, sigF+ and F-, sigF-) internally.

For the best scaling results, you should supply all scalepack unmerged original index files or all mtz unmerged files. If all the files are scalepack unmerged original index or all the files are mtz unmerged and no anisotropy correction is applied, then SOLVE local scaling will be applied to the data prior to merging and averaging equivalent reflections. In all other cases equivalent reflections will be averaged prior to scaling, so that the scaling may not be as effective at removing systematic errors due to absorption or other effects.

Datasets and Solutions in AutoSol

AutoSol breaks down the data for a structure solution into datasets, where a dataset is a set of data that corresponds to a single set of heavy-atom sites. An entire MAD dataset is a single dataset. An MIR structure solution consists of several datasets (one for each native-derivative combination). A MAD + SIR structure has one dataset for the MAD data and a second dataset for the SIR data. The heavy-atom sites for each dataset are found separately (but using difference Fouriers from any previously-solved datasets to help). In the phasing step all the information from all datasets is merged into a single set of phases.

The AutoSol wizard uses a "Solution" object to keep track of heavy-atom solutions and the phased datasets that go with them. There are two types of Solutions: those which consist of a single dataset (Primary Solutions) and those that are combinations of datasets (Composite Solutions). Primary Solutions have information on the datafiles that were part of the dataset and on the heavy-atom sites for this dataset. Composite Solutions are simply sets of Primary Solutions, with associated origin shifts. The hand of the heavy-atom or anomalously-scattering atom substructure is part of a Solution, so if you have two datasets, each with two Solutions related by inversion, then AutoSol would normally construct four different Composite Solutions from these and score each one as described below.
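
The count of four Composite Solutions follows from taking one Primary Solution from each dataset. The Python sketch below only illustrates that counting; the labels are hypothetical, origin shifts are ignored, and it is not how AutoSol stores its Solution objects.

```python
from itertools import product


def composite_solutions(primary_by_dataset):
    """Return every combination that takes one Primary Solution per dataset."""
    return [tuple(combo) for combo in product(*primary_by_dataset)]


# Two datasets, each with a solution and its inverse (hypothetical labels).
dataset_1 = ["d1_hand_a", "d1_hand_b"]
dataset_2 = ["d2_hand_a", "d2_hand_b"]
composites = composite_solutions([dataset_1, dataset_2])
print(len(composites))
```

Each tuple pairs one hand of the first dataset with one hand of the second, so two datasets with two hands each give 2 x 2 combinations.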

Analyzing and scaling the data

The AutoSol Wizard analyzes input datasets with phenix.xtriage to identify twinning and other conditions that may require special care. The data is scaled with SOLVE. For MAD data, FA values are calculated as well.

Note on anisotropy corrections:

The AutoSol wizard will apply an anisotropy correction and B-factor sharpening to all the raw experimental data by default (controlled by the keyword remove_aniso=True). The target overall Wilson B factor can be set with the keyword b_iso, as in b_iso=25. By default the target Wilson B will be 10 times the resolution of the data (e.g., if the resolution is 3 A then b_iso=30), or the actual Wilson B of the data, whichever is lower.
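
The default rule for the target Wilson B can be written as a one-line calculation. This is a minimal sketch of the rule as stated above; the function name is made up for illustration and is not part of PHENIX.

```python
def default_target_b_iso(resolution, wilson_b):
    """Default target Wilson B: 10 x resolution or the data's own Wilson B,
    whichever is lower (hypothetical helper, not a PHENIX function)."""
    return min(10.0 * resolution, wilson_b)


# 3 A data with a Wilson B of 45 would be sharpened to a target of 30.
print(default_target_b_iso(3.0, 45.0))
```

If the Wilson B of the data is already below 10 times the resolution, the rule returns the Wilson B of the data itself.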

If an anisotropy correction is applied then the entire AutoSol run will be carried out with anisotropy-corrected and sharpened data. At the very end of the run the final model will be re-refined against the uncorrected refinement data and this re-refined model and the uncorrected refinement data (with freeR flags) will be written out. For the top solution this will be as overall_best.pdb and overall_best_refine_data.mtz; for all other solutions the files will be listed at the end of the log file.

Finding heavy-atom (anomalously-scattering atom) sites

The AutoSol Wizard uses HYSS to find heavy-atom sites. The result of this step is a list of possible heavy-atom solutions for a dataset. For SIR or SAD data, the isomorphous or anomalous differences, respectively, are used as input to HYSS. For MAD data, the anomalous differences at each wavelength and the FA estimates of complete heavy-atom structure factors from SOLVE are each used as separate inputs to HYSS. Each heavy-atom substructure obtained from HYSS corresponds to a potential solution. In space groups where the heavy-atom structure can be either hand, a pair of enantiomorphic solutions is saved for each run of HYSS.

Running AutoSol separately in related space groups

AutoSol will check for the opposite hand of the heavy-atom solution, and at the same time it will check for the opposite hand of your space group (it will invert the heavy-atom solution from HYSS and invert the hand of the space group at the same time). Therefore you do not need to run AutoSol twice for space groups that are chiral (for example P41). The corresponding inverse space groups will be checked automatically (P43). If there are possibilities for your space group other than the inverse hand of the space group, then you should test them all, one at a time. For example, if you were not able to measure 00l reflections in a hexagonal space group, your space group might be P6, P61, P62, P63, P64 or P65. In this case you would have to run it in P6, P61, P62 and P63 (and then P65 and P64 will be done automatically as the inverses of P61 and P62). Normally only one of these will give a plausible solution.
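
The bookkeeping in the hexagonal example can be sketched in Python. The lookup table below contains only the enantiomorphic pairs named in this section and is deliberately incomplete; the function name is hypothetical.

```python
# Enantiomorphic pairs mentioned in the text (partial table, for illustration).
ENANTIOMORPH = {
    "P41": "P43", "P43": "P41",
    "P61": "P65", "P65": "P61",
    "P62": "P64", "P64": "P62",
}


def space_groups_to_run(candidates):
    """Keep one member of each enantiomorphic pair; the other is tested
    automatically when the hand of the solution is inverted."""
    to_run = []
    covered = set()
    for space_group in candidates:
        if space_group in covered:
            continue
        to_run.append(space_group)
        covered.add(space_group)
        partner = ENANTIOMORPH.get(space_group)
        if partner is not None:
            covered.add(partner)
    return to_run


print(space_groups_to_run(["P6", "P61", "P62", "P63", "P64", "P65"]))
```

For the six hexagonal candidates this leaves P6, P61, P62 and P63 as the runs to carry out, matching the example above.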

Scoring of heavy-atom solutions

Potential heavy-atom solutions are scored based on a set of criteria (SKEW, CORR_RMS, CC_DENMOD, RFACTOR, NCS_OVERLAP, TRUNCATE, REGIONS, CONTRAST, FOM, FLATNESS, described below), using either a Bayesian estimate or a Z-score system to put all the scores on a common scale and to combine them into a single overall score. The overall scoring method chosen (BAYES-CC or Z-SCORE) is determined by the value of the keyword overall_score_method. The default is BAYES-CC. Note that for all scoring methods, the map that is being evaluated, and the estimates of map-perfect-model correlation, refer to the experimental electron density map, not the density-modified map.

Bayesian CC scores (BAYES-CC). Bayesian estimates of the quality of experimental electron density maps are obtained using data from a set of previously-solved datasets. The standard scoring criteria were evaluated for 1905 potential solutions in a set of 246 MAD, SAD, and MIR datasets. As each dataset had previously been solved, the correlation between the refined model and each experimental map (CC_PERFECT) could be calculated for each solution (after offsetting the maps to account for origin differences). Histograms were tabulated of the number of instances that a scoring criterion (e.g., SKEW) had various possible values, as a function of the CC_PERFECT of the corresponding experimental map to the refined model. These histograms yield the relative probability of measuring a particular value of that scoring criterion (SKEW), given the value of CC_PERFECT. Using Bayes' rule, these probabilities can be used to estimate the relative probabilities of values of CC_PERFECT given the value of each scoring criterion for a particular electron density map. The mean estimate (BAYES-CC) is reported (multiplied by 100), with a +/-2SD estimate of the uncertainty in this estimate of CC_PERFECT. The BAYES-CC values are estimated independently for each scoring criterion used, and also from all those selected with the keyword score_type_list and not selected with the keyword skip_score_list.
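
The Bayes' rule step for a single criterion can be illustrated with a toy calculation. Everything numerical below is invented for illustration: the CC_PERFECT bins, the likelihood values standing in for one histogram column, and the flat prior are assumptions, not the tables AutoSol uses.

```python
import math


def bayes_cc(cc_values, likelihoods, prior=None):
    """Posterior mean and 2*SD of CC_PERFECT (both x 100) for one criterion.

    cc_values   : centres of the CC_PERFECT bins
    likelihoods : relative probability of the observed score in each bin
    prior       : prior probability of each bin (flat if omitted; an assumption)
    """
    if prior is None:
        prior = [1.0 / len(cc_values)] * len(cc_values)
    weights = [lik * p for lik, p in zip(likelihoods, prior)]
    total = sum(weights)
    posterior = [w / total for w in weights]
    mean = sum(cc * p for cc, p in zip(cc_values, posterior))
    variance = sum(p * (cc - mean) ** 2 for cc, p in zip(cc_values, posterior))
    return 100.0 * mean, 100.0 * 2.0 * math.sqrt(variance)


# Toy numbers: an observed SKEW value that is more likely for good maps.
toy_cc_bins = [0.1, 0.3, 0.5, 0.7]
toy_likelihoods = [0.05, 0.15, 0.40, 0.40]
estimate, two_sd = bayes_cc(toy_cc_bins, toy_likelihoods)
print(round(estimate, 1), round(two_sd, 1))
```

The sketch covers one criterion only; how the estimates from several criteria are combined is not specified here and is not modelled.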

Z-scores (Z-SCORE). The Z-score for one criterion for a particular solution is given by,

Z= (Score - mean_random_solution_score)/(SD_of_random_solution_scores)
where Score is the score for this solution, mean_random_solution_score is the mean score for a solution with randomized phases, and SD_of_random_solution_scores is the standard deviation of the scores of solutions with randomized phases.

To create a total score based on Z-scores, the Z-scores for each criterion are simply summed.
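
As a minimal sketch of the formula above, the Z-score for one criterion and the summed total can be computed as follows. The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev


def z_score(score, random_scores):
    """Z = (Score - mean of random-phase scores) / SD of random-phase scores."""
    return (score - mean(random_scores)) / stdev(random_scores)


def total_z_score(scores, random_scores_by_criterion):
    """Sum the Z-scores over all criteria present in `scores`."""
    return sum(
        z_score(value, random_scores_by_criterion[name])
        for name, value in scores.items()
    )


# Made-up scores for two criteria and their randomized-phase counterparts.
scores = {"SKEW": 5.0, "CORR_RMS": 4.0}
randomized = {"SKEW": [1.0, 2.0, 3.0], "CORR_RMS": [1.0, 2.0, 3.0]}
print(total_z_score(scores, randomized))
```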

The principal scoring criteria are:

The skew (SKEW; third moment or normalized <rho**3>) of the density in an electron density map is a good measure of its quality, because a random map has a skew of zero (density histograms look like a Gaussian), while a good map has a very positive skew (density histograms very strong near zero, but many points with very high density). This criterion is used in scoring by default.

Correlation of local rms density (CORR_RMS). The presence of contiguous flat solvent regions in a map was detected using the correlation coefficient of the smoothed squared electron density calculated as described above, with the same quantity calculated using half the value of the smoothing radius, yielding the correlation of rms density, r2RMS. In this way the local value of the rms density within a small local region (typically within a radius of 3 A) is compared with the local rms density in a larger local region (typically within a radius of 6 A). If there were a large, contiguous solvent region and another large contiguous region containing the macromolecule, the local rms density in the small region would be expected to be highly correlated with the rms density in the larger region. On the other hand, if the solvent region were broken up into many small flat regions, then this correlation would be expected to be smaller.
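
The skew criterion described above can be illustrated on a plain list of density values. The normalization used here (third central moment divided by the 3/2 power of the second) is the usual definition of skewness and is an assumption; the exact normalization used internally may differ.

```python
def skew(density):
    """Normalized third moment of a list of map density values."""
    n = len(density)
    mean = sum(density) / n
    second = sum((rho - mean) ** 2 for rho in density) / n
    third = sum((rho - mean) ** 3 for rho in density) / n
    return third / second ** 1.5


# A symmetric histogram has zero skew; a few very high points make it positive.
print(skew([-1.0, 0.0, 1.0]))
print(skew([0.0, 0.0, 0.0, 0.0, 10.0]))
```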

Correlation of map-phased electron density map with experimentally-phased map (CC_DENMOD). The statistical density modification in RESOLVE allows the calculation of map-based phases that are (mostly) independent of the experimental phases. The phase information in statistical density modification comes from two sources: your experimental phases and maximization of the agreement of the map with expectations (such as a flat solvent region). Normally the phase probabilities from these two sources are merged together, yielding your density-modified phases. This score is calculated based on the correlation of the phase information from these two sources before combining them, and is a good indication of the quality of the experimental phases. This criterion is used in scoring by default.

The R-factor for density modification (RFACTOR). Statistical density modification provides an estimate of structure factors that is (mostly) independent of the measured structure factors, so the R-factor between FC and Fobs is a good measure of the quality of experimental phases. This criterion is used in scoring by default.

Non-crystallographic symmetry (NCS_OVERLAP). The presence of NCS in a map is a nearly-positive indication that the map is good, or has some correct features. The AutoSol Wizard uses symmetry in heavy-atom sites to suggest NCS, and RESOLVE identifies the actual correlation of NCS-related density for the NCS overlap score. This score is used by default if NCS is present in the Z-score method of scoring.

Figure of merit (FOM). The figure of merit of phasing is a good indicator of the internal consistency of a solution. This score is not normalized by the SD of randomized phase sets (as that has no meaning; rather a standard SD=0.05 is used). This score is used by default if NCS is present in the Z-score method of scoring and in the Bayesian CC estimate method.

Map correlation after truncation (TRUNCATION). Dummy atoms (the same number as estimated non-hydrogen atoms in the structure) are placed in positions of high density of the map, and a new map is calculated based on these atomic positions. The correlation of these maps is calculated after adjusting an overall B-value for the dummy atoms to maximize the correlation. A good map will show a high correlation of these maps. This score is by default not used.

Number of contiguous regions per 100 A**3 comprising top 5% of density in map (REGIONS). The top 5% of points in the map are marked, and the number of contiguous regions that result are counted, and divided by the volume of the asymmetric unit, then multiplied by 100. A good map will have just a few contiguous regions at a high contour level, a poor map will have many isolated peaks. This score is by default not used.

Contrast, or standard deviation of local rms density (CONTRAST). The local rms density in the map is calculated using a smoothing radius of 3 times the high-resolution cutoff (or 6 A, if that is less than 6 A). Then the standard deviation of the local rms, normalized to the mean value of the local rms, is reported. This criterion will be high if there are regions of high local rms (the macromolecule) and separate regions of low local rms (the solvent) and low if the map is random. This score is by default not used.
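
A small sketch of the CONTRAST arithmetic, assuming the local rms values have already been computed on a grid. The reading that the smoothing radius is never smaller than 6 A is an interpretation of the sentence above, and both function names are hypothetical.

```python
from statistics import mean, pstdev


def smoothing_radius(d_min):
    """3 x the high-resolution cutoff, but at least 6 A (assumed reading)."""
    return max(3.0 * d_min, 6.0)


def contrast(local_rms):
    """Standard deviation of the local rms density divided by its mean."""
    return pstdev(local_rms) / mean(local_rms)


# Half the grid points are solvent-like (low rms), half macromolecule-like.
print(smoothing_radius(1.5), contrast([0.5, 0.5, 1.5, 1.5]))
```

A map with uniform local rms everywhere gives a contrast of zero, which is the behaviour expected for a random map.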

Phasing

The AutoSol Wizard uses Phaser to calculate experimental phases from SAD data, and SOLVE to calculate phases from MIR, MAD, and multiple-dataset cases.

Density modification (including NCS averaging)

The AutoSol Wizard uses RESOLVE to carry out density modification. It identifies NCS from symmetries in heavy-atom sites with RESOLVE and applies this NCS if it is present in the electron density map.

Preliminary model-building and refinement

The AutoSol Wizard carries out one cycle of model-building and refinement after obtaining density-modified phases. The model-building is done with RESOLVE. The refinement is carried out with phenix.refine.

Resolution limits in AutoSol

There are several resolution limits used in AutoSol. You can leave them all to default, or you can set any of them individually. Here is a list of these limits and how their default values are set:

Name                  | Description                                   | How default value is set
----------------------|-----------------------------------------------|-------------------------
resolution            | Overall resolution for a dataset              | Highest resolution for any datafile in this dataset. For multiple datasets, the highest resolution for any dataset
refinement_resolution | Resolution for refinement                     | value of "resolution"
resolution_build      | Resolution for model-building                 | value of "resolution"
res_phase             | Resolution for phasing for a dataset          | If phase_full_resolution=True then use value of "resolution". Otherwise, use value of "recommended_resolution" based on analysis of signal-to-noise in dataset.
res_eval              | Resolution for evaluation of solution quality | value of "resolution" or 2.5 A, whichever is lower resolution.
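
The defaults listed above can be summarized in a short Python sketch. The function is hypothetical and only restates the rules as given; recommended_resolution is taken as an input because its derivation from the signal-to-noise analysis is not described here.

```python
def default_resolution_limits(file_resolutions, recommended_resolution,
                              phase_full_resolution=False):
    """Return the default resolution limits (in A) described above."""
    # "Highest resolution" means the smallest d-spacing among the datafiles.
    resolution = min(file_resolutions)
    return {
        "resolution": resolution,
        "refinement_resolution": resolution,
        "resolution_build": resolution,
        "res_phase": resolution if phase_full_resolution else recommended_resolution,
        # "Whichever is lower resolution" means the larger number in A.
        "res_eval": max(resolution, 2.5),
    }


limits = default_resolution_limits([2.0, 2.4], recommended_resolution=2.8)
print(limits)
```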

Output files from AutoSol

When you run AutoSol the output files will be in a subdirectory with your run number:

AutoSol_run_1_/

The key output files that are produced are:

  • A log file describing everything in the run and the files produced:
    AutoSol_run_1_1.log # overall log file
    

  • A summary file listing the results of the run and the other files produced:
    AutoSol_summary.dat  # overall summary
    

  • A warnings file listing any warnings about the run
    AutoSol_warnings.dat  # any warnings
    

  • Density-modified map coefficients (NOTE: These files are aniso-corrected and sharpened if remove_aniso=True)
    overall_best_denmod_map_coeffs.mtz # map coefficients (density modified phases)
    

  • Current preliminary model
     overall_best.pdb # model produced for top solution
    
    NOTE: If there are multiple chains or multiple ncs copies, each chain will be given its own chainID (A B C D...). Segments that are not assigned to a chain are given a separate chainID and are given a segid of "UNK" to indicate that their assignment is unknown. The chainID for solvent molecules is normally S, and the chainID for heavy-atoms is normally Z.

  • An mtz file for use in refinement. NOTE 1: not aniso-corrected and not sharpened. NOTE 2: Two sets of HL coefficients may be present. Normally use HLA, HLB, etc. However, if you supplied a model with input_partpdb_file=my_model.pdb, use HLanomA, HLanomB, etc. instead. The reason is that in this case the HLA, HLB, etc. coefficients contain phase information from my_model.pdb, and you do not want that information passed to your refinement program.
     overall_best_refine_data.mtz # F Sigma HL coeffs, freeR-flags for refinement
    
    NOTE: if this is a SAD or MAD dataset then the overall_best_refine_data.mtz file will normally have your original anomalous data. For MAD data this will be from the wavelength of data with the highest-resolution data present.

  • Heavy atom sites in PDB format
     overall_best_ha_pdb.pdb # ha file for top solution
    

  • NCS information (if any)
    overall_best_ncs_file.ncs_spec # NCS information for top solution
    

  • Experimental phases and HL coefficients (NOTE: These files are aniso-corrected and sharpened if remove_aniso=True)
    overall_best_hklout_phased.mtz # phases and HL coeffs for top solution
    

  • Log file for experimental phasing
    overall_best_log_phased.log # experimental phasing log file for top solution
    

  • Log file for scaling
    overall_best_log_phased.log # experimental phasing log file for top solution
    

  • Log file for heavy-atom substructure search
    overall_best_log_hyss.log # ha search log file for top solution
    

How to run the AutoSol Wizard

Running the AutoSol Wizard is easy. From the command-line you can type:

phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5

The AutoSol Wizard will assume that w1.sca is a datafile (because it ends in .sca and is a file), that seq.dat is a sequence file, that there are 2 heavy-atom sites, and that the heavy-atom is Se. The f_prime and f_double_prime values are set explicitly.

You can also specify each of these things directly:

phenix.autosol data=w1.sca seq_file=seq.dat sites=2 \
   atom_type=Se f_prime=-8 f_double_prime=4.5
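
If you launch AutoSol from a script, the keyword=value form is easy to assemble programmatically. The helper below is a hypothetical convenience wrapper, not part of PHENIX; it only builds the argument list using the keywords shown above.

```python
import shlex


def autosol_command(data, seq_file, sites, atom_type, **extra):
    """Build the phenix.autosol argument list in keyword=value form."""
    args = [
        "phenix.autosol",
        f"data={data}",
        f"seq_file={seq_file}",
        f"sites={sites}",
        f"atom_type={atom_type}",
    ]
    # Any further keywords are appended in the order they were given.
    args += [f"{key}={value}" for key, value in extra.items()]
    return args


cmd = autosol_command("w1.sca", "seq.dat", 2, "Se",
                      f_prime=-8, f_double_prime=4.5)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where PHENIX is installed. Because lambda is a reserved word in Python, the wavelength has to be passed as autosol_command(..., **{"lambda": 0.9798}).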

You can specify many more parameters as well. See the list of keywords, defaults and descriptions at the end of this page and also general information about running Wizards at Using the PHENIX Wizards for how to do this. Some of the most common parameters are:

sites=3     # 3 sites
sites_file=sites.pdb  # ha sites in PDB or fractional xyz format
atom_type=Se   # Se is the heavy-atom
seq_file=seq.dat   # sequence file (1-aa code, separate chains with >>>>)
quick=True  # try to find sites quickly
data=w1.sca  # input datafile
lambda=0.9798  # wavelength for SAD

Running from a parameters file

You can run phenix.autosol from a parameters file. This is often convenient because you can generate a default one with:

phenix.autosol --show_defaults > my_autosol.eff
and then you can just edit this file to match your needs and run it with:
phenix.autosol  my_autosol.eff
NOTE: the autosol parameters file my_autosol.eff will have just one blank native, derivative, and wavelength. You can cut and paste them to put in as many as you want to have.

Model viewing during model-building with the Coot-PHENIX interface

The AutoSol Wizard allows you to view the current best model that is produced by the automated model-building process. This capability is identical to the view/edit model procedure available in the AutoBuild Wizard. Normally you would use it just to view the model in AutoSol, and to view and edit a model in AutoBuild. The PHENIX-Coot interface is accessible via the command-line. When a model has been produced by the AutoSol Wizard, you can open a new window and type:

phenix.autobuild coot 
which will start Coot with your current map and model. When Coot has been loaded, your map and model will be displayed along with a PHENIX-Coot Interface window. If you want, you can edit your model and then save it, giving it back to PHENIX with the button labelled something like Save model as COMM/overall_best_coot_7.pdb. This button creates the indicated file and also tells PHENIX to look for this file and to try and include the contents of the model in the building process. In AutoSol, only the main-chain atoms of the model you save are considered, and the side-chains are ignored. Ligands and solvent in the model are ignored as well. As the AutoSol Wizard continues to build new models and create new maps, you can update in the PHENIX-Coot Interface to the current best model and map with the button Update with current files from PHENIX.

Examples

SAD dataset

phenix.autosol w1.sca seq.dat 2 Se lambda=0.9798
The sequence file is used to estimate the solvent content of the crystal and for model-building. The wavelength (lambda) is used to look up values for f_prime and f_double_prime from a table, but if measured values are available from a fluorescence scan, these should be given in addition to the wavelength.

SAD dataset specifying solvent fraction

phenix.autosol w1.sca seq.dat 2 Se lambda=0.9798 \
    solvent_fraction=0.45
This will force the solvent fraction to be 0.45. This illustrates a general feature of the Wizards: they will try to estimate values of parameters, but if you input them directly, they will use your input values.

SAD dataset without model-building

phenix.autosol w1.sca seq.dat 2 Se lambda=0.9798 \
    build=False
This will carry out the usual structure solution, but will skip model-building.

SAD dataset, building RNA instead of protein

phenix.autosol w1.sca seq.dat 2 Se lambda=0.9798 \
    chain_type=RNA
This will carry out the usual structure solution, but will build an RNA chain. For DNA, specify chain_type=DNA. You can only build one type of chain at a time in the AutoSol Wizard. To build protein and DNA, use the AutoBuild Wizard and run it first with chain_type=PROTEIN, then run it again specifying the protein model as input_lig_file_list=proteinmodel.pdb and with chain_type=DNA.

SAD dataset, selecting a particular dataset from an MTZ file

If you have an input MTZ file with more than one anomalous dataset, you can type something like:

phenix.autosol w1.mtz seq.dat 2 Se lambda=0.9798 \
labels='F+ SIGF+ F- SIGF-'
This will carry out the usual structure solution, but will choose the input data columns based on the labels: 'F+ SIGF+ F- SIGF-'.

NOTE: to specify anomalous data with F+ SIGF+ F- SIGF- like this, these 4 columns must be adjacent to each other in the MTZ file with no other columns in between.

FURTHER NOTE: to instead use a FAVG SIGFAVG DANO SIGDANO array in AutoSol, the data file or an input refinement file MUST also contain a separate array for FP SIGFP or I SIGI or equivalent. This is because FAVG DANO arrays are ONLY allowed as anomalous information, not as amplitudes or intensities. You can use F+ SIGF+ F- SIGF- arrays as a source of both anomalous differences and amplitudes if you want, however.

If you run the AutoSol Wizard with SAD data and an MTZ file containing more than one anomalous dataset and don't tell it which one to use, all possible values of labels are printed out for you so that you can just paste the one you want in.

You can also find out all the possible label strings to use by typing:

phenix.autosol display_labels=w1.mtz  # display all labels for w1.mtz

MRSAD -- SAD dataset with an MR model; Phaser SAD phasing including the model

If you are carrying out SAD phasing with Phaser, you can carry out a combination of molecular replacement phasing and SAD phasing (MRSAD) by adding a single new keyword to your AutoSol run:

input_partpdb_file=MR.pdb
In this case the MR.pdb file will be used as a partial model in a maximum-likelihood SAD phasing calculation with Phaser to calculate phases and identify sites, and the combined MR+SAD phases will be written out. NOTE: At the moment the AutoBuild Wizard is not equipped to use these combined phases optimally in iterative model-building, density modification and refinement, because they contain both experimental phase information and model information. It is therefore possible that the resulting phases are biased by your MR model, and that this bias will not go away during iterative model-building because it is continually fed back in.

Using an MR model to find sites and as a source of phase information (method #2 for MRSAD)

You can also combine MR information with SAD phases (see J. P. Schuermann and J. J. Tanner, Acta Cryst. (2003). D59, 1731-1736) in PHENIX by running the three wizards AutoMR, AutoSol, and AutoBuild one after the other. This method does not use the partial model and the anomalous information in the SAD dataset simultaneously as the above Phaser maximum-likelihood method does. On the other hand, the phases obtained in this method are independent of the model, so that combining them afterwards does not introduce model bias. (It is not yet clear which is the better approach, so you may wish to try both.) Additionally, this approach can be used with any method for phasing. Here is a set of three simple commands to do this. First run AutoMR to find the molecular replacement solution, but don't rebuild it yet:

phenix.automr gene-5.pdb peak.sca copies=1 \
  RMS=1.5 mass=9800 rebuild_after_mr=False
Now your MR solution is in AutoMR_run_1_/MR.1.pdb and phases are in AutoMR_run_1_/MR.1.mtz. Use these phases as input to AutoSol, along with some weak SAD data, still not building any new models:
phenix.autosol data=peak.sca \
  input_phase_file=AutoMR_run_1_/MR.1.mtz input_phase_labels="F PHIC FOM" \
  seq_file=sequence.dat build=False
Note that we have specified the data columns for F, PHI and FOM in the input_phase_file. For input_phase_file you must specify all three of these (if you leave out FOM it will set it to zero). AutoSol will write an MTZ file with experimental phases to phaser_xx.mtz where xx depends on how many solutions are considered during the run. You will need to edit the next command, which runs AutoBuild, depending on the value of xx:
phenix.autobuild data=AutoSol_run_1_/phaser_2.mtz \
  model=AutoMR_run_1_/MR.1.pdb seq_file=sequence.dat rebuild_in_place=False
AutoBuild will now take the phases from your AutoSol run and combine them with model-based information from your AutoMR MR solution, and will carry out iterative density modification, model-building and refinement to rebuild your model. Note that you may wish to set rebuild_in_place=True, depending on how good your MR model is.
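
Because the xx in phaser_xx.mtz is not known in advance, a short script can list the candidate files before the AutoBuild command is written. This is a sketch with hypothetical helper names; it does not decide which file is the right one, and the sort is alphabetical, so phaser_10.mtz would appear before phaser_2.mtz.

```python
from pathlib import Path


def phased_mtz_candidates(run_dir):
    """List the phaser_*.mtz files found in an AutoSol run directory."""
    return sorted(Path(run_dir).glob("phaser_*.mtz"))


def autobuild_command(phased_mtz, model, seq_file):
    """Build the AutoBuild argument list used in the third step above."""
    return [
        "phenix.autobuild",
        f"data={phased_mtz}",
        f"model={model}",
        f"seq_file={seq_file}",
        "rebuild_in_place=False",
    ]
```

A typical use would be to call phased_mtz_candidates("AutoSol_run_1_"), inspect the list, and pass the chosen file to autobuild_command together with AutoMR_run_1_/MR.1.pdb and sequence.dat.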

SAD dataset, reading heavy-atom sites from a PDB file written by phenix.hyss

phenix.autosol 11 Pb data=deriv.sca seq_file=seq.dat \
  sites_file=deriv_hyss_consensus_model.pdb lambda=0.95
This will carry out the usual structure solution process, but will read sites from deriv_hyss_consensus_model.pdb, try both hands, and carry on from there. If you know the hand of the substructure, you can fix it with have_hand=True.

MAD dataset

The inputs for a MAD dataset need to specify f_prime and f_double_prime for each wavelength. You can use a parameters file "mad.eff" to input MAD data. You run it with "phenix.autosol mad.eff". Here is an example of a parameters file for a MAD dataset. You can set many additional parameters as well (see the list at the end of this document).

autosol {
  seq_file = seq.dat
  sites = 2
  atom_type = Se
  wavelength {
    data = peak.sca
    lambda = .9798
    f_prime = -8.0
    f_double_prime = 4.5
  }
  wavelength {
    data = inf.sca
    lambda = .9792
    f_prime = -9.0
    f_double_prime = 1.5
  }
}
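
When there are many wavelengths, a parameters file in the layout shown above can be generated rather than typed. The function below is a hypothetical helper that writes only the keywords used in this example; it is a sketch, not a PHENIX utility.

```python
def mad_parameters(seq_file, sites, atom_type, wavelengths):
    """Return the text of an autosol parameters file for MAD data.

    wavelengths is a list of dicts with the keys data, lambda,
    f_prime and f_double_prime.
    """
    lines = [
        "autosol {",
        f"  seq_file = {seq_file}",
        f"  sites = {sites}",
        f"  atom_type = {atom_type}",
    ]
    for wavelength in wavelengths:
        lines.append("  wavelength {")
        for key in ("data", "lambda", "f_prime", "f_double_prime"):
            lines.append(f"    {key} = {wavelength[key]}")
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines) + "\n"


text = mad_parameters("seq.dat", 2, "Se", [
    {"data": "peak.sca", "lambda": 0.9798, "f_prime": -8.0, "f_double_prime": 4.5},
    {"data": "inf.sca", "lambda": 0.9792, "f_prime": -9.0, "f_double_prime": 1.5},
])
print(text)
```

The returned text can be written to mad.eff and run with phenix.autosol mad.eff as described above.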

MAD dataset, selecting particular datasets from an MTZ file

This is similar to the case for running a SAD analysis, selecting particular columns of data from an MTZ file. If you have an input MTZ file with more than one anomalous dataset, you can use a parameters file like the one above for MAD data, but adding information on the labels in the MTZ file that are to be chosen for each wavelength:

autosol {
  seq_file = seq.dat
  sites = 2
  atom_type = Se
  wavelength {
    data = mad.mtz
    lambda = .9798
    f_prime = -8.0
    f_double_prime = 4.5
    labels='peak(+) SIGpeak(+) peak(-) SIGpeak(-)'
  }
  wavelength {
    data = mad.mtz 
    lambda = .9792
    f_prime = -9.0
    f_double_prime = 1.5
    labels='infl(+) SIGinfl(+) infl(-) SIGinfl(-)'
  }
}
This will carry out the usual structure solution, but will choose the input data columns for each wavelength based on the labels keywords.

As in the SAD case, you can find out all the possible label strings to use by typing:

phenix.autosol display_labels=mad.mtz  # display all labels for mad.mtz

SIR dataset

The standard inputs for an SIR dataset are the native and derivative, the sequence file, the heavy-atom type, and the number of sites, as well as whether to use anomalous differences (or just isomorphous differences):

phenix.autosol native.data=native.sca deriv.data=deriv.sca \
   atom_type=I sites=2 inano=inano
This will set the heavy-atom type to iodine, look for 2 sites, and include anomalous differences.

You can also specify many more parameters using a parameters file. This parameters file shows some of them:

autosol {
  seq_file = seq.dat
  native {
    data = native.sca
  }
  deriv {
    data = pt.sca
    lambda = 1.4
    atom_type = Pt
    f_prime = -3.0
    f_double_prime = 3.5
    sites = 3 
  }
}

SAD with more than one anomalously-scattering atom

You can tell the AutoSol wizard to look for more than one type of anomalously-scattering atom. Specify one atom type (for example, Se) in the usual way. Then, if you are running AutoSol from the command line, specify any additional ones like this:

mad_ha_add_list="Br Pt"
Optionally, you can add f_prime and f_double_prime values for the additional atom types with commands like
mad_ha_add_f_prime_list=" -7 -10"
mad_ha_add_f_double_prime_list=" 4.2 12"
but the values from table lookup should be fine. Note that there must be the same number of entries in each of these three keyword lists, if given. During phasing Phaser will try to add whichever atom types best fit the scattering from each new site. This option is available for SAD phasing only and only for a single dataset (not with SAD+MIR etc).
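
The requirement that the three lists have the same number of entries can be checked before launching a run. This is a minimal sketch under stated assumptions: check_added_atom_lists is a hypothetical helper, not part of AutoSol, and it simply counts whitespace-separated entries in the strings you would pass on the command line.

```python
def check_added_atom_lists(atom_types, f_primes=None, f_double_primes=None):
    """Check that the optional scattering-factor lists match the atom list.

    Hypothetical helper (not a PHENIX function). Each argument is the string
    you would give to mad_ha_add_list, mad_ha_add_f_prime_list and
    mad_ha_add_f_double_prime_list. Returns the number of added atom types,
    or raises ValueError if a supplied list has a different length.
    """
    n_types = len(atom_types.split())
    optional = (
        ("mad_ha_add_f_prime_list", f_primes),
        ("mad_ha_add_f_double_prime_list", f_double_primes),
    )
    for keyword, value in optional:
        if value is None:
            continue  # list not given: values come from table lookup
        n_values = len(value.split())
        if n_values != n_types:
            raise ValueError(
                f"{keyword} has {n_values} entries but "
                f"mad_ha_add_list has {n_types}"
            )
    return n_types


# The values from the example above pass the check.
n_added = check_added_atom_lists("Br Pt", " -7 -10", " 4.2 12")
print(n_added)
```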

MIR dataset

It is easiest to run an MIR dataset using a parameters file such as "mir.eff" which you then run with "phenix.autosol mir.eff". Here is an example parameters file for MIR:

autosol {
  seq_file = seq.dat
  native {
    data = native.sca
  }
  deriv {
    data = pt.sca
    lambda = 1.4
    atom_type = Pt
  }
  deriv {
    data = ki.sca
    lambda = 1.5
    atom_type = I
  }
}

You can enter several derivatives (the Wizard accepts up to 6 for MIR). If you specify a wavelength and heavy-atom type, then scattering factors are taken from a table for that heavy atom. If you prefer, you can instead enter scattering factors with the keywords "f_prime = -3.0" and "f_double_prime = 5.0".

SIR + SAD datasets

A combination of SIR and SAD datasets (or of SAD+SAD or MIR+SAD+SAD or any other combination) is easy with a parameters file. You tell the wizard which group each wavelength, native, or derivative belongs to with a keyword such as "group=1".

autosol {
  seq_file = seq.dat
  native {
    group = 1
    data = native.sca
  }
  deriv {
    group = 1
    data = pt.sca
    lambda = 1.4
    atom_type = Pt
  }
  wavelength {
    group = 2
    data = w1.sca
    lambda = .9798
    atom_type = Se
    f_prime = -7.
    f_double_prime = 4.5
  }
}

The SIR and SAD datasets will be solved separately (but whichever one is solved first will use difference Fourier or anomalous difference Fourier maps to locate sites for the other). Then phases will be combined by addition of Hendrickson-Lattman coefficients and the combined phases will be density modified.
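
To illustrate what combining phases "by addition of Hendrickson-Lattman coefficients" means, here is a minimal sketch for a single reflection. It is not the AutoSol implementation. It assumes the standard Hendrickson-Lattman form, in which the phase probability is proportional to exp(A cos(phi) + B sin(phi) + C cos(2 phi) + D sin(2 phi)), so that multiplying independent probability distributions is the same as adding their coefficients. The function names and the example coefficient values are hypothetical.

```python
import math


def combine_hl(*coefficient_sets):
    """Add Hendrickson-Lattman coefficients (A, B, C, D) term by term."""
    return tuple(sum(values) for values in zip(*coefficient_sets))


def centroid_phase_and_fom(hl, steps=360):
    """Return (centroid phase in degrees, figure of merit) for one reflection.

    Integrates P(phi) ~ exp(A cos(phi) + B sin(phi) + C cos(2 phi) + D sin(2 phi))
    numerically on an evenly spaced phase grid.
    """
    a, b, c, d = hl
    angles = [2.0 * math.pi * k / steps for k in range(steps)]
    exponents = [
        a * math.cos(p) + b * math.sin(p)
        + c * math.cos(2.0 * p) + d * math.sin(2.0 * p)
        for p in angles
    ]
    top = max(exponents)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    x = sum(w * math.cos(p) for w, p in zip(weights, angles)) / total
    y = sum(w * math.sin(p) for w, p in zip(weights, angles)) / total
    return math.degrees(math.atan2(y, x)) % 360.0, math.hypot(x, y)


# Made-up coefficients for one reflection from two phasing groups.
hl_sir = (1.0, 0.0, 0.0, 0.0)  # favors a phase near 0 degrees
hl_sad = (0.0, 1.0, 0.0, 0.0)  # favors a phase near 90 degrees
hl_combined = combine_hl(hl_sir, hl_sad)

phase, fom = centroid_phase_and_fom(hl_combined)
print(hl_combined, round(phase, 3), round(fom, 3))
```

In this made-up case the combined distribution peaks between the two individual estimates and has a higher figure of merit than either one alone, which is the reason for combining the phase information before density modification.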

Possible Problems

General limitations

Specific limitations and problems

  • The size of the asymmetric unit in the SOLVE/RESOLVE portion of the AutoSol wizard is limited by the memory in your computer and the binaries used. The Wizard is supplied with regular-size ("", size=6), giant ("_giant", size=12), huge ("_huge", size=18) and extra_huge ("_extra_huge", size=36) versions. Larger-size versions can be obtained on request.

  • The keywords "cell" and "sg" have been replaced with "unit_cell" and "space_group" to make the keywords the same as in other phenix applications.

  • The keywords for running MIR and SIR and MAD datasets from parameter files and the command line have been changed to make the inputs more consistent and suitable for a static GUI.

  • The AutoSol Wizard can take a maximum of 6 derivatives for MIR.

  • The AutoSol Wizard can take most settings of most space groups. However, it can only use the hexagonal setting of rhombohedral space groups (e.g., #146 R3:H or #155 R32:H), and it cannot use space groups 114-119 (not found in macromolecular crystallography), even in the standard setting, because of difficulties with the use of asuset in the version of the ccp4 libraries used in PHENIX for these settings and space groups.
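
Two of the limits listed above are easy to check in a script before starting a run. This is a minimal sketch: check_autosol_limits is a hypothetical helper, not part of PHENIX, and it assumes that "space groups 114-119" refers to space group numbers in the same numbering as #146 and #155 above. It does not check the rhombohedral setting.

```python
MAX_MIR_DERIVATIVES = 6  # limit stated in the list above
UNSUPPORTED_SPACE_GROUP_NUMBERS = range(114, 120)  # 114-119 inclusive


def check_autosol_limits(n_derivatives, space_group_number):
    """Return a list of messages for inputs that exceed the stated limits.

    Hypothetical helper (not a PHENIX function). An empty list means that
    neither of the two limits checked here is violated.
    """
    problems = []
    if n_derivatives > MAX_MIR_DERIVATIVES:
        problems.append(
            f"{n_derivatives} derivatives given; the maximum for MIR is "
            f"{MAX_MIR_DERIVATIVES}"
        )
    if space_group_number in UNSUPPORTED_SPACE_GROUP_NUMBERS:
        problems.append(
            f"space group #{space_group_number} cannot be used "
            "(space groups 114-119 are not supported)"
        )
    return problems


for message in check_autosol_limits(n_derivatives=7, space_group_number=115):
    print(message)
```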

Literature

Simple algorithm for a maximum-likelihood SAD function. A.J. McCoy, L.C. Storoni and R.J. Read. Acta Cryst. D60, 1220-1228 (2004)
[pdf]
Substructure search procedures for macromolecular structures. R.W. Grosse-Kunstleve and P.D. Adams. Acta Cryst. D59, 1966-1973 (2003)
[pdf]
MAD phasing: Bayesian estimates of FA. T. C. Terwilliger. Acta Cryst. D50, 11-16 (1994)
[pdf]
Decision-making in structure solution using Bayesian estimates of map quality: the PHENIX AutoSol wizard. T. C. Terwilliger, P. D. Adams, R. J. Read, A. J. McCoy, N. W. Moriarty, R. W. Grosse-Kunstleve, P. V. Afonine, P. H. Zwart and L.-W. Hung. Acta Cryst. D65, 582-601 (2009)
[pdf]

Additional information

List of all AutoSol keywords

------------------------------------------------------------------------------- 
Legend: black bold - scope names
        black - parameter names
        red - parameter values
        blue - parameter help
        blue bold - scope help
        Parameter values:
          * means selected parameter (where multiple choices are available)
          False is No
          True is Yes
          None means not provided, not predefined, or left up to the program
          "%3d" is a Python style formatting descriptor
------------------------------------------------------------------------------- 
autosol
   atom_type= None Anomalously-scattering atom type. This sets the atom_type
              in all derivatives and wavelengths. Normally it is used as a
              shortcut for SAD or SIR cases.
   lambda= None Wavelength (A). This sets the wavelength value in all
           derivatives and wavelengths. Normally it is used as a shortcut for
           SAD or SIR cases.
   f_prime= None F-prime value. This sets the f_prime value in all derivatives
            and wavelengths. Normally it is used as a shortcut for SAD or SIR
            cases.
   f_double_prime= None F-double-prime value. This sets the f_double_prime
                   value in all derivatives and wavelengths. Normally it is
                   used as a shortcut for SAD or SIR cases.
   wavelength_name= peak inf high low remote Optional name of wavelength for
                    SAD data. This sets the name in all wavelengths. Normally
                    it is used as a shortcut for SAD cases.
   sites= None Number of heavy-atom sites. This sets the number of sites in
          all derivatives and wavelengths. Normally it is used as a shortcut
          for SAD or SIR cases.
   sites_file= None PDB or plain-text file with ha sites. This sets the sites
               in all derivatives and wavelengths. Normally it is used as a
               shortcut for SAD or SIR cases.
   seq_file= Auto Text file with 1-letter code of protein sequence NOTES: 1.
             lines starting with > are ignored and separate chains 2. FASTA
             format is fine 3. If there are multiple copies of a chain, just
             enter one copy. 4. If you enter a PDB file for rebuilding and it
             has the sequence you want, then the sequence file is not
             necessary. NOTE: You can also enter the name of a PDB file that
             contains SEQRES records, and the sequence from the SEQRES records
             will be read, written to seq_from_seqres_records.dat, and used as
             your input sequence. If you have a duplex DNA, enter each strand
             as a separate chain.
   quick= None Run everything quickly (Same as thoroughness=quick)
   data= None Shortcut for name of datafile (SAD data only. For SIR use
         "native.data=native.sca" and
         "deriv.data=deriv.sca". For MIR and MAD use a parameters
         file and specify data under "native" and "deriv"
         or under "wavelength"). NOTE: For command_line input it is
         easiest if each wavelength of data is in a separate data file with
         obvious data columns. File types that are easy to read include
         Scalepack sca files, CNS hkl files, mtz files with just one
         wavelength of data, or just native or just derivative. In this case
         the Wizard can read your data without further information.  If you
         have a datafile with many columns, you can use the "labels"
         keyword to specify which data columns to read. (It may be easier in
         some cases to use the GUI or to split it with
         phenix.reflection_file_converter first, however.)
   labels= None Shortcut for specification string for data labels (SAD data
           only). Only necessary if the wizard does not automatically choose
           the correct set of data from your file. For SIR use
           "native.labels" and "deriv.labels". For MIR and
           MAD use a parameters file and specify labels under
           "native" and "deriv". NOTE: To find out what
           the appropriate strings are, type "phenix.autosol
           display_labels=your-datafile-here.mtz"
   derscale= None shortcut for derivative scale
   crystal_info
      unit_cell= None Enter cell parameter (a b c alpha beta gamma)
      space_group= None Space Group symbol (e.g., C2221 or C 2 2 21)
      solvent_fraction= None Solvent fraction in crystals (0 to 1). This is
                        normally set automatically from the number of NCS
                        copies and the sequence.
      chain_type= *Auto PROTEIN DNA RNA You can specify whether to build
                  protein, DNA, or RNA chains. At present you can only build
                  one of these in a single run. If you have both DNA and
                  protein, build one first, then run AutoBuild again,
                  supplying the prebuilt model in the
                  "input_lig_file_list" and build the other. NOTE:
                  default for this keyword is Auto, which means "carry
                  out normal process to guess this keyword". The process
                  is to look at the sequence file and/or input pdb file to see
                  what the chain type is. If there are more than one type, the
                  type with the larger number of residues is guessed. If you
                  want to force the chain_type, then set it to PROTEIN RNA or
                  DNA.
      resolution= 0 High-resolution limit. Used as resolution limit for
                  density modification and as general default high-resolution
                  limit. If resolution_build or refinement_resolution are set
                  then they override this for model-building or refinement. If
                  overall_resolution is set then data beyond that resolution
                  is ignored completely. Zero means keep everything.
      change_sg= False You can change the space group. In AutoSol the Wizard
                 will use ImportRawData and let you specify the sg and cell.
                 In AutoMR the wizard will give you an entry form to specify
                 them. NOTE: This only applies when reading in new datasets.
                 It does nothing when changed after datasets are read in.
      residues= None Number of amino acid residues in the au (or equivalent)
      sequence= None Plain text containing 1-letter code of protein sequence
                Same as seq_file except the sequence is read directly, not
                from a file. If both are given, seq_file is ignored.
   input_files
      cif_def_file_list= None You can enter any number of CIF definition
                         files. These are normally used to tell phenix.refine
                         about the geometry of a ligand or unusual residue.
                         You usually will use these in combination with
                         "PDB file with metals/ligands" (keyword
                         "input_lig_file_list" ) which allows you to
                         attach the contents of any PDB file you like to your
                         model just before it gets refined. You can use
                         phenix.elbow to generate these if you do not have a
                         CIF file and one is requested by phenix.refine
      group_labels_list= None For command-line and script running of AutoSol,
                         you may wish to use keywords to specify which set of
                         data columns to be used from an MTZ or other file
                         type with multiple datasets. (From the GUI, it is
                         easy because you are prompted with the column
                         labels). You can do this by specifying a string that
                         identifies which dataset to include. All allowed
                         values of this identification string will be written
                         out any time AutoSol is run on this dataset like
                         this: NOTE: To specify a particular set of data you
                         can specify one of the following (this example is for
                         MAD data, specifying data for peak wavelength): ...:
                         peak.labels='F SIGF DANO SIGDANO' peak.labels='F(+)
                         SIGF(+) F(-) SIGF(-)' You can then use one of the
                         above commands on the command-line to identify the
                         dataset of interest. If you want to use a script
                         instead, you can specify N files in your
                         input_data_file_list, and then specify N values for
                         group_labels_list like this: group_labels_list
                         'F,SIGF,DANO,SIGDANO' 'F(+),SIGF(+),F(-),SIGF(-)'
                         This will take 'F,SIGF,DANO,SIGDANO' as the data for
                         datafile 1 and 'F(+),SIGF(+),F(-),SIGF(-)' for
                         datafile 2 You can identify one dataset from each
                         input file in this way. If you want more than one,
                         then please use phenix.reflection_file_converter to
                         split your input file, or else use the GUI version of
                         AutoSol in which you can select any subset of the
                         data that you wish.
      input_file_list= None Normally not used. Use "data=" or
                       "wavelength.data=" or
                       "native.data=" or "deriv.data="
                       instead.
      input_phase_file= None MTZ data file with FC PHIC or equivalent to use
                        for finding heavy-atom sites with difference Fourier
                        methods.
      input_phase_labels= None Labels for FC and PHIC for data file with FC
                          PHIC or equivalent to use for finding heavy-atom
                          sites with difference Fourier methods.
      input_refinement_file= None Data file to use for refinement. The data in
                             this file should not be corrected for anisotropy.
                             It will be combined with experimental phase
                             information for refinement. If you leave this
                             blank, then the output of phasing will be used in
                             refinement (see below). If no anisotropy
                             correction is applied to the data you do not need
                             to specify a datafile for refinement. If an
                             anisotropy correction is applied to the data
                             files, then you must enter a datafile for
                             refinement if you want to refine your model. (See
                             "remove_aniso" for specifying whether
                             an anisotropy correction is applied. In most
                             cases it is not.) If an anisotropy correction is
                             applied and no refinement datafile is supplied,
                             then no refinement will be carried out in the
                             model-building step. You can choose any of your
                             datafiles to be the refinement file, or a native
                             that is not part of the datasets for structure
                             solution. If there are more than one dataset you
                             will be asked each time for a refinement file,
                             but only the last one will be used. Any standard
                             format is fine; normally only F and sigF will be
                             used. Bijvoet pairs and duplicates will be
                             averaged. If an mtz file is provided then a free
                             R flag can be read in as well. If you do not
                             provide a refinement file then the structure
                             factors from the phasing step will be used in
                             refinement. This is normally satisfactory for SAD
                             data and MIR data. For MAD data you may wish to
                             supply a refinement file because the structure
                             factors from phasing are a combination of data
                             from different wavelengths of data. It is better
                             if you choose your best wavelength of data for
                             refinement.
      input_refinement_labels= None Labels for input refinement file columns
                               (FP SIGFP FreeR_flag)
      input_seq_file= Auto Normally not used. Use instead "seq_file"
      refine_eff_file_list= None You can enter any number of refinement
                            parameter files. These are normally used to tell
                            phenix.refine defaults to apply, as well as
                            creating specialized definitions such as unusual
                            amino acid residues and linkages. These parameters
                            override the normal phenix.refine defaults. They
                            themselves can be overridden by parameters set by
                            the Wizard and by you, controlling the Wizard.
                            NOTE: Any parameters set by AutoBuild directly
                            (such as number_of_macro_cycles, high_resolution,
                            etc...) will not be taken from this parameters
                            file. This is useful only for adding extra
                            parameters not normally set by AutoBuild.
   wavelength Enter a SAD or MAD dataset by filling in information for one or
              more wavelengths. You can cut and paste an entire wavelength
              section and enter as many as you like. If you have multiple
              datasets (i.e., MIR+MAD) then group them using the
              "group" keyword.
      wavelength_name= peak inf high low remote Optionally indicate if this is
                       the peak, inflection point, high energy remote or low
                       energy remote or remote
      data= None Datafile for this wavelength.
      labels= None Specification string for data labels for this wavelength.
              Only necessary if the wizard does not automatically choose the
              correct set of data from your file. To find out what the
              appropriate strings are, type "phenix.autosol
              display_labels=your-datafile-here.mtz"
      atom_type= None Anomalously-scattering atom type. You only need to
                 specify this for one of the wavelengths in MAD datasets.
                 NOTE: if you want Phaser to add additional heavy-atoms of
                 other types, you can specify them with mad_ha_add_list.
      lambda= None wavelength (A). If you supply an atom_type and lambda then
              if you do not supply f_prime and f_double_prime a guess will be
              made for them from a table.
      res_hyss= None resolution for running HYSS for this wavelength/deriv
      res_eval= None resolution for evaluation of solutions for this
                wavelength/deriv
      f_prime= None F-prime value for this wavelength. It is best to supply it
               if you know it.
      f_double_prime= None F-double_prime value for this wavelength. It is
                      best to supply it if you know it.
      sites= None Number of anomalously-scattering sites for this wavelength
             You only need to specify this for one wavelength. If you have
             only MAD data you can also just specify "sites=2"
      sites_file= None PDB or plain-text file with heavy-atom sites. The sites
                  will be taken from this file if supplied
      derscale= None derivative scale factor
      group= 1 Phasing group(s) this wavelength is associated with (Relevant
             in cases where you have 2 MAD datasets or MAD+SAD or MAD+MIR
             etc...)
      added_wavelength= False Used internally to flag if this wavelength was
                        added automatically
      ignore= False Ignore this wavelength of data
   native Enter an MIR or SIR dataset by filling in information for a native
          and one or more derivatives. You can cut and paste these sections
          and enter as many as you like. If you have multiple datasets (i.e.,
          MIR+MAD) then group them using the "group" keyword.
      data= None Datafile for native
      labels= None Specification string for data labels for native. Only
              necessary if the wizard does not automatically choose the
              correct set of data from your file. To find out what the
              appropriate strings are, type "phenix.autosol
              display_labels=your-datafile-here.mtz"
      lambda= None wavelength (A) (Not used, for your reference only).
      group= 1 Phasing group(s) this native is associated with (Relevant in
             cases where you have more than one group of native+derivs or you
             have MIR + MAD or SAD)
      added_native= False Used internally to flag if this native was added
                    automatically
      ignore= False Ignore this native data
   deriv Enter an MIR or SIR dataset by filling in information for a native
         and one or more derivatives. You can cut and paste these sections and
         enter as many as you like. If you have multiple datasets (i.e.,
         MIR+MAD) then group them using the "group" keyword.
      data= None Datafile for this derivative
      labels= None Specification string for data labels for deriv. Only
              necessary if the wizard does not automatically choose the
              correct set of data from your file. To find out what the
              appropriate strings are, type "phenix.autosol
              display_labels=datafile.mtz"
      atom_type= None Heavy-atom type for deriv.
      sites= None Number of heavy-atom sites for deriv.
      sites_file= None PDB or plain-text file with heavy-atom sites. The sites
                  will be taken from this file if supplied
      res_hyss= None resolution for running HYSS for this wavelength/deriv
      res_eval= None resolution for evaluation of solutions for this
                wavelength/deriv
      inano= noinano *inano anoonly Use anomalous differences for deriv.
             noinano means do not use anomalous differences. inano means use
             anomalous differences and isomorphous differences. anoonly means
             use anomalous differences and not iso differences.
      f_prime= None F-prime value for this derivative.
      f_double_prime= None F-double_prime value for this derivative.
      lambda= None wavelength (A). Used with atom_type to calculate f_prime
              and f_double_prime if they are not supplied
      derscale= None derivative scale factor
      group= 1 Phasing group(s) this derivative is associated with (Relevant
             in cases where you have more than one group of native+derivs or
             you have MIR + MAD or SAD)
      added_deriv= False Used internally to flag if this derivative was added
                   automatically
      ignore= False Ignore this deriv data
   decision_making
      always_include_peak= True Choose True to add PEAK dataset on for HYSS if
                           not automatically chosen
      add_extra_if_fa= True Choose True to try an extra file for HYSS if FA
                       values are used. This may be useful to solve cases
                       where FA values are poor but their sigmas are small. If
                       True then the anomalous differences will be used for
                       HYSS as well.
      create_scoring_table= None Choose whether you want a scoring table for
                            solutions A scoring table is slower but better
      desired_coverage= None Choose what probability you want to have that the
                        correct solution is in your current list of top
                        solutions. A good value is 0.80. If you set a low
                        value (0.01) then only one solution will be kept at
                        any time; if you set a high value, then many solutions
                        will be kept (and it will take longer).
      self_diff_fourier= True Choose whether, in cases where there are
                         multiple derivatives or multiple datasets, you want
                         to use difference Fourier analysis on the same
                         derivative(s) used in phasing (True), or instead
                         (False) only phasing other derivatives
      combine_siblings= True You can specify that in MIR or multiple-dataset
                        solutions the solutions to combine must all be
                        ultimately derived by difference fourier from the same
                        parent. Compare with combine_same_parent_only where
                        any solutions must have the same immediate parent
                        (unless one is a composite solution).
      max_cc_extra_unique_solutions= 0.5 Specify the maximum value of CC
                                     between experimental maps for two
                                     solutions to consider them substantially
                                     different. Solutions that are within the
                                     range for consideration based on
                                     desired_coverage, but are outside of the
                                     number of allowed max_choices, will be
                                     considered, up to
                                     max_extra_unique_solutions, if they have
                                     a correlation of no more than
                                     max_cc_extra_unique_solutions with all
                                     other solutions to be tested.
      max_choices= None Number of choices for solutions to consider. Set
                   automatically with quick: 1 and thorough:3
      max_composite_choices= 8 Number of choices for composite solutions to
                             consider
      max_extra_unique_solutions= None Specify the maximum number of solutions
                                  to consider based on their uniqueness as
                                  well as their high scores. Solutions that
                                  are within the range for consideration based
                                  on desired_coverage, but are outside of the
                                  number of allowed max_choices, will be
                                  considered, up to
                                  max_extra_unique_solutions, if they have a
                                  correlation of no more than
                                  max_cc_extra_unique_solutions with all other
                                  solutions to be tested. Set automatically
                                  with quick:0 ; thorough:2
      max_range_to_keep= 4 The range of solutions to be kept is range_to_keep
                         * SD of the group of solutions. This sets the maximum
                         of range_to_keep
      min_fom= 0.05 Minimum fom of a solution to keep it at all
      low_fom= 0.20 If best FOM is less than low_fom, double range_to_keep
      minimum_merge_cc= 0.25 Minimum ratio of CC of solutions to expected in
                        merge_mir to keep at all
      min_fom_for_dm= 0 Minimum fom of a solution to density modify (otherwise
                      just copy over phases). This is useful in cases where
                      the phasing is so weak that density modification does
                      nothing or makes the phases worse.
      fom_for_extreme_dm= 0.35 If FOM of phasing is at most
                          fom_for_extreme_dm then defaults for density
                          modification become: mask_type=wang wang_radius=20
                          mask_cycles=1 minor_cycles=4
      min_phased_each_deriv= 1 You can require that the wizard phase at least
                             this number of solutions from each derivative,
                             even if they are poor solutions. Usually at least
                             1 is a good idea so that one derivative does not
                             dominate the solutions.
      n_random= 6 Number of random solutions to generate when setting up
                scoring table
      res_eval= 0 Resolution for running resolve evaluation (usually 2.5 A) It
                will be set automatically if you do not set it
      score_individual_offset_list= None Offsets for individual scores in
                                    CC-scoring. Each score will be multiplied
                                    by the score_individual_scale_list value,
                                    then score_individual_offset_list value is
                                    added, to estimate the CC**2 value using
                                    this score by itself. The uncertainty in
                                    the CC**2 value is given by
                                    score_individual_sd_list. NOTE: These
                                    scores are not used in calculation of the
                                    overall score. They are for information
                                    only
      score_individual_scale_list= None Scale factors for individual scores in
                                   CC-scoring. Each score will be multiplied
                                   by the score_individual_scale_list value,
                                   then score_individual_offset_list value is
                                   added, to estimate the CC**2 value using
                                   this score by itself. The uncertainty in
                                   the CC**2 value is given by
                                   score_individual_sd_list. NOTE: These
                                   scores are not used in calculation of the
                                   overall score. They are for information
                                   only
      score_individual_sd_list= None Uncertainties for individual scores in
                                CC-scoring. Each score will be multiplied by
                                the score_individual_scale_list value, then
                                score_individual_offset_list value is added,
                                to estimate the CC**2 value using this score
                                by itself. The uncertainty in the CC**2 value
                                is given by score_individual_sd_list. NOTE:
                                These scores are not used in calculation of
                                the overall score. They are for information
                                only
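
The linear mapping described for score_individual_scale_list, score_individual_offset_list and score_individual_sd_list can be sketched as below. This is only an illustration of the arithmetic stated above; the function name and the numbers are hypothetical and are not part of AutoSol.

```python
def estimate_cc2(score, scale, offset, sd):
    """Estimate CC**2 from a single score taken by itself.

    The score is multiplied by its scale factor, then the offset is added.
    The uncertainty of the estimate is the supplied sd value, unchanged.
    """
    return score * scale + offset, sd


# Hypothetical values chosen only to show the arithmetic.
cc2, cc2_sd = estimate_cc2(score=0.20, scale=2.0, offset=0.05, sd=0.10)
print("estimated CC**2:", cc2, "+/-", cc2_sd)
```
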
      score_overall_offset= None Overall offset for scores in CC-scoring. The
                            weighted scores will be summed, then all
                            multiplied by score_overall_scale, then
                            score_overall_offset will be added.
      score_overall_scale= None Overall scale factor for scores in CC-scoring.
                           The weighted scores will be summed, then all
                           multiplied by score_overall_scale, then
                           score_overall_offset will be added.
      score_overall_sd= None Overall SD of CC**2 estimate for scores in
                        CC-scoring. The weighted scores will be summed, then
                        all multiplied by score_overall_scale, then
                        score_overall_offset will be added. This is an
                        estimate of CC**2, with uncertainty about
                        score_overall_sd. Then the square root is taken to
                        estimate CC and SD(CC), where SD(CC) now depends on CC
                        due to the square root.
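
A minimal sketch of the overall CC estimate described above: weighted sum, overall scale, overall offset, then a square root. The function name is hypothetical. The clamp to the range 0 to 1 and the first-order propagation SD(CC) = SD(CC**2) / (2*CC) are assumptions made here for illustration; the text above only states that SD(CC) depends on CC.

```python
import math


def overall_cc_estimate(scores, weights, overall_scale, overall_offset, overall_sd):
    """Return (CC, SD(CC)) from weighted scores, following the steps above."""
    weighted_sum = sum(w * s for w, s in zip(weights, scores))
    cc2 = weighted_sum * overall_scale + overall_offset
    # Assumption: keep the CC**2 estimate in a physically meaningful range.
    cc2 = min(max(cc2, 0.0), 1.0)
    cc = math.sqrt(cc2)
    # Assumption: first-order error propagation through the square root.
    sd_cc = overall_sd / (2.0 * cc) if cc > 0 else float("inf")
    return cc, sd_cc


# Hypothetical scores and weights, for illustration only.
cc, sd_cc = overall_cc_estimate([0.5, 0.3], [1.0, 1.0], 0.5, 0.09, 0.14)
print(round(cc, 3), round(sd_cc, 3))
```

Under this assumption the same score_overall_sd gives a larger SD(CC) when CC is small, which is the dependence on CC mentioned above.
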
      score_type_list= SKEW CORR_RMS You can choose what scoring methods to
                       include in scoring of solutions in AutoSol. (The
                       choices available are: CC_DENMOD RFACTOR SKEW
                       NCS_COPIES NCS_IN_GROUP TRUNCATE FLATNESS CORR_RMS
                       REGIONS CONTRAST FOM) NOTE: If you are using Z-SCORE
                       or BAYES-CC scoring, the default is CC_RMS RFACTOR SKEW
                       FOM (and NCS_OVERLAP if ncs_copies is at least equal to
                       ncs_copies_min_for_overlap).
      score_weight_list= None Weights on scores for CC-scoring. Enter the
                         weight on each score in score_type_list. The weighted
                         scores will be summed, then all multiplied by
                         score_overall_scale, then score_overall_offset will
                         be added.
      skip_score_list= None You can evaluate some scores but not use them.
                       Include the ones you do not want to use in the final
                       score in skip_score_list.
      ncs_copies_min_for_overlap= 2 Minimum number of ncs copies (set
                                  automatically from composition and cell or
                                  with ncs_copies=xx) to use NCS_OVERLAP in
                                  scoring
      rho_overlap_min= 0.3 Sets minimum average overlap of NCS-related density
                       to keep NCS. Cutoff of overlap will be rho_overlap_min
                       for 2 ncs copies, and proportionally smaller
                       (rho_overlap_min*2/N) for N ncs copies.
      rho_overlap_min_scoring= 0.5 Once NCS is found, rho_overlap_min_scoring
                               sets threshold for whether the NCS is used in
                               scoring. Cutoff of overlap will be
                               rho_overlap_min_scoring for 2 ncs copies, and
                               proportionally smaller
                               (rho_overlap_min_scoring*2/N) for N ncs copies.
                               (Compare with rho_overlap_min, which sets
                               cutoff for finding NCS, not scoring with it)
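
The scaling of the overlap cutoff with the number of NCS copies, as described for rho_overlap_min and rho_overlap_min_scoring, can be written as a short sketch. The function name is hypothetical; only the formula (value*2/N) comes from the text above.

```python
def ncs_overlap_cutoff(overlap_min, n_copies):
    """Cutoff on average NCS overlap: overlap_min for 2 copies, smaller for more."""
    if n_copies < 2:
        raise ValueError("NCS requires at least 2 copies")
    return overlap_min * 2.0 / n_copies


# Default rho_overlap_min (0.3) and rho_overlap_min_scoring (0.5) with 4 copies.
print(ncs_overlap_cutoff(0.3, 4), ncs_overlap_cutoff(0.5, 4))
```
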
      hyss_scoring
         ha_iteration= None Choose whether you want to iterate the heavy-atom
                       search. With iteration, sites are found with HYSS, then
                       used to phase and carry out quick density-modification,
                       then difference Fourier is used to find sites again and
                       improve their accuracy. Default is to not use
                       ha_iteration except in multi-dataset or MIR analyses
         max_ha_iterations= None Number of iterations of difference Fouriers
                            in searching for heavy-atom sites. Default is to
                            set this based on data_quality. Iteration is not
                            used by default if quick is True.
         minimum_improvement= 0 Minimum improvement in score to continue ha
                              iteration
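
One possible reading of how max_ha_iterations and minimum_improvement interact is sketched below. This is not the AutoSol implementation: the function name and the rescore callable (standing in for one cycle of phasing, quick density modification and difference Fourier) are hypothetical, and the rule that iteration continues only while the gain is at least minimum_improvement is an assumption.

```python
def iterate_ha_search(start_score, rescore, max_ha_iterations, minimum_improvement=0.0):
    """Repeat the site-improvement cycle until the score stops improving enough."""
    best = start_score
    accepted = 0
    for _ in range(max_ha_iterations):
        candidate = rescore(best)  # hypothetical: one difference-Fourier cycle
        if candidate - best < minimum_improvement:
            break  # gain too small, stop iterating
        best = candidate
        accepted += 1
    return best, accepted


# Hypothetical sequence of scores from successive cycles.
cycle_scores = iter([12.0, 15.0, 15.5])
print(iterate_ha_search(10.0, lambda _: next(cycle_scores), 5, minimum_improvement=1.0))
```
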
      build_scoring
         overall_score_method= *BAYES-CC Z-SCORE You have 2 choices for an
                               overall scoring method: (1) Sum of individual
                               Z-scores (Z-SCORE) (2) Bayesian estimate of CC
                               of map to perfect model (BAYES-CC) You can
                               specify which scoring criteria to include with
                               score_type_list (default is SKEW CORR_RMS for
                               BAYES-CC and CC RFACTOR SKEW FOM for Z-SCORE.
                               Additionally, if NCS is present, NCS_OVERLAP is
                               used by default in the Z-SCORE method).
         r_switch= 0.4 R-value criterion for deciding whether to use R-value
                   or residues built. A good value is 0.40
         acceptable_quality= 40 You can specify the minimum overall quality of
                             a model (as defined by overall_score_method) to
                             be considered acceptable
         acceptable_secondary_structure_cc= 0.35 You can specify the minimum
                                            correlation of density from a
                                            secondary structure model to be
                                            considered acceptable
         trace_chain= True You can build a CA-only model right after density
                      modification using trace_chain
         trace_chain_score= False You can score density-modified maps with the
                            number of residues built with regular
                            secondary-structure using trace_chain.
      dev_scoring
         random_scoring= False For testing purposes you can generate random
                         scores
         use_perfect= False You can use the CC between each solution and
                      hklperfect in scoring. This is only for methods
                      development purposes.
         hklperfect= None You can supply an mtz file with idealized
                     coefficients for a map. This will be compared with all
                     maps calculated during structure solution
         perfect_labels= None Labels for input data columns for hklperfect if
                         present. Typical value: "FP PHIC FOM"
   scaling
      remove_aniso= Auto *True False Choose if you want to apply a correction
                    for anisotropy to the data. True means always apply the
                    correction, False means never apply it, Auto means apply
                    it if the data is severely anisotropic (recommended=True).
                    If you set remove_aniso=Auto then if the range of
                    anisotropic B-factors is greater than
                    delta_b_for_auto_remove_aniso and the ratio of the largest
                    to the smallest is less than ratio_b_for_auto_remove_aniso
                    then the correction will be applied. Anisotropy correction
                    will be applied to all input data before scaling. If used,
                    the default overall target B factor is the minimum of
                    (max_b_iso, lowest B of datasets,
                    target_b_ratio*resolution)
      b_iso= None Target overall B value for anisotropy correction. Ignored if
             remove_aniso = False. If None, default is minimum of (max_b_iso,
             lowest B of datasets, target_b_ratio*resolution)
      max_b_iso= 40. Default maximum overall B value for anisotropy
                 correction. Ignored if remove_aniso = False. Ignored if b_iso
                 is set. If used, default is minimum of (max_b_iso, lowest B
                 of datasets, target_b_ratio*resolution)
      target_b_ratio= 10. Default ratio of target B value to resolution for
                      anisotropy correction. Ignored if remove_aniso = False.
                      Ignored if b_iso is set. If used, default is minimum of
                      (max_b_iso, lowest B of datasets,
                      target_b_ratio*resolution)
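
The default target B described for b_iso, max_b_iso and target_b_ratio is a simple minimum over three quantities. A minimal sketch follows, with a hypothetical function name and hypothetical dataset B values; the defaults for max_b_iso and target_b_ratio are the ones listed above.

```python
def default_target_b(dataset_b_values, resolution, max_b_iso=40.0,
                     target_b_ratio=10.0, b_iso=None):
    """Target overall B for anisotropy correction.

    If b_iso is given it is used directly; otherwise the target is the
    minimum of max_b_iso, the lowest dataset B, and target_b_ratio*resolution.
    """
    if b_iso is not None:
        return b_iso
    return min(max_b_iso, min(dataset_b_values), target_b_ratio * resolution)


# Hypothetical datasets with overall B of 35 and 50 at 2.5 A resolution.
print(default_target_b([35.0, 50.0], resolution=2.5))
```
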
      localscale_before_phaser= True You can apply SOLVE localscaling to SAD
                                data before passing it to Phaser for SAD
                                phasing
      delta_b_for_auto_remove_aniso= 20 Choose what range of aniso B values is
                                     so big that you want to correct for
                                     anisotropy by default. Both ratio_b and
                                     delta_b must be large to correct. See
                                     also ratio_b_for_auto_remove_aniso. See
                                     also "remove_aniso" which
                                     overrides this default if set to
                                     "True"
      ratio_b_for_auto_remove_aniso= 1.0 Choose what ratio of aniso B values
                                     is so big that you want to correct for
                                     anisotropy by default. Both ratio_b and
                                     delta_b must be large to correct. See
                                     also delta_b_for_auto_remove_aniso. See
                                     also "remove_aniso" which
                                     overrides this default if set to
                                     "True"
      test_remove_aniso= True Choose whether you want to try applying or not
                         applying an anisotropy correction if the run fails.
                         First your original selection for applying or not
                         will be tried, and then the opposite will be tried if
                         the run fails.
      use_sca_as_is= True Choose True to allow use of sca files (and mtz
                     files) without conversion even if the space group is
                     changed. If False, then original index files will always
                     be converted to premerged if the space group is changed
      derscale_list= None List of deriv scale factors. Not normally used. Use
                     derscale for deriv or wavelength.
   heavy_atom_search
      min_hyss_cc= 0.05 Minimum CC of a heavy-atom solution in HYSS to keep it
                   at all
      acceptable_cc_hyss= 0.2 Solutions with CC better than acceptable_cc_hyss
                          will not be rescored.
      good_cc_hyss= 0.3 Hyss will be run up to best_of_n_hyss_always times at
                    a given resolution. If the best CC value is greater than
                    good_cc_hyss and the number of sites found is at least
                    min_fraction_of_sites_found times the number expected and
                    Hyss was tried at least best_of_n_hyss times, then the
                    search is ended. Also if thoroughness=quick and a solution
                    with CC at least as high as good_cc_hyss is found, no more
                    searches will be done at all
      n_add_res_max= 2 Hyss will be run at up to n_add_res_max+1 resolutions
                     starting with res_hyss and adding increments of
                     add_res_max/n_add_res_max. If the best CC value is
                     greater than good_cc_hyss then no more resolutions are
                     tried.
      add_res_max= 2 Hyss will be run at up to n_add_res_max+1 resolutions
                   starting with res_hyss and adding increments of
                   add_res_max/n_add_res_max. If the best CC value is greater
                   than good_cc_hyss then no more resolutions are tried.
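
The set of resolutions implied by res_hyss, add_res_max and n_add_res_max can be listed with a short sketch. The function name is hypothetical, and the starting resolution of 3.0 A is only an example; the early stop when good_cc_hyss is exceeded is not modelled here.

```python
def hyss_resolutions(res_hyss, add_res_max=2.0, n_add_res_max=2):
    """Resolutions tried by HYSS: res_hyss plus equal increments up to add_res_max."""
    if n_add_res_max <= 0:
        return [res_hyss]
    step = add_res_max / n_add_res_max
    return [res_hyss + i * step for i in range(n_add_res_max + 1)]


# With the defaults this gives n_add_res_max+1 = 3 resolutions.
print(hyss_resolutions(3.0))
```
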
      try_recommended_resolution_for_hyss= True If True, then hyss will be run
                                           at recommended_resolution based on
                                           anomalous signal in addition to
                                           default resolution if CC at default
                                           resolution is less than
                                           good_cc_hyss and
                                           recommended_resolution is more than
                                           0.1 A less than default
      hyss_runs_min= 2 If there are multiple derivatives or candidate
                     wavelengths for HYSS, run at least hyss_runs_min of
                     these.
      best_of_n_hyss= 1 Hyss will be run up to best_of_n_hyss_always times at
                      a given resolution. If the best CC value is greater than
                      good_cc_hyss and the number of sites found is at least
                      min_fraction_of_sites_found times the number expected
                      and Hyss was tried at least best_of_n_hyss times, then
                      the search is ended if hyss_runs_min data files have
                      been attempted.
      best_of_n_hyss_always= 10 Hyss will be run up to best_of_n_hyss_always
                             times at a given resolution. If the best CC value
                             is greater than good_cc_hyss and the number of
                             sites found is at least
                             min_fraction_of_sites_found times the number
                             expected and Hyss was tried at least
                             best_of_n_hyss times, then the search is ended if
                             hyss_runs_min data files have been attempted.
      min_fraction_of_sites_found= 0.667 Hyss will be run up to
                                   best_of_n_hyss_always times at a given
                                   resolution. If the best CC value is greater
                                   than good_cc_hyss and the number of sites
                                   found is at least
                                   min_fraction_of_sites_found times the
                                   number expected and Hyss was tried at least
                                   best_of_n_hyss times, then the search is
                                   ended if hyss_runs_min data files have been
                                   attempted.
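
The stopping rule shared by good_cc_hyss, best_of_n_hyss, best_of_n_hyss_always, min_fraction_of_sites_found and hyss_runs_min can be summarised in one predicate. This is a sketch of the conditions as worded above, not the AutoSol source; the function and argument names other than the keywords are hypothetical, and treating best_of_n_hyss_always as a hard cap on tries is an assumption.

```python
def hyss_search_finished(best_cc, sites_found, sites_expected, tries,
                         files_attempted, good_cc_hyss=0.3,
                         min_fraction_of_sites_found=0.667, best_of_n_hyss=1,
                         best_of_n_hyss_always=10, hyss_runs_min=2):
    """Decide whether to stop running HYSS at the current resolution."""
    if tries >= best_of_n_hyss_always:
        return True  # assumption: never exceed the maximum number of tries
    return (best_cc > good_cc_hyss
            and sites_found >= min_fraction_of_sites_found * sites_expected
            and tries >= best_of_n_hyss
            and files_attempted >= hyss_runs_min)


# 8 of 10 expected sites with CC 0.35 after one try on two data files.
print(hyss_search_finished(0.35, 8, 10, tries=1, files_attempted=2))
```
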
      max_single_sites= 5 In sites_from_denmod a core set of sites that are
                        strong is identified. If the hand of the solution is
                        known then additional sites are added all at once up
                        to the expected number of sites. Otherwise sites are
                        added one at a time, up to a maximum number of tries
                        of max_single_sites
      hyss_enable_early_termination= True You can specify whether to stop HYSS
                                     as soon as it finds a convincing solution
                                     (True, default) or to keep trying...
      hyss_general_positions_only= True Select True if you want HYSS only to
                                   consider general positions and ignore sites
                                   on special positions. This is appropriate
                                   for SeMet or S-Met solutions, not so
                                   appropriate for heavy-atom soaks
      hyss_min_distance= 3.5 Enter the minimum distance between heavy-atom
                         sites to keep them in HYSS
      hyss_n_fragments= 3 Enter the number of fragments in HYSS
      hyss_n_patterson_vectors= 33 Enter the number of Patterson vectors to
                                consider in HYSS
      hyss_random_seed= 792341 Enter an integer as random seed for HYSS
      res_hyss= None Overall resolution for running HYSS (usually default is
                fine)
      use_measurability= True Use measurability (from xtriage) to estimate
                         recommended resolution for HYSS and for initial
                         phasing. Only applies to MAD/SAD phasing. Alternative
                         is to use signal-to-noise from Solve scaling.
      use_phaser_rescoring= False Run phaser rescoring for HYSS heavy-atom
                            search (only SAD data) if initial try fails
      mad_ha_n= None Normally not used. Use instead "sites" for a
                wavelength. Number of anomalously-scattering atoms in the au
      mad_ha_type= "Se" Normally not used. Use instead "atom_type"
                   for a wavelength. Anomalously-scattering or heavy atom
                   type. For example, Se or Au. NOTE: if you want Phaser to
                   add additional heavy-atoms of other types, you can specify
                   them with mad_ha_add_list.
   phasing
      do_madbst= True Choose whether you want to carry out FA calculation.
                 Skipping it speeds up MAD phasing but may reduce the ability
                 to find the sites with HYSS
      overallscale= False You can choose to have only an overall scale factor
                    for this dataset (no local scaling applied). Use this if
                    your data is already fully scaled.
      res_phase= 0 Enter the high-resolution limit for phasing (0= use all)
      phase_full_resolution= True You can choose to use the full resolution of
                             the data in phasing, instead of using the
                             recommended_resolution. This is always a good
                             idea with Phaser phases.
      fixscattfactors= None For SOLVE phasing and MAD data you can choose
                       whether scattering factors are to be fixed by choosing
                       True to fix them or False to refine them. Normally
                       choose True (fix) if the data are weak and False
                       (refine) if the data are strong.
      fixscattfactors_in_phasing= False Fix scattering factors in phasing
                                  step. For SOLVE phasing and MAD data you can
                                  choose whether scattering factors are to be
                                  fixed by choosing True to fix them or False
                                  to refine them. Normally False. This command
                                  only applies to the phasing step and not
                                  initial heavy-atom refinement. It does not
                                  apply to Phaser SAD phasing.
      fix_xyz_in_phasing= None Fix coordinates in phasing step. For SOLVE
                          phasing and MAD data you can choose whether ha
                          coordinates are to be fixed by choosing True to fix
                          them or False to refine them. May be useful in
                          maintaining the coordinates of the solutions that
                          were tested in initial phasing steps. If None, then
                          it will be set to True if the resolution of final
                          phasing step is higher than the highest resolution
                          of test phasing runs. This command only applies to
                          the phasing step and not initial heavy-atom
                          refinement. It does not apply to Phaser SAD phasing
      have_hand= False Normally you will not know the hand of the heavy-atom
                 substructure, so have_hand=False. However if you do know it
                 (you got the sites from a difference Fourier or you know the
                 answer another way) you can specify that the hand is known.
      id_scale_ref= None By default the datafile with the highest resolution
                    is used for the first step in scaling of MAD data. You can
                    choose to use any of the datafiles in your MAD dataset.
                    NOTE: not applicable for multi-dataset analyses
      ratio_out= 10. You can choose the ratio of del ano or del iso to the rms
                 in the shell for rejection of a reflection. Default = 10.
      ratmin= 0. Reflections with I/sigI less than ratmin will be ignored when
              read in.
      require_nat= True Choose True to skip any reflection with no native (for
                   SIR) or no data (MAD/SAD) or where anom difference is very
                   large. This keyword (default=True) allows the routines in
                   SOLVE to remove reflections with an implausibly large
                   anomalous difference (greater than ratio_out times the rms
                   anomalous difference).
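
The rejection test tied to ratio_out can be illustrated for a single resolution shell. This sketch is an assumption about the arithmetic only: the function name is hypothetical, the rms is computed over all differences in the shell including the outlier itself, and the data are invented.

```python
import math


def reject_large_differences(differences, ratio_out=10.0):
    """Keep differences whose magnitude is at most ratio_out times the shell rms."""
    rms = math.sqrt(sum(d * d for d in differences) / len(differences))
    return [d for d in differences if abs(d) <= ratio_out * rms]


# Hypothetical shell: 200 ordinary anomalous differences and one implausible one.
shell = [1.0] * 200 + [100.0]
kept = reject_large_differences(shell)
print(len(shell), "reflections in,", len(kept), "kept")
```
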
      ikeepflag= 1 You can choose to keep all reflections in merging steps.
                 This is separate from rejecting reflections with high iso or
                 ano diffs. Default=1 (keep them)
      phasing_method= SOLVE *PHASER You can choose to phase with SOLVE or with
                      Phaser. (Only applies to SAD phasing at present)
      input_partpdb_file= None You can enter a PDB file (usually from
                          molecular replacement) for use in identifying
                          heavy-atom sites and phasing. NOTE 1: This procedure
                          works best if the model is refined. NOTE 2: This
                          file is only used in SAD phasing with Phaser on a
                          single dataset. In all other cases it is ignored.
                          NOTE 3: The output phases in phaser_xx.mtz will
                          contain both SAD and model information. They are not
                          completely suitable for use with AutoBuild or other
                          iterative model-building procedures because the
                          phases are not entirely experimental (but they may
                          work).
      partpdb_rms= 1
      llgc_sigma= None
      phaser_completion= True You can choose to use phaser log-likelihood
                         gradients to complete your heavy-atom sites. This can
                         be used with or without the ha_iteration option.
      use_phaser_hklstart= True You can choose to start density modification
                           with FWT PHWT from Phaser (Only applies to SAD
                           phasing at present)
      combine_same_parent_only= False You can choose to only combine solutions
                                with the same parent (and that have a parent)
                                in MIR, unless one solution is a composite.
                                Compare with combine_siblings in which case
                                the solutions do not have to have the same
                                immediate parents, but can be derived from the
                                same ultimate parent through several
                                difference Fourier steps.
      skip_extra_phasing= *Auto True False You can choose to skip an extra
                          phasing step to speed up the process. If the extra
                          step is used then the evaluation of solutions is
                          done with data to res_eval (2.5 A) and then all the
                          data are used in an extra phasing step. Only
                          applicable to Phaser SAD phasing.
      read_sites= False Choose if you want to enter ha sites from a file. The
                  name of the file will be requested after scaling is
                  finished. The file can have sites in fractional coordinates
                  or be a PDB file. Normally you do not need to set this. Set
                  automatically if you specify a sites_file
      f_double_prime_list= None f-double-prime for the heavy-atom for this
                           dataset. Normally not used. Use f_double_prime for
                           wavelength or deriv
      f_prime_list= None f-prime for the heavy-atom for this dataset. Normally
                    not used. Use f_prime for wavelength or deriv
      mad_ha_add_f_double_prime_list= None F-double_prime values of additional
                                      heavy-atom types. You must specify the
                                      same number of entries of
                                      mad_ha_add_f_double_prime_list as you do
                                      for mad_ha_add_f_prime_list and for
                                      mad_ha_add_list. Only use for Phaser SAD
                                      phasing with a single dataset
      mad_ha_add_f_prime_list= None F-prime values of additional heavy-atom
                               types. You must specify the same number of
                               entries of mad_ha_add_f_prime_list as you do
                               for mad_ha_add_f_double_prime_list and for
                               mad_ha_add_list. Only use for Phaser SAD
                               phasing with a single dataset
      mad_ha_add_list= None You can specify heavy atom types in addition to
                       the one you named in mad_ha_type. The heavy-atoms found
                       in initial HySS searches will be given the type of
                       mad_ha_type, and Phaser (if used for phasing) will try
                       to find additional heavy atoms of both the type
                       mad_ha_type and any listed in mad_ha_add_list. You must
                       also specify the same number of mad_ha_add_f_prime_list
                       entries and of mad_ha_add_f_double_prime_list entries.
                       Only use for Phaser SAD phasing with a single dataset
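
The requirement that mad_ha_add_list, mad_ha_add_f_prime_list and mad_ha_add_f_double_prime_list have the same number of entries can be checked before a run with a small helper. The helper name is hypothetical and the scattering-factor values below are placeholders, not recommended values.

```python
def pair_additional_ha_types(mad_ha_add_list, f_prime_list, f_double_prime_list):
    """Pair each additional atom type with its f' and f'' after a length check."""
    n = len(mad_ha_add_list)
    if len(f_prime_list) != n or len(f_double_prime_list) != n:
        raise ValueError(
            "mad_ha_add_list, mad_ha_add_f_prime_list and "
            "mad_ha_add_f_double_prime_list must have the same number of entries")
    return list(zip(mad_ha_add_list, f_prime_list, f_double_prime_list))


# Placeholder values for one additional atom type.
print(pair_additional_ha_types(["S"], [0.3], [0.6]))
```
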
      n_ha_list= None Enter a guess of number of HA sites. Normally not used.
                 Use sites in deriv instead
      nat_der_list= None Enter Native or a heavy-atom symbol (Pt, Se).
                    Normally not used. Use atom_type in deriv instead
   density_modification
      fix_xyz= False You can choose to not refine coordinates, and instead to
               fix them to the values found by the heavy-atom search.
      fix_xyz_after_denmod= None When sites are found after density
                            modification you can choose whether you want to
                            fix the coordinates to the values found in that
                            map.
      hl_in_resolve= False AutoSol normally does not write out HL coefficients
                     in the resolve.mtz file with density-modified phases. You
                     can turn them on with hl_in_resolve=True
      mask_type= *histograms probability wang classic Choose method for
                 obtaining probability that a point is in the protein vs
                 solvent region. Default is "histograms". If you
                 have a SAD dataset with a heavy atom such as Pt or Au then
                 you may wish to choose "wang" because the histogram
                 method is sensitive to very high peaks. Options are:
                 histograms: compare local rms of map and local skew of map to
                 values from a model map and estimate probabilities. This one
                 is usually the best. probability: compare local rms of map to
                 distribution for all points in this map and estimate
                 probabilities. In a few cases this one is much better than
                 histograms. wang: take points with highest local rms and
                 define as protein. Classic runs classical density
                 modification with solvent flipping.
      test_mask_type= None You can choose to have AutoSol test histograms/wang
                      methods for identifying solvent region and statistical
                      vs classical density modification based on the final
                      density modification r-factor.
      mask_cycles= 5 Number of mask cycles in density modification (5 is usual
                   for thorough density modification)
      minor_cycles= 10 Number of minor cycles in density modification for each
                    mask cycle (10 is usual for thorough density modification)
      thorough_denmod= None Choose whether you want to go for thorough density
                       modification (usual) or quick (speeds it up and for a
                       terrible map is sometimes better)
      truncate_ha_sites_in_resolve= Auto *True False You can choose to
                                    truncate the density near heavy-atom sites
                                    at a maximum of 2.5 sigma. This is useful
                                    in cases where the heavy-atom sites are
                                    very strong, and rarely hurts in cases
                                    where they are not. The heavy-atom sites
                                    are specified with
                                    "input_ha_file" and radius is
                                    rad_mask
      rad_mask= None You can define the radius for calculation of the protein
                mask. Applies only to truncate_ha_sites_in_resolve. Default is
                resolution of data.
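
The effect of truncate_ha_sites_in_resolve together with rad_mask can be pictured with a toy example. This is not how RESOLVE is implemented: the function name is hypothetical, and the sketch assumes orthogonal coordinates in Angstroms, ignores crystal symmetry, and takes the map sigma as given. It needs Python 3.8 or later for math.dist.

```python
import math


def truncate_near_sites(grid_points, ha_sites, sigma, rad_mask, max_sigma=2.5):
    """Cap density at max_sigma*sigma for grid points within rad_mask of a site.

    grid_points is a list of ((x, y, z), density) pairs.
    """
    ceiling = max_sigma * sigma
    truncated = []
    for xyz, rho in grid_points:
        near_site = any(math.dist(xyz, site) <= rad_mask for site in ha_sites)
        truncated.append(min(rho, ceiling) if near_site else rho)
    return truncated


# One strong peak at a heavy-atom site and one equally strong peak far away.
points = [((0.0, 0.0, 0.0), 10.0), ((5.0, 0.0, 0.0), 10.0)]
print(truncate_near_sites(points, [(0.0, 0.0, 0.0)], sigma=1.0, rad_mask=2.0))
```
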
      use_ncs_in_denmod= True This script normally uses available ncs
                         information in density modification. Set to False to
                         skip this. See also find_ncs
      mask_as_mtz= False Defines how omit_output_mask_file,
                   ncs_output_mask_file and protein_output_mask_file are
                   written out. If mask_as_mtz=False it will be a ccp4 map. If
                   mask_as_mtz=True it will be an mtz file with map
                   coefficients FP PHIM FOMM (all three required)
      protein_output_mask_file= None Name of map to be written out
                                representing your protein (non-solvent)
                                region. If mask_as_mtz=False the map will be a
                                ccp4 map. If mask_as_mtz=True it will be an
                                mtz file with map coefficients FP PHIM FOMM
                                (all three required)
      ncs_output_mask_file= None Name of map to be written out representing
                            your ncs asymmetric unit. If mask_as_mtz=False the
                            map will be a ccp4 map. If mask_as_mtz=True it
                            will be an mtz file with map coefficients FP PHIM
                            FOMM (all three required)
      omit_output_mask_file= None Name of map to be written out representing
                             your omit region. If mask_as_mtz=False the map
                             will be a ccp4 map. If mask_as_mtz=True it will
                             be an mtz file with map coefficients FP PHIM FOMM
                             (all three required)
      use_hl_anom_in_denmod= None Default is False (use HL coefficients in
                             density modification). Allows you to specify that
                             HL coefficients including only the phase
                             information from the imaginary (anomalous
                             difference) contribution from the anomalous
                             scatterers are to be used in density
                             modification. Two sets of HL coefficients are
                             produced by Phaser. HLA HLB etc are HL
                             coefficients including the contribution of both
                             the real scattering and the anomalous
                             differences. HLanomA HLanomB etc are HL
                             coefficients including the contribution of the
                             anomalous differences alone. These HL
                             coefficients for anomalous differences alone are
                             the ones that you will want to use in cases where
                             you are bringing in model information that
                             includes the real scattering from the model used
                             in Phaser, such as when you are carrying out
                             density modification with a model or refinement
                             of a model. If use_hl_anom_in_denmod=True then the
                             HLanom HL coefficients from Phaser are used in
                             density modification
      use_hl_anom_in_denmod_with_model= None Default is True if
                                        input_partpdb_file is included. (See
                                        also use_hl_anom_in_denmod) If
                                        use_hl_anom_in_denmod=True then the
                                        HLanom HL coefficients from Phaser are
                                        used in density modification with a
                                        model
   model_building
      build= True Build model after density modification?
      phase_improve_and_build= True Carry out cycles of phase improvement with
                               quick model-building followed by a full
                               model-building step. NOTE: This is now the
                               standard model-building approach for AutoSol
      sort_hetatms= False Waters are automatically named with the chain of the
                    closest macromolecule if you set sort_hetatms=True. This
                    applies to the final model only.
      map_to_object= None You can supply a target position for your model with
                     map_to_object=my_target.pdb. Then at the very end your
                     molecule will be placed as close to this as possible. The
                     center of mass of the autobuild model will be
                     superimposed on the center of mass of my_target.pdb using
                     space group symmetry, taking any match closer than 15 A
                     within 3 unit cells of the original position. The new
                     file will be overall_best_mapped.pdb
      build_type= *RESOLVE RESOLVE_AND_BUCCANEER You can choose to build
                  models with RESOLVE or with RESOLVE and BUCCANEER, and how
                  many different models to build with RESOLVE. The more you
                  build, the more likely you are to get a complete model.
                  Note that rebuild_in_place can only be carried out with
                  RESOLVE model-building. For BUCCANEER model-building you
                  need CCP4 version 6.1.2 or higher and BUCCANEER version
                  1.3.0 or higher.
      resolve Parameters specific for RESOLVE model-building
         n_cycle_build= None Choose number of cycles (3).
         refine= True This script normally refines the model during building.
                 Say False to skip refinement
         ncycle_refine= 3 Choose number of refinement cycles (3)
         number_of_builds= None Number of different solutions to build models
                           for
         number_of_models= None This parameter lets you choose how many
                           initial models to build with RESOLVE within a
                           single build cycle.
         resolution_build= 0 Enter the high-resolution limit for
                           model-building. If 0.0, the value of resolution is
                           used as a default.
         helices_strands_only= False You can choose to use a quick
                               model-building method that only builds
                               secondary structure. At low resolution this may
                               be both quicker and more accurate than trying
                               to build the entire structure. If you are
                               running the AutoSol Wizard, normally you should
                               choose 'False' as standard building is quick.
                               When your structure is solved by AutoSol, go on
                               to AutoBuild and build a more complete model
                               (still using helices_strands_only=False). NOTE:
                               helices_strands_only does not apply in AutoSol
                               if phase_improve_and_build=True
         helices_strands_start= False You can choose to use a quick
                                model-building method that builds secondary
                                structure as a way to get started...then model
                                completion is done as usual. (Contrast with
                                helices_strands_only which only does secondary
                                structure)
         cc_helix_min= None Minimum CC of helical density to map at low
                       resolution when using helices_strands_only
         cc_strand_min= None Minimum CC of strand density to map when using
                        helices_strands_only
         loop_lib= False Use loop library to fit loops. Only applicable for
                   chain_type=PROTEIN
         standard_loops= True Use standard loop fitting
         trace_loops= False Use loop tracing to fit loops. Only applicable for
                      chain_type=PROTEIN
         refine_trace_loops= True Refine loops (real-space) after trace_loops
         density_of_points= None Packing density of points to consider as
                            possible CA atoms in trace_loops. Try 1.0 for a
                            quick run, up to 5 for a much more thorough run.
                            If None, a value is chosen based on the value of
                            quick.
         max_density_of_points= None Maximum packing density of points to
                                consider as possible CA atoms in
                                trace_loops.
         cutout_model_radius= None Radius to cut out density for trace_loops.
                              If None, guess based on length of loop
         max_cutout_model_radius= 20. Maximum value of cutout_model_radius to
                                  try
         padding= 1. Padding for cut out density in trace_loops
         max_span= 30 Maximum length of a gap to try to fill
         max_overlap= None Maximum number of residues from ends to start with.
                      (1=use existing ends, 2=one in from ends etc) If None,
                      set based on value of quick.
         min_overlap= None Minimum number of residues from ends to start with.
                      (1=use existing ends, 2=one in from ends etc)
         fit_loops= True You can fit loops automatically if sequence alignment
                    has been done.
         loop_cc_min= 0.4 You can specify the minimum correlation of density
                      from a loop with the map.
         group_ca_length= 4 In resolve building you can specify how short a
                          fragment to keep. Normally 4 or 5 residues should be
                          the minimum.
         group_length= 2 In resolve building you can specify how many
                       fragments must be joined to make a connected group that
                       is kept. Normally 2 fragments should be the minimum.
         input_compare_file= None If you are rebuilding a model or already
                             think you know what the model should be, you can
                             include a comparison file in rebuilding. The
                             model is not used for anything except to write
                             out information on coordinate differences in the
                             output log files. NOTE: this feature does not
                             always work correctly.
         n_random_frag= 0 In resolve building you can randomize each fragment
                        slightly so as to generate more possibilities for
                        tracing based on extending it.
         n_random_loop= 3 Number of randomized tries from each end for
                        building loops. If 0, then one try. If N, then N
                        additional tries with randomization based on
                        rms_random_loop.
         offsets_list= 53 7 23 You can specify an offset for the orientation
                       of the helix and strand templates in building. This is
                       used in generating different starting models.
         remove_outlier_segments_z_cut= 3.0 You can remove any segments that
                                        are not assigned to sequence during
                                        model-building if the mean density at
                                        atomic positions is more than
                                        remove_outlier_segments_z_cut sd lower
                                        than the mean for the structure.
         resolve_command_list= None Commands for resolve. One per line in the
                               form: keyword value, where value can be
                               optional. Examples: 'coarse_grid',
                               'resolution 200 2.0', 'hklin test.mtz'. NOTE:
                               for command-line usage you need to enclose the
                               whole set of commands in double quotes (") and
                               each individual command in single quotes (')
                               like this: resolve_command_list="'no_build'
                               'b_overall 23' "
         solve_command_list= None Commands for solve. One per line in the
                             form: keyword value, where value can be optional.
                             Examples: 'verbose', 'resolution 200 2.0'. For
                             specification from the command line enclose each
                             command and value in quotes, and then use a
                             different type of quotes to enclose all of them
                             (same as resolve_command_list)
         rms_random_frag= None Rms random position change added to residues on
                          ends of fragments when extending them. If you enter
                          a negative number, defaults will be used.
         rms_random_loop= None Rms random position change added to residues on
                          ends of loops in tries for building loops. If you
                          enter a negative number, defaults will be used.
         semet= None You can specify that the dataset that is used for
                refinement is a selenomethionine dataset, and that the model
                should be the SeMet version of the protein, with all SD of MET
                replaced with Se of MSE. By default if your heavy-atom is Se
                then this will be set to True
         use_met_in_align= Auto *True False You can use the heavy-atom
                           positions in input_ha_file as markers for Met SD
                           positions.
         start_chains_list= None You can specify the starting residue number
                            for each of the unique chains in your structure.
                            If you use a sequence file then the unique chains
                            are extracted and the order must match the order
                            of your starting residue numbers. For example, if
                            your sequence file has chains A and B (identical)
                            and chains C and D (identical to each other, but
                            different than A and B) then you can enter 2
                            numbers, the starting residues for chains A and C.
                            NOTE: you need to specify an input sequence file
                            for start_chains_list to be applied.
         thorough_loop_fit= None Try many conformations and accept them even
                            if the fit is not perfect. If True, the
                            parameters for thorough loop fitting are:
                            n_random_loop=100 rms_random_loop=0.3
                            rho_min_main=0.5. If False, the parameters for
                            quick loop fitting are: n_random_loop=20
                            rms_random_loop=0.3 rho_min_main=1.0
         trace_as_lig= False You can specify that in building steps the ends
                       of chains are to be extended using the LigandFit
                       algorithm. This is default for nucleic acid
                       model-building.
         use_any_side= False You can choose to have resolve model-building
                       place the best-fitting side chain at each position,
                       even if the sequence is not matched to the map.
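
The quoting rule described for resolve_command_list and solve_command_list in
this scope can be sketched in Python. This is only an illustration: the helper
name format_command_list is hypothetical and is not part of PHENIX. It
assembles the command-line string in the form shown in the
resolve_command_list description, with single quotes around each command and
double quotes around the whole set.

```python
def format_command_list(keyword, commands):
    """Return a string such as keyword="'cmd1' 'cmd2'".

    Hypothetical helper (not a PHENIX function): each command is wrapped in
    single quotes and the whole set is wrapped in double quotes.
    """
    for command in commands:
        # Embedded quotes would break the nesting, so reject them here.
        if "'" in command or '"' in command:
            raise ValueError("commands must not contain quote characters")
    quoted = " ".join("'%s'" % command for command in commands)
    return '%s="%s"' % (keyword, quoted)


print(format_command_list("resolve_command_list", ["no_build", "b_overall 23"]))
# prints: resolve_command_list="'no_build' 'b_overall 23'"
```

The resulting string can be appended to a phenix.autosol command line typed in
a shell; the same pattern applies to solve_command_list.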
   ncs
      find_ncs= Auto *True False The wizard normally deduces ncs information
                from the NCS in heavy atom sites, and then later from any NCS
                in chains of models that are built during model-building. The
                update is done each cycle in which an improved model is
                obtained. Say No to skip this update.
      ncs_copies= None Number of copies of the molecule in the asymmetric unit
                  (note: only one type of molecule allowed at present)
      ncs_refine_coord_sigma_from_rmsd= False You can choose to use the
                                        current NCS rmsd as the value of the
                                        sigma for NCS restraints. See also
                                        ncs_refine_coord_sigma_from_rmsd_ratio
      ncs_refine_coord_sigma_from_rmsd_ratio= 1 You can choose to multiply the
                                              current NCS rmsd by this value
                                              before using it as the sigma for
                                              NCS restraints. See also
                                              ncs_refine_coord_sigma_from_rmsd
      optimize_ncs= True This script normally deduces ncs information from the
                    NCS in chains of models that are built during iterative
                    model-building. Optimize NCS adds a step to try and make
                    the molecule formed by NCS as compact as possible, without
                    losing any point-group symmetry.
      refine_with_ncs= True This script can allow phenix.refine to
                       automatically identify NCS and use it in refinement.
      ncs_in_refinement= *torsion cartesian None Use torsion_angle refinement
                         of NCS. Alternative is cartesian or None (None will
                         use phenix.refine default)
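
The relationship between ncs_refine_coord_sigma_from_rmsd and
ncs_refine_coord_sigma_from_rmsd_ratio described above amounts to a simple
scaling. The sketch below assumes that reading; the function name
ncs_coord_sigma and the default_sigma argument (standing in for whatever
sigma would otherwise be used) are hypothetical and not part of PHENIX.

```python
def ncs_coord_sigma(current_rmsd, from_rmsd=False, ratio=1.0, default_sigma=None):
    """Sketch of how the sigma for NCS restraints might be chosen.

    Hypothetical helper: if from_rmsd is False, the caller's default sigma is
    returned unchanged; otherwise the current NCS rmsd is multiplied by ratio.
    """
    if not from_rmsd:
        return default_sigma
    return current_rmsd * ratio


# With an NCS rmsd of 0.4 A and a ratio of 2, the sigma would be 0.8 A.
print(ncs_coord_sigma(0.4, from_rmsd=True, ratio=2.0))
```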
   refinement
      refine_b= True You can choose whether phenix.refine is to refine
                individual atomic displacement parameters (B values)
      refine_se_occ= True You can choose to refine the occupancy of SE atoms
                     in a SEMET structure (default=True). This only applies if
                     semet=true
      skip_clash_guard= True Skip refinement check for atoms that clash
      correct_special_position_tolerance= None Adjust tolerance for special
                                          position check. If 0., then check
                                          for clashes near special positions
                                          is not carried out. This sometimes
                                          allows phenix.refine to continue
                                          even if an atom is near a special
                                          position. If 1., then checks within
                                          1 A of special positions. If None,
                                          then uses phenix.refine default. (1)
      use_mlhl= True This script normally uses information from the input file
                (HLA HLB HLC HLD) in refinement. Say No to only refine on Fobs
      generate_hl_if_missing= False This script normally uses information from
                              the input file (HLA HLB HLC HLD) in refinement.
                              Say No to not generate HL coeffs from input
                              phases.
      place_waters= True You can choose whether phenix.refine automatically
                    places ordered solvent (waters) during the refinement
                    process.
      refinement_resolution= 0 Enter the high-resolution limit for refinement
                             only. This high-resolution limit can be different
                             than the high-resolution limit for other steps.
                             The default ("None" or 0.0) is to use
                             the overall high-resolution limit for this run
                             (as set by resolution)
      ordered_solvent_low_resolution= None You can choose what resolution
                                      cutoff to use for placing ordered solvent
                                      in phenix.refine. If the resolution of
                                      refinement is greater than this cutoff,
                                      then no ordered solvent will be placed,
                                      even if
                                      refinement.main.ordered_solvent=True.
      link_distance_cutoff= 3 You can specify the maximum bond distance for
                            linking residues in phenix.refine called from the
                            wizards.
      r_free_flags_fraction= 0.1 Maximum fraction of reflections in the free R
                             set. You can choose the maximum fraction of
                             reflections in the free R set and the maximum
                             number of reflections in the free R set. The
                             number of reflections in the free R set will be
                             up to the lower of the values defined by these two
                             parameters.
      r_free_flags_max_free= 2000 Maximum number of reflections in the free R
                             set. You can choose the maximum fraction of
                             reflections in the free R set and the maximum
                             number of reflections in the free R set. The
                             number of reflections in the free R set will be
                             up to the lower of the values defined by these two
                             parameters.
      r_free_flags_use_lattice_symmetry= True When generating r_free_flags you
                                         can decide whether to include lattice
                                         symmetry (good in general, necessary
                                         if there is twinning).
      r_free_flags_lattice_symmetry_max_delta= 5 You can set the maximum
                                               deviation of distances in the
                                               lattice that are to be
                                               considered the same for
                                               purposes of generating a
                                               lattice-symmetry-unique set of
                                               free R flags.
      allow_overlapping= None Default is None (set automatically, normally
                         False unless S or Se atoms are the
                         anomalously-scattering atoms). You can allow atoms in
                         your ligand files to overlap atoms in your
                         protein/nucleic acid model. This overrides
                         'keep_pdb_atoms'. Useful in early stages of
                         model-building and refinement. The ligand atoms get
                         the altloc indicator 'L'. NOTE: The ligand occupancy
                         will be refined by default if you set
                         allow_overlapping=True (because of the altloc
                         indicator). You can turn this off with
                         fix_ligand_occupancy=True
      fix_ligand_occupancy= None If allow_overlapping=True then ligand
                            occupancies are refined as a group. You can turn
                            this off with fix_ligand_occupancy=true. NOTE: has
                            no effect if allow_overlapping=False
      remove_outlier_segments= True You can remove any segments that are not
                               assigned to sequence if their mean B values are
                               more than remove_outlier_segments_z_cut sd
                               higher than the mean for the structure. NOTE:
                               this is done after refinement, so the R/Rfree
                               are no longer applicable; the remarks in the
                               PDB file are removed
      twin_law= None You can specify a twin law for refinement like this:
                twin_law='-h,k,-l'
      use_hl_anom_in_refinement= None Default is True if input_partpdb_file is
                                 used (See also use_hl_anom_in_denmod). If
                                 use_hl_anom_in_refinement=True then the
                                 HLanom HL coefficients from Phaser are used
                                 in refinement
      include_ha_in_refinement= None You can choose to include your heavy-atom
                                sites in the model for refinement. This is a
                                good idea if your structure includes these
                                heavy-atom sites (i.e., for SAD or MAD
                                structures where you are not using a native
                                dataset). Heavy-atom sites that overlap an
                                atom in your model will be ignored. Default is
                                True unless the dataset is SAD/MAD with Se or
                                S
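
As a rough illustration of how r_free_flags_fraction and
r_free_flags_max_free (both described in this refinement scope) combine, the
sketch below takes the lower of the two limits. The function name
free_set_size is hypothetical, and truncating the fractional count to an
integer is an assumption made for this example, not a statement about how
PHENIX rounds.

```python
def free_set_size(n_reflections, fraction=0.1, max_free=2000):
    """Upper bound on the number of free R reflections.

    Hypothetical helper: the free set is limited both by a fraction of all
    reflections and by an absolute maximum; the lower limit wins.
    """
    by_fraction = int(n_reflections * fraction)  # truncation is an assumption
    return min(by_fraction, max_free)


print(free_set_size(10000))  # fraction is the limit: 1000
print(free_set_size(50000))  # max_free is the limit: 2000
```

With the defaults shown in the listing (0.1 and 2000), the absolute maximum
becomes the limiting value once a dataset has more than 20000 reflections.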
   display
      number_of_solutions_to_display= 1 Number of solutions to put on screen
                                      and to write out
      solution_to_display= 0 Solution number of the solution to display and
                           write out ( use 0 to let the wizard display the top
                           solution)
   general
      data_quality= *moderate strong weak The defaults are set for you
                    depending on the anticipated data quality. You can choose
                    "moderate" if you are unsure.
      thoroughness= *quick thorough You can try to run quickly and see if you
                    can get a solution ("quick") or more thoroughly
                    to get the best possible solution ("thorough").
      nproc= 1 You can specify the number of processors to use (nproc) and the
             number of batches to divide the data into for parallel jobs.
             Normally you will set nproc to the number of processors available
             and leave nbatch alone. If you leave nbatch as None it will be
             set automatically, with a value depending on the Wizard. This is
             recommended. The value of nbatch can affect the results that you
             get, as the jobs are not split into exact replicates, but are
             rather run with different random numbers. If you want to get the
             same results, keep the same value of nbatch.
      nbatch= 1 You can specify the number of processors to use (nproc) and
              the number of batches to divide the data into for parallel jobs.
              Normally you will set nproc to the number of processors
              available and leave nbatch alone. If you leave nbatch as None it
              will be set automatically, with a value depending on the Wizard.
              This is recommended. The value of nbatch can affect the results
              that you get, as the jobs are not split into exact replicates,
              but are rather run with different random numbers. If you want to
              get the same results, keep the same value of nbatch.
      keep_files= overall_best* phaser_*.mtz resolve_*.mtz solve_*.mtz
                  ha_*.pdb List of files that are not to be cleaned up.
                  Wildcards are permitted.
      coot_name= "coot" If your version of coot is called something else, then
                 you can specify that here.
      i_ran_seed= 72432 Random seed (positive integer) for model-building and
                  simulated annealing refinement
      raise_sorry= False You can have any failure end with a Sorry instead of
                   simply printing to the screen
      background= True When you specify nproc=nn, you can run the jobs in
                  background (default if nproc is greater than 1) or
                  foreground (default if nproc=1). If you set run_command=qsub
                  (or otherwise submit to a batch queue), then you should set
                  background=False, so that the batch queue can keep track of
                  your runs. There is no need to use background=True in this
                  case because all the runs go as controlled by your batch
                  system. If you use run_command='sh ' (or similar, sh is
                  default) then normally you will use background=True so that
                  all the jobs run simultaneously.
      max_wait_time= 1.0 You can specify the length of time (seconds) to wait
                     when looking for a file. If you have a cluster where jobs
                     do not start right away you may need a longer time to
                     wait. The symptom of too short a wait time is 'File not
                     found'
      wait_between_submit_time= 1.0 You can specify the length of time
                                (seconds) to wait between each job that is
                                submitted when running sub-processes. This can
                                be helpful on NFS-mounted systems when running
                                with multiple processors to avoid file
                                conflicts. The symptom of too short a
                                wait_between_submit_time is File exists:....
      cache_resolve_libs= True Use caching of resolve libraries to speed up
                          resolve
      resolve_size= 12 Size for solve/resolve ("", "_giant", "_huge",
                    "_extra_huge" or a number, where 12=giant and 18=huge)
      check_run_command= False You can have the wizard check your run command
                         at startup
      run_command= "sh " When you specify nproc=nn, you can run the
                   subprocesses as jobs in background with sh (default) or
                   submit them to a queue with the command of your choice
                   (e.g., qsub). If you have a multi-processor machine, use
                   sh. If you have a cluster, use qsub or the equivalent
                   command for your system. NOTE: If you set run_command=qsub
                   (or otherwise submit to a batch queue), then you should set
                   background=False, so that the batch queue can keep track of
                   your runs. There is no need to use background=True in this
                   case because all the runs go as controlled by your batch
                   system. If nproc is greater than 1 and you use
                   run_command='sh ' (or similar, sh is default) then normally
                   you will use background=True so that all the jobs run
                   simultaneously.
      queue_commands= None You can add any commands that need to be run for
                      your queueing system. These are written before any other
                      commands in the file that is submitted to your queueing
                      system. For example on a PBS system you might say:
                      queue_commands='#PBS -N mr_rosetta' queue_commands='#PBS
                      -j oe' queue_commands='#PBS -l walltime=03:00:00'
                      queue_commands='#PBS -l nodes=1:ppn=1' NOTE: you can put
                      in the characters '' in any queue_commands line and this
                      will be replaced by a string of characters based on the
                      path to the run directory. The first character and last
                      two characters of each part of the path will be
                      included, separated by '_', up to 15 characters. For
                      example
                      'test_autobuild/WORK_5/AutoBuild_run_1_/TEMP0/RUN_1'
                      would be represented by: 'tld_W_5_A1__TP0_1'
      condor_universe= vanilla The universe for condor is usually vanilla.
                       However you might need to set it to local for your
                       cluster
      add_double_quotes_in_condor= True You might need to turn on or off
                                   double quotes in condor job submission
                                   scripts. These are already default
                                   elsewhere but may interfere with condor
                                   paths.
      condor= None Specifies if the group_run_command is submitting a job to a
              condor cluster. Set by default to True if
              group_run_command=condor_submit, otherwise False. For condor job
              submission mr_rosetta uses a customized script with condor
              commands. Also uses one_subprocess_level=True
      last_process_is_local= True If true, run the last process in a group in
                             background with sh as part of the job that is
                             submitting jobs. This prevents having the job
                             that is submitting jobs sit and wait for all the
                             others while doing nothing
      skip_r_factor= False You can skip R-factor calculation if refinement is
                     not done and maps_only=True
      skip_xtriage= False You can bypass xtriage if you want. This will
                    prevent you from applying anisotropy corrections, however.
      base_path= None You can specify the base path for files (default is
                 current working directory)
      temp_dir= None Define a temporary directory (it must exist)
      clean_up= False At the end of the entire run the TEMP directories will
                be removed if clean_up is True. The default is yes, delete
                these directories. If you want to remove them after your run
                is finished use a command like "phenix.autobuild run=1
                clean_up=True" Files listed in keep_files will not be
                deleted
      print_citations= True Print citations at end of run
      solution_output_pickle_file= None At end of run, write solutions to this
                                   file in output directory if defined
      title= None Enter any text you like to help identify what you did in
             this run
      top_output_dir= None This is used in subprocess calls of wizards and to
                      tell the Wizard where to look for the STOPWIZARD file.
      wizard_directory_number= None This is used by the GUI to define the run
                               number for Wizards. It is the same as
                               desired_run_number NOTE: this value can only be
                               specified on the command line, as the directory
                               number is set before parameters files are read.
      verbose= False Command files and other verbose output will be printed
      extra_verbose= False Facts and possible commands will be printed every
                     cycle if True
      debug= False You can have the wizard stop with error messages about the
             code if you use debug. NOTE: you cannot use Pause with debug.
             Additionally the output goes to the terminal if you specify
             "debug=True"
      require_nonzero= True Require non-zero values in data columns to
                       consider reading in.
      remove_path_word_list= None List of words identifying paths to remove
                             from PATH. These can be used to shorten your PATH.
                             For example, "cns ccp4 coot" would remove all
                             paths containing these words except those also
                             containing phenix. Capitalization is ignored.
      fill= False Fill in all missing reflections to resolution res_fill.
            Applies to density modified maps. See also filled_2fofc_maps in
            autobuild.
      res_fill= None Resolution for filling in missing data (default = highest
                resolution of any datafile). Only applies to density modified
                maps. Default is fill to high resolution of data. Ignored if
                fill=False
      check_only= False Just read in and check initial parameters. Not for
                  general use
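
The behaviour described for remove_path_word_list above can be illustrated
with a short Python sketch. This is not the Wizard's own code: the function
name filter_path and its arguments are hypothetical, and the sketch only
assumes what the keyword description states (entries containing any listed
word are dropped unless they also contain "phenix", and capitalization is
ignored).

```python
import os


def filter_path(path_value, words, keep_word="phenix", sep=os.pathsep):
    """Drop PATH entries containing any of `words`, ignoring case.

    Entries that also contain `keep_word` are always kept.
    Hypothetical helper for illustration only.
    """
    lowered = [w.lower() for w in words]
    kept = []
    for entry in path_value.split(sep):
        entry_lower = entry.lower()
        matches_word = any(w in entry_lower for w in lowered)
        if matches_word and keep_word not in entry_lower:
            continue  # remove this entry from the PATH
        kept.append(entry)
    return sep.join(kept)


if __name__ == "__main__":
    example = "/usr/bin:/opt/CCP4/bin:/opt/phenix/build/coot/bin:/opt/cns/bin"
    print(filter_path(example, ["cns", "ccp4", "coot"], sep=":"))
```

With the example PATH shown, only the CCP4 and cns entries are removed; the
coot entry survives because its path also contains "phenix".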
   run_control
      coot= None Set coot to True and optionally run=[run-number] to run Coot
            with the current model and map for run run-number. In some wizards
            (AutoBuild) you can edit the model and give it back to PHENIX to
            use as part of the model-building process. If you just say coot
            then the facts for the highest-numbered existing run will be
            shown.
      ignore_blanks= None ignore_blanks allows you to have a command-line
                     keyword with a blank value like
                     "input_lig_file_list="
      stop= None You can stop the current wizard with "stopwizard"
            or "stop". If you type "phenix.autobuild run=3
            stop" then this will stop run 3 of autobuild.
      display_facts= None Set display_facts to True and optionally
                     run=[run-number] to display the facts for run run-number.
                     If you just say display_facts then the facts for the
                     highest-numbered existing run will be shown.
      display_summary= None Set display_summary to True and optionally
                       run=[run-number] to show the summary for run
                       run-number. If you just say display_summary then the
                       summary for the highest-numbered existing run will be
                       shown.
      carry_on= None Set carry_on to True to carry on with highest-numbered
                run from where you left off.
      run= None Set run to n to continue with run n where you left off.
      copy_run= None Set copy_run to n to copy run n to a new run and continue
                where you left off.
      display_runs= None List all runs for this wizard.
      delete_runs= None List runs to delete: 1 2 3-5 9:12
      display_labels= None display_labels=test.mtz will list all the labels
                      that identify data in test.mtz. You can use the label
                      strings that are produced in AutoSol to identify which
                      data to use from a datafile like this:
                      peak.data="F+ SIGF+ F- SIGF-" (the entire string in
                      quotes counts here). You can use the individual
                      labels from these strings as identifiers for data
                      columns in AutoSol or AutoBuild like this:
                      input_refinement_labels="FP SIGFP FreeR_flags"
                      (each individual label counts).
      dry_run= False Just read in and check parameter names
      params_only= False Just read in and return parameter defaults. Not for
                   general use
      display_all= False Just read in and display parameter defaults
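
The delete_runs example above ("1 2 3-5 9:12") mixes single run numbers with
ranges. The following Python sketch shows one way such a list could be
expanded. It assumes that both "-" and ":" denote inclusive ranges, which the
keyword description does not state explicitly; the function name
parse_run_list is hypothetical and is not part of PHENIX.

```python
def parse_run_list(spec):
    """Expand a run specification such as "1 2 3-5 9:12" into run numbers.

    Assumption: "a-b" and "a:b" both mean the inclusive range a..b.
    """
    runs = []
    for token in spec.split():
        for sep in ("-", ":"):
            if sep in token:
                start, end = token.split(sep, 1)
                runs.extend(range(int(start), int(end) + 1))
                break
        else:
            # No range separator found: a single run number
            runs.append(int(token))
    return runs


if __name__ == "__main__":
    print(parse_run_list("1 2 3-5 9:12"))
```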
   special_keywords
      write_run_directory_to_file= None Writes the full name of a run
                                   directory to the specified file. This can
                                   be used as a call-back to tell a script
                                   where the output is going to go.
   non_user_parameters These are obsolete parameters and parameters that the
                       wizards use to communicate among themselves. Not
                       normally for general use.
      gui_output_dir= None Used only by the GUI
      allow_negative_f_double_prime= False Allow a negative f-double-prime
                                     value
      inano_list= None Choose inano to include anomalous differences, noinano
                  not to include them, and anoonly for just anomalous
                  differences (no isomorphous differences). Not normally used.
                  Use inano in deriv instead
      ha_sites_file= None Not normally used. Use sites_file for wavelength or
                     deriv
      expt_type= *Auto mad sir sad Not normally used. Determined automatically
                 from your inputs for wavelength and native/deriv. Experiment
                 type (MAD SIR SAD). NOTE: Please treat MIR experiments as a
                 set of SIR experiments. NOTE: The default for this keyword is
                 Auto which means "carry out normal process to guess this
                 keyword". If you have a single file, then it is assumed
                 to be SAD. If you specify native.data and deriv.data it is
                 SIR, if you specify peak.data and infl.data it is MAD. If the
                 Wizard does not guess correctly, you can set it with this
                 keyword.
      wavelength_list= None Optional wavelength of X-ray data (A). Not normally
                       used. Use wavelength/deriv and lambda instead
      wavelength_name_list= None Names of wavelengths. Not normally used. Use
                            wavelength/deriv and name instead
      sg= None Obsolete. Use space_group instead
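
The guessing rule given for expt_type=Auto above can be summarized in a short
Python sketch. It encodes only the three cases stated in the keyword
description (a single file is taken as SAD, native.data plus deriv.data as
SIR, peak.data plus infl.data as MAD). The function name guess_expt_type and
the idea of passing keyword names as a collection are hypothetical; the
Wizard's actual logic is not shown in this document and may handle more
cases.

```python
def guess_expt_type(data_keywords):
    """Guess the experiment type from the data keywords that were supplied.

    `data_keywords` is a collection of keyword names such as
    "peak.data" or "native.data". Returns "sad", "sir", "mad",
    or None when none of the documented cases applies.
    """
    keys = set(data_keywords)
    if len(keys) == 1:
        return "sad"  # a single datafile is assumed to be SAD
    if {"native.data", "deriv.data"} <= keys:
        return "sir"
    if {"peak.data", "infl.data"} <= keys:
        return "mad"
    return None  # not covered by the documented rule; set expt_type explicitly


if __name__ == "__main__":
    print(guess_expt_type(["peak.data", "infl.data"]))
```

When the guess would be wrong or undetermined, the keyword description says
to set expt_type explicitly.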