TEXTAL


New! (Sept 2006) A version of TEXTAL customized for molecular replacement

Overview

TEXTAL is a program for automated protein model-building based on pattern recognition techniques. It models (builds coordinates for) regions of an electron-density map by searching a database of previously solved maps for the most similar regions it has seen before. It then takes the coordinates from those regions, transforms them into the new map, and concatenates them. The method relies on the extraction of rotation-invariant features that characterize 3D patterns in the density, as well as local density correlation calculations.

The original program took an electron-density map as input, and output a PDB file with a partial model. We have expanded the current version to take a reflection file (.mtz or .hkl) as input, so the user is no longer required to create a map first. In addition, we have developed an automated routine for centering the map on a contiguous molecular region (Findmol).

An important point about resolution: TEXTAL was optimized for building models in density maps generated at 2.8A. Its pattern-recognition routines were trained on databases of electron-density maps at this resolution. If you have higher-resolution data, that is OK; TEXTAL will simply truncate the higher-resolution reflections when generating maps internally for building. TEXTAL can also work on slightly lower (worse) resolution data. It has been demonstrated to work robustly on datasets whose resolution limit is between 2.6A and 3.2A. If you have much higher resolution data, e.g. <2.4A, you might as well use ARP/wARP, which has been shown to build very accurate models on a number of high-resolution datasets. However, ARP/wARP does not do as well in medium-resolution ranges, where the data/parameter ratio (number of reflections to coordinates) is lower. We intentionally chose 2.8A as a target resolution for TEXTAL because it falls within a very common resolution range (2.5-3.0A) for MAD datasets collected at synchrotrons. At this resolution, the challenges of automatically interpreting an electron-density map are substantial: there are many ambiguities in the density (individual atoms cannot be seen), and often a great deal of noise.
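To make the idea of truncating higher-resolution reflections concrete, here is a minimal Python sketch. It is an illustration only, not TEXTAL's actual code: the function names are hypothetical, and it assumes an orthorhombic unit cell so that the d-spacing formula stays simple (1/d^2 = h^2/a^2 + k^2/b^2 + l^2/c^2).

```python
import math

def d_spacing(hkl, cell):
    """Resolution (d-spacing, in Angstroms) of reflection (h, k, l).

    Assumes an orthorhombic cell given as (a, b, c) in Angstroms;
    other crystal systems need the general reciprocal-metric formula.
    """
    h, k, l = hkl
    a, b, c = cell
    inv_d_sq = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    if inv_d_sq == 0:
        return math.inf  # the (0, 0, 0) term has no finite resolution
    return 1.0 / math.sqrt(inv_d_sq)

def truncate_reflections(reflections, cell, d_min=2.8):
    """Keep only reflections at d_min or lower (worse) resolution."""
    return [hkl for hkl in reflections if d_spacing(hkl, cell) >= d_min]

# Hypothetical cell and reflection list, for illustration only.
cell = (50.0, 60.0, 70.0)
reflections = [(1, 0, 0), (10, 10, 10), (20, 0, 0), (0, 25, 0)]
kept = truncate_reflections(reflections, cell)
print(kept)
```

In this example the reflections at 2.5A and 2.4A are dropped, while those at 50A and about 3.4A are kept.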

TEXTAL has been developed since around 1998 at Texas A&M University through a collaboration between Thomas R. Ioerger (Dept. of Computer Science) and James C. Sacchettini (Dept. of Biochemistry and Biophysics), with funding from the NIH, and contributions from many graduate students and post-docs over the years.

Major Steps

To explain some of our terminology, here are the major steps TEXTAL goes through:

Expectations

Accuracy: Run Time:

Assumptions

We generally read and write electron-density maps in XPLOR format, and models in PDB format. We use TER records to separate chains. The occupancy and B-factor fields are often used for other purposes so be careful. However, in the final model, we try to set them to reasonable values.
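Because chains are separated by TER records, a model written by TEXTAL can be split into chains with a few lines of Python. The following is a sketch under that assumption only; split_chains is a hypothetical helper, not part of TEXTAL, and the coordinate lines are made-up sample data.

```python
def split_chains(pdb_lines):
    """Group ATOM/HETATM records into chains, closing a chain at each TER record."""
    chains, current = [], []
    for line in pdb_lines:
        record = line[:6].strip()  # PDB record name occupies columns 1-6
        if record in ("ATOM", "HETATM"):
            current.append(line)
        elif record == "TER" and current:
            chains.append(current)
            current = []
    if current:  # last chain may lack a trailing TER
        chains.append(current)
    return chains

# Made-up sample records, for illustration only.
example = [
    "ATOM      1  CA  ALA A   1      11.104   6.134  -6.504  1.00 20.00           C",
    "ATOM      2  CA  GLY A   2      12.560   9.650  -6.100  1.00 20.00           C",
    "TER",
    "ATOM      3  CA  SER B   1      20.331  14.020  -2.750  1.00 20.00           C",
    "TER",
    "END",
]
chains = split_chains(example)
print([len(chain) for chain in chains])
```

Since the occupancy and B-factor fields may be used for other purposes in intermediate files, the sketch deliberately ignores them and looks only at the record names.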

TEXTAL is optimized for recognizing patterns in 2.8A maps. It is OK if you have higher-resolution data; TEXTAL will automatically truncate your structure factors when generating maps internally for model-building.

TEXTAL only knows how to build peptide structures. If you have other molecules in the crystal, ranging from solvent molecules to ligands and co-factors to nucleic acids, TEXTAL will try its best to interpret them as protein.

It is relatively important to use density-modified phases. Initial phasing is often not sufficient to produce a map that is clean enough to build a model by pattern recognition. However, in our experience, simple things like solvent-flattening go a long way to producing interpretable density. Keep in mind that TEXTAL builds by pattern-recognition, so if you cannot possibly visually interpret something in the density, then TEXTAL probably cannot either.

Currently, TEXTAL does *not* iteratively improve phases by cycling between model-building and phase combination (combining experimental phases with phases from the partial model). So don't expect to give poorly phased data to TEXTAL and have it output a complete and refined model. This is a future objective. For now, it just builds what it sees, and thus is dependent on the quality of the density. Again, if you cannot see some disordered region in density, TEXTAL probably cannot either.

How to Use TEXTAL

Right now, the main distribution of TEXTAL is through PHENIX (phenix-online.org). It takes a long time to download and install, but it is worth it. PHENIX has a great deal of functionality, including cctbx, Phaser, and Resolve, as well as TEXTAL.

For everything below, you must first source the phenix_env script in the root directory of the PHENIX installation to set up the appropriate environment variables.

A note on reflection file handling:

Using TEXTAL Through the Phenix GUI

Under "Tasks" and "Strategies" there is a single entry for model-building with Textal. This task is designed to do multiple things, depending on what data you give it and what options you select. In particular, you can:

When building a complete model, the user has the option of running sequence alignment and of doing simulated annealing as a post-processing step.

Input may either be a reflection file (in .mtz or .hkl format), or a pre-generated map supplied by the user for a region he or she wishes to build.

No DISPLAY items. Note that in the current implementation of the Textal task/strategy, there are no objects to be displayed as output (i.e. the "magnifying glass" button is non-functional). All output is written to disk as files with a common prefix (e.g. "textal-...").

Using TEXTAL from the Command Line

The command-line script 'textal.build' is intended to mirror what you can do through the Textal task in the Phenix GUI. It contains many options, as follows:
> textal.build
usage: textal.build [options]
 options:
  --reflections=<filename> (default=None)
     .mtz or .hkl file
  --symmetry=<params or filename> (default=None)
     only needed for .hkl files; unit cell params and space group separated by commas, or CNS .inp file
  --amplitudes=<column name> (default=None)
     e.g. FP; optional, will try to guess from file
  --phases=<column name> (default=None)
     e.g. PHIB; optional, will try to guess from file
  --FOM=<column name> (default=None)
     e.g. FOM; optional, will try to guess from file
  --resolution=<number or range> (default=2.8)
     for truncating SFs to make maps; best to leave set at 2.8, since that is optimal for Textal; can also give range like 2.8-20.0
  --threshold=<number> (default=1.0)
     contour threshold; affects connectivity
  --min_chain_len=<integer> (default=6)
     shorter chains are filtered out of model
  --input_map=<filename> (default=None)
     alternative to giving reflection file
  --input_model=<filename> (default=None)
     for user-defined C-alpha atoms
  --sequence=<filename> (default=None)
     single-letter amino-acid codes
  --se_sites=<filename> (default=None)
     selenium sites, if known
  --copies=<integer> (default=1)
     expected number of NCS symmetry copies in ASU
  --prefix=<string> (default=textal)
     base name for output files
  --capra_only
     build C-alpha chains only, without modeling side-chains (faster)
  --preserve_identities
     force LOOKUP to use amino acid types in user's input model instead of predicting side-chain identities based on density
  --asu
     skip Findmol; build model in map of ASU
  --no_sa
     skip simulated annealing after model-building
  --sa_only
     just run simulated annealing on input model

Help on command-line programs can be accessed by typing "textal.help" on the command line. Also, most programs follow the convention that if you run them without any arguments, they will output a usage statement that describes what arguments and options they take.
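The --resolution option in the usage statement above accepts either a single number or a range such as 2.8-20.0. As a rough illustration of how such a value can be interpreted, here is a small Python sketch. The function parse_resolution is hypothetical and is not how textal.build necessarily handles the option internally.

```python
def parse_resolution(text):
    """Parse '2.8' or '2.8-20.0' into (d_min, d_max) in Angstroms.

    d_min is the high-resolution limit; d_max is the low-resolution
    limit, or None when only a single number is given.
    """
    parts = [float(p) for p in text.split("-")]
    if len(parts) == 1:
        return parts[0], None
    if len(parts) == 2:
        d_min, d_max = sorted(parts)  # accept either order
        return d_min, d_max
    raise ValueError("expected a number or a range like 2.8-20.0: %r" % text)

print(parse_resolution("2.8"))
print(parse_resolution("2.8-20.0"))
```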

Examples:

// with no args, prints out help/usage like above, as do all textal programs

textal.build 

// textal can guess the ampl/phase columns because they are unambiguous
// outputs textal-refine.pdb after simulated annealing

textal.build --reflections=if5a.mtz --sequence=if5a.seq

// builds only C-alpha chains
// must supply symmetry info (unit cell params, space group), which are in CNS .inp file

textal.build --reflections=a2u-globulin.hkl --symmetry=a2u-globulin.inp --capra_only

// must give column names here, since there are multiple choices in this mtz
// also note: 2 copies of molecules expected in ASU

textal.build --reflections=czra-ref.mtz --amplitudes=FOBS --phases=PHASE --FOM=FOM --sequence=czra.seq --copies=2

// skips FINDMOL; builds backbone model in ASU without re-centering
// also changes prefix for output files to be "new_run-trace.pdb", etc. instead of "textal-trace.pdb", etc. by default

textal.build --reflections=if5a.mtz --capra_only --asu --prefix=new_run

// builds and refines side-chains for user-supplied backbone

textal.build --reflections=mvk-dm.mtz --amplitudes=FP --phases=PHIDM --FOM=FOMDM --sequence=mvk.seq --input_model=my-backbone-model.pdb
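If you want to launch these commands from a script, for example to run several datasets in a row, the command line can be assembled programmatically. The sketch below is one possible approach using only the Python standard library; build_textal_command is a hypothetical helper, and it only uses option names that appear in the usage statement above.

```python
import shlex

def build_textal_command(reflections, flags=(), **options):
    """Assemble an argument list for textal.build.

    options become --name=value; flags become bare switches
    such as --capra_only or --asu.
    """
    args = ["textal.build", "--reflections=%s" % reflections]
    for name, value in options.items():
        args.append("--%s=%s" % (name, value))
    for flag in flags:
        args.append("--%s" % flag)
    return args

cmd = build_textal_command(
    "czra-ref.mtz",
    amplitudes="FOBS", phases="PHASE", FOM="FOM",
    sequence="czra.seq", copies=2,
)
print(shlex.join(cmd))  # show the command as it would be typed

# To actually run it (requires the PHENIX environment to be set up):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with unusual file names; the printed form is only for inspection.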

Calling TEXTAL from Python Scripts

Basically, importing the textal.pytex module will give you access to many of the functions. For example, to scale a map, you could do:
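from textal.pytex import *
# read the map; named density_map to avoid shadowing Python's built-in map()
density_map = emap(file_name="my_map.xplor")
# scale() returns the scaled map together with a log
(scaled_map, log) = scale(density_map)
scaled_map.write("my_map_scaled.xplor")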
Note that in this case, as with many of the functions in pytex, a tuple is returned that contains the output object and a log with some (rather arbitrary and uninterpretable) text.

Help on pytex functions can be accessed by typing "textal.pytex_help" on the command line.

References

The main reference for Textal is: More references may be found here.

Etc.

More documentation may be found at textal.tamu.edu

You may contact us by sending email to: [email protected]