PHENIX Python-based Hierarchical ENvironment for Integrated Xtallography

structure_search

Overview

structure_search quickly identifies structural homologs of an input PDB model among the entries in the Protein Data Bank. It uses the SARST algorithm (see References), and a search against the whole PDB typically takes less than one second. Optionally, it can also list the ligands found in the structures of those homologs.

Usage

  • obtain a list of PDB IDs and chains sorted by similarity to mypdb.pdb

    phenix.structure_search mypdb.pdb

  • obtain a list of homologs of mypdb.pdb and all ligands found in structures of those homologs

    phenix.structure_search mypdb.pdb get_ligand=True
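The two invocations above can also be assembled from a script. The following is a minimal sketch, assuming phenix.structure_search is on the PATH and accepts key=value arguments as shown above. The helper names build_command and run_search are hypothetical and not part of PHENIX.

```python
import shutil
import subprocess


def build_command(pdb_file, **params):
    """Return the argument list for phenix.structure_search.

    Hypothetical helper: each keyword argument is passed as key=value,
    mirroring the command lines shown in the Usage section.
    """
    cmd = ["phenix.structure_search", pdb_file]
    for key, value in params.items():
        cmd.append(f"{key}={value}")
    return cmd


def run_search(pdb_file, **params):
    """Run the search, failing early if the executable is not on the PATH."""
    cmd = build_command(pdb_file, **params)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} not found; is the PHENIX environment set up?")
    return subprocess.run(cmd, check=True)


# Show the command that would be run for the second usage example.
print(" ".join(build_command("mypdb.pdb", get_ligand=True)))
```

Calling run_search("mypdb.pdb", get_ligand=True) would then execute the same command as the second bullet.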

More information can be found in the Input files and Output files sections below.

Input files

Required input:

  • pdb_file: the file containing the protein model of interest.

Optional inputs:

  • get_ligand: set get_ligand=True to obtain a list of ligands found in the homologous PDB entries (default=False).

  • job_title: title of the current job.

  • output_prefix: prefix for output files, if needed.

  • get_pdb: collect and superpose the top N homologous PDBs (default=10).

  • coot_display: display the superposed PDB files in Coot (default: False if the E-value is greater than 1E-18, True if it is less than 1E-18).

  • sequence_only: perform a BLAST sequence search against the PDB using the internal Phenix database. This option does not require a network connection.

Output files

In addition to the screen output, the following files contain the results of structure_search:

  • output.txt: file containing homologs of 'pdb_file', sorted by score.

  • MyBlast.log: standard BLAST output with selected pairwise alignments. NOTE: for structure alignment, the 'sequences' are structure-based Ramachandran codes (see References), not one-letter amino acid codes.

  • pdb_ligand.txt (if get_ligand=True): file containing all ligands found in all homologs from this search.

  • superposed PDB files: found in the TEMPPDB_## subdirectory, as reported in the program output.
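After a run, the superposed files can be located from a script. The sketch below assumes the TEMPPDB_## subdirectories are created in the working directory and that "##" stands for an arbitrary suffix. The names of the files inside are not specified here, so the hypothetical helper list_superposed_dirs simply lists everything it finds.

```python
from pathlib import Path


def list_superposed_dirs(workdir="."):
    """Map each TEMPPDB_* subdirectory of workdir to its sorted file names.

    Assumption: the subdirectories sit directly under workdir. The exact
    directory name is reported in the program output.
    """
    result = {}
    for sub in sorted(Path(workdir).glob("TEMPPDB_*")):
        if sub.is_dir():
            result[sub.name] = sorted(p.name for p in sub.iterdir() if p.is_file())
    return result


if __name__ == "__main__":
    for name, files in list_superposed_dirs(".").items():
        print(name, files)
```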

References

Lo WC, Huang PJ, Chang CH, Lyu PC. BMC Bioinformatics. 2007, 8:307

List of all available keywords

  • structure_search
    • pdb_file = None Enter a PDB file name
    • output_prefix = 'output' Provide an output prefix if needed
    • blastpath = None Enter path to blastall executable
    • sequence_only = False Do a BLAST search against PDBaa sequences instead of a Ramachandran-based structure search
    • get_ligand = False Use get_ligand=True to retrieve ligands.
    • get_pdb = 10 get_pdb=N will collect and superpose the top N homologous pdbs. Use get_pdb=0 to disable this option.
    • coot_display = False (default) Display output pdb files in coot.
    • RCSB_root = None Enter the top directory of a local RCSB mirror. PDBs will be copied from this local mirror. Note: this assumes the directory tree under it follows the RCSB layout, with PDB files named 'pdb####.ent.gz' in the RCSB_root/data/structures/all/pdb directory
    • local_pdb_dir = None Enter the path directly to your local PDB repository.
    • verbose = False verbose output
    • debug = False debugging output
    • job_title = None Job title in PHENIX GUI, not used on command line
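The directory layout expected under RCSB_root can be checked before a run. The sketch below builds the path described for the RCSB_root keyword. The helper mirror_entry_path is hypothetical, and it assumes that "####" is the four-character PDB ID in lower case, as in standard RCSB mirrors.

```python
from pathlib import Path


def mirror_entry_path(rcsb_root, pdb_id):
    """Return the expected location of one entry in a local RCSB mirror.

    Follows the layout RCSB_root/data/structures/all/pdb/pdb####.ent.gz.
    Assumption: '####' is the lower-case four-character PDB ID.
    """
    pdb_id = pdb_id.strip().lower()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return Path(rcsb_root) / "data" / "structures" / "all" / "pdb" / f"pdb{pdb_id}.ent.gz"


# Example with a hypothetical mirror location and PDB ID.
print(mirror_entry_path("/data/rcsb", "1ABC").as_posix())
```

Testing whether the returned path exists (for example with its is_file() method) gives a quick way to confirm that the mirror follows the expected tree before passing RCSB_root to the program.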