[phenixbb] automating setting up parallel refinement jobs

Thu Feb 17 15:42:04 PST 2011

On Thu, Feb 17, 2011 at 12:02 PM, Kendall Nettles <knettles at scripps.edu> wrote:
> So here is my feature request: I would love to have a GUI interface to generate the folders and parameter files, where you would select a list of common parameters, and a list of parameters to populate 1/ job, instead of having to set up each one as a separate job.
>
>  I think this would really speed things up for your industrial users. I'm working on 20 structures of the same protein with different ligands, and expect to spend maybe 8 hours generating TLS groups and editing the 240 parameter files. A GUI interface would make it 10 or 20 minutes!

I did something like this for Phaser as a proof-of-concept for simple
parallelization of tasks:

http://cci.lbl.gov/~nat/img/phenix/phaser_mp_config.png
http://cci.lbl.gov/~nat/img/phenix/phaser_mp_results.png

It runs all search models in parallel, and can sample multiple
expected RMSDs too.  The calculations can be parallelized over
multiple cores (I never tried more than 12, I think, but there's no
limit that I'm aware of) or across a cluster.  It only uses one
dataset with many models, but I could just as easily have done the
reverse, or parallelized over both models and data.  This isn't a very
sophisticated program (it took maybe two days of effort), but eventually
we'll have a new MR frontend that does something similar, with much
more pre-processing of search models.
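A minimal sketch of this kind of task-level parallelism, assuming each task (one search model at one expected RMSD) can be launched as an independent external command. The helpers `task_grid`, `run_one` and `run_all` and the placeholder commands are hypothetical illustrations, not the code behind the Phaser prototype, and the sketch only covers local cores, not cluster submission:

```python
import itertools
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def task_grid(models, rmsds):
    """Every (search model, expected RMSD) pair is an independent task."""
    return list(itertools.product(models, rmsds))


def run_one(cmd):
    """Run one external command to completion and return its exit code."""
    return subprocess.run(cmd, capture_output=True, text=True).returncode


def run_all(commands, max_workers=4):
    """Run independent commands concurrently, at most max_workers at a time.

    Each task is a separate OS process, so a thread pool is enough to keep
    several cores busy; results come back in the same order as `commands`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Hypothetical inputs; each placeholder task just prints its settings.
    tasks = task_grid(["model_a.pdb", "model_b.pdb"], [0.8, 1.2, 1.5])
    cmds = [[sys.executable, "-c", "print(%r, %r)" % t] for t in tasks]
    print(run_all(cmds, max_workers=2))
```

Threads are used rather than processes because the real work happens in the child programs; swapping the placeholder command for an actual program invocation is left to the reader.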

So, from a technical standpoint, this is fairly easy to set up, and
distributing the jobs and displaying the results is straightforward.  The
main reason I haven't done anything like this yet is that it isn't
obvious to me which parameters need to be sampled and which would be
in common.  (Also, I'm already at the limit of my multitasking
ability.)  I like the idea of making the user choose; since almost all
of the controls in the GUI can be generated automatically, a dynamic
interface is not difficult to set up.  There is a separate problem of
how to group inputs, but this may not be as hard as I'm imagining.
(From my perspective, there is yet another issue with how to organize
and save results - should those 20 structures be one project or 20,
etc.)
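The split between common and sampled parameters can be sketched as follows, assuming the user has already chosen which parameters are shared and which vary. The parameter names are hypothetical; a model and its own dataset are grouped into one sampled value so they are never crossed with another structure's data, which is one simple answer to the input-grouping question:

```python
import itertools


def expand_jobs(common, sampled):
    """Combine shared settings with every combination of sampled values.

    common:  {parameter: value} applied to every job
    sampled: {parameter: [values]} varied across jobs
    Returns one {parameter: value} dict per job.
    """
    names = sorted(sampled)
    jobs = []
    for values in itertools.product(*(sampled[n] for n in names)):
        job = dict(common)              # start from the shared settings
        job.update(zip(names, values))  # sampled values override them
        jobs.append(job)
    return jobs


# Hypothetical parameter names: two structures crossed with two weights.
jobs = expand_jobs(
    {"cycles": 5},
    {"structure": [("lig01.pdb", "lig01.mtz"), ("lig02.pdb", "lig02.mtz")],
     "weight": [0.5, 1.0]},
)
print(len(jobs))  # 4
```

The number of jobs is the product of the lengths of the sampled lists, so adding a sampled parameter multiplies the work rather than adding to it.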

All that said, I think the immediate problem is actually not too bad:
phenix.refine will take as many parameter files as you want, so for
TLS, for example, you just need to make one file that looks something
like this:

refinement.refine.adp {
  tls = "chain A"
  tls = "chain B"
}

... and call it "tls.eff", then run "phenix.refine model1.pdb
data1.mtz tls.eff other_params.eff", and so on with each dataset.  You
can have additional parameter files for other settings that you want
to vary.  This doesn't solve the organizational problem, and it doesn't
display the results conveniently, but at least it means less time
spent in a text editor.
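A minimal sketch of scripting that setup, assuming one directory per structure and a shared parameter file named other_params.eff as in the command above. The job names, file names and the `setup_jobs` helper are hypothetical; the script only writes each tls.eff in the layout shown earlier in this message and prints the phenix.refine command lines, it does not run them:

```python
import os
import shlex


def tls_eff(selections):
    """Render TLS groups in the same layout as the tls.eff example."""
    lines = ["refinement.refine.adp {"]
    lines += ['  tls = "%s"' % sel for sel in selections]
    lines.append("}")
    return "\n".join(lines) + "\n"


def setup_jobs(jobs, root, shared_eff="other_params.eff"):
    """Create one directory per structure containing its own tls.eff.

    `jobs` maps a job name to a dict with the (hypothetical) keys
    "model", "data" and "tls" (a list of atom-selection strings).
    Returns the phenix.refine command line for each job; nothing is run.
    """
    commands = {}
    for name, job in sorted(jobs.items()):
        job_dir = os.path.join(root, name)
        os.makedirs(job_dir, exist_ok=True)
        tls_path = os.path.join(job_dir, "tls.eff")
        with open(tls_path, "w") as f:
            f.write(tls_eff(job["tls"]))
        cmd = ["phenix.refine", job["model"], job["data"], tls_path, shared_eff]
        commands[name] = shlex.join(cmd)  # shell-safe quoting (Python 3.8+)
    return commands
```

For example, `setup_jobs({"lig01": {"model": "lig01.pdb", "data": "lig01.mtz", "tls": ["chain A", "chain B"]}}, "refine_jobs")` writes refine_jobs/lig01/tls.eff and returns the matching command line; run each command from a directory where the model, data and shared file paths resolve.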

-Nat