[phenixbb] Questions about phenix.refine with twin_law
pafonine at lbl.gov
Mon Dec 27 11:26:04 PST 2010
> Here is the typical distribution of Rfree, Rwork and Rfree-Rwork for structures in PDB refined at 2.5A resolution:
> Are their statistics applied to twinning cases?
> I think such kind of statistics should be (slightly) different from
> normal cases.. not?
You are right: this analysis does not discriminate structures by
twinning, although I don't see why the R-factor statistics should be
(much) different for twinned structures.
>> Did you use PHENIX to select free-R flags? It is important.
> Yes, I used phenix to select R-free-flags with use_lattice_symmetry=true.
> Do you have any way to know the refinement is biased or not because of
> wrong R-free-flags selections?
If you used PHENIX to select free-R flags then they are unlikely to be
wrong. By wrong I mean:
- not taking lattice symmetry into account;
- a distribution of flags that is not uniform across the resolution
range, so that some relatively thin resolution bins do not receive
enough test reflections.
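The bin-wise selection idea can be sketched in plain Python. This is a toy selector (the function name `assign_free_flags` and all numbers are invented for illustration), and unlike PHENIX it ignores lattice symmetry; it only shows how flagging per thin resolution bin keeps the test set uniform in resolution:

```python
import random

def assign_free_flags(d_spacings, n_bins=20, free_fraction=0.05, seed=0):
    """Toy free-R flag selector: flag ~free_fraction of the reflections
    in each thin resolution bin, so no bin is starved of test reflections.
    Unlike a real selector (e.g. PHENIX), lattice symmetry is ignored."""
    rng = random.Random(seed)
    # sort reflection indices by resolution, then cut into equal-count bins
    order = sorted(range(len(d_spacings)), key=lambda i: d_spacings[i])
    flags = [False] * len(d_spacings)
    bin_size = max(1, len(order) // n_bins)
    for start in range(0, len(order), bin_size):
        members = order[start:start + bin_size]
        n_free = max(1, int(free_fraction * len(members)))
        for i in rng.sample(members, n_free):
            flags[i] = True
    return flags

# toy data set: 1000 reflections with d-spacings between 2.5 and 20 A
rng = random.Random(1)
ds = [2.5 + 17.5 * rng.random() for _ in range(1000)]
flags = assign_free_flags(ds)
```

A naive random selection over the whole data set can, by chance, leave a thin bin with almost no test reflections; selecting per bin rules that out by construction.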
However, the refinement outcome may vary depending on the choice of
free-R flags anyway, but for different reasons: refinement artifacts.
For example, if you run a hundred identical Simulated Annealing
refinement jobs where the only difference between jobs is the random
seed, you will get an ensemble of somewhat (mostly slightly) different
structures, and depending on resolution the R-factors may range within
0-3% (the lower the resolution, the higher the spread).
We know that the profile of the function we optimize in refinement is
very complex, and the optimizers we use are too simple to search this
profile thoroughly. So by the end of refinement we never end up in the
global minimum, but ALWAYS get stuck in a local one. Depending on the
initial conditions, the optimization may take a different pathway and
end up in a different local minimum. Even plus/minus one reflection may
trigger this change, or even rounding errors, etc. So the ensemble of
models you see after multi-start SA refinement does not necessarily
reflect what's in the crystal. Yes, among the models in the whole
ensemble, some side chains may adopt one or another alternative
conformation, and then this variability of refinement results would
reflect what's in the crystal. This is extensively discussed in this paper:
Terwilliger, T.C., Grosse-Kunstleve, R.W., Afonine, P.V., Adams, P.D.,
Moriarty, N.W., Zwart, P.H., Read, R.J., Turk, D. & Hung, L.-W. (2007).
Acta Cryst. D63, 597-610. "Interpretation of ensembles created by
multiple iterative rebuilding of macromolecular models."
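The multi-start picture above can be illustrated with a toy experiment: a simple downhill optimizer on a deliberately rippled one-dimensional target gets trapped in a different local minimum depending on the random start. Everything here (the function `rugged`, the optimizer, the numbers) is invented for illustration; it is not the crystallographic refinement target:

```python
import math
import random

def rugged(x):
    # toy "refinement target": a convex bowl plus ripples -> many local minima
    return x * x + 2.0 * math.sin(5.0 * x)

def local_descent(seed, steps=3000, step=0.01):
    # crude greedy optimizer: accept a small random step only if it lowers
    # the target, so a run cannot escape the basin it falls into
    rng = random.Random(seed)
    x = rng.uniform(-3.0, 3.0)  # the "initial condition"
    for _ in range(steps):
        cand = x + rng.choice((-step, step))
        if rugged(cand) < rugged(x):
            x = cand
    return x

# 100 identical jobs differing only in the random seed
finals = [local_descent(seed) for seed in range(100)]
distinct_minima = sorted({round(x, 1) for x in finals})
```

Inspecting `distinct_minima` shows several different resting points rather than one global answer, which is the point being made about identical SA jobs with different seeds.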
Some illustrative discussion is here:
Having said this, it shouldn't be too surprising if you select, say, 10
different free-R flag sets, then do thorough refinement (to achieve
convergence and remove the memory of test reflections), and in the end
get somewhat different Rwork/Rfree. You can try it to get a
"confidence range" for the spread of Rwork and Rfree.
You can also do the above experiment with SA.
However, apart from academic interest / making yourself confident about
the numbers you get, I don't really see any practical use of these tests.
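For a feel of what such an experiment measures, here is a toy analogue in plain Python: "refine" a one-parameter model against the same synthetic data ten times, each time with a different random 5% holdout playing the role of the free set, and look at the spread of the resulting "Rwork"/"Rfree". All names and numbers are invented; real R-factors are computed from structure-factor amplitudes, not from a linear fit:

```python
import random

def toy_refinement(flag_seed, n=500, noise=0.1):
    # one fixed synthetic "data set" (the same for every flag choice)
    data_rng = random.Random(0)
    xs = [data_rng.uniform(1.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + data_rng.gauss(0.0, noise * x) for x in xs]
    # a different 5% holdout ("free set") for each flag_seed
    flag_rng = random.Random(flag_seed)
    free = set(flag_rng.sample(range(n), n // 20))
    work = [i for i in range(n) if i not in free]
    # "refinement": least-squares slope fitted on the working set only
    a = sum(xs[i] * ys[i] for i in work) / sum(xs[i] ** 2 for i in work)
    def r_factor(idx):
        return sum(abs(ys[i] - a * xs[i]) for i in idx) / sum(abs(ys[i]) for i in idx)
    return r_factor(work), r_factor(sorted(free))

results = [toy_refinement(seed) for seed in range(10)]
r_frees = [rf for _, rf in results]
spread = max(r_frees) - min(r_frees)
```

The "Rwork" barely moves between runs, while the "Rfree" fluctuates with the particular holdout; the spread is the "confidence range" mentioned above.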
>> ML is better than LS because ML accounts better for model errors and incompleteness, treating the latter statistically.
> Do they come from sigma-A estimation?
See (and references therein):
Lunin, V.Yu. & Skovoroda, T.P. (1995). Acta Cryst. A51, 880-887.
"R-free likelihood-based estimates of errors for phases calculated from
atomic models."
Pannu, N.S., Murshudov, G.N., Dodson, E.J. & Read, R.J. (1998). Acta
Cryst. D54, 1285-1294. "Incorporation of Prior Phase Information
Strengthens Maximum-Likelihood Structure Refinement."
Lunin, V.Y., Afonine, P.V. & Urzhumtsev, A.G. (2002). Acta Cryst. A58,
270-282. "Likelihood-based refinement. I. Irremovable model errors."
Urzhumtsev, A.G., Skovoroda, T.P. & Lunin, V.Y. (1996). J. Appl. Cryst.
29, 741-744. "A procedure compatible with X-PLOR for the calculation of
electron-density maps weighted using an R-free-likelihood approach."
Read, R.J. (1986). Acta Cryst. A42, 140-149. "Improved Fourier
coefficients for maps using phases from partial structures with errors."
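For orientation, the sigma-A idea these papers build on can be sketched as follows, in the normalized-amplitude form for acentric reflections as in Read (1986). This is a sketch of the standard distribution, not the exact target implemented in any particular program:

```latex
% Rice (Sim-type) conditional distribution of the observed normalized
% amplitude E_o given the model amplitude E_c for one acentric reflection;
% sigma_A measures model quality and is typically estimated from the free set.
p(E_o \mid E_c) \;=\; \frac{2E_o}{1-\sigma_A^2}\,
  \exp\!\left(-\frac{E_o^2+\sigma_A^2 E_c^2}{1-\sigma_A^2}\right)
  I_0\!\left(\frac{2\sigma_A E_o E_c}{1-\sigma_A^2}\right)
```

A least-squares residual effectively drops the sigma-A weighting and the Bessel-function coupling, which is the sense in which ML "accounts for model errors" statistically while LS does not.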
>> I do not know what's implemented in Refmac - I'm not aware of a
>> corresponding publication.
> FYI, I think slide No. 13 of this presentation describes the
> likelihood function in the twinned case..
I see a lot of handwaving and juggling of the magic words "maximum" and
"likelihood", but I don't see any details about the underlying
statistical model, approximation assumptions (if any), derivation, or
mathematical analysis of the new function's behavior, etc. I know all
this is beyond the scope of conference slides, which is why I said
above "I'm not aware of a corresponding publication", meaning a proper
peer-reviewed publication where all these important details are explained.
>> Typically, when people send us the "reproducer" (all inputs needed to
>> reproduce the problem), we can work much more efficiently; otherwise
>> it takes a lot of emails before one can start having a clue about the
>> problem.
> I fully understand it, but I'm sorry I couldn't..
Please let us know if we can be of any help.
All the best,