[phenixbb] phenix and weak data
rjr27 at cam.ac.uk
Wed Dec 12 01:00:18 PST 2012
I've been reading this thread for a while, wondering if I should say anything. As Pavel knows, I've been suggesting that phenix.refine should include the experimental errors in the variance term ever since I realised they were being left out. On the other hand, Pavel has been asking (quite reasonably) for some evidence that it makes any difference. I feel that there ought to be circumstances where it does make a difference, but my attempts over the last few days to find a convincing test case have failed. Like you, I tried running Refmac with and without experimental sigmas, in my case on a structure where I know that we pushed the resolution limits (1bos), and I can't see any significant difference in the model or the resulting phases. Gabor Bunkoczi has suggested that it might be more relevant for highly anisotropic structures, so we'll probably look for a good example along those lines.
In principle it should make a difference, but I think there's one point missing from your discussion below. If you leave out the experimental sigmas, then the refinement of sigmaA (or, equivalently, the alpha and beta parameters in phenix.refine) will lead to variances that already incorporate the average effect of experimental measurement error in each resolution shell. So if all reflections were measured to the same precision, it shouldn't matter at all to the likelihood target whether you add them explicitly or absorb them implicitly into the variances. The potential problems come when different reflections are measured with different levels of precision, e.g. because the data collection strategy gave widely varying redundancies for different reflections.
In the statistics you give below, the key statistic is probably the standard deviation of sigf/sqrt(beta), which is actually quite small. So after absorbing the average effect of measurement error into the beta values, the residual variation is even less important to the total variance than you would think from the total value of sigf.
I would still argue that incorporating the experimental error into the likelihood variances is relatively easy, so it's worth doing even if we haven't yet found the circumstances where it turns out to matter!
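To illustrate the point about absorbing measurement error into the variances, here is a minimal numerical sketch. It uses a simplified Gaussian-on-F likelihood (not the actual phenix.refine target), with made-up amplitudes and sigmas: when every reflection has the same experimental sigma, refitting the shell variance absorbs it exactly and the target is unchanged; when the sigmas vary widely, a single absorbed variance can no longer reproduce the per-reflection weights.

```python
# Sketch only: a simplified Gaussian likelihood on amplitudes, with a
# single per-shell model-error variance "beta" standing in for the real
# refined variance parameters. All numbers are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
fc = rng.uniform(10.0, 100.0, n)   # "calculated" amplitudes
beta_true = 25.0                   # model-error variance

def neg_log_likelihood(fo, fc, var):
    """Simplified Gaussian negative log-likelihood with variance `var`."""
    return 0.5 * np.sum((fo - fc) ** 2 / var + np.log(var))

# Case 1: uniform experimental error -> absorbed exactly by the shell variance.
sigf = np.full(n, 3.0)
fo = fc + rng.normal(0.0, np.sqrt(beta_true + sigf ** 2))
explicit = neg_log_likelihood(fo, fc, beta_true + sigf ** 2)
absorbed = neg_log_likelihood(fo, fc, beta_true + np.mean(sigf ** 2))
diff_uniform = abs(explicit - absorbed)
print(diff_uniform)   # 0.0: the two targets coincide

# Case 2: widely varying errors (e.g. uneven redundancy) -> one absorbed
# variance cannot reproduce the per-reflection weights.
sigf = rng.uniform(0.5, 10.0, n)
fo = fc + rng.normal(0.0, np.sqrt(beta_true + sigf ** 2))
explicit = neg_log_likelihood(fo, fc, beta_true + sigf ** 2)
absorbed = neg_log_likelihood(fo, fc, beta_true + np.mean(sigf ** 2))
diff_varying = abs(explicit - absorbed)
print(diff_varying)   # clearly nonzero
```

This is of course the argument in caricature; the real targets integrate over phase and treat centrics and acentrics separately, but the absorption argument goes through the same way.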
On 12 Dec 2012, at 06:46, Ed Pozharski wrote:
> On Tue, 2012-12-11 at 11:27 -0500, Douglas Theobald wrote:
>> What is the evidence, if any, that the exptl sigmas are actually negligible compared to fit beta (is it alluded to in Lunin 2002)? Is there somewhere in phenix output I can verify this myself?
> Essentially, equation 4 in Lunin (2002) is the same as equation 14 in
> Murshudov (1997) or equation 1 in Cowtan (2005) or 12-79 in Rupp (2010).
> The difference is that, instead of a combination of sigf^2 and sigma_wc, you
> have a single parameter, beta. One can do that by assuming that
> sigf<<sqrt(beta). Phenix log files list the optimized beta parameter in
> each resolution shell. They do not list sigf, though, but trust me - I
> checked and it is indeed true that sqrt(beta)>sigf. I just pulled up a
> random dataset refined with phenix and here is what I see
> min(sigf/sqrt(beta)) = 0.012
> max(sigf/sqrt(beta)) = 0.851
> mean(sigf/sqrt(beta)) = 0.144
> std(sigf/sqrt(beta)) = 0.118
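[An inline note: the check described above is easy to reproduce once you have the experimental sigmas and the per-shell beta values pulled from a phenix.refine log. A hedged sketch, with illustrative stand-in arrays rather than real data:]

```python
# Illustrative sketch: summary statistics of sigf/sqrt(beta), given the
# experimental sigma(F) for each reflection and the refined beta of the
# resolution shell it falls in. The arrays here are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(1)
sigf = rng.uniform(1.0, 8.0, 500)                # sigma(F) per reflection
beta_per_refl = rng.uniform(400.0, 2500.0, 500)  # shell beta, per reflection

ratio = sigf / np.sqrt(beta_per_refl)
print(f"min  = {ratio.min():.3f}")
print(f"max  = {ratio.max():.3f}")
print(f"mean = {ratio.mean():.3f}")
print(f"std  = {ratio.std():.3f}")
```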
> But there are two problems. First, in the highest resolution shell
> min(sigf/sqrt(beta)) = 0.116
> max(sigf/sqrt(beta)) = 0.851
> mean(sigf/sqrt(beta)) = 0.339
> std(sigf/sqrt(beta)) = 0.110
> This is a bit more troubling. Notice that for acentrics the variance is
> 2sigf**2 + sigma_wc, so the actual ratio should be increased by sqrt(2),
> getting uncomfortably close to 1/2. Still, given that one adds variances,
> this is at most a 25% correction, and this *is* the high resolution shell.
> Second, if one tries to interpret sqrt(beta) as a measure of model error
> in reciprocal space, one runs into trouble. This dataset was refined to
> R~18%. Assuming that sqrt(beta) should roughly predict the discrepancy
> between Fo and Fc, it corresponds to R~30%. This suggests that, for
> reasons I don't yet quite understand, beta overestimates model variance.
> If it is simply doubled, then it becomes comparable to experimental
> error, at least in the higher resolution shells.
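[An inline check of the arithmetic quoted above, using the highest-shell mean ratio of 0.339: the acentric factor of 2 in the variance multiplies the amplitude ratio by sqrt(2), and the fractional variance correction is 2*r**2.]

```python
# Quick numeric check of the quoted arithmetic. The value of r is the
# mean(sigf/sqrt(beta)) reported for the highest resolution shell above.
import math

r = 0.339
effective_ratio = math.sqrt(2) * r   # acentric variance is 2sigf**2 + sigma_wc
variance_correction = 2 * r ** 2     # fractional addition to the variance

print(f"{effective_ratio:.3f}")      # ~0.479: "uncomfortably close to 1/2"
print(f"{variance_correction:.3f}")  # ~0.230: "at most 25% correction"
```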
>> And, in comparison, how does refmac handle the exptl sigmas? Maybe this last question is more appropriate for ccp4bb, but contrasting with phenix would be helpful for me. I know there's a box, checked by default, "Use exptl sigmas to weight Xray terms".
> Refmac fits sigmaA to a certain resolution dependence and then adds the
> experimental sigmas (or not, as you noticed). I was told that the actual formulation differs from what is described in the original manuscript. But what's important is that if one pulls out the sigma_wc calculated by refmac, it has all the same characteristics as sqrt(beta): it is generally >>sigf and suggests a model error in reciprocal space that is incompatible with (too large compared to) the observed R-values. Kevin Cowtan's spline approximation implemented in the clipper libraries behaves much better, meaning that R-value expectations projected from sigma_wc are much closer to the observed R-values.
> Curiously, it does not make much difference in practice, i.e. the refined model is not much affected. For instance, with refmac there are no significant changes whether one uses experimental errors or not. I could think of several reasons for this, but haven't verified any.
> "I'd jump in myself, if I weren't so good at whistling."
> Julian, King of Lemurs
> phenixbb mailing list
> phenixbb at phenix-online.org
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research
Wellcome Trust/MRC Building
Hills Road
Cambridge CB2 0XY, U.K.
Tel: +44 1223 336500
Fax: +44 1223 336827
E-mail: rjr27 at cam.ac.uk
www-structmed.cimr.cam.ac.uk