[phenixbb] questions related to Phenix refinement

Sun Jan 18 10:30:45 PST 2015

Dear Pavel,

thanks for your thoughtful email! I am not going to try and comment on every specific point you make, mostly because I don't see any fundamental disagreement. People should really go and read the papers you cite; reading mailing lists is not a substitute for that.
Let me just make one remark: crystallography, although it rests on firm ground, is complex enough that there are no simple rules for everything. It is oversimplification that has done a disservice to our science; just think of the "religious" Rsym cutoffs, in use until not long ago, that have caused people to discard a lot of valuable data. This is why I am against seemingly innocent rules of thumb like "the high-resolution cutoff should be placed at X I/sigma or where the completeness falls below Y %" (and no, I'm not implying that you said this). There are better ways, but they are not as simplistic.

best,

Kay

On Sunday, January 18, 2015 18:58 CET, Pavel Afonine <pafonine at lbl.gov> wrote: 

> Dear Kay,
> 
> thanks for your email and for bringing up this topic!
> 
> >>>> In the X-ray statistics by resolution bin of the Phenix.refine result,
> >>>> there is a column "%complete".  For my refinement data, I find the
> >>>> better the resolution (from lower resolution to the higher
> >>>> resolution), the lower the completeness (for example for 40-6 A,
> >>>> %complete is 98, for 3.1-3.0 A, %complete is 60%, for 2.2-2.1 A,
> >>>> %complete is 6%).
> >>>>
> >>>> Will you please tell me what this "%complete" means? Why does it
> >>>> decrease in the better diffraction bin?
> >> Completeness is the number of reflections you have compared to the
> >> number theoretically possible. So the higher the completeness, the
> >> better. Ideally (and it's not that uncommon these days) you should
> >> have a 100% complete data set in the d_min-inf resolution range.
> >> Anything below, say, 80% in any resolution bin is bad, and the
> >> numbers you quote (6-60%) mean something is wrong with the dataset.
> >>
> > Given your standing in the community, the last sentence will lead many
> > inexperienced people to believe that they should cut their data at the
> > resolution where the completeness falls below "say 80"%.
> >
> > But that would be wrong. There is no reason to consider a completeness
> > as "too low in a high-resolution shell" as long as the data in that
> > shell are good. Particularly in refinement, any reflection helps to
> > improve the model and to reduce overfitting.
> 
> Clearly, email is not the best way of communicating, especially when
> written without a lawyer's help and read between the lines!
> 
> No, I was not suggesting cutting the data, particularly if the cutoff
> is judged by completeness exclusively. What I was really saying is that
> if the data set is so incomplete, that should be a warning sign and
> prompt a review of the data collection and processing steps (rather
> than spending months struggling with a poor data set!).
> 
> Also, I think extremes such as routine data cutoffs by "sigma" and/or
> resolution (as was common in the past) and a panicked fear of throwing
> away a reflection (as is the modern trend) may be counterproductive.
> Indeed, for example, non-permanent data cutoffs by resolution (or by
> other criteria, such as those derived from Fobs vs Fmodel differences)
> may be essential for the success of refinement and of phasing by
> Molecular Replacement:
> 
>            J. Appl. Cryst. (2008). 41, 491-522
>            Structure refinement: some background theory and practical strategies
>            D. Watkin
> 
>            Acta Cryst. (1999). D55, 1759-1764
>            Detecting outliers in non-redundant diffraction data
>            R. J. Read
> 
>            J. Appl. Cryst. (2009). 42, 607-615
>            Automatic multiple-zone rigid-body refinement with a large convergence radius
>            P. V. Afonine, R. W. Grosse-Kunstleve, A. Urzhumtsev and P. D. Adams
> 
>            STIR option in SHELX.
> 
> Also, incomplete data can distort maps. As few as 1% of missing
> reflections may be sufficient to destroy the image of the molecule in
> Fourier maps:
> 
>            Acta Cryst. (1991). A47, 794-801
>            Low-resolution phases: influence on SIR syntheses and retrieval with double-step filtration
>            A. G. Urzhumtsev
> 
>            Acta Cryst. (2014). D70, 2593-2606
>            Metrics for comparison of crystallographic maps
>            A. Urzhumtsev, P. V. Afonine, V. Y. Lunin, T. C. Terwilliger and P. D. Adams
> 
>            Retrieval of lost reflections in high resolution Fourier syntheses by 'soft' solvent flattening.
>            Natalia L. Lunina, Vladimir Y. Lunin and Alberto D. Podjarny
> http://www.ccp4.ac.uk/newsletters/newsletter41/00_contents.html
> 
> Finally, it is a poor idea to report the resolution of a data set as
> that of its highest-resolution reflection unless the data set is 100%
> complete. Instead, the effective resolution (which has a strict
> mathematical definition and meaning) should be used:
> 
>            Acta Cryst. (2013). D69, 1921-1934
>            On effective and optical resolutions of diffraction data sets
>            L. Urzhumtseva, B. Klaholz and A. Urzhumtsev
> 
> Summarizing, a severely incomplete data set should trigger suspicion. If
> that's the only data set available, then correct expectations should be
> set about the (possible difficulty of) structure solution and the
> quality of the final model.
> 
> All the best,
> Pavel

-- 
Kay Diederichs                            http://strucbio.biologie.uni-konstanz.de
email: Kay.Diederichs at uni-konstanz.de Tel +49 7531 88 4049 Fax 3183
Fachbereich Biologie, Universität Konstanz, Box 647, D-78457 Konstanz
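
For readers unfamiliar with the "%complete" column discussed in the quoted thread, here is a minimal sketch of how completeness per resolution shell could be computed: the number of observed unique reflections in a shell divided by the number theoretically possible in that shell. This is not how phenix.refine implements it. The function names, the shell-boundary convention, and the toy d-spacings are all assumptions made for illustration, and the list of theoretically possible reflections (which in practice follows from the unit cell and space group) is simply taken as given.

```python
def shell_index(d, edges):
    """Return the index of the shell containing d-spacing d, or None.

    edges: shell boundaries in Angstrom, strictly decreasing.
    Assumed convention: shell i holds reflections with edges[i] > d >= edges[i+1].
    """
    for i in range(len(edges) - 1):
        if edges[i] > d >= edges[i + 1]:
            return i
    return None


def completeness_by_shell(observed_d, possible_d, edges):
    """Return one (d_max, d_min, n_observed, n_possible, percent) tuple per shell.

    observed_d: d-spacings of the unique reflections actually measured.
    possible_d: d-spacings of all theoretically possible unique reflections
                (assumed to be supplied by the caller).
    """
    n_shells = len(edges) - 1
    n_obs = [0] * n_shells
    n_pos = [0] * n_shells
    for d in observed_d:
        i = shell_index(d, edges)
        if i is not None:
            n_obs[i] += 1
    for d in possible_d:
        i = shell_index(d, edges)
        if i is not None:
            n_pos[i] += 1
    rows = []
    for i in range(n_shells):
        # An empty shell has no defined completeness.
        pct = 100.0 * n_obs[i] / n_pos[i] if n_pos[i] else float("nan")
        rows.append((edges[i], edges[i + 1], n_obs[i], n_pos[i], pct))
    return rows


# Toy numbers, invented purely for illustration (not real data).
possible = [10.0, 8.0, 5.0, 4.0, 3.5, 3.0]
observed = [10.0, 8.0, 5.0, 3.0]
for d_max, d_min, n_o, n_p, pct in completeness_by_shell(
    observed, possible, edges=[40.0, 6.0, 3.0]
):
    print(f"{d_max:5.1f}-{d_min:4.1f} A  {n_o}/{n_p}  {pct:5.1f}% complete")
```

A table like this only describes how much of each shell was measured; as argued above, a low value in the outer shell is by itself no reason to discard the reflections that were measured there.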