[phenixbb] Adequate size for Free R test set?

Tue Aug 3 10:38:57 PDT 2010

  Hi Joe,

I think almost everyone has his or her own opinion on this. Here is what 
I think:

1) The test set should be such that each "relatively thin resolution 
shell" receives at least 50 reflections, and we found empirically that 
150 is "good enough" within the phenix.refine framework.
For the definition of a "relatively thin resolution shell", see:
Lunin & Skovoroda. Acta Cryst. (1995). A51, 880-887. "R-free 
likelihood-based estimates of errors for phases calculated from atomic 
models".

This basically defines how many test reflections you need.

2) It is customary to set aside either 5% or 10% of the reflections for 
the test set, up to a maximum of 2000 in total. These are all "magic 
numbers" that, I presume, more or less satisfy "1)", which is why they 
became widely used.

3) The presence of high-order NCS, and selecting free flags using the 
"thin shells" algorithm, is a different story (Acta Cryst. (2006). D62, 
227-238). It is good to do that because it removes the cross-talk 
between test and work reflections due to NCS, but at the same time it 
invalidates requirement "1)". So this is a gray area (for me at least).

4) Some people believe that the final refinement run should be done 
using all reflections, arguing that setting aside 5-10% of the 
reflections as a test set worsens the maps. There is some truth in this: 
removing data does worsen the maps, but:
a) it is noticeable (in the sense that it can reduce the 
interpretability of some parts of the map) only in extreme cases of 
rather low resolution or low completeness, b) in most other cases it is 
simply negligible, c) removing reflections randomly has a much smaller 
effect than removing them systematically (see page 40 here: 
http://www.phenix-online.org/presentations/latest/pavel_maps.pdf and 
some relevant references in the 2010 PHENIX paper in Acta D). However, if 
you do that "final run", you will invalidate the final refinement 
statistics, Rfree and Rwork, and the final structure obtained this way 
can no longer have an Rfree associated with it.

Pavel.

On 8/3/10 10:04 AM, Joseph Noel wrote:
> Hi Folks,
>
> It's been a while since I personally refined many structures. In the 
> past, I used as a default, 5% of my unique reflections for the Free R 
> test set. I have a high resolution structure with 150,000 unique 
> reflections and noticed that Phenix defaults are 5% or 2000 
> reflections, whichever is smaller. What is the current consensus on an 
> adequate number of unique reflections to use for cross-validation?
>
> Thanks!
> Joe
>
> P.S. I really, really love Phenix.
> ___________________________________________________________
> Joseph P. Noel, Ph.D.
> Investigator, Howard Hughes Medical Institute
> Professor, The Jack H. Skirball Center for Chemical Biology and Proteomics
> The Salk Institute for Biological Studies
> 10010 North Torrey Pines Road
> La Jolla, CA  92037 USA
>
> Phone: (858) 453-4100 extension 1442
> Cell: (858) 349-4700
> Fax: (858) 597-0855
> E-mail: noel at salk.edu
>
> Web Site (Salk): http://www.salk.edu/faculty/faculty_details.php?id=37
> Web Site (HHMI): http://hhmi.org/research/investigators/noel.html
> ___________________________________________________________
