[phenixbb] phenix.map_to_model input mtz file failure --caution on using map_to_model with X-ray data

Edward A. Berry BerryE at upstate.edu
Tue Jun 13 17:56:41 PDT 2017



On 06/13/2017 07:44 PM, Dale Tronrud wrote:
>
>     First we have to agree on exactly what we are talking about.  I
> presumed we were talking about real space refinement against an
> experimental map such as one gets in cryo-EM.  In that case there are no
> Fobs.  Reciprocal space is a fiction and should be avoided.
>
>     If you are working, instead, with a 2Fo-Fc map then just do
> reciprocal space refinement.  I don't know of any reason to do
> whole-molecule real space refinement when you are working with crystal
> diffraction data.  Reciprocal space is where the experiment lives and
> the analysis should be done there.
>
>     In cases of model building it is computationally quicker to do a
> local real space refinement to touch up a model just so you can see if
> it looks reasonable before going back into reciprocal space.  This real
> space refinement is quick-and-dirty and any flaws will be erased by the
> proper reciprocal space refinement that follows.

I have no argument with that! And I do realize how significant cryo-EM has
become to structural biology with the advent of direct electron detectors.
I was still stuck on the perhaps not so important theoretical question of
"to exclude or not to exclude" the free-R set in real-space refinement
prior to reciprocal space refinement.

>     As for "neighborhood correlation" I was thinking of cryo-EM maps.
> Since the individual measurements (pictures) are real space in nature, I
> can't imagine an experimental error in the voltage of one voxel wouldn't
> tend to show up similarly in its neighbors.  The whole group of voxels
> will be illuminated by electrons that all had very similar histories
> passing through the microscope.
>
>     We have similar situations with diffraction data.  A reflection whose
> neighbor is in a shadow has a much higher chance of being shadowed
> itself.  Our spots, however, are much further apart on the detector than
> the voxels of an EM image.
>
>     There is another type of correlation that is probably more important.
>   Our diffraction spots are separated enough that you cannot predict the
> intensity of a reflection based on its neighbors.  You can make a very
> good prediction of the darkness of a voxel based on its neighbors.  If
> you leave out one voxel, as a test set member, you could easily deduce
> its hidden value without even building a molecular model - just
> interpolate.  You can't do that with diffraction data.
>
>     This means if you want to leave out a chunk of map data for a test
> set you have to pull out a big enough piece (many contiguous voxels)
> that you can't deduce anything about their opaqueness from the remaining
> image.  To do this you have to know something about how the microscope
> works.
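
The interpolation argument can be sketched numerically. This is a minimal Python illustration, not anything taken from a real microscope or from Phenix: the 1-D "map" is a made-up sum of two Gaussians, and the names density, interpolate_single and interpolate_block are hypothetical. Hiding one finely sampled voxel and averaging its neighbours recovers it almost exactly; hiding a contiguous block wider than the feature does not.

```python
import math

def density(x):
    """Toy 1-D 'map': two Gaussian peaks (illustrative, not real data)."""
    return math.exp(-((x - 3.0) ** 2) / 0.5) + 0.7 * math.exp(-((x - 6.0) ** 2) / 0.5)

step = 0.05                                      # fine sampling: many voxels per peak
grid = [density(i * step) for i in range(200)]   # x from 0 to 9.95

def interpolate_single(values, i):
    """Guess a hidden voxel from its two immediate neighbours."""
    return 0.5 * (values[i - 1] + values[i + 1])

def interpolate_block(values, start, stop):
    """Guess a hidden block [start, stop) by a straight line between its edges."""
    left, right = values[start - 1], values[stop]
    n = stop - start + 1                         # intervals between the two edge voxels
    return [left + (right - left) * (k + 1) / n for k in range(stop - start)]

peak = 60                                        # voxel at x = 3.0, top of the first peak
single_error = abs(interpolate_single(grid, peak) - grid[peak])

# Hide 40 voxels (2.0 units) centred on the peak: wider than the peak itself.
start, stop = 40, 80
guess = interpolate_block(grid, start, stop)
block_error = max(abs(g - t) for g, t in zip(guess, grid[start:stop]))

print(f"single hidden voxel, error       = {single_error:.4f}")
print(f"hidden 40-voxel block, max error = {block_error:.4f}")
```

How large the block must be in a real map depends on the point-spread of the instrument, which this toy does not model.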
>
> Dale Tronrud
>
> On 6/13/2017 4:08 PM, Edward A. Berry wrote:
>>>   To unbias you would have to calculate a
>>> new map with current Fcalc's for every iteration of the model, but this
>>> method would not take into account the neighborhood correlation present
>>> in experimental maps.)
>>
>> Thanks, Dale,
>> Could you explain this "neighborhood correlation"?
>>
>> My very simple (maybe too simple) understanding of how real space would
>> bias reflections is as follows:
>>
>> You make a map using phases (and Fc?) from the current model and Fobs.
>> But you omit the free set.
>>
>> Now if you take the fourier transform of that unmodified map, you would
>> get back
>> exactly the coefficients you put in: 2Fo-Fc (?) for the working
>> reflections,
>> and zero for the free set.
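
The round trip described above can be checked on a toy 1-D case. This is a sketch under stated assumptions: the coefficients are invented numbers, the "free set" is simply h = +2 and -2, and synthesize/analyze are hypothetical helper names implementing a plain discrete Fourier sum (no crystallographic library is used).

```python
import cmath

def synthesize(coeffs, n_grid):
    """Fourier synthesis: map values on n_grid points from {h: F(h)}."""
    return [
        sum(F * cmath.exp(-2j * cmath.pi * h * x / n_grid) for h, F in coeffs.items())
        for x in range(n_grid)
    ]

def analyze(rho, h):
    """Fourier analysis: recover the coefficient F(h) from the sampled map."""
    n = len(rho)
    return sum(r * cmath.exp(2j * cmath.pi * h * x / n) for x, r in enumerate(rho)) / n

# Toy "2Fo-Fc" coefficients (made-up), with Friedel mates F(-h) = conj(F(h)).
half = {1: 10 + 2j, 2: 4 - 3j, 3: -1 + 6j, 4: 2 + 2j, 5: 0.5 - 1j}
coeffs = {}
for h, F in half.items():
    coeffs[h] = F
    coeffs[-h] = F.conjugate()

free = {2, -2}                                   # pretend test-set reflections
work_only = {h: (0j if h in free else F) for h, F in coeffs.items()}

rho = synthesize(work_only, n_grid=16)           # map made without the free set
recovered = {h: analyze(rho, h) for h in coeffs}

for h in sorted(recovered):
    tag = "(free)" if h in free else ""
    print(f"h={h:+d}  |F| in map = {abs(recovered[h]):.3f} {tag}")
```

The grid (16 points) is finer than twice the highest index, so each working coefficient comes back unchanged and the omitted terms come back as zero.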
>>
>> Then you make modifications to the model to make its density match as
>> nearly
>> as possible the density of the map. If you were able to make the density of
>> the model exactly match that of the map, then the Fc for the model would
>> be that of the map.
>>
>> Of course you can never make the density of the model exactly match that of
>> the map - modelization is the severest form of density modification.
>> But, to the extent that you make the model's density more nearly like that
>> of the map, the Fourier transform of the new model will be more
>> like that of the map.
>>
>> That means for the working reflections, Fc will get closer to 2Fo-Fc
>> which brings them closer to Fo; and the R-work improves.
>> (If there is error in the Fobs, that will be reflected in the map,
>> and the Fcalc of the model will tend toward these erroneous Fobs
>> (fitting the error) and Rwork will get better than it should (bias).)
>> Free reflections will move closer to zero, and most likely Rfree
>> will get worse.
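
The argument above can be followed with a few made-up amplitudes. This is only a caricature in Python: phases are ignored, the numbers are invented, and pull_toward is a hypothetical stand-in for "the model's Fc moves part of the way toward the map's coefficients" (a real refinement cannot move individual Fc values independently).

```python
def r_factor(f_obs, f_calc):
    """R = sum | Fo - Fc | / sum Fo, for positive amplitudes."""
    num = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    return num / sum(f_obs)

def pull_toward(f_calc, target, fraction):
    """Move each Fc a given fraction of the way toward a target amplitude."""
    return [fc + fraction * (t - fc) for fc, t in zip(f_calc, target)]

# Invented amplitudes for four working and two free reflections.
fo_work, fc_work = [100.0, 80.0, 60.0, 40.0], [90.0, 88.0, 54.0, 44.0]
fo_free, fc_free = [70.0, 50.0], [63.0, 55.0]

# Coefficients the map was built from (amplitudes only):
target_work = [2 * fo - fc for fo, fc in zip(fo_work, fc_work)]  # 2Fo-Fc
target_free = [0.0 for _ in fo_free]                             # omitted terms

fraction = 0.3  # assumed: how far real-space fitting moves Fc toward the map
new_fc_work = pull_toward(fc_work, target_work, fraction)
new_fc_free = pull_toward(fc_free, target_free, fraction)

r_work_before, r_work_after = r_factor(fo_work, fc_work), r_factor(fo_work, new_fc_work)
r_free_before, r_free_after = r_factor(fo_free, fc_free), r_factor(fo_free, new_fc_free)

print(f"Rwork: {r_work_before:.3f} -> {r_work_after:.3f}")
print(f"Rfree: {r_free_before:.3f} -> {r_free_after:.3f}")
```

Moving Fc toward 2Fo-Fc moves it toward Fo, so Rwork falls; moving the free Fc toward zero moves it away from Fo, so Rfree rises.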
>>
>> I think that's all consistent with what you wrote, but then
>> I had the impression that the bias could be prevented by making the
>> map with Fc for the test set (proposed in an old paper by Ivan
>> Rayment).  That way the free reflections follow the
>> process through their coupling to neighboring reflections in reciprocal
>> space (neighborhood correlation?), the same way they do in reciprocal
>> space refinement, rather than the Fobs being used. The information
>> in these free Fcalc is coming from the neighboring working reflections
>> due to redundancy of information in a finely sampled molecular transform.
>>
>> Ed
>>
>>
>>
>> On 06/13/2017 05:40 PM, Dale Tronrud wrote:
>>>
>>>
>>> On 6/13/2017 12:30 PM, Edward A. Berry wrote:
>>>> Thanks, Pavel,
>>>> I really appreciate your taking the time to generate the example.
>>>>
>>>> While I agree with Tim and Ian that refinement to convergence should
>>>> remove the bias making it perhaps not a serious problem, my question was
>>>> in fact whether there is any bias immediately after the refinement.
>>>>
>>>> I will need to study this example a bit, but one thing I notice is
>>>> that you are doing exactly what I was guessing, comparing Rfree
>>>> after real-space refinement with and without using the free set.
>>>> Then, I still think, we have to think about how much of that
>>>> difference results from bias towards the observed values (when the
>>>> reflections are included) and how much is from bias towards zero
>>>> (when the free set is excluded).
>>>>
>>>      Of course the model is refined as though the test set Fourier
>>> components were equal to zero.  In reciprocal space refinement when you
>>> leave a reflection out of the "sum over all reflections" when
>>> calculating the difference map you are saying that you have no opinion
>>> about the amplitude of that reflection.  When you calculate a real space
>>> map from Fourier coefficients you can't not have an opinion,  i.e. you
>>> can't leave a term out of the sum you can only set that term to zero.
>>> If your model produces a prediction for that term which is not equal to
>>> zero it will be penalized.  (If you set that term to Fcalc you tie your
>>> model to its starting point.  To unbias you would have to calculate a
>>> new map with current Fcalc's for every iteration of the model, but this
>>> method would not take into account the neighborhood correlation present
>>> in experimental maps.)
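
The three ways of handling a test-set term discussed in this thread can be written out side by side. This is a minimal Python sketch with hypothetical names (map_coefficients, free_mode); it uses plain 2Fo-Fc amplitudes without the m and D weights of a 2mFo-DFc map, and it is not a description of what phenix.remove_free_from_map does internally.

```python
def map_coefficients(f_obs, f_calc, free_flags, free_mode="zero"):
    """Build simple 2Fo-Fc amplitudes, one per reflection.

    free_mode="zero"    : test-set terms are set to 0 (the term cannot be left out)
    free_mode="fill_in" : test-set terms are replaced by the current Fc
    free_mode="keep"    : test-set terms are treated like working terms
    """
    if free_mode not in ("zero", "fill_in", "keep"):
        raise ValueError(f"unknown free_mode: {free_mode!r}")
    coeffs = []
    for fo, fc, is_free in zip(f_obs, f_calc, free_flags):
        if is_free and free_mode == "zero":
            coeffs.append(0.0)
        elif is_free and free_mode == "fill_in":
            coeffs.append(fc)
        else:
            coeffs.append(2.0 * fo - fc)
    return coeffs

# Invented amplitudes; the second reflection is in the test set.
f_obs = [100.0, 70.0, 60.0]
f_calc = [90.0, 63.0, 54.0]
flags = [False, True, False]

for mode in ("zero", "fill_in", "keep"):
    print(mode, map_coefficients(f_obs, f_calc, flags, free_mode=mode))
```

With "fill_in" the coefficients depend on the model that supplied Fc, so they would have to be rebuilt with the current Fc at every iteration to avoid tying the model to its starting point.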
>>>
>>>      What this means is that Rfree is not a meaningful stat for assessing
>>> overfitting of real space refinement.  This is hardly a surprise.  A
>>> test of a refinement protocol has to be based on the mathematics of that
>>> protocol, not the protocol you happened to have used yesterday.  If you
>>> want an unbiased estimate of the quality of a real space refinement you
>>> have to leave out a region of the map and then see how well the model
>>> fits that region.  This is harder to do in an automated fashion and
>>> there will be a lot of caveats about your results (e.g. you know about
>>> the ability to fit one region but does that generalize to other areas?).
>>>    If you recall there are a lot of caveats about Rfree too - we have just
>>> stopped worrying about them.  (e.g. low resolution vs. high resolution
>>> reflections, choosing based on shells or randomly, what to do about
>>> ncs...)
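
The scoring step of such a held-out-region test might look like the following Python sketch. Everything here is assumed for illustration: the 1-D "observed" and "model" maps are invented, the helper names are hypothetical, and no refinement is performed (in a real test the model would be fitted using only the remainder of the map before scoring).

```python
import math

def correlation(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def split_region(values, start, stop):
    """Return (held_out, remainder) for a contiguous block [start, stop)."""
    return values[start:stop], values[:start] + values[stop:]

def peak(x, centre, height=1.0, width=0.5):
    return height * math.exp(-((x - centre) ** 2) / width)

xs = [i * 0.1 for i in range(100)]
# "Observed" map: two peaks plus a small deterministic ripple standing in for noise.
observed = [peak(x, 3.0) + peak(x, 7.0, 0.8) + 0.03 * math.sin(9.0 * x) for x in xs]
# "Model" map: same peaks, the second one slightly misplaced.
model = [peak(x, 3.0) + peak(x, 7.3, 0.8) for x in xs]

held_obs, rest_obs = split_region(observed, 55, 85)   # block around the second peak
held_mod, rest_mod = split_region(model, 55, 85)

cc_rest = correlation(rest_obs, rest_mod)
cc_held = correlation(held_obs, held_mod)
print(f"CC outside held-out block: {cc_rest:.3f}")
print(f"CC inside held-out block : {cc_held:.3f}")
```

In this toy case the misplaced peak lies inside the held-out block, so the correlation there is lower than in the rest of the map; whether one block is representative of other regions is exactly the caveat raised above.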
>>>
>>>      I think you should consider yourself on the wrong track if you come
>>> up with a statistical test, but haven't given any thought to the actual
>>> experiment that produced your map.
>>>
>>> Dale Tronrud
>>>
>>>> Things I need to look at-
>>>> What are R and R-free for the original refined model
>>>> What are R and R-free after shaking (did RSR lower R but not Rfree, or
>>>> did it raise Rfree)?
>>>> What if RSR is done using a map made with fill-in strategy?
>>>>
>>>> Ed
>>>>
>>>> On 06/13/2017 02:15 PM, Pavel Afonine wrote:
>>>>> Hi Ed,
>>>>>
>>>>> Including free-r reflections into map calculation and then using such
>>>>> map in real-space refinement of entire model will affect Rfree. Here
>>>>> is a simple example that illustrates my statement, step-by-step:
>>>>>
>>>>> 1) Get data and model from PDB:
>>>>>
>>>>> phenix.fetch_pdb 1f8t --mtz
>>>>>
>>>>> 2) Compute two 2mFo-DFc maps: one includes all reflections the other
>>>>> one has no free-r terms:
>>>>>
>>>>> phenix.python run.py 1f8t.{pdb,mtz}
>>>>>
>>>>> This will create an MTZ file (map_coeffs.mtz) that contains Fourier
>>>>> map coefficients for both maps.
>>>>>
>>>>> 3) Shake model a bit:
>>>>>
>>>>> phenix.dynamics 1f8t.pdb number_of_steps=500
>>>>>
>>>>> 4) Run real-space refinement using two maps:
>>>>>
>>>>> phenix.real_space_refine map_coeffs.mtz 1f8t_shaken.pdb
>>>>> label="work,PHIwork" ncs_constraints=false output.file_name_prefix=work
>>>>>
>>>>> phenix.real_space_refine map_coeffs.mtz 1f8t_shaken.pdb
>>>>> label="all,PHIall" ncs_constraints=false output.file_name_prefix=all
>>>>>
>>>>> 5) Compute R-factors using data and real-space refined models:
>>>>>
>>>>> phenix.model_vs_data 1f8t.mtz all_real_space_refined.pdb
>>>>>        r_work(re-computed)                : 0.2419
>>>>>        r_free(re-computed)                : 0.2441
>>>>>
>>>>> phenix.model_vs_data 1f8t.mtz work_real_space_refined.pdb
>>>>>        r_work(re-computed)                : 0.2444
>>>>>        r_free(re-computed)                : 0.2756
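
The numbers reported in step 5 can be compared directly. The short Python snippet below only does arithmetic on the four values quoted above (the dictionary keys are labels chosen here for readability).

```python
# R-factors copied from the phenix.model_vs_data output quoted above.
results = {
    "all reflections in map": {"r_work": 0.2419, "r_free": 0.2441},
    "free set omitted from map": {"r_work": 0.2444, "r_free": 0.2756},
}

gaps = {}
for label, r in results.items():
    gaps[label] = r["r_free"] - r["r_work"]
    print(f"{label:28s} Rfree - Rwork = {gaps[label]:.4f}")

delta_r_work = (results["free set omitted from map"]["r_work"]
                - results["all reflections in map"]["r_work"])
delta_r_free = (results["free set omitted from map"]["r_free"]
                - results["all reflections in map"]["r_free"])
print(f"change in Rwork when free set is omitted: {delta_r_work:+.4f}")
print(f"change in Rfree when free set is omitted: {delta_r_free:+.4f}")
```

Rwork is nearly the same in both runs (about 0.0025 apart), while Rfree differs by about 0.0315. The arithmetic alone does not say how much of that difference is bias toward the observed values and how much is bias toward zero.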
>>>>>
>>>>> The result is self-explanatory and is in line with Tom's reply to Wei.
>>>>>
>>>>> All files necessary to reproduce calculations above are here:
>>>>> http://cci.lbl.gov/~afonine/tmp/
>>>>>
>>>>> All the best,
>>>>> Pavel
>>>>>
>>>>>
>>>>> On 6/8/17 10:05, Tim Gruene wrote:
>>>>>> Hi Ed,
>>>>>>
>>>>>> including the 'free' reflections in the map for modelling does not
>>>>>> taint the value of Rfree. That is a misconception that is very
>>>>>> persistent (as prejudices usually are). I believe it was Ian Tickle
>>>>>> who formulated that when you simply refine long enough towards
>>>>>> convergence, all reflections excluded from refinement will become
>>>>>> independent, i.e. you can assign a new set for Rfree every time
>>>>>> you refine, if you wish so.
>>>>>>
>>>>>> This concept is the reason why Rcomplete (the "better" equivalent to
>>>>>> Rfree for small data sets with < 10,000 unique reflections),
>>>>>> introduced by Axel Brunger, works, as we could demonstrate in
>>>>>> doi: 10.1073/pnas.1502136112
>>>>>>
>>>>>> So nothing to worry about when including all reflections in map
>>>>>> calculations.
>>>>>>
>>>>>> Cheers,
>>>>>> Tim
>>>>>>
>>>>>> On Thursday, June 8, 2017 12:42:53 PM CEST Edward A. Berry wrote:
>>>>>>> Hi, Tom,
>>>>>>> Please forgive what may be a silly question from an outsider who
>>>>>>> hasn't really kept up with the crystallography literature or even
>>>>>>> all the Phenix newsletters: What is the evidence that including the
>>>>>>> free set in real space refinement biases R-free of the resulting
>>>>>>> model? Is this Rfree also biased when map coefficients use "fill-in"
>>>>>>> for the excluded free reflections (and is that what
>>>>>>> phenix.remove_free_from_map does?).
>>>>>>>
>>>>>>> My point is that literally excluding the free reflections, as
>>>>>>> opposed to substituting their values with Fc, will bias the free set
>>>>>>> toward grossly incorrect values (namely zero) and therefore greatly
>>>>>>> worsen R-free. Thus if the evidence for bias is that you get worse
>>>>>>> R-free when you exclude the free set, you have to think about how
>>>>>>> much of that difference results from bias towards the observed
>>>>>>> values (when the reflections are included) and how much is from bias
>>>>>>> towards zero (when the free set is excluded).
>>>>>>> (Again, I realize this may be all very well understood by the
>>>>>>> crystallography community and properly taken care of in phenix; I'm
>>>>>>> just asking for my own information) eab
>>>>>>>
>>>>>>> On 06/08/2017 07:28 AM, Terwilliger, Thomas Charles wrote:
>>>>>>>> Hi Wei,
>>>>>>>>
>>>>>>>>
>>>>>>>> I want to give a word of caution about how to use
>>>>>>>> phenix.map_to_model on crystallographic data. The bottom line is
>>>>>>>> you should remove the test set from your map coefficients before
>>>>>>>> running phenix.map_to_model on X-ray data.  Here is why:
>>>>>>>>
>>>>>>>>
>>>>>>>> phenix.map_to_model uses real-space refinement, which is refinement
>>>>>>>> against the map. If you supply map coefficients that include your
>>>>>>>> test reflections, then you will be refining against data that is in
>>>>>>>> your test set.   This will make your Rfree invalid when you go back
>>>>>>>> and refine your model against the original crystallographic data.
>>>>>>>>
>>>>>>>>
>>>>>>>> To remove the test set from your map coefficients you can use:
>>>>>>>>
>>>>>>>>
>>>>>>>> phenix.remove_free_from_map  map_coeffs=my_map_coeffs.mtz
>>>>>>>> free_in=my_data_file_with_freeR_flags.mtz
>>>>>>>> mtz_out=my_map_coeffs_no_free.mtz
>>>>>>>>
>>>>>>>>
>>>>>>>> Also note that phenix.map_to_model uses a fixed map (it does not do
>>>>>>>> density modification).  Consequently for most crystallographic
>>>>>>>> data at moderate resolution or higher phenix.autobuild is going to
>>>>>>>> do much better than phenix.map_to_model.
>>>>>>>>
>>>>>>>>
>>>>>>>> All the best,
>>>>>>>>
>>>>>>>> Tom T
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> *From:* dingding830106 at 163.com on behalf of dancingdream at 163.com
>>>>>>>> *Sent:* Tuesday, June 6, 2017 9:16 PM
>>>>>>>> *To:* Terwilliger, Thomas Charles
>>>>>>>> *Cc:* phenixbb at phenix-online.org
>>>>>>>> *Subject:* Re:Re: [phenixbb] phenix.map_to_model input mtz file failure
>>>>>>>> Dear Thomas,
>>>>>>>> I used CAD to convert the labels from FDM->FWT, PHIDM->PHFWT, then
>>>>>>>> submitted this job again (without map_coeffs_labels=... ), and
>>>>>>>> everything seems OK.
>>>>>>>> Thank you very much for your help.
>>>>>>>> Best!
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Wei Ding
>>>>>>>> P.O.Box 603
>>>>>>>> The Institute of Physics,Chinese Academy of Sciences
>>>>>>>> Beijing,China
>>>>>>>> 100190
>>>>>>>> Tel: +86-10-82649083
>>>>>>>>
>>>>>>>> E-mail:dingwei at iphy.ac.cn  <mailto:wangli at moon.ibp.ac.cn>
>>>>>>>>
>>>>>>>> At 2017-06-07 10:32:14, "Terwilliger, Thomas Charles"
>>>>>>>> <terwilliger at lanl.gov>  wrote:
>>>>>>>>        Hi Wei,
>>>>>>>>
>>>>>>>>
>>>>>>>>        I'm sorry for the trouble!
>>>>>>>>
>>>>>>>>
>>>>>>>>        If you supply an MTZ file that has FWT,PHFWT or similar
>>>>>>>>        labels, then you can skip the "labels=...." statement and it
>>>>>>>>        should run.
>>>>>>>>
>>>>>>>>
>>>>>>>>        Let me know if that does not work!
>>>>>>>>        All the best,
>>>>>>>>
>>>>>>>>        Tom T
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ----------------------------------------------------------------------
>>>>>>>>
>>>>>>>>        *From:* phenixbb-bounces at phenix-online.org on behalf of
>>>>>>>>        dancingdream at 163.com
>>>>>>>>        *Sent:* Tuesday, June 6, 2017 8:19 PM
>>>>>>>>        *To:* phenixbb at phenix-online.org
>>>>>>>>        *Subject:* [phenixbb] phenix.map_to_model input mtz file failure
>>>>>>>>        Dear Phenix bb,
>>>>>>>>        I intend to build an initial model with phenix.map_to_model.
>>>>>>>>        The command line is as follows:
>>>>>>>>        phenix.map_to_model_1.12rc0-2787
>>>>>>>>        map_coeffs_file=../rep_dm.mtz
>>>>>>>>        map_coeffs_labels="'FP,SIGFP' 'PHIDM' 'FOMDM'"
>>>>>>>>        seq_file=../resolve.seq  is_crystal=True
>>>>>>>>        use_sg_symmetry=True  density_select=False
>>>>>>>>        truncate_at_d_min=True
>>>>>>>>        and the feedback is:
>>>>>>>>        Sorry: No initial assignment made for map_coeffs. Labels used:
>>>>>>>>        FP,SIGFP PHIDM FOMDM. Available labels: ['PHIB', 'FOM',
>>>>>>>>        'HLA,HLB,HLC,HLD', 'FP,SIGFP', 'PHIDM', 'FOMDM', 'FDM',
>>>>>>>>        'HLADM,HLBDM,HLCDM,HLDDM'] NOTE: grouped labels like
>>>>>>>>        'FP,SIGFP' must stay together,
>>>>>>>>        have commas, and have no spaces. If they come from an MTZ file,
>>>>>>>>        they must be in adjacent columns as well.
>>>>>>>>        Suggested labels to use:  PHIDM  FOMDM
>>>>>>>>        I tried many other input formats for map_coeffs_labels, such as
>>>>>>>>        map_coeffs_labels="FP,SIGFP PHIDM FOMDM"
>>>>>>>>        map_coeffs_labels=["FP,SIGFP PHIDM FOMDM"]
>>>>>>>>        ... ...
>>>>>>>>        but the result is the same. Can anyone tell me how to fix this
>>>>>>>>        problem? Thanks a lot.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>        --
>>>>>>>>        Wei Ding
>>>>>>>>        P.O.Box 603
>>>>>>>>        The Institute of Physics,Chinese Academy of Sciences
>>>>>>>>        Beijing,China
>>>>>>>>        100190
>>>>>>>>        Tel: +86-10-82649083
>>>>>>>>        E-mail:dingwei at iphy.ac.cn  <mailto:wangli at moon.ibp.ac.cn>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> phenixbb mailing list
>>>>>>>> phenixbb at phenix-online.org
>>>>>>>> http://phenix-online.org/mailman/listinfo/phenixbb
>>>>>>>> Unsubscribe:phenixbb-leave at phenix-online.org

