[phenixbb] geometry_minimization makes molprobity score worse

Pavel Afonine pafonine at lbl.gov
Tue Jul 13 14:10:12 PDT 2021


Hi James,

in this numeric exercise we did pure geometry minimization (no data 
involved), so the optimized models reflect restraints only. Here the 
more information you add (like SS restraints or H or...) on top of 
standard set of simple restraints the better model you get.

In high-resolution refinement data should be sufficient to keep your 
model good. But at lower resolutions yes we need to bring more 
information and this is when we recommend using all these extra restraints.

Why it's not the default? At least three reasons that I can think of:

1) You don't always need it. For example, if data are sufficiently good 
(high resolution, for example). In this case simple set of restraints 
proves to be just enough.

2) These restraints are tricky to set up 100% correctly fully 
automatically. For example, secondary structure annotation relies on the 
quality of input model geometry. Depending on model quality, the 
procedure can potentially miss some H-bonds or create spurious ones. 
This is why we always recommend to hand verify and curate (if needed) 
automatically found secondary structure annotation before using it: you 
don't want to blindly enforce wrong restraints!

3) Validation tools like Ramachandran plot loose their validation value 
if used as restraint target. You only want to use it if nothing else 
helps to keep your model geometry meaningful. This, however, is now less 
of a problem given that we can use Rama-Z, which is not used as restraints.

Pavel

On 7/13/21 13:51, James Holton wrote:
> Thanks Pavel, this is great!
>
> I never would have thought to use SS restraints on such a 
> high-resolution structure.  But, if that turns vdw repulsions into 
> hydrogen bonds, then I agree that would be the only way to go.
>
> If you don't mind my asking, why is that not the default?
>
>
> On 7/13/2021 12:04 PM, Pavel Afonine wrote:
>> P.S.:
>>
>> and if on top of using SS and Ramachandran plot restraints I bump up 
>> the nonbonded weight by a bit (150), then I get
>>
>> ---------------------------------------
>>   Deviations from Ideal Values.
>>     Bond      :  0.003   0.012    519
>>     Angle     :  0.776   3.318    702
>>
>>   Molprobity Statistics.
>>     All-atom Clashscore : 0.00
>>     Ramachandran Plot:
>>       Outliers :  0.00 %
>>       Allowed  :  0.00 %
>>       Favored  : 100.00 %
>>     Rotamer Outliers :  0.00 %
>>     MolProbity score: 0.5
>>
>>   Rama-Z: 1.08
>> ---------------------------------------
>>
>> Pavel
>>
>> On 7/13/21 11:44, Pavel Afonine wrote:
>>> James,
>>>
>>> okay here comes a more detailed analysis...
>>>
>>> This is the statistics for monomer.pdb:
>>>
>>> ---------------------------------------
>>> Deviations from Ideal Values.
>>>   Bond      :  0.014   0.037    519
>>>   Angle     :  1.622   6.565    702
>>>
>>> Molprobity Statistics.
>>>   All-atom Clashscore : 0.00
>>>   Ramachandran Plot:
>>>     Outliers :  0.00 %
>>>     Allowed  :  1.61 %
>>>     Favored  : 98.39 %
>>>   Rotamer Outliers :  1.85 %
>>>   MolProbity score: 0.70
>>>
>>> Rama-Z: -0.32
>>> ---------------------------------------
>>>
>>> and here it is for monomer_minimized.pdb:
>>>
>>> ---------------------------------------
>>> Deviations from Ideal Values.
>>>   Bond      :  0.001   0.010    519
>>>   Angle     :  0.361   3.089    702
>>>
>>> Molprobity Statistics.
>>>   All-atom Clashscore : 3.12
>>>   Ramachandran Plot:
>>>     Outliers :  1.61 %
>>>     Allowed  :  9.68 %
>>>     Favored  : 88.71 %
>>>   Rotamer Outliers :  1.85 %
>>>   MolProbity score: 1.89
>>>
>>> Rama-Z: -3.20
>>> ---------------------------------------
>>>
>>> What I see is totally unsurprising to me. The model does not have 
>>> hydrogens and restraint only have repulsions as NCI term. 
>>> All-together, these two facts (no H and poor NCI terms) result in 
>>> deterioration of secondary structure (and other H-bonds), poorer 
>>> Ramachandran plot and slightly larger clashscore. Since Molprobity 
>>> score is based on accumulation of clashes, Rama plot stats, rotamers 
>>> etc -- no wonder it got worse in this case!
>>>
>>> As a side remark, I don't know if clashscore 0 is better or worse 
>>> than 3.12. Generally, zero clashscore is not the goal. Clashscore 
>>> calculation is agnostic to genuine NCI and various non-standard 
>>> covalent links. For example, if you have something covalently 
>>> attached to your aa residue, clashscore calculation will regard it 
>>> as a terrible clash, which in reality isn't.
>>>
>>> Now, back to your example. Adding H to monomer.pdb and running 
>>> minimization gives me:
>>>
>>> ---------------------------------------
>>> Deviations from Ideal Values.
>>>   Bond      :  0.001   0.008    519
>>>   Angle     :  0.340   3.422    702
>>>
>>> Molprobity Statistics.
>>>   All-atom Clashscore : 0.00
>>>   Ramachandran Plot:
>>>     Outliers :  1.61 %
>>>     Allowed  : 14.52 %
>>>     Favored  : 83.87 %
>>>   Rotamer Outliers :  0.00 %
>>>   MolProbity score: 1.18
>>>
>>> Rama-Z: -4.59
>>> ---------------------------------------
>>>
>>> Expectedly, adding H reduced clashscore, eliminated rotamer 
>>> outliers, but deteriorated secondary structure even more as 
>>> evidenced by Rama favored and terrible Rama-Z. Turns out for the 
>>> Molprobity score the improvements in clashscore and rotamers 
>>> outweighed worsening of Ramachandran stats, and the Molprobity score 
>>> becames better as result.
>>>
>>> Now, if I re-ran minimization of model with H but this time using 
>>> secondary structure restraints (which are essentially H-bond 
>>> restraints for SS elements) and Ramachandran plot restraints, here 
>>> is what I get:
>>>
>>>
>>> ---------------------------------------
>>> Deviations from Ideal Values.
>>>   Bond      :  0.001   0.009    519
>>>   Angle     :  0.340   3.267    702
>>>
>>> Molprobity Statistics.
>>>   All-atom Clashscore : 1.04
>>>   Ramachandran Plot:
>>>     Outliers :  0.00 %
>>>     Allowed  :  0.00 %
>>>     Favored  : 100.00 %
>>>   Rotamer Outliers :  0.00 %
>>>   MolProbity score: 0.80
>>>
>>> Rama-Z: 0.90
>>> ---------------------------------------
>>>
>>> Now everything is 'perfect', and Molprobity score is the same as for 
>>> your starting model. Can you achieve the same stats with other 
>>> packages? I'm *not* saying the model with this statistics is 
>>> realistic or any better than your initial model (remember, outliers 
>>> that are supported by the data are expected!) but this result shows 
>>> that underlying algorithms work as expected.
>>>
>>> So far all makes sense to me...
>>>
>>> Pavel
>>>
>>> On 7/12/21 17:55, James Holton wrote:
>>>> Thank you Dale for the thoughtful explanation.
>>>>
>>>> I understand and agree that the "ideal" geometry is just an 
>>>> approximation.  But is our approximation so bad that that it is 
>>>> impossible to construct a model that doesn't have "problems"? As 
>>>> in: do we always have to decide between strained bonds and angles 
>>>> vs bad clashes and torsional outliers?  Even in the absence of an 
>>>> x-ray term?
>>>>
>>>> I don't believe that is so.  My evidence to support this is that I 
>>>> can get pretty close to perfect molprobity scores using amber 
>>>> (imin=1), refmac (refi type ideal), and pdb2ins/shelxl (WGHT 0.1 0 
>>>> 0 0 0 0), but not with phenix.geometry_minimization. I thought 
>>>> perhaps the phenix developers might find that interesting?  I 
>>>> certainly do.
>>>>
>>>> I do get a better clash score if I provide a model with 
>>>> phenix.reduce hydrogens, but I always get Rama outliers. CDL or no 
>>>> CDL. And, of course, if I turn on Rama restraints I always get 
>>>> clashes, no matter what nonbonded_weight I use.
>>>>
>>>> 1aho is a small, 1.0 A structure so it is not hard to see what is 
>>>> specifically happening.  The one Rama outlier is Gly17. After amber 
>>>> minimization it is on the edge of the "favored" region, but after 
>>>> geometry_minimization it is pushed into outlier territory. The 
>>>> other thing bumping up the Molprobity score is a steric clash 
>>>> between OG:Ser40 and O:Gly43.  Yes, there is a hydrogen between 
>>>> them, but the O--O distance goes from 2.61 to 2.33 A, which is 
>>>> pretty tight for an H-bond. I realize that in common practice a 
>>>> single outlier is considered "OK", but in reality they are 
>>>> physically unreasonable.
>>>>
>>>> I bring up my "whirlygig hypothesis" because the contour length of 
>>>> the main chain is the only aspect of macromolecular models that is 
>>>> sensitive to small "errors" in the geometry. Or, at least, errors 
>>>> that are not counter-balanced by other errors. Yes, the difference 
>>>> in the ideal (aka "goal") CA-CA distance in phenix is only 2% 
>>>> shorter than it is in the other three packages, but a 2% stretch of 
>>>> a 100 aa chain requires ~4 kcal/mol.  This is significantly more 
>>>> than any conventional dihedral and/or anti-bumping restraints, 
>>>> which is where I think the energy is going.
>>>>
>>>> Or do you think something else is going on?
>>>>
>>>> Thank you for taking the time to consider this,
>>>>
>>>> -James
>>>>
>>>>
>>>> On 7/9/2021 2:12 PM, Dale Tronrud wrote:
>>>>> Hi,
>>>>>
>>>>>    I thought I'd toss in an observation that might clarify some of 
>>>>> your problems.  As Pavel noted earlier, the uncertainty in the 
>>>>> "ideal" values in libraries based on high resolution crystal 
>>>>> structures are about 0.02 A for most bonds and one to two degrees 
>>>>> for bond angles (with a higher number for the NCaC angle).  The 
>>>>> origin of these uncertainties is not "error" but the actual 
>>>>> variability of atomic arrangements caused by their environment. 
>>>>> The variability these "uncertainties" represent are, in fact, real.
>>>>>
>>>>>    Not only are the angles and bonds in individual molecules 
>>>>> perturbed by their surroundings, but the distortions of one angle 
>>>>> will, systematically, be connected to distortions of other 
>>>>> angles.  For example, when the NCaC angle is larger than average 
>>>>> both the NCaCb and CbCaC angles will tend to be smaller.  This 
>>>>> particular relationship is ignored in most libraries of standard 
>>>>> geometry but is implicit in the CDLs that Andy Karplus has produced.
>>>>>
>>>>>    What this fact indicates to me is that the standard libraries 
>>>>> describe an ensemble of molecular models that all are consistent 
>>>>> with the library, and long as their deviations from the mean are 
>>>>> within the uncertainty.  This means that there is nothing at all 
>>>>> special about the conformation that has average bond length and 
>>>>> angles and makes me question to utility of creating an "idealized" 
>>>>> model.  There really is no evidence that a significant number of 
>>>>> molecules actually exist with that set of lengths and angles.  In 
>>>>> fact, since we don't really understand all the interrelationships 
>>>>> between the lengths and angles, there is a good chance that a 
>>>>> molecule with all "perfect" angles and lengths is impossible! 
>>>>> There may be a conflict which we identify as a Molprobaty clash 
>>>>> which forbids that arrangement of atoms despite them being within 
>>>>> the simple-minded ensemble limits of a library w/o 
>>>>> interrelationships.
>>>>>
>>>>>    In the absence of any other information you may decide that an 
>>>>> "idealized" set of coordinates is the safest statement you can 
>>>>> make about your molecule's structure but this limitation of 
>>>>> traditional geometry libraries will mean that you have little to 
>>>>> no confidence that any molecule will actually have that 
>>>>> structure.  It is kind of like the expectation value of a 
>>>>> structure factor whose magnitude you know but you have no idea of 
>>>>> the phase. There is a center to that distribution, but that 
>>>>> doesn't mean that the probability at that center is non-zero.
>>>>>
>>>>> Dale Tronrud
>>>>>
>>>>>
>>>>>
>>>>> On 7/8/2021 10:24 AM, James Holton wrote:
>>>>>> Thank you Pavel for your prompt response!
>>>>>>
>>>>>> I agree with everything you wrote below, and that is a good point 
>>>>>> about 2nd derivatives.
>>>>>>
>>>>>> However, what I'm seeing is the opposite of what you might 
>>>>>> predict. See below.
>>>>>>
>>>>>> On 7/7/2021 11:27 PM, Pavel Afonine wrote:
>>>>>>> Hi James,
>>>>>>>
>>>>>>> thanks for email and sharing your observations!
>>>>>>>
>>>>>>>> Greetings all, and I hope this little observation helps improve 
>>>>>>>> things somehow.
>>>>>>>>
>>>>>>>> I did not expect this result, but there it is. My MolProbity 
>>>>>>>> score goes from 0.7 to 1.9 after a run of 
>>>>>>>> phenix.geometry_minimization
>>>>>>>>
>>>>>>>> I started with an AMBER-minimized model (based on 1aho), and 
>>>>>>>> that got me my best MolProbity score so far (0.7). But, even 
>>>>>>>> with hydrogens and waters removed the geometry_minimization run 
>>>>>>>> increases the clashscore from 0 to 3.1 and Ramachandran favored 
>>>>>>>> drops from 98% to 88% with one residue reaching the outlier level.
>>>>>>>
>>>>>>> It is not a secret that 'standard geometry restraints' used in 
>>>>>>> Phenix and alike (read Refmac, etc) are very simplistic. They 
>>>>>>> are not aware of main chain preferential conformations 
>>>>>>> (Ramachandran plot), favorable side chain rotamer conformations. 
>>>>>>> They don't even have any electrostatic/attraction terms -- only 
>>>>>>> anti-bumping repulsion! Standard geometry restraints won't like 
>>>>>>> any NCI (non-covalent interaction) and likely will make 
>>>>>>> interacting atoms break apart rather than stay close together 
>>>>>>> interacting.
>>>>>>
>>>>>> Yes, there's the rub: I'm not seeing "interacting atoms break 
>>>>>> apart", but rather they are being smashed together. Torsion 
>>>>>> angles are also being twisted out of allowed regions of the 
>>>>>> Ramachandran plot.
>>>>>>
>>>>>> All this with the x-ray term turned off!
>>>>>>
>>>>>>> With this in mind any high quality (high-resolution) atomic 
>>>>>>> model or the one optimized using sufficiently high-level QM is 
>>>>>>> going to have a more realistic geometry than the result of 
>>>>>>> geometry regularization against very simplistic restraints 
>>>>>>> target. An example:
>>>>>>>
>>>>>>> https://journals.iucr.org/d/issues/2020/12/00/lp5048/lp5048.pdf
>>>>>>>
>>>>>>> and previous papers on the topic.
>>>>>>
>>>>>> I agree, but what doesn't make sense to me is how the "simplistic 
>>>>>> restraints" of phenix.geometry_minimization would be so 
>>>>>> inconsistent with the "simplistic restraints" in phenix.molprobity ?
>>>>>>
>>>>>> What I am doing here is starting with an energy-minimized model 
>>>>>> of a 1.0 A structure (1aho). It's not a fancy QM, just the ff14SB 
>>>>>> potential in AMBER.  I get my best molprobity scores this way, 
>>>>>> but I need an x-ray refinement program like phenix.refine to 
>>>>>> compare these models with reality.  It troubles me that the 
>>>>>> "geometry" in the x-ray refinement program all by itself messes 
>>>>>> up my molprobity score.
>>>>>>
>>>>>>>
>>>>>>>> Just for comparison, with refmac5 in "refi type ideal" mode I 
>>>>>>>> see the MolProbity rise to 1.13, but Clashscore remains zero, 
>>>>>>>> some Ramas go from favored to allowed, but none rise to the 
>>>>>>>> level of outliers.
>>>>>>>
>>>>>>> I believe this is because of the nature of minimizer used. 
>>>>>>> Refmac uses 2nd derivative based one, which in a nutshell means 
>>>>>>> it can move the model much less (just a bit in vicinity of a 
>>>>>>> local minimum) than any program that uses gradients only (like 
>>>>>>> Phenix).
>>>>>> good point.
>>>>>>
>>>>>> So, what should I do to stabilize phenix.geometry_minimization? 
>>>>>> Crank up the non-bonded weight? Restrain to starting coordinates?
>>>>>>
>>>>>>>> Files and logs here:
>>>>>>>> https://bl831.als.lbl.gov/~jamesh/bugreports/phenixmin_070721.tgz
>>>>>>>>
>>>>>>>> I suspect this might have something to do with library values 
>>>>>>>> for main-chain bonds and angles? They do seem to vary between 
>>>>>>>> programs. Phenix having the shortest CA-CA distance by up to 
>>>>>>>> 0.08 A. After running thorough minimization on a poly-A peptide 
>>>>>>>> I get:
>>>>>>>> bond   amber   refmac  phenix  shelxl Stryer
>>>>>>>>  C-N   1.330   1.339   1.331   1.325     1.32
>>>>>>>>  N-CA  1.462   1.482   1.455   1.454     1.47
>>>>>>>> CA-C   1.542   1.534   1.521   1.546     1.53
>>>>>>>> CA-CA  3.862   3.874 *3.794* 3.854
>>>>>>>>
>>>>>>>> So, which one is "right" ?
>>>>>>>
>>>>>>> I'd say they are all the same, within their 'sigmas' which are 
>>>>>>> from memory about 0.02A:
>>>>>> I note that 3.874 - 3.794 = 0.08 > 0.02
>>>>>>
>>>>>> This brings me to my pet theory.  I think what is going on is 
>>>>>> small errors like this build up a considerable amount of tension 
>>>>>> in the long main chain. For this 64-mer, the contour length of 
>>>>>> the main chain after idealization is ~5 A shorter after 
>>>>>> phenix.geometry_minimization than it is after shelxl or amber. 
>>>>>> That 5 A has to come from somewhere.  Without stretching bonds or 
>>>>>> bending angles the only thing left to do is twisting torsions. A 
>>>>>> kind of "whirlygig" effect.
>>>>>>
>>>>>> The question is: is the phenix CA-CA distance too short? Or is 
>>>>>> the amber CA-CA distance too long?
>>>>>>
>>>>>> Shall we vote?
>>>>>>
>>>>>> -James
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> phenixbb mailing list
>>>>>> phenixbb at phenix-online.org
>>>>>> http://phenix-online.org/mailman/listinfo/phenixbb
>>>>>> Unsubscribe: phenixbb-leave at phenix-online.org
>>>>>>
>>>>
>


More information about the phenixbb mailing list