[phenixbb] geometry_minimization makes molprobity score worse

Tue Jul 13 13:51:29 PDT 2021

Thanks Pavel, this is great!

I never would have thought to use SS restraints on such a 
high-resolution structure.  But, if that turns vdw repulsions into 
hydrogen bonds, then I agree that would be the only way to go.

If you don't mind my asking, why is that not the default?

On 7/13/2021 12:04 PM, Pavel Afonine wrote:
> P.S.:
>
> and if on top of using SS and Ramachandran plot restraints I bump up 
> the nonbonded weight by a bit (150), then I get
>
> ---------------------------------------
>   Deviations from Ideal Values.
>     Bond      :  0.003   0.012    519
>     Angle     :  0.776   3.318    702
>
>   Molprobity Statistics.
>     All-atom Clashscore : 0.00
>     Ramachandran Plot:
>       Outliers :  0.00 %
>       Allowed  :  0.00 %
>       Favored  : 100.00 %
>     Rotamer Outliers :  0.00 %
>     MolProbity score: 0.5
>
>   Rama-Z: 1.08
> ---------------------------------------
>
> Pavel
>
> On 7/13/21 11:44, Pavel Afonine wrote:
>> James,
>>
>> okay here comes a more detailed analysis...
>>
>> This is the statistics for monomer.pdb:
>>
>> ---------------------------------------
>> Deviations from Ideal Values.
>>   Bond      :  0.014   0.037    519
>>   Angle     :  1.622   6.565    702
>>
>> Molprobity Statistics.
>>   All-atom Clashscore : 0.00
>>   Ramachandran Plot:
>>     Outliers :  0.00 %
>>     Allowed  :  1.61 %
>>     Favored  : 98.39 %
>>   Rotamer Outliers :  1.85 %
>>   MolProbity score: 0.70
>>
>> Rama-Z: -0.32
>> ---------------------------------------
>>
>> and here it is for monomer_minimized.pdb:
>>
>> ---------------------------------------
>> Deviations from Ideal Values.
>>   Bond      :  0.001   0.010    519
>>   Angle     :  0.361   3.089    702
>>
>> Molprobity Statistics.
>>   All-atom Clashscore : 3.12
>>   Ramachandran Plot:
>>     Outliers :  1.61 %
>>     Allowed  :  9.68 %
>>     Favored  : 88.71 %
>>   Rotamer Outliers :  1.85 %
>>   MolProbity score: 1.89
>>
>> Rama-Z: -3.20
>> ---------------------------------------
>>
>> What I see is totally unsurprising to me. The model does not have 
>> hydrogens and restraint only have repulsions as NCI term. 
>> All-together, these two facts (no H and poor NCI terms) result in 
>> deterioration of secondary structure (and other H-bonds), poorer 
>> Ramachandran plot and slightly larger clashscore. Since Molprobity 
>> score is based on accumulation of clashes, Rama plot stats, rotamers 
>> etc -- no wonder it got worse in this case!
>>
>> As a side remark, I don't know if clashscore 0 is better or worse 
>> than 3.12. Generally, zero clashscore is not the goal. Clashscore 
>> calculation is agnostic to genuine NCI and various non-standard 
>> covalent links. For example, if you have something covalently 
>> attached to your aa residue, clashscore calculation will regard it as 
>> a terrible clash, which in reality isn't.
>>
>> Now, back to your example. Adding H to monomer.pdb and running 
>> minimization gives me:
>>
>> ---------------------------------------
>> Deviations from Ideal Values.
>>   Bond      :  0.001   0.008    519
>>   Angle     :  0.340   3.422    702
>>
>> Molprobity Statistics.
>>   All-atom Clashscore : 0.00
>>   Ramachandran Plot:
>>     Outliers :  1.61 %
>>     Allowed  : 14.52 %
>>     Favored  : 83.87 %
>>   Rotamer Outliers :  0.00 %
>>   MolProbity score: 1.18
>>
>> Rama-Z: -4.59
>> ---------------------------------------
>>
>> Expectedly, adding H reduced clashscore, eliminated rotamer outliers, 
>> but deteriorated secondary structure even more as evidenced by Rama 
>> favored and terrible Rama-Z. Turns out for the Molprobity score the 
>> improvements in clashscore and rotamers outweighed worsening of 
>> Ramachandran stats, and the Molprobity score becames better as result.
>>
>> Now, if I re-ran minimization of model with H but this time using 
>> secondary structure restraints (which are essentially H-bond 
>> restraints for SS elements) and Ramachandran plot restraints, here is 
>> what I get:
>>
>>
>> ---------------------------------------
>> Deviations from Ideal Values.
>>   Bond      :  0.001   0.009    519
>>   Angle     :  0.340   3.267    702
>>
>> Molprobity Statistics.
>>   All-atom Clashscore : 1.04
>>   Ramachandran Plot:
>>     Outliers :  0.00 %
>>     Allowed  :  0.00 %
>>     Favored  : 100.00 %
>>   Rotamer Outliers :  0.00 %
>>   MolProbity score: 0.80
>>
>> Rama-Z: 0.90
>> ---------------------------------------
>>
>> Now everything is 'perfect', and Molprobity score is the same as for 
>> your starting model. Can you achieve the same stats with other 
>> packages? I'm *not* saying the model with this statistics is 
>> realistic or any better than your initial model (remember, outliers 
>> that are supported by the data are expected!) but this result shows 
>> that underlying algorithms work as expected.
>>
>> So far all makes sense to me...
>>
>> Pavel
>>
>> On 7/12/21 17:55, James Holton wrote:
>>> Thank you Dale for the thoughtful explanation.
>>>
>>> I understand and agree that the "ideal" geometry is just an 
>>> approximation.  But is our approximation so bad that that it is 
>>> impossible to construct a model that doesn't have "problems"? As in: 
>>> do we always have to decide between strained bonds and angles vs bad 
>>> clashes and torsional outliers?  Even in the absence of an x-ray term?
>>>
>>> I don't believe that is so.  My evidence to support this is that I 
>>> can get pretty close to perfect molprobity scores using amber 
>>> (imin=1), refmac (refi type ideal), and pdb2ins/shelxl (WGHT 0.1 0 0 
>>> 0 0 0), but not with phenix.geometry_minimization. I thought perhaps 
>>> the phenix developers might find that interesting?  I certainly do.
>>>
>>> I do get a better clash score if I provide a model with 
>>> phenix.reduce hydrogens, but I always get Rama outliers. CDL or no 
>>> CDL. And, of course, if I turn on Rama restraints I always get 
>>> clashes, no matter what nonbonded_weight I use.
>>>
>>> 1aho is a small, 1.0 A structure so it is not hard to see what is 
>>> specifically happening.  The one Rama outlier is Gly17. After amber 
>>> minimization it is on the edge of the "favored" region, but after 
>>> geometry_minimization it is pushed into outlier territory. The other 
>>> thing bumping up the Molprobity score is a steric clash between 
>>> OG:Ser40 and O:Gly43.  Yes, there is a hydrogen between them, but 
>>> the O--O distance goes from 2.61 to 2.33 A, which is pretty tight 
>>> for an H-bond. I realize that in common practice a single outlier is 
>>> considered "OK", but in reality they are physically unreasonable.
>>>
>>> I bring up my "whirlygig hypothesis" because the contour length of 
>>> the main chain is the only aspect of macromolecular models that is 
>>> sensitive to small "errors" in the geometry. Or, at least, errors 
>>> that are not counter-balanced by other errors. Yes, the difference 
>>> in the ideal (aka "goal") CA-CA distance in phenix is only 2% 
>>> shorter than it is in the other three packages, but a 2% stretch of 
>>> a 100 aa chain requires ~4 kcal/mol.  This is significantly more 
>>> than any conventional dihedral and/or anti-bumping restraints, which 
>>> is where I think the energy is going.
>>>
>>> Or do you think something else is going on?
>>>
>>> Thank you for taking the time to consider this,
>>>
>>> -James
>>>
>>>
>>> On 7/9/2021 2:12 PM, Dale Tronrud wrote:
>>>> Hi,
>>>>
>>>>    I thought I'd toss in an observation that might clarify some of 
>>>> your problems.  As Pavel noted earlier, the uncertainty in the 
>>>> "ideal" values in libraries based on high resolution crystal 
>>>> structures are about 0.02 A for most bonds and one to two degrees 
>>>> for bond angles (with a higher number for the NCaC angle).  The 
>>>> origin of these uncertainties is not "error" but the actual 
>>>> variability of atomic arrangements caused by their environment. The 
>>>> variability these "uncertainties" represent are, in fact, real.
>>>>
>>>>    Not only are the angles and bonds in individual molecules 
>>>> perturbed by their surroundings, but the distortions of one angle 
>>>> will, systematically, be connected to distortions of other angles.  
>>>> For example, when the NCaC angle is larger than average both the 
>>>> NCaCb and CbCaC angles will tend to be smaller.  This particular 
>>>> relationship is ignored in most libraries of standard geometry but 
>>>> is implicit in the CDLs that Andy Karplus has produced.
>>>>
>>>>    What this fact indicates to me is that the standard libraries 
>>>> describe an ensemble of molecular models that all are consistent 
>>>> with the library, and long as their deviations from the mean are 
>>>> within the uncertainty.  This means that there is nothing at all 
>>>> special about the conformation that has average bond length and 
>>>> angles and makes me question to utility of creating an "idealized" 
>>>> model.  There really is no evidence that a significant number of 
>>>> molecules actually exist with that set of lengths and angles.  In 
>>>> fact, since we don't really understand all the interrelationships 
>>>> between the lengths and angles, there is a good chance that a 
>>>> molecule with all "perfect" angles and lengths is impossible! There 
>>>> may be a conflict which we identify as a Molprobaty clash which 
>>>> forbids that arrangement of atoms despite them being within the 
>>>> simple-minded ensemble limits of a library w/o interrelationships.
>>>>
>>>>    In the absence of any other information you may decide that an 
>>>> "idealized" set of coordinates is the safest statement you can make 
>>>> about your molecule's structure but this limitation of traditional 
>>>> geometry libraries will mean that you have little to no confidence 
>>>> that any molecule will actually have that structure.  It is kind of 
>>>> like the expectation value of a structure factor whose magnitude 
>>>> you know but you have no idea of the phase.  There is a center to 
>>>> that distribution, but that doesn't mean that the probability at 
>>>> that center is non-zero.
>>>>
>>>> Dale Tronrud
>>>>
>>>>
>>>>
>>>> On 7/8/2021 10:24 AM, James Holton wrote:
>>>>> Thank you Pavel for your prompt response!
>>>>>
>>>>> I agree with everything you wrote below, and that is a good point 
>>>>> about 2nd derivatives.
>>>>>
>>>>> However, what I'm seeing is the opposite of what you might 
>>>>> predict. See below.
>>>>>
>>>>> On 7/7/2021 11:27 PM, Pavel Afonine wrote:
>>>>>> Hi James,
>>>>>>
>>>>>> thanks for email and sharing your observations!
>>>>>>
>>>>>>> Greetings all, and I hope this little observation helps improve 
>>>>>>> things somehow.
>>>>>>>
>>>>>>> I did not expect this result, but there it is. My MolProbity 
>>>>>>> score goes from 0.7 to 1.9 after a run of 
>>>>>>> phenix.geometry_minimization
>>>>>>>
>>>>>>> I started with an AMBER-minimized model (based on 1aho), and 
>>>>>>> that got me my best MolProbity score so far (0.7). But, even 
>>>>>>> with hydrogens and waters removed the geometry_minimization run 
>>>>>>> increases the clashscore from 0 to 3.1 and Ramachandran favored 
>>>>>>> drops from 98% to 88% with one residue reaching the outlier level.
>>>>>>
>>>>>> It is not a secret that 'standard geometry restraints' used in 
>>>>>> Phenix and alike (read Refmac, etc) are very simplistic. They are 
>>>>>> not aware of main chain preferential conformations (Ramachandran 
>>>>>> plot), favorable side chain rotamer conformations. They don't 
>>>>>> even have any electrostatic/attraction terms -- only anti-bumping 
>>>>>> repulsion! Standard geometry restraints won't like any NCI 
>>>>>> (non-covalent interaction) and likely will make interacting atoms 
>>>>>> break apart rather than stay close together interacting.
>>>>>
>>>>> Yes, there's the rub: I'm not seeing "interacting atoms break 
>>>>> apart", but rather they are being smashed together. Torsion angles 
>>>>> are also being twisted out of allowed regions of the Ramachandran 
>>>>> plot.
>>>>>
>>>>> All this with the x-ray term turned off!
>>>>>
>>>>>> With this in mind any high quality (high-resolution) atomic model 
>>>>>> or the one optimized using sufficiently high-level QM is going to 
>>>>>> have a more realistic geometry than the result of geometry 
>>>>>> regularization against very simplistic restraints target. An 
>>>>>> example:
>>>>>>
>>>>>> https://journals.iucr.org/d/issues/2020/12/00/lp5048/lp5048.pdf
>>>>>>
>>>>>> and previous papers on the topic.
>>>>>
>>>>> I agree, but what doesn't make sense to me is how the "simplistic 
>>>>> restraints" of phenix.geometry_minimization would be so 
>>>>> inconsistent with the "simplistic restraints" in phenix.molprobity ?
>>>>>
>>>>> What I am doing here is starting with an energy-minimized model of 
>>>>> a 1.0 A structure (1aho). It's not a fancy QM, just the ff14SB 
>>>>> potential in AMBER.  I get my best molprobity scores this way, but 
>>>>> I need an x-ray refinement program like phenix.refine to compare 
>>>>> these models with reality.  It troubles me that the "geometry" in 
>>>>> the x-ray refinement program all by itself messes up my molprobity 
>>>>> score.
>>>>>
>>>>>>
>>>>>>> Just for comparison, with refmac5 in "refi type ideal" mode I 
>>>>>>> see the MolProbity rise to 1.13, but Clashscore remains zero, 
>>>>>>> some Ramas go from favored to allowed, but none rise to the 
>>>>>>> level of outliers.
>>>>>>
>>>>>> I believe this is because of the nature of minimizer used. Refmac 
>>>>>> uses 2nd derivative based one, which in a nutshell means it can 
>>>>>> move the model much less (just a bit in vicinity of a local 
>>>>>> minimum) than any program that uses gradients only (like Phenix).
>>>>> good point.
>>>>>
>>>>> So, what should I do to stabilize phenix.geometry_minimization? 
>>>>> Crank up the non-bonded weight? Restrain to starting coordinates?
>>>>>
>>>>>>> Files and logs here:
>>>>>>> https://bl831.als.lbl.gov/~jamesh/bugreports/phenixmin_070721.tgz
>>>>>>>
>>>>>>> I suspect this might have something to do with library values 
>>>>>>> for main-chain bonds and angles?  They do seem to vary between 
>>>>>>> programs. Phenix having the shortest CA-CA distance by up to 
>>>>>>> 0.08 A. After running thorough minimization on a poly-A peptide 
>>>>>>> I get:
>>>>>>> bond   amber   refmac  phenix  shelxl Stryer
>>>>>>>  C-N   1.330   1.339   1.331   1.325     1.32
>>>>>>>  N-CA  1.462   1.482   1.455   1.454     1.47
>>>>>>> CA-C   1.542   1.534   1.521   1.546     1.53
>>>>>>> CA-CA  3.862   3.874 *3.794* 3.854
>>>>>>>
>>>>>>> So, which one is "right" ?
>>>>>>
>>>>>> I'd say they are all the same, within their 'sigmas' which are 
>>>>>> from memory about 0.02A:
>>>>> I note that 3.874 - 3.794 = 0.08 > 0.02
>>>>>
>>>>> This brings me to my pet theory.  I think what is going on is 
>>>>> small errors like this build up a considerable amount of tension 
>>>>> in the long main chain. For this 64-mer, the contour length of the 
>>>>> main chain after idealization is ~5 A shorter after 
>>>>> phenix.geometry_minimization than it is after shelxl or amber. 
>>>>> That 5 A has to come from somewhere.  Without stretching bonds or 
>>>>> bending angles the only thing left to do is twisting torsions. A 
>>>>> kind of "whirlygig" effect.
>>>>>
>>>>> The question is: is the phenix CA-CA distance too short? Or is the 
>>>>> amber CA-CA distance too long?
>>>>>
>>>>> Shall we vote?
>>>>>
>>>>> -James
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> phenixbb mailing list
>>>>> phenixbb at phenix-online.org
>>>>> http://phenix-online.org/mailman/listinfo/phenixbb
>>>>> Unsubscribe: phenixbb-leave at phenix-online.org
>>>>>
>>>