[phenixbb] geometry_minimization makes molprobity score worse
James Holton
jmholton at lbl.gov
Tue Jul 13 13:51:29 PDT 2021
Thanks Pavel, this is great!
I never would have thought to use SS restraints on such a
high-resolution structure. But, if that turns vdw repulsions into
hydrogen bonds, then I agree that would be the only way to go.
If you don't mind my asking, why is that not the default?
On 7/13/2021 12:04 PM, Pavel Afonine wrote:
> P.S.:
>
> and if on top of using SS and Ramachandran plot restraints I bump up
> the nonbonded weight by a bit (150), then I get
>
> ---------------------------------------
> Deviations from Ideal Values.
> Bond : 0.003 0.012 519
> Angle : 0.776 3.318 702
>
> Molprobity Statistics.
> All-atom Clashscore : 0.00
> Ramachandran Plot:
> Outliers : 0.00 %
> Allowed : 0.00 %
> Favored : 100.00 %
> Rotamer Outliers : 0.00 %
> MolProbity score: 0.5
>
> Rama-Z: 1.08
> ---------------------------------------
>
> Pavel
>
> On 7/13/21 11:44, Pavel Afonine wrote:
>> James,
>>
>> okay here comes a more detailed analysis...
>>
>> This is the statistics for monomer.pdb:
>>
>> ---------------------------------------
>> Deviations from Ideal Values.
>> Bond : 0.014 0.037 519
>> Angle : 1.622 6.565 702
>>
>> Molprobity Statistics.
>> All-atom Clashscore : 0.00
>> Ramachandran Plot:
>> Outliers : 0.00 %
>> Allowed : 1.61 %
>> Favored : 98.39 %
>> Rotamer Outliers : 1.85 %
>> MolProbity score: 0.70
>>
>> Rama-Z: -0.32
>> ---------------------------------------
>>
>> and here it is for monomer_minimized.pdb:
>>
>> ---------------------------------------
>> Deviations from Ideal Values.
>> Bond : 0.001 0.010 519
>> Angle : 0.361 3.089 702
>>
>> Molprobity Statistics.
>> All-atom Clashscore : 3.12
>> Ramachandran Plot:
>> Outliers : 1.61 %
>> Allowed : 9.68 %
>> Favored : 88.71 %
>> Rotamer Outliers : 1.85 %
>> MolProbity score: 1.89
>>
>> Rama-Z: -3.20
>> ---------------------------------------
>>
>> What I see is totally unsurprising to me. The model does not have
>> hydrogens and restraint only have repulsions as NCI term.
>> All-together, these two facts (no H and poor NCI terms) result in
>> deterioration of secondary structure (and other H-bonds), poorer
>> Ramachandran plot and slightly larger clashscore. Since Molprobity
>> score is based on accumulation of clashes, Rama plot stats, rotamers
>> etc -- no wonder it got worse in this case!
>>
>> As a side remark, I don't know if clashscore 0 is better or worse
>> than 3.12. Generally, zero clashscore is not the goal. Clashscore
>> calculation is agnostic to genuine NCI and various non-standard
>> covalent links. For example, if you have something covalently
>> attached to your aa residue, clashscore calculation will regard it as
>> a terrible clash, which in reality isn't.
>>
>> Now, back to your example. Adding H to monomer.pdb and running
>> minimization gives me:
>>
>> ---------------------------------------
>> Deviations from Ideal Values.
>> Bond : 0.001 0.008 519
>> Angle : 0.340 3.422 702
>>
>> Molprobity Statistics.
>> All-atom Clashscore : 0.00
>> Ramachandran Plot:
>> Outliers : 1.61 %
>> Allowed : 14.52 %
>> Favored : 83.87 %
>> Rotamer Outliers : 0.00 %
>> MolProbity score: 1.18
>>
>> Rama-Z: -4.59
>> ---------------------------------------
>>
>> Expectedly, adding H reduced clashscore, eliminated rotamer outliers,
>> but deteriorated secondary structure even more as evidenced by Rama
>> favored and terrible Rama-Z. Turns out for the Molprobity score the
>> improvements in clashscore and rotamers outweighed worsening of
>> Ramachandran stats, and the Molprobity score becames better as result.
>>
>> Now, if I re-ran minimization of model with H but this time using
>> secondary structure restraints (which are essentially H-bond
>> restraints for SS elements) and Ramachandran plot restraints, here is
>> what I get:
>>
>>
>> ---------------------------------------
>> Deviations from Ideal Values.
>> Bond : 0.001 0.009 519
>> Angle : 0.340 3.267 702
>>
>> Molprobity Statistics.
>> All-atom Clashscore : 1.04
>> Ramachandran Plot:
>> Outliers : 0.00 %
>> Allowed : 0.00 %
>> Favored : 100.00 %
>> Rotamer Outliers : 0.00 %
>> MolProbity score: 0.80
>>
>> Rama-Z: 0.90
>> ---------------------------------------
>>
>> Now everything is 'perfect', and Molprobity score is the same as for
>> your starting model. Can you achieve the same stats with other
>> packages? I'm *not* saying the model with this statistics is
>> realistic or any better than your initial model (remember, outliers
>> that are supported by the data are expected!) but this result shows
>> that underlying algorithms work as expected.
>>
>> So far all makes sense to me...
>>
>> Pavel
>>
>> On 7/12/21 17:55, James Holton wrote:
>>> Thank you Dale for the thoughtful explanation.
>>>
>>> I understand and agree that the "ideal" geometry is just an
>>> approximation. But is our approximation so bad that that it is
>>> impossible to construct a model that doesn't have "problems"? As in:
>>> do we always have to decide between strained bonds and angles vs bad
>>> clashes and torsional outliers? Even in the absence of an x-ray term?
>>>
>>> I don't believe that is so. My evidence to support this is that I
>>> can get pretty close to perfect molprobity scores using amber
>>> (imin=1), refmac (refi type ideal), and pdb2ins/shelxl (WGHT 0.1 0 0
>>> 0 0 0), but not with phenix.geometry_minimization. I thought perhaps
>>> the phenix developers might find that interesting? I certainly do.
>>>
>>> I do get a better clash score if I provide a model with
>>> phenix.reduce hydrogens, but I always get Rama outliers. CDL or no
>>> CDL. And, of course, if I turn on Rama restraints I always get
>>> clashes, no matter what nonbonded_weight I use.
>>>
>>> 1aho is a small, 1.0 A structure so it is not hard to see what is
>>> specifically happening. The one Rama outlier is Gly17. After amber
>>> minimization it is on the edge of the "favored" region, but after
>>> geometry_minimization it is pushed into outlier territory. The other
>>> thing bumping up the Molprobity score is a steric clash between
>>> OG:Ser40 and O:Gly43. Yes, there is a hydrogen between them, but
>>> the O--O distance goes from 2.61 to 2.33 A, which is pretty tight
>>> for an H-bond. I realize that in common practice a single outlier is
>>> considered "OK", but in reality they are physically unreasonable.
>>>
>>> I bring up my "whirlygig hypothesis" because the contour length of
>>> the main chain is the only aspect of macromolecular models that is
>>> sensitive to small "errors" in the geometry. Or, at least, errors
>>> that are not counter-balanced by other errors. Yes, the difference
>>> in the ideal (aka "goal") CA-CA distance in phenix is only 2%
>>> shorter than it is in the other three packages, but a 2% stretch of
>>> a 100 aa chain requires ~4 kcal/mol. This is significantly more
>>> than any conventional dihedral and/or anti-bumping restraints, which
>>> is where I think the energy is going.
>>>
>>> Or do you think something else is going on?
>>>
>>> Thank you for taking the time to consider this,
>>>
>>> -James
>>>
>>>
>>> On 7/9/2021 2:12 PM, Dale Tronrud wrote:
>>>> Hi,
>>>>
>>>> I thought I'd toss in an observation that might clarify some of
>>>> your problems. As Pavel noted earlier, the uncertainty in the
>>>> "ideal" values in libraries based on high resolution crystal
>>>> structures are about 0.02 A for most bonds and one to two degrees
>>>> for bond angles (with a higher number for the NCaC angle). The
>>>> origin of these uncertainties is not "error" but the actual
>>>> variability of atomic arrangements caused by their environment. The
>>>> variability these "uncertainties" represent are, in fact, real.
>>>>
>>>> Not only are the angles and bonds in individual molecules
>>>> perturbed by their surroundings, but the distortions of one angle
>>>> will, systematically, be connected to distortions of other angles.
>>>> For example, when the NCaC angle is larger than average both the
>>>> NCaCb and CbCaC angles will tend to be smaller. This particular
>>>> relationship is ignored in most libraries of standard geometry but
>>>> is implicit in the CDLs that Andy Karplus has produced.
>>>>
>>>> What this fact indicates to me is that the standard libraries
>>>> describe an ensemble of molecular models that all are consistent
>>>> with the library, and long as their deviations from the mean are
>>>> within the uncertainty. This means that there is nothing at all
>>>> special about the conformation that has average bond length and
>>>> angles and makes me question to utility of creating an "idealized"
>>>> model. There really is no evidence that a significant number of
>>>> molecules actually exist with that set of lengths and angles. In
>>>> fact, since we don't really understand all the interrelationships
>>>> between the lengths and angles, there is a good chance that a
>>>> molecule with all "perfect" angles and lengths is impossible! There
>>>> may be a conflict which we identify as a Molprobaty clash which
>>>> forbids that arrangement of atoms despite them being within the
>>>> simple-minded ensemble limits of a library w/o interrelationships.
>>>>
>>>> In the absence of any other information you may decide that an
>>>> "idealized" set of coordinates is the safest statement you can make
>>>> about your molecule's structure but this limitation of traditional
>>>> geometry libraries will mean that you have little to no confidence
>>>> that any molecule will actually have that structure. It is kind of
>>>> like the expectation value of a structure factor whose magnitude
>>>> you know but you have no idea of the phase. There is a center to
>>>> that distribution, but that doesn't mean that the probability at
>>>> that center is non-zero.
>>>>
>>>> Dale Tronrud
>>>>
>>>>
>>>>
>>>> On 7/8/2021 10:24 AM, James Holton wrote:
>>>>> Thank you Pavel for your prompt response!
>>>>>
>>>>> I agree with everything you wrote below, and that is a good point
>>>>> about 2nd derivatives.
>>>>>
>>>>> However, what I'm seeing is the opposite of what you might
>>>>> predict. See below.
>>>>>
>>>>> On 7/7/2021 11:27 PM, Pavel Afonine wrote:
>>>>>> Hi James,
>>>>>>
>>>>>> thanks for email and sharing your observations!
>>>>>>
>>>>>>> Greetings all, and I hope this little observation helps improve
>>>>>>> things somehow.
>>>>>>>
>>>>>>> I did not expect this result, but there it is. My MolProbity
>>>>>>> score goes from 0.7 to 1.9 after a run of
>>>>>>> phenix.geometry_minimization
>>>>>>>
>>>>>>> I started with an AMBER-minimized model (based on 1aho), and
>>>>>>> that got me my best MolProbity score so far (0.7). But, even
>>>>>>> with hydrogens and waters removed the geometry_minimization run
>>>>>>> increases the clashscore from 0 to 3.1 and Ramachandran favored
>>>>>>> drops from 98% to 88% with one residue reaching the outlier level.
>>>>>>
>>>>>> It is not a secret that 'standard geometry restraints' used in
>>>>>> Phenix and alike (read Refmac, etc) are very simplistic. They are
>>>>>> not aware of main chain preferential conformations (Ramachandran
>>>>>> plot), favorable side chain rotamer conformations. They don't
>>>>>> even have any electrostatic/attraction terms -- only anti-bumping
>>>>>> repulsion! Standard geometry restraints won't like any NCI
>>>>>> (non-covalent interaction) and likely will make interacting atoms
>>>>>> break apart rather than stay close together interacting.
>>>>>
>>>>> Yes, there's the rub: I'm not seeing "interacting atoms break
>>>>> apart", but rather they are being smashed together. Torsion angles
>>>>> are also being twisted out of allowed regions of the Ramachandran
>>>>> plot.
>>>>>
>>>>> All this with the x-ray term turned off!
>>>>>
>>>>>> With this in mind any high quality (high-resolution) atomic model
>>>>>> or the one optimized using sufficiently high-level QM is going to
>>>>>> have a more realistic geometry than the result of geometry
>>>>>> regularization against very simplistic restraints target. An
>>>>>> example:
>>>>>>
>>>>>> https://journals.iucr.org/d/issues/2020/12/00/lp5048/lp5048.pdf
>>>>>>
>>>>>> and previous papers on the topic.
>>>>>
>>>>> I agree, but what doesn't make sense to me is how the "simplistic
>>>>> restraints" of phenix.geometry_minimization would be so
>>>>> inconsistent with the "simplistic restraints" in phenix.molprobity ?
>>>>>
>>>>> What I am doing here is starting with an energy-minimized model of
>>>>> a 1.0 A structure (1aho). It's not a fancy QM, just the ff14SB
>>>>> potential in AMBER. I get my best molprobity scores this way, but
>>>>> I need an x-ray refinement program like phenix.refine to compare
>>>>> these models with reality. It troubles me that the "geometry" in
>>>>> the x-ray refinement program all by itself messes up my molprobity
>>>>> score.
>>>>>
>>>>>>
>>>>>>> Just for comparison, with refmac5 in "refi type ideal" mode I
>>>>>>> see the MolProbity rise to 1.13, but Clashscore remains zero,
>>>>>>> some Ramas go from favored to allowed, but none rise to the
>>>>>>> level of outliers.
>>>>>>
>>>>>> I believe this is because of the nature of minimizer used. Refmac
>>>>>> uses 2nd derivative based one, which in a nutshell means it can
>>>>>> move the model much less (just a bit in vicinity of a local
>>>>>> minimum) than any program that uses gradients only (like Phenix).
>>>>> good point.
>>>>>
>>>>> So, what should I do to stabilize phenix.geometry_minimization?
>>>>> Crank up the non-bonded weight? Restrain to starting coordinates?
>>>>>
>>>>>>> Files and logs here:
>>>>>>> https://bl831.als.lbl.gov/~jamesh/bugreports/phenixmin_070721.tgz
>>>>>>>
>>>>>>> I suspect this might have something to do with library values
>>>>>>> for main-chain bonds and angles? They do seem to vary between
>>>>>>> programs. Phenix having the shortest CA-CA distance by up to
>>>>>>> 0.08 A. After running thorough minimization on a poly-A peptide
>>>>>>> I get:
>>>>>>> bond amber refmac phenix shelxl Stryer
>>>>>>> C-N 1.330 1.339 1.331 1.325 1.32
>>>>>>> N-CA 1.462 1.482 1.455 1.454 1.47
>>>>>>> CA-C 1.542 1.534 1.521 1.546 1.53
>>>>>>> CA-CA 3.862 3.874 *3.794* 3.854
>>>>>>>
>>>>>>> So, which one is "right" ?
>>>>>>
>>>>>> I'd say they are all the same, within their 'sigmas' which are
>>>>>> from memory about 0.02A:
>>>>> I note that 3.874 - 3.794 = 0.08 > 0.02
>>>>>
>>>>> This brings me to my pet theory. I think what is going on is
>>>>> small errors like this build up a considerable amount of tension
>>>>> in the long main chain. For this 64-mer, the contour length of the
>>>>> main chain after idealization is ~5 A shorter after
>>>>> phenix.geometry_minimization than it is after shelxl or amber.
>>>>> That 5 A has to come from somewhere. Without stretching bonds or
>>>>> bending angles the only thing left to do is twisting torsions. A
>>>>> kind of "whirlygig" effect.
>>>>>
>>>>> The question is: is the phenix CA-CA distance too short? Or is the
>>>>> amber CA-CA distance too long?
>>>>>
>>>>> Shall we vote?
>>>>>
>>>>> -James
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> phenixbb mailing list
>>>>> phenixbb at phenix-online.org
>>>>> http://phenix-online.org/mailman/listinfo/phenixbb
>>>>> Unsubscribe: phenixbb-leave at phenix-online.org
>>>>>
>>>
More information about the phenixbb
mailing list