[phenixbb] geometry_minimization makes molprobity score worse
Pavel Afonine
pafonine at lbl.gov
Tue Jul 13 12:04:07 PDT 2021
P.S.:
and if on top of using SS and Ramachandran plot restraints I bump up the
nonbonded weight by a bit (150), then I get
---------------------------------------
Deviations from Ideal Values.
Bond : 0.003 0.012 519
Angle : 0.776 3.318 702
Molprobity Statistics.
All-atom Clashscore : 0.00
Ramachandran Plot:
Outliers : 0.00 %
Allowed : 0.00 %
Favored : 100.00 %
Rotamer Outliers : 0.00 %
MolProbity score: 0.5
Rama-Z: 1.08
---------------------------------------
Pavel
On 7/13/21 11:44, Pavel Afonine wrote:
> James,
>
> okay here comes a more detailed analysis...
>
> This is the statistics for monomer.pdb:
>
> ---------------------------------------
> Deviations from Ideal Values.
> Bond : 0.014 0.037 519
> Angle : 1.622 6.565 702
>
> Molprobity Statistics.
> All-atom Clashscore : 0.00
> Ramachandran Plot:
> Outliers : 0.00 %
> Allowed : 1.61 %
> Favored : 98.39 %
> Rotamer Outliers : 1.85 %
> MolProbity score: 0.70
>
> Rama-Z: -0.32
> ---------------------------------------
>
> and here it is for monomer_minimized.pdb:
>
> ---------------------------------------
> Deviations from Ideal Values.
> Bond : 0.001 0.010 519
> Angle : 0.361 3.089 702
>
> Molprobity Statistics.
> All-atom Clashscore : 3.12
> Ramachandran Plot:
> Outliers : 1.61 %
> Allowed : 9.68 %
> Favored : 88.71 %
> Rotamer Outliers : 1.85 %
> MolProbity score: 1.89
>
> Rama-Z: -3.20
> ---------------------------------------
>
> What I see is totally unsurprising to me. The model does not have
> hydrogens and restraint only have repulsions as NCI term.
> All-together, these two facts (no H and poor NCI terms) result in
> deterioration of secondary structure (and other H-bonds), poorer
> Ramachandran plot and slightly larger clashscore. Since Molprobity
> score is based on accumulation of clashes, Rama plot stats, rotamers
> etc -- no wonder it got worse in this case!
>
> As a side remark, I don't know if clashscore 0 is better or worse than
> 3.12. Generally, zero clashscore is not the goal. Clashscore
> calculation is agnostic to genuine NCI and various non-standard
> covalent links. For example, if you have something covalently attached
> to your aa residue, clashscore calculation will regard it as a
> terrible clash, which in reality isn't.
>
> Now, back to your example. Adding H to monomer.pdb and running
> minimization gives me:
>
> ---------------------------------------
> Deviations from Ideal Values.
> Bond : 0.001 0.008 519
> Angle : 0.340 3.422 702
>
> Molprobity Statistics.
> All-atom Clashscore : 0.00
> Ramachandran Plot:
> Outliers : 1.61 %
> Allowed : 14.52 %
> Favored : 83.87 %
> Rotamer Outliers : 0.00 %
> MolProbity score: 1.18
>
> Rama-Z: -4.59
> ---------------------------------------
>
> Expectedly, adding H reduced clashscore, eliminated rotamer outliers,
> but deteriorated secondary structure even more as evidenced by Rama
> favored and terrible Rama-Z. Turns out for the Molprobity score the
> improvements in clashscore and rotamers outweighed worsening of
> Ramachandran stats, and the Molprobity score becames better as result.
>
> Now, if I re-ran minimization of model with H but this time using
> secondary structure restraints (which are essentially H-bond
> restraints for SS elements) and Ramachandran plot restraints, here is
> what I get:
>
>
> ---------------------------------------
> Deviations from Ideal Values.
> Bond : 0.001 0.009 519
> Angle : 0.340 3.267 702
>
> Molprobity Statistics.
> All-atom Clashscore : 1.04
> Ramachandran Plot:
> Outliers : 0.00 %
> Allowed : 0.00 %
> Favored : 100.00 %
> Rotamer Outliers : 0.00 %
> MolProbity score: 0.80
>
> Rama-Z: 0.90
> ---------------------------------------
>
> Now everything is 'perfect', and Molprobity score is the same as for
> your starting model. Can you achieve the same stats with other
> packages? I'm *not* saying the model with this statistics is realistic
> or any better than your initial model (remember, outliers that are
> supported by the data are expected!) but this result shows that
> underlying algorithms work as expected.
>
> So far all makes sense to me...
>
> Pavel
>
> On 7/12/21 17:55, James Holton wrote:
>> Thank you Dale for the thoughtful explanation.
>>
>> I understand and agree that the "ideal" geometry is just an
>> approximation. But is our approximation so bad that that it is
>> impossible to construct a model that doesn't have "problems"? As in:
>> do we always have to decide between strained bonds and angles vs bad
>> clashes and torsional outliers? Even in the absence of an x-ray term?
>>
>> I don't believe that is so. My evidence to support this is that I
>> can get pretty close to perfect molprobity scores using amber
>> (imin=1), refmac (refi type ideal), and pdb2ins/shelxl (WGHT 0.1 0 0
>> 0 0 0), but not with phenix.geometry_minimization. I thought perhaps
>> the phenix developers might find that interesting? I certainly do.
>>
>> I do get a better clash score if I provide a model with phenix.reduce
>> hydrogens, but I always get Rama outliers. CDL or no CDL. And, of
>> course, if I turn on Rama restraints I always get clashes, no matter
>> what nonbonded_weight I use.
>>
>> 1aho is a small, 1.0 A structure so it is not hard to see what is
>> specifically happening. The one Rama outlier is Gly17. After amber
>> minimization it is on the edge of the "favored" region, but after
>> geometry_minimization it is pushed into outlier territory. The other
>> thing bumping up the Molprobity score is a steric clash between
>> OG:Ser40 and O:Gly43. Yes, there is a hydrogen between them, but the
>> O--O distance goes from 2.61 to 2.33 A, which is pretty tight for an
>> H-bond. I realize that in common practice a single outlier is
>> considered "OK", but in reality they are physically unreasonable.
>>
>> I bring up my "whirlygig hypothesis" because the contour length of
>> the main chain is the only aspect of macromolecular models that is
>> sensitive to small "errors" in the geometry. Or, at least, errors
>> that are not counter-balanced by other errors. Yes, the difference in
>> the ideal (aka "goal") CA-CA distance in phenix is only 2% shorter
>> than it is in the other three packages, but a 2% stretch of a 100 aa
>> chain requires ~4 kcal/mol. This is significantly more than any
>> conventional dihedral and/or anti-bumping restraints, which is where
>> I think the energy is going.
>>
>> Or do you think something else is going on?
>>
>> Thank you for taking the time to consider this,
>>
>> -James
>>
>>
>> On 7/9/2021 2:12 PM, Dale Tronrud wrote:
>>> Hi,
>>>
>>> I thought I'd toss in an observation that might clarify some of
>>> your problems. As Pavel noted earlier, the uncertainty in the
>>> "ideal" values in libraries based on high resolution crystal
>>> structures are about 0.02 A for most bonds and one to two degrees
>>> for bond angles (with a higher number for the NCaC angle). The
>>> origin of these uncertainties is not "error" but the actual
>>> variability of atomic arrangements caused by their environment. The
>>> variability these "uncertainties" represent are, in fact, real.
>>>
>>> Not only are the angles and bonds in individual molecules
>>> perturbed by their surroundings, but the distortions of one angle
>>> will, systematically, be connected to distortions of other angles.
>>> For example, when the NCaC angle is larger than average both the
>>> NCaCb and CbCaC angles will tend to be smaller. This particular
>>> relationship is ignored in most libraries of standard geometry but
>>> is implicit in the CDLs that Andy Karplus has produced.
>>>
>>> What this fact indicates to me is that the standard libraries
>>> describe an ensemble of molecular models that all are consistent
>>> with the library, and long as their deviations from the mean are
>>> within the uncertainty. This means that there is nothing at all
>>> special about the conformation that has average bond length and
>>> angles and makes me question to utility of creating an "idealized"
>>> model. There really is no evidence that a significant number of
>>> molecules actually exist with that set of lengths and angles. In
>>> fact, since we don't really understand all the interrelationships
>>> between the lengths and angles, there is a good chance that a
>>> molecule with all "perfect" angles and lengths is impossible! There
>>> may be a conflict which we identify as a Molprobaty clash which
>>> forbids that arrangement of atoms despite them being within the
>>> simple-minded ensemble limits of a library w/o interrelationships.
>>>
>>> In the absence of any other information you may decide that an
>>> "idealized" set of coordinates is the safest statement you can make
>>> about your molecule's structure but this limitation of traditional
>>> geometry libraries will mean that you have little to no confidence
>>> that any molecule will actually have that structure. It is kind of
>>> like the expectation value of a structure factor whose magnitude you
>>> know but you have no idea of the phase. There is a center to that
>>> distribution, but that doesn't mean that the probability at that
>>> center is non-zero.
>>>
>>> Dale Tronrud
>>>
>>>
>>>
>>> On 7/8/2021 10:24 AM, James Holton wrote:
>>>> Thank you Pavel for your prompt response!
>>>>
>>>> I agree with everything you wrote below, and that is a good point
>>>> about 2nd derivatives.
>>>>
>>>> However, what I'm seeing is the opposite of what you might predict.
>>>> See below.
>>>>
>>>> On 7/7/2021 11:27 PM, Pavel Afonine wrote:
>>>>> Hi James,
>>>>>
>>>>> thanks for email and sharing your observations!
>>>>>
>>>>>> Greetings all, and I hope this little observation helps improve
>>>>>> things somehow.
>>>>>>
>>>>>> I did not expect this result, but there it is. My MolProbity
>>>>>> score goes from 0.7 to 1.9 after a run of
>>>>>> phenix.geometry_minimization
>>>>>>
>>>>>> I started with an AMBER-minimized model (based on 1aho), and that
>>>>>> got me my best MolProbity score so far (0.7). But, even with
>>>>>> hydrogens and waters removed the geometry_minimization run
>>>>>> increases the clashscore from 0 to 3.1 and Ramachandran favored
>>>>>> drops from 98% to 88% with one residue reaching the outlier level.
>>>>>
>>>>> It is not a secret that 'standard geometry restraints' used in
>>>>> Phenix and alike (read Refmac, etc) are very simplistic. They are
>>>>> not aware of main chain preferential conformations (Ramachandran
>>>>> plot), favorable side chain rotamer conformations. They don't even
>>>>> have any electrostatic/attraction terms -- only anti-bumping
>>>>> repulsion! Standard geometry restraints won't like any NCI
>>>>> (non-covalent interaction) and likely will make interacting atoms
>>>>> break apart rather than stay close together interacting.
>>>>
>>>> Yes, there's the rub: I'm not seeing "interacting atoms break
>>>> apart", but rather they are being smashed together. Torsion angles
>>>> are also being twisted out of allowed regions of the Ramachandran
>>>> plot.
>>>>
>>>> All this with the x-ray term turned off!
>>>>
>>>>> With this in mind any high quality (high-resolution) atomic model
>>>>> or the one optimized using sufficiently high-level QM is going to
>>>>> have a more realistic geometry than the result of geometry
>>>>> regularization against very simplistic restraints target. An example:
>>>>>
>>>>> https://journals.iucr.org/d/issues/2020/12/00/lp5048/lp5048.pdf
>>>>>
>>>>> and previous papers on the topic.
>>>>
>>>> I agree, but what doesn't make sense to me is how the "simplistic
>>>> restraints" of phenix.geometry_minimization would be so
>>>> inconsistent with the "simplistic restraints" in phenix.molprobity ?
>>>>
>>>> What I am doing here is starting with an energy-minimized model of
>>>> a 1.0 A structure (1aho). It's not a fancy QM, just the ff14SB
>>>> potential in AMBER. I get my best molprobity scores this way, but
>>>> I need an x-ray refinement program like phenix.refine to compare
>>>> these models with reality. It troubles me that the "geometry" in
>>>> the x-ray refinement program all by itself messes up my molprobity
>>>> score.
>>>>
>>>>>
>>>>>> Just for comparison, with refmac5 in "refi type ideal" mode I see
>>>>>> the MolProbity rise to 1.13, but Clashscore remains zero, some
>>>>>> Ramas go from favored to allowed, but none rise to the level of
>>>>>> outliers.
>>>>>
>>>>> I believe this is because of the nature of minimizer used. Refmac
>>>>> uses 2nd derivative based one, which in a nutshell means it can
>>>>> move the model much less (just a bit in vicinity of a local
>>>>> minimum) than any program that uses gradients only (like Phenix).
>>>> good point.
>>>>
>>>> So, what should I do to stabilize phenix.geometry_minimization?
>>>> Crank up the non-bonded weight? Restrain to starting coordinates?
>>>>
>>>>>> Files and logs here:
>>>>>> https://bl831.als.lbl.gov/~jamesh/bugreports/phenixmin_070721.tgz
>>>>>>
>>>>>> I suspect this might have something to do with library values for
>>>>>> main-chain bonds and angles? They do seem to vary between
>>>>>> programs. Phenix having the shortest CA-CA distance by up to 0.08
>>>>>> A. After running thorough minimization on a poly-A peptide I get:
>>>>>> bond amber refmac phenix shelxl Stryer
>>>>>> C-N 1.330 1.339 1.331 1.325 1.32
>>>>>> N-CA 1.462 1.482 1.455 1.454 1.47
>>>>>> CA-C 1.542 1.534 1.521 1.546 1.53
>>>>>> CA-CA 3.862 3.874 *3.794* 3.854
>>>>>>
>>>>>> So, which one is "right" ?
>>>>>
>>>>> I'd say they are all the same, within their 'sigmas' which are
>>>>> from memory about 0.02A:
>>>> I note that 3.874 - 3.794 = 0.08 > 0.02
>>>>
>>>> This brings me to my pet theory. I think what is going on is small
>>>> errors like this build up a considerable amount of tension in the
>>>> long main chain. For this 64-mer, the contour length of the main
>>>> chain after idealization is ~5 A shorter after
>>>> phenix.geometry_minimization than it is after shelxl or amber. That
>>>> 5 A has to come from somewhere. Without stretching bonds or
>>>> bending angles the only thing left to do is twisting torsions. A
>>>> kind of "whirlygig" effect.
>>>>
>>>> The question is: is the phenix CA-CA distance too short? Or is the
>>>> amber CA-CA distance too long?
>>>>
>>>> Shall we vote?
>>>>
>>>> -James
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> phenixbb mailing list
>>>> phenixbb at phenix-online.org
>>>> http://phenix-online.org/mailman/listinfo/phenixbb
>>>> Unsubscribe: phenixbb-leave at phenix-online.org
>>>>
>>
More information about the phenixbb
mailing list