[phenixbb] rotamers

Thu Mar 24 02:27:37 PDT 2011

> Yes, there are basically three options:
>
> 1) cut the side chain down to whatever is still visible in the density
> 2) let the refinement proceed as is
> 3) set side chain atom occupancies to zero
>
> Personally, I have evolved from 2) to 1).  The argument was that
> omitting atoms will confuse some end users,

There is an endless list of ways to confuse the end user, so I put that 
aside and assume we are dealing with an educated individual.

> and refinement will
> essentially take care of it by increasing the B-factors.

Yes, stupid refinement would probably do it. phenix.refine will not do 
it, since zero-occupancy atoms do not contribute to the scattering, and 
their B-factors will be roughly similar to those of neighboring atoms.

>   I have seen
> some quite convincing evidence since that the disordered side chains do
> have a detectable effect on the rest of the model, and thus leaving them
> in makes a model worse.

I've seen both.

> Other half of the argument was of semantic
> nature and referred more to replacing disordered residues with alanines,
> which is silly

It is silly but honest. If you call TYR something like this

ATOM    134  N   TYR A  19      21.657 -76.614  65.963  1.00 28.50      A    N
ATOM    135  CA  TYR A  19      23.064 -76.802  65.641  1.00 27.23      A    C
ATOM    136  CB  TYR A  19      23.231 -77.079  64.157  1.00 30.04      A    C
ATOM    137  C   TYR A  19      23.816 -75.537  66.027  1.00 27.07      A    C
ATOM    138  O   TYR A  19      23.265 -74.434  65.976  1.00 24.32      A    O

that would be weird too. Call it "handicapped TYR" then -;) And I guess 
seeing something like this is confusing for the end user too (especially 
one who is still learning). If I saw something like this, my first guess 
would be "someone messed up the file while doing copy-paste".

> because we know from sequence it's not alanine

Yes, we know this. But before we really know this, we need to:
1) extract the sequence from the PDB file;
2) get the correct sequence;
3) align them and look for mismatches;
4) distinguish between (occasional) model building errors and 
intentional ones (due to ALA truncation).
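
The four steps above can be sketched roughly as follows. This is only an 
illustration in plain Python, not anything PHENIX provides: the function 
names and labels are made up, and it assumes the model and the reference 
sequence are already in register (no gaps or insertions), so the 
"alignment" is just an offset. The last step is a heuristic guess.

```python
# Hypothetical helpers, not part of PHENIX.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}
BACKBONE = {"N", "CA", "C", "O", "OXT"}


def read_residues(pdb_lines):
    """Step 1: return [((chain, resid), resname, atom_names)] in file order."""
    residues = []
    for line in pdb_lines:
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        atom_name = line[12:16].strip()
        resname = line[17:20]
        key = (line[21], line[22:27])  # chain id, residue number + insertion code
        if not residues or residues[-1][0] != key:
            residues.append((key, resname, set()))
        residues[-1][2].add(atom_name)
    return residues


def classify(residues, reference, first_resnum):
    """Steps 2-4: compare each residue with the reference sequence.

    `reference` is a one-letter string; residue number `first_resnum` is
    ASSUMED to correspond to reference[0] (no gaps, no insertions).
    """
    report = []
    for (chain, resid), resname, atoms in residues:
        resnum = int(resid[:4])
        expected = reference[resnum - first_resnum]
        found = THREE_TO_ONE.get(resname, "X")
        side_chain = atoms - BACKBONE
        if found != expected:
            # Heuristic only: an ALA where the sequence says otherwise may be
            # an intentional truncation; anything else looks like an error.
            label = "ALA truncation?" if found == "A" else "mismatch"
        elif found not in "AG" and side_chain <= {"CB"}:
            # Name agrees with the sequence, but nothing beyond CB is present.
            label = "truncated side chain"
        else:
            label = "ok"
        report.append((chain, resnum, resname, expected, label))
    return report
```

Applied to the five TYR A 19 records quoted above, with a reference that 
has Y at position 19, this labels the residue "truncated side chain" 
rather than a mismatch. Had it been renamed to ALA, the code could only 
guess that the mismatch was intentional, which is the point of step 4.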

> The third option (the one you gravitate towards) seems problematic to me
> for the following reasons.  The meaning of the occupancy is that the
> atom distribution in space is multimodal, and it spends certain fraction
> of time vibrating around the specified position.  So what is the meaning
> of zero occupancy?  This is the average atomic position, but it spends
> zero time here?  Makes no physical sense, and in fact is wrong since
> there is some non-zero probability that the disordered side chain will
> occupy the designated conformation.  Of course, structural model may be
> considered a *mathematical* model, and it does not have to be strictly
> interpretable (or interpretable the way I see most logical, anyway).

This is a valid argument, I agree. A better approach would be to run a 
bunch of identical refinements and obtain an ensemble that tells you 
(more or less) the uncertainty and degree of confidence for each atom:
http://cci.lbl.gov/~afonine/p2.png

That would be a step towards a better option than setting the 
occupancy to zero.
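
One simple way to turn such an ensemble into a per-atom number is to 
measure how far each atom scatters around its mean position across the 
models. A minimal sketch, assuming the refined models are already 
available as lists of (x, y, z) tuples with the same atoms in the same 
order; the function name is made up and producing the ensemble itself is 
outside this sketch:

```python
import math


def per_atom_spread(models):
    """Return the RMS deviation of each atom from its ensemble-mean position.

    `models` is a list of models; each model is a list of (x, y, z) tuples.
    ASSUMPTION: every model lists the same atoms in the same order.
    """
    n_models = len(models)
    n_atoms = len(models[0])
    spread = []
    for i in range(n_atoms):
        coords = [model[i] for model in models]
        # Mean position of atom i over the ensemble.
        mean = [sum(c[k] for c in coords) / n_models for k in range(3)]
        # Mean squared distance of atom i from that mean position.
        msd = sum(
            sum((c[k] - mean[k]) ** 2 for k in range(3)) for c in coords
        ) / n_models
        spread.append(math.sqrt(msd))
    return spread
```

Atoms that land in the same place in every model get a spread near zero; 
atoms of a disordered side chain would be expected to scatter much more.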

> As for end-user argument, I would say that omitted atoms are better than
> high B-factors or zero occupancies,

I exclude high B-factors as an option because if the program does it 
then it is a bug that must be fixed.
I agree, making occupancy zero is kind of abusing it in order to say "I 
do not see this atom". But so far I have no feeling about which is more 
confusing:
- set occupancy to zero in order to say "I don't see it in the map", or
- call something TYR (or whatever else) that, according to its atom 
content, is NOT TYR but ALA.
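
For concreteness, here is what the two choices amount to at the level of 
fixed-column PDB ATOM records. This is a plain-Python sketch, not a PHENIX 
tool: the function names are made up, and the default set of "visible" 
atoms (backbone plus CB) is an assumption you would adjust per residue.

```python
# Atom names treated as visible in the map (ASSUMED default: backbone + CB).
VISIBLE = {"N", "CA", "C", "O", "OXT", "CB"}


def _selected(line, chain, resseq, keep):
    """True for ATOM records of the given residue whose atom is not in `keep`."""
    if not line.startswith("ATOM") or len(line) < 60:
        return False
    if line[21] != chain or int(line[22:26]) != resseq:
        return False
    return line[12:16].strip() not in keep


def zero_occupancy(lines, chain, resseq, keep=VISIBLE):
    """Keep all atoms, but set the occupancy field (columns 55-60) to 0.00."""
    out = []
    for line in lines:
        if _selected(line, chain, resseq, keep):
            line = line[:54] + "  0.00" + line[60:]
        out.append(line)
    return out


def truncate(lines, chain, resseq, keep=VISIBLE):
    """Drop the unseen atoms; the residue name is left unchanged."""
    return [line for line in lines if not _selected(line, chain, resseq, keep)]
```

The first keeps the residue complete and marks the unseen atoms; the 
second gives records like the TYR example above, where the name and the 
atom content no longer agree.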

Finally, when we model data at 4-6 A resolution or so, why do we stick 
atoms into those tubes of density? Do these densities really tell you 
where a specific atom, or often even a residue, is?

Pavel.