The normal probability plot & how to use it.

When an isomorphous difference Patterson is calculated, GraphEnt will plot the normal probability diagram of the input data, together with a dotted reference line of gradient 1.0 and zero intercept (footnote 12). The use of normal probability plots for assessing the usefulness (or otherwise) of a putative derivative is well documented and will not be discussed here (see Howell, P.L. & Smith, G.D. (1992), J. Appl. Cryst., 25, 81-86, and Abrahams, S.C. & Keve, E.T. (1971), Acta Cryst., A 27, 157-165). If you scaled your (macromolecular) data using the program scaleit from the CCP4 suite, then although you have not seen the plot itself, you have seen the variation of its gradient and intercept versus resolution (using the program xloggraph on the .log file written by scaleit). The reason for repeating the calculation here is that the normal probability plot can also be used to select suspect data that do not fit an otherwise linear trend. The important thing is that the selection is not performed on the basis of just the magnitude of the difference (i.e. ||FPH| - |FP||, as happens in scaleit), but on the basis of both the observed amplitudes and their standard deviations. The normal probability plot, together with the "large contributions to χ²" table (files CHIcontributions.dat and CHIcontributions.ps) which is produced after the calculation is over, should allow you to select outliers justifiably (footnote 13).
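
To make the idea concrete, here is a minimal Python sketch of how the points of a normal probability plot can be built from difference data. It is not GraphEnt's own code : the weighted difference (FPH - FP) / sqrt(sig(FP)² + sig(FPH)²) and the plotting position (rank - 0.5) / N are assumptions taken from the usual textbook construction, and the function name is hypothetical. The three input rows are copied from the MAXENT_AUTO.IN listing shown further down (columns h k l FP sig(FP) FPH sig(FPH)).

```python
from math import sqrt
from statistics import NormalDist


def normal_probability_points(reflections):
    """Return (hkl, expected, observed) triples sorted by observed deviate.

    `reflections` is an iterable of (h, k, l, fp, sig_fp, fph, sig_fph).
    """
    deviates = []
    for h, k, l, fp, sig_fp, fph, sig_fph in reflections:
        # Assumed weighted difference: it depends on both amplitudes
        # and on both standard deviations, not on |FPH - FP| alone.
        delta = (fph - fp) / sqrt(sig_fp ** 2 + sig_fph ** 2)
        deviates.append(((h, k, l), delta))
    deviates.sort(key=lambda item: item[1])

    n = len(deviates)
    unit_normal = NormalDist()
    points = []
    for rank, (hkl, delta) in enumerate(deviates, start=1):
        # Assumed plotting position (rank - 0.5) / n; other conventions exist.
        expected = unit_normal.inv_cdf((rank - 0.5) / n)
        points.append((hkl, expected, delta))
    return points


rows = [
    (-30, 0, 9, 89.88602, 3.43968, 123.75751, 12.84017),
    (-30, 0, 10, 126.17858, 3.93975, 110.84611, 10.25688),
    (-30, 0, 11, 38.71215, 5.14720, 36.43570, 15.66436),
]
for hkl, expected, observed in normal_probability_points(rows):
    print(hkl, round(expected, 3), round(observed, 3))
```

Points that follow the reference line have observed deviates close to the expected ones; the outliers discussed below are the points in the tails that depart strongly from that line.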

This is achieved as follows : GraphEnt will write out an ASCII file (named Normplot_tails.dat) which contains the hkl indices of all reflections that comprise the tails of the plot. These points are shown in the graphics window in a different colour. If some of these points deviate significantly from the rest of the plot, then they are candidates for rejection (note that some deviation from linearity will always be present near the tails; what you are looking for is an outstanding deviation).

You can then match what you see in the plot with what is written in Normplot_tails.dat, decide which reflections to exclude, write their indices in an ASCII file with the name REJECT.HKL, and then re-run the program using the MAXENT_AUTO.IN file after adding the keyword REJECT (see page ). Because this sounds quite complicated, I will now give a detailed example to show how it works :

We start with just one .mtz file containing data for a putative derivative :


crystal2 ~/test
crystal2 ~/test d
total 260
-rw-r--r--   1 glykos   sys        262300 Dec 16 15:45 from_scaleit.mtz
crystal2 ~/test
crystal2 ~/test mtzdump hklin from_scaleit.mtz

##########################################################
##########################################################
##########################################################
### CCP PROGRAM SUITE: MTZDUMP     VERSION 3.5: 18/06/98##
##########################################################

...............

OVERALL FILE STATISTICS for resolution range   0.001 -   0.245
=======================
Col Sort    Min    Max    Num      %     Mean     Mean   Resolution   Type Column
num order               Missing complete          abs.   Low    High       label

1 ASC    -46      35      0  100.00    -11.3     18.0  35.81   2.02   H  H
2 NONE     0      11      0  100.00      4.0      4.0  35.81   2.02   H  K
3 NONE     0      31      0  100.00     12.3     12.3  35.81   2.02   H  L
4 NONE    4.4   902.0     3   99.96    92.65    92.65  35.81   2.02   F  FP
5 NONE    0.6    26.2     3   99.96     3.34     3.34  35.81   2.02   Q  SIGFP
6 NONE    8.7   956.3  3500   51.49   137.07   137.07  18.78   2.50   F  FPH
7 NONE    1.2    41.5  3500   51.49     7.95     7.95  18.78   2.50   Q  SIGPH
8 NONE  -73.2    72.3  3718   48.47     0.29     7.29  18.78   2.51   D  DPH
9 NONE    0.0    66.8  3718   48.47    11.85    11.85  18.78   2.51   Q  SIGDPH

No. of reflections used in FILE STATISTICS     7215

LIST OF REFLECTIONS
===================

...............

MTZDUMP:   Normal termination of mtzdump
Times: User:       0.2s System:    0.1s Elapsed:    0:03
crystal2 ~/test
crystal2 ~/test


Then, we run GraphEnt on the centrosymmetric [010] projection :


crystal2 ~/test
crystal2 ~/test GraphEnt h0l 10 3 from_scaleit.mtz

- Assuming that input is a .mtz file. Interpreting ...
..............................................

- Now trying lambda = 0.010000
.............................................
- Initial value for lambda set to 1000.000000

___________________________________________________________________________________________________________________________

- MAXENT starts here

Chi**2 :        1593.822     R : 1.0000       Lambda :   1000.00000       Nobs :     366
Chi**2 :        1588.187     R : 0.9992       Lambda :   1000.00000       Nobs :     366
........................................................................................
Chi**2 :         365.790     R : 0.5621       Lambda :    945.19320       Nobs :     366
803 cycles in 74 seconds, giving an average of 0.092 seconds per cycle.
___________________________________________________________________________________________________________________________

CONVERGENCE ACHIEVED.
The final R-factor between the observed
and calculated amplitudes is  0.5621040

........................................

Normal termination ? (100 seconds)


Now we have all these files :


crystal2 ~/test d
total 652
-rw-r--r--   1 glykos   sys            68 Dec 16 15:51 CHIcontributions.dat
-rw-r--r--   1 glykos   sys         37224 Dec 16 15:51 CHIcontributions.ps
-rw-r--r--   1 glykos   sys         31101 Dec 16 15:49 MAXENT_AUTO.IN
-rw-r--r--   1 glykos   sys         30595 Dec 16 15:48 MAXENT_FROM_MTZ.in
-rw-r--r--   1 glykos   sys           103 Dec 16 15:48 MAXENT_FROM_MTZ_ANOMALOUS.in
-rw-r--r--   1 glykos   sys         10365 Dec 16 15:48 Normal_probability.ps
-rw-r--r--   1 glykos   sys           825 Dec 16 15:48 Normplot_tails.dat
-rw-r--r--   1 glykos   sys        132176 Dec 16 15:50 conventional.map
-rw-r--r--   1 glykos   sys        262300 Dec 16 15:45 from_scaleit.mtz
-rw-r--r--   1 glykos   sys        132176 Dec 16 15:51 maxent.map
crystal2 ~/test


Both CHIcontributions.dat and Normplot_tails.dat point to problems with reflections 0,0,11 and -12,0,8 :


crystal2 ~/test
crystal2 ~/test
crystal2 ~/test more CHIcontributions.dat
0    0   11          55.19882
-12    0    8          59.04416
crystal2 ~/test
crystal2 ~/test
crystal2 ~/test more Normplot_tails.dat
0     0     6            -2.99385       -30.69588
2     0     4            -2.64107       -28.07780
-12     0     8            -2.46310       -26.91301
0     0    11            -2.34000       -25.43124
-4     0     8            -2.24461       -22.29077
0     0     7            -2.16611       -21.85669
4     0    10            -2.09905       -18.73302
-16     0     5            +2.04028       +10.47118
8     0     6            +2.09905       +10.55087
4     0     4            +2.16611       +10.55754
-8     0     9            +2.24461       +11.08962
6     0     3            +2.34000       +11.90654
4     0     6            +2.46310       +12.23197
-16     0    10            +2.64107       +12.45890
2     0     6            +2.99385       +13.12762
crystal2 ~/test
crystal2 ~/test
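
The comparison of the two files can also be done mechanically. The following Python sketch (the function names are hypothetical, and the only thing assumed about the file formats is that the first three columns hold h, k and l) lists the reflections flagged by both files, using lines copied from the listings above :

```python
def read_hkl(lines):
    """Collect (h, k, l) from the first three columns of each non-empty line."""
    indices = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            indices.add(tuple(int(f) for f in fields[:3]))
    return indices


def flagged_by_both(chi_lines, tails_lines):
    """Reflections present in both CHIcontributions.dat and Normplot_tails.dat."""
    return read_hkl(chi_lines) & read_hkl(tails_lines)


chi_lines = [
    "0    0   11          55.19882",
    "-12    0    8          59.04416",
]
tails_lines = [
    "0     0     6            -2.99385       -30.69588",
    "2     0     4            -2.64107       -28.07780",
    "-12     0     8            -2.46310       -26.91301",
    "0     0    11            -2.34000       -25.43124",
    "2     0     6            +2.99385       +13.12762",
]
print(sorted(flagged_by_both(chi_lines, tails_lines)))
# prints [(-12, 0, 8), (0, 0, 11)]
```

With real files, pass the open file objects instead of the lists of strings.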


The normal probability plot suggests that all seven reflections in the lower left-hand side corner are suspect. Its somewhat sigmoidal shape suggests the presence of non-normally distributed (systematic) errors :

Let's repeat the calculation with these seven reflections excluded. The first step is to create a file with the name REJECT.HKL whose first three columns contain the indices of the reflections to be excluded :

crystal2 ~/test
crystal2 ~/test cp Normplot_tails.dat REJECT.HKL
crystal2 ~/test ed REJECT.HKL
crystal2 ~/test more REJECT.HKL
0     0     6            -2.99385       -30.69588
2     0     4            -2.64107       -28.07780
-12     0     8            -2.46310       -26.91301
0     0    11            -2.34000       -25.43124
-4     0     8            -2.24461       -22.29077
0     0     7            -2.16611       -21.85669
4     0    10            -2.09905       -18.73302
crystal2 ~/test
crystal2 ~/test
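
If you prefer not to edit the file by hand, the same selection can be sketched in Python. This is only an illustration under stated assumptions : the fourth column of Normplot_tails.dat is taken to be the expected normal quantile (negative for the lower tail), the function name is hypothetical, and writing just the three indices per line is assumed to be acceptable because only the first three columns of REJECT.HKL are said to matter.

```python
def lower_tail_indices(tails_lines, max_expected=0.0):
    """Return 'h k l' strings for rows whose expected quantile is below max_expected."""
    selected = []
    for line in tails_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or incomplete lines
        h, k, l = (int(f) for f in fields[:3])
        expected = float(fields[3])  # assumed: column 4 is the expected quantile
        if expected < max_expected:
            selected.append(f"{h:5d} {k:5d} {l:5d}")
    return selected


sample = [
    "0     0     6            -2.99385       -30.69588",
    "-12     0     8            -2.46310       -26.91301",
    "-16     0     5            +2.04028       +10.47118",
    "2     0     6            +2.99385       +13.12762",
]
for row in lower_tail_indices(sample):
    print(row)
```

The returned lines can then be written to REJECT.HKL. Whether the whole lower tail should be rejected is still a decision to be made by looking at the plot, not by the script.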


Then, we edit the file MAXENT_AUTO.IN and we add the keyword REJECT :


crystal2 ~/test
crystal2 ~/test ed MAXENT_AUTO.IN
crystal2 ~/test more -20 MAXENT_AUTO.IN
REJECT
CELL               94.14900        24.17000        64.31901        90.00000       130.36700        90.00000
SPACEGROUP   1
MAP_FORMAT   CCP4
DIFF_PATT
PERMUTATION  3 1 2
GRID            128    256      1
GRACYCLES    80
GRATWOWINDOWS
REFLECTIONS
-30     0     9          89.88602         3.43968       123.75751        12.84017
-30     0    10         126.17858         3.93975       110.84611        10.25688
-30     0    11          38.71215         5.14720        36.43570        15.66436
-30     0    12         165.68549         4.99690       154.67838         7.42726
-30     0    13          38.65771         4.30664        43.59790        16.74030
-30     0    14         158.72888         4.75254       159.23166         5.49528
-30     0    15          86.40644         3.25414        84.79811        15.51947
-30     0    16         150.11438         4.57498       146.66685         5.11194
-30     0    17         132.07582         4.08662       164.78131         5.11169
-30     0    18          21.89952         8.06613        23.18951        10.87039
.................................................................................
crystal2 ~/test
crystal2 ~/test
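
The edit itself amounts to adding one line. As a small Python sketch (the function name is hypothetical; placing the keyword on the first line simply copies the listing above, and it is an assumption that this position is what GraphEnt requires) :

```python
def add_reject_keyword(text):
    """Return the input-file text with a REJECT line added at the top, once."""
    if any(line.strip() == "REJECT" for line in text.splitlines()):
        return text  # keyword already present, leave the file alone
    return "REJECT\n" + text


original = "SPACEGROUP   1\nMAP_FORMAT   CCP4\nDIFF_PATT\nREFLECTIONS\n"
print(add_reject_keyword(original))
```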


... and we run it again, but this time giving as input the MAXENT_AUTO.IN file :


crystal2 ~/test
crystal2 ~/test GraphEnt MAXENT_AUTO.IN

Keyword           REJECT : 7 reflections specified in REJECT.HKL.
Keyword             CELL : Cell dimensions set to  94.15  24.17  64.32  90.00 130.37  90.00
Keyword       SPACEGROUP : space group number set to 1
Keyword       MAP_FORMAT : CCP4 map file selected.
Keyword        DIFF_PATT : Difference Patterson map run [h k l FP sig(FP) FPH sig(FPH)].
Keyword      PERMUTATION : Permutation set to 3 1 2
Keyword             GRID : Grid set to   128   256     1
Keyword        GRACYCLES : Plot every 80 cycles.
Keyword    GRATWOWINDOWS : Will keep conventional map plot.
Keyword      REFLECTIONS : start reading reflections.
Reflection rejected :   -12     0     8
Reflection rejected :    -4     0     8
Reflection rejected :     0     0     6
Reflection rejected :     0     0     7
Reflection rejected :     0     0    11
Reflection rejected :     2     0     4
Reflection rejected :     4     0    10
___________________________________________________________________________________________________________________________

...........................................................................................................................

Normal termination ? (32 seconds)


NOTE WELL : Because the normal probability plot is calculated with data expanded to P1, each point on the plot may actually correspond to a superposition of several symmetry-equivalent reflections. When you reject data, you MUST reject all symmetry-equivalent reflections that are present in your P1 data set. Failure to do so will show up in your maps as an absence of the expected symmetry elements. Under normal circumstances the Normplot_tails.dat file will contain all symmetry-equivalent reflections, unless some of them fall near the assumed linear part of the plot. In that case, I'm afraid that you will have to add the indices of the missing equivalents to the REJECT.HKL file manually (sorry).
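
Finding the missing equivalents can be sketched in Python as follows. Nothing here comes from GraphEnt : the function names are hypothetical, the rotation matrices must be supplied by you for your own space group, and the twofold axis along b used in the example is purely an illustrative assumption. Whether Friedel mates are present in your P1 data set is also something you have to check, hence the optional flag.

```python
def equivalents(hkl, rotations, friedel=False):
    """Return the set of indices equivalent to hkl under the given rotations."""
    h, k, l = hkl
    found = set()
    for r in rotations:
        # Row vector (h k l) multiplied by the 3x3 rotation matrix r.
        eq = tuple(h * r[0][j] + k * r[1][j] + l * r[2][j] for j in range(3))
        found.add(eq)
        if friedel:
            found.add(tuple(-x for x in eq))
    return found


def missing_equivalents(rejects, present, rotations, friedel=False):
    """Equivalents of rejected reflections that are in the data but not yet rejected."""
    rejects = set(rejects)
    needed = set()
    for hkl in rejects:
        needed |= equivalents(hkl, rotations, friedel)
    return (needed & set(present)) - rejects


IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
TWOFOLD_B = ((-1, 0, 0), (0, 1, 0), (0, 0, -1))  # assumed example operator

rejects = [(2, 1, 4)]
present = {(2, 1, 4), (-2, 1, -4), (4, 1, 6)}
print(missing_equivalents(rejects, present, [IDENTITY, TWOFOLD_B]))
# prints {(-2, 1, -4)}
```

Any index reported by missing_equivalents would have to be appended to REJECT.HKL before re-running the program.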

Footnotes

... intercept (footnote 12)
Plotting will be performed only if GraphEnt was compiled with graphics support (i.e. with PGPLOT, see section 6.1). Even in the absence of PGPLOT, the normal probability plot will still be calculated, and the numbers will be written to an ASCII file (MAXENT_normal_prob_plot.dat) which can be used as input to almost any plotting program.
... outliers (footnote 13)
As I understand it, the choice to treat as suspect (or even to reject) all reflections that give values of ||FPH| - |FP|| greater than some multiple of the rmsd of the isomorphous differences is due to the inability of the conventional Fourier synthesis to take into account the standard deviations of the measurements. Let me give an example : suppose that for the 312 reflection, |FP312| = 103, σ(|FP312|) = 14, |FPH312| = 183, σ(|FPH312|) = 120, and assume for the sake of argument that the rmsd of the observed differences is 20 e-. Then, we can ignore the fact that the measurement of |FPH312| is lousy, and reject this reflection as "highly improbable". This is of course nonsense : the standard deviation of the difference |FPH312| - |FP312| is about 121 e- when the two standard deviations are added in quadrature (134 e- if they are simply summed), which means that with the observed difference of 80 e- we cannot even say at the 50% significance level that the amplitudes |FP312| and |FPH312| are indeed different. The trouble is that if you include the reflection in your Fourier synthesis, it will probably make a mess of your map, because in the conventional synthesis you treat all differences as if they had zero standard deviation. Needless to say, the maxent map is not only insensitive to such differences, but you should actually avoid rejecting anything until you are certain that for some reason the standard deviations are wrong.
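
The arithmetic of the 312 example can be checked with a few lines of Python. The sketch assumes normally distributed errors and a two-sided test; both ways of combining the standard deviations (in quadrature, which assumes independent errors, and as a plain sum) are shown. The helper name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

fp, sig_fp = 103.0, 14.0      # |FP312| and its standard deviation
fph, sig_fph = 183.0, 120.0   # |FPH312| and its standard deviation

difference = fph - fp
sigma_quadrature = sqrt(sig_fp ** 2 + sig_fph ** 2)  # independent errors assumed
sigma_sum = sig_fp + sig_fph                         # plain sum, an upper bound


def two_sided_p(diff, sigma):
    """Probability of a deviate at least this large if the true difference is zero."""
    z = abs(diff) / sigma
    return 2.0 * (1.0 - NormalDist().cdf(z))


print(difference, round(sigma_quadrature, 1), sigma_sum)
print(round(two_sided_p(difference, sigma_quadrature), 2))
print(round(two_sided_p(difference, sigma_sum), 2))
```

With either estimate the probability comes out above 0.5, which is the point made above : the observed difference is not significant even at the 50% level.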

NMG, Nov 2002