PON parameterization

Introduction
This is additional infomation to the PON parameterization as described by http://www.teokem.lu.se/~ulf/Methods/ponparm.html

The method can be found in:

"Automated Molecular Mechanics Parameterization with Simultaneous Utilization of Experimental and Quantum Chemical Data", Per-Ola Norrby and Tommy Liljefors, ''J. Comput. Chem.'' 1998, 19, 1146-1166.

New implementation for Amber (Dec. 2006)'': P. Rydberg, L. Olsen, P.-O. Norrby and U. Ryde'' "A general transition-state force field for cytochrome P450 hydroxylation", J. Chem. Theory Comput. 2007, 3, 1765-1773

The example used is a parameterization carried out for the active site of [Fe]-hydrogenase (Hedegård, Ryde, In prep)

Preliminary Setup
The first many steps (1. to 11.) do not involve any parameter optimization but serve as preparation of the relevant files (obtaining a QM Hessian, and preparing amber input files with correct atom types etc.).

1. Structure and frequencies with Gaussian or Turbomole and calculate RESP charges (Gaussian)
With turbomole, optimize the structure as usual with
 * Structure and Frequencies (Gaussian or Turbomole)

jobex -backup -c 200 -ri

From this structure, insert:

$dipgrad  file=hessian

$hessian  file=hessian

$vibrational normal modes  file=hessian

$maxcor (70% of available memory is recommended)>

and then use

aoforce -backup -c 200 -ri

Further infomation on Turbomole in Lund can be found at http://www.teokem.lu.se/~ulf/Methods/turbomole.html
 * RESP charges is best done with Gaussian (make an example input file). Usually one can use the Merz-Kollmann (MK) scheme (see http://www.teokem.lu.se/~ulf/Methods/resp.html).
 * Construct pdb (from turbomole, use turbotopdb > pdb or mimic > pdb - the latter is just a Lund alias)
 * Use antechamber (see Link above - the link also contain information about more tricky cases such as polarizable force fields, or if there are problems with antechamber). If antechamber works, many of the steps below can be skipped and amberfiles res.in and res.dat will be genereted with correct M, B, E. 3 symbols).

2. Amber input files with antechamber or define
Describe: Note that we here only use describe to get parameters (start-guesses) and amber files (.in and .dat). Describe will also do several things in addition concerning the geometry - this is can be skipped.
 * Use antechamber (see Link above - the link also contain information about more tricky cases such as polarizable force fields, or if there are problems with antechamber). If antechamber works, many of the steps below can be skipped and amberfiles res.in and res.dat will be genereted with correct M, B, E. 3 symbols).
 * 2. In case antechamber does not work, we use the program describe to build an amber topology file, typically called residue.in e.g. feh.in in our case (do not renumber the atoms if not necessary). Describe extracts force constants of bonds, angles and dihedrals from the Hessian using the method desribed by: J. M. Seminario, Int. J. Quantum Chem. 1996, 60, 1271-1277.
 * 1) Run describe and follow instructions (see also comments below)
 * 2) Select input (turbomole here)
 * 3) Select scaling (for B3LYP, 1 is generally fine)
 * 4) The next things can be skipped for PON parametrization (geometry related)
 * 5) construct atmtyp file (press enter). Note: When we wish to reparameterize an active site, we will make a unique atomtype for each individual atom (and thus optimize parameters for all atoms). Methyl group hydrogens can, however, get the same atom types.
 * 6) The atmtype file will have the atom names from turbomole. Modify the atmtyp file (see template below), and save a copy of old res.in and res.dat files and rerun describe the same way as above but this time read in the atmtyp.
 * 7) Check the res.in file:
 * 8) * Add RESP charges if missing
 * 9) * Relabel atom names if necessary and fix M, S, B, 3 and E labels.
 * 10) * Note: M, S, B, 3 and E are most likely not correct for when constructed for metal centers by describe (in describe, default is M and heavy atoms and E on hydrogens). Therefore the file must be modified. M = Main chain, E = End atom, S = Side chain,3 = Usually for C in methyl groups). There is a short tutorial on these labels at http://ambermd.org/doc/prep.html Usually this will also mean that one has to relabel the atoms in the res.in file
 * 11) * Add LOOPS if missing (see example res.in file below) - this should be done for rings
 * 12) * Add IMPROPER if missing (see example res.in file below). Determine whether rings are flat (see phenyl alanine's .in file for an example or see below for feh.in)

atmtyp file (template below, taken from part of the active site of FeH2ase)
Parameters for Fe-H2ase active site, atom types unique for all atoms except CH3 and CH2, 7/1-16 # Title followed by blank line 

FEH </tt> # residue name

HA    Hc # atom_name (use the name from the pdb), atmom _type</tt>

CB    Cc</tt>

HB2   Hc</tt>

HB3   Hc</tt>

SG    Sc</tt>
 * 1) IMPORTANT: The atomtype must NOT coincide with existing atomtypes in Amber - this will lead to distaster!

res.in file example
0    0    2</tt>

feh.dat</tt>

Parameters for Fe-H2ase active site, atom types unique for all atoms except CH3</tt>

feh.dat</tt>

  FEH  INT     1</tt>

 CHANGE OMIT DU   BEG</tt>

   0.00000</tt>

   1  DUMM  DU    M     0.0000  0.0000   .0000   .0000</tt>

   2  DUMM  DU    M     0.0000  1.0000   .0000   .0000</tt>

   3  DUMM  DU    M     1.0000  1.0000   .0000   .0000</tt>

<tt>   4  HA    Hc    M     8.1434 -2.8437 21.6027  0.086770</tt>

<tt>   5  CB    Cc    M     7.6796 -3.8276 21.7690 -0.127063</tt>

<tt>   6  HB2   Hc    E     8.4609 -4.5998 21.6968  0.060572</tt>

<tt>   7  HB3   Hc    E     6.9388 -4.0010 20.9739  0.060572</tt>

<tt>   8  SG    Sc    M     6.9326 -3.8194 23.4452 -0.447847</tt>

<tt>   9  FE    Fe    M     6.1257 -5.9659 24.0301  0.501975</tt>

<tt>  10  CO1   Co    S     5.4501 -7.5149 24.6158  0.059706</tt>

<tt>  11  OC1   Oc    E     5.0305 -8.5111 25.0039 -0.188459</tt>

<tt>  12  CO2   cO    S     6.0879 -6.4412 22.3144  0.189272</tt>

<tt>  13  OC2   oC    E     6.0468 -6.7207 21.2028 -0.218542</tt>

<tt>  14  OW    Ow    B     4.1589 -4.8577 23.6759 -0.688580</tt>

<tt>  15  HW1   hW    E     3.7597 -5.1557 22.8461  0.384445</tt>

<tt>16  HW2   Hw    E     4.7340 -4.0969 23.4131  0.286564</tt>

<tt>  17  C8    C8    M     7.9473 -6.5351 24.1731  0.278762</tt>

<tt>  18  O8    o8    E     8.5986 -7.1974 23.3982 -0.472728</tt>

<tt>  19  C7    C7    M     8.6508 -5.9355 25.4082 -0.067468</tt>

<tt>  20  H71   h7    E     9.3054 -6.7007 25.8607  0.092807</tt>

<tt>  21  H72   h7    E     9.3150 -5.1425 25.0213  0.092807</tt>

<tt>  22  C6    C6    M     7.6916 -5.3643 26.4039 -0.207512</tt>

<tt>  23  C5    C5    S     8.1095 -4.9145 27.6584  0.091602</tt>

<tt>  24  C5M   C9    3     9.5606 -4.9479 28.0657 -0.109615</tt>

<tt>  25  HM1   h6    E     9.9441 -5.9815 28.1061  0.049705</tt>

<tt>  26  HM2   h6    E    10.1904 -4.4030 27.3435  0.049705</tt>

<tt>  27  HM3   h6    E     9.7038 -4.4957 29.0548  0.049705</tt>

<tt>  28  N1    Np    M     6.3955 -5.3413 26.0071 -0.027346</tt>

<tt>  29  C2    C2    M     5.4611 -4.9164 26.8651  0.226877</tt>

<tt>  30  O2    o2    S     4.1972 -5.0319 26.4304 -0.420783</tt>

<tt>  31  H2    h8    E     3.5798 -4.7157 27.1012  0.387308</tt>

<tt>  32  C3    C3    M     5.7633 -4.4030 28.1404 -0.107862</tt>

<tt>  33  C3M   C1    3     4.6777 -3.8845 29.0533 -0.158906</tt>

<tt>  34  H31   h9    E     5.0438 -3.0787 29.7127  0.066277</tt>

<tt>  35  H32   h9    E     3.8529 -3.4194 28.4873  0.066277</tt>

<tt>  36  H33   h9    E     4.2393 -4.6670 29.6993  0.066277</tt>

<tt>  37  C4    C4    M     7.1145 -4.4222 28.5276  0.139582</tt>

<tt>  38  O3P   o3    M     7.5112 -3.9809 29.7415 -0.446029</tt>

<tt>  39  H3P   h0    E     6.7491 -3.7001 30.2644  0.401176</tt>

<tt>LOOP</tt>

<tt>C4   C5</tt>

<tt>IMPROPER</tt>

<tt>C2   C6   N1   FE</tt>

<tt>C3   N1   C2   O2</tt>

<tt>C4   C2   C3   C3M</tt>

<tt>C5   C3   C4   O3P</tt>

<tt>C6   C4   C5   C5M</tt>

<tt>N1   C5   C6   C7</tt>

<tt>DONE</tt>

<tt>STOP</tt>

3. Construct prmtop and prmcrd files

 * Copy .dat file to frcmod.hemts (cp feh.dat frcmod.hemts)
 * Check non-bonded parameters in the frcmod.hemts (.dat) file: ALL hydrogens should have vdW parameters - parameters for sp2 and sp3 carbons should be different (see Amber force field for "inspiration")
 * run tleap -f leap_com (input: A pdb file for the residue with RESP chages, the res.in file and the res.dat (here called "frcmod.hemts"). A sample leap_com input is given below
 * Check parameters with changeparm
 * use changeparm (read prmtop and the pdb file in, i.e. here feh.pdb?)
 * Run command e (energies should be close to zero - changeparm will suggest new values, right?
 * Use new values and run changeparm again

leap_com input sample
Sample taken from FeH2ase

<tt>source leaprc.ff14SB # Use Amber 14 force field</tt>

<tt>addAtomTypes { # Add undefined atom types to prevent errors in tleap</tt>

<tt> {"Hc" "H" "sp3"}</tt>

<tt> {"Hw" "H" "sp3"}</tt>

<tt> {"hW" "H" "sp3"}</tt>

<tt> {"h6" "H" "sp3"}</tt>

<tt> {"h7" "H" "sp3"}</tt>

<tt> {"h8" "H" "sp3"}</tt>

<tt> {"h9" "H" "sp3"}</tt>

<tt> {"h0" "H" "sp3"}</tt>

<tt> {"C1" "C" "sp3"}</tt>

<tt> {"C7" "C" "sp3"}</tt>

<tt> {"C9" "C" "sp3"}</tt>

<tt> {"Cc" "C" "sp3"}</tt>

<tt> {"C2" "C" "sp2"}</tt>

<tt> {"C3" "C" "sp2"}</tt>

<tt> {"C4" "C" "sp2"}</tt>

<tt> {"C5" "C" "sp2"}</tt>

<tt> {"C6" "C" "sp2"}</tt>

<tt> {"C8" "C" "sp2"}</tt>

<tt> {"Co" "C" "sp2"}</tt>

<tt> {"cO" "C" "sp2"}</tt>

<tt> {"Np" "N" "sp2"}</tt>

<tt> {"o2" "O" "sp3"}</tt>

<tt> {"o3" "O" "sp3"}</tt>

<tt> {"o8" "O" "sp2"}</tt>

<tt> {"Ow" "O" "sp2"}</tt>

<tt> {"oC" "O" "sp2"}</tt>

<tt> {"Oc" "O" "sp2"}</tt>

<tt> {"Sc" "O" "sp3"}</tt>

<tt> {"Fe" "Fe" "sp3"}</tt>

<tt> }</tt>

<tt>loadAmberPrep feh.in</tt>

<tt>loadAmberParams frcmod.hemts</tt>

<tt>x=loadpdb feh.pdb</tt>

<tt>saveamberparm x prmtop prmcrd</tt>

<tt>quit</tt>

4. Construct the file hessian (this is the MM hessian)

 * Save a copy of the QM hessian from turbomole (cp hessian hessian-qm)

Note that for writing out the hessian a modified version of AMBER is needed (the modifications needed are shown here http://www.teokem.lu.se/~ulf/Methods/ponparm.html#Modifications_of_Amber_nmode). Test that the inputs are constructed correctly with: An MM hessian file (called "hessian" should now be constructed).
 * Construct the input files nmodemm.in (http://www.teokem.lu.se/~ulf/Methods/ponparm.html#Sample_nmodemm.in_file) and nmode.in (http://www.teokem.lu.se/~ulf/Methods/ponparm.html#Sample_nmode.in_file).
 * Run nmode -O -i nmodemm.in -o nmodemm.out -p prmtop -c prmcrd -r mdrest
 * Run nmodeh -O -i  nmode.in -o  nmode.out -p prmtop -c mdrest -r temp

1. Make PON input

 * run mkpar > paropt

the mkpar script uses the *dat file to construct the input in the right format for the PON program (CalcLoop). paropt contains 5 columns:


 * 1) the line in the *dat file # columns in .dat file  # the current value # step factor # format

The "values" are e.g. force constants and equilibrium distances/angles for bonded and angular parameters. For dihedrals, only the force constants are optimized.
 * Check paropt - remove parameters that should not be fitted (we did not remove any - only those already zeroth in the <tt>frcmod.hemts/*dat file</tt>

2. Make D1 and scaling files (contains Hessian to be read by PON programs )

 * run GetHess_t turbomole.out (gives D1.cal and scaling - if D1 exists, save a copy D1-orig)

3. Make list of dihedrals (large i.e. > 150 degrees should be removed from penalty function)

 * run PonInit

4. Run PON CalcLoop (can take a while, so submit it to the queue)

 * NOTE: The input in frcmod.hemts is heavily formatted. All commas must be aligned!
 * Check the paropt file for "***" This is an format error that sometimes occur!


 * run CalcLoop (output Loop.log)
 * Check convergence of penalty function with poncheck.sh (should be around 0.25*#data_points)
 * Check results and possibly re-run CalcLoop

5. Analysis and further parameter optimization after first run
It is normally not done just after one iteration.
 * run ponresult > pon.results. Look for (follows roughly the order of pon.results)
 * The penalty function value i.e. Chi^2 in Norrby & Liljefors (1998)
 * Number of negative second derivaties
 * Negative JTJ elements; J = Jacobian matrix, I guess -see Eqs. 6 and 7 in Norrby & Liljefors (1998)
 * Cases where the first derivative > second derivative: As quoted from Norrby & Liljefors 1998): "When close to an optimum value parameters set, the second derivatives should be positive and large compared to the first derivates" - When the opposite is the case, the corresponding parameter is probably bad...
 * # of zeroth force constants for - divide them into 1) bonds (bc), 2) angles (a) and dihedrals (dc)
 * # Poorly determinated parameters
 * r^2 (Pearson correlation coefficient)
 * MAD
 * Number of elements that differ > 100 between QM and MM hessian
 * RMS contributions to bonds ("distances and energy"), "angles and energy", dihedrals (w also non-zero fc)


 * Change parameters that needs change in frcmod.hemts (e.g. angles with force constants < 10 and equilibrium angles > 120 should be set large/smaller


 * Make copy of paropt (to paropt.old) and run ponmkpar. Run now CalcLoop again
 * NOTE: it can be necessary to re-run mkpar (MkPar > paropt) in between CalcLoop optimizations. (e.g. sometimes step-values are set to zero!). Therefore: Always carefully check paropt before starting the calculation!
 * CalcLoopHC is buggy: Do not run this.
 * When to stop: Low penalty function and no negative second derivatives and no zeroed bond or angle force constants is typically a good sign!

6. Merging PON with amber
The finished force field needs to be merged with the amber force fields. This is achieved by runnining

tleap -s -f leap.in

A sample leap file is given below with examples.
 * Atomtypes (as given in the xxx files) have to be defined in the amber style

/common/sw/alarik/pkg/bio/AMBER/Amber14/dat/leap/parm/
 * Make and new "junction.dat" file: Around the junctions to the active site, amber will not be able to locate even standard paramters (bonds, angles and dihedrals),  since parameters the standard amino acid atomtypes (as defined in amber) and the new PON atomtypes are not defined. The paramters can be adapted from the standard ones, but applying the PON atomtypes instead
 * Inspiration for *.in and *dat files can be found in

/common/sw/alarik/pkg/bio/AMBER/Amber14/dat/leap/prep/