Gromacs

Cut-off schemes

    This page describes the differences between the traditional cut-off scheme used in Gromacs and the new Verlet scheme introduced in version 4.6.

    Group vs Verlet pair-lists

    Traditionally, Gromacs has used pair-lists based on groups of atoms. These groups were originally charge groups, which were necessary with plain cut-off electrostatics. With the use of PME (or reaction-field with a buffer), charge groups are no longer necessary, and most force fields and MD packages do not use them. In Gromacs the group-based cut-off scheme is still used, mainly because it allows for extremely efficient non-bonded kernels for water, which is the most abundant molecule in (bio-)molecular simulations. The group cut-off scheme can be combined with a buffered pair-list, but this is tedious, as it needs to be combined with tabulated potentials with continuous energy and force at the cut-off.

    The main reason for implementing the more common buffered Verlet list scheme in version 4.6 was that a group scheme is inconvenient for streaming architectures such as GPUs. The new Verlet scheme also works well on CPUs with SSE and AVX. The group pair-list scheme is faster only for systems with a lot of water where energy conservation is not of primary concern. The Verlet list scheme has buffered neighbor lists with exact cut-offs. Both the LJ and Coulomb potentials are by default shifted to zero by subtracting the value at the cut-off. This ensures that the energy is the integral of the force. Still, it is advisable to have small forces at the cut-off, and hence to use PME or reaction-field with infinite epsilon.
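
    As a minimal sketch of the potential shift described above, the following Python snippet subtracts the value at the cut-off from a plain Lennard-Jones potential and compares the shifted energy with a numerical integral of the unmodified force from r to the cut-off. The function names, the reduced units (C6 = C12 = 1) and the cut-off of 2.5 are illustrative assumptions, not Gromacs code.

```python
def lj(r, c6=1.0, c12=1.0):
    """Plain Lennard-Jones potential V(r) = C12/r^12 - C6/r^6."""
    return c12 / r**12 - c6 / r**6


def lj_force(r, c6=1.0, c12=1.0):
    """Radial LJ force F(r) = -dV/dr; the shift does not change it."""
    return 12.0 * c12 / r**13 - 6.0 * c6 / r**7


def lj_shifted(r, rc, c6=1.0, c12=1.0):
    """Potential-shifted LJ: V(r) - V(rc) inside the cut-off, zero beyond it."""
    if r >= rc:
        return 0.0
    return lj(r, c6, c12) - lj(rc, c6, c12)


def integrate_force(r, rc, steps=100000):
    """Midpoint-rule integral of F from r to rc."""
    h = (rc - r) / steps
    return sum(lj_force(r + (k + 0.5) * h) for k in range(steps)) * h


rc = 2.5  # assumed cut-off in reduced units
r = 1.2   # assumed pair distance in reduced units
print(lj_shifted(r, rc), integrate_force(r, rc))
```

    The two printed numbers should agree to within the accuracy of the numerical integration, and the shifted potential goes continuously to zero at the cut-off.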

    The Verlet list scheme uses a new code path for the non-bonded interactions. In this code path charge groups are completely ignored. Particle pair forces (and energies when necessary) are calculated in groups of 4 vs 4 or 4 vs 8 particles. This is convenient for streaming but leads to a significant number of zero interactions being calculated beyond the cut-off; this does not happen in the standard setup with the group cut-off scheme.
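
    The idea of evaluating whole clusters and zeroing the pairs beyond the cut-off can be sketched as follows. This is a plain Python illustration under stated assumptions (two hypothetical 4-particle clusters on a line, LJ in reduced units with C6 = C12 = 1, a cut-off of 1.0); the real kernels are SIMD/GPU code and differ in many details.

```python
import math


def cluster_pair_energy(cluster_i, cluster_j, rc):
    """Evaluate every i-j pair of two clusters, masking pairs beyond rc.

    Returns (energy, pairs_evaluated, pairs_within_cutoff).
    """
    v_rc = 1.0 / rc**12 - 1.0 / rc**6  # shift so that V(rc) = 0
    energy = 0.0
    evaluated = 0
    within = 0
    for xi in cluster_i:
        for xj in cluster_j:
            r = math.dist(xi, xj)
            # A streaming kernel computes all lanes and multiplies by a mask
            # instead of branching; pairs beyond rc contribute exactly zero.
            mask = 1.0 if r < rc else 0.0
            energy += mask * (1.0 / r**12 - 1.0 / r**6 - v_rc)
            evaluated += 1
            within += int(mask)
    return energy, evaluated, within


# Two hypothetical clusters of 4 particles each, placed along the x axis.
cluster_i = [(0.3 * k, 0.0, 0.0) for k in range(4)]
cluster_j = [(1.2 + 0.3 * k, 0.0, 0.0) for k in range(4)]

energy, evaluated, within = cluster_pair_energy(cluster_i, cluster_j, rc=1.0)
print(f"{within} of {evaluated} evaluated pairs are inside the cut-off")
```

    In this toy geometry only 6 of the 16 evaluated pairs lie inside the cut-off; the other 10 are the "zero interactions" mentioned above.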

    Non-bonded scheme feature comparison

    All Gromacs features not directly related to non-bonded interactions are supported in both schemes. The aim is to support all Gromacs non-bonded features in the new Verlet scheme, but the implementation might take some time. In the 4.6 release the most commonly used features are supported, such that all standard simulations should run by just setting the mdp option cutoff-scheme=Verlet. A table of compatible features is given below.

    Non-bonded interaction feature      group           Verlet
    unbuffered cut-off scheme           X
    buffered cut-off scheme             X               X
    exact cut-off                       shift/switch    X
    cut-off                             X               X
    reaction-field                      X               X
    PME                                 X               X
    shifted interactions                force+energy    energy
    switched interactions               X
    dispersion correction               X               X
    non-periodic systems                X               Z + walls
    implicit solvent                    X
    free energy perturbed non-bondeds   X
    group energy contributions          X               only CPU
    energy group exclusions             X
    pull code, restraints, freeze, ...  X               X
    AdResS multiscale                   X
    OpenMP multi-threading              only PME        X
    native GPU support                                  X

    Performance

    The performance of the group cut-off scheme depends very much on the composition of the system and the use of buffering. Due to very efficient non-bonded kernels for interactions with water, anything with a lot of water runs very fast. But if you want properly buffered interactions, you need to add a buffer that takes into account both charge group size and diffusion. This makes simulations much slower. The performance of the Verlet scheme with the new non-bonded kernels is independent of system composition, and the scheme is intended to always run with a buffered pair-list. Typically the buffer size is 0 to 10% of the cut-off, so you could gain a bit of performance by reducing or removing the buffer, but this might not be a good trade-off.
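
    A rough, back-of-the-envelope sketch of why buffering costs performance: assuming a uniform particle density, the number of pairs in the list grows with the volume of the list sphere, that is, with the cube of the ratio between the list radius and the cut-off. The helper name below is hypothetical, and the estimate covers only the pair count, not the actual run time of any kernel.

```python
def pair_count_ratio(buffer_fraction):
    """Pairs within the list radius relative to pairs within the cut-off.

    Assumes uniform density. buffer_fraction is the buffer size as a
    fraction of the cut-off, so list radius = cut-off * (1 + buffer_fraction).
    """
    return (1.0 + buffer_fraction) ** 3


for fraction in (0.0, 0.05, 0.10):
    print(f"buffer {fraction:4.0%}: {pair_count_ratio(fraction):.3f}x pairs")
```

    Under this assumption a 10% buffer corresponds to about 1.33 times as many listed pairs as an unbuffered list.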

    The table below shows a performance comparison of most of the relevant setups. Any atomistic model will have performance comparable to tips3p (which has LJ on the hydrogens), unless a united-atom force field is used. The performance of a protein in water will be a mix of the tip3p and tips3p performance. The group scheme is optimized for water interactions, which means a single charge group containing one atom with LJ and 2 or 3 atoms without LJ. The "group" non-bonded kernels for water are roughly twice as fast as those for a comparable system with LJ and/or without charge groups. The implementation of the Verlet cut-off scheme has no specific optimizations, except for only calculating half of the LJ interactions if less than half of the particles have LJ. Note that the Verlet scheme completely ignores charge groups. For molecules solvated in water the scaling of the Verlet scheme is better than that of the group scheme, as the load is more balanced.

    Water, 3000 atoms, 1.0 nm cut-off, PME with 0.11 nm grid, dt=2 fs, Intel Core i7 2600, 3.4 GHz + Nvidia GTX660Ti
    system                    group, unbuffered   group, buffered   Verlet, buffered   Verlet, buffered, GPU
                              8 MPI-threads       8 MPI-threads     8 OpenMP threads   8 OpenMP threads
    tip3p, charge groups      208 ns/day          116 ns/day        170 ns/day         450 ns/day
    tips3p, charge groups     129 ns/day          63 ns/day         162 ns/day         450 ns/day
    tips3p, no charge groups  104 ns/day          75 ns/day         162 ns/day         450 ns/day

    How to use the Verlet scheme

    You can use the Verlet cut-off scheme simply by setting in your mdp file:

    cutoff-scheme = Verlet
    

    The verlet-buffer-drift option will by default add a pair-list buffer for a target energy drift of 0.005 kJ/mol/ns per atom. The effective drift is usually much lower, as grompp assumes constant particle velocities. (Note that in single precision, for normal atomistic simulations, constraints cause a drift of around 0.0001 kJ/mol/ns per atom, so it doesn't make sense to go much lower.) Details on (combinations of) options are given in the mdp options manual page. When a GPU is used, nstlist is automatically increased by mdrun, usually to 20 or more; rlist is increased along with it to stay below the target energy drift.

    Currently, cut-off (potential-shifted) LJ interactions and reaction-field and PME electrostatics are supported. Compared to the unbuffered group scheme, the Verlet scheme will have more accurate particle-particle PME (and LJ) forces. The published PME accuracy formulas apply directly to this setup. For balanced real/reciprocal space accuracy, you could therefore increase ewald_rtol to 1e-4 and increase the grid spacing by about 10% compared to the usual setup with the group scheme. But this will lead to a larger pair-list buffer, and therefore the performance might not improve much. We are working on a more systematic study of the accuracy of PME options.
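
    To put the default drift tolerance quoted above in perspective, the following sketch converts the per-atom value into a total drift for a whole system and compares it with the thermal energy k_B T. The 3000-atom system size is taken from the benchmark on this page; the temperature of 300 K and the function name are assumptions made for illustration.

```python
K_B = 0.0083144626  # Boltzmann constant in kJ/mol/K


def total_drift(drift_per_atom, n_atoms, time_ns):
    """Total energy drift in kJ/mol for a per-atom drift rate in kJ/mol/ns."""
    return drift_per_atom * n_atoms * time_ns


default_drift = 0.005  # kJ/mol/ns per atom, the default target quoted above
n_atoms = 3000         # size of the water benchmark system on this page
temperature = 300.0    # K, assumed for illustration

drift = total_drift(default_drift, n_atoms, time_ns=1.0)
relative = default_drift / (K_B * temperature)
print(f"total drift: {drift:.1f} kJ/mol per ns")
print(f"per-atom drift per ns relative to k_B T: {relative:.4f}")
```

    With these numbers the target corresponds to 15 kJ/mol per ns for the whole system, or roughly 0.2% of k_B T per atom per ns.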

    With the Verlet scheme, multi-level (MPI+OpenMP) and heterogeneous GPU-accelerated parallelization is supported; for details see the parallelization page.

    Further information

    For further information on algorithmic and implementation details of the Verlet cut-off scheme, drift tolerance and the MxN kernels, as well as detailed performance analysis, consult the following article: 

    Páll, S. & Hess, B. A flexible algorithm for calculating pair interactions on SIMD architectures. Comput. Phys. Commun. 184, 2641–2650 (2013).

    Page last modified 21:26, 17 Aug 2015 by mabraham