
GROMACS-OpenMM


    With the release of version 4.5, GROMACS provides support for GPU-accelerated MD simulations using the OpenMM library, which is developed as part of the Simbios NIH Center for Biomedical Computation at Stanford. This development of freely available open-source software (both the Gromacs and OpenMM parts) would not have been possible without generous support in both the EU (ERC, SSF, VR) and the US (NIH, NSF), all of which is gratefully acknowledged.

    The current release is targeted at developers and advanced users, and care should be taken before using it in production. Please note that this is an area of rapid progress!

    When trying out the accelerated GROMACS-GPU binaries, please be aware that you might have to change some settings to really make your simulations shine on GPUs. There are already some areas (e.g. implicit solvent) where the CUDA version is way ahead of the CPU, and for other settings (in particular PME) you can improve performance by adjusting your parameters.

    Limitations

    The following should be noted before using the GPU accelerated mdrun-gpu:

    • The current release runs only on modern NVIDIA GPU hardware with CUDA support. Make sure that the necessary CUDA drivers and libraries for your operating system are already installed.
    • Multiple GPU cards are not supported.
    • Only a fairly small subset of the GROMACS features and options are supported on the GPUs. See below for a detailed list.
    • Consumer level GPU cards are known to often have problems with faulty memory. It is recommended that a full memory check of the cards is done at least once (for example, using the memtest=full option). A partial memory check (for example, memtest=15) before and after the simulation run would help spot problems resulting from overheating of the graphics card.
    • The maximum size of the simulated systems depends on the available GPU memory, for example, a GTX280 with 1GB memory has been tested with systems of up to about 100,000 atoms.
    • In order to take full advantage of the GPU platform features, many algorithms have been implemented in a very different way than they are on the CPU. Therefore, numerical correspondence between some properties of the system's state should not be expected. Moreover, the values will likely vary when simulations are run on different GPU hardware. However, sufficiently long trajectories should produce comparable statistical averages.
    • Frequent retrieval of system state information such as trajectory coordinates and energies can greatly influence the performance of the program due to the slow CPU↔GPU memory transfer speed.
    • MD algorithms are complex, and although the Gromacs code is highly tuned for them, they often do not translate very well onto streaming architectures. Realistic expectations of the achievable speed-up, based on tests with a GTX 280: for small protein systems in implicit solvent using all-vs-all kernels the acceleration can be as high as 20 times, but in most other setups involving cut-offs and PME the acceleration is usually only about 5 times relative to a 3 GHz CPU core.

    Supported features

    • Integrators: md/md-vv/md-vv-avek, sd/sd1 and bd. OpenMM implements only the velocity-Verlet algorithm for MD simulations. Option md is accepted, but keep in mind that the actual algorithm is not leap-frog; thus all three options md, md-vv and md-vv-avek are equivalent. Similarly, options sd and sd1 are also equivalent. (An example .mdp snippet with GPU-compatible settings is shown after this list.)
    • Long-range interactions: Reaction-Field, Ewald, PME, No-cutoff, Cut-off.
      • for No-cutoff use rcoulomb=0 and rvdw=0
      • for Ewald summation only 3D geometry is supported, while dipole correction is not.
      • the cut-off method is supported only for implicit solvent simulations.
    • Temperature control: Supported only with the sd/sd1, bd, md/md-vv/md-vv-avek integrators. OpenMM implements only the Andersen thermostat. All values for tcoupl are thus accepted and equivalent to andersen. Multiple temperature coupling groups are not supported, only tc-grps=System will work.
    • Force fields: the supported force fields are Amber and CHARMM; GROMOS and OPLS-AA are not supported.
      • CMAP dihedrals in CHARMM are not supported, so use the -nocmap option with pdb2gmx.
    • Implicit solvent: Supported only with reaction-field electrostatics. The only supported algorithm for GB is OBC, and the default Gromacs values for the scale factors are hardcoded in OpenMM, i.e. obc alpha=1, obc beta=0.8 and obc gamma=4.85.
    • Constraints: Constraints in OpenMM are done by a combination of SHAKE, SETTLE and CCMA. Accuracy is based on the SHAKE tolerance as set by the shake_tol option.
    • Periodic Boundary Conditions: Only pbc=xyz and pbc=no in rectangular cells (boxes) are supported.
    • Pressure control: OpenMM implements the Monte Carlo barostat. All values for Pcoupl are thus accepted.
    • Simulated annealing: Not supported.
    • Pulling: Not supported.
    • Restraints: Distance, orientation, angle and dihedral restraints are not supported in the current implementation.
    • Free energy calculations: Not supported in the current implementation.
    • Walls: Not supported.
    • Non-equilibrium MD: Option acc_grps is not supported.
    • Electric Fields: Not supported.
    • QMMM: Not supported. 
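    To make the above constraints concrete, here is a minimal .mdp sketch that stays within the supported feature set. The values are illustrative only and not tuned for any particular system; double-check the parameter names against your Gromacs 4.5 documentation:

      ; minimal .mdp sketch for mdrun-gpu (illustrative values only)
      integrator   = md              ; executed as velocity-Verlet by OpenMM
      dt           = 0.002
      nsteps       = 500000
      coulombtype  = PME             ; Reaction-Field, Ewald and Cut-off also work; for no cut-off set rcoulomb=0 and rvdw=0
      rcoulomb     = 1.0
      rvdw         = 1.0
      pbc          = xyz             ; only xyz or no, rectangular boxes
      tcoupl       = berendsen       ; any value is mapped to the Andersen thermostat
      tc-grps      = System          ; only a single temperature coupling group is supported
      tau-t        = 1.0
      ref-t        = 300
      ; for implicit solvent, use reaction-field electrostatics instead, e.g.:
      ; coulombtype      = Reaction-Field
      ; implicit_solvent = GBSA
      ; gb_algorithm     = OBC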

    Installing and running GROMACS-GPU

    Using precompiled binaries

    Gromacs-GPU can be installed either from the officially distributed binary packages or from source. We provide precompiled binaries built for and tested on the most common Linux, Windows, and Mac OS operating systems. Using the binary distribution is highly recommended and should work in most cases. Below we summarize how to get the GPU-accelerated mdrun-gpu working.

    Prerequisites

    The current GROMACS-GPU release uses OpenMM for acceleration; the necessary libraries and plugins are included in the binary packages for version 2.0. Both the OpenMM library and Gromacs-GPU require version 3.1 of the CUDA libraries and a compatible NVIDIA driver (i.e. version > 256). Last but not least, a CUDA-enabled graphics card is necessary to run GPU-accelerated simulations. Molecular dynamics algorithms are very demanding, and unlike in other application areas, only high-end graphics cards are capable of providing performance comparable to or higher than modern CPUs. For this reason, mdrun-gpu is compatible with only a subset of CUDA-enabled GPUs (for a detailed list see the hardware and software compatibility list below), and by default it does not run if it detects incompatible hardware. For details about the compatibility of NVIDIA drivers with the CUDA library and devices, consult the NVIDIA developer page.

    Summary of prerequisites:
        • NVIDIA CUDA libraries;
        • NVIDIA driver;
        • NVIDIA CUDA-enabled GPU.
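
    On Linux, a quick way to check whether these prerequisites are in place is to query the driver and toolkit versions; the library path below is only a common default and may differ on your system:

      nvidia-smi                            # reports the installed NVIDIA driver version
      nvcc --version                        # reports the CUDA toolkit version (should be 3.1)
      ls /usr/local/cuda/lib*/libcudart*    # CUDA runtime libraries (path may vary)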

    Downloads

    GROMACS-GPU binaries: 

    Source                                     | OS          | Release date | MD5 sum
    gromacs-4.5-GPU-beta2_linux-X86_64.tar.gz  | Ubuntu 9.10 | 2010-08-02   | afbda83bdced97a1574ef7960ee92c54
    gromacs-4.5-GPU-beta2_linux-X86.tar.gz     | Ubuntu 9.10 | 2010-08-03   | 8848ab571d36779728c79b52a6c9022a
    gromacs-4.5-GPU-beta2-MacOS.tar.gz         | MacOS 10.5  | 2010-08-03   | 41a69de5e9635ec0db828f19b1b845d2
    gromacs-4.5-GPU-beta1_windows-X86.tar.gz   |             | 2010-05-31   | 265639e2dae51d08ee1b4f57cb8b5ede

    Note: For Linux distributions with older glibc, such as CentOS 5.4, the binaries must be recompiled from source (see below).

    Installing

    1. Download and unpack the binary package for the respective OS and architecture. Copy the content of the package to your normal Gromacs installation directory (or to a custom location). Note that the distributed Gromacs-GPU packages do not contain the entire set of tools and utilities included in a full Gromacs installation. Therefore, it is recommended to have a ≥v4.5 standard Gromacs installation alongside the GPU-accelerated one.
    2. Add the openmm/lib directory to your library path, e.g. in bash:
      export LD_LIBRARY_PATH=path_to_gromacs/openmm/lib:$LD_LIBRARY_PATH
      If there are other OpenMM versions installed, make sure that the supplied libraries have preference when running mdrun-gpu. Also, make sure that the CUDA libraries installed match the version of CUDA with which Gromacs-GPU is compiled.
    3. Set the OPENMM_PLUGIN_DIR environment variable to contain the path to the openmm/lib/plugins directory, e.g. in bash:
      export OPENMM_PLUGIN_DIR=path_to_gromacs/openmm/lib/plugins
    4. At this point, running the command path_to_gromacs/bin/mdrun-gpu -h should display the standard mdrun help, which means that the binary runs and all the necessary libraries are accessible.
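
    For example, assuming the package was unpacked to /opt/gromacs-gpu (a hypothetical location), the whole setup in bash looks like this:

      export LD_LIBRARY_PATH=/opt/gromacs-gpu/openmm/lib:$LD_LIBRARY_PATH
      export OPENMM_PLUGIN_DIR=/opt/gromacs-gpu/openmm/lib/plugins
      /opt/gromacs-gpu/bin/mdrun-gpu -h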

    Compiling and custom installation of GROMACS-GPU

    The GPU-accelerated mdrun can be compiled on Linux, Mac OS and Windows operating systems, for both 32- and 64-bit architectures. Besides the prerequisites discussed above, the following additional software is required in order to compile mdrun-gpu:

    • Cmake version ≥ 2.6.4
    • CUDA-compatible compiler:
      • MSVC 8 or 9 on Windows
      • gcc 4.4 on Linux and Mac OS
    • CUDA toolkit 3.1
    • OpenMM-2.0 libraries and header files
      • NB: this version has a bug in the CUDA platform where the velocities are half of what they should be, but only in the first integration step. This has been fixed in the OpenMM svn repository and will be available in future releases of the library.

    Note that the current Gromacs-GPU release is compatible with OpenMM version 2.0. While future versions might be compatible, using the officially supported and tested OpenMM versions is strongly encouraged. OpenMM binaries as well as source code can be obtained from the project's homepage, and you can read more about the underlying idea in the paper. Also note that it is essential that the same version of CUDA is used to compile both mdrun-gpu and the OpenMM libraries.

    To compile mdrun-gpu, change to the top-level directory of the source tree and execute the following commands:

    export OPENMM_ROOT_DIR=path_to_custom_openmm_installation
    cmake -DGMX_OPENMM=ON [-DCMAKE_INSTALL_PREFIX=desired_install_path]
    make mdrun
    make install-mdrun
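
    As a concrete example, assuming OpenMM is installed under /opt/openmm and the build should go to /opt/gromacs-gpu (both paths are hypothetical), run from the top-level source directory:

      export OPENMM_ROOT_DIR=/opt/openmm
      cmake . -DGMX_OPENMM=ON -DCMAKE_INSTALL_PREFIX=/opt/gromacs-gpu
      make mdrun
      make install-mdrun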

    Gromacs-GPU specific mdrun features

    Besides the usual command line options, mdrun-gpu also supports a set of "device options" that are meant to give control over acceleration-related functionality. These options can be used in the following form:

    mdrun-gpu -device "ACCELERATION:DEV_OPTION=VALUE[,DEV_OPTION=VALUE]..."
    

    The option-list prefix ACCELERATION specifies which acceleration library should be used. At the moment, the only supported value is OpenMM. This is followed by the list of comma-separated DEV_OPTION=VALUE option-value pairs which define parameters for the selected acceleration platform. The entire device option string is case insensitive. Below we summarize the available options (of the OpenMM acceleration library) and their possible values.

    Platform Selects the GPGPU platform to be used; currently the only supported value is CUDA (OpenCL support will be added in the future).

    DeviceID The numeric identifier of the CUDA device on which the simulation will be carried out. The default value is 0, i.e. the first device.

    Memtest GPUs, especially consumer-level devices, are prone to memory errors. There can be various reasons for such "soft errors", including (factory) overclocking, overheating, or faulty hardware, but the result is always the same: unreliable, possibly incorrect results. Therefore, Gromacs-GPU has a built-in mechanism for testing the GPU memory in order to catch obviously faulty hardware. A set of tests is performed before and after each simulation, and if errors are detected, the execution is aborted. Accepted values for this option are any integer ≤15 with an optional "s" suffix, representing the approximate amount of time in seconds that should be spent on testing (the default value is memtest=15s), or full for a complete memory check. It is possible to completely turn off memory testing by setting memtest=off; however, this is not advisable.

    Force-device This option enables running mdrun-gpu on devices that are not officially supported but are CUDA-capable. Using this option might result in very low performance or even crashes, and therefore it is not encouraged. Note that both the option names and the values are case-insensitive.
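
    Putting these options together, a typical invocation on the first GPU with a full memory test could look like the following; topol.tpr is only a placeholder for your own run input file:

      mdrun-gpu -device "OpenMM:platform=CUDA,deviceid=0,memtest=full" -s topol.tpr -v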

    Hardware and software compatibility list

    Compatible OpenMM versions:

    • v2.0

    Compatible NVIDIA CUDA versions (also see OpenMM version compatibility above):

    • v3.1

    Compatible hardware (for details consult the NVIDIA CUDA GPUs list):

    • G92/G94:
      • GeForce 9800 GX2/GTX/GTX+/GT
      • GeForce 9800M GT
      • GeForce GTS 150, 250
      • GeForce GTX 280M, 285M
      • Quadro FX 4700
      • Quadro Plex 2100 D4
    • GT200:
      • GeForce GTX 260, 270, 280, 285, 295
      • Tesla C1060, S1070, M1060
      • Quadro FX 4800, 5800
      • Quadro CX
      • Quadro Plex 2200 D2, 2200 S4
    • GF100 (Fermi)
      • GeForce GTX 460, 465, 470, 480
      • Tesla C2050, C2070, S2050, S2070
         

    GPU Benchmarks

    Apart from interest in new technology and algorithms, the obvious reason to do simulations on GPUs is to improve performance, and you are most likely interested in the speedup relative to the CPU version. This is of course our target too, but it is important to understand that the heavily accelerated/tuned assembly kernels we have developed for x86 over the last decade make this relative speedup quite a difficult challenge! Thus, rather than looking at relative speedup you should compare raw absolute performance for matching settings. Relative speedup is meaningless unless you use the same comparison baseline!

    In general, the first important point for achieving good performance is to understand that GPUs are different from CPUs. While you can simply try to run your present simulation unchanged, you might get significantly better performance with slightly different settings, and if you are willing to make more fundamental trade-offs you can sometimes even get order-of-magnitude speedups.

    Due to the different algorithms used, some of the parameters in the input mdp files are interpreted slightly differently on the GPU, so below we have created a set of benchmarks with settings that are as close to equivalent as possible on the CPU and the GPU. This is not to say they will be ideal for you, but by explaining some of the differences we hope to help you make an informed decision, and hopefully use the hardware in a better way.

    General advantages of the GPU version

    • The algorithms used on the GPU will automatically guarantee that all interactions inside the cutoff are calculated every step, which effectively is equivalent to a slightly longer cutoff.
    • The GPU kernels excel at floating-point intensive calculations, in particular OBC implicit-solvent calculations.
    • Due to the higher nonbonded kernel performance, it is quite efficient to use longer cutoffs (which is also useful for implicit solvent)
    • The accuracy of the PME solver is slightly higher than with the default Gromacs settings. The kernels are quite conservative in this regard, and never resort to linear interpolation or other lower-accuracy alternatives.
    • It beats the CPU version in most cases, even when compared to using 8 cores on a cluster node (the CPU version is automatically threaded from version 4.5).

    General disadvantages of the current GPU version

    • Parallel runs don't work yet. We're still working on this, but to make a long story short it's very challenging to achieve performance that actually beats multiple nodes using CPUs. This will be supported in a future version, though.
    • Not all Gromacs features are supported yet, such as triclinic unit cells or virtual interaction sites required for virtual hydrogens.
    • Forcefields that do not use combination rules for Lennard-Jones interactions are not supported yet.
    • You cannot yet run multi-million atom simulations on present GPU hardware with 2-4GB of memory.
    • File I/O is more expensive relative to the CPU version, so be careful not to write coordinates every 100 steps (see the example output settings below)!
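
    As an illustration of less aggressive output settings, the .mdp fragment below writes compressed coordinates and energies every 25000 steps (50 ps at dt = 0.002); the exact frequencies are arbitrary and should be adapted to your needs:

      nstxout    = 0        ; no full-precision trajectory frames
      nstvout    = 0        ; no velocity output
      nstfout    = 0        ; no force output
      nstxtcout  = 25000    ; compressed (xtc) coordinates every 50 ps
      nstenergy  = 25000    ; energies every 50 ps
      nstlog     = 25000    ; log output every 50 ps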

     

    Benchmark systems

    To help you evaluate hardware and provide settings you can copy into your own simulations, we have created a couple of comparison systems. They are all based on the 159-residue protein dihydrofolate reductase with either ~7000 waters or implicit solvent.

    Benchmark results

     

    Recommendations

    It is ultimately up to you as a user to decide what simulation setups to use, but we would like to emphasize the simply amazing implicit-solvent performance provided by GPUs. If money and resources are completely unimportant you can always get quite good parallel performance from a CPU cluster too, but by adding a GPU to a workstation it is suddenly possible to reach the microsecond/day range, e.g. for protein model refinement. Similarly, GPUs excel at throughput-style computations even in clusters, including many explicit-solvent simulations (and here we always compare to using all x86 cores on the cluster nodes). We're certainly interested in your own experiences and recommendations/tips/tricks!

    Page last modified 19:03, 12 Oct 2010 by lindahl