GPU acceleration

    Version as of 12:43, 26 May 2020


    Gromacs version 4.6 and later include brand-new native GPU acceleration, developed in Stockholm under the framework of a grant from the European Research Council (#209825), with heroic efforts in particular by Szilard Pall. It replaces all previous trial GPU code and comes with a number of important features:


    • The new GPU code is fast, and we mean it. Rather than quoting relative speedups for a few special cases, this code is typically much faster (3-5x) even when compared to Gromacs running on all cores of a typical desktop. Putting two GPUs in a high-end cluster node likewise yields a significant acceleration.
    • We have designed a new architecture where we use both CPUs and GPUs for the calculation.
      • This means we support a much wider range of settings with GPUs - pretty much any interactions based on reaction-field or PME work.
      • It also means we can use multiple GPUs efficiently, and the GPU acceleration works in combination with Gromacs' domain decomposition and load balancing code too, for arbitrary triclinic cells.
      • By using the CPU for part of the calculation, we retain full support for virtual interaction sites and other speedup techniques - you can use GPUs with very long timesteps and maintain accuracy (we're not simply making the hydrogens heavy, but properly removing selected internal degrees of freedom).
    • GPU acceleration is now a core part of Gromacs - as long as you have the CUDA development libraries installed, it will be enabled automatically during Gromacs configuration.
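    As a sketch of what this looks like in practice - the flag names here are assumptions based on Gromacs 4.6-era usage, so check your version's cmake output and `mdrun -h` before relying on them:

    ```shell
    # Configure Gromacs with GPU support. It is detected automatically when
    # the CUDA toolkit is installed; GMX_GPU makes the choice explicit.
    cmake .. -DGMX_GPU=ON

    # Run on a node with two GPUs, starting two thread-MPI ranks and
    # mapping them to GPU ids 0 and 1.
    mdrun -ntmpi 2 -gpu_id 01 -deffnm topol
    ```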


    The underlying idea of the new GPU acceleration is the core MD loop. Many things are computed every step, but the most compute-intensive part is the nonbonded force calculation. The biggest asset of Gromacs - and our biggest challenge - is that this iteration is already very fast when running on CPUs, sometimes on the order of half a millisecond per step. Incidentally, this is why it has been so difficult to get a substantial GPU speedup in Gromacs - if each step took 100 ms it would be trivial to accelerate (which is why some codes show amazing relative speedups). 
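    To make that loop structure concrete, here is a minimal, hypothetical velocity-Verlet step in C - not Gromacs code, and with a single 1D harmonic "particle" standing in for the real nonbonded problem - showing where the expensive force call sits inside every iteration:

    ```c
    #include <stdio.h>
    #include <math.h>

    /* Stand-in for the nonbonded force kernel: a harmonic force -x.
     * In a real MD code this call dominates the cost of each step. */
    static double force(double x) { return -x; }

    /* One velocity-Verlet MD loop; returns the final total energy,
     * which should stay close to its initial value (0.5 here). */
    double simulate(int nsteps, double dt) {
        double x = 1.0, v = 0.0;
        double f = force(x);
        for (int step = 0; step < nsteps; ++step) {
            v += 0.5 * dt * f;   /* half-kick with old forces   */
            x += dt * v;         /* drift: update positions     */
            f = force(x);        /* the compute-intensive part  */
            v += 0.5 * dt * f;   /* half-kick with new forces   */
        }
        return 0.5 * v * v + 0.5 * x * x;
    }

    int main(void) {
        printf("E = %.4f\n", simulate(1000, 0.01));
        return 0;
    }
    ```

    Every iteration must finish before the next can start, so when one pass takes only half a millisecond there is very little room to hide the latency of shipping data to an accelerator - which is exactly the challenge described above.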





    Page last modified 08:28, 4 Nov 2013 by lindahl