Heterogeneous parallelization and GPU acceleration

[Figure: sketch of MD task offload between CPU and GPU (md-tasks-offload-sketch_v4.png)]

From laptops to the largest supercomputers, modern computer hardware increasingly relies on graphics processing units (GPUs) alongside CPUs for computation. GPUs have revolutionized the field of MD and made high simulation performance both accessible and cost-effective. GROMACS has supported GPU acceleration since version 4.5 (2010), and natively since version 4.6 (2013). The native GPU support in GROMACS relies on a heterogeneous parallelization scheme which uses both CPU and GPU in parallel for computation. This not only allows harnessing each compute unit for the tasks it is best suited to, enabling greater simulation performance, but also provides a solid foundation for broad feature support.

The GROMACS simulation engine is designed to flexibly assign tasks to either the CPU or the GPU. The most computationally intensive tasks, typically most of the force computation such as the short-range nonbonded and long-range PME electrostatics, are offloaded to the GPU, while the CPU computes other tasks like bonded interactions in parallel and handles complex tasks like domain decomposition and neighbor search. This scheme ensures that, regardless of the features used, nearly all GROMACS simulations can make use of GPU acceleration without the entire feature set having been ported to GPUs: features not ported to GPUs are computed on the CPU instead, in parallel with GPU execution. Because GROMACS has highly optimized CPU code, many features can be supported seamlessly without significant performance loss.

The majority of computational performance is increasingly provided by GPUs, and these come with specialized high-performance interconnects. To make efficient use of high-performance GPUs and GPU-dense servers, GROMACS releases since 2020 also support a GPU-resident mode and direct GPU communication, which aim to prioritize keeping the GPU busy while leaving the CPU available to support features like AWH and free energy calculations.
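
As a rough illustration of the flexible task assignment described above, the following Python sketch builds a `gmx mdrun` command line from a task-to-device map. The helper `build_mdrun_command`, the task names in `TASK_FLAGS`, and the file prefix `md` are hypothetical and introduced only for this example. The options `-nb`, `-pme`, `-bonded`, and `-update` are mdrun options in recent GROMACS releases, but their availability and defaults depend on the version, so check `gmx mdrun -h` before relying on them. Treating "update on the GPU" as the GPU-resident configuration is likewise an assumption of this sketch, not a statement of how mdrun selects its mode.

```python
# Hypothetical helper (not part of GROMACS): maps MD tasks to devices and
# emits the corresponding `gmx mdrun` arguments.
TASK_FLAGS = {
    "nonbonded": "-nb",    # short-range nonbonded forces
    "pme": "-pme",         # long-range PME electrostatics
    "bonded": "-bonded",   # bonded interactions
    "update": "-update",   # integration and constraints
}
VALID_DEVICES = {"auto", "cpu", "gpu"}


def build_mdrun_command(deffnm, assignment):
    """Return an argv list for `gmx mdrun` with explicit task placement.

    Tasks missing from `assignment` are left on "auto" so that mdrun
    decides where to run them.
    """
    unknown = set(assignment) - set(TASK_FLAGS)
    if unknown:
        raise ValueError(f"unknown task(s): {sorted(unknown)}")
    cmd = ["gmx", "mdrun", "-deffnm", deffnm]
    for task, flag in TASK_FLAGS.items():
        device = assignment.get(task, "auto")
        if device not in VALID_DEVICES:
            raise ValueError(f"invalid device {device!r} for task {task!r}")
        cmd += [flag, device]
    return cmd


# Heterogeneous offload: force-heavy tasks on the GPU, bonded forces on the CPU.
offload = build_mdrun_command(
    "md", {"nonbonded": "gpu", "pme": "gpu", "bonded": "cpu"}
)

# Assumed GPU-resident setup: update and constraints also placed on the GPU.
resident = build_mdrun_command(
    "md", {"nonbonded": "gpu", "pme": "gpu", "bonded": "gpu", "update": "gpu"}
)

print(" ".join(offload))
print(" ".join(resident))
```

The resulting lists can be passed to `subprocess.run` on a machine with a GPU-enabled GROMACS build. Leaving a task on `auto` lets mdrun pick the device itself, which is usually a sensible starting point before tuning the assignment by hand.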

Reference

Szilárd Páll, Artem Zhmurov, Paul Bauer, Mark Abraham, Magnus Lundborg, Alan Gray, Berk Hess, and Erik Lindahl, “Heterogeneous parallelization and acceleration of molecular dynamics simulations in GROMACS”, J. Chem. Phys. 153, 134110 (2020) doi:10.1063/5.0018516