Versions 4.6.x

    Version as of 23:38, 17 Oct 2019

    Release notes for 4.6

    New features
    • New Verlet non-bonded scheme which, by default, uses exact cut-offs and a buffered pair-list.
    • Multi-level hybrid parallelization (MPI + OpenMP + CUDA):
      • full OpenMP multithreading with the Verlet scheme;
      • OpenMP multithreading for PME-only nodes with the group scheme;
      • native GPU acceleration using CUDA (on supported NVIDIA hardware).
    • New x86 SIMD non-bonded kernels for both the usual cut-off scheme (called the group scheme) and the new Verlet scheme, written with x86 SIMD intrinsics (no more assembly code):
      • SSE2
      • SSE4.1
      • AVX-128-FMA (for AMD Bulldozer/Piledriver)
      • AVX-256 (for Intel Sandy/Ivy Bridge)
    • Automated OpenMP thread count choice to use all available cores.
    • Automated CPU affinity setting: locking processes or threads to cores.
    • Automated PP-PME (task) load balancing: balances the non-bonded force and PME mesh workload when the two are executed on different compute resources (e.g. CPU and GPU, or different CPUs). This enables GPU-CPU and PP-PME process load balancing by shifting work from the mesh to the non-bonded calculation.
    • PPPM/P3M with analytical derivative at the same cost and with the same features as PME.
    • New, advanced free energy sampling techniques.
    • The build configuration now uses CMake; configure+autoconf/make is no longer supported. (The CMake build system comes with a lot of automation and cleverness under the hood, and we know that it might not always prove to be as rock-solid as the old one. However, it is far more advanced and complex, so bear with us while we iron out issues that come up along the way.)
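
    The PP-PME load-balancing item above can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not the balancing code used by mdrun: it assumes that work is shifted by scaling the cut-off and the PME grid spacing by the same factor, that the non-bonded cost grows with the cube of that factor while the mesh cost shrinks with it, and that the two parts run concurrently. The function names and timings are hypothetical.

```python
def step_time(scale, nonbonded_time, mesh_time):
    """Estimated time per step when cut-off and grid spacing are scaled by `scale`.

    Assumed cost model (illustrative only):
      non-bonded work ~ cut-off volume        -> grows as scale**3
      mesh work       ~ number of grid points -> shrinks as 1 / scale**3
    The two parts run on different resources, so the slower one sets the pace.
    """
    nonbonded = nonbonded_time * scale**3
    mesh = mesh_time / scale**3
    return max(nonbonded, mesh)


def balance(nonbonded_time, mesh_time, scales):
    """Pick the candidate scale factor with the lowest estimated step time."""
    return min(scales, key=lambda s: step_time(s, nonbonded_time, mesh_time))


# Hypothetical timings (arbitrary units): the mesh part is three times slower.
candidates = [1.0, 1.1, 1.2, 1.3, 1.4]
best = balance(nonbonded_time=1.0, mesh_time=3.0, scales=candidates)
print(best, step_time(best, 1.0, 3.0))
```

    In this toy model the unscaled setup is limited by the mesh part, and a moderately larger cut-off brings the two timings close together. A real balancer would rely on measured timings rather than a fixed cost model.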

    No critical bugfixes. All important fixes are also present in 4.5.6 and documented there.

    Changes that might affect your results

    None for simulations set up with the traditional group cut-off scheme.

    When switching from the group scheme to the Verlet scheme, simulations can get more accurate due to the exact cut-off treatment and buffering (this will, of course, depend on the original cut-off settings used). See the section Cut-off schemes for details.
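
    The idea of an exact cut-off combined with a buffered pair-list can be sketched in a few lines. This is a minimal, brute-force illustration and not the GROMACS kernel code: the function names, the cut-off, the buffer size and the coordinates are all made up for the example. The list is built with a radius larger than the cut-off, reused after particles have moved, and the exact cut-off is applied every time the list is used.

```python
import itertools
import math


def build_pair_list(positions, r_list):
    """Return all index pairs closer than the buffered list radius r_list."""
    return [
        (i, j)
        for i, j in itertools.combinations(range(len(positions)), 2)
        if math.dist(positions[i], positions[j]) < r_list
    ]


def pairs_within_cutoff(positions, pair_list, r_cut):
    """Apply the exact cut-off r_cut to the pairs stored in the list."""
    return [
        (i, j)
        for i, j in pair_list
        if math.dist(positions[i], positions[j]) < r_cut
    ]


# Hypothetical values, not GROMACS defaults.
r_cut = 1.0    # interaction cut-off
buffer = 0.2   # extra list radius that absorbs particle motion
positions = [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0), (1.1, 0.0, 0.0), (3.0, 0.0, 0.0)]

# Build the list once with the buffered radius.
pair_list = build_pair_list(positions, r_cut + buffer)
initial = pairs_within_cutoff(positions, pair_list, r_cut)

# Particle 2 drifts inside the cut-off; the old list is reused without a rebuild.
positions[2] = (0.95, 0.0, 0.0)
interacting = pairs_within_cutoff(positions, pair_list, r_cut)

print(pair_list)    # pairs kept in the buffered list
print(initial)      # pairs interacting when the list was built
print(interacting)  # pairs interacting after the move
```

    Pair (0, 2) is outside the cut-off when the list is built, but it is already stored thanks to the buffer, so it is picked up as soon as the particle moves within range. Without a buffer, that interaction would be missed until the next list update.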

    Other important changes compared to 4.5
    mdrun now sets thread affinity by default
    This means that when running multiple mdrun processes on the same machine, one has to either provide a core "pin offset" using the -pinoffset command line option, or turn off internal affinity setting and take the performance hit (or alternatively manage affinities externally).
    The choice of compiler matters more
    With the switch to SIMD intrinsics, up-to-date SIMD CPU acceleration support, and OpenMP, the compiler used matters more, both for performance and for the ability to compile GROMACS correctly. The recommended compilers that are known to work (i.e. compile GROMACS correctly) and provide good performance on x86/AMD64 are gcc 4.5 and later, Intel Compilers 12.0, and clang 3.1 (note that clang lacks OpenMP support, which can cause a performance loss of 30% or more). For further details see ???.
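
    For the pin offset mentioned under the thread affinity change, the arithmetic can be sketched as follows. This is a rough illustration that assumes equally sized runs placed on consecutive cores. Only -pinoffset is taken from the text above; the use of -nt and -deffnm, the run names and the helper functions are assumptions made for the example, and the resulting command lines should be checked against the mdrun help output of the installed version.

```python
def pin_offsets(n_runs, cores_per_run):
    """Core offset of each run when runs occupy consecutive blocks of cores."""
    return [run * cores_per_run for run in range(n_runs)]


def mdrun_commands(n_runs, total_cores):
    """Build one illustrative mdrun command line per run (strings only)."""
    if total_cores % n_runs != 0:
        raise ValueError("total_cores must be divisible by n_runs")
    cores_per_run = total_cores // n_runs
    return [
        # -nt and -deffnm are assumed here for illustration.
        f"mdrun -nt {cores_per_run} -pinoffset {offset} -deffnm run{index}"
        for index, offset in enumerate(pin_offsets(n_runs, cores_per_run))
    ]


# Two independent runs sharing a hypothetical 8-core machine.
for command in mdrun_commands(n_runs=2, total_cores=8):
    print(command)
```

    Each run gets its own block of cores, so the second process does not pin its threads onto the cores already used by the first.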

    4.6-beta2 (2012-12-06)

    Bugfixes and improvements
    • re-enabled AdResS feature (only generic kernels for now);
    • improved OpenMP parallelization performance of non-bonded force calculation with Verlet scheme;
    • fixed segv in Verlet pair-search with triclinic domain-decomposition;
    • fixed incorrect virial with virtual sites and OpenMP;
    • fixed labelling of g_hbond plots;
    • fixed compilation issue with cmake 2.8.10 and GPU acceleration;
    • fixed issues with multi-sim runs and GPU-acceleration.

    4.6-beta1 (2012-11-30)

    First beta, yay*! See the release notes above.

    (*No previous version in the 4.6 series so no list of bugfixes and improvements here.)

    Page last modified 23:40, 19 Dec 2012 by pszilard