Versions 4.6.x

    Version as of 23:08, 17 Oct 2019

    Release notes for 4.6 (2013-01-19)

    New features
    • New Verlet non-bonded scheme which, by default, uses exact cut-offs and a buffered pair-list.
    • Multi-level hybrid parallelization (MPI + OpenMP + CUDA):
      • full OpenMP multithreading with the Verlet scheme;
      • OpenMP multithreading for PME-only nodes with the group scheme;
      • native GPU acceleration using CUDA (on supported NVIDIA hardware).
    • New x86 SIMD non-bonded kernels for both the usual cut-off scheme (now called the group scheme) and the new Verlet scheme, written with x86 SIMD intrinsics (no more assembly code):
      • SSE2
      • SSE4.1
      • AVX-128-FMA (for AMD Bulldozer/Piledriver)
      • AVX-256 (for Intel Sandy/Ivy Bridge)
    • Automated OpenMP thread count choice to use all available cores.
    • Automated CPU affinity setting: locking processes or threads to cores.
    • Automated PP-PME (task) load balancing: balancing the non-bonded force and PME mesh workload when the two are executed on different compute resources (i.e. CPU and GPU, or different CPUs). This enables GPU-CPU and PP-PME process load balancing by shifting work from the mesh to the non-bonded calculation.
    • PPPM/P3M with analytical derivative at the same cost and with the same features as PME.
    • New, advanced free energy sampling techniques.
    • AdResS adaptive resolution simulation support.
    • The build configuration now uses CMake; configure+autoconf/make is no longer supported. (The CMake build system features a lot of automation and cleverness under the hood, and we know that it might not always prove to be as rock-solid as the old one. However, it is far more advanced and complex, so bear with us while we iron out issues that come up along the way.)
    • g_hbond now utilizes OpenMP.
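The PP-PME load balancing listed above shifts work from the PME mesh to the non-bonded calculation. The usual way to do this while keeping the Ewald accuracy roughly constant is to scale the Coulomb cut-off and the PME grid spacing by the same factor. The following is a minimal sketch of that idea only, not mdrun's actual implementation; the function names and the crude cost model (pair work proportional to the cut-off cubed, mesh work proportional to the number of grid points) are assumptions made for illustration.

```python
def scale_pme_setup(rcoulomb_nm, fourier_spacing_nm, factor):
    """Scale the Coulomb cut-off and the PME grid spacing by one common factor.

    Hypothetical helper: a factor above 1 moves work from the mesh
    to the non-bonded (pair) calculation.
    """
    if factor < 1.0:
        raise ValueError("this sketch only shifts work from the mesh to the pairs")
    return rcoulomb_nm * factor, fourier_spacing_nm * factor


def relative_costs(rcoulomb_nm, fourier_spacing_nm, box_nm):
    """Very rough cost model (an assumption, not a GROMACS formula).

    Pair work grows with the cut-off volume; mesh work grows with the
    number of grid points in a cubic box of edge length box_nm.
    """
    pair_cost = rcoulomb_nm ** 3
    mesh_cost = (box_nm / fourier_spacing_nm) ** 3
    return pair_cost, mesh_cost


if __name__ == "__main__":
    before = relative_costs(1.0, 0.12, 6.0)
    rc, spacing = scale_pme_setup(1.0, 0.12, 1.2)
    after = relative_costs(rc, spacing, 6.0)
    print("pair/mesh cost before:", before)
    print("pair/mesh cost after: ", after)
```

In this model a larger factor always raises the pair cost and lowers the mesh cost, which is the direction of the shift described in the feature list; the real balancer picks the factor from measured timings.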

    No critical bugfixes. This version is based on 4.5.6; all important fixes are "inherited" and therefore documented in the 4.5.6 release notes.

    Changes that might affect your results

    None for simulations set up with the traditional group cut-off scheme.

    When switching from the group scheme to the Verlet scheme, integration of the equations of motion can get more accurate due to the exact cut-off treatment and buffering (this will, of course, depend on the original cut-off settings used). See the section Cut-off schemes for details.
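To illustrate what "exact cut-off treatment and buffering" means, here is a small sketch of a buffered pair list. The list is built with a radius larger than the interaction cut-off and is then reused for several steps, while the exact cut-off is checked every time interactions are evaluated. This is a toy O(N²) version without periodic boundaries; the function names are hypothetical and the code is not how the GROMACS kernels are organized.

```python
import itertools
import math


def build_pair_list(positions, r_list):
    """Return all index pairs closer than r_list (cut-off plus buffer)."""
    pairs = []
    for i, j in itertools.combinations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) < r_list:
            pairs.append((i, j))
    return pairs


def pairs_within_cutoff(positions, pair_list, r_cut):
    """Apply the exact cut-off to a (possibly outdated) buffered pair list."""
    return [
        (i, j)
        for i, j in pair_list
        if math.dist(positions[i], positions[j]) < r_cut
    ]


if __name__ == "__main__":
    r_cut, buffer = 1.0, 0.2
    start = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (5.0, 0.0, 0.0)]
    pair_list = build_pair_list(start, r_cut + buffer)

    # A few steps later particle 1 has moved inside the cut-off.
    # The old list still contains the pair, so no interaction is missed.
    later = [(0.0, 0.0, 0.0), (0.95, 0.0, 0.0), (5.0, 0.0, 0.0)]
    print(pairs_within_cutoff(start, pair_list, r_cut))
    print(pairs_within_cutoff(later, pair_list, r_cut))
```

The buffer has to be large enough that no particle pair can move from outside the list radius to inside the cut-off before the list is rebuilt.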

    Other important changes compared to 4.5
    mdrun now sets thread affinity
    This means that when running multiple mdrun processes on the same machine, one has to either provide a core "pin offset" using the -pinoffset command line option, or turn off internal affinities and take the performance hit (or alternatively manage affinities externally).
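As a sketch of how pin offsets could be chosen for several mdrun processes sharing one machine, the helper below gives each job a contiguous block of cores and derives the -pinoffset value from the job index. It assumes one thread per core, no hyper-threading, and that each job's thread count is set with -nt; the helper name and the job0/job1 run names are hypothetical.

```python
def mdrun_commands(n_jobs, threads_per_job, total_cores):
    """Build one mdrun command line per job with non-overlapping pin offsets.

    Assumes one thread per core, so job k starts at core k * threads_per_job.
    """
    if n_jobs * threads_per_job > total_cores:
        raise ValueError("jobs would need more cores than the machine has")
    commands = []
    for job in range(n_jobs):
        offset = job * threads_per_job
        commands.append([
            "mdrun",
            "-nt", str(threads_per_job),   # total threads for this job
            "-pinoffset", str(offset),     # first core used by this job
            "-deffnm", f"job{job}",        # hypothetical run name
        ])
    return commands


if __name__ == "__main__":
    for cmd in mdrun_commands(n_jobs=2, threads_per_job=4, total_cores=8):
        print(" ".join(cmd))
```

On machines with hyper-threading or an unusual core numbering, the mapping from offset to physical core may differ, so check the affinity actually applied before relying on such a scheme.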
    The choice of compiler matters more
    With the switch to SIMD intrinsics, up-to-date SIMD CPU acceleration support, and OpenMP, the compiler used matters more, both for compiling GROMACS correctly and for mdrun performance. The recommended compilers that are known to work (i.e. compile GROMACS correctly) and provide good performance on x86/AMD64 are: gcc 4.5 and later, Intel Compilers 12.0 and later, and clang 3.1 (note that clang lacks OpenMP support, which can cause a performance loss of 30% or more). In all cases you are strongly advised to use the most recent patch level available. GROMACS makes extensive use of compiler intrinsics to get the most out of your hardware, so if you use a compiler that is older than your hardware you are asking for trouble, because all the compilers have had bugs in their intrinsics implementations. For further details see ???.
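The compiler recommendations above can be turned into a quick pre-build check. The sketch below only encodes the version numbers stated in these notes; the table keys and function names are hypothetical, and treating clang 3.1 as a minimum (rather than the only tested version) is an assumption.

```python
# Minimum versions taken from the recommendations in these release notes.
# Treating clang 3.1 as a lower bound is an assumption of this sketch.
RECOMMENDED_MINIMUM = {
    "gcc": (4, 5),
    "icc": (12, 0),    # Intel Compilers
    "clang": (3, 1),   # note: no OpenMP support
}


def parse_version(text):
    """Turn a string such as '4.7.2' into a tuple of ints, at least two long."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def is_recommended(compiler, version):
    """Return True if the compiler version meets the recommended minimum."""
    minimum = RECOMMENDED_MINIMUM.get(compiler)
    if minimum is None:
        return False
    return parse_version(version) >= minimum


if __name__ == "__main__":
    for name, version in [("gcc", "4.4.7"), ("gcc", "4.7.2"), ("icc", "12.1")]:
        print(name, version, is_recommended(name, version))
```

Passing this check does not replace the advice to use the most recent patch level, since the notes point out that intrinsics bugs have appeared in all of these compilers.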
    Bugfixes and improvements since beta3
    • fixed performance bug with SD+BD integration and OpenMP multi-threading (#1121)
    • fixes to free energy code, output & g_bar compatibility (#1090)
    • fixed multi-threading with hybrid GPU+CPU mode (#1100)
    • fixed GB interactions (#1096)
    • fixed issues with pressure control and infrequent evaluation
    • fixes for md-vv and rerun
    • fixed resetting states with parallel Verlet scheme
    • fixed nbnxn no LJ comb.rule AVX256 PME kernel
    • fix for compiler flag handling (#1038, #1040)
    • fixed bug with Verlet + DD + bonded atom communication
    • fixed SSE/AVX compilation under Windows (#1092, #1093, #1068)
    • fixed a bug with multiple exchanges
    • fixed nbnxn AVX-256 Ewald table pointer alignment
    • fixed GMXRC so we are not polluting standard shell variables
    • thread-MPI fixes for i386 llvm & simplification of atomics
    • added work-around for gcc bug in AVX intrinsics formal parameter


    • regressiontests can be run from the build tree now (make check)
    • efficiency improvements for PP-PME load balancing + DD DLB
    • using topology information for thread affinity setting
    • added mdp option 'calc-lambda-neighbors'
    • made g_tune_pme work correctly again with thread-MPI mdrun
    • Blue Gene build system support and documentation
    • made SHAKE work again with particle decomposition
    • g_sans - added trajectory averaging


    4.6-beta3 (2012-12-22)

    Bugfixes and improvements
    • fixed pressure in MTTK when using constraints and dispersion correction (#1061)
    • fixed expanded ensemble and Hamiltonian replica exchange
    • fixed Andersen temperature coupling use of random number generator in parallel
    • fixed Andersen temperature coupling when constraints present
    • fixed LINCS with virtual sites - there were bugs with how LINCS did constraining based upon the forces introduced when some parallelization was added previously
    • forced C++ linking of GPU utility routines
    • fixed GPU pair search when not using x86 CPU acceleration (#1042, #1062)
    • fixed compilation with PGI compiler
    • fixes to pull code (#1071)
    • fixed license details reported by GROMACS tools
    • completed removal of Fortran kernels (we are not aware of any systems where these would run faster than the corresponding non-accelerated C kernels by enough to be worth our effort, and probably the new force-only C kernels will be faster than the old Fortran kernels on any system where the disparity between Fortran and C compiler optimization is noticeable; speak up if any of this is a problem for you!)
    • completed removal of Power6 accelerated kernels (currently we lack the resources to implement accelerated kernels for Power architectures, and probably the new force-only C kernels will show results comparable with the old accelerated Power kernels; speak up if any of this is a problem for you - particularly if you have resources to offer to fix it!)
    • re-implemented Verlet kernels on AVX-256 hardware for better performance
    • prepared Verlet kernels for future development of non-x86 SIMD support (e.g. BlueGene SIMD acceleration support is planned)
    • improvements to functioning and documentation of mechanism to specify GPU IDs to mdrun 
    • improved communication when using P-LINCS and only constraints on bonds to hydrogen
    • GROMACS provides template code for user implementations of a custom GROMACS tool in share/template, which now builds by default (and correctly!), but does not install
    • fixed header file for use by external code linking to GROMACS
    • fixes to the Reference CMake build type we used for generating reference versions of our regression tests
    • added CMake option to disable printing of GROMACS cool quotes
    • minor improvements to CMake messages to user
    • CMake cleanup

    4.6-beta2 (2012-12-06)

    Bugfixes and improvements
    • re-enabled AdResS feature (only generic kernels for now);
    • improved OpenMP parallelization performance of non-bonded force calculation with Verlet scheme;
    • fixed segv in Verlet pair-search with triclinic domain-decomposition;
    • fixed incorrect virial with virtual sites and OpenMP;
    • fixed labelling of g_hbond plots;
    • fixed compilation issue with cmake 2.8.10 and GPU acceleration;
    • fixed issues with multi-sim runs and GPU-acceleration.

    4.6-beta1 (2012-11-30)

    First beta, yay*! See the release notes above.

    (*No previous version in the 4.6 series so no list of bugfixes and improvements here.)

    Page last modified 12:58, 19 Jan 2013 by pszilard