Regression Tests

    Version as of 22:15, 25 Aug 2019


The GROMACS regression tests are maintained on gerrit. They can be obtained by

    git clone

or, for developers:

    git clone ssh://

    To add new tests for mdrun features, consult the policy below, then

    1. Create a new folder inside e.g. complex
2. Inside your test folder, create grompp.mdp, conf.gro and the topology (and itp files if your test requires those)
3. Run the test in both single and double precision to create reference values
4. Upload the tests to gerrit. The commit message should contain the version number / commit id of the code version used to generate the reference values.
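The workflow above can be sketched as shell commands. The test folder name and the use of the gmxtest.pl driver with its -double option are illustrative assumptions here; check the regression tests README for the exact procedure on your branch:

```shell
# Sketch only: the folder name "my_new_feature" is hypothetical, and the
# driver options should be verified against the regression tests README.
cd regressiontests/complex
mkdir my_new_feature
cd my_new_feature
# place grompp.mdp, conf.gro and the topology here, then run the
# driver from the top-level directory in both precisions:
cd ../..
perl gmxtest.pl complex          # against a single-precision build
perl gmxtest.pl -double complex  # against a double-precision build
```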

    To add new tests for tools

    1. Create a new cfg file or add lines to an existing cfg file in the tools folder
2. Run the tests to create reference values
    3. Upload tests to gerrit
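As a sketch, assuming the same gmxtest.pl driver also runs the tools test set (consult the existing cfg files in the tools folder for the real entry syntax before adding lines):

```shell
# Sketch only: the cfg file name is hypothetical; copy the format of an
# existing cfg file in tools/ rather than inventing entries.
cd regressiontests/tools
"$EDITOR" some_tool.cfg     # add or extend an entry for the tool
cd ..
perl gmxtest.pl tools       # run the tools test set
```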

    An example commit which adds tests is here.

Jenkins automatically runs all tests for all build configurations. The regression test results are shown on the Jenkins page under "Test", and a test report looks like this.

Jenkins also automatically computes the code coverage. This is currently only done when the regression tests are modified, not when the source code is modified. The code coverage of the regression tests for the 4.5 source is here, and for the regression tests and the unit tests together for the master branch source is here.

    Policy for generating regression tests

    Goal: Provide a way to detect errors introduced in old code paths by new or modified features
Method: Compare the output of running reference input with the reference code against the output of running the same input with new code, and use that comparison to signal significant differences

    Issues to be resolved

The full conditions under which the reference version of the test was generated need to be documented. We don’t know now which bugs we might later suspect were present in either the test or the reference case, so we need to record as much information as we reasonably can now, so that we can attempt to replicate the reference run exactly if trouble happens later. This means we need:

    • code requirements
      • code needs a commit hash that exists in the main GROMACS repo (so not some developer repo that might get rebased, and definitely not a dirty one)
      • what branch the commit is on is of secondary importance, but has to make sense for the functionality being tested and the time span over which that functionality has been stable (or not!)
    • compiler requirements
      • open source
      • code base likely to remain available and compilable on future hardware
      • version we trust from previous experience
  • we’re not going to make trouble by requiring test makers to have a particular hand-compiled compiler whose pedigree we know. Mark would rather have an extra test contributed by a volunteer developer than be certain whether a particular compiler bug was present when we made the reference case (if/when we do need to know that about some suspected compiler bug, we will be recompiling with later versions of that compiler anyway, and those will tell us what we need to know). Yes, this is not perfect.
      • so gcc 4.7 sounds like a good way to do the above
    • build requirements
  • -O0 optimization level (Mark’s reflex was to be happy with -O2, but for some kinds of tests being sure of IEEE conformance with -O0 is worth it, because we are often trying to cut numerical corners and want a chance of knowing when we’re wrong). This means that if the compiler can’t be reproduced for some reason (e.g. the vendor won’t support it), there’s a decent chance that some other compiler will do a comparable job.
  • If we later add regular checking of ensemble consistency over longer time scales, we aren’t going to be doing that with -O0 code. One of the purposes of such tests is to run under “battle conditions” where we can’t control numerical reproducibility (e.g. in parallel).
  • use built-in versions of GROMACS dependencies. (Generally, we will be regularly testing code that uses the external dependencies that we encourage people to use for performance, so we can be reasonably sure that this process will gradually eliminate bugs in our built-in code. When tracing problems, we don’t have the resources to go looking for whether bugs existed in the external libfoo used during the test. We don’t want to constrain test creators to have to install some particular version of some library they don’t care about. We don’t want to have to discuss updating that dependency. We don’t want to have to try to detect and document dependency versions, unless, say, the test is actually about whether FFTW is conformant to our requirements.)
      • minimize the use of acceleration levels at configure time (no SIMD, no GPU, generally no parallelism)
    • test case requirements
      • reasonably likely to be responsive to a range of possible problems without producing false signal. The reproducible quantity might be numerical or an ensemble average (with error estimates), or other things?
      • Mark doubts we will ever be able to afford to write function-level test code, so the purpose of the tests is to signal, rather than diagnose, the problem
    • run time requirements
      • the code path of the reference test should depend somewhat on the nature of the test, but generally use code that is easy to check by eye for correctness (so interaction-specific C non-bonded kernels are OK)
      • prefer commodity hardware (so x86 for the moment)

    For this to be useful we need

    • to make it reasonably easy for any contributor to satisfy the above, so
    • the wiki and README need a sample CMake invocation that works now and is reasonably future-proof.
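A sample invocation in the spirit of the build requirements above might look like the following. The cache variable names (GMX_SIMD, GMX_GPU, GMX_FFT_LIBRARY) are assumptions borrowed from later GROMACS versions and must be checked against the CMake options of the branch actually being tested:

```shell
# Sketch only: configure a reference-value build per the policy above,
# i.e. gcc 4.7, -O0, no SIMD, no GPU, built-in FFT (fftpack).
# All -DGMX_* variable names are assumptions; verify them with
# `cmake -LH` on the branch being tested.
CFLAGS="-O0" CXXFLAGS="-O0" cmake ../gromacs \
  -DCMAKE_C_COMPILER=gcc-4.7 \
  -DCMAKE_CXX_COMPILER=g++-4.7 \
  -DGMX_SIMD=None \
  -DGMX_GPU=OFF \
  -DGMX_FFT_LIBRARY=fftpack
make -j4
```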
    Page last modified 15:43, 24 Nov 2012 by mabraham