Gromacs

Regression Tests

    Where is the test suite?

    The Gromacs regression tests are maintained on git.gromacs.org and gerrit.gromacs.org.

    Downloading the regression test suite

    The test suite can be obtained by

    git clone https://gerrit.gromacs.org/regressiontests

    or, for developers:

    git clone ssh://user@gerrit.gromacs.org/regressiontests
    

    To run the regression tests, the GROMACS binaries have to be in the PATH, either by

    source /path/to/gromacs/bin/GMXRC

    or, before installing (depending on the GROMACS version), by

    export PATH=/path/to/build/src/kernel:/path/to/build/src/tools:/path/to/build/bin/:$PATH

    The tests are then run by

    ./gmxtest.pl all -noverbose
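
    To save time during development, the run can be restricted to one test category or to a subset of tests. A minimal sketch, assuming the -only option described below and a -double option for testing double-precision binaries:

    ./gmxtest.pl complex -noverbose -only mytest   # "mytest" is a placeholder regexp
    ./gmxtest.pl tools -double                     # requires double-precision binaries in the PATH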

    Adding new mdrun tests

    Consult the policy below, then

    1. Compile a version of GROMACS using the Reference build type (-DCMAKE_BUILD_TYPE=Reference), preferring to use gcc 4.7 for consistency (a sketch of steps 1-4 follows this list)
    2. Create a new folder inside e.g. complex
    3. Create inside your test folder grompp.mdp, conf.gro and topol.top (and itp files if your topol.top requires those)
    4. Run ./gmxtest.pl both in single and double precision to create reference values (perhaps using -only <regexp> to save time)
    5. Check the .log files to see that the mdrun settings were what you really intended
    6. Write a README to describe the scope and limitations of the test
    7. Upload tests to gerrit.
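
    A minimal sketch of steps 1-4, assuming an out-of-source CMake build, a hypothetical test named mytest in the complex category, and the GMX_DOUBLE CMake option for the double-precision build (paths are placeholders):

    cd /path/to/build
    cmake /path/to/gromacs-source -DCMAKE_BUILD_TYPE=Reference
    make
    export PATH=/path/to/build/bin:$PATH
    cd /path/to/regressiontests
    mkdir complex/mytest
    cp /somewhere/grompp.mdp /somewhere/conf.gro /somewhere/topol.top complex/mytest/
    ./gmxtest.pl complex -only mytest
    # repeat with a double-precision build (e.g. -DGMX_DOUBLE=ON) and ./gmxtest.pl ... -double
    # to create the double-precision reference values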

    If the new tests cannot run without matching changes to the code, then of course each will fail to pass a normal Jenkins verification. To verify the changed code against the changed tests, log into Jenkins and trigger Gromacs_Gerrit_master_* using "Build with Parameters" to select both refspecs, remembering to choose the appropriate latest patch set for each.

    An example commit which adds tests is here.

    Adding new tests for tools

    1. Create a new cfg file or add lines to an existing cfg file in the tools folder
    2. Run ./gmxtest.pl to create reference values (preferably using a Reference build type, as above, though this is less critical than it is with mdrun)
    3. Upload tests to gerrit. The commit message should contain the first few digits of the gerrit Change-Id (e.g. I9c633345, rather than an explicit URL) of the version of the code that generated the reference values, perhaps with text to describe what the test does.
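
    As a sketch, a commit message along the following lines would satisfy the above (the tool named is illustrative; I9c633345 is the example Change-Id prefix mentioned above):

    Added tools regression test for <some analysis tool>

    Reference values were generated in single and double precision with a
    Reference build of the code from change I9c633345 (latest patch set).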

    Updating existing tests

    To update reference values for tests (e.g. to reflect code changes), generally proceed as above, but also:

    1. Run ./gmxtest.pl refclean (unfortunately the -only mechanism does not work here)
    2. Remove or replace any appropriate files (e.g. reference_[sd].*, README files)
    3. Run ./gmxtest.pl to generate the new reference values
    4. Add only the relevant changes to a new commit 
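
    A minimal sketch, assuming the test being updated lives in complex/mytest (a hypothetical name):

    ./gmxtest.pl refclean
    rm complex/mytest/reference_s.* complex/mytest/reference_d.*   # stale reference files for this test
    ./gmxtest.pl complex                                           # regenerates reference values
    git add complex/mytest                                         # stage only the relevant changes
    git commit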

    An example commit which updates tests is here.

    When are the tests run?

    Jenkins automatically runs all tests for all build configurations. The regression test results are shown on the Jenkins page under "Test", and a test report looks like this.

    Test failures

    There is not yet an automated mechanism for Jenkins to know that a code patch and a test patch, each under review, need each other in order to work properly and receive a +1 verify from Jenkins Buildbot. Jenkins can be instructed to do the right thing (as described above), which should generally happen before a developer applies a +1 verify.

    Coverage

    Jenkins also automatically computes the code coverage. This is currently done only when the regression tests are modified, not when the source code is modified. The code coverage of the regression tests for the 4.5 source is here, and for the regression tests and the unit tests together for the master branch source is here.

    Policy for generating regression tests

    Goal: Provide a way to detect errors introduced in old code paths by new or modified features
    Method: Compare the output produced by the reference code from the reference input against the output produced by the new code from the same input, and use that comparison to signal significant differences

    Short version: consult README files in the regression test set.

    Discussion of issues underlying the policies

    The full conditions of the reference version of the test need to be documented. We do not know now what bug we might later suspect was present in either the test or the reference case. We need to record as much information as we reasonably can now, so that we can attempt to replicate the reference exactly later if trouble happens. This means we need:

    • code requirements
      • code needs a commit hash that exists in the main GROMACS or gerrit repo (so not some developer repo that might get rebased, and definitely not a dirty one)
      • what branch the commit is on is of secondary importance, but has to make sense for the functionality being tested and the time span over which that functionality has been stable (or not!)
    • compiler requirements
      • open source
      • code base likely to remain available and compilable on future hardware
      • version we trust from previous experience
      • we’re not going to make trouble by requiring test makers to have a particular hand-compiled compiler whose pedigree we know - Mark would rather have an extra test contributed by a volunteer developer than really be able to know whether a particular compiler bug was present in our compiler when we made the reference case (if/when we do need to know that about some suspected compiler bug, then we will be recompiling with later versions of that compiler anyway, and those will tell us what we need to know). Yes, this is not perfect.
      • so gcc 4.7 sounds like a good way to do the above
    • build requirements
      • -O0 optimization level (Mark’s reflex was to be happy with -O2, but for some kinds of tests being sure of IEEE conformance with -O0 is worth it, because we are often trying to cut numerical corners and want to have chances of knowing we’re wrong). This means that if the compiler can’t be reproduced for some reason (e.g. vendor won’t support it) there’s a decent chance that some other compiler will do a comparable job.
      • Later, if we add regular checking of ensemble consistency over longer time scales, we aren’t going to be doing that with -O0 code. One of the purposes of such tests is to do so under “battle conditions” where we can’t control numerical reproducibility (e.g. in parallel).
      • use built-in versions of GROMACS dependencies (generally, we will be regularly testing code that uses the external dependencies that we encourage people to use for performance, so we can be reasonably sure that this process will gradually eliminate bugs in our built-in code; when tracing problems, we don’t have the resources to go looking for whether bugs existed in the external libfoo used during the test; we don’t want to constrain test creators to have to install some particular version of some library they don’t care about; we don’t want to have to discuss updating that dependency; and we don’t want to have to try to detect and document dependency versions, unless, say, the test is actually about whether FFTW is conformant to our requirements)
      • minimize the use of acceleration levels at configure time (no SIMD, no GPU, generally no parallelism)
    • test case requirements
      • reasonably likely to be responsive to a range of possible problems without producing false signals. The reproducible quantity might be numerical, or an ensemble average (with error estimates), or perhaps other things
      • Mark doubts we will ever be able to afford to write function-level test code, so the purpose of the tests is to signal, rather than diagnose, the problem
    • run time requirements
      • the code path of the reference test should depend somewhat on the nature of the test, but generally use code that is easy to check by eye for correctness (so interaction-specific C non-bonded kernels are OK)
      • prefer commodity hardware (so x86 for the moment)

    For this to be useful we need

    • to make it reasonably easy for any contributor to satisfy the above, so
    • wiki and README need a sample CMake invocation that works now and is reasonably future-proof - satisfied with the Reference build type.
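
    For example, a configure line along these lines (option names assumed for a recent CMake-based GROMACS, and intended to meet the -O0 / no-acceleration / built-in-dependency requirements above; adjust to the branch being tested):

    cmake /path/to/gromacs-source \
          -DCMAKE_BUILD_TYPE=Reference \
          -DGMX_FFT_LIBRARY=fftpack \
          -DGMX_GPU=OFF \
          -DGMX_MPI=OFF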