Handling Errors

    Version as of 14:45, 28 Jan 2020


    To make Gromacs behave like a proper library, we need to change how errors and other diagnostics are handled. The library should not print anything to stdout/stderr unless doing so is part of the API specification, and even then the user should be able to suppress the output. The library should also never terminate the program without the user having control over this. There are two different cases, which are discussed separately below. These issues are currently under discussion, and there are no concrete guidelines yet.

    1) When the library encounters an error after which it does not make sense to continue processing, it should return an error code and let the caller decide what to do.

    • In this case, there should be a global list of possible error codes, and the library should return one of these. In addition, it should call an error handler with a more detailed description of the reason for the error. The default error handler could still abort the program, but the user could replace it if needed. The callback mechanism could be very similar to what is currently in gmx_error.h, but the return codes should be defined there as well. The global error codes should include at least:
      • Out of memory (without exceptions we are likely forced to abort() in these cases with C++, but the return value should still be there)
      • File not found
      • Other OS I/O error
      • Invalid user input
      • Simulation instability
      • Invalid API call/value/internal error (we can also have a policy that the program should assert in such cases)
    • A simple implementation of this is already present in src/gromacs/fatalerror/.
    • If the error is in violation of the API for the function (e.g., the programmer passes an invalid value), it is OK to assert(). The basic rule is that assert() is OK for conditions that should never happen unless there is a programming error; it is not OK to assert() if user input is incorrect.
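
The scheme in case 1) could be sketched roughly as follows. This is only an illustration under stated assumptions: every name here (ErrorCode, setErrorHandler, raiseError, setTimeStep, recordingHandler) is hypothetical and is not taken from gmx_error.h or src/gromacs/fatalerror/. It shows a global list of codes, a replaceable handler whose default aborts, and assert() reserved for programming errors.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// All names below are hypothetical; they are not taken from gmx_error.h.
enum ErrorCode
{
    eeOK = 0,
    eeOutOfMemory,
    eeFileNotFound,
    eeFileIO,
    eeInvalidInput,
    eeInstability,
    eeInvalidCall
};

typedef void (*ErrorHandler)(int code, const char *message);

// Default handler: print the detailed message and abort the program.
static void defaultErrorHandler(int code, const char *message)
{
    std::fprintf(stderr, "Fatal error %d: %s\n", code, message);
    std::abort();
}

static ErrorHandler g_errorHandler = defaultErrorHandler;

// Installs a new handler and returns the previous one.
// Passing a null pointer restores the default handler.
ErrorHandler setErrorHandler(ErrorHandler handler)
{
    ErrorHandler previous = g_errorHandler;
    g_errorHandler = (handler != nullptr) ? handler : defaultErrorHandler;
    return previous;
}

// Calls the handler with the details, then hands the code back to the caller.
static int raiseError(int code, const char *message)
{
    assert(message != nullptr); // a null message is a programming error
    g_errorHandler(code, message);
    return code;
}

// Example library function: a non-positive time step is invalid user input,
// so it is reported through the handler and a return code, not an assert.
int setTimeStep(double dt, double *storedDt)
{
    assert(storedDt != nullptr); // API violation by the calling programmer
    if (dt <= 0.0)
    {
        return raiseError(eeInvalidInput, "time step must be positive");
    }
    *storedDt = dt;
    return eeOK;
}

// Example replacement handler that records the code instead of aborting.
static int g_lastReportedCode = eeOK;

static void recordingHandler(int code, const char * /*message*/)
{
    g_lastReportedCode = code;
}
```

A caller that does not want the program to abort would call setErrorHandler(recordingHandler) once and then check the return value of each library call. Whether the handler should be global, as in this sketch, or thread-local is one of the open questions listed under the points for discussion.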

    2) The library wants to report a warning or a non-fatal error, but is still able to continue processing. An example is what grompp currently does with notes, warnings, and errors.

    • One way would be to have a common reporting interface for such cases. All library functions that potentially need it would take, as an extra parameter, an object that implements this interface, and could then call functions in the interface to report warnings. We could have a default implementation that would still write everything to stderr.
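
A minimal sketch of such a reporting interface might look like the following. The class and function names (MessageReporter, StderrReporter, CollectingReporter, checkTimeStep) are assumptions made for illustration, and the 0.004 threshold is an arbitrary placeholder, not a Gromacs rule.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical reporting interface; the names are illustrative only.
class MessageReporter
{
public:
    virtual ~MessageReporter() {}
    virtual void note(const std::string &text)    = 0;
    virtual void warning(const std::string &text) = 0;
    virtual void error(const std::string &text)   = 0;
};

// Default implementation: write everything to stderr.
class StderrReporter : public MessageReporter
{
public:
    void note(const std::string &text) override    { print("note", text); }
    void warning(const std::string &text) override { print("warning", text); }
    void error(const std::string &text) override   { print("error", text); }

private:
    static void print(const char *kind, const std::string &text)
    {
        std::fprintf(stderr, "%s: %s\n", kind, text.c_str());
    }
};

// Alternative implementation: store messages so the caller decides
// whether and how to show them.
class CollectingReporter : public MessageReporter
{
public:
    void note(const std::string &text) override    { messages.push_back("note: " + text); }
    void warning(const std::string &text) override { messages.push_back("warning: " + text); }
    void error(const std::string &text) override   { messages.push_back("error: " + text); }

    std::vector<std::string> messages;
};

// Hypothetical library function that takes the reporter as an extra parameter.
// It reports problems and keeps going; the return value says whether the
// input is usable.
bool checkTimeStep(double dt, MessageReporter *reporter)
{
    assert(reporter != nullptr); // passing no reporter is a programming error
    if (dt <= 0.0)
    {
        reporter->error("time step must be positive");
        return false;
    }
    if (dt > 0.004) // arbitrary threshold, for illustration only
    {
        reporter->warning("time step is unusually large");
    }
    return true;
}
```

With this shape, a caller that wants the library to stay silent passes a CollectingReporter and inspects its messages afterwards, while a command-line tool can pass a StderrReporter to keep the current behaviour.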

    Points for discussion:

    • How to handle functions that may fail as part of normal operation? E.g., a function that accesses data, and by design should also be callable when no data is available. These should not call the error handler, but should they return 0 and use another variable to report whether the call was successful? Or should we have designated error codes, e.g. all negative values, for such cases?
    • We may not want to riddle performance-sensitive code with a lot of error-checking, but for debugging, it is useful to be able to pinpoint where things start to go wrong instead of observing a crash that may occur much later. Having the error checks in performance-sensitive code as asserts is one way, but there may be others that would be more suitable.
    • How to handle cases when the reason for the error is detected within a relatively deep call graph, but there is not enough information in that context to print an error message that's useful to the user? Five options:
      • Don't do anything, live with cryptic error messages.
      • Signal errors from the inner scopes with return values only and call the error handler only from an outer scope where enough context is known. Can make debugging harder, because the original reason for the error is no longer accessible if one breaks in the error handler. If this becomes a problem, could have a separate macro (ugh) that is used in the inner scope to call the error handler in development versions, but expands to nothing in release versions.
      • Pass enough information through all the function calls. If this information would not be otherwise needed for anything, it will make the code less modular.
      • Call the error handler from both scopes (with different values so that calls from different scopes can be recognized). Will make the error handler itself more complex.
      • Use the facilities from 2) above in such cases. Can easily result in overly complex code for handling simple errors.
    • Should the error handler be global, or thread-local? Similarly, for 2), should the error reporter object be thread-local, or be passed as a parameter?
    • If we decide to use exceptions, the following should also be considered:
      • Exceptions should only be used within the Gromacs library, and a generic mechanism should be written to translate exceptions to error codes before returning from the library.
      • Which cases of 1) are candidates for exceptions? Out-of-memory issues and unexpected file system issues are at least likely candidates. Incorrect user input in general should not result in an exception.
      • When an exception is thrown, when should the fatal error handler be called? A natural location would be at the point when the exception is thrown.
      • What information should be stored in an exception object?
    • Do we want to write our own assert? This would give more control over the produced error message, and would also let us control whether non-performance-sensitive asserts remain in production code.
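
For the exception-related point above, one possible shape for the generic translation mechanism is sketched here. It assumes a hypothetical exception class (FileIOError), a hypothetical reduced code list, and a made-up entry point (readFrame); it is one way to keep exceptions inside the library, not a decided design.

```cpp
#include <cassert>
#include <exception>
#include <new>
#include <stdexcept>

// Hypothetical, reduced list of error codes for this sketch.
enum ErrorCode
{
    eeOK = 0,
    eeOutOfMemory,
    eeFileIO,
    eeInternal
};

// Hypothetical exception type used only inside the library.
class FileIOError : public std::runtime_error
{
public:
    explicit FileIOError(const char *what) : std::runtime_error(what) {}
};

// Maps the exception that is currently being handled to an error code.
// Must only be called from inside a catch block.
int translateCurrentException()
{
    assert(std::current_exception() != nullptr); // otherwise a programming error
    try
    {
        throw; // rethrow the active exception so it can be caught by type
    }
    catch (const std::bad_alloc &)
    {
        return eeOutOfMemory;
    }
    catch (const FileIOError &)
    {
        return eeFileIO;
    }
    catch (...)
    {
        return eeInternal;
    }
}

// Internal implementation that reports failure by throwing.
// simulate: 0 = success, 1 = I/O failure, 2 = out of memory.
static void readFrameImpl(int simulate)
{
    if (simulate == 1)
    {
        throw FileIOError("unexpected end of file");
    }
    if (simulate == 2)
    {
        throw std::bad_alloc();
    }
}

// Public entry point: no exception leaves the library, only error codes.
int readFrame(int simulate)
{
    try
    {
        readFrameImpl(simulate);
        return eeOK;
    }
    catch (...)
    {
        return translateCurrentException();
    }
}
```

Putting the mapping in a single function means that each public entry point only needs the same try/catch wrapper. If the fatal error handler is to be called when the exception is thrown, that call would go in the throwing code rather than in the translation function.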
    Page last modified 16:33, 23 Mar 2011 by tmurtola