Doing Restarts


    To achieve an exact restart of a simulation, one must preserve all the state variables of the system. In practice, this means preserving coordinates, velocities and energy components at full precision. Most of the discussion below addresses how to restart a crashed simulation in GROMACS 3.x. Restarting in GROMACS 4.x is much simpler, so it is dealt with first.

    Version 4.x

    With the introduction of checkpointing, the instructions given below for 3.x are partly obsolete. They should still work, but the simple cases of breaking up a long mdrun or recovering from a crash are now easier. If a simulation crashes, make use of the state.cpt file that is written; it contains all of the information necessary to continue the simulation. In order to pick up from where the simulation stopped, simply use the -cpi and -append options to mdrun. Note that -append is the default in 4.5.

    Before doing anything, back up your files! Then use

    mdrun -s topol.tpr -cpi state.cpt

    This command continues the simulation from the point at which the checkpoint file was written. Unless the -noappend option is used, the remaining output is appended to the existing files (energy, trajectory, log, etc.). Since the checkpoint file contains checksums of all the output files, this procedure is fool-proof: it will never overwrite, or append to, previous output files that are partial or have been modified.

    In version 4.0, appending the new output to the old files was not the default, so you should add the -append option:

    mdrun -s topol.tpr -cpi state.cpt -append

    Because version 4.0 did not record checksums of the output files, you must make sure you append to the correct, unmodified files.
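    As a minimal sketch of the two steps above (back up, then continue), the shell functions below copy the output files into a backup directory and print the matching mdrun command for 4.0 or for 4.5 and later. The function names, the backup directory and the file names (traj.trr, ener.edr, md.log) are illustrative assumptions; the mdrun command is only printed, so check it before running it.

```shell
#!/bin/sh
# Sketch only: helper names and file names are hypothetical.

# backup_run DEST FILE...: copy each existing FILE into directory DEST,
# preserving time stamps. Files that do not exist are skipped.
backup_run() (
  dest=$1
  shift
  mkdir -p -- "$dest" || exit 1
  for f in "$@"; do
    if [ -e "$f" ]; then
      cp -p -- "$f" "$dest/" || exit 1
    fi
  done
)

# continue_cmd VERSION TPR CPT: print the command that continues from CPT.
# 4.0 needs an explicit -append; from 4.5 on, appending is the default.
continue_cmd() (
  version=$1 tpr=$2 cpt=$3
  case $version in
    4.0*) echo "mdrun -s $tpr -cpi $cpt -append" ;;
    *)    echo "mdrun -s $tpr -cpi $cpt" ;;
  esac
)

# Example (dry run). Back up first, e.g.:
#   backup_run "backup_$(date +%Y%m%d_%H%M%S)" topol.tpr state.cpt traj.trr ener.edr md.log
continue_cmd 4.5 topol.tpr state.cpt
```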

    If you need to generate a new .tpr (e.g. because you have to extend the simulation beyond your planned number of steps, or to change .mdp file options), you can follow the tpbconv instructions for 3.x below, but you need not supply more than the .tpr file to tpbconv, so long as you supply the old checkpoint file to the subsequent mdrun. If you change the integrator or ensemble, pass the checkpoint file to tpbconv only, not to mdrun: the state may change, so the new output cannot be appended to the old files.
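    For the common case of extending a 4.x run, a sketch of the tpbconv/mdrun pair might look like the following. It assumes default file names and a hypothetical helper, extend_cmds, which converts a length in ns into the ps that tpbconv's -extend option expects and prints (rather than runs) the two commands.

```shell
#!/bin/sh
# Sketch only: extend_cmds and the output file name are hypothetical.

# extend_cmds TPR CPT EXTEND_NS NEW_TPR: print the commands that extend a
# 4.x run by EXTEND_NS nanoseconds. tpbconv -extend takes picoseconds.
extend_cmds() (
  tpr=$1 cpt=$2 ns=$3 new=$4
  ps=$(awk -v ns="$ns" 'BEGIN { printf "%.12g", ns * 1000 }')
  echo "tpbconv -s $tpr -extend $ps -o $new"
  # Only the new .tpr and the old checkpoint are needed by mdrun.
  echo "mdrun -s $new -cpi $cpt"
)

extend_cmds topol.tpr state.cpt 10 topol_ext.tpr
```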

    Note that mdrun writes both state.cpt and state_prev.cpt. Their time stamps show that one was written roughly one checkpoint interval (15 minutes by default) before the other; you can also use gmxcheck to see what is in them.
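    To see which checkpoint is newer and what each contains, something like the following could be used. The newer_file helper is hypothetical, and gmxcheck is only invoked if it is on your PATH.

```shell
#!/bin/sh
# Sketch only: newer_file is a hypothetical helper.

# newer_file A B: print whichever of two files was modified last
# (B if they have the same time stamp).
newer_file() (
  if [ "$1" -nt "$2" ]; then echo "$1"; else echo "$2"; fi
)

# List both checkpoints with their time stamps and, if available,
# let gmxcheck report what each one contains.
for cpt in state.cpt state_prev.cpt; do
  [ -e "$cpt" ] || continue
  ls -l "$cpt"
  if command -v gmxcheck >/dev/null 2>&1; then
    gmxcheck -f "$cpt"
  fi
done
```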

    Version 3.x

    In GROMACS version 3.x, there are two methods for restarting a simulation. One uses grompp and requires an .mdp file and a coordinate file (just for the atom names), and the other uses tpbconv and requires a .tpr file. To achieve a restart that preserves a simulation ensemble, both of these also require a full-precision trajectory (i.e. .trr file), and when using pressure coupling or Nosé-Hoover temperature coupling both also require an .edr file. Note that when switching from a constant-volume ensemble to a constant-pressure ensemble, you should not supply the .edr file, since the old one doesn't have the relevant terms in it, and the presence of the file on the command line means GROMACS expects such terms.

    Note that you will need a trajectory frame with both coordinates and velocities (and possibly energies) to do a restart, so bear this in mind when choosing nstxout, nstvout and/or nstenergy in your .mdp file. Both grompp and tpbconv accept a -time option to choose which frame of the supplied trajectory is used for the restart.
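    Since a restart frame needs coordinates and velocities (and, where required, energies) from the same step, usable restart points only occur at common multiples of the output intervals. The hypothetical helper below computes that spacing as a least common multiple; it assumes all intervals are positive (an interval of 0 means the quantity is never written).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; intervals must be > 0.

# gcd A B and lcm A B for positive integers.
gcd() ( a=$1 b=$2; while [ "$b" -ne 0 ]; do t=$((a % b)); a=$b; b=$t; done; echo "$a" )
lcm() ( g=$(gcd "$1" "$2"); echo $(( $1 / g * $2 )) )

# restart_interval NSTXOUT NSTVOUT [NSTENERGY]: number of steps between
# frames that have everything needed for an exact restart.
restart_interval() (
  n=$(lcm "$1" "$2")
  if [ -n "${3:-}" ]; then n=$(lcm "$n" "$3"); fi
  echo "$n"
)

restart_interval 5000 5000 1000   # prints 5000
restart_interval 5000 2000 3000   # prints 30000: poorly matched intervals
```

    Choosing nstvout equal to nstxout, and nstenergy as a divisor of it, keeps every full-precision frame usable.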

    Using grompp

    With grompp, you need to reconsider .mdp file options such as tinit, init_step and nsteps. For an exact continuation, these should be, respectively, the start time of the very first part of this simulation, the step number from which the simulation is to restart, and the number of steps to do in this segment. These choices affect the starting values for the new simulation, but not the choice of the starting configuration, which is the last usable frame by default, or the frame selected with the -time option. Make sure to set gen_vel = no when doing a restart; otherwise the velocities in the input trajectory are ignored. Use unconstrained_start where appropriate. Read section 7.3 of the GROMACS Manual, and if necessary, ask yourself why you didn't do this before you ran the first simulation. A tpbconv restart is simpler where possible; however, if you need to change your ensemble or output options, etc., then grompp is the only way.
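    A sketch of the grompp route follows, assuming a run that started at tinit = 0 ps with a 2 fs time step and a restart frame at 5000 ps (all illustrative values, as are the file names). The helper functions are hypothetical: one computes init_step from the frame time, the other prints the .mdp settings to edit in a copy of the original .mdp file. The grompp command is printed, not run.

```shell
#!/bin/sh
# Sketch of a GROMACS 3.x restart via grompp; helper names are hypothetical.

# init_step_for TIME TINIT DT: step of the frame at TIME (ps) in a run that
# started at TINIT (ps) with time step DT (ps), rounded to the nearest step.
init_step_for() {
  awk -v t="$1" -v t0="$2" -v dt="$3" 'BEGIN { printf "%d\n", (t - t0) / dt + 0.5 }'
}

# restart_mdp TINIT INIT_STEP NSTEPS: settings to change in a copy of the
# original .mdp (edit the existing entries rather than adding duplicates).
# unconstrained_start = yes assumes the frame already satisfies the
# constraints of the original run; check that this applies to you.
restart_mdp() {
  printf '%-19s = %s\n' tinit "$1" init_step "$2" nsteps "$3" \
    gen_vel no unconstrained_start yes
}

step=$(init_step_for 5000 0 0.002)   # frame at 5000 ps, dt = 0.002 ps
restart_mdp 0 "$step" 500000         # tinit stays at the very first start time
# Drop "-e ener.edr" unless you use pressure or Nose-Hoover coupling.
echo "grompp -f restart.mdp -c conf.gro -p topol.top -t traj.trr -e ener.edr -time 5000 -o restart.tpr"
```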

    Using tpbconv

    With tpbconv, use the -f and -e options to supply the trajectory and energy information from the end of your simulation, and either the -extend or the -until option as appropriate. Read the man page! tpbconv preserves all of the parameters in the original run input file, including things like the number of processors requested on the original grompp command line.

    Note that -extend adds to the originally planned run time: the new simulation starts from the point you choose with the other tpbconv options, continues until the originally planned end, and then runs for the extra amount you chose.
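    The arithmetic behind -extend can be sketched as follows, using made-up numbers: a run planned for 0 to 1000 ps that crashed, restarted from the frame at 600 ps and extended by 500 ps. The helper names are hypothetical; the point is that the new end time depends on the planned end, not on the restart frame.

```shell
#!/bin/sh
# Sketch only: helper names and numbers are illustrative.

# new_end_extend PLANNED_END EXTEND: end time (ps) after tpbconv -extend.
new_end_extend() {
  awk -v e="$1" -v x="$2" 'BEGIN { printf "%.12g\n", e + x }'
}

# steps_left RESTART_TIME END_TIME DT: steps the continuation still has to run.
steps_left() {
  awk -v t="$1" -v e="$2" -v dt="$3" 'BEGIN { printf "%d\n", (e - t) / dt + 0.5 }'
}

end=$(new_end_extend 1000 500)   # 1500, i.e. the same end as -until 1500
steps_left 600 "$end" 0.002      # prints 450000
```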

    For those who are worried about the gen_vel they used in their initial .mdp file overwriting their equilibrated velocities, that parameter is read once by grompp at the point of generation, and never again.

    For those who are worried about the fact that their input .tpr file has velocities, as well as their input trajectory file, fear not... tpbconv creates a continuation of your run, not some bizarre hybrid with new positions and old velocities.

    After a crash

    You will need to restart from a .trr frame with both positions and velocities, and if necessary an .edr frame from the same time. Whether this is possible will depend on your choices of the above .mdp file parameters. You can use gmxcheck to see which frame gives you the latest good restart.
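    If you know roughly which step the run reached before crashing, and the spacing between complete restart frames, the candidate frame and its time (for -time) can be worked out as below. The helpers and numbers are illustrative; the frame still has to be checked with gmxcheck, since a crash may have truncated it.

```shell
#!/bin/sh
# Sketch only: helper names and numbers are hypothetical.

# last_restart_step LAST_STEP INTERVAL: latest step <= LAST_STEP that is a
# multiple of INTERVAL, i.e. the newest frame that should be complete.
last_restart_step() { echo $(( $1 / $2 * $2 )); }

# step_to_time STEP TINIT DT: time (ps) of STEP, suitable for -time.
step_to_time() {
  awk -v s="$1" -v t0="$2" -v dt="$3" 'BEGIN { printf "%.12g\n", t0 + s * dt }'
}

s=$(last_restart_step 1234567 50000)   # 1200000
step_to_time "$s" 0 0.002              # prints 2400
# Then confirm that frame is intact, e.g. gmxcheck -f traj.trr -e ener.edr
```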

    Planning for a crash

    While you'd prefer not to have one, it makes sense to choose your parameters so that you get the best value for your computer time and disk space. The first thing to ensure is that you have energies every time you have full-precision positions and velocities. That combination is usually fairly expensive to store, so if you need frequent positions for analysis, you should set nstxtcout accordingly. If you have a low probability of a crash, then a full-precision frame every few hours of run time seems a reasonable compromise between the cost of storage and the cost of re-running the simulation up to the point of the crash. If you have frequent crashes, you may wish to write full-precision frames often, and use trjconv afterwards to thin out your .trr file and reduce long-term storage costs. You should also light a fire under your cluster administrators, of course!
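    One way to turn "a full-precision frame every few hours" into an nstxout value is to start from the measured performance of the run. The sketch below assumes performance is known in ns/day (for example from the log file of an earlier run); the helper name and the numbers are illustrative, and the trjconv command for thinning the .trr file afterwards is printed rather than run.

```shell
#!/bin/sh
# Sketch only: nst_for_hours and all numbers are hypothetical.

# nst_for_hours NS_PER_DAY DT HOURS: MD steps covered by HOURS of wall-clock
# time at NS_PER_DAY, with time step DT (ps), rounded to the nearest step.
nst_for_hours() {
  awk -v p="$1" -v dt="$2" -v h="$3" \
    'BEGIN { printf "%d\n", p * 1000 / dt / 24 * h + 0.5 }'
}

n=$(nst_for_hours 10 0.002 3)   # 10 ns/day, 2 fs, every 3 hours: 625000
echo "nstxout   = $n"
echo "nstvout   = $n"
echo "nstenergy = $n"           # or any divisor of it
# Keep only every 10th full-precision frame for long-term storage;
# "echo 0" answers trjconv's group prompt with group 0 (System).
echo "echo 0 | trjconv -f traj.trr -o traj_thin.trr -skip 10"
```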

    Under GROMACS 4.x, checkpointing takes care of this issue, but you may wish to set the checkpoint interval (in minutes) with mdrun's -cpt option.
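    For example, to checkpoint every hour instead of every 15 minutes, a sketch like this could be used. The helper is hypothetical, only prints the command, and accepts whole minutes only as a simplification.

```shell
#!/bin/sh
# Sketch only: mdrun_cpt_cmd is a hypothetical helper.

# mdrun_cpt_cmd TPR MINUTES [CPT]: print an mdrun command that checkpoints
# every MINUTES of wall-clock time, continuing from CPT if one is given.
mdrun_cpt_cmd() {
  case $2 in
    ''|*[!0-9]*) echo "MINUTES must be a whole number" >&2; return 1 ;;
  esac
  if [ "$2" -le 0 ]; then echo "MINUTES must be positive" >&2; return 1; fi
  if [ -n "${3:-}" ]; then
    echo "mdrun -s $1 -cpi $3 -cpt $2"
  else
    echo "mdrun -s $1 -cpt $2"
  fi
}

mdrun_cpt_cmd topol.tpr 60              # new run, hourly checkpoints
mdrun_cpt_cmd topol.tpr 60 state.cpt    # continuation, same interval
```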

    Page last modified 00:55, 25 Dec 2012 by mabraham