Extending Simulations

    It is possible to extend or continue a simulation that has completed, and even one that has crashed (see Doing Restarts). Running a longer simulation as a series of continued parts is good practice, because it reduces the time lost when the computer(s) being used crash. A seamless continuation is only possible from a checkpoint file, or from the last point at which full-precision coordinates and velocities are available for the system. Even though some coordinate file formats can contain velocities (e.g. .gro format), these are not precise enough for an exact restart. Therefore, you have to ensure that full-precision data is written at sufficiently short intervals (for example, several times per day) to avoid losing too much time to crashes. How to achieve this varies with the version of GROMACS you are using.
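    As a rough illustration of choosing an output interval, the sketch below converts a desired interval in simulated time into a number of MD steps, which is the unit used by the nstxout and nstvout options in the .mdp file. The helper name steps_for_interval and the example values (2 fs time step, one full-precision frame every 100 ps) are assumptions made for this example only.

```shell
#!/bin/sh
# steps_for_interval DT_PS INTERVAL_PS
# Prints the number of MD steps that span INTERVAL_PS picoseconds
# at a time step of DT_PS picoseconds, rounded to the nearest integer.
# (Hypothetical helper, not part of GROMACS.)
steps_for_interval() {
    awk -v dt="$1" -v t="$2" 'BEGIN { printf "%d\n", t / dt + 0.5 }'
}

# Assumed example values: dt = 0.002 ps, one frame every 100 ps.
nst=$(steps_for_interval 0.002 100)
echo "nstxout = $nst"
echo "nstvout = $nst"
```

    How much wall time such an interval corresponds to depends on how fast your simulation runs, so check it against your own performance figures.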

    Version 4 and Newer

    A simulation that has terminated, but not completed (e.g. because of queue limits, power failure or the use of the -maxh option of mdrun) can be continued without needing to use tpbconv (which is called gmx convert-tpr as of version 5.0). You may or may not wish to use -append in your mdrun command in this case. Otherwise, a simulation that has completed can be extended using tpbconv, mdrun and checkpoint files (.cpt). First, the number of steps or time has to be changed in the .tpr file, then the simulation is continued from the last checkpoint with mdrun. This will produce a simulation that will be the "same" as if a continuous run was made (but see reproducibility for more discussion).

    tpbconv -s previous.tpr -extend timetoextendby -o next.tpr
    mdrun -s next.tpr -cpi previous.cpt
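    The -extend option of tpbconv takes the extension in picoseconds. The following sketch only prints the two commands for an extension given in nanoseconds; it does not run them. The helper names ns_to_ps and extend_commands, the file names and the 5 ns extension are assumptions chosen for illustration.

```shell
#!/bin/sh
# Dry-run sketch: print the commands needed to extend a finished run.
# Nothing is executed; the helpers below are hypothetical.

# ns_to_ps NS: convert nanoseconds to picoseconds (1 ns = 1000 ps).
ns_to_ps() {
    awk -v ns="$1" 'BEGIN { printf "%.10g\n", ns * 1000 }'
}

# extend_commands PREVIOUS NEXT NS
# PREVIOUS.tpr and PREVIOUS.cpt are the old run input and checkpoint,
# NEXT.tpr is the new run input, NS is the extension in nanoseconds.
extend_commands() {
    echo "tpbconv -s $1.tpr -extend $(ns_to_ps "$3") -o $2.tpr"
    echo "mdrun -s $2.tpr -cpi $1.cpt"
}

# Assumed example: extend "previous" by 5 ns.
extend_commands previous next 5
```

    Printing the commands first makes it easy to check the time units before committing compute time.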

    You might want to use the -append option of mdrun to append the new output to the old files. Note that this will only work when the old output files have not been modified by the user.  Appending is the default behavior as of version 4.5.

    If you would like to change the default filenames while running a lengthy simulation in manageable parts, then cyclically running commands such as the following (with suitable values for name and time) will work:

    mdrun -deffnm ${name} -cpi ${name} -append
    tpbconv -s ${name} -o ${name}_new.tpr -extend ${time}
    mv ${name}_new.tpr ${name}.tpr

    If you feel the need to archive your checkpoint and run input files, then you can do that, too. If you use -noappend, then mdrun will add numerical suffixes to a series of files based on your name, just as described in mdrun -h.
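    The cyclic procedure can be wrapped in a small loop. The sketch below is one possible arrangement, not a prescribed workflow: the defaults for name, time and chunks, the run wrapper and the DRY_RUN switch are all assumptions for this example. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/bin/sh
# Sketch of running a long simulation in parts.
# All defaults below are hypothetical example values.
name=${name:-md}       # file name stem passed to -deffnm
time=${time:-1000}     # length of each extension in ps
chunks=${chunks:-3}    # number of cycles to perform

# run CMD ARGS...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Repeat the mdrun / tpbconv / mv cycle, stopping at the first failure.
run_chunks() {
    i=1
    while [ "$i" -le "$chunks" ]; do
        run mdrun -deffnm "$name" -cpi "$name" -append || return 1
        run tpbconv -s "$name" -o "${name}_new.tpr" -extend "$time" || return 1
        run mv "${name}_new.tpr" "${name}.tpr" || return 1
        i=$((i + 1))
    done
}

run_chunks
```

    Set DRY_RUN=0 to execute the commands once the printed sequence looks right for your system.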

    When running in a queuing system, it is useful to set the number of steps you want for the total simulation with grompp or tpbconv and use the -maxh option of mdrun to gracefully stop the simulation before the queue time ends. With this procedure you can simply continue by letting mdrun read checkpoint files, and no other tools are required. However, if your queuing system permits job suspension, the -maxh mechanism will be unaware of the time spent suspended, and you may simulate for less wall time than you would expect.
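    A value for -maxh can be derived from the wall-time limit of the queue. The sketch below assumes the limit is given as HH:MM:SS and subtracts a safety margin; the helper name maxh_from_walltime, the 10 minute default margin and the file name stem md are arbitrary choices for this example. The resulting mdrun command is only printed.

```shell
#!/bin/sh
# maxh_from_walltime HH:MM:SS [MARGIN_MINUTES]
# Prints the wall-time limit in hours, minus a safety margin
# (default 10 minutes, an arbitrary choice), with two decimals.
# (Hypothetical helper, not part of GROMACS.)
maxh_from_walltime() {
    echo "$1" | awk -F: -v margin="${2:-10}" \
        '{ printf "%.2f\n", $1 + $2 / 60 + $3 / 3600 - margin / 60 }'
}

# Assumed example: a queue with a 24 hour limit.
maxh=$(maxh_from_walltime 24:00:00)
echo "mdrun -deffnm md -cpi md -maxh $maxh"
```

    Submitting the same mdrun command in each job then continues from the latest checkpoint until the total number of steps in the .tpr file is reached.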

    The time can also be extended using the -until and -nsteps options with tpbconv.
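    For a run that starts at time 0, extending by a given amount with -extend is equivalent to giving the new end time with -until. The sketch below prints both forms; the end time of 10000 ps, the extension of 5000 ps and the file names are assumed example values. The -nsteps option, which changes the number of steps rather than a time, is not shown.

```shell
#!/bin/sh
# Assumed example: previous.tpr ends at 10000 ps and we want 5000 ps more.
end_ps=10000
extend_ps=5000
until_ps=$((end_ps + extend_ps))

# Both commands (printed only, not executed) request the same new end time.
echo "tpbconv -s previous.tpr -extend $extend_ps -o next.tpr"
echo "tpbconv -s previous.tpr -until $until_ps -o next.tpr"
```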

    A simulation can also be continued without the checkpoint file, but the continuation will not be binary identical and will have small errors that, for most situations, are negligible. The errors arise because the trajectory and energy files do not store all the state variables of the thermostats and barostats. If you have no checkpoint file, you must make use of the version 3.3.3 procedure below.

    Changing .mdp file options

    If you wish/need to change .mdp file options, then either

    grompp -f new.mdp -c old.tpr -o new.tpr -t old.cpt
    mdrun -s new.tpr

    or

    grompp -f new.mdp -c old.tpr -o new.tpr
    mdrun -s new.tpr -cpi old.cpt

    should work. The former is necessary under GROMACS 4.x if the thermodynamic ensemble has changed. It has been suggested that, if old.cpt is from a run that has finished, tpbconv -extend should be used after grompp and before mdrun. This is disputed (by mabraham): whether a run has finished is judged by the contents of the .cpt in the context of the .tpr, so if the .tpr is changed, the run is no longer finished.

    Version 3.3.3 and Before

    To continue a simulation the coordinates and velocities are required in full precision, which means the .trr trajectory file must be used; trajectories in .xtc format are insufficient, as they are in reduced precision and do not have velocity information. The frequency with which coordinates and velocities are written to the .trr file is set by nstxout and nstvout in the .mdp file. The process is handled by tpbconv in the following manner:

    tpbconv -s previous.tpr -f previous.trr -e previous.edr -o next.tpr -extend timetoextend

    Then feed the resulting .tpr file back into mdrun. Note that there must be a frame in your input where you have written all three of positions, velocities and energies, so that tpbconv can combine them into a new starting point.
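    Positions, velocities and energies are written together only at steps that are a common multiple of their output intervals. The sketch below computes the smallest such interval from nstxout, nstvout and nstenergy; the helper names and the example values are assumptions for illustration, and all three intervals are assumed to be positive.

```shell
#!/bin/sh
# gcd A B: greatest common divisor by the Euclidean algorithm.
gcd() {
    a=$1 b=$2
    while [ "$b" -ne 0 ]; do
        t=$((a % b))
        a=$b
        b=$t
    done
    echo "$a"
}

# lcm A B: least common multiple (A and B assumed positive).
lcm() {
    echo $(( $1 / $(gcd "$1" "$2") * $2 ))
}

# common_frame_interval NSTXOUT NSTVOUT NSTENERGY
# Prints the number of steps between frames at which positions,
# velocities and energies are all written.
common_frame_interval() {
    lcm "$(lcm "$1" "$2")" "$3"
}

# Assumed example .mdp values: nstxout=5000, nstvout=10000, nstenergy=1000.
common_frame_interval 5000 10000 1000
```

    With these example values a usable restart frame exists every 10000 steps, so a crash can cost up to that many steps of simulation.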

    Exact vs binary identical continuation

    If you had a computer with unlimited precision, or if you integrated the time-discretized equations of motion by hand, exact continuation would lead to identical results. But since computers have limited precision (single or double precision) and MD is chaotic, trajectories will diverge very rapidly if even one bit (the least significant one) is rounded differently. Such trajectories will all be equally valid, but very different. Continuation using a checkpoint file, using the same code compiled with the same compiler and running on the same computer architecture using the same number of processors, will lead to binary identical results, unless timing code is used. Timing code is used by the dynamic load balancing and (by default) by the FFTW FFT library.

    In most cases you don't care that a trajectory is not binary identical when you stop it and continue it, as opposed to keeping it running in a single run. The most common case where you do care is when you have a crash at e.g. step 567894 and you want to reproduce it to track down if it is a problem with your system or a bug. mdrun has an option -reprod to (try to) force binary identical simulations. Note that in certain cases this might cost a lot of performance.

    Page last modified 13:37, 4 Mar 2016 by JLemkul