Extending Simulations

It is possible to extend or continue a simulation that has completed, and even one that has crashed (see Doing Restarts). Running a long simulation this way is actually a good technique, because it reduces the time lost when the computer(s) being used crash. A simulation can only be continued seamlessly from a checkpoint file, or from the last point at which full-precision coordinates and velocities are available for the system. Even though some coordinate file formats can contain velocities (e.g. the .gro format), these are not precise enough for an exact restart. Therefore, you have to ensure that full-precision data is written at sufficiently short intervals (perhaps several times per day) so that a crash does not cost too much simulation time. How to achieve this varies with the version of GROMACS you are using.

Version 4 and Newer

A simulation that has terminated but not completed (e.g. because of queue limits, a power failure or the use of the -maxh option of mdrun) can be continued without needing to use tpbconv (which is called gmx convert-tpr as of version 5.0). You may or may not wish to use -append in your mdrun command in this case.

A simulation that has completed, on the other hand, can be extended using tpbconv, mdrun and checkpoint files (.cpt). First, the number of steps or the time has to be changed in the .tpr file; then the simulation is continued from the last checkpoint with mdrun. This will produce a simulation that is the "same" as a single continuous run (but see reproducibility for more discussion).

    tpbconv -s previous.tpr -extend timetoextendby -o next.tpr
    mdrun -s next.tpr -cpi previous.cpt

You might want to use the -append option of mdrun to append the new output to the old files. Note that this only works when the old output files have not been modified by the user. Appending is the default behavior as of version 4.5.
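As a sketch of the two-step extension above, the shell function below wraps both commands. It assumes GROMACS 5.0 or newer, so it calls gmx convert-tpr in place of tpbconv and gmx mdrun in place of mdrun. The names extend_run and run, and the DRY_RUN switch that prints the commands instead of executing them, are hypothetical conveniences rather than GROMACS features; the extension time is given in ps, as -extend expects.

```shell
#!/bin/sh
# Sketch only: extend a finished run and continue it from its checkpoint.
# Assumes GROMACS 5.0 or newer (gmx convert-tpr is the renamed tpbconv).
# extend_run and run are hypothetical helper names, not GROMACS tools.
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: extend_run PREV NEXT EXTEND_PS
#   PREV.tpr, PREV.cpt : run input and last checkpoint of the finished run
#   NEXT.tpr           : new run input with the longer run length
#   EXTEND_PS          : extra simulation time in ps
extend_run() {
    local prev="$1" next="$2" extend_ps="$3"
    # Add EXTEND_PS ps to the run length stored in the run input file
    run gmx convert-tpr -s "$prev.tpr" -extend "$extend_ps" -o "$next.tpr" || return
    # Continue from the last checkpoint (output is appended by default since 4.5)
    run gmx mdrun -s "$next.tpr" -cpi "$prev.cpt"
}
```

For example, DRY_RUN=1 extend_run previous next 1000 prints the two commands for a 1000 ps extension without running anything; leaving DRY_RUN unset executes them.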
If you would like to change the default file names while running a lengthy simulation in manageable parts, then cyclically running commands such as the following (with suitable values for name and time) will work:

    mdrun -deffnm ${name} -cpi ${name} -append
    tpbconv -s ${name} -o ${name}_new.tpr -extend ${time}
    mv ${name}_new.tpr ${name}.tpr

If you feel the need to archive your checkpoint and run input files, you can do that, too. If you use -noappend, mdrun will add numerical suffixes to a series of files based on your name, as described in mdrun -h.
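The cyclic procedure above could be wrapped in a loop along the following lines. This is a sketch assuming GROMACS 5.0 or newer command names (gmx mdrun, gmx convert-tpr); cycle_runs, run and DRY_RUN are hypothetical helpers, not part of GROMACS. Each pass runs the current part, extends the run input by TIME_PS ps and moves the extended file into place, so the last pass leaves NAME.tpr ready for a further part.

```shell
#!/bin/sh
# Sketch only: run a long simulation as CYCLES parts under one file name,
# following the mdrun / tpbconv / mv cycle described above.
# Assumes GROMACS 5.0 or newer command names.
# cycle_runs and run are hypothetical helper names; DRY_RUN=1 only prints.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: cycle_runs NAME TIME_PS CYCLES
#   NAME.tpr must exist before the first cycle
cycle_runs() {
    local name="$1" time_ps="$2" cycles="$3" i=1
    while [ "$i" -le "$cycles" ]; do
        # Run (or continue) the current part, appending to the NAME.* outputs
        run gmx mdrun -deffnm "$name" -cpi "$name" -append || return
        # Extend the run input by TIME_PS ps for the next part
        run gmx convert-tpr -s "$name" -o "${name}_new.tpr" -extend "$time_ps" || return
        # Replace the old run input with the extended one
        run mv "${name}_new.tpr" "$name.tpr" || return
        i=$((i + 1))
    done
}
```

On a cluster each part would often be a separate queue job, in which case a job script might call cycle_runs md 5000 1 to perform a single cycle.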
When running in a queuing system, it is useful to set the number of steps you want for the total simulation with grompp or tpbconv and to use the -maxh option of mdrun to stop the simulation gracefully before the queue time ends. With this procedure you can simply continue by letting mdrun read the checkpoint file, and no other tools are required. However, if your queuing system permits job suspension, the -maxh mechanism will be unaware of the time spent suspended, and you may simulate for less wall time than you would expect.

The time can also be extended using the -until and -nsteps options of tpbconv.

A simulation can also be continued without the checkpoint file, but the continuation will not be binary identical and will contain small errors that are negligible in most situations. The reason for the errors is that the trajectory and energy files do not store all the state variables of the thermostats and barostats. In this case, you must use the version 3.3.3 procedure below.

Changing .mdp file options

If you wish or need to change .mdp file options, then either

    grompp -f new.mdp -c old.tpr -o new.tpr -t old.cpt
    mdrun -s new.tpr

or

    grompp -f new.mdp -c old.tpr -o new.tpr
    mdrun -s new.tpr -cpi old.cpt

should work. The former is necessary under GROMACS 4.x if the thermodynamic ensemble has changed. (Someone said "If your

Version 3.3.3 and Before

To continue a simulation, the coordinates and velocities are required in full precision, which means the .trr trajectory file must be used; trajectories in .xtc format are insufficient, as they are in reduced precision and contain no velocity information. The frequency with which coordinates and velocities are written to the .trr file is set in the .mdp file. To extend the run, use:

    tpbconv -s previous.tpr -f previous.trr -e previous.edr -o next.tpr -extend timetoextend

Then feed the resulting .tpr file back into mdrun.
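For the version 3.3.3 procedure (or when no checkpoint file is available), the following sketch chains the tpbconv and mdrun steps. It keeps the tool names tpbconv and mdrun used in the command above; extend_from_trr, run and the DRY_RUN print-only switch are hypothetical names introduced only for illustration.

```shell
#!/bin/sh
# Sketch only: the version 3.3.3 (no checkpoint) continuation described above.
# Uses the tool names tpbconv and mdrun as in the text;
# extend_from_trr and run are hypothetical helper names.
# PREV.trr must hold a frame with full-precision positions and velocities,
# and PREV.edr the matching energies, so tpbconv can build a starting point.
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: extend_from_trr PREV NEXT EXTEND_PS
extend_from_trr() {
    local prev="$1" next="$2" extend_ps="$3"
    # Combine the old run input with the last full-precision frame and energies
    run tpbconv -s "$prev.tpr" -f "$prev.trr" -e "$prev.edr" \
        -o "$next.tpr" -extend "$extend_ps" || return
    # Start the continuation from the new run input
    run mdrun -s "$next.tpr"
}
```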
Note that there must be a frame in your input in which all three of positions, velocities and energies were written, so that tpbconv can combine them into a new starting point.

Exact vs binary identical continuation

If you had a computer with unlimited precision, or if you integrated the time-discretized equations of motion by hand, exact continuation would lead to identical results. But since computers have limited precision (single or double) and MD is chaotic, trajectories will diverge very rapidly if even one bit (the least significant one) is rounded differently. Such trajectories are all equally valid, but very different. Continuation from a checkpoint file, using the same code compiled with the same compiler, running on the same computer architecture with the same number of processors, will lead to binary identical results, unless timing code is used. Timing code is used by the dynamic load balancing and (by default) by the FFTW FFT library.

In most cases you don't care that a trajectory that was stopped and continued is not binary identical to one kept running in a single run. The most common case where you do care is when you have a crash at, e.g., step 567894 and you want to reproduce it to find out whether it is caused by a problem with your system or by a bug. mdrun has an option