TRAJNG BASIC DOCUMENTATION

INTRODUCTION

TRAJNG (Trajectory next generation) is a program library for handling
molecular dynamics (MD) trajectories. It can store coordinates, and
optionally velocities and the H-matrix. Coordinates and velocities are
stored with user-specified precision. In addition, program-specific
information (text strings) can optionally be stored at the beginning
of each file. Atomic labels can also optionally be stored once at the
beginning of the file.

MD frames (snapshots/dumps) are stored in blocks (with user-specified
length), where additional compression can be performed between each
frame in the block if that is more efficient. The program also stores
indices to the start of each block of frames for future fast searching
in the trajectory file. The indices can be stored either inside each
trajectory file or in a separate index file. The indices are not
required for reading the file, but speed up searching if present.
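
A block index of this kind makes it cheap to locate the block that
holds a given frame. The following is a minimal sketch in C, assuming
frames are numbered from 0 and every block except possibly the last
holds the same number of frames. The names frame_location and
locate_frame are hypothetical and not part of trajng.

```c
/* Hypothetical illustration; not part of the trajng API. */
typedef struct {
    int block;  /* index of the block that holds the frame */
    int offset; /* position of the frame inside that block */
} frame_location;

/* Map a 0-based frame number to its block and its position in the block. */
frame_location locate_frame(int frame, int frames_per_block)
{
    frame_location loc;
    loc.block = frame / frames_per_block;
    loc.offset = frame % frames_per_block;
    return loc;
}
```

With 100 frames per block, frame 250 falls in block 2 at offset 50, so
a reader would only need to jump to the start of block 2 and then,
presumably, decode the frames of that block up to offset 50.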

All values stored in the trajectory file are either integers or
fixed-point values. The integers are externally only referenced as
individual bytes, making the files portable to little, big, or
mixed-endian systems. Text strings are converted to ASCII before
being written to the file and converted back from ASCII after having
been read. Floating point values are not written to the trajectory
files. Wherever floating point values are required, they are first
converted to fixed-point values.
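
The two ideas in this paragraph, fixed-point conversion and byte-wise
integer storage, can be sketched in C as follows. This is only an
illustration under stated assumptions: the helper names are
hypothetical, the rounding rule and the byte order (least significant
byte first) are choices made for the sketch, and none of it is taken
from the trajng source or file format.

```c
/* Hypothetical helpers for illustration; not the trajng implementation. */

/* Convert a value to the nearest integer multiple of precision. */
long to_fixed(double value, double precision)
{
    double scaled = value / precision;
    return (long)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Convert a fixed-point integer back to a floating point value. */
double from_fixed(long fixed, double precision)
{
    return (double)fixed * precision;
}

/* Write the low 32 bits of v one byte at a time, least significant
   byte first. The byte order is an assumption of this sketch. */
void put_u32(unsigned char *out, unsigned long v)
{
    int i;
    for (i = 0; i < 4; i++)
        out[i] = (unsigned char)((v >> (8 * i)) & 0xFFUL);
}

/* Rebuild the value from its four bytes, independent of host endianness. */
unsigned long get_u32(const unsigned char *in)
{
    unsigned long v = 0;
    int i;
    for (i = 0; i < 4; i++)
        v |= (unsigned long)in[i] << (8 * i);
    return v;
}
```

Because the value is only ever taken apart and reassembled with shifts
on individual bytes, the same bytes decode to the same integer on any
host. Rounding to the nearest multiple keeps the round-trip error to
about half the precision at most.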

ASSUMPTIONS

8 bit bytes and at least 32 bit ints are assumed by the source. The
code is written in C89 (with optional C99 stdint.h and/or long
long). The code does not require 64 bit integers, but 64 bit integers
can be used to improve the speed of multiplication and division of
large integers. 64 bit integers are also required to manipulate and
store indices within the same file (default mode trajectory files, see
the section DEFAULT MODE AND COMPATIBILITY MODE TRAJECTORY FILES), but
storage of indices in a separate index file (compatibility mode) does
not require 64 bit integers. Code that calls the trajng library does
not have to be 64 bit aware.

LIMITATIONS

Precision smaller than 1e-9 cannot be handled. If your values are
smaller than this, scale them appropriately. How large values can be
handled depends on the user-specified precision. The largest value
storable at a precision P is about +/- P*2**29. For a precision of
0.001 nm this means that the largest possible object is about +/- 0.5
mm.
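
The range limit can be checked with a few lines of C. This is a sketch
of the arithmetic only; max_storable is a hypothetical helper name and
not a trajng function.

```c
/* Hypothetical helper: largest magnitude storable at precision P,
   using the approximate limit P*2**29 stated above. */
double max_storable(double precision)
{
    return precision * 536870912.0; /* 2**29 = 536870912 */
}
```

For a precision of 0.001 (in nm) this gives about 536871 nm, which is
the roughly 0.5 mm quoted above.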

COMPILATION - UNIX

Build and install the trajng library, the trajtool program, and the
example programs by doing

./configure
make
make install

Options to the configure script:

--prefix=DIR : Install to a non-standard directory, such as
  --prefix=$HOME/trajng_install

--enable-fortran : Also build the Fortran example programs. The
  Fortran wrappers are always compiled; this option only adds the
  Fortran examples.

--disable-largefile : Disable support for large files. Only
  compatibility mode files can be written.

--enable-compatibility-mode : Only compatibility mode files can be
  written. This will remove the need for 64 bit ints and large file
  support.

COMPILATION - WINDOWS

In the directory win32 there exists a MS Visual Studio solution (built
with Visual Studio 2008) called "trajng.sln". Open it with Visual
Studio and build either the Debug or Release version. This will
generate trajng.dll, trajng.lib, and trajtool.exe in the Debug and
Release directories, respectively. The testsuite will also be built.

TEST SUITE COMPILATION - UNIX 

After having built the library, build and run the testsuite by doing
make check

NOTE: Running the testsuite requires about 1 GB of RAM and 30 GB
of free disk space.

TEST SUITE COMPILATION - WINDOWS

On windows the testsuite is built together with the library. 

RUNNING THE TEST SUITE

The test suite is run in two steps. In the first step, the trajectory
files are generated and basic checks performed. In the second step the
trajectory files are read and the content verified. Both steps are run
by issuing "make check" on UNIX. On WINDOWS execute gen.bat in the
win32 directory to do the first step. On WINDOWS the second step is
run by executing run.bat in the win32 directory. On WINDOWS there are
also batch files available for running the testsuite for the Debug
build: gendebug.bat and rundebug.bat. Dividing the tests into two
stages allows the trajectory files to be generated on one machine
type, copied and verified on a second machine type to ensure
portability. To do this on UNIX, first do "make check" on both
platforms. Then copy the resulting .tng and .tnx files from one
platform to the other. Then run the "run.sh" script in the testsuite
directory.

NOTE: Running the testsuite requires about 1 GB of RAM and 30
GB of free disk space.

INSTALLATION - UNIX

The make install command installs the library and trajtool program.

INSTALLATION - WINDOWS

Currently no automatic installer is available. When the Release has
been built you can copy the trajng.h file from the src directory and
the trajng.lib file from the win32/Release directory to appropriate
places for compilation. Copy the trajtool.exe program and trajng.dll
from the win32/Release directory to appropriate places in the search
path for execution.

THE TRAJTOOL PROGRAM

trajtool can be used for examining and verifying the content of trajng
trajectory files. It can also be used for compressing and unpacking
text trajectory files. Execute "trajtool traj.tng" to obtain basic
information about a trajectory file. Execute "trajtool -v -s traj.tng"
to verify that seeks on the file are working. Execute "trajtool -v -S
traj.tng" to verify that large seeks on the file are working (default
mode trajectories only). Execute "trajtool -d traj.tng" to read all
data in the file to ensure that the file is not corrupted. Run
"trajtool -h" for more information about the various options to
trajtool.

USAGE OF THE TRAJNG LIBRARY (C and C++)

void *TrajngOpenWrite(char *name, int natoms, int chunky, double precision,
                      int writebox,
                      int writevel, double velprecision,
                      int compatibility_mode,
                      int speed);

Open a trajectory file for writing. name is the name of the file,
natoms is the number of atoms. chunky is the number of frames in each
chunk (block of frames). 100 is a good starting value for
chunky. precision is the precision in the values to be stored. If nm
are stored, 0.001 is a reasonable value, if Angstroms are stored 0.01
is a reasonable value. writebox is a flag whether to write the box
matrix (H-matrix) in the frames; set to 0 to not write box and 1 to
write the box. writevel is a flag whether to write the velocities in
the frames; set to 0 to not write velocities and 1 to write
velocities. velprecision is the precision in velocities. For units of
Angstrom/ps 0.1 is a reasonable value for computing the velocity auto
correlation function. compatibility_mode is a flag to select whether
to write files in compatibility mode (see below). Set to 1 to always
write files in compatibility mode. Set to 0 to write files in default
mode if possible. The last parameter is speed. Normally this should be
set to 0 to select default speed settings. See the SPEED section for
more details. The function returns a handle, similar to a file
handle. If the file cannot be opened NULL is returned.

int TrajngSetProgramInfo(void *handle, char *program_info);

Set program info string. handle is the handle returned by
TrajngOpenWrite. program_info is the text string to be written to the
trajectory file. Call this before writing the very first frame. The
function returns 0 for successful operation and non-zero otherwise.

int TrajngSetAtomLabels(void *handle, char **atom_labels);

Set atomic labels. handle is the handle returned by
TrajngOpenWrite. atom_labels is a pointer to the first element of an
array (of the same length as the number of atoms) of pointers to text
strings. Call this before writing the very first frame. The function
returns 0 for successful operation and non-zero otherwise.

int TrajngWrite(void *handle,double *H, double *coords, double *vels,
                int stride, int framenumber, double time, double lambda);

Write a frame using double precision data. handle is the handle
returned by TrajngOpenWrite. H is the box matrix (H-matrix) (which
will be ignored if writebox is set to 0). coords is an array of the
atomic coordinates which should be organized as x, y, z of the first
atom, x, y, z of the second atom and so on for best compression. vels
is an array of the velocities (which will be ignored if writevel is
set to 0). stride is usually set to 3, but can be set to other values
if the coords / vels arrays contain additional components per atom.
For instance, if atomic charges are stored as well in the coords
array, stride can be set to 4. framenumber is the frame number. time
and lambda are the time and lambda values of the frame, respectively.
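
The array layout described above can be sketched in C. The helper
names pack_xyzq and component are hypothetical, and the sketch assumes
that only the first three values in each group of stride values are
treated as coordinates, as the description of stride suggests.

```c
/* Hypothetical helper: interleave positions and charges as
   x, y, z, q per atom, giving an array to be used with stride 4. */
void pack_xyzq(double *out, const double *xyz, const double *q, int natoms)
{
    int i;
    for (i = 0; i < natoms; i++) {
        out[4 * i + 0] = xyz[3 * i + 0];
        out[4 * i + 1] = xyz[3 * i + 1];
        out[4 * i + 2] = xyz[3 * i + 2];
        out[4 * i + 3] = q[i]; /* extra component, skipped via stride */
    }
}

/* Component k (0 = x, 1 = y, 2 = z) of atom i in a strided array. */
double component(const double *coords, int stride, int atom, int k)
{
    return coords[atom * stride + k];
}
```

An array built this way would be passed as coords with stride set to
4, while a plain x, y, z array uses stride 3.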

int TrajngWritef(void *handle,float *H, float *coords, float *vels,
                 int stride, int framenumber, float time, float lambda);

Write a frame using single precision data. The arguments are the same
as for TrajngWrite apart from floating point values being in single
precision.

void *TrajngOpenRead(char *name);

Open a trajectory file for reading.  name is the name of the file. The
function returns a handle, similar to a file handle. If the file
cannot be opened NULL is returned.

char *TrajngGetProgramInfo(void *handle);

Query an open trajectory file for program info.  handle is the handle
returned by TrajngOpenRead. If program specific information is
available in the file it is returned, otherwise NULL is returned. Do
not free the return value.

char **TrajngGetAtomLabels(void *handle);

Query an open trajectory file for atom labels.  handle is the handle
returned by TrajngOpenRead. If atom labels are available, a pointer to
the first element of an array (of the same length as the number of
atoms) of pointers to text strings is returned, otherwise NULL is
returned. Do not free the return value.

int TrajngNatoms(void *handle);

Query an open trajectory file for the number of atoms.  handle is the
handle returned by TrajngOpenRead. The number of atoms in the open
file is returned.

int TrajngHasVel(void *handle);

Query an open trajectory file if it has velocities.  handle is the
handle returned by TrajngOpenRead. If velocities are present in the
file, 1 is returned, otherwise 0.

int TrajngHasBox(void *handle);

Query an open trajectory file if it has a box.  handle is the handle
returned by TrajngOpenRead. If a box matrix is present in the file, 1
is returned, otherwise 0.

int TrajngReadTry(void *handle);

Try to read a frame from a trajectory file.  handle is the handle
returned by TrajngOpenRead. If it is possible to read another frame
the value 0 is returned, otherwise 1 is returned. For compatibility
mode files, 0 is always returned.

int TrajngRead(void *handle,double *H, double *coords, double *vels, 
               int stride, int *framenumber, double *time, double *lambda);

Read a frame from a trajectory file using double precision data.  See
the documentation for TrajngWrite. The arrays will here be filled with
information from the current file location of the file referenced by
the handle.

int TrajngReadf(void *handle,float *H, float *coords, float *vels, 
                int stride, int *framenumber, float *time, float *lambda);

Read a frame from a trajectory file using single precision data. The
arguments are the same as for TrajngRead apart from floating point
values being in single precision.

int TrajngSeek(void *handle,int frame);

Go to a specific frame in a trajectory file.  handle is the handle
returned by TrajngOpenRead. Seek to frame frame to reposition the file
location. The next call to TrajngRead/TrajngReadf will read this
frame. On default mode capable systems default mode trajectory files
and compatibility mode files with external index file will be
repositioned quickly.

void TrajngClose(void *handle);

Close a trajectory file (either open for read or write). handle is the
handle returned by TrajngOpenRead/TrajngOpenWrite.

void TrajngSetMaxMem(int MaxMB);

Call this routine before any other trajng routine to change the
approximate upper limit on the memory used by trajng. MaxMB is the
approximate maximum number of megabytes of memory to use.
The default is 1000 (1 GB per process/thread).
NOTE: This only affects the *creation* of trajectory files.

void *TrajngOpenWriteSpecify(char *name, int natoms, int chunky, double precision,
                             int writebox, int writevel, double velprecision,
                             int compatibility_mode,
                             int initial_coding, int initial_coding_parameter,
                             int coding, int coding_parameter,
                             int initial_vel_coding, int initial_vel_coding_parameter,
                             int vel_coding, int vel_coding_parameter,
                             int speed, int compute_chunky);

This routine offers extended options compared to the
TrajngOpenWrite routine. In addition to the parameters in
TrajngOpenWrite, the *_coding parameters specify which compression
algorithms to use. To use automatic selection of algorithms specify all
of them as -1. Use TrajngInfo on an open file to obtain information
about proper values for these parameters. The compute_chunky parameter
should usually be set to 1 to allow trajng to lower the value of
chunky in case the amount of memory required to use the provided
chunky parameter becomes large. Set compute_chunky to zero if you
really want to use the chunky parameter you have provided.

void TrajngInfo(void *handle,int *chunky, int *natoms, int *version,
                int *initial_coding, int *initial_coding_parameter,
                int *coding, int *coding_parameter,
                int *vel_coding, int *vel_coding_parameter,
                int *initial_vel_coding, int *initial_vel_coding_parameter,
                double *chosen_precision, double *chosen_velprecision);

Use this routine on an open file to obtain information about the
file. The compression parameters (*_coding) can be given to future
calls to TrajngOpenWriteSpecify. Note that if no frames have been read
from a file, some or all of these values may be -1. Also note that if
fewer than chunky frames have been written to a file, some or all of
these values may be -1. To reuse the values in future calls to
TrajngOpenWriteSpecify, be sure to call this routine late enough.

The examples directory also contains examples on how to use the
library, see EXAMPLES below.

USAGE OF THE TRAJNG LIBRARY (FORTRAN-77)

Fortran wrappers have currently been tested on UNIX only. They make
the following subroutines and functions available:

      subroutine tngowr(name,natoms,chunky,prec,wb,wv,vprec,icomp,ispd)
      character*(*) name
      integer natoms,chunky,wb,wv,icomp,ispd
      double precision prec,vprec

Open a file for writing. name is the name of the file, natoms is the
number of atoms. chunky is the number of frames in each chunk. 100 is
a good starting point. prec is the precision in the values to be
stored. If nm are stored 0.001 is a reasonable value, if Angstroms are
stored 0.01 is a reasonable value. wb is a flag whether to write the
box matrix (H-matrix) in the frames; set to 0 to not write box and 1
to write the box. wv is a flag whether to write the velocities in the
frames; set to 0 to not write velocities and 1 to write
velocities. vprec is the precision in velocities. For units of
Angstrom/ps 0.1 is a reasonable value for computing the velocity auto
correlation function. icomp is a flag to select whether to write
files in compatibility mode (see below). Set to 1 to always write
files in compatibility mode. Set to 0 to write files in default mode
if possible. ispd should usually be set to 0. See the SPEED section
for more details.

      subroutine tngspi(text)
      character*(*) text

Set program information. text is the text string to be written to the
trajectory file. Call this after opening a file for writing and before
writing the very first frame.


      subroutine tngsal(iatom,label)
      character*(*) label
      integer iatom

Set atomic labels.  Set the label of atom iatom (with Fortran
numbering starting the counting from 1!) to the text string
label. Call this routine for each atom after opening a file for
writing and before writing the very first frame. If you fail to call
this for every atom no atomic labels will be written to the trajectory
file.

      subroutine tngwrd(H,coords,vels,stride,fn,time,lambda)
      double precision H(9),coords(*),vels(*),time,lambda
      integer stride,fn

Write a frame in double precision. H is the box matrix (H-matrix)
(which will be ignored if wb is set to 0). coords is an array of the
atomic coordinates which should be organized as x, y, z of the first
atom, x, y, z of the second atom and so on for best compression. vels
is an array of the velocities (which will be ignored if wv is set to
0). stride is usually set to 3, but can be set to other values if the
coords / vels arrays contain additional components per atom. For
instance, if atomic charges are stored as well in the coords array,
stride can be set to 4. fn is the frame number. time and lambda are
the time and lambda values of the frame, respectively.

      subroutine tngwrs(H,coords,vels,stride,fn,time,lambda)
      real H(9),coords(*),vels(*),time,lambda
      integer stride,fn

Write a frame in single precision.  The arguments are the same as for
tngwrd apart from floating point values being in single precision.

      subroutine tngord(name)
      character*(*) name

Open a file for reading. name is the name of the file to be opened
for reading.

      subroutine tnggpi(text)
      character*(*) text

Get program information.  Get the program specific text string. Call
this after opening a file for reading.  If none is found the Fortran
string is filled with blanks and no error is reported.

      subroutine tnggal(iatom,label)
      character*(*) label
      integer iatom

Get atomic labels.  Get the atomic label as a text string for atom
iatom (with Fortran numbering starting the counting from 1!). Call
this after opening a file for reading.  If none is found the Fortran
string is filled with blanks and no error is reported.

      function itngat()
      integer itngat

Get number of atoms from file open for reading.  Reference this
function after opening a file for reading.

      function itnghv()
      integer itnghv

Get info if file has velocities.  Reference this function after
opening a file for reading.

      function itnghb()
      integer itnghb

Get info if file has box.  Reference this function after opening a
file for reading.

      function itngrt()
      integer itngrt

ReadTry: Query if a file has more frames. If it has it returns 0
otherwise 1 (as in error or EOF).  Reference this function after
opening a file for reading. This will return EOF only for files
written in default mode. For compatibility mode trajectories it will
always return 0.

      subroutine tngrdd(H,coords,vels,stride,fn,time,lambda)
      double precision H(9),coords(*),vels(*),time,lambda
      integer stride,fn

Read a frame in double precision.  See the documentation for
tngwrd. The arrays will here be filled with information from the
current file location.

      subroutine tngrds(H,coords,vels,stride,fn,time,lambda)
      real H(9),coords(*),vels(*),time,lambda
      integer stride,fn

Read a frame in single precision.  See the documentation for
tngwrs. The arrays will here be filled with information from the
current file location.

      subroutine tngsek(iframe)
      integer iframe

Seek to frame iframe (with numbering following the Fortran convention
of having the first frame be frame 1, in contrast to the C convention
of having the first frame be frame 0) to reposition the file
location. The next call to tngrdd/tngrds will read this frame. On
default mode capable systems default mode trajectory files and
compatibility mode files with external index file will be repositioned
quickly.

      subroutine tngclw()

Close the file that is currently open for writing.

      subroutine tngclr()

Close the file that is currently open for reading.

The Fortran wrappers allow one file to be open for writing at the
same time as having one file open for reading. They are not
thread-safe.

The examples directory also contains examples on how to use the
library, see EXAMPLES below.

EXAMPLES

The examples directory contains one example in C and one in Fortran-77
on how to write trajectory files from a simple MD simulation. In both
cases a Lennard-Jones liquid is simulated. It also contains one
example in C and one in Fortran on how to read trajectory files. The
examples compute the radial distribution functions of the
Lennard-Jones liquids. For UNIX the examples are compiled together
with the library and trajtool. Typing make should be enough. For
WINDOWS the directory examples\win32 contains a MS Visual Studio
solution file. Compile the Debug or Release version as desired (after
having compiled the corresponding trajng libraries, see
above!). Running the examples under windows requires copying of the
trajng.dll file to somewhere in the search path.

DEFAULT MODE AND COMPATIBILITY MODE TRAJECTORY FILES

Trajng can create two kinds of trajectory files. In the default mode
it stores the indices (pointers to the chunks of data) in the same
file as where the data is. In compatibility mode it stores the indices
in a separate file. The trajectory files should have file extension
.tng and index files extension .tnx. On a system capable of handling
default mode trajectory files, the index files can be used to seek
quickly in the data, even when the files have been generated on a
system that cannot handle default mode trajectories. Neither the index
files nor the indices themselves are required for correct operation:
trajectory files can be read even if the index file is deleted or
lost, so on all systems the trajectory files, whether default mode or
compatibility mode, can be read. The following table summarizes the
features:

Trajectory                Default mode          Non default mode
file                      capable system        capable system

default mode tng          Can write and         Cannot write. Can read
                          read quickly.         but not perform searches.
compat. mode tng+tnx      Can write and         Can write and read
                          read quickly.         but not perform searches.
compat. mode tng          Can read but not      Can read but not
                          perform searches.     perform searches.

A default mode system requires 64 bit integers and a working
implementation of either fseeko/ftello, where off_t is 64 bit or more
and supports manipulation as an integer (in particular it should be
possible to subtract one off_t from another off_t and cast the result
into a 64 bit integer), or _fseeki64/_telli64 (WINDOWS).  The testsuite
contains several tests to ensure that this is working.
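
The off_t requirements can be probed with a small check like the one
below. This is a sketch for POSIX systems only, assuming a compiler
that provides long long; the function names are hypothetical and the
code is not taken from the trajng testsuite.

```c
#include <limits.h>
#include <sys/types.h>

/* Hypothetical check: returns 1 if off_t holds at least 64 bits. */
int off_t_has_64_bits(void)
{
    return sizeof(off_t) * CHAR_BIT >= 64;
}

/* Distance between two file offsets, cast into a 64 bit integer.
   This relies on off_t behaving as an ordinary integer type. */
long long offset_distance(off_t later, off_t earlier)
{
    return (long long)(later - earlier);
}
```

On many systems the width of off_t depends on how the code is
compiled, which is why large file support has to be enabled when the
library is built.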

Compiling for compatibility mode only:
./configure --enable-compatibility-mode

Compiling for default (and compatible) mode: 64 bit integers are
required. Large file support is needed. The configure script checks
for both these requirements. Additionally the program performs tests
to ensure that off_t is at least 64 bits. If it is not, only
compatibility mode will be used, even if --enable-compatibility-mode
has not been specified. A warning will be emitted upon file opening in
such a case.

SPEED SETTING

The speed/ispd parameter controls the selection of compression
algorithms, and can be used to choose whether fast compression or good
compression should be done. To use the default settings use 0, which
currently gets mapped to speed setting 4. This gives good compression,
but does not try expensive algorithms that seldom give better
compression. The environment variable TRANJG_SPEED can also be set to
override this parameter.
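
The rule described above (0 selects the default, currently 4, and the
environment variable takes precedence) can be sketched as follows.
The function effective_speed is hypothetical; how trajng itself parses
or validates the environment value is not documented here, so the use
of atoi and the treatment of an empty value are assumptions.

```c
#include <stdlib.h>

/* Hypothetical sketch of the speed selection rule.
   requested is the speed/ispd argument; env_value is the content of
   the environment variable, or NULL if it is not set. */
int effective_speed(int requested, const char *env_value)
{
    int speed = requested;
    if (env_value != NULL && env_value[0] != '\0')
        speed = atoi(env_value); /* a non-empty value overrides */
    if (speed == 0)
        speed = 4;               /* 0 currently maps to setting 4 */
    return speed;
}
```

A caller would pass the result of getenv() for the environment
variable named above as env_value; keeping it a parameter makes the
rule easy to test without touching the environment.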

The following table details which algorithms are used:

SPEED    Algorithms
1        Fast algorithms only. This excludes all BWLZH algorithms and
         the XTC3 algorithm.
2        Same as 1 and also includes the XTC3 algorithm using base compression
         only.
3        Same as 2 and also includes the XTC3 algorithm which will use BWLZH
         compression when it seems likely to give better
         compression. Also includes the interframe BWLZH algorithm for
         coordinates and velocities. The one-to-one BWLZH
         algorithm is enabled for velocities.
4        Enable the inter frame BWLZH algorithm for the coordinates.
5        Enable the LZ77 part of the BWLZH algorithm.
6        Enable the intra frame BWLZH algorithm for the coordinates. Always try
         the BWLZH compression in the XTC3 algorithm.

The compression speed also depends strongly on the actual data, but
the following timings and compression ratios can be used as a guide
for the setting (serial compression on 2.5 GHz Intel Core2 Quad CPU),
when there is relatively little correlation between the data, for a
big (1'000'000 particles) system:
SPEED                                RATIO   atoms/second
1      (inter frame, variable base)  1:2.6    1'800'000
1      (XTC2)                        1:3.0    1'100'000
2      (XTC3+base)                   1:3.3      550'000
3      (XTC3+BWLZH)                  1:3.4      220'000
5      (XTC3+BWLZH)                  1:3.4      210'000
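
To turn these throughput figures into a time per frame, divide the
number of atoms by the rate. The helper below is a hypothetical
one-line sketch of that arithmetic, assuming the atoms/second figures
count the atoms of each frame processed.

```c
/* Hypothetical helper: seconds needed to compress one frame. */
double seconds_per_frame(double natoms, double atoms_per_second)
{
    return natoms / atoms_per_second;
}
```

For the 1'000'000 particle system, speed 2 (550'000 atoms/second)
works out to about 1.8 seconds per frame, against about 0.56 seconds
for the fastest speed 1 entry (1'800'000 atoms/second).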

The decompression speed is also determined largely by the compression
algorithm:
SPEED                                decompressed atoms/second
1      (inter frame, variable base)  4'600'000
1      (XTC2)                        1'300'000
2      (XTC3+base)                   1'100'000
3      (XTC3+BWLZH)                    800'000
5      (XTC3+BWLZH)                    800'000

Compression speed for a system consisting of 10'000 particles with
substantial correlation between the data:
SPEED                                RATIO  atoms/second
1      (inter frame, variable base)  1:13   660'000
4      (inter frame, BWLZH)          1:25   170'000
5      (inter frame, BWLZH)          1:26    15'000

Decompression speed:
SPEED                                atoms/second
1      (inter frame, variable base)  19'000'000
4      (inter frame, BWLZH)           5'500'000
5      (inter frame, BWLZH)           5'500'000

