Gromacs

Project ideas


    This page describes projects related to GROMACS which are up for grabs by anyone wanting to contribute or do some research on one of the proposed scientific or HPC topics.

    Rules of the game - how to join?

     

    [More content should come here in particular on: general information, policies, list of projects with description and mentor/contact info, contributors, etc.]

     

    Note that the content of this page should be considered a preliminary draft; it is still being added to and reviewed. If you find anything interesting, feel free to send a mail to the developers' list (gmx-developers@gromacs.org) or to Szilárd Páll (pszilard ATATAT kth DOTDOT se).

     

    Projects

    Intel MIC support: implementing an asymmetric offload mechanism

    Contact: Mikhail Plotnikov

    Description: In order to use Intel MIC (Xeon Phi) accelerators efficiently, asynchronous task offloading to the accelerator is required, similar to the asynchronous execution model implemented for NVIDIA GPUs. This would involve implementing a heterogeneous offload mechanism for accelerated short-range (non-bonded and bonded) force kernels. While simple offload mechanisms do exist for MIC, these lack the flexibility and performance required in GROMACS. The optimal solution is to implement a coarse, high-level task parallelization that separates the particle-particle workload (bonded and non-bonded force calculations) onto dedicated MPI ranks, the same way PME "ranks" already separate the PME workload.

    For further details see the related feature requests: 1181 and 1187.

    Information on MIC heterogenous offload can be found on these slides.
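
    As a rough illustration of the target execution model, the Intel compiler's offload pragmas already support an asynchronous signal/wait pattern that such a mechanism could build on. A minimal sketch (all kernel and data names are hypothetical, not existing GROMACS code):

        // Requires the Intel compiler; compute_nonbonded_mic and all data
        // names are hypothetical placeholders.
        __attribute__((target(mic)))
        void compute_nonbonded_mic(const float* x, float* f, int n)
        {
            for (int i = 0; i < n; ++i)
            {
                f[i] += 0.1f * x[i]; // stand-in for the real interaction loop
            }
        }

        void step(const float* x, float* f, float* f_local, int n)
        {
            char tag; // its address identifies the asynchronous offload

            // Launch the offloaded kernel without blocking the host thread...
            #pragma offload target(mic:0) signal(&tag) \
                    in(x : length(n)) inout(f : length(n))
            compute_nonbonded_mic(x, f, n);

            // ...so the CPU can overlap bonded/PME-style work here.
            for (int i = 0; i < n; ++i)
            {
                f_local[i] += 0.2f * x[i];
            }

            // Block until the offloaded kernel and its transfers complete.
            #pragma offload_wait target(mic:0) wait(&tag)
        }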

    Explore usability of ispc for SIMD kernels

    Contact: ???

    Description: Intel's ispc is an open-source compiler for a C-based SPMD programming language that generates efficient code for SIMD units and Intel MIC. It would be interesting to port some of the GROMACS SIMD force kernels (e.g. group- or Verlet-scheme non-bonded or bonded kernels) and see what advantages and disadvantages this SPMD compiler provides.
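
    From the host side, an ispc-compiled kernel links like any other object file, which is part of the appeal. A minimal sketch of the C++ integration, with hypothetical kernel and file names (the SPMD kernel itself would live in a separate .ispc file, and the compile flags shown are only illustrative):

        // ispc emits a regular object file plus a C/C++ header, e.g.:
        //   ispc --target=avx2-i32x8 nb_kernel.ispc -o nb_kernel.o -h nb_kernel_ispc.h
        #include <vector>
        #include "nb_kernel_ispc.h" // generated header; declares ispc::nb_force_kernel

        int main()
        {
            const int n = 1024;
            std::vector<float> x(3 * n, 0.5f); // coordinates
            std::vector<float> f(3 * n, 0.0f); // forces to accumulate

            // ispc maps the kernel's "program instances" onto SIMD lanes
            // (SSE/AVX/MIC) without per-architecture intrinsics.
            ispc::nb_force_kernel(x.data(), f.data(), n);
            return 0;
        }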

    Explore usability of OpenCL for CPU SIMD and GPU kernels

    Contact: mark.abraham@scilifelab.se

    Description: OpenCL is currently the only platform that offers the possibility of a common language and programming model able to target a wide range of platforms, from SIMD-capable CPUs to accelerators to more exotic chips like the Adapteva Parallella. Hence, our aim is to enable the use of OpenCL in GROMACS by developing the required infrastructure (build system, "glue"/device management code, etc.) as well as to experiment with porting compute kernels to various architectures. While OpenCL often cannot compete with native optimizations (e.g. AVX for CPUs or CUDA for NVIDIA GPUs), having the infrastructure ready and being able to try out new optimizations or promising hardware would be of great benefit.

    The work could start with setting up the build system and general OpenCL device/data management infrastructure (see the sketch after the list below) and with porting the current CUDA kernels, which should be straightforward. Currently, the main platforms of interest are (a non-exhaustive list; we're open to suggestions):

    • CUDA GPUs
    • AMD GPUs and APUs
    • Intel/AMD CPUs (SIMD acceleration: SSE/AVX)
    • Intel GPUs (why not :)
    • Samsung Exynos 5 / Cortex-A15 with ARM Mali-T604 (used in the Mont-Blanc project's third-generation machine)
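
    To give a flavor of the "glue"/device-management code involved, below is a minimal sketch of OpenCL device enumeration and context/queue setup using the standard OpenCL 1.x C API (first platform and device only, error handling abbreviated):

        #include <CL/cl.h>
        #include <cstdio>

        int main()
        {
            // Pick the first available platform.
            cl_platform_id platform;
            cl_uint numPlatforms = 0;
            if (clGetPlatformIDs(1, &platform, &numPlatforms) != CL_SUCCESS || numPlatforms == 0)
            {
                fprintf(stderr, "No OpenCL platform found\n");
                return 1;
            }

            // Accept any device type: GPUs, SIMD-capable CPUs, accelerators,
            // which is exactly the breadth of targets argued for above.
            cl_device_id device;
            if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, NULL) != CL_SUCCESS)
            {
                fprintf(stderr, "No OpenCL device found\n");
                return 1;
            }

            char name[256];
            clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, NULL);
            printf("Using OpenCL device: %s\n", name);

            cl_int err;
            cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
            cl_command_queue queue = clCreateCommandQueue(context, device, 0, &err);

            // ... compile kernels, create buffers, and enqueue work here ...

            clReleaseCommandQueue(queue);
            clReleaseContext(context);
            return 0;
        }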

    Implement native secondary structure analysis (e.g. DSSP)

    Contact: mark.abraham@scilifelab.se

    Description: Detecting how the secondary structure of proteins changes over the course of a simulation is often an important stage in analysis. There are several widely accepted geometric definitions of secondary structure, of which DSSP is the most widely used. Currently, GROMACS calls an external executable to assign DSSP-defined secondary structure to each frame of a trajectory. It would be better to implement this in our own code in the new C++ analysis framework, so that we can do such analyses faster, or in parallel, or offer different secondary-structure assignment schemes. More flexible output schemes to visualize and/or post-process the secondary-structure assignment would also be valuable to GROMACS users.
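
    For a flavor of what a native implementation involves, the sketch below computes the Kabsch-Sander electrostatic hydrogen-bond energy on which DSSP's assignment is built; the Vec3 helper is hypothetical, while the constants follow the original 1983 paper:

        #include <cmath>

        struct Vec3 { double x, y, z; };

        static double distance(const Vec3& a, const Vec3& b)
        {
            const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
            return std::sqrt(dx * dx + dy * dy + dz * dz);
        }

        // Electrostatic H-bond energy (kcal/mol) between a backbone C=O
        // acceptor and an N-H donor; DSSP assigns a hydrogen bond when
        // E < -0.5 kcal/mol.
        static bool isHydrogenBonded(const Vec3& C, const Vec3& O,
                                     const Vec3& N, const Vec3& H)
        {
            const double q1q2f = 0.42 * 0.20 * 332.0; // partial charges * unit conversion
            const double E = q1q2f * (1.0 / distance(O, N) + 1.0 / distance(C, H)
                                      - 1.0 / distance(O, H) - 1.0 / distance(C, N));
            return E < -0.5;
        }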

    Identify a more general matrix format for analysis tools to write

    Contact: mark.abraham@scilifelab.se

    Description: Various GROMACS analysis tools write their output as a matrix. Currently, that output is an X PixMap (.xpm) image. It would be much better to get a matrix of actual numbers, so that further post-processing, graphing, or visualization with different tools is possible. This requires identifying a suitably flexible format and implementing code to write it in the new C++ analysis framework.
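
    As a purely illustrative example (not a proposed final format), a writer for a simple self-describing text layout, easy to parse from scripting or plotting tools, might look like this:

        #include <cstdio>
        #include <string>
        #include <vector>

        // Write a labeled matrix as "# key: value" headers followed by
        // one "x y value" triple per line; the layout is hypothetical.
        void writeMatrix(FILE* fp, const std::string& title,
                         const std::vector<double>& xTicks,
                         const std::vector<double>& yTicks,
                         const std::vector<std::vector<double>>& values)
        {
            if (values.empty())
            {
                return;
            }
            fprintf(fp, "# title: %s\n", title.c_str());
            fprintf(fp, "# rows: %zu cols: %zu\n", values.size(), values[0].size());
            for (size_t i = 0; i < values.size(); ++i)
            {
                for (size_t j = 0; j < values[i].size(); ++j)
                {
                    fprintf(fp, "%g %g %g\n", xTicks[j], yTicks[i], values[i][j]);
                }
            }
        }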

    Modularize pdb2gmx (and friends)

    Contact: mark.abraham@scilifelab.se

    Description: Users of GROMACS normally need to do considerable pre-processing to construct a starting configuration of atoms and then infer/define the chemical bonding topology so that a model physics can be constructed for it. There are lots of tricky details, including:

    • naming conventions of residues and atoms and inferring chemical metadata from them,
    • building missing atoms (often the source is an experiment, which cannot always locate every atom, or do so precisely),
    • setting up suitable protein/polymer termini ("do what I mean" is very hard to write code to do!)
    • dealing with inputs that describe logically distinct protein/polymer chains, and those that describe a single chain that is somehow broken! 
    • checking whether a sanitized input can be expressed by the chosen force field (i.e. the model physics the user would like to use)
    • providing a framework for the user to extend the range of protein/polymer building blocks
    • cross-linking the building blocks in flexible ways

    Currently pdb2gmx handles all the above details, and defers other issues to separate tools, e.g.

    • constructing periodic boxes is done by editconf
    • constructing solvent boxes, solvating structures, and inserting individual molecules is all done with genbox
    • replication of structures is done by genconf
    • replacing solvent molecules by other molecules (ions) is done by genion
    • generation of atomic restraints with genrestr

    The grompp simulation pre-processor then takes the output of a series of preparation stages from the above tools and checks that the final configuration, topology and model physics make sense and can be simulated. Then it writes a portable binary output file from which a simulation can actually start.

    There is far too much functionality in pdb2gmx (and probably its friends) for one tool, which violates the Unix philosophy of "do one thing well". It is not easy to maintain or extend, so thinking about how best to compartmentalize its functionality would be very useful; one possible decomposition is sketched below. Many force fields have parameterization tools that extend them, and there is separate work underway to make it easier to incorporate those topology fragments into GROMACS simulations.
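
    One possible decomposition (entirely hypothetical class names, not an existing GROMACS API) is a pipeline of single-purpose stages behind a common interface:

        #include <memory>
        #include <vector>

        struct Structure
        {
            // atoms, residues, chains, ... (placeholder)
        };

        // Each stage does one thing well, transforming the structure in place.
        class PreparationStage
        {
        public:
            virtual ~PreparationStage() {}
            virtual void apply(Structure& s) = 0;
        };

        class MissingAtomBuilder : public PreparationStage
        {
        public:
            void apply(Structure& /*s*/) override
            {
                // build atoms the experimental input could not locate
            }
        };

        class TerminiSetup : public PreparationStage
        {
        public:
            void apply(Structure& /*s*/) override
            {
                // choose and apply suitable protein/polymer termini
            }
        };

        // A pipeline replaces the monolith: stages can be reordered,
        // replaced, or extended without touching the others.
        void prepare(Structure& s,
                     const std::vector<std::unique_ptr<PreparationStage>>& stages)
        {
            for (const auto& stage : stages)
            {
                stage->apply(s);
            }
        }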

    Implement new clustering methods, perhaps on GPUs

    Contact: mark.abraham@scilifelab.se

    Description: Analysis of a GROMACS simulation often aims to identify how often a set of conformations arose over its duration. Often the set of conformations is not known in advance, so various kinds of pattern-recognition techniques can be useful to detect those conformations, their frequencies, and the rates of transitions between them. There are various "clustering" techniques that rely on computing differences between individual configurations, but these do not scale to the millions of configurations that might be sampled in a molecular dynamics simulation, because it is rarely feasible either to cache the whole matrix of differences or to recompute differences as required. Investigating good methods for grouping data from complex data sets is necessary. Then we would have to work out how to either pre-process GROMACS data for use with existing software, or write new implementations, possibly in parallel, using SIMD, or using GPUs, to make the algorithms tractable.
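
    As an illustration of an approach that avoids the full difference matrix, the sketch below implements a streaming "leader" clustering pass in which each frame is compared only against current cluster representatives (all names hypothetical; the RMSD here does no rotational fit):

        #include <cmath>
        #include <cstddef>
        #include <vector>

        struct Frame { std::vector<float> coords; /* xyzxyz... */ };

        // Plain coordinate RMSD; a real implementation would fit first.
        double rmsd(const Frame& a, const Frame& b)
        {
            double sum = 0.0;
            for (size_t k = 0; k < a.coords.size(); ++k)
            {
                const double d = a.coords[k] - b.coords[k];
                sum += d * d;
            }
            return std::sqrt(sum / (a.coords.size() / 3));
        }

        // O(frames * clusters) work and memory instead of O(frames^2);
        // the inner loop over leaders is a natural target for SIMD/GPUs.
        std::vector<size_t> leaderCluster(const std::vector<Frame>& frames,
                                          double cutoff,
                                          std::vector<Frame>& leaders)
        {
            std::vector<size_t> assignment(frames.size());
            for (size_t i = 0; i < frames.size(); ++i)
            {
                size_t best = leaders.size();
                double bestDist = cutoff;
                for (size_t c = 0; c < leaders.size(); ++c)
                {
                    const double d = rmsd(frames[i], leaders[c]);
                    if (d < bestDist)
                    {
                        bestDist = d;
                        best = c;
                    }
                }
                if (best == leaders.size())
                {
                    leaders.push_back(frames[i]); // frame starts a new cluster
                }
                assignment[i] = best;
            }
            return assignment;
        }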

    Extend the applicability of GROMACS multi-simulations to GPUs

    Contact: mark.abraham@scilifelab.se

    Description: A single simulation of a process rarely produces valid statistics about the observed phenomena, so GROMACS users often seek to run multiple simulations. There are algorithms, such as REMD, that exploit running multiple distinct simulations simultaneously, and sometimes it is simply better use of a large computational facility to package up many small simulations as one big one. Accordingly, GROMACS implements a multi-simulation option that runs distinct simulations in distinct MPI communicators. The new GPU implementation and associated detection in GROMACS 4.6 is not fully integrated with the multi-simulation machinery, and this limits users' ability to get the most from their GPU hardware in the above situations. Fixing this would require generalizing how users specify on the mdrun command line which GPUs should be used by which components of a multi-simulation, in a way that copes with a simulation spread over more than one physical node, or more than one simulation within a physical node, where a node might have more than one GPU available.
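
    As a purely illustrative sketch of the mapping logic required (mdrun in 4.6 already accepts a per-node GPU id string for the single-simulation case; the generalized multi-simulation semantics shown here are hypothetical):

        #include <stdexcept>
        #include <string>
        #include <vector>

        // Parse e.g. "0011" -> {0,0,1,1}: four PP ranks on this node, with
        // two simulations sharing GPU 0 and two sharing GPU 1.
        std::vector<int> parseGpuIds(const std::string& s, int ranksOnNode)
        {
            std::vector<int> ids;
            for (char c : s)
            {
                if (c < '0' || c > '9')
                {
                    throw std::runtime_error("invalid GPU id string");
                }
                ids.push_back(c - '0');
            }
            if (static_cast<int>(ids.size()) != ranksOnNode)
            {
                throw std::runtime_error("need one GPU id per PP rank on this node");
            }
            return ids;
        }

        // Each rank then selects its GPU by intra-node rank index,
        // independent of which member simulation it belongs to.
        int gpuForRank(const std::vector<int>& ids, int intranodeRank)
        {
            return ids.at(intranodeRank);
        }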
