Project ideas

    This page describes projects related to GROMACS which are up for grabs by anyone wanting to contribute or do some research on one of the proposed scientific or HPC topics.

    Rules of the game - how to join in the fun?

    The GROMACS development community is distributed across the globe, so lots of our discussion takes place in written forums. We do have occasional teleconferences, as announced on the developers mailing list. People interested in taking part are highly encouraged to

    • join our mailing list for general discussion, how-to questions, project suggestions, etc. (please do introduce yourself there!),
    • register on our Redmine project management site and start/join topic-focussed discussions there, and
    • register on our Gerrit code review site with an OpenID (e.g. gmail account), get a feel for the active development there, and contribute to the ongoing code review!

    When you're ready to start hacking on a project, check out our Gerrit page for details on getting the git version of our source code and getting ready to interact with our Gerrit code review management tool later on. There's no development manual per se, but we hope to start building one soon. The best way to learn about how things work is to ask some general questions in an appropriate place, do your background reading in the manual, and break out a debugger and see it work.

    Most of the highly active developers are physically located in Sweden, in the research groups of Professors Erik Lindahl, Berk Hess and David van der Spoel. We do try to be as open about the decisions we make in Stockholm as we reasonably can. We don't want to be a dictatorship, but we can't afford to be a Polish parliament either! There are priorities driven by the needs of our research-based funding, and that will limit our ability to support others' projects and interests.

    You will be welcome to the copyright of any body of new code you contribute. New code that goes into core parts of GROMACS that we are likely to maintain on an ongoing basis will likely also be copyrighted by members of the GROMACS team. The code as a whole will be distributed under the terms of the LGPL v2.1 open-source license, and code is contributed on the understanding that you agree to distribution using that license.

    During 2013, we are planning to convert the code base from C to the minimum of C++ functionality we think delivers us long-term development value. This will not be a ground-up re-write, but rather a gradual introduction of some C++ APIs. So, there will be some API instability while we are doing this. Projects that have forked or will fork from GROMACS at or before the 4.6 major release will have re-integration issues to manage at some point in their lifetime. Accordingly, we encourage new development to base from our master branch, and to manage integration issues gradually. That way you get a chance to see if the new API design omits something that you really need, and you get to see that in time to lobby for a change!

    Possible projects

    Intel MIC support: implementing asymmetric offload mechanism

    Project in progress. If you're interested in helping out with testing or exploring further kernel or offload optimizations, join the discussion on the Redmine issues linked below.

    Contact: Roland Schulz

    Description: In order to use Intel MIC (Xeon Phi) accelerators efficiently, asynchronous task offloading to the accelerator is required - similar to the asynchronous execution model implemented with NVIDIA GPUs. This would involve implementing a heterogeneous offload mechanism for accelerated short-range (non-bonded and bonded) force kernels. While simple offload mechanisms do exist for MIC, these lack the flexibility and performance required in GROMACS. The optimal solution is to implement a coarse, high-level task parallelization which separates the particle-particle workload (bonded and non-bonded force calculations) the same way as PME "ranks" separate workload and enable execution on dedicated MPI ranks.

    For further details see the related feature requests: 1181, 1187 and 1394.

    Information on MIC heterogeneous offload can be found on these slides.
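
    The asynchronous execution model described above can be illustrated with a small sketch. It is written in Python for brevity and is not GROMACS code: the two force functions are hypothetical stand-ins, and a thread pool plays the role of the accelerator. The point is only the ordering: launch the offloaded task, do host work in the meantime, and synchronize when both contributions are needed.

```python
from concurrent.futures import ThreadPoolExecutor


def short_range_forces(positions):
    """Hypothetical stand-in for the offloaded short-range kernel."""
    return [-2.0 * x for x in positions]


def long_range_forces(positions):
    """Hypothetical stand-in for the work that stays on the host."""
    return [0.5 * x for x in positions]


def compute_forces(positions, executor):
    # Launch the "offloaded" task; submit() returns immediately.
    offloaded = executor.submit(short_range_forces, positions)
    # The host does its own share while the offloaded task runs.
    host = long_range_forces(positions)
    # Synchronize only at the point where both contributions are needed.
    device = offloaded.result()
    return [h + d for h, d in zip(host, device)]


def run_example():
    # A single worker thread plays the role of the accelerator here.
    with ThreadPoolExecutor(max_workers=1) as executor:
        return compute_forces([1.0, 2.0, -4.0], executor)
```

    In a real implementation the hard part is balancing the two workloads so that neither side waits for long at the synchronization point.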

    Explore usability of ispc for SIMD kernels

    Contact: ???

    Description: The Intel ispc is an open-source compiler for a C-based SPMD programming language and compiles efficient code for SIMD units and Intel MIC. It would be interesting to port some of the GROMACS SIMD force kernels (e.g. group or Verlet scheme non-bonded or bonded) and see what advantages and disadvantages this SPMD compiler provides.

    Likely skill requirements: understanding the role of these kernels in molecular dynamics, C, ispc

    Explore usability of OpenCL for CPU SIMD and GPU kernels


    Description: OpenCL is currently the only platform that offers the possibility of having a common language and programming model with the ability to target a wide range of platforms, from SIMD-capable CPUs to accelerators to more exotic chips like the Adapteva Parallella. Hence, our aim is to enable the use of OpenCL in GROMACS by developing the required infrastructure (build system, "glue"/device management code, etc.) as well as to experiment with porting compute kernels to various architectures. While it is known that OpenCL can in many cases not compete with a native optimization (e.g. AVX for CPUs or CUDA for NVIDIA GPUs), having the infrastructure ready and being able to try out new optimizations or some promising hardware would be of great benefit.

    The work could start with setting up the build system and general OpenCL device/data management infrastructure, and with porting the current CUDA kernels, which should be straightforward. Currently, the main platforms of interest are (non-exhaustive list, we're open to suggestions):

    • CUDA GPUs
    • AMD GPUs and APUs
    • Intel/AMD CPUs (SIMD acceleration: SSE/AVX)
    • Intel GPUs (why not :)
    • Samsung Exynos 5 / Cortex-A15 with ARM Mali-T600 (used in the Mont Blanc project's third gen machine)

    Likely skill requirements: OpenCL, CUDA, C

    Implement GPU code for long-ranged component of PME


    Description: Treating "long-range" electrostatic interactions is often crucial for molecular simulations. Algorithms such as PME break that up into components that can be computed efficiently by different kinds of kernels. The short-ranged kernels have been ported to GPUs and produce large speedups (even over existing optimized SIMD code), but the FFT-based long-ranged kernels remain as further work. The AMBER molecular dynamics toolkit already has such an implementation. Writing (preferably) OpenCL kernels to compute the long-ranged component of PME would be extremely beneficial for the long term scalability of this important simulation algorithm. This will involve complex interactions with parallelization layers such as MPI, and dealing with the existing domain decomposition of the simulation data.
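
    As a reminder of what "breaking up into components" means here, the following minimal sketch shows the Ewald-style splitting of the 1/r Coulomb term on which PME is based. It is plain Python for illustration only, not GROMACS code; the function name and the value of the splitting parameter beta are assumptions. The short-ranged part decays quickly and can be cut off, while the smooth long-ranged part is the one evaluated on a grid with FFTs.

```python
import math


def coulomb_split(r, beta):
    """Split 1/r into a short-ranged and a long-ranged part.

    r is a distance and beta the (illustrative) splitting parameter in
    reciprocal distance units. The two parts always sum to exactly 1/r
    up to floating-point rounding.
    """
    short_range = math.erfc(beta * r) / r  # decays fast, handled with a cut-off
    long_range = math.erf(beta * r) / r    # smooth, handled in reciprocal space
    return short_range, long_range
```

    For example, with an assumed beta of 3.12 per nm and r = 0.9 nm, the short-ranged part is already several orders of magnitude smaller than 1/r, which is why it can be truncated at a cut-off.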

    Likely skill requirements: understanding of the PME algorithm, FFTs, C, C++, OpenCL and/or CUDA

    Implement native secondary structure analysis (e.g. DSSP)


    Description: Detecting how the secondary structure of proteins changes over the course of a simulation is often an important stage in analysis. There are several widely accepted geometric definitions of secondary structure, but DSSP is the most commonly used. Currently, GROMACS uses an external executable to assign secondary structure defined by DSSP on each frame of one of our trajectories. It would be nicer if we could implement that in our own code in our new C++ analysis framework, so we can do such analyses faster, or in parallel, or offer different secondary structure assignment schemes. Also, more flexible output schemes to visualize the secondary structure assignment, and/or post-process it, would be valuable to GROMACS users.
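
    To give a feel for the size of the core of such an implementation, here is a minimal sketch of the hydrogen-bond energy criterion from the DSSP definition of Kabsch and Sander, on which the secondary structure patterns are built. It is Python for illustration rather than code for the C++ analysis framework, and the function names are assumptions. Coordinates are in Angstrom and the energy is in kcal/mol.

```python
import math

# Partial charges 0.42e and 0.20e times the conversion factor 332,
# giving an energy in kcal/mol for distances in Angstrom.
COUPLING = 0.42 * 0.20 * 332.0
HBOND_CUTOFF = -0.5  # kcal/mol; lower energies count as a hydrogen bond


def hbond_energy(c, o, n, h):
    """Electrostatic energy between a backbone C=O and a backbone N-H group."""
    return COUPLING * (
        1.0 / math.dist(o, n)
        + 1.0 / math.dist(c, h)
        - 1.0 / math.dist(o, h)
        - 1.0 / math.dist(c, n)
    )


def is_hbond(c, o, n, h):
    """True if the C=O ... H-N pair satisfies the DSSP energy criterion."""
    return hbond_energy(c, o, n, h) < HBOND_CUTOFF
```

    Turns, helices and sheets are then assigned from patterns of such hydrogen bonds along the backbone, which is the part where alternative assignment schemes would differ.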

    Likely skill requirements: understanding protein secondary structure at undergrad level, C++

    Identify and output a more general matrix format for analysis tools to write


    Description: Various GROMACS analysis tools write their output in a matrix. Currently, that is in an X Pixmap. It would be much better to be able to get a matrix of actual numbers so that further post-processing, graphing or visualization by different tools is possible. This would require identifying a suitable flexible format, and implementing code to write it in the new C++ analysis framework.
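
    As a point of comparison for the format discussion, the sketch below writes a labelled matrix of actual numbers as comma-separated text and reads it back without loss. CSV is only one candidate and is used here as an assumption, not a recommendation; the function names are hypothetical and the code is Python for illustration.

```python
import csv
import io


def write_matrix(stream, row_labels, col_labels, values):
    """Write a labelled numeric matrix as comma-separated text."""
    writer = csv.writer(stream)
    writer.writerow([""] + list(col_labels))
    for label, row in zip(row_labels, values):
        # repr() of a float round-trips exactly through float().
        writer.writerow([label] + [repr(v) for v in row])


def read_matrix(stream):
    """Read back a matrix written by write_matrix()."""
    rows = list(csv.reader(stream))
    col_labels = rows[0][1:]
    row_labels = [row[0] for row in rows[1:]]
    values = [[float(x) for x in row[1:]] for row in rows[1:]]
    return row_labels, col_labels, values
```

    A text format like this is readable by most graphing tools, but it carries no units or axis metadata, which is one of the things a more flexible format would need to address.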

    Likely skill requirements: knowledge of graphing/visualization software, C++

    Modularize pdb2gmx (and friends)


    Description: Users of GROMACS normally need to do considerable pre-processing to construct a starting configuration of atoms and then infer/define the chemical bonding topology so that a model physics can be constructed for it. There are lots of tricky details of

    • naming conventions of residues and atoms and inferring chemical metadata from them,
    • building missing atoms (often the source is an experiment, which cannot always locate every atom, or do so precisely),
    • setting up suitable protein/polymer termini ("do what I mean" is very hard to write code to do!)
    • dealing with inputs that describe logically distinct protein/polymer chains, and those that describe a single chain that is somehow broken! 
    • checking whether a sanitized input can be expressed by the chosen force field (i.e. the model physics the user would like to use)
    • providing a framework for the user to extend the range of protein/polymer building blocks
    • cross-linking the building blocks in flexible ways

    Currently pdb2gmx handles all the above details, and defers other issues to separate tools, e.g.

    • constructing periodic boxes is done by editconf
    • constructing solvent boxes, solvating structures, and inserting individual molecules is all done with genbox
    • replication of structures is done by genconf
    • replacing solvent molecules by other molecules (ions) is done by genion
    • generation of atomic restraints with genrestr

    The grompp simulation pre-processor then takes the output of a series of preparation stages from the above tools and checks that the final configuration, topology and model physics make sense and can be simulated. Then it writes a portable binary output file from which a simulation can actually start.

    There is far too much functionality in pdb2gmx (and probably its friends) for one tool, which violates the Unix philosophy of "do one thing well." It is not easy to maintain or extend. So thinking about how best to compartmentalize its functionality will be very useful. Many force fields have parameterization tools that will extend them, and there is separate work underway to try to make it easier to incorporate those topology fragments into GROMACS simulations.

    Likely skill requirements: understanding the process of preparing biomolecular systems for simulations, C++

    Implement new clustering methods, perhaps on GPUs


    Description: Analysis of a GROMACS simulation often needs to identify how often a set of conformations arose over its duration. Often the set of conformations will not be known in advance. So various kinds of pattern-recognition techniques can be useful to detect those conformations, their frequencies and the rates of transitions between them. There are various "clustering" techniques that rely on computing differences between individual configurations, but these do not scale to the millions of individual configurations that might be sampled in a molecular dynamics simulation, because it is rarely feasible to either cache the whole matrix of differences, or recompute differences as required. Investigating good methods for grouping data from complex data sets is necessary. Then we'd have to work out how to either pre-process GROMACS data to use existing software, or to make new implementations, possibly in parallel, or using SIMD, or using GPUs in order to leverage new computational power to make algorithms tractable.
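
    One family of methods that avoids the full matrix of differences is single-pass "leader" clustering, sketched below in Python for illustration. This is just one example of the kind of method worth investigating, not a proposal for the final algorithm. The function name is hypothetical, and plain Euclidean distance between coordinate tuples stands in for a real structural metric such as RMSD after fitting.

```python
import math


def leader_cluster(frames, cutoff, distance=math.dist):
    """Assign each frame to the first cluster leader within the cut-off.

    frames may be any iterable (e.g. a generator reading a trajectory),
    so only the leaders are kept in memory, never a pairwise matrix.
    Returns (leaders, counts, assignment).
    """
    leaders = []     # one representative frame per cluster
    counts = []      # number of frames assigned to each cluster
    assignment = []  # cluster index for every frame, in input order
    for frame in frames:
        for index, leader in enumerate(leaders):
            if distance(frame, leader) <= cutoff:
                counts[index] += 1
                assignment.append(index)
                break
        else:
            # No existing leader is close enough: start a new cluster.
            leaders.append(frame)
            counts.append(1)
            assignment.append(len(leaders) - 1)
    return leaders, counts, assignment
```

    The cost is one distance evaluation per frame and leader, and the result depends on the order of the frames, so the trade-off against matrix-based methods is accuracy versus tractability.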

    Likely skill requirements: pattern recognition, clustering, C++, OpenCL or CUDA

    Simulation curation tool/database


    Description: Once a simulation is complete, there is normally a need to archive it together with descriptive metadata, so that it is available in the future to colleagues. Just relying on an individual user's file system organization and naming scheme is insufficient. Normally publicly funded projects require their principal investigators to preserve the data for a period like 5 years, and that also means they have to be able to find it, and that means they have to require their employees to build a database of what simulations have been run and what is unique and common between those simulations. Conceptually, this is no more complex than curating your MP3 collection - you don't want to edit the files, but you need them organized by title, artist, genre, etc. so that you can find the song you want to listen to right now. It needs some kind of middleware so a user can drop the files in a shared filesystem and attach the metadata. Investigating what third-party technologies exist that could provide this, and writing code to make it easy for a user on the command line or at a web interface to manage the actual data and the metadata would likely be needed.
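
    To make the idea concrete, here is a minimal sketch of such a catalogue using SQLite from the Python standard library. The files themselves stay untouched on the shared filesystem and only their paths and free-form metadata are recorded. The schema, the function names and the example metadata are all assumptions for illustration; an existing third-party tool may well do this job better.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS simulation (
    id    INTEGER PRIMARY KEY,
    path  TEXT NOT NULL UNIQUE,
    owner TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS tag (
    simulation_id INTEGER NOT NULL REFERENCES simulation(id),
    name          TEXT NOT NULL,
    value         TEXT NOT NULL
);
"""


def open_catalogue(filename=":memory:"):
    """Open (and if necessary create) the catalogue database."""
    connection = sqlite3.connect(filename)
    connection.executescript(SCHEMA)
    return connection


def register(connection, path, owner, **metadata):
    """Record an archived simulation and attach name/value metadata to it."""
    cursor = connection.execute(
        "INSERT INTO simulation (path, owner) VALUES (?, ?)", (path, owner)
    )
    simulation_id = cursor.lastrowid
    connection.executemany(
        "INSERT INTO tag (simulation_id, name, value) VALUES (?, ?, ?)",
        [(simulation_id, name, str(value)) for name, value in metadata.items()],
    )
    connection.commit()
    return simulation_id


def find(connection, name, value):
    """Return the paths of all simulations carrying the given metadata."""
    rows = connection.execute(
        "SELECT s.path FROM simulation s JOIN tag t ON t.simulation_id = s.id "
        "WHERE t.name = ? AND t.value = ? ORDER BY s.path",
        (name, str(value)),
    )
    return [row[0] for row in rows]
```

    A call such as find(db, "temperature", 300) then plays the role of "show me everything by this artist" in the MP3 analogy.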

    Likely skill requirements: no specific requirements, other than a willingness to re-use existing wheels where they do part of the job well

    Multi-architecture binaries for x86


    Description: The GROMACS simulation engine, mdrun, contains compute kernels that make extensive use of SIMD intrinsics supported on a wide range of HPC platforms. Currently, the target platform is fixed at configuration time, compiler flags are set accordingly, and hardware that does not support the machine instructions to which the intrinsics are compiled fails ungracefully. This makes life messy if a user's computing environment is heterogeneous, and makes life difficult for GROMACS package maintainers for the various Linux distributions, and for projects like Folding@Home that would like to avoid forcing users to download multiple binaries to avoid losing performance. Writing the glue code in CMake and C to make an mdrun binary (configurably) portable within the x86 family of processors would be very valuable for end users of GROMACS.
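
    The run-time part of such glue code amounts to a dispatch table: several kernel flavours are built into one binary and the best one the CPU supports is chosen at start-up. The sketch below shows that selection logic in Python for illustration only. The kernel functions are hypothetical stand-ins, and the set of CPU feature names is passed in by the caller; in a real C implementation it would come from CPUID.

```python
def kernel_reference(values):
    """Plain fallback that runs on any x86 CPU."""
    return sum(v * v for v in values)


def make_stand_in(name):
    """Create a hypothetical stand-in for a kernel built with SIMD flags."""
    def kernel(values):
        return kernel_reference(values)
    kernel.__name__ = "kernel_" + name
    return kernel


# Kernel flavours in order of preference, keyed by the CPU feature required.
KERNELS = [
    ("avx", make_stand_in("avx")),
    ("sse4_1", make_stand_in("sse4_1")),
    ("sse2", make_stand_in("sse2")),
]


def select_kernel(cpu_flags):
    """Pick the most capable kernel that the detected CPU features allow."""
    for flag, kernel in KERNELS:
        if flag in cpu_flags:
            return flag, kernel
    # Degrade gracefully instead of failing on an illegal instruction.
    return "reference", kernel_reference
```

    The build-system side of the project is then to compile each flavour with its own compiler flags and link them into a single binary, so that only the selected kernel ever executes instructions the hardware may lack.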

    Likely skill requirements: CMake, compiler-configuration, C

    Other highly MD-specific feature requests

    There are a number of other "wishes" that are very specific to particular MD contexts listed here. Many of these are infeasible until 2014 while we port a lot of the basic infrastructure to C++.

    Page last modified 18:52, 10 Jan 2014 by pszilard