Table of contents
    No headers

    First, I think we should separate the issue of domain decomposition from FFT, so the first task would be to create a truly parallel 3D FFT.

    Currently, the algorithm I implemented is essentially the same as used in FFTW2, where the grid is only decomposed along the x-axis. This is bad for two reasons: (i) it limits the number of nodes we can use for PME, and (ii) even when the direct-space system has beautiful balanced domain decomposition they have to send coordinates to a more or less orthogonal x-axis decomposition in fourier space.

    What we'd like is simply an algorithm that decomposed FFTs in 1D/2D/3D. This is not rocket science in terms of mathematics, but requires a bit of juggling with data and communication to make sure things end up in the right place. There is a pretty good but superficial paper by Eleftheriou that describes the high-level approach, but essentially it's a matter of:

    • Starting with a 3D domain decomposed grid
    • Communicate so each node has whole arrays e.g. along the z axis
    • Do 1D FFTs
    • Communicate/transponse so each node has arrays e.g. along the y axis
    • Do 1D FFTs
    • Communicate/transponse so each node has arrays along the last axis
    • Do 1D FFTs
    • Don't communicate more, do the convolution in transposed coordinates
    • Repeat everything backwards.

    It is probably useful to look at the current gmx_parallel_3dfft implementation. In a domain-decomposition version the transposes/communication is really the same operation, although there might be a need to write a small local subtranspose routine too. Since the parallel version will likely use much less memory per node and memory is cheap anyways, it is probably simpler to use separate in/out arrays instead of worrying about in-place permutations of data.

    --Lindahl 09:54, 2 July 2007 (CEST)

    It is probably good to look into this paper: I. J. Bush and I. T. Todorov and W. Smith, A daft DL_POLY1 distributed memory adaptation of the smoothed particle mesh Ewald method, Comput. Phys. Commun., 175, 323-329 (2006), making use of a novel 3D Fast Fourier Transform (DAFT) [ I.J. Bush, The Daresbury Advanced Fourier transform, Daresbury Laboratory, 1999]. --Spoel 14:45, 2 July 2007 (CEST)

    Steve Plimpton's implementation of parallel 3D FFTs may also be worth a look:

    Page last modified 00:21, 13 Oct 2009 by JLemkul?