
FFTW3DTable of contentsNo headersFirst, I think we should separate the issue of domain decomposition from FFT, so the first task would be to create a truly parallel 3D FFT. Currently, the algorithm I implemented is essentially the same as used in FFTW2, where the grid is only decomposed along the xaxis. This is bad for two reasons: (i) it limits the number of nodes we can use for PME, and (ii) even when the directspace system has beautiful balanced domain decomposition they have to send coordinates to a more or less orthogonal xaxis decomposition in fourier space. What we'd like is simply an algorithm that decomposed FFTs in 1D/2D/3D. This is not rocket science in terms of mathematics, but requires a bit of juggling with data and communication to make sure things end up in the right place. There is a pretty good but superficial paper by Eleftheriou that describes the highlevel approach, but essentially it's a matter of:
It is probably useful to look at the current gmx_parallel_3dfft implementation. In a domaindecomposition version the transposes/communication is really the same operation, although there might be a need to write a small local subtranspose routine too. Since the parallel version will likely use much less memory per node and memory is cheap anyways, it is probably simpler to use separate in/out arrays instead of worrying about inplace permutations of data. Lindahl 09:54, 2 July 2007 (CEST) It is probably good to look into this paper: I. J. Bush and I. T. Todorov and W. Smith, A daft DL_POLY1 distributed memory adaptation of the smoothed particle mesh Ewald method, Comput. Phys. Commun., 175, 323329 (2006), making use of a novel 3D Fast Fourier Transform (DAFT) [ I.J. Bush, The Daresbury Advanced Fourier transform, Daresbury Laboratory, 1999]. Spoel 14:45, 2 July 2007 (CEST) Steve Plimpton's implementation of parallel 3D FFTs may also be worth a look: http://www.sandia.gov/~sjplimp/docs/fft/README.html 