# FFTW-3D

Version as of 00:24, 25 Aug 2019

to this version.

View current version

First, I think we should separate the issue of domain decomposition from FFT, so the first task would be to create a truly parallel 3D FFT.

Currently, the algorithm I implemented is essentially the same as used in FFTW2, where the grid is only decomposed along the x-axis. This is bad for two reasons: (i) it limits the number of nodes we can use for PME, and (ii) even when the direct-space system has beautiful balanced domain decomposition they have to send coordinates to a more or less orthogonal x-axis decomposition in fourier space.

What we'd like is simply an algorithm that decomposed FFTs in 1D/2D/3D. This is not rocket science in terms of mathematics, but requires a bit of juggling with data and communication to make sure things end up in the right place. There is a pretty good but superficial paper by Eleftheriou that describes the high-level approach, but essentially it's a matter of:

• Starting with a 3D domain decomposed grid
• Communicate so each node has whole arrays e.g. along the z axis
• Do 1D FFTs
• Communicate/transponse so each node has arrays e.g. along the y axis
• Do 1D FFTs
• Communicate/transponse so each node has arrays along the last axis
• Do 1D FFTs
• Don't communicate more, do the convolution in transposed coordinates
• Repeat everything backwards.

It is probably useful to look at the current gmx_parallel_3dfft implementation. In a domain-decomposition version the transposes/communication is really the same operation, although there might be a need to write a small local subtranspose routine too. Since the parallel version will likely use much less memory per node and memory is cheap anyways, it is probably simpler to use separate in/out arrays instead of worrying about in-place permutations of data.

--Lindahl 09:54, 2 July 2007 (CEST)

It is probably good to look into this paper: I. J. Bush and I. T. Todorov and W. Smith, A daft DL_POLY1 distributed memory adaptation of the smoothed particle mesh Ewald method, Comput. Phys. Commun., 175, 323-329 (2006), making use of a novel 3D Fast Fourier Transform (DAFT) [ I.J. Bush, The Daresbury Advanced Fourier transform, Daresbury Laboratory, 1999]. --Spoel 14:45, 2 July 2007 (CEST)

Steve Plimpton's implementation of parallel 3D FFTs may also be worth a look: http://www.sandia.gov/~sjplimp/docs/fft/README.html