The Transform and its Inverse

Let us discuss now, in detail, how to implement numerically the finite Fourier transform and its inverse transform. The complete Fourier transform of a real scalar field is given by


\begin{displaymath}
\widetilde\varphi (\vec{k})=\frac{1}{N^{d}}\sum_{\vec{n}}
e^{\imath\frac{2\pi}{N}\vec{k}\cdot\vec{n}}\varphi(\vec{n}),
\end{displaymath}

but we will only calculate and store the independent real and imaginary parts, $\widetilde\varphi (\vec{k})=\mathfrak{R}(\vec{k})+\imath\mathfrak{I}(\vec{k})$, which are given by

\begin{eqnarray*}
\mathfrak{R}(\vec{k}) & = & \frac{1}{N^{d}}\sum_{\vec{n}}
\cos...
...\left(\frac{2\pi}{N}\vec{k}\cdot\vec{n}\right) \varphi(\vec{n}).
\end{eqnarray*}


For use in the numerical programs we will implement these in matrix form, using a single (usually very large) $N^{d}\times N^{d}$ transformation matrix,


\begin{displaymath}
\widetilde\varphi _{\kappa}=\frac{1}{N^{d}}
\sum_{\iota=1}^{N^{d}}F_{\kappa\iota}\varphi_{\iota},
\end{displaymath}

where $\widetilde\varphi _{\kappa}$ stands for either a real part or an imaginary part, depending on the value of the index $\kappa$, according to our classification of the independent variables in momentum space, and where, accordingly,


\begin{displaymath}
F_{\kappa\iota}=\cos\!\left(\frac{2\pi}{N}\vec{k}\cdot\vec{n...
...pa\iota}=\sin\!\left(\frac{2\pi}{N}\vec{k}\cdot\vec{n}\right),
\end{displaymath}

where $\kappa$ corresponds to $\vec{k}$ and $\iota$ to $\vec{n}$ via the respective indexation schemes. Since $\vec{k}\cdot\vec{n}$ has a limited set of possible values, much smaller than the number of elements $N^{2d}$ of $F_{\kappa\iota}$, and which will be used repeatedly for many different modes, in order to save both computation time and memory, we will calculate all necessary sine and cosine functions just once, beforehand, for future repeated use. We will store these values in a relatively small array $P(\tau)$ indexed by the integer $\tau=\vec{k}\cdot\vec{n}$, and construct a very large indexing matrix $\mathbb{I}_{\kappa\iota}$, of size $N^{d}\times N^{d}$, to pick up the necessary values of $P(\tau)$ for each combination of a mode in momentum space and a site in position space. Since this indexing matrix can be of $2$-byte integers rather than of the $8$-byte double-precision real numbers needed for the values of $P(\tau)$, in this way we save memory at the ratio of almost four to one. The range of the integer $\vec{k}\cdot\vec{n}$ is given by


\begin{displaymath}
d k_{m}N\leq(\vec{k}\cdot\vec{n})\leq d k_{M}N,
\end{displaymath}

so that the size of the small double-precision array is given by $d Nk_{M}-d Nk_{m}+1$, where $k_{M}-k_{m}=N-1$ for either odd or even $N$, so that the size of this array is $d N(N-1)+1$, of the order of $d N^{2}$, much smaller than $N^{2d}$ for the values of $d$ of most interest, from $d=3$ to $d=5$. When we deal with the values of $P(\tau)$ using this array, not all values along the array are actually going to be used, because not all integers in the range can in fact be written as $\vec{k}\cdot\vec{n}$. However, it can be verified that the rate of useful occupation of the array is large for the dimensions of interest, in fact, it is over $90 \%$ for dimensions from $d=3$ to $d=5$. Therefore, we have here only a very small waste of memory space, and in the interest of simplicity it is not worth while to try to improve this occupation rate.

Since we must store the values of both the sines and the cosines, we actually have to double the size of the array as it was described above. In order to put the cosines at positive values of the index and the sines at negative values, we add and subtract constants in order to use the ranges that follow,


\begin{displaymath}
-d N(N-1)-1\leq\left[\vec{k}\cdot\vec{n}-(d Nk_{M}+1)\right]\leq-1,
\end{displaymath}

to be used for the sines, and


\begin{displaymath}
1\leq\left[\vec{k}\cdot\vec{n}+(-d Nk_{m}+1)\right]\leq d N(N-1)+1,
\end{displaymath}

to be used for the cosines, so that the total range is given by


\begin{displaymath}
-d N(N-1)-1=\tau_{m}\leq\tau\leq\tau_{M}=d N(N-1)+1,
\end{displaymath}

The total size of the array is now given by $2d N(N-1)+3$, which is still much smaller than the indexing matrix. The indexing matrix $\mathbb{I}_{\kappa\iota}$ will return the values of $\tau$ in this range, and is to be used as the argument of $P(\tau)$, given values of $\kappa$ and $\iota$, so that we have


\begin{displaymath}
F_{\kappa\iota}=P(\mathbb{I}_{\kappa\iota}),
\end{displaymath}

and the transformation can now be written as


\begin{displaymath}
\widetilde\varphi _{\kappa}=\frac{1}{N^{d}}
\sum_{\iota=1}^{N^{d}}P(\mathbb{I}_{\kappa\iota})\varphi_{\iota},
\end{displaymath}

which includes all the independent real and imaginary parts, depending on the values of $\kappa$, in a single matrix operation, which is easy to implement efficiently.

Let us now examine the situation concerning the inverse transform. It is clear that a similar scheme should be used for it. Since the indexing matrix $\mathbb{I}_{\kappa\iota}$ is so large, and considering the possibility that a single program might need to use both the transform and its inverse, it is important to build the scheme for the inverse transform in such a way that it is able to use the same indexing matrix. As we will see, it is possible to do this, but unfortunately not in a very efficient way. For speed of execution the order of the indices matters, and should in fact be $\mathbb{I}_{\iota\kappa}$ for the direct transform and $\mathbb{I}_{\kappa\iota}$ for the inverse transform. Hence, one can save memory only so long as one needs only one of the two matrices or so long as the program is not dependent on the performance of one of the two transforms. The inverse Fourier transform is defined by


\begin{displaymath}
\varphi(\vec{n})=\sum_{\vec{k}}
e^{-\imath\frac{2\pi}{N}\vec{k}\cdot\vec{n}}\widetilde\varphi (\vec{k}),
\end{displaymath}

or, in $N^{d}$-dimensional matrix form,


\begin{displaymath}
\varphi_{\iota}=
\sum_{\kappa=\kappa_{m}}^{\kappa_{M}}F^{-1}_{\iota\kappa}\widetilde\varphi _{\kappa},
\end{displaymath}

where the limits of the sum over the momentum index are $\kappa_{m}=(N^{d}-1)k_{m}/(N-1)$ and $\kappa_{M}=(N^{d}-1)(1+k_{m}/(N-1))$. We must now deal explicitly with the fact that both the Fourier component of the field and the mode function are complex. Since we have the decomposition of the field $\widetilde\varphi (\vec{k})=\mathfrak{R}(\vec{k})+\imath\mathfrak{I}(\vec{k})$, as well as


\begin{displaymath}
e^{-\imath\frac{2\pi}{N}\vec{k}\cdot\vec{n}}=
\cos\!\left(\f...
...)
-\imath\sin\!\left(\frac{2\pi}{N}\vec{k}\cdot\vec{n}\right),
\end{displaymath}

it follows that we have

\begin{eqnarray*}
\varphi(\vec{n}) & = & \sum_{\vec{k}}\left[
\cos\!\left(\frac{...
...2\pi}{N}\vec{k}\cdot\vec{n}\right)\mathfrak{I}(\vec{k}) \right].
\end{eqnarray*}


Since the field $\varphi(\vec{n})$ is real, the sum giving its imaginary part must vanish, and we are left with


\begin{displaymath}
\varphi(\vec{n})=\sum_{\vec{k}}\left[
\cos\!\left(\frac{2\pi...
...pi}{N}\vec{k}\cdot\vec{n}\right)\mathfrak{I}(\vec{k}) \right].
\end{displaymath}

We must now consider separately the contributions to the sum from the real and complex modes. From the purely real modes we get the contributions


\begin{displaymath}
\sum_{\mbox{\scriptsize real }\vec{k}}
\cos\!\left(\frac{2\pi}{N}\vec{k}\cdot\vec{n}\right)\mathfrak{R}(\vec{k}),
\end{displaymath}

while from the complex modes we get the contributions


\begin{displaymath}
\sum_{\mbox{\scriptsize complex }\vec{k}}\left[
\cos\!\left(...
...pi}{N}\vec{k}\cdot\vec{n}\right)\mathfrak{I}(\vec{k}) \right].
\end{displaymath}

Since we keep only one copy of each independent $\mathfrak{R}(\vec{k})$ and one copy of each independent $\mathfrak{I}(\vec{k})$, while the sum above runs over both parts on all the complex modes, in the case of these modes we must multiply the contribution by a factor of two. In this way we get the correct result when we run over the modes and use only the part which is stored in each one. So in reality we will calculate this contribution as


\begin{displaymath}
\sum_{\mbox{\scriptsize complex }\vec{k}}\left[
2\cos\!\left...
...pi}{N}\vec{k}\cdot\vec{n}\right)\mathfrak{I}(\vec{k})
\right].
\end{displaymath}

These factors of two, which do not appear in the real modes, prevent us from implementing the inverse transform as a single matrix operation. The best way to implement it will be to first perform a complete matrix operation with the factors of two for all modes, and then subtract back once the contributions of the real modes,

\begin{eqnarray*}
\varphi(\vec{n}) & = & 2\sum_{\vec{k}}\left[
\cos\!\left(\frac...
...t(\frac{2\pi}{N}\vec{k}\cdot\vec{n}\right)\mathfrak{R}(\vec{k}).
\end{eqnarray*}


This second sum runs over only either $\varrho_{M}=1$ or $\varrho_{M}=2^{d}$ elements, depending on the parity of $N$, and is therefore much smaller than the first one, which runs over $N^{d}$ elements, specially for large lattices, representing therefore only a small additional computational effort. In order to do this it will be necessary to construct a small indexing array to pick the purely real modes out of the indexed vector of modes. Note that the first term of the inverse transformation matrix $F^{-1}_{\iota\kappa}$, as decomposed above, is equal to $2F_{\kappa\iota}$. In order to deal with the second term, consider an indexing array $\lambda(\varrho)$ of dimension $\varrho_{M}$, which for each $\varrho=1,\ldots,\varrho_{M}$, with $\varrho_{M}=1$ for odd $N$ and $\varrho_{M}=2^{d}$ for even $N$, returns the value of the index $\kappa$ of a purely real mode. Using it we can write the inverse transform as


\begin{displaymath}
\varphi_{\iota}=
2\sum_{\kappa=\kappa_{m}}^{\kappa_{M}}P(\ma...
...{\lambda(\varrho)\iota})\widetilde\varphi _{\lambda(\varrho)}.
\end{displaymath}

We have therefore the inverse transformation implemented by a large matrix operation followed by a small correction, which is still easy to implement efficiently. In the computer code we will actually have two indexation matrices $\mathbb{I}^{(t)}_{\kappa\iota}$ and $\mathbb{I}^{(i)}_{\iota\kappa}=\mathbb{I}^{(t)}_{\kappa\iota}$, differing only by the ordering of the indices, one for the direct transformation and one for the inverse transformation, so that both can be executed efficiently.