Fortran Routines

The technique described above for the implementation of the Fourier transforms implies strong limitations for the lattice size, both due to addressing issues and due to memory requirement issues. We will examine here these limitations, and we will see that the technique is still usable for lattices with the usual sizes in the important cases $d=3$ and $d=4$. For these lattices a set of routines implementing the transforms in the fastest possible way is supplied. For larger lattices a different set of routines is supplied, in which the indexation matrices $\mathbb{I}^{(t)}_{\kappa\iota}$ and $\mathbb{I}^{(i)}_{\iota\kappa}$ are not implemented directly in memory, but rather by functions that dynamically calculate and return the index $\tau$ of the phases, appropriately shifted, given values of the coordinate indices $\kappa$ and $\iota$. These routines are about $4$ to $5$ times slower, but can be used for very large lattices, with much smaller memory requirements.

First of all, note that since the indexation matrices are defined as $2$-byte signed integers, they are limited to the range $[-32767,32767]$, that is, to $65535$ different values. Therefore this last number is the maximum size of the indexed array of phases $P(\tau)$. This implies that there are limits to the possible lattice sizes $N$ in each dimension $d$, as shown in the following table:

$d$ $N_{\rm max}$ RAM$_{\rm max}$
1 181 64 kB
2 128 520 MB
3 105 2497 GB
4 91 8560 TB
5 81 22 EB

Here the corresponding maximum required amounts of RAM are also shown and the order of the unit multipliers is k, M, G, T, P and E, ranging from $2^{10}$ to $2^{60}$, so we see that in the case of the higher dimensions the memory requirements are truly enormous for the larger lattices still allowed by this indexing limitation. Specially for dimensions $d=4$ and $d=5$, these are larger lattices than one can possibly expect to run successfully on current computers. On $32$-bit processors there is another type of limitation on the values of $N$, which is far more serious for the larger values of $d$. The limitations of the signed $32$-bit integer addressing of the indexation matrices limits their size, and hence the size of the lattice, as shown in the following table:

$d$ $N_{\rm max}$ RAM$_{\rm max}$
1 32767 2 GB
2 181 2 GB
3 31 2 GB
4 13 2 GB
5 7 2 GB

In this case the addressing limitations correspond to a fixed maximum required memory, of course. In addition to this, the limitations of the signed $32$-bit integer addressing of the indexed array of phases also limits its size and the size of the lattice, but this one is a significant limitation only for the case $d=1$, which is relatively irrelevant to us, as shown in the following table:

$d$ $N_{\rm max}$ RAM$_{\rm max}$
1 11585 2 GB
2 8192 2 GB
3 6689 2 GB
4 5793 2 GB
5 5181 2 GB

Therefore, we see that this last limitation can be safely ignored. The corresponding limitations in the case of a $64$-bit processor with $48$-bit addressing are smaller, and not so significant, since in this case the memory availability limitations become relevant much before the addressing limitations themselves. The limits due to the $48$-bit addressing of the indexation matrices are shown in the following table:

$d$ $N_{\rm max}$ RAM$_{\rm max}$
1 8388607 128 TB
2 2896 128 TB
3 203 128 TB
4 53 128 TB
5 24 128 TB

The limits due to the $48$-bit addressing of the indexed array of phases are shown in the following table:

$d$ $N_{\rm max}$ RAM$_{\rm max}$
1 2965821 128 TB
2 2097152 128 TB
3 1712317 128 TB
4 1482910 128 TB
5 1326355 128 TB

If we consolidate all these indexing and addressing limitations in a single table, we see that the limits on a $32$-bit processor with $32$-bit logical addressing are given by the table below:

$d$ $N_{\rm max}$ RAM$_{\rm max}$
1 181 64 kB
2 128 520 MB
3 31 2 GB
4 13 2 GB
5 7 2 GB

The same limits on a $64$-bit processor with $48$-bit logical addressing are given by the table below:

$d$ $N_{\rm max}$ RAM$_{\rm max}$
1 181 64 kB
2 128 520 MB
3 105 2497 GB
4 53 128 TB
5 24 128 TB

A typical larger lattice in $d=4$ would have $N=10$, and in this case the transform can be used with reasonable memory requirements, about $200$ MB for a single indexation matrix, and about $400$ MB for two indexation matrices. A lattice of $N=20$ in $d=3$ can be used with memory requirements of about $128$ MB for one matrix and of about $256$ MB for two matrices. Therefore, we conclude that the fast version of the transforms can be used in practice in most simulations. Whenever there is not enough memory available, one can turn to the alternative routines.

There are Fortran routines available implementing the direct and inverse Fourier transforms, as described in this paper. There are also a few test programs that may be useful as examples of how to use the routines. As they currently stand, these routines are written for use in either $32$-bit or $64$-bit processors. However, sufficiently large lattices may require a $64$-bit processor, as well as lots of available memory. The files described in what follows, containing the source code, are freely available in a compressed tar file at the URL:

http://latt.if.usp.br/technical-pages/fmrsf/

There are files with the definition of the necessary data structures, as well as the initialization routines and the transforms themselves. These modules are meant to be integrated into larger programs at a source-code level. The files containing the data structures are include files, meant to be included in the modules that need access to the data structures they contain. All interchange of data among modules is made by means of common blocks. Each initialization routine defines the data in the common block in the corresponding include file. These initialization routines should be called once at the beginning of the program, one for each common block that is used in the program. Read the makefile or the source code in order to see which modules depend on which data structures.

Here are short explanations of the nature of each source-code file. First the files with the data structures:

fp_def.f:
an include file with the basic parameters defining the run; this is needed by all modules.

fp_cal.f:
an include file with other parameters, which are calculated using the basic ones; this is needed by most modules.

cb_param.f:
an include file with a common block containing additional calculated parameters.

cb_flags.f:
an include file with a common block containing the dimensionality flags.

cb_logic.f:
an include file with a common block containing the logical array for marking the sites as carrying either their real parts or their imaginary parts.

cb_field.f:
an include file with a common block containing the field components in position space and in momentum space.

cb_phase.f:
an include file with a common block containing the indexed array for the phases.

cb_trans.f:
an include file with a common block containing the large indexing matrix for the direct transform.

cb_trinv.f:
an include file with a common block containing the large indexing matrix for the inverse transform.

cb_traux.f:
an include file with a common block containing the auxiliary indexing array for the inverse transform.

The files containing the main routines, including the initialization routines and the transforms themselves:

check_pars.f:
a subroutine that verifies consistency of the basic parameters defining the run.

check_pars_32.f:
a subroutine that verifies consistency of the basic parameters regarding indexation and addressing issues on a $32$-bit machine with $32$-bit addressing capability.

check_pars_48.f:
a subroutine that verifies consistency of the basic parameters regarding indexation and addressing issues on a $64$-bit machine with $48$-bit addressing capability.

init_param.f:
a subroutine that initializes the common block in cb_param.f.

init_flags.f:
a subroutine that initializes the common block in cb_flags.f.

init_logic.f:
a subroutine that initializes the common block in cb_logic.f.

init_phase.f:
a subroutine that initializes the common block in cb_phase.f.

init_trans.f:
a subroutine that initializes the common block in cb_trans.f.

init_trinv.f:
a subroutine that initializes the common block in cb_trinv.f.

init_traux.f:
a subroutine that initializes the common block in cb_traux.f.

indphtrn.f:
a function that replaces the large indexing matrix in the common block in cb_trans.f.

indphinv.f:
a function that replaces the large indexing matrix in the common block in cb_trinv.f.

fourier_trans_fast.f:
a subroutine that implements the fast version of the direct Fourier transform.

fourier_inver_fast.f:
a subroutine that implements the fast version of the inverse Fourier transform.

fourier_trans_large.f:
a subroutine that implements the direct Fourier transform with much smaller memory requirements, for use on large lattices.

fourier_inver_large.f:
a subroutine that implements the inverse Fourier transform with much smaller memory requirements, for use on large lattices.

fourier_trans_slow.f:
a subroutine that implements the fast version of the direct Fourier transform in a much slower but somewhat less memory-intensive way.

fourier_inver_slow.f:
a subroutine that implements the fast version of the inverse Fourier transform in a much slower but somewhat less memory-intensive way.

Files containing the test programs and some auxiliary routines, including a couple of C interfaces to system facilities:

test_trans_inver_fast.f:
a program that composes the fast versions of the transform and of the inverse, in this order, and measures the resulting average error.

test_inver_trans_fast.f:
a program that composes the fast versions of the inverse and of the transform, in this order, and measures the resulting average error.

test_trans_inver_large.f:
a program that composes the large-lattice versions of the transform and of the inverse, in this order, and measures the resulting average error.

test_inver_trans_large.f:
a program that composes the large-lattice versions of the inverse and of the transform, in this order, and measures the resulting average error.

test_trans_inver_slow.f:
a program that composes the slow versions of the transform and of the inverse, in this order, and measures the resulting average error.

test_inver_trans_slow.f:
a program that composes the slow versions of the inverse and the transform, in this order, and measures the resulting average error.

urandom_ctof.c:
a Fortran-callable C interface to the system libraries, to get a random seed from the Linux kernel.

clock_ctof.c:
a Fortran-callable C interface to the system libraries, to get the CPU time of the current process from the system.

dranr-1.f:
the first random number generator from the authors of the book ``Numerical Recipes''.

dranr-2.f:
the second random number generator from the authors of the book ``Numerical Recipes''.

sdot.f:
a simple function that returns the scalar product of two vectors.