necheff.net

Fortran Hater's Handbook, Part 1

This is part 1 of what I intend to be a more or less never-ending series covering all my grievances with the Fortran language, environment, and culture. I firmly believe Fortran is a horrible language. If you are thinking of implementing anything in it, don't; if you have the misfortune of inheriting responsibility for an existing product already written in it, you have my condolences.

As a first topic, let's look at how awful Fortran makes it to distribute a library so that your organization and/or consumers can reuse code.

As a first cut, we'll look at the solution available in f90, f95, and f2003. Then we'll compare that train wreck to C: any C standard dating back to C99, and probably C89 too, though I didn't bother testing that far back and no one should really be writing C89 anymore anyway. Finally, we'll look at the "best" Fortran has to offer by making use of facilities introduced in f2008.

A copy of the "best" Fortran solution, including a Makefile, can be found here.

The following examples have been tested on Debian 12.1 using the system distributed compiler toolchain which includes gfortran 12.2.0, binutils 2.40, and glibc 2.36 (yes, we need to be concerned with the standard C library because Fortran is incapable of hosting itself or supporting freestanding binaries and so it commonly uses C as a host - we'll talk about this in a later post when I dispel the myth of Fortran performance).

Other assumptions involve using only free-form source code (nobody wants fixed-form). And to all who would say "But, you don't need to bother with this if you don't use modules!", I say, nonsense. Modules are how the Fortran Standards Committee has chosen to organize code and type definitions, and modules are how the committee has chosen to champion "Modern Fortran". We are going to use modules and look at how the language has failed to adapt to contemporary needs. Going back to a pre-1990s implementation of the language as a "solution" is, in fact, just as bad if not worse.

The Naive Method, a.k.a. the We Are Afraid of Compiler Updates Method

We'll start off with the following in the stupid.f90 source file as our library source code:


module stupid
contains
    pure elemental function stupid_add(a, b) result(ans)
        integer, intent(in) :: a, b
        integer :: ans

        ans = a + b
    end function stupid_add

    pure elemental function stupid_mult(a, b) result(ans)
        integer, intent(in) :: a, b
        integer :: ans

        ans = a * b
    end function stupid_mult
end module stupid

And to build the library:


$ gfortran -std=f2018 -pedantic -Wall -Wextra -fimplicit-none -fPIC -c stupid.f90
$ gfortran -shared -Wl,-soname,libstupid.so.0 -o libstupid.so.0 stupid.o

Now if you thought we could just distribute libstupid.so.0 and everything would be hunky-dory, you'd be dead wrong. Get a listing of your build directory and you should notice a file called stupid.mod. You'll need to distribute this too. Oh, and by the way, the .mod file format is implementation-defined, so your consumers will need to use the same compiler you used to build the shared object. They will very likely need to use a similar version of that compiler too!

What a pain in the ass.

These .mod files are forward declarations: basically just enough information to allow the compiler to do simple parameter checking on function calls and plop down a call in object code without seeing the full function definition, which the linker will supply later on.

The Look How Easy C Makes It Method

If you are familiar with C, the .mod file is like a .h file, only stupid.

Before I show you how to get around this stupidity using submodules added in Fortran 2008, let me show you what this would look like in C just so you can see how stupid Fortran is.

Start out by creating forward declarations in a file called stupid.h; this is essentially our public API:


#ifndef STUPID_H
#define STUPID_H 1

int stupid_add(int, int);
int stupid_mult(int, int);

#endif

Now, let's provide the definitions in stupid.c:


#include "stupid.h"

int stupid_add(int a, int b)
{
    return a + b;
}

int stupid_mult(int a, int b)
{
    return a * b;
}

And to create the shared object:


$ gcc -std=c99 -Wall -Wextra -pedantic -fPIC -c stupid.c
$ gcc -shared -Wl,-soname,libstupid.so.0 -fPIC -o libstupid.so.0 stupid.o

That's it. Now libstupid.so.0 can be distributed with stupid.h, and consumers can immediately start using libstupid with any standards-compliant C compiler. All they need to do is #include stupid.h in their code to see the forward declarations, and then instruct the linker to link against libstupid, which provides the definitions. You can also place struct definitions in the .c file and put only a typedef for them in the .h file, which gives you opaque pointers at your API boundary and further hides the internal implementation details - either for proprietary reasons or because you don't like the idea of unwashed heathens doing dumb things to your internal state and then wondering why everything breaks when you update your library in a totally reasonable way.
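To make the opaque pointer trick concrete, here is a minimal sketch. Everything named stupid_acc below is hypothetical, made up purely for illustration and not part of libstupid as shown above. Both halves are squeezed into one listing so it compiles on its own, with comments marking what would go in stupid.h versus stupid.c.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- would live in stupid.h: the public API ---- */

/* Incomplete type: consumers can hold a pointer to it but cannot see
 * or touch its fields. (stupid_acc is a hypothetical example type.) */
typedef struct stupid_acc stupid_acc;

stupid_acc *stupid_acc_new(int start);
void stupid_acc_add(stupid_acc *acc, int n);
int stupid_acc_total(const stupid_acc *acc);
void stupid_acc_free(stupid_acc *acc);

/* ---- would live in stupid.c: the private implementation ---- */

/* The full definition stays out of the header, so the layout can
 * change between releases without consumers depending on it. */
struct stupid_acc {
    int total;
};

stupid_acc *stupid_acc_new(int start)
{
    stupid_acc *acc = malloc(sizeof *acc);
    if (acc != NULL)
        acc->total = start;
    return acc; /* NULL if allocation failed */
}

void stupid_acc_add(stupid_acc *acc, int n)
{
    assert(acc != NULL);
    acc->total += n;
}

int stupid_acc_total(const stupid_acc *acc)
{
    assert(acc != NULL);
    return acc->total;
}

void stupid_acc_free(stupid_acc *acc)
{
    free(acc); /* free(NULL) is a no-op */
}
```

In a real split, a consumer that only includes the header can hold a stupid_acc pointer and call these four functions, but any attempt to reach into acc->total fails to compile because the struct is an incomplete type on their side.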

And, if we really wanted, we could have put the definitions of stupid_add() and stupid_mult() in their own source files to help reduce cognitive load for someone having to come back later and add or fix functionality - the naive Fortran method requires the whole module to be in a single compilation unit!!!
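As a rough sketch of that layout, the listing below marks where each piece would live if the two definitions were split into their own files, plus what a consumer's code would look like. The file names stupid_add.c and stupid_mult.c and the consumer_demo() function are assumptions for illustration only; it is all shown as one listing so it can be compiled as-is.

```c
#include <stdio.h>

/* ---- stupid.h: the only file consumers ever need to read ---- */
int stupid_add(int, int);
int stupid_mult(int, int);

/* ---- stupid_add.c (hypothetical file name) ----
 * In the real layout this file would start with: #include "stupid.h" */
int stupid_add(int a, int b)
{
    return a + b;
}

/* ---- stupid_mult.c (hypothetical file name) ----
 * Same idea: its own compilation unit, built into its own .o file. */
int stupid_mult(int a, int b)
{
    return a * b;
}

/* ---- consumer code (hypothetical) ----
 * Compiled against the declarations only; the definitions above would
 * arrive at link time from libstupid. */
int consumer_demo(void)
{
    int sum = stupid_add(2, 3);
    int product = stupid_mult(sum, 4);

    printf("sum=%d product=%d\n", sum, product);
    return product;
}
```

Each of the two definition files would be compiled to its own object with the same gcc -c invocation used earlier, and both objects would be handed to the gcc -shared link step. Nothing about the header or the consumer changes when the implementation is carved up this way.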

The We Keep Our Compiler Fresh But Are Still Using Fortran For Some Unknown Reason Method

Fortran 2008 added the concept of submodules. We can improve the original implementation significantly, but because this is Fortran, the end result will still suck.

Let's add source code that the compiler will use to generate forward declarations in stupid.f90:


module stupid
    interface
        module pure elemental function stupid_add(a, b) result(ans)
            integer, intent(in) :: a, b
            integer :: ans
        end function stupid_add

        module pure elemental function stupid_mult(a, b) result(ans)
            integer, intent(in) :: a, b
            integer :: ans
        end function stupid_mult
    end interface
end module stupid

Now we can put the function definitions in submodules, and we aren't even limited to a single compilation unit (or what Fortran calls [sub]program units).

In stupid_add.f90:


submodule (stupid) stupid_sub_add
    contains
        module pure elemental function stupid_add(a, b) result(ans)
            integer, intent(in) :: a, b
            integer :: ans

            ans = a + b
        end function stupid_add
end submodule stupid_sub_add

And in stupid_mult.f90:


submodule (stupid) stupid_sub_mult
    contains
        module pure elemental function stupid_mult(a, b) result(ans)
            integer, intent(in) :: a, b
            integer :: ans

            ans = a * b
        end function stupid_mult
end submodule stupid_sub_mult

And finally, let's generate the shared object:


$ gfortran -std=f2018 -pedantic -Wall -Wextra -fimplicit-none -fPIC -c stupid.f90
$ gfortran -std=f2018 -pedantic -Wall -Wextra -fimplicit-none -fPIC -c stupid_add.f90
$ gfortran -std=f2018 -pedantic -Wall -Wextra -fimplicit-none -fPIC -c stupid_mult.f90
$ gfortran -shared -Wl,-soname,libstupid.so.0 -o libstupid.so.0 *.o

What is different this time is that we only need to distribute the shared object and stupid.f90. But there is a catch: consumers will still need some kind of .mod file. gfortran provides the -fsyntax-only flag and ifort/ifx provide the -syntax-only flag; running stupid.f90 through the compiler with that flag produces the .mod file. That means consumers can generate a .mod file with the same compiler they plan to use to build their application and link against libstupid.

For example:


$ gfortran -fsyntax-only stupid.f90

Still incredibly inconvenient for basically no reason, but at least we don't have to play guess-the-compiler-distribution-and-version or maintain multiple library builds per release with all the popular compilers our consumers might use.

Other Observations

Something I always found dumb: .mod files contain the forward declarations, so they are what allow you to compile objects in any order. But because .mod files are produced by the compiler, you still have to make sure the .mod file exists (i.e. the module has been compiled) before any compilation unit that references symbols exported by the module is compiled... which raises the question - why? Let that sink in for a minute. It took until Fortran 2008 for modules to actually be useful beyond over-engineered type checking. This is why it is so easy to find Fortran code bases that require multiple compile passes to produce a linked binary: the extra passes exist just to generate all the forward declarations. Sure, you could explicitly write interface blocks... but then why bother with a separate module at that point?

Fortran is incredibly verbose, like, so verbose it is physically exhausting to read and write. One element of this is the scoping and namespaces - or lack thereof. The main module was named "stupid" but each submodule had to have a unique name, "stupid_sub_add" and "stupid_sub_mult". And each function was prefixed with the string "stupid_" because there isn't a way to qualify a symbol with a module namespace when it is referenced, and we wouldn't want to stomp on other symbols named "add" or "mult". This leads either to lots of RSI, because you have to "qualify" symbols by concatenating crap onto the symbol name directly, or (more likely) to people just using shorter, more ambiguous names that shadow standard library/built-in symbols or symbols exported from other libraries, leading to all kinds of weird unintended behavior.