The Fundamental Flaws of fork() That Corrupt Your State
fork(2) is a well-known system call on POSIX systems. However, it has two major flaws:
- It only copies the thread calling fork(); all other threads are dead in the child process.
- Resources that are unsafe to share, such as some file descriptors, are shared with the child anyway.
Only Copying the Calling Thread
So why does fork() copy only the calling thread and not all threads? It is actually a very good question. Both approaches have trade-offs, as noted in this Stack Overflow answer:
- If only the calling thread is copied, any state used by the other threads, such as locked mutexes, is unusable in the child process. The child process is headed for a deadlock or inconsistent state as soon as it touches state that was being used by another thread. The other threads are not made aware of the fork, and they can do very little about it 1.
- If all threads are copied, the copied threads would access resources in parallel with the corresponding threads of the parent process, and of course they cannot synchronize via an in-process mutex. But that problem already exists when copying a single thread, which runs in parallel with the parent thread, so it is nothing new.
It is unclear why POSIX or the designers of fork() chose the first; it seems clearly the worse trade-off of the two. Maybe threads were uncommon at the time and it seemed easier to say "forks and threads are incompatible, do not use both", even though combining them is what many programs do nowadays.
As a result, a forked process is a partial, broken, corrupted, deadlock-prone copy of the parent process, unless the calling process had only a single thread.
There is of course no chance of changing fork(), for compatibility reasons. It would be interesting if a fork that copies all threads became available; AFAIK no such functionality currently exists (although, since CRIU can restore threads on Linux, it should be possible).
Unsafe Resources Shared by Fork
This problem is relatively well known by users of fork(), and yet it is incredibly easy to corrupt your program if you forget to manually unshare just one of the resources that is not safe to share.
What makes this even more confusing is that these resources are not considered IPC (inter-process communication) resources, but in-process resources. Yet because of fork() they are actually shared between processes, even though they were not designed for that!
A couple of examples:
- A socket, Unix domain socket, or pipe, for instance one used to communicate with a database or external service. If it is shared between both processes, you can receive the response to a DB query made by the other process, i.e., your database state is now corrupted and you are leaking sensitive data (e.g. showing the profile of one user to another). This happened to me once; I thought the RAM was getting corrupted, but no, it was "just" a fundamental flaw of fork. The typical "hard way" to learn about fork's flaws.
- A file on the filesystem. If you only append to it with single calls to write(2), as for logging, the filesystem/kernel ensures those writes are atomic. But do anything else and you are corrupting that file.
And then there is a long list of exceptions: resources that are not copied and break the mental model of "it's just a copy". Some are useful, and some are waiting to cause bugs in forks 2.
Man Page Notes
The fork(2) man page makes the following notes:
- The child process is created with a single thread—the one that called fork(). The entire virtual address space of the parent is replicated in the child, including the states of mutexes, condition variables, and other pthreads objects; the use of pthread_atfork(3) may be helpful for dealing with problems that this can cause.
- After a fork() in a multithreaded program, the child can safely call only async-signal-safe functions (see signal-safety(7)) until such time as it calls execve(2).
As noted here, pthread_atfork(3) cannot actually be used to unlock mutexes like pthread_mutex. And it is generally difficult to use this "at fork" hook to clean up shared resources: closing them in the parent is suboptimal (e.g. for a database socket it would require an extra reconnection and might break other threads), and closing them in the child without using the resource at all (e.g., sending a close-db-connection message on the socket would close the parent's connection too) is often not exposed by libraries because it is quite hacky. In the Ruby world, Rails does it for the databases it knows about (but of course not for other connections it does not know about), while Sequel does not and expects you to do it manually. Hopefully you never forget, otherwise corruption awaits. I find this so dangerous and insecure that IMO it is on its own a good enough reason not to use fork.
The second note is quite interesting: it says a fork of a program that had multiple threads is only allowed to call async-signal-safe functions (a small set of functions; no non-trivial C program uses only those) until it calls execve(2) to execute another program. So the man page says it plainly: all the multithreaded programs out there that fork and don't immediately execve() are unsafe. That includes, for instance, all Rails applications using a forking web server. Note that most Ruby programs use multiple threads; for instance, just using Timeout.timeout or net/http creates a Thread.
Why does it seem to work for Rails and forking webservers?
TODO: because there is a single thread at the time of fork? But pitchfork probably does not do that, TODO check it.
TODO: probably Ruby's fork or the system fork(2) should error if the process is not single-threaded.
Note: fork is not really supported on macOS: https://developer.apple.com/forums/thread/701601
TODO good explanation in https://bugs.ruby-lang.org/issues/14009#change-67194
A Concrete Example which Breaks in CRuby
CRuby tried to use getaddrinfo_a(3) (from glibc) to solve the fairly well-known problem that getaddrinfo(3) cannot be interrupted by a signal (until it reaches its timeout, typically 30-90 seconds, which feels like a POSIX bug).
getaddrinfo_a(3) uses at least a thread and a mutex to implement its interruptibility.
Because of that, when getaddrinfo_a(3) had been used in the parent process, then in the child process: 1) the thread is dead, 2) the mutex can still be locked, and so DNS resolution via getaddrinfo_a(3) in the child process is stuck forever or broken (see this and this for details).
TODO: Note the same would happen e.g. with a Ruby Thread using a Mutex, with that Mutex being locked (and not by the current thread) at the time of fork. The Timeout stdlib could have this issue, for example, and it is probably difficult to fix; rather, threading + fork are simply incompatible.
Why does the JVM not support fork?
A while ago I wondered why the JVM, or more specifically the HotSpot JVM, does not support forking. One can actually call fork() via FFI, but the forked JVM will break very soon after. While I don't know the exact reasons for the original decision, based on the above we can have some ideas. The JVM uses extra threads for JIT compilers, GC, reference processing, signal handling, etc. So all JVM processes are multithreaded, even if they do not use multiple Java threads. Suspending all these threads at a safe point where they hold no lock or resource is incredibly challenging. And then restarting them in the fork and making them continue where they were is also extremely difficult. I think these problems would be much easier to solve if there were a fork that copied all threads.
TODO also because fork not portable e.g. Windows
As a note, it might be possible to implement fork on GraalVM Native Image, which has more control over threads. It would be a significant undertaking though, as it would need to solve the problems above.
Conclusion
I think fork() is fundamentally flawed in such a way that it should not be used (unless it is immediately followed by execve(), but that is not the case discussed in this post).
The fact that it can corrupt, deadlock, or break your program or external services from as little as forgetting to deal with one shared resource, and that there is no proper general solution to fix that sharing automatically, seems too much of a deal breaker to me.
I know many people would disagree, though I guess their hands are somewhat tied: for instance, on CRuby the only way to achieve parallelism is multiple processes or forks (I do not count Ractors, which have too many bugs and are not compatible enough anyway). Forks have the advantage over new processes of avoiding the startup cost and of sharing a portion of memory between the forks. I do question, though: are the startup and memory gains worth the potential corruption of your process and leaking of sensitive data? Passenger, for instance, supports creating new processes instead of forking. Puma does not currently, but I suspect it would be fairly easy to add.
TODO mention waitall issue in footnote?
The main alternative I see is to use language implementations without a global lock (GVL/GIL). For Ruby, that means using TruffleRuby or JRuby. Then threads run in parallel, the entire memory is shared, all CPUs can be used efficiently, there is no overhead of inter-process communication, and of course none of the flaws of fork(). That leaves the startup concern, which I believe we can mostly address by persisting the JITed code on TruffleRuby.
- Unless the program controls all threads, including threads used by its dependencies, and can coordinate them on every fork(), which is extremely difficult. ↩
- POSIX is full of weird exceptions. One somewhat related: fork() copies signal handlers (expected), but execve() does not preserve signal handlers except those set to SIG_IGN, a good source of bugs which forces people to use an empty handler function instead of SIG_IGN, e.g. for SIGPIPE. ↩