Finding Thread errors early with Thread.report_on_exception
Ruby 2.4 introduced Thread.report_on_exception
to print exceptions to $stderr
when a Thread dies from an exception.
However, it was disabled by default.
In Ruby 2.5, I made Thread.report_on_exception
enabled by default.
This significantly improves debugging multithreaded programs in Ruby.
The problem statement
When a Thread dies from an exception in Ruby 2.4 or earlier, nothing happens.
The exception is silently swallowed by the Thread’s death.
The only way other threads can know about this is by using Thread#join
or
Thread#value
.
But in many cases, other threads might be blocked before calling Thread#join
or #value
and result in a deadlock or livelock, with no information to debug.
So let’s take a simple example:
1
2
3
4
5
6
queue = Queue.new
thread = Thread.new {
queue.push "abc".starts_with?("foo")
}
p queue.pop
thread.join
Running it on Ruby 2.4 gives:
Did not expected that?
That’s the pretty deadlock output in Ruby 2.4+.
But unfortunately it doesn’t point to the problem, it only tells us that the main Thread is stuck on line 5 in queue.pop
.
Yet, we have a queue.push
, so why did it not work?
Let’s run with Ruby 2.5:
The first part is what Thread.report_on_exception
does: print the exception that killed the Thread to $stderr
.
It clearly shows the error is on line 3 and that we have an extra s
in starts_with?
.
Let’s see what happens if we have a second Thread doing some kind of IO:
1
2
3
4
5
6
7
queue = Queue.new
thread = Thread.new {
queue.push "abc".starts_with?("foo")
}
io_thread = Thread.new { STDIN.gets }
p queue.pop
thread.join
On Ruby 2.4, it hangs.
When interrupting with Ctrl+C
the output is:
No helpful output at all here: we only know the pop
is blocking.
This is no longer a deadlock from Ruby’s point of view, because Ruby cannot know if external events like IO might finish, or if the io_thread
might push an object to the queue
.
On Ruby 2.5, this is printed as soon as the thread dies:
Now we can clearly see and quickly identify the problem.
You might have noticed the bug since the first code snippet in this simple example. Yet, a similar problem can happen for every bug or exception raised in some Thread different than the main Thread.
Thread.abort_on_exception
So how did Ruby programmers debug their multithreaded programs before?
They had to remember to always add the magic incantation Thread.abort_on_exception = true
before creating any Thread.
This is unfortunately very easy to forget (I did many times), and if forgotten this can waste hours of debugging.
When an exception kills a Thread and Thread.abort_on_exception
is true
, the exception is re-raised in the main Thread.
1
2
3
4
5
6
7
Thread.abort_on_exception = true
queue = Queue.new
thread = Thread.new {
queue.push "abc".starts_with?("foo")
}
p queue.pop
thread.join
On Ruby 2.4:
This points clearly to the problem. However, this raises the exception in the main Thread, likely causing it to end its execution. Good for debugging, bad for a larger application, where we might not want to end the program if one of many Threads dies unexpectedly.
Thread.report_on_exception
is more gentle by only printing the backtrace to $stderr
, and most importantly is now enabled by default.
Intended exceptions in threads
Sometimes, the extra output by Thread.report_on_exception
is undesirable,
for instance if exceptions raised in a Thread are intended.
This notably happens in tests.
I fixed around 35 tests in MRI’s test suite.
Similarly, around 21 specs were fixed in ruby/spec.
The most common type of change in tests is moving the assert
s in the Thread.
Original:
Better and avoiding extra Thread.report_on_exception
output:
This has the added bonus to reduce the scope where the intended exception should be raised (FrozenError
is quite specific here, but it was RuntimeError
in 2.4 which could be many different kinds of errors, not just a frozen object error).
There are other ways to avoid the extra output by report_on_exception
.
Quoting from the documentation of Thread.report_on_exception:
-
If the exception is not intended, the best is to fix the cause of the exception so it does not happen anymore.
-
If the exception is intended, it is likely better to
rescue
it closer to where it is raised rather then let it kill the Thread. -
If it is guaranteed the Thread will be joined with
#join
or#value
, then it is safe to disable this report withThread.current.report_on_exception = false
when starting the Thread. However, this might handle the exception much later, or not at all if the Thread is never joined due to the parent thread being blocked, etc (see the examples at the beginning of the blog post).
Conclusion
The initial feature request was done by @headius and based on what Java does (reporting exceptions in threads by default).
It took more than 5 years to convince matz and other MRI developers this is a feature worth having, to implement it, and to enable it by default.
So, sometimes, keeping arguing for a feature you believe in actually works.
A bug was found in DRb thanks to this feature and there is a growing number of report_on_exception
output in CI, giving better clues about what goes wrong in tests.
Here is my piece for a better programming experience with threads for Ruby 2.5.