Finding Thread errors early with Thread.report_on_exception

Ruby 2.4 introduced Thread.report_on_exception to print exceptions to $stderr when a Thread dies from an exception. However, it was disabled by default. In Ruby 2.5, I made Thread.report_on_exception enabled by default. This significantly improves debugging multithreaded programs in Ruby.

The problem statement

When a Thread dies from an exception in Ruby 2.4 or earlier, nothing happens. The exception is silently swallowed by the Thread’s death. The only way other threads can know about this is by using Thread#join or Thread#value. But in many cases, other threads might be blocked before calling Thread#join or #value and result in a deadlock or livelock, with no information to debug.

So let’s take a simple example:

1
2
3
4
5
6
queue = Queue.new
thread = Thread.new {
  queue.push "abc".starts_with?("foo")
}
p queue.pop
thread.join

Running it on Ruby 2.4 gives:

thread1.rb:5:in `pop': No live threads left. Deadlock? (fatal)
1 threads, 1 sleeps current:0x555d14c656a0 main thread:0x555d148f05e0
* #<Thread:0x555d14925ec0 sleep_forever>
   rb_thread_t:0x555d148f05e0 native:0x7fc7376eb700 int:0
   thread1.rb:5:in `pop'
   thread1.rb:5:in `<main>'
	from thread1.rb:5:in `<main>'

Did not expected that? That’s the pretty deadlock output in Ruby 2.4+. But unfortunately it doesn’t point to the problem, it only tells us that the main Thread is stuck on line 5 in queue.pop. Yet, we have a queue.push, so why did it not work?

Let’s run with Ruby 2.5:

#<Thread:0x55b966a87a80@thread1.rb:2 run> terminated with exception (report_on_exception is true):
thread1.rb:3:in `block in <main>': undefined method `starts_with?' for "abc":String (NoMethodError)
Did you mean?  start_with?

thread1.rb:5:in `pop': No live threads left. Deadlock? (fatal)
1 threads, 1 sleeps current:0x55b9667e3120 main thread:0x55b966709430
* #<Thread:0x55b96673cd08 sleep_forever>
   rb_thread_t:0x55b966709430 native:0x7f9250b56700 int:0
   thread1.rb:5:in `pop'
   thread1.rb:5:in `<main>'
	from thread1.rb:5:in `<main>'

The first part is what Thread.report_on_exception does: print the exception that killed the Thread to $stderr. It clearly shows the error is on line 3 and that we have an extra s in starts_with?.

Let’s see what happens if we have a second Thread doing some kind of IO:

1
2
3
4
5
6
7
queue = Queue.new
thread = Thread.new {
  queue.push "abc".starts_with?("foo")
}
io_thread = Thread.new { STDIN.gets }
p queue.pop
thread.join

On Ruby 2.4, it hangs. When interrupting with Ctrl+C the output is:

^Cthread2.rb:6:in `pop': Interrupt
	from thread2.rb:6:in `<main>'

No helpful output at all here: we only know the pop is blocking. This is no longer a deadlock from Ruby’s point of view, because Ruby cannot know if external events like IO might finish, or if the io_thread might push an object to the queue.

On Ruby 2.5, this is printed as soon as the thread dies:

#<Thread:0x5612dbd2f778@thread2.rb:2 run> terminated with exception (report_on_exception is true):
thread2.rb:3:in `block in <main>': undefined method `starts_with?' for "abc":String (NoMethodError)
Did you mean?  start_with?

Now we can clearly see and quickly identify the problem.

You might have noticed the bug since the first code snippet in this simple example. Yet, a similar problem can happen for every bug or exception raised in some Thread different than the main Thread.

Thread.abort_on_exception

So how did Ruby programmers debug their multithreaded programs before? They had to remember to always add the magic incantation Thread.abort_on_exception = true before creating any Thread. This is unfortunately very easy to forget (I did many times), and if forgotten this can waste hours of debugging.

When an exception kills a Thread and Thread.abort_on_exception is true, the exception is re-raised in the main Thread.

1
2
3
4
5
6
7
Thread.abort_on_exception = true
queue = Queue.new
thread = Thread.new {
  queue.push "abc".starts_with?("foo")
}
p queue.pop
thread.join

On Ruby 2.4:

thread3.rb:4:in `block in <main>': undefined method `starts_with?' for "abc":String (NoMethodError)
Did you mean?  start_with?

This points clearly to the problem. However, this raises the exception in the main Thread, likely causing it to end its execution. Good for debugging, bad for a larger application, where we might not want to end the program if one of many Threads dies unexpectedly.

Thread.report_on_exception is more gentle by only printing the backtrace to $stderr, and most importantly is now enabled by default.

Intended exceptions in threads

Sometimes, the extra output by Thread.report_on_exception is undesirable, for instance if exceptions raised in a Thread are intended. This notably happens in tests. I fixed around 35 tests in MRI’s test suite. Similarly, around 21 specs were fixed in ruby/spec. The most common type of change in tests is moving the asserts in the Thread. Original:

def test_thread_local_freeze
  assert_raise(FrozenError) do
    Thread.new do
      Thread.current[:foo] = :bar
      Thread.current.freeze
      Thread.current[:foo] = :baz
    end.join
  end
end

Better and avoiding extra Thread.report_on_exception output:

def test_thread_local_freeze
  Thread.new do
    Thread.current[:foo] = :bar
    Thread.current.freeze
    assert_raise(FrozenError) do
      Thread.current[:foo] = :baz
    end
  end.join
end

This has the added bonus to reduce the scope where the intended exception should be raised (FrozenError is quite specific here, but it was RuntimeError in 2.4 which could be many different kinds of errors, not just a frozen object error).

There are other ways to avoid the extra output by report_on_exception. Quoting from the documentation of Thread.report_on_exception:

Conclusion

The initial feature request was done by @headius and based on what Java does (reporting exceptions in threads by default). It took more than 5 years to convince matz and other MRI developers this is a feature worth having, to implement it, and to enable it by default. So, sometimes, keeping arguing for a feature you believe in actually works. A bug was found in DRb thanks to this feature and there is a growing number of report_on_exception output in CI, giving better clues about what goes wrong in tests. Here is my piece for a better programming experience with threads for Ruby 2.5.