Matching Regexps 200 Times Faster

You might have seen @byroot’s excellent blog post series on optimizing the json gem. From the first blog post it’s clear most of the time for generating JSON is spent in generate_json_string() and specifically in convert_UTF8_to_JSON(), i.e., in converting Ruby Strings to JSON Strings.

Because this is such a hot part of JSON generation, it has received a lot of attention. In part 3 @byroot shows a lookup table approach to implement convert_UTF8_to_JSON(), which is now used in the json gem.

There has been a couple attempts since then to optimize it further using SIMD instructions. This is however quite difficult and messy to do in C as it’s similar to writing inline assembly (N times, with N the number of SIMD variants), and it’s not portable so it needs both compile-time and run-time detection to pick the correct SIMD variant.

In this blog post we compare 3 contenders for the job of converting Ruby Strings to JSON Strings.

The pure-Ruby version is so simple it fits in a few lines (and a big Hash literal):

ESCAPE_MAP = {
  "\x0" => '\u0000', "\x1" => '\u0001', "\x2" => '\u0002', "\x3" => '\u0003',
  "\x4" => '\u0004', "\x5" => '\u0005', "\x6" => '\u0006', "\x7" => '\u0007',
  "\b" => '\b', "\t" => '\t', "\n" => '\n', "\xb" => '\u000b',
  "\f" => '\f', "\r" => '\r', "\xe" => '\u000e', "\xf" => '\u000f',
  "\x10" => '\u0010', "\x11" => '\u0011', "\x12" => '\u0012', "\x13" => '\u0013',
  "\x14" => '\u0014', "\x15" => '\u0015', "\x16" => '\u0016', "\x17" => '\u0017',
  "\x18" => '\u0018', "\x19" => '\u0019', "\x1a" => '\u001a', "\x1b" => '\u001b',
  "\x1c" => '\u001c', "\x1d" => '\u001d', "\x1e" => '\u001e', "\x1f" => '\u001f',
  '"' => '\"', '\\' => '\\\\',
}

def utf8_to_json(string)
  '"' + if /["\\\x0-\x1f]/.match?(string)
    string.gsub(/["\\\x0-\x1f]/, ESCAPE_MAP)
  else
    string
  end + '"'
end

The ESCAPE_MAP is a bit verbose but it’s a simple way to exhaustively store how to escape each character which needs to be escaped. The match? is not strictly necessary but it avoids the allocation of a new String for the common case of no character to escape.

We use ruby/json’s string encoding benchmarks with a small diff to run only those:

diff --git a/benchmark/encoder.rb b/benchmark/encoder.rb
index f0a05db..efbac2b 100644
--- a/benchmark/encoder.rb
+++ b/benchmark/encoder.rb
@@ -33,6 +33,8 @@ def implementations(ruby_obj)
 end
 
 def benchmark_encoding(benchmark_name, ruby_obj, check_expected: true, except: [])
+  return unless ["mixed utf8", "mostly utf8"].include?(benchmark_name)
   json_output = JSON.dump(ruby_obj)
   puts "== Encoding #{benchmark_name} (#{json_output.bytesize} bytes)"
 
@@ -40,6 +42,7 @@ def benchmark_encoding(benchmark_name, ruby_obj, check_expected: true, except: [
   except.each { |i| impls.delete(i) }
 
   Benchmark.ips do |x|
+    x.warmup = 5
     expected = ::JSON.dump(ruby_obj) if check_expected
     impls.values.each do |name, block|
       begin

We select the benchmarks we want with an early return and we increase warmup to increase the stability of the benchmark results. We use ONLY=json because the SIMD PR doesn’t have the json_coder variant (anyway both of these give very similar results for the selected benchmarks).

JSON String Escaping Benchmark

The benchmarks are straightforward:

# mixed utf8
JSON.generate([("a" * 5000) + "€" + ("a" * 5000)] * 500)
# mostly utf8
JSON.generate([("€" * 3333)] * 500)

They both dump to JSON an Array of 500 strings and end up generating a JSON result of 5MB.

Here are the results on my machine (AMD Ryzen 7 3700X 8-Core Processor), with frequency scaling disabled and the performance CPU governor:

C extension with CRuby 3.4.2 with YJIT (on ruby/json master):

$ ONLY=json ruby --yjit -Ilib:ext benchmark/encoder.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [x86_64-linux]
mixed utf8    397.953 (± 4.5%) i/s    (2.51 ms/i)
mostly utf8   402.388 (± 0.5%) i/s    (2.49 ms/i)

We will use this as the baseline, specifically the mixed utf8 benchmark to keep things simple.

C extension + SIMD with CRuby 3.4.2 with YJIT (on the SIMD PR):

$ ONLY=json ruby --yjit -Ilib:ext benchmark/encoder.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [x86_64-linux]
mixed utf8      1.498k (± 4.0%) i/s  (667.68 μs/i)
mostly utf8     1.474k (± 1.6%) i/s  (678.55 μs/i)

3.76 times faster, nice!

Pure-Ruby generator with TruffleRuby 24.1.1 JVM (on ruby/json master):

$ ONLY=json ruby -Ilib:ext benchmark/encoder.rb
truffleruby 24.1.1, like ruby 3.2.4, Oracle GraalVM JVM [x86_64-linux]
mixed utf8      8.104k (± 0.6%) i/s  (123.40 μs/i)
mostly utf8     8.070k (± 1.1%) i/s  (123.91 μs/i)

20 times faster than the baseline and 5.4 times faster than SIMD!
And, of course much simpler, this is just a few lines of Ruby code and a Regexp.

If you guessed earlier that the C extension code + SIMD would be fastest, you may be surprised to learn the pure-Ruby version is considerably faster (on TruffleRuby)! That’s the benefit of an advanced JIT compiler which understands Ruby semantics like TruffleRuby’s JIT compiler (called Graal): you get to write nice, idiomatic Ruby and the JIT compiler does the hard work of optimizing it for your target platform.

For comparison, here’s the pure-Ruby generator with CRuby 3.4.2 with YJIT (on ruby/json master):

$ ONLY=json ruby --yjit -Ilib:ext benchmark/encoder.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [x86_64-linux]
mixed utf8     39.551 (± 0.0%) i/s   (25.28 ms/i)
mostly utf8    28.119 (± 0.0%) i/s   (35.56 ms/i)

Unfortunately, it’s 10 times slower than the baseline, demonstrating the Regexp approach is much slower on CRuby. But, we ran with YJIT enabled, so why is it so much slower?

As a disclaimer, TruffleRuby (which uses the pure-Ruby generator) is currently slower than CRuby using the C extension on JSON.generate macro benchmarks (except canada.json where it is faster). Other aspects of JSON generation need to be better optimized on TruffleRuby.

Regexp#match? Benchmark

Let’s simplify the benchmark to just Regexp#match? to understand better:

require 'benchmark/ips'

mixed_utf8 = ("a" * 5000) + "€" + ("a" * 5000)

Benchmark.ips do |x|
  x.warmup = 5

  x.report '/["\\\x0-\x1f]/.match?(mixed_utf8)' do
    raise if /["\\\x0-\x1f]/.match?(mixed_utf8)
  end
end
$ ruby --yjit bench_regexp_match.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [x86_64-linux]
20.700k (± 2.4%) i/s   (48.31 μs/i)

$ ruby bench_regexp_match.rb
truffleruby 24.1.1, like ruby 3.2.4, Oracle GraalVM JVM [x86_64-linux]
 4.699M (± 0.3%) i/s  (212.83 ns/i)

TruffleRuby matches 10003 bytes in 213 nanoseconds, i.e., 47 bytes per nanosecond. That’s 227 times faster than CRuby at Regexp matching on this benchmark! Moreover, 47 bytes per nanosecond is greater than the number of CPU cycles we have per nanosecond (this processor has a max frequency of 4.4 GHz). That seems impossible, so what’s going on here?

How can Regexp matching be so fast on TruffleRuby?

There are multiple reasons:

Specifically for this Regexp TRegex internally uses a ByteIndexOfCodePointSetNode, which automatically creates a lookup table for the character range in that regexp (["\\\x0-\x1f]) and also automatically uses SIMD. If you would like to know more about TRegex, the creator of TRegex and I did a talk about it at RubyKaigi 2021. Kevin Menard also gave a talk about ReDoS and solutions at RailsConf 2022.

The JSON String Escaping example above is not an isolated case where TruffleRuby coincidentally is faster. Having a fast regular expression engine allows for more idiomatic solutions and allows Ruby code running on TruffleRuby to be faster than C code in many cases. Here are two more real world examples where TruffleRuby uses Regexps and it’s faster than C code!

Time.new(String)

Time.new since Ruby 3.2 accepts a String argument.

CRuby implements it by parsing strings with char pointers in about 80 lines of C. TruffleRuby implements it in a straightforward way with a Regexp.

require 'benchmark/ips'

NOW_STRING = Time.now.to_s

Benchmark.ips do |x|
  x.report("Time.new(String)") do
    Time.new(NOW_STRING)
  end
end
$ ruby --yjit bench_time_new_string.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [x86_64-linux]
Time.new(String)      1.595M (± 0.5%) i/s  (626.84 ns/i)

$ ruby bench_time_new_string.rb
truffleruby 24.1.1, like ruby 3.2.4, Oracle GraalVM JVM [x86_64-linux]
Time.new(String)      3.923M (± 5.4%) i/s  (254.90 ns/i)

TruffleRuby is 2.5 times faster here, while having a much simpler and cleaner implementation.

Let’s benchmark just the Regexp, although we use match here since we need the capture groups:

require 'benchmark/ips'

NOW_STRING = Time.now.to_s

TIME_REGEXP = /\A (?<year>\d{4,5})
(?:
  - (?<month>\d{2})
  - (?<mday> \d{2})
  [ T] (?<hour> \d{2})
     : (?<min>  \d{2})
     : (?<sec>  \d{2})
     (?:\. (?<usec> \d+) )?
 (?:\s* (?<offset>\S+))?
)?\z/x

Benchmark.ips do |x|
  x.report("TIME_REGEXP.match(NOW_STRING)") do
    raise unless TIME_REGEXP.match(NOW_STRING)
  end
end
$ ruby --yjit bench_time_new_string_regexp.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [x86_64-linux]
TIME_REGEXP.match(NOW_STRING)      1.234M (± 0.8%) i/s  (810.47 ns/i)

$ ruby bench_time_new_string_regexp.rb
truffleruby 24.1.1, like ruby 3.2.4, Oracle GraalVM JVM [x86_64-linux]
TIME_REGEXP.match(NOW_STRING)     33.448M (± 9.9%) i/s   (29.90 ns/i)

TruffleRuby is 27 times faster for this Regexp. So the Time.new(String) benchmark is actually spending a lot of time in creating and initializing the Time object.

StringScanner#scan_integer

StringScanner#scan_integer is a method recently added to StringScanner.

CRuby implements it in 114 lines of C. TruffleRuby implements it in 18 lines of Ruby using Regexps. We use TruffleRuby master here since the method was introduced after the last TruffleRuby release.

require 'benchmark/ips'
require 'strscan'

SCANNER = StringScanner.new("123456789")

Benchmark.ips do |x|
  x.report("StringScanner#scan_integer") do
    SCANNER.reset
    raise unless 123456789 == SCANNER.scan_integer
  end
end
$ ruby --yjit bench_scan_integer.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [x86_64-linux]
StringScanner#scan_integer     10.530M (± 0.6%) i/s   (94.97 ns/i)

$ ruby bench_scan_integer.rb
truffleruby 25.0.0-dev-48a0a626, like ruby 3.3.5, Oracle GraalVM JVM [x86_64-linux]
StringScanner#scan_integer     30.230M (± 3.0%) i/s   (33.08 ns/i)

TruffleRuby is about 3 times faster. BTW, TruffleRuby implements StringScanner entirely in Ruby code. Ruby code is not only shorter and more expressive, it’s also more correct. To that point, while implementing new StringScanner methods we found 6 issues with the C extension implementation of StringScanner: 2 segfaults and 4 inconsistent/incorrect behaviors.

Let’s also benchmark just the Regexp for completeness:

require 'benchmark/ips'

INPUT = "123456789"

Benchmark.ips do |x|
  x.report("/\A[+-]?\d+/.match(INPUT)") do
    raise unless /\A[+-]?\d+/.match(INPUT)
  end
end
$ ruby --yjit bench_scan_integer_regexp.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [x86_64-linux]
/A[+-]?d+/.match(INPUT)      1.972M (± 0.8%) i/s  (507.00 ns/i)

$ ruby bench_scan_integer_regexp.rb
truffleruby 25.0.0-dev-48a0a626, like ruby 3.3.5, Oracle GraalVM JVM [x86_64-linux]
/A[+-]?d+/.match(INPUT)     61.086M (± 6.9%) i/s   (16.37 ns/i)

TruffleRuby is 31 times faster for this Regexp.

Conclusion

Sometimes, such as in the cases shown in this blog post, the idiomatic pure-Ruby solution is also the fastest and even faster than C code or inline assembly. This is not new on TruffleRuby, pure-Ruby code is often faster than everything else, including C extensions on CRuby. Ruby code is more expressive and higher-level which makes some optimizations possible that otherwise wouldn’t be if written in a lower-level language.

If you want to run idiomatic Ruby code fast, give TruffleRuby a try.