Running Optcarrot, a Ruby NES emulator, at 150 fps with the GUI!

TruffleRuby runs Optcarrot, a NES emulator written in Ruby, at 150 fps while playing a NES game.

As we can see in the GIF, performance gets better as we play and the JIT kicks in. We start with 2 fps and a fairly laggy game in level 1 and end up at 150 fps with a very responsive game in level 3.

Screenshot Recording

The Optcarrot benchmark

Optcarrot is a key benchmark for Ruby 3x3 created by @mame. It is also a NES emulator and runs at around 20 fps (frames per second) on MRI, the reference implementation. Therefore, if MRI meets the 3x3 goal of being 3 times faster, it should run Optcarrot at 60 fps, the frame rate of the NES, and we can play NES games!

But why wait? And why limit ourselves to 60 fps? When Optcarrot was announced a few months ago, I tried running it with TruffleRuby. TruffleRuby is a high-performance Ruby implementation based on Truffle and Graal, which I have been working on and researching for more than 2 years now, along with Chris Seaton, Kevin Menard and Petr Chalupa.

The result was astonishing: the Optcarrot benchmark ran at around 180 fps. That’s about 9 times as fast as MRI!

Boxplot

The only change needed to reach that level of performance was to fix a compare_by_identity Hash bug. Now you may wonder about startup, warmup or how long it takes until it’s fast. Here is the full picture for the first 3000 frames (that’s just 50 seconds at 60 fps).

Times

MRI 2.3.3 runs at around 20 fps and is very stable, JRuby 9.1.6.0 with invokedynamic runs at around 40 fps, and TruffleRuby (from GraalVM 0.18) runs at around 180 fps after warmup. There is definitely some warmup going on, but the milestone of 60 fps is reached at around 300 frames. For the record, this was run on a laptop with an Intel(R) Core(TM) i7-4702HQ CPU @ 2.20GHz with frequency scaling, turbo mode and hyper-threading all disabled.
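As an aside on the compare_by_identity fix mentioned above: a Hash in that mode looks keys up by object identity instead of by value. The snippet below is only a minimal illustration of that Ruby feature. It is not the bug that was fixed nor code from Optcarrot, and the key contents are made up.

```ruby
# Two distinct String objects with equal contents (illustrative keys only).
a = String.new("sprite")
b = String.new("sprite")

# A regular Hash treats keys with equal contents as the same key.
by_value = {}
by_value[a] = 1
by_value[b] = 2 # overwrites the entry stored under a

# A compare_by_identity Hash only matches the very same object.
by_identity = {}.compare_by_identity
by_identity[a] = 1
by_identity[b] = 2 # b is a different object, so this is a second entry

puts "regular Hash size:  #{by_value.size}"
puts "identity Hash size: #{by_identity.size}"
```

A lookup with a third String of the same contents finds nothing in the identity Hash, because no entry was stored under that exact object.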

But so far, these numbers were just for the benchmark, and it would be nice to actually use TruffleRuby to play the NES and see the numbers on a real game!

The User Interface

I developed a new video driver to display the NES screen, using mplayer, which only requires writing the pixels to a pipe. I then adapted the FPS counter to be able to display 3 digits instead of a maximum of 2.
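To give an idea of what "writing the pixels to a pipe" can look like, here is a minimal sketch. It is not the actual driver: the helper names, the 0xRRGGBB pixel representation and the RGB24 byte layout are all assumptions made for illustration, and an in-memory StringIO stands in for the pipe to mplayer.

```ruby
require "stringio"

# Hypothetical helper: turn an array of 0xRRGGBB integers into raw bytes,
# 3 bytes per pixel in red, green, blue order.
def pack_rgb24(pixels)
  pixels.flat_map { |c| [(c >> 16) & 0xff, (c >> 8) & 0xff, c & 0xff] }.pack("C*")
end

# Hypothetical helper: write one frame to any IO-like object and flush it,
# so the reader on the other end of a pipe sees the frame right away.
def write_frame(io, pixels)
  io.write(pack_rgb24(pixels))
  io.flush
end

# Stand-in for the pipe: a binary in-memory buffer.
# In real use this would be a pipe to the player, e.g. opened with IO.popen.
pipe = StringIO.new(String.new)
write_frame(pipe, [0xff0000, 0x00ff00]) # one red pixel, one green pixel
puts "bytes written: #{pipe.string.bytesize}"
```

The appeal of this design is that the Ruby side needs nothing beyond plain IO, so no native extension or FFI is involved.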

The existing video backends (SDL2, SFML) rely on FFI, which TruffleRuby does not support yet. The reason is that existing implementations of FFI use native extensions of either 10k lines of C or 10k lines of Java, so it’s not clear which way is best to support it: handle all that implementation-specific extension code, or port some parts to Ruby.

For the input, I used the existing terminal-based input driver which just reads characters one by one.
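Reading characters one by one means that an arrow key arrives as several characters, since terminals usually send it as an escape sequence (ESC, "[", then a letter). The sketch below shows one way such input could be decoded. It is not Optcarrot's input driver: the read_key helper and the key names are hypothetical, and a StringIO stands in for the terminal (a real terminal would also need to be in raw mode, for example via io/console).

```ruby
require "stringio"

# Final letter of the common ANSI arrow-key escape sequences.
ARROWS = { "A" => :up, "B" => :down, "C" => :right, "D" => :left }.freeze

# Hypothetical helper: read one key press from io, one character at a time.
# Returns a Symbol, or nil at end of input or for an unrecognized key.
def read_key(io)
  c = io.getc
  return nil if c.nil?
  return c.downcase.to_sym if %w[z x].include?(c.downcase)
  return nil unless c == "\e"      # anything else must start an escape sequence
  return nil unless io.getc == "[" # second character of the sequence
  ARROWS[io.getc]                  # third character selects the arrow
end

# Simulated input: Z, Up, X, Left.
input = StringIO.new("z\e[Ax\e[D")
keys = []
while (key = read_key(input))
  keys << key
end
puts keys.inspect
```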

The audio driver was disabled because audio doesn’t sound nice at a higher frame rate. The audio stream is still computed though, as in benchmark mode.

With this setup, we can play the NES as shown in the GIF above and it eventually reaches 150 fps on the Lan Master game. The drop from 180 fps is likely due to the extra work of rendering the frames to the screen and to running a more varied workload (the benchmark mode only runs the splash screen).

There is a problem though with playing at 150 fps: it’s really hard! For time-based games such as Lan Master, it means we have less than half (2/5) of the time we get at 60 fps. On the other hand, the game is really responsive and it’s much faster to move across the board.
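The 2/5 figure follows directly from the two frame rates, since game time advances per frame rather than per second. Here is the arithmetic spelled out as a small sketch. The helper names are made up for illustration, and the 60-second timer is just an example value, not something taken from Lan Master.

```ruby
NES_FPS = 60       # frame rate the game was designed for
EMULATED_FPS = 150 # frame rate reached with TruffleRuby

# Fraction of real time the player gets when the game runs faster than intended.
def time_fraction(native_fps, actual_fps)
  Rational(native_fps, actual_fps)
end

# Real time available per frame, in milliseconds.
def frame_budget_ms(fps)
  1000.0 / fps
end

fraction = time_fraction(NES_FPS, EMULATED_FPS)
puts "fraction of real time: #{fraction}"
puts "a 60 s in-game timer lasts #{(60 * fraction).to_f} s of real time"
puts format("frame budget: %.2f ms at 60 fps, %.2f ms at 150 fps",
            frame_budget_ms(NES_FPS), frame_budget_ms(EMULATED_FPS))
```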

Trying it for yourself

First download GraalVM: choose the Runtime Environment as it’s smaller.

You also need mplayer, which you can install with your favorite package manager. Then:

git clone https://github.com/eregon/optcarrot.git --branch demo
cd optcarrot
path/to/graalvm-0.18-re/bin/ruby bin/optcarrot --video=mplayer --audio=none --input=term examples/Lan_Master.nes

Be patient, startup takes a while (loading files, loading the ROM, compiling the emulator and sending enough frames for mplayer to show up).

You need to click on the terminal before entering any input. Use Z and X to rotate the links and the arrow keys to move around.

The End

I hope you liked the post. I would like to thank @mame for this awesome benchmark.