Multi-Threaded Ruby
Ruby has a reputation for being slower than some other programming languages. One way we can improve on this is by using separate threads to perform I/O (Input/Output) operations.
Ruby in its typical configuration effectively runs one thread of Ruby code at a time, because a thread must hold the GVL, or Global VM Lock, before it can execute. When your Ruby code is run, it is tokenized with some C code, parsed into an Abstract Syntax Tree, compiled into bytecode, and finally executed by the Ruby Virtual Machine, referred to as YARV (Yet Another Ruby VM). This stack-based virtual machine is where our optimization can occur.
YARV is not internally thread safe, and can’t run multiple threads in parallel. This means it can only handle the processing of one thread at a time. But in some operations, such as database and API calls, the work is handled outside of the virtual machine. Because of this, in a single-threaded program, if you’re querying a database, making a request to an API, or pulling from some other external data source, the rest of your code has to wait for that call to finish before anything else can be executed. Meanwhile your virtual machine is sitting idle, when it could be working on the rest of your code.
We can add threads to our Ruby code to make it more performant. If one thread is off querying our API, it releases the lock and our VM can start executing another thread concurrently. When the data comes back from our API, that thread waits for its turn on the VM before it resumes.
It’s important to keep in mind that if you don’t have many of these API or database calls, threads may not benefit you at all, and if used incorrectly may even slow your code down. But if a good portion of your time is spent waiting on requested data, multiple threads can add huge benefits.
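One way to check whether threads help a given workload is simply to time it both ways. The sketch below does that for a CPU-bound task using Benchmark from the standard library. The cpu_work, run_sequential and run_threaded helpers are hypothetical names made up for illustration, and the exact timings will vary by machine.

```ruby
require "benchmark"

# Hypothetical CPU-bound task: pure Ruby arithmetic with no I/O,
# so only one thread can make progress on it at a time.
def cpu_work(n)
  (1..n).reduce(0) { |sum, i| sum + i * i }
end

# Run every job one after another on the main thread.
def run_sequential(jobs)
  jobs.map { |n| cpu_work(n) }
end

# Run every job in its own thread and collect the block results.
def run_threaded(jobs)
  jobs.map { |n| Thread.new { cpu_work(n) } }.map(&:value)
end

jobs = [500_000, 500_000]

sequential_time = Benchmark.realtime { run_sequential(jobs) }
threaded_time   = Benchmark.realtime { run_threaded(jobs) }

puts format("sequential: %.3fs", sequential_time)
puts format("threaded:   %.3fs", threaded_time)
```

On a standard Ruby build the two numbers should come out close to each other, since this work never leaves the VM. Swapping the arithmetic for something that waits on I/O is where the threaded version would be expected to pull ahead.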
Making threads is as easy as Thread.new { puts 1 + 1 }. Here’s a simple example.
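A minimal sketch of such an example, assuming a hypothetical fetch_data helper that simulates a slow request with sleep. The helper, the fetch_all wrapper and the source names are invented for illustration and do not come from a real API.

```ruby
# Hypothetical stand-in for a slow API or database call.
# sleep lets other threads run while it waits, much like real network I/O.
def fetch_data(source, delay)
  sleep(delay)
  "data from #{source}"
end

def fetch_all
  threads = []

  # Each Thread.new starts running its block right away.
  threads << Thread.new { fetch_data("users API", 0.2) }
  threads << Thread.new { fetch_data("orders API", 0.2) }

  # join blocks until that thread has finished.
  threads.each { |thread| thread.join }

  # value returns whatever the thread's block returned.
  threads.map(&:value)
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
results = fetch_all
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts results.inspect
puts "took #{elapsed.round(2)}s"
```

Because the two simulated requests overlap, the whole run should take roughly as long as one request rather than the sum of both.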
The data-requesting portions of these two threads are now non-blocking, allowing any other threads to continue running while they fetch their data. Here you can see we’re also creating an array and pushing our thread objects into it. We then loop over those threads and call .join on each one. Join is just a blocking operation for when you do need to wait for your threads to finish. Without it, the main program could exit before the threads are done.
This isn’t the only native tool we have for concurrency. We also have Fibers, which are similar to threads in that they’re blocks of code that can be paused and resumed. Fibers, however, are started and paused explicitly by the developer rather than scheduled by the VM. This allows you to implement generator-style functions, and could feasibly be used to manually set up concurrent calls.
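As a small illustration of the generator style, here is a sketch of a Fiber that produces Fibonacci numbers on demand. The fibonacci_fiber method name is hypothetical. Fiber.new, Fiber.yield and resume are part of core Ruby.

```ruby
# Builds a generator-style Fiber. Each call to resume runs the block
# until the next Fiber.yield, which pauses it and hands back a value.
def fibonacci_fiber
  Fiber.new do
    a, b = 0, 1
    loop do
      Fiber.yield a   # pause here until the caller resumes us again
      a, b = b, a + b
    end
  end
end

generator = fibonacci_fiber

# The caller decides exactly when the Fiber runs.
first_five = Array.new(5) { generator.resume }
puts first_five.inspect # prints [0, 1, 1, 2, 3]
```

Nothing inside the Fiber runs until resume is called, which is the key difference from a thread.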
Ruby 3 has also brought us the Ractor, introduced as an experimental feature. Ractors give us thread-safe parallel execution: Ruby code in different Ractors can run at the same time rather than taking turns on a single lock. If you’re on Ruby 3.0 or later you can now truly run your operations in parallel, as opposed to just concurrently. This is huge, as even CPU-bound code can now be spread across cores. There are of course rules about how Ractors can communicate with each other: most objects can’t be shared between them and have to be copied or moved instead. You wouldn’t want to end up with multiple executions modifying the same piece of state, after all.
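A rough sketch of what that can look like, assuming Ruby 3.0 or later. The parallel_sums and ractor_result helpers are hypothetical names. The way a result is read back from a Ractor has changed between Ruby releases, so the sketch checks for the value method and falls back to take on older versions. Ruby may also print a warning that Ractor is experimental.

```ruby
# Reads a Ractor's result in a way that works across Ruby versions:
# newer releases provide Ractor#value, older 3.x releases use Ractor#take.
def ractor_result(ractor)
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

def parallel_sums(limits)
  ractors = limits.map do |limit|
    # The argument is passed into the Ractor explicitly; the block
    # is not allowed to read local variables from the outside.
    Ractor.new(limit) do |n|
      (1..n).reduce(0) { |sum, i| sum + i * i }
    end
  end

  # Wait for each Ractor and collect what its block returned.
  ractors.map { |ractor| ractor_result(ractor) }
end

puts parallel_sums([10, 100]).inspect # prints [385, 338350]
```

Passing data in as arguments and reading results back out is what keeps two Ractors from ever touching the same mutable state.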
There are also plenty of gems to help you handle concurrency. If you’d rather not deal with it yourself, these can be a very simple option to toss in real quick. They also have the added benefit of handling some edge cases for you.
Performant Ruby certainly doesn’t have to be a contradiction in terms. Depending on your use case, concurrency and parallelism may be well worth the investment for your app.
Here are some great resources on threads and concurrency in general.
The Practical Effects of the GVL on Scaling in Ruby - Nate Berkopec
Inside RubyVM - Sitaram Shelke
Concurrency for HTTP Requests in Ruby and Rails - Paweł Urbanek