Daniel Azuma
Ruby threads and Kernel#require
Posted on Wed, Dec 12, 2007, at 03:08 PM
Tags: ruby

For people like me with a Java background, one of the first things we bemoan and disparage when learning Ruby is its thread support. Simply put, Ruby threads are green threads, meaning they are implemented in user space. Among other things, this means they cannot be scheduled onto multiple processors or cores because the operating system does not know about them.

There are other issues, such as the fact that often-used libraries such as ActiveRecord are not thread-safe. This article gives a brief but fairly good overview of the issues, relevant to the current MRI 1.8 implementation. InfoQ also published a very interesting article last spring on the pros and cons of different threading strategies related to Ruby.

I recently had to design a multithreaded library for Zoodango’s deployment system, and I ran into another gotcha that I’m sure is known by someone, but I haven’t seen documented yet. Basically, the Kernel#require method has some arguably incorrect behavior in the presence of threads. If two or more threads require the same file simultaneously, one of them may “finish” prematurely, likely eventually resulting in NameError exceptions being raised when that thread tries to access names that have not yet been defined.

Let’s consider an example. Suppose we have a Ruby file “foo.rb” that defines the class Foo. Now suppose we execute the following:

  t1 = Thread.new {
    require "foo"
    Foo.new
  }
  t2 = Thread.new {
    require "foo"
    Foo.new
  }
  t1.join; t2.join

The require method has the intended semantics that it will load and execute a ruby file, but will only do so once. Subsequent requires of the same file will skip the loading. But what happens when a second thread starts a require before the first thread finishes executing the file?

As far as I can (experimentally) determine, the check-and-set that controls the “already loaded” flag occurs right at the beginning of the require method. So the following is a potential run of the above code:

  1. Thread 1 checks “foo”, finds it hasn’t been loaded yet, and adds it to the loaded file list $".
  2. Thread 2 checks “foo”, and finds that it has been added to $" (by thread 1) so it doesn’t proceed to load it again. This is a good thing, of course. It means we never get into a state where two threads are loading and executing the same file simultaneously, which could do messy things to your namespace. However…
  3. Meanwhile, Thread 1 starts executing “foo”. It has to do a lot of things, beginning with loading the file, lexing, parsing… It hasn’t quite gotten around to defining the constant Foo yet…
  4. Thread 2, however, thinks that “foo” has already been loaded, so it tries to execute Foo.new. Boom! NameError gets raised.
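The interleaving above can be modeled in a few lines. This is only a sketch under assumptions: naive_require, loaded, constants and flag_set are hypothetical names invented for illustration, and the lambda imitates the check-and-set ordering described above rather than MRI's actual implementation. The sleep stands in for the time spent reading and parsing the file.

```ruby
require "thread"

# Hypothetical model of the ordering described above. NOT MRI's real code.
loaded    = []          # stands in for the loaded file list $"
constants = {}          # stands in for constants that the file would define
guard     = Mutex.new   # makes the check-and-set itself atomic
flag_set  = Queue.new   # demo hook: signals that the "loaded" flag is set

naive_require = lambda do |name|
  first_caller = guard.synchronize do
    next false if loaded.include?(name)
    loaded << name        # the flag is set BEFORE the file body has run
    true
  end
  next false unless first_caller
  flag_set << name
  sleep 0.2               # simulates reading, lexing and parsing the file
  constants[name] = true  # simulates "class Foo ... end" finally executing
  true
end

saw_foo = {}
t1 = Thread.new do
  naive_require.call("foo")
  saw_foo[:t1] = constants.key?("foo")
end
flag_set.pop              # wait until thread 1 has passed the check-and-set
t2 = Thread.new do
  naive_require.call("foo")   # returns at once: "foo" looks loaded already
  saw_foo[:t2] = constants.key?("foo")
end
t1.join; t2.join

p saw_foo   # thread 2 gets here before "Foo" exists in this model
```

In this model thread 2 returns from the require call while thread 1 is still "parsing", which is the window in which a real Foo.new would raise NameError.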

You could argue that the above example is contrived. But this takes place in real code too. I ran into it when using Jamis Buck’s excellent library Net::SSH. Net::SSH lazily loads parts of its implementation as they are needed, executing a bunch of require statements during the process of initializing a session. (For those who are familiar with the implementation, it is using Jamis’s dependency injection library Needle to load parts of its implementation and API. Needle runs requires dynamically in its Container#require method.) Now, I think it’s reasonable to create Net::SSH sessions in two or more threads simultaneously, if you want to open connections to multiple servers and manage each connection with a thread. However, if you try to do so, you’ll intermittently raise a NameError out of the guts of Net::SSH.

I’m still searching for a good general solution for this. (And I haven’t mucked with the ruby trunk so I don’t know if it’s still an issue in 1.9.) For now, what I’ve done is attempt to wrap code that I know could contain dynamic require invocations in mutex synchronize blocks. For Net::SSH, that just includes code that creates sessions. However, it’s more complex in Net::SFTP because one of the internal callbacks, Protocol::Driver#do_version, can also trigger Needle, and there doesn’t seem to be a way to intercept it without monkeypatching. Another possibility is to try to patch Kernel#require directly, adding an internal mutex, but I’m not yet sure how to do that without introducing the possibility of deadlocks.
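As a rough illustration of the mutex workaround, here is a minimal sketch. The names setup_lock and create_session are hypothetical, and create_session is only a stand-in for library code that lazily calls require (it is not the Net::SSH API). The host names are placeholders.

```ruby
require "thread"

# Hypothetical lock shared by every thread that may trigger a dynamic require.
setup_lock = Mutex.new

# Stand-in for session setup that requires files lazily. In real use this
# would be the code that creates a Net::SSH session.
create_session = lambda do |host|
  require "time"   # represents a require buried inside library code
  { :host => host, :opened_at => Time.now.iso8601 }
end

threads = %w[alpha.example.com beta.example.com].map do |host|
  Thread.new do
    # Only session creation is serialized. Work done with the session
    # afterwards can still run concurrently in each thread.
    session = setup_lock.synchronize { create_session.call(host) }
    Thread.current[:session] = session
  end
end
threads.each { |t| t.join }

hosts = threads.map { |t| t[:session][:host] }
p hosts.sort
```

The trade-off is that this only protects requires that happen inside the guarded block. A require triggered later, from a callback for instance, is still exposed.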

Update (27 June 2008): Net::SSH 2.0 and related libraries do in fact ditch the Needle dependency, so this issue no longer tends to show up when multiple threads access Net::SSH. However, the underlying issue with Kernel#require is still there.

5 Comments

Hi. I’ve encountered the same problem. I checked, and Ruby from trunk has the same behaviour. I think the best solution would be to write a thread-safe “require” method.

My proposal:

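require "thread"

module Kernel
  # Keep the original implementation reachable under another name.
  alias unsafe_require require
  # One lock shared by every call to require.
  @@__require_mutex = Mutex.new

  # Serialize all requires so only one thread loads a file at a time.
  def require(*args)
    @@__require_mutex.synchronize do
      unsafe_require(*args)
    end
  end
end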

Your example with my solution:

http://pastie.caboo.se/128951 # foo.rb

http://pastie.caboo.se/128952 # main.rb

run: ruby main.rb

I think it may not be 100% safe. I didn’t check how it behaves with nested requires (file1 requires file2, which requires file3, etc.).

@Radarek, as I see it, there are two problems. A Mutex can only be synchronized once, so nested requires would throw a ThreadError. Of course, you can get around that by using Monitor instead.
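The difference between the two primitives can be seen without touching require at all. A minimal sketch, assuming a current Ruby where a recursive Mutex lock raises ThreadError; the variable names are made up for illustration:

```ruby
require "thread"
require "monitor"

# A Mutex refuses to be locked again by the thread that already holds it.
mutex = Mutex.new
mutex_outcome = begin
  mutex.synchronize { mutex.synchronize { :reentered } }
rescue ThreadError => e
  e.class
end

# A Monitor is reentrant: the owning thread may enter it again.
monitor = Monitor.new
monitor_outcome = monitor.synchronize { monitor.synchronize { :reentered } }

p mutex_outcome
p monitor_outcome
```

A nested require under the Mutex-based patch hits the first case, because the outer require still holds the lock when the inner one asks for it.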

The other problem is one of lock ordering. The following code, for example, could deadlock:

main.rb:

  require "monitor"
  $m = Monitor.new
  t1 = Thread.new {
    $m.synchronize {
      require "foo"
    }
  }
  t2 = Thread.new {
    require "foo"
  }
  t1.join; t2.join

foo.rb:

  $m.synchronize { }

If thread t1 grabs the $m lock first, and t2 grabs the require lock first, then neither will be able to get through.
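The stalemate can be observed without actually hanging a process. The sketch below is an illustration only: m_lock and require_lock are hypothetical stand-ins for $m and for the lock inside a patched require, and try_lock is used instead of a blocking lock so that each thread can report that its second lock was unavailable.

```ruby
require "thread"

m_lock       = Mutex.new   # plays the role of $m
require_lock = Mutex.new   # plays the role of the lock inside a patched require
holding = Queue.new        # each thread reports once it holds its first lock
proceed = Queue.new        # main thread lets both threads continue together
tried   = Queue.new        # each thread reports once it has tried its second lock
release = Queue.new        # main thread lets both threads unwind
got_second = {}

contender = lambda do |name, first, second|
  Thread.new do
    first.synchronize do
      holding << name
      proceed.pop
      # try_lock returns immediately instead of blocking, so the demo can
      # record the conflict rather than deadlock.
      got_second[name] = second.try_lock
      second.unlock if got_second[name]
      tried << name
      release.pop
    end
  end
end

t1 = contender.call(:t1, m_lock, require_lock)   # $m first, then require
t2 = contender.call(:t2, require_lock, m_lock)   # require first, then $m
2.times { holding.pop }       # both threads now hold their first lock
2.times { proceed << true }
2.times { tried.pop }         # both have attempted their second lock
2.times { release << true }
t1.join; t2.join

p got_second   # neither thread could take its second lock
```

With blocking locks in place of try_lock, both threads would wait forever at that point, which is the deadlock described above.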

There probably exists some different mutual exclusion primitive that will solve it, but I haven’t had the chance to sit down and work it out yet.

I recently had similar problems running net-sftp with multiple threads and found a workaround that appears to work (http://discuss.joyent.com/viewtopic.php?id=18275). However, you may also want to check out the new net-ssh and net-sftp version 2 that Jamis posted links to a while back (http://weblog.jamisbuck.org/2007/7/29/net-ssh-revisited). He rewrote the ssh and sftp gems, taking Needle out.

@Ajay: Thanks. I took a look at your discussion on Joyent. It looks like you’re trying to use threads to parallelize operations on the same SSH session. I’m taking a generally different approach. Net::SSH and Net::SFTP actually have parallelism built in. A single connection can run multiple operations (i.e. channels) in parallel, and you generally don’t create threads outside the library to do it. Instead, there are asynchronous versions of (nearly) all the operations. You just start them all up, register some callbacks, and then call session.loop once to run all of them. (I say “nearly” because the one glaring omission seems to be sftp.connect. Since it looks like Jamis is working on an update, I’ll file a bug report on that.)

What I’m trying to do is use threads to parallelize operations across multiple servers. Each such connection has its own Net::SSH session, and thus its own session loop. Calling session.loop blocks, so I run one thread per session, and call session.loop on that session within the thread. I haven’t looked at the Capistrano source code recently, but I assume it’s doing something similar. Generally, Net::SSH is perfectly thread safe, except for this one issue with Needle and Kernel#require.

Thanks for pointing me at the upcoming net-ssh and net-sftp version 2. I hadn’t noticed those before. Guess I need to pay more attention to Jamis’s blog… :-) I’ll take a look at the previews.

Comments are disabled for this article.