Daniel Azuma
Ruby 1.8.7 IO#select threading bug
Posted on Tue, Aug 25, 2009, at 12:12 AM
Tags: ruby

I just spent most of a day tracking down a rather obscure Ruby interpreter bug involving multiple threads calling IO#select (strictly speaking, the class method IO.select). If you are running Ruby 1.8.7p160 through 1.8.7p174 (and possibly some later versions), and you have multiple threads calling IO#select on disjoint sets of IO objects, the calls may fail to return results even though there are results to return.
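To make the triggering pattern concrete, here is a minimal sketch of two threads selecting on disjoint IO objects. It is not the test case from the bug report; the method name `select_in_parallel` and the use of pipes are assumptions chosen for illustration. On an unaffected interpreter both threads should see their data.

```ruby
# Two threads each wait on their own pipe with IO.select. The sets of IO
# objects are disjoint, which is the situation described above.
def select_in_parallel(timeout = 5)
  pipes = Array.new(2) { IO.pipe }

  threads = pipes.map do |reader, _writer|
    Thread.new do
      ready = IO.select([reader], nil, nil, timeout)
      # ready is nil on timeout, otherwise [readable, writable, errored]
      ready ? ready.first.first.read_nonblock(16) : nil
    end
  end

  # Make data available on every pipe while the threads are selecting.
  pipes.each_with_index { |(_reader, writer), i| writer.write("ping #{i}") }

  results = threads.map { |t| t.value }
  pipes.flatten.each { |io| io.close }
  results
end
```

A nil entry in the returned array would mean that thread's select timed out even though data had been written to its pipe, which is the symptom described here.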

I reported the bug on Redmine here along with a possible strategy for fixing it. It’s fairly deep in the thread-handling C code of MRI 1.8.7. It does not appear to affect MRI 1.9.x. Earlier versions of 1.8.7 (e.g. p72) also appear to be immune.

I happened upon this because I use Net-SSH fairly heavily in a threaded environment. The bug often manifests when you attempt to open multiple Net-SSH sessions in parallel in different threads. Net-SSH 2.0.14 now has a workaround for the bug.

If you encounter this bug, a Ruby-based workaround is to synchronize all IO#select calls, ensuring that only one thread at a time tries to call it.
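As a rough sketch of that idea outside Net-SSH, a single shared mutex can guard every select call. The module name `SerializedSelect` is hypothetical, not part of any library.

```ruby
require 'thread'

# Hypothetical helper module; the name is illustrative only.
module SerializedSelect
  LOCK = Mutex.new

  # Same arguments and return value as IO.select, but only one thread
  # at a time may be inside the call.
  def self.select(read = nil, write = nil, error = nil, timeout = nil)
    LOCK.synchronize { IO.select(read, write, error, timeout) }
  end
end
```

Code would then call `SerializedSelect.select([socket], nil, nil, 0)` wherever it previously called `IO.select`. Because the lock is held for the whole call, a long blocking select stalls every other caller, so this approach fits best with zero or short timeouts.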

(Obsolete) In Net-SSH, you need to monkey-patch Net::SSH::Transport::PacketStream#available_for_read? and wrap its IO#select call. Here’s some sample code:

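require 'net/ssh'
require 'thread'

# Defined at the top level on purpose: a constant assigned inside a
# module_eval block would end up here anyway, so a distinctive name
# avoids clashes.
NET_SSH_SELECT_MUTEX = Mutex.new

Net::SSH::Transport::PacketStream.module_eval do
  # Poll the socket for readable data, allowing only one thread at a
  # time into IO.select.
  def available_for_read?
    result = NET_SSH_SELECT_MUTEX.synchronize do
      IO.select([self], nil, nil, 0)
    end
    result && result.first.any?
  end
end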

Update 1

According to the Capistrano people (thanks Rafa and Lee), this also affects 1.8.6p368.

Akira Tanaka writes on ruby-core that this has been fixed on the 1.8 branch as of earlier this month. The relevant revisions are, I believe, 24413, 24416, and 24442. See this diff. I won’t pretend to understand immediately what has been done here, but I’ll take his word for it. =)
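Pulling together the version reports so far, a script could warn when it runs on an interpreter that matches them. This is only a heuristic sketch: the method name is hypothetical, and it covers just the patchlevels mentioned in this post, so other releases may also be affected.

```ruby
# Hypothetical helper: true when the interpreter matches one of the
# versions reported as affected in this post. Not a definitive list.
def possibly_affected_by_select_bug?(version = RUBY_VERSION,
                                     patchlevel = RUBY_PATCHLEVEL)
  case version
  when '1.8.7' then (160..174).include?(patchlevel) # later ones unknown
  when '1.8.6' then patchlevel == 368               # reported by Capistrano users
  else false
  end
end
```

Called with no arguments it inspects the running interpreter; passing a version string and patchlevel explicitly is handy for checking other installations.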

Update 2

I confirmed that Akira’s fix on the 1.8 Subversion head does indeed pass my tests. Shyouhei has been pinged, so I’m hopeful the fix will show up in the next 1.8.7 patchlevel release.

Update 3

Thanks to Delano, current maintainer of Net-SSH, the newest release now has a workaround in place. It simply wraps all IO#select calls with a mutex to ensure two threads are never caught in a call at the same time. It won’t, of course, solve all instances of the problem if you call IO#select outside Net-SSH, but it should take care of cases arising from within Net-SSH, which should be good news for Capistrano users. The new Net-SSH version is 2.0.14; you can get the new gem from RubyForge.
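If you want to confirm that an installed Net-SSH is new enough, a version comparison along these lines may help. The method name is hypothetical; it only compares version strings using RubyGems' `Gem::Version` and does not inspect the library itself.

```ruby
require 'rubygems'

# Hypothetical helper: true when the given Net-SSH version string is at
# least 2.0.14, the release named above as carrying the workaround.
def net_ssh_has_select_workaround?(version_string)
  Gem::Version.new(version_string) >= Gem::Version.new('2.0.14')
end
```

When the gem has been loaded through RubyGems, the string to pass in can be obtained with `Gem.loaded_specs['net-ssh'].version.to_s`.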

4 Comments

Hi Daniel,

We suffer from this bug with Capistrano [1] and we were trying to find a patch :-) I will pass this link to Delano Mandelbaum (Net::SSH maintainer) to patch Net::SSH.
Thanks!!

[1]https://capistrano.lighthouseapp.com/projects/8716/tickets/79-capistrano-hangs-on-shell-command-for-many-computers-on-ruby-186-p368

Daniel, great work, as Rafa said. We've been having problems with this, and we weren't too far away from this solution. Excellent work! Would you mind liaising with us to get this merged into either Capistrano or Net::SSH (which is being managed by Delano Mandelbaum now)?

Daniel, thanks for the fix!

I’ve applied it to the 2.0 branch in the Net::SSH Github repo and it will be in the 2.0.14 release.

http://github.com/net-ssh/net-ssh/tree/2.0

Thanks for fixing this. Sadly, MacPorts is lagging behind by some revisions. At the moment (8 Nov 09) one needs to create a local port repository with version 2.0.15 in it.

For macports users:

- ruby @1.8.7-p174
- rb-capistrano @2.5.3

Make sure that MacPorts is installing the latest rb-net-ssh @2.0.15:

cat ~/ports/ruby/rb-net-ssh/Portfile
PortSystem 1.0
PortGroup ruby 1.0

ruby.setup net-ssh 2.0.15 gem {} rubyforge_gem
maintainers nomaintainer
description A pure-Ruby implementation of the SSH2 client protocol
long_description ${description}
checksums md5 5244742d1b3856d80922b52c72082d33
platforms darwin
homepage http://rubyforge.org/projects/net-ssh/

hope this helps.
