Daniel Azuma
Ruby 1.8.7 IO#select threading bug
Posted on Tue, Aug 25, 2009, at 12:12 AM
Tags: ruby
I just spent most of a day tracking down a rather obscure Ruby interpreter bug involving multiple threads calling IO#select. If you are running Ruby 1.8.7p160 through 1.8.7p174 (and possibly some later versions), and you have multiple threads calling IO#select on different, disjoint sets of IO objects, the calls may fail to return results even when there are results to return.

I reported the bug on RedMine, along with a possible strategy for fixing it. It's fairly deep in the thread-handling C code of MRI 1.8.7. It does not appear to affect MRI 1.9.x, and earlier versions of 1.8.7 (e.g. p72) also appear to be immune.

I happened upon this because I use Net-SSH fairly heavily in a threaded environment. The bug often manifested when attempting to open multiple Net-SSH sessions in parallel in different threads. Net-SSH 2.0.14 now has a workaround for the bug.

If you encounter this bug, a Ruby-based workaround is to synchronize all IO#select calls, ensuring that only one thread at a time calls it.

(Obsolete) In Net-SSH, you need to monkey-patch Net::SSH::Transport::PacketStream#available_for_read? and wrap its IO#select call. Here's some sample code:
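A minimal sketch of the synchronization workaround, assuming a single process-wide Mutex is acceptable for your program. `SelectLock` is a hypothetical helper name; this is not the Net-SSH patch itself, and it only protects callers that go through the helper (plain `IO.select` or `Kernel#select` calls elsewhere are unaffected).

```ruby
require 'thread' # Mutex lives in thread.rb on Ruby 1.8; harmless on later versions

# Hypothetical helper (not part of Ruby or Net-SSH): funnels IO.select
# calls through one process-wide Mutex so that at most one thread is
# inside select at any moment.
module SelectLock
  LOCK = Mutex.new

  # Takes the same arguments as IO.select and returns its result:
  # nil on timeout, otherwise [readable, writable, errored] arrays.
  def self.select(read_ios, write_ios = nil, error_ios = nil, timeout = nil)
    LOCK.synchronize { IO.select(read_ios, write_ios, error_ios, timeout) }
  end
end

# Usage: two threads polling disjoint pipes, the situation that
# triggered the bug. Each pipe already has data, so each select
# returns immediately and releases the lock.
r1, w1 = IO.pipe
r2, w2 = IO.pipe
w1.write("x")
w2.write("y")

threads = [r1, r2].map do |reader|
  Thread.new { SelectLock.select([reader], nil, nil, 1) }
end
results = threads.map(&:value)
```

The trade-off of serializing is that a thread blocked in a long select holds the lock and stalls every other thread waiting to select, so short timeouts (as in the zero-timeout polling style Net-SSH uses for readiness checks) keep that cost small.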
Update 1: According to the Capistrano people (thanks, Rafa and Lee), this is also affecting 1.8.6p368. Akira Tanaka writes on ruby-core that this has been fixed on the 1.8 branch as of earlier this month. The relevant revisions are, I believe, 24413, 24416, and 24442. See this diff. I won't pretend to immediately understand what has been done here, but I'll take his word for it. =)

Update 2: I confirmed that Akira's fix on the 1.8 Subversion head does indeed work in my tests. Shyouhei has been pinged, so I'm hopeful the fix will show up in the next 1.8.7 patchlevel release.

Update 3: Thanks to Delano, current maintainer of Net-SSH, the newest release now has a workaround in place. It simply wraps all IO#select calls with a mutex to ensure two threads are never inside a call at the same time. It won't, of course, solve all instances of the problem if you call IO#select outside Net-SSH, but it should take care of cases arising from within Net-SSH, which should be good news for Capistrano users. The new Net-SSH version is 2.0.14; you can get the new gem from RubyForge.
4 Comments
Rafa García
at Tue, Aug 25, 2009, 01:44 AM
Hi Daniel, we suffer from this bug with Capistrano [1] and we were trying to find a patch :-) I will pass this link to Delano Mandelbaum (Net::SSH maintainer) so he can patch Net::SSH.
[1]https://capistrano.lighthouseapp.com/projects/8716/tickets/79-capistrano-hangs-on-shell-command-for-many-computers-on-ruby-186-p368
Lee Hambley
at Tue, Aug 25, 2009, 05:06 AM
Daniel, great work, as Rafa said. We've been having problems with this, and we weren't too far away from this solution... Excellent work! Would you mind liaising with us to get this merged into either Capistrano or Net::SSH (which is now being managed by Delano Mandelbaum)?
Delano Mandelbaum
at Wed, Aug 26, 2009, 12:01 PM
Daniel, thanks for the fix! I’ve applied it to the 2.0 branch in the Net::SSH Github repo and it will be in the 2.0.14 release.
Albrecht Sebastian Dietzel
at Sat, Nov 07, 2009, 10:54 PM
Thanks for fixing this. Sadly, MacPorts is lagging behind by some revisions. At the moment (8 Nov 09) one needs to create a local port repository with version 2.0.15 in it. For MacPorts users:
- ruby @1.8.7-p174
- Make sure that MacPorts is installing the latest rb-net-ssh:
  cat ~/ports/ruby/rb-net-ssh/Portfile
  ruby.setup net-ssh 2.0.15 gem {} rubyforge_gem
Hope this helps.