Daniel Azuma
Rails/ActiveRecord 2.1 and TEXT/BLOB fields
Posted on Fri, Jun 27, 2008, at 04:52 PM (0 comments)
Tags: rails

We just ran into a rather annoying gotcha when upgrading our application from Rails 2.0.2 to 2.1. All of a sudden, a few of the columns in a few of our ActiveRecord classes unexpectedly started defaulting to nil, and the records refused to save, failing with a MySQL error. We tracked this down to a change in how ActiveRecord 2.1 handles default values for BLOB and TEXT columns in a MySQL database.

Background

ActiveRecord uses the database schema to automatically “construct” classes and populate default values. For the most part, this works great. You can create a table such as:

  CREATE TABLE `first_objects` (
    `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
    `number_field` INT NOT NULL,
    `string_field` VARCHAR(100) NOT NULL DEFAULT 'hello',
    PRIMARY KEY  (`id`)
  ) ENGINE=MyISAM DEFAULT CHARSET=utf8;

and ActiveRecord will automatically know that the class FirstObject has fields id, string_field, and number_field. Furthermore, it will populate new objects with the default values the schema provides, leaving attributes nil where no default is given. For example:

  class FirstObject < ActiveRecord::Base; end
  object = FirstObject.new
  object.number_field    # => nil
  object.string_field    # => "hello"

Unfortunately, the database doesn’t always cooperate with this mechanism. MySQL (as of version 5.x) does not allow default values to be specified for TEXT, BLOB, or related field types. This means you cannot specify a particular default value for a long text field. Instead, the value always defaults to NULL if the column can be NULL, or the empty string "" if the column cannot be NULL. That is, the default value is implicit. This causes a bit of a headache for ActiveRecord since there is no way for the user to provide a default value in the table definition, or even to specify that one should exist.

The old behavior

In Rails prior to 2.1, ActiveRecord dealt with this by always providing a default value for such types. e.g.

  CREATE TABLE `second_objects` (
    `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
    `number_field` INT NOT NULL DEFAULT '0',
    `text_field` TEXT NOT NULL,
    PRIMARY KEY  (`id`)
  ) ENGINE=MyISAM DEFAULT CHARSET=utf8;

  class SecondObject < ActiveRecord::Base; end
  object = SecondObject.new
  object.number_field   # => 0
  object.text_field      # => ""

This behavior could be considered a little confusing, but it was the established behavior, and it made sense: text_field did have a default value, even though it was implicit, and there would otherwise be no way to specify that object.text_field should default to something other than nil. Unfortunately, it also caused a headache when writing out schema.rb files since the implicit default value is not allowed in the syntax.

What changed, and why is it important?

As far as I can tell, what changed in Rails 2.1 is this patch. It removes the implicit default value. Now, the behavior for our SecondObject class is as follows:

  class SecondObject < ActiveRecord::Base; end
  object = SecondObject.new
  object.number_field   # => 0
  object.text_field     # => nil  (!)

This means the behavior has changed (and caused some whiny-nil exceptions in our code until I fixed them). But perhaps more critically, this fails:

  object2 = SecondObject.new()
  object2.save!   # Raises an exception

The text_field column must always be explicitly set before the record can be saved, because as far as ActiveRecord knows, it has no default value. Hence, the attribute defaults to nil, which is an illegal value since the column is declared NOT NULL. Furthermore, as far as I can tell, there is no (easy) way to give it a default value. This means that every MySQL column of type TEXT, BLOB, or similar that is declared NOT NULL must now be explicitly set in code.

Is there a solution?

I won’t get into the debate here regarding what the Right Thing To Do should have been. For now we needed a fast Rails 2.1 upgrade path, and so we’ve put in a quick-and-dirty solution by going through our ActiveRecord classes that are affected by this issue, and modifying the initialize methods to default the values according to our desires:

  class SecondObject < ActiveRecord::Base
    def initialize(attributes=nil)
      super({:text_field => 'our default'}.merge(attributes || {}))
    end
  end

One could of course imagine patching this capability into the ActiveRecord DSL, but we didn’t think that was worth it for the few cases we were running into. Is there a better solution? Am I missing something obvious?
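For what it's worth, here is a minimal sketch of what such a DSL might look like. Everything in it is hypothetical: AttributeDefaults and attribute_default are names I made up for illustration, not part of ActiveRecord, and StandInRecord is only a stand-in for ActiveRecord::Base so the sketch runs on its own. It assumes attributes are passed with symbol keys, as in the code above.

```ruby
# Hypothetical mixin: AttributeDefaults and attribute_default are
# illustrative names and are NOT part of ActiveRecord.
module AttributeDefaults
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Per-class table of attribute name => default value.
    def attribute_defaults
      @attribute_defaults ||= {}
    end

    def attribute_default(name, value)
      attribute_defaults[name.to_sym] = value
    end
  end

  # Merge the declared defaults underneath whatever the caller passed,
  # then hand off to the superclass initializer.
  def initialize(attributes = nil)
    super(self.class.attribute_defaults.merge(attributes || {}))
  end
end

# Stand-in for ActiveRecord::Base (an assumption made so this runs alone).
class StandInRecord
  attr_reader :attributes

  def initialize(attributes = nil)
    @attributes = attributes || {}
  end
end

class SecondObjectSketch < StandInRecord
  include AttributeDefaults
  attribute_default :text_field, "our default"
end

with_default  = SecondObjectSketch.new
with_explicit = SecondObjectSketch.new(:text_field => "explicit")
```

Note that the same default object is shared by every new instance in this sketch, so a mutable default such as a string would probably want a dup before being handed out.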

HTML comments and Rails views
Posted on Sat, Jan 05, 2008, at 01:25 PM (0 comments)

Comment your code—it’s a mantra they drill into you in Programming 101. I know, most of us took that class at age 5 or 6, but it’s still important 30 years later.

Of course, in those bygone days, we were probably programming in C (or maybe Java for you young’uns, or FORTRAN for the graybeards). Comments were there for your benefit as a programmer, and the compiler helpfully threw them away when building binaries for end-user consumption. You could put anything you wanted into your comments and not worry about them becoming public knowledge. Anything, including:

C XOR WITH 42405 TO ENCRYPT MY SECRET VALUE

or

/* This paranoid and performance-killing logic is here because
   the customer is a clueless f***ing anal-retentive nutcase. */

or

// I have a crush on the chick in the next cubicle.

and only the other developers could see them. Of course, if the girl in the next cubicle was also a developer, it wouldn’t be terribly smart to put such a comment in your code. And that’s what I want to talk about, because this sort of thing seems to be happening more often than it should…

The danger of HTML comments

In this era of high level languages and web development, you have to be careful because not all code is compiled, and sometimes comments end up visible to end users. Comments in HTML are a particular culprit. Anything you put into an html document, including:

<!-- This is our top-secret widget but I'm commenting
     it out until our launch date.
  <div> ...
-->

is visible to any user who knows how to “view source” in a web browser. I’ve viewed-source on a lot of web sites, and seen a lot of html comments that probably should not have been exposed to the public. At worst, these can be embarrassments, or even security holes.

Similar issues surround documents served directly out of your public directory, such as javascripts and stylesheets. Any such files can be found and viewed directly by anyone with a passing knowledge of html. Do you think no one will look at your CSS files? I’ve done so many times, studying how other sites style their pages, and I’m sure I’m not the only one. And you don’t want to know how many times I’ve seen something like this at the top of someone’s publicly-visible CSS file:

/*
   $HeadURL: http://192.168.3.8/website/trunk/public/css/main.css $
   $Revision: 2343 $
   $Author: johndoe $
*/

You don’t really want me to see your subversion header and gain information about your development infrastructure, do you? Then don’t include it in those files, or write some logic in your production deployment process that strips it out. Better yet, use an optimization/obfuscation tool, especially for javascript. You’ll hide sensitive comments, and make your users happier by reducing load time and bandwidth.
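As a rough illustration of the "strip it out" step, here is a minimal Ruby sketch. The helper name strip_svn_header and the keyword list are my own assumptions and not part of any deployment tool. It drops a leading /* ... */ comment only when that comment contains a Subversion keyword expansion, and leaves any other leading comment alone.

```ruby
# Hypothetical helper for a deployment script; the name and the keyword
# list are assumptions made for this sketch.
SVN_KEYWORD = /\$(?:HeadURL|URL|Revision|Rev|Author|Date|Id)(?::[^$\n]*)?\$/

# Removes the first /* ... */ comment at the top of a stylesheet, but only
# if it contains an expanded Subversion keyword such as $Revision: 2343 $.
def strip_svn_header(css)
  css.sub(%r{\A\s*/\*.*?\*/\s*}m) do |comment|
    comment =~ SVN_KEYWORD ? "" : comment
  end
end

header = "/*\n" \
         "   $HeadURL: http://192.168.3.8/website/trunk/public/css/main.css $\n" \
         "   $Revision: 2343 $\n" \
         "   $Author: johndoe $\n" \
         "*/\n"
stripped  = strip_svn_header(header + "body { margin: 0; }\n")
untouched = strip_svn_header("/* main styles */\nbody { margin: 0; }\n")
```

A real deployment would more likely run the files through a minifier, as suggested above, which removes all comments rather than just this one.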

A simple solution for Rails views

When writing a Rails view using erb/rhtml, don’t use html comments unless there’s a genuine functional reason they need to be in the final html document itself. Instead, erb supports a syntax for ruby comments that will not be rendered into the final document:

<%# This is a comment that will not be rendered %>

But sometimes, don’t we want those comments there for development and testing purposes? Perhaps, as I’m debugging a view, I want to visually delineate sections of an html document, or display a value in a comment, but not show (or spend time computing) that value in production. We’ve written a simple method in our ApplicationHelper to make this easy:

module ApplicationHelper

  def c(str="")
    if RAILS_ENV == "production"
      ""
    else
      str = yield if block_given?
      "<!-- #{str} -->"
    end
  end

end

Now, in our rails views, we can do the following:

<%=c "Starting Login Widget" %>

You can recognize this line as a comment in your .html.erb view file. Furthermore, the comment is rendered into the html document in development, test, and staging environments, but not in production. If a comment requires some computation, we pass in a block instead:

<%=c {"The value is #{perform_long_computation()} here."} %>

The computation is run and the comment is rendered in development, test, and staging, but the computation will not even execute in production.

Conclusions

No one doubts that ample and strategic commenting is essential in any software development project. But web developers need to be conscious that some of their code can be viewed by an end-user. I don’t know, this all seems obvious to me, but I’m still amazed how many times I’ve seen potentially sensitive comments plainly visible in html. You need to assume that your users are viewing-source, and manage your comments, especially sensitive comments, accordingly. A simple tool like the one above can be a great help.

Ruby threads and Kernel#require
Posted on Wed, Dec 12, 2007, at 03:08 PM (5 comments)
Tags: ruby

For people like me with a Java background, one of the first things we bemoan and disparage when learning Ruby is its thread support. Simply put, Ruby threads are green threads, meaning they are implemented in user space. Among other things, this means they cannot be scheduled onto multiple processors or cores because the operating system does not know about them.

There are other issues, such as the fact that often-used libraries such as ActiveRecord are not thread-safe. This article gives a brief but fairly good overview of the issues, relevant to the current MRI 1.8 implementation. InfoQ also published a very interesting article last spring on the pros and cons of different threading strategies related to Ruby.

I recently had to design a multithreaded library for Zoodango’s deployment system, and I ran into another gotcha that I’m sure is known by someone, but I haven’t seen documented yet. Basically, the Kernel#require method has some arguably incorrect behavior in the presence of threads. If two or more threads require the same file simultaneously, one of them may “finish” prematurely, likely eventually resulting in NameError exceptions being raised when that thread tries to access names that have not yet been defined.

Let’s consider an example. Suppose we have a Ruby file “foo.rb” that defines the class Foo. Now suppose we execute the following:

  t1 = Thread.new {
    require "foo"
    Foo.new
  }
  t2 = Thread.new {
    require "foo"
    Foo.new
  }
  t1.join; t2.join

The require method has the intended semantics that it will load and execute a ruby file, but will only do so once. Subsequent requires of the same file will skip the loading. But what happens when a second thread starts a require before the first thread finishes executing the file?

As far as I can (experimentally) determine, the check-and-set that controls the “already loaded” flag occurs right at the beginning of the require method. So the following is a potential run of the above code:

  1. Thread 1 checks “foo”, finds it hasn’t been loaded yet, and adds it to the loaded file list $".
  2. Thread 2 checks “foo”, and finds that it has been added to $" (by thread 1) so it doesn’t proceed to load it again. This is a good thing, of course. It means we never get into a state where two threads are loading and executing the same file simultaneously, which could do messy things to your namespace. However…
  3. Meanwhile, Thread 1 starts executing “foo”. It has to do a lot of things, beginning with loading the file, lexing, parsing… It hasn’t quite gotten around to defining the constant Foo yet…
  4. Thread 2, however, thinks that “foo” has already been loaded, so it tries to execute Foo.new. Boom! NameError gets raised.
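The interleaving in the steps above can be modeled in a few lines of plain Ruby. This is only a simulation under my own assumptions: the loaded array stands in for $", the definitions hash stands in for the constant table, and the two queues exist purely to force the ordering. It does not call or reproduce the real Kernel#require.

```ruby
require "thread"

loaded      = []         # stands in for the loaded file list $"
definitions = {}         # stands in for the table of defined constants
flag_set    = Queue.new  # signals that thread 1 has marked "foo" as loaded
observed    = Queue.new  # holds thread 1 back until thread 2 has looked

t1 = Thread.new do
  unless loaded.include?("foo")
    loaded << "foo"                 # step 1: mark as loaded first
    flag_set << true
    observed.pop                    # step 3: still "lexing and parsing"
    definitions[:Foo] = Class.new   # Foo only becomes defined here
  end
end

t2 = Thread.new do
  flag_set.pop
  skipped   = loaded.include?("foo")   # step 2: sees the flag, skips the load
  foo_ready = definitions.key?(:Foo)   # step 4: but Foo is not defined yet
  [skipped, foo_ready]
end

result = t2.value   # [true, false]: load skipped, constant still missing
observed << true
t1.join
```

In the real case, the second element being false is the point at which Foo.new would raise NameError.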

You could argue that the above example is contrived. But this takes place in real code too. I ran into it when using Jamis Buck’s excellent library Net::SSH. Net::SSH lazily loads parts of its implementation as they are needed, executing a bunch of require statements during the process of initializing a session. (For those who are familiar with the implementation, it is using Jamis’s dependency injection library Needle to load parts of its implementation and API. Needle runs requires dynamically in its Container#require method.) Now, I think it’s reasonable to create Net::SSH sessions in two or more threads simultaneously, if you want to open connections to multiple servers and manage each connection with a thread. However, if you try to do so, you’ll intermittently raise a NameError out of the guts of Net::SSH.

I’m still searching for a good general solution for this. (And I haven’t mucked with the ruby trunk so I don’t know if it’s still an issue in 1.9.) For now, what I’ve done is attempt to wrap code that I know could contain dynamic require invocations in mutex synchronize blocks. For Net::SSH, that just includes code that creates sessions. However, it’s more complex in Net::SFTP because one of the internal callbacks, Protocol::Driver#do_version, can also trigger Needle, and there doesn’t seem to be a way to intercept it without monkeypatching. Another possibility is to try to patch Kernel#require directly, adding an internal mutex, but I’m not yet sure how to do that without introducing the possibility of deadlocks.
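Here is a minimal sketch of the wrapping approach, with some assumptions of my own. GuardedRequire and guarded_require are hypothetical names, not part of Ruby, Net::SSH, or Needle. I use a Monitor rather than a plain Mutex because it is reentrant, so a guarded require that triggers another guarded require on the same thread does not block itself. It only protects callers that go through the helper, and I'm not claiming it rules out every deadlock.

```ruby
require "monitor"
require "tmpdir"

# Hypothetical wrapper; the module and method names are illustrative only.
module GuardedRequire
  LOCK = Monitor.new   # reentrant lock shared by all guarded requires

  # Only one thread at a time may be inside a guarded require, so a second
  # thread cannot return before the first has finished executing the file.
  def self.guarded_require(feature)
    LOCK.synchronize { require(feature) }
  end
end

# Usage: two threads load the same (deliberately slow) file and then use
# the class it defines.
objects = Dir.mktmpdir do |dir|
  path = File.join(dir, "guarded_foo.rb")
  File.write(path, "sleep 0.1\nclass GuardedFoo; end\n")

  threads = Array.new(2) do
    Thread.new do
      GuardedRequire.guarded_require(path)
      GuardedFoo.new
    end
  end
  threads.map(&:value)
end
```

The obvious cost is that all guarded requires are serialized, even for unrelated files.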

Update (27 June 2008): Net::SSH 2.0 and related libraries do in fact ditch the Needle dependency, so this issue no longer tends to show up when multiple threads access Net::SSH. However, the underlying issue with Kernel#require is still there.

The Rails Rdoc Template
Posted on Sat, Dec 08, 2007, at 10:26 PM (0 comments)
Tags: rails, ruby

It’s amazing what you find if you just bother to look.

I was updating Zoodango’s internal rdocs, and I got to thinking that the layout rdoc generates by default is not how I’d do it. I often have fairly long lists of files, classes, modules, and methods, and the three scrolling lists across the top are too small to scroll effectively. Just try scrolling through the class list on the Ruby core library rdocs and you’ll see what I mean.

Most people’s screens and browser windows are horizontal, so I couldn’t understand why the template would take up valuable vertical real estate in that way. The rdocs for Rails look much better—moving the scrolling lists to the left side. Not to mention that the formatting looks quite a bit nicer.

Lo and behold, I just discovered that it’s not hard to get the Rails-style view in your rdocs. The Rails rdoc template, it turns out, was written by Jamis Buck, the author of Capistrano, way back in early 2005. The template itself, along with instructions on how to use it, is in this blog posting from back then.

Jamis’s instructions say to install the jamis.rb file in your ruby installation. If you’re like me and don’t like mucking with your ruby installation, you can install it anywhere covered by your ruby require path, such as your $RAILS_ROOT/lib directory. It just needs to be in the path that rdoc expects; e.g. $RAILS_ROOT/lib/rdoc/generators/template/html/jamis.rb.
