<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Daniel Azuma</title>
	<atom:link href="http://www.daniel-azuma.com/blog/feed" rel="self" type="application/rss+xml" />
	<link>http://www.daniel-azuma.com/blog</link>
	<description>Theology and software development</description>
	<lastBuildDate>Mon, 06 Feb 2012 07:31:10 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Geo-Rails part 9: The PostGIS spatial_ref_sys Table and You</title>
		<link>http://www.daniel-azuma.com/blog/archives/239</link>
		<comments>http://www.daniel-azuma.com/blog/archives/239#comments</comments>
		<pubDate>Mon, 06 Feb 2012 07:31:10 +0000</pubDate>
		<dc:creator>Daniel Azuma</dc:creator>
				<category><![CDATA[GeoRails]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[PostGIS]]></category>

		<guid isPermaLink="false">http://www.daniel-azuma.com/blog/?p=239</guid>
		<description><![CDATA[When you create a spatial database using PostGIS, you may notice that PostGIS automatically installs a table called &#8220;spatial_ref_sys&#8221;. This is a standard table for spatial databases, as required by the Open Geospatial Consortium&#8217;s specification. It defines which SRIDs are &#8230; <a href="http://www.daniel-azuma.com/blog/archives/239">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>When you create a spatial database using PostGIS, you may notice that PostGIS automatically installs a table called &#8220;spatial_ref_sys&#8221;. This is a standard table for spatial databases, as required by the Open Geospatial Consortium&#8217;s specification. It defines which SRIDs are allowed in your geometries, and provides information about the corresponding coordinate systems.</p>
<p>In this article, we&#8217;ll take a brief look at the <code>spatial_ref_sys</code> table and how you can use it in your application. We&#8217;ll cover:</p>
<ul>
<li>What&#8217;s useful about the <code>spatial_ref_sys</code> table</li>
<li>Where the <code>spatial_ref_sys</code> data comes from, and how you can populate your own custom data</li>
<li>Accessing <code>spatial_ref_sys</code> data from Ruby using RGeo&#8217;s <code>SRSDatabase</code></li>
</ul>
<p>This is part 9 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</p>
<h2><span id="more-239"></span>So what&#8217;s in the <code>spatial_ref_sys</code> table anyway?</h2>
<p>The <code>spatial_ref_sys</code> table is defined by an OGC specification entitled <a title="Simple Features for SQL specification" href="http://www.opengeospatial.org/standards/sfs" target="_blank">Simple Feature Access Part 2: SQL Option</a>. This is a companion to the Simple Features <a title="Simple Features Access specification" href="http://www.opengeospatial.org/standards/sfa" target="_blank">specification</a> we covered in earlier articles. It takes the standard data types and operations and specifies how such data should appear in an SQL-based relational database. Many of the &#8220;<code>ST_*</code>&#8221; functions that we&#8217;ve used when interacting with PostGIS are defined in this specification.</p>
<p>We recall from <a title="Geo-Rails part 4" href="http://www.daniel-azuma.com/blog/archives/106" target="_blank">part 4</a> that when you&#8217;re working with spatial data, every coordinate references a coordinate system that defines what the coordinate means, whether it is latitude and longitude, or feet from your front door, or light-years from Alpha Centauri. Those coordinate systems are generally represented by a numeric ID reference known as the SRID. The <code>spatial_ref_sys</code> table is merely a list of known coordinate systems keyed by their SRID. According to the spec:</p>
<p style="padding-left: 30px;"><em>Every Geometry Column is associated with a Spatial Reference System. The Spatial Reference System identifies the coordinate system for all geometric objects stored in the column, and gives meaning to the numeric coordinate values for any geometric object stored in the column. &#8230; The Spatial Reference System Identifier (SRID) constitutes a unique integer key for a Spatial Reference System within a database.</em></p>
<p>How is this useful? Generally, <code>spatial_ref_sys</code> will provide an actual definition of the coordinate system for each SRID. This theoretically provides enough information to correctly interpret every piece of coordinate data in your database, and even&#8212;where possible&#8212;to convert the data into whatever coordinate system you need.</p>
<p>According to the spec, all implementations of <code>spatial_ref_sys</code> include these columns:</p>
<ul>
<li><strong>srid</strong>: The numeric SRID. This should be the table&#8217;s primary key.</li>
<li><strong>auth_name</strong>: An authority name as a string. This is set if this coordinate system is specified by an outside authority such as EPSG.</li>
<li><strong>auth_srid</strong>: The numeric ID of the coordinate system in the above authority&#8217;s catalog.</li>
<li><strong>srtext</strong>: The Well-Known-Text (WKT) representation of the coordinate system (as we described in part 4).</li>
</ul>
<p>If you are using PostGIS, you&#8217;ll notice <code>spatial_ref_sys</code> has one more non-standard but very useful column:</p>
<ul>
<li><strong>proj4text</strong>: The Proj4 representation of the coordinate system.</li>
</ul>
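<p>To make the column list concrete, here is a minimal sketch in plain Ruby that models a row as an in-memory structure. The names <code>SpatialRefSysRow</code>, <code>SAMPLE_ROWS</code>, and <code>find_row</code> are hypothetical and exist only for illustration, and the <code>srtext</code> value is an abbreviated placeholder rather than a complete WKT definition. A real application would read these rows from the database.</p>

```ruby
# Hypothetical in-memory stand-in for a few spatial_ref_sys rows.
# The srtext value is an abbreviated placeholder, not complete WKT.
SpatialRefSysRow = Struct.new(:srid, :auth_name, :auth_srid, :srtext, :proj4text)

SAMPLE_ROWS = {
  4326 => SpatialRefSysRow.new(
    4326, "EPSG", 4326,
    'GEOGCS["WGS 84", ...]',
    "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
  )
}

# Look up a row by its primary key (the SRID); returns nil if absent.
def find_row(rows, srid)
  rows[srid]
end

row = find_row(SAMPLE_ROWS, 4326)
puts "#{row.auth_name}:#{row.auth_srid} -> #{row.proj4text}"
```

<p>The point of the sketch is only that the SRID is the key, and that the authority columns and the two textual definitions travel together in one row.</p>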
<h2>Where does the data come from and what does it do?</h2>
<p>In many cases, a spatial database will prepopulate <code>spatial_ref_sys</code> for you with a standard set of EPSG data. If you are using PostGIS, this is handled by the <code>spatial_ref_sys.sql</code> script, which gets run when you create a spatial database. If you are using Rails and follow the steps in <a title="Geo-Rails part 2" href="http://www.daniel-azuma.com/blog/archives/69" target="_blank">part 2</a>, the activerecord-postgis-adapter will do this for you automatically when you create your database.</p>
<p>The EPSG spatial reference database is such a universal standard that in most cases it is probably best to use one of its coordinate systems and its corresponding SRID. This will maximize the chances that your SRIDs will match those used by any external data sources you will interact with&#8212;meaning your data will be easily portable. However, you may run into a case where you need to define your own coordinate system, perhaps because you&#8217;re using an unusual map projection. In such cases, you can add rows to the <code>spatial_ref_sys</code> table. You can also delete SRID rows that you don&#8217;t need. It is, after all, merely a table in your database.</p>
<p>The main caveat is that most spatial databases (including PostGIS) will establish a foreign key constraint between SRIDs and this table. This means that your data can&#8217;t just choose any SRID it wants. The SRID has to <em>exist</em>&#8212;and by &#8220;exist&#8221;, we simply mean it has to be present in the <code>spatial_ref_sys</code> table. So you just need to make sure that you don&#8217;t delete any SRIDs that you need, and you add any that aren&#8217;t provided by default.</p>
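<p>The effect of that constraint can be imitated in a few lines of plain Ruby. This is only a sketch of the idea, not how PostGIS enforces it: <code>KNOWN_SRIDS</code> and <code>check_srid!</code> are hypothetical names, and the set stands in for whatever rows your table actually contains.</p>

```ruby
require "set"

# Hypothetical set of SRIDs currently present in spatial_ref_sys.
KNOWN_SRIDS = Set.new([4326, 3785])

# Mimics the effect of the constraint: refuse an SRID that has no row.
def check_srid!(srid, known = KNOWN_SRIDS)
  unless known.include?(srid)
    raise ArgumentError, "SRID #{srid} is not present in spatial_ref_sys"
  end
  srid
end

puts check_srid!(4326)

begin
  check_srid!(999_999)
rescue ArgumentError => e
  puts e.message
end
```

<p>Adding a custom coordinate system corresponds to adding its SRID to the set before any data refers to it; deleting a row that data still refers to is the case to avoid.</p>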
<p>Some databases will provide additional tools that leverage the <code>spatial_ref_sys</code> information. PostGIS, for example, provides the SQL function <code>ST_Transform()</code>, which lets you transform a geometry from one coordinate system to another. When you call it, you provide the SRID of the desired coordinate system. PostGIS then looks up both the original and the target SRIDs in <code>spatial_ref_sys</code>, and uses the coordinate system information there to figure out how to compute the transformation.</p>
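<p>As a rough illustration, the following plain Ruby helper builds a query that reprojects a single point. The helper name <code>transform_point_sql</code> is hypothetical; the SQL functions it emits (<code>ST_MakePoint</code>, <code>ST_SetSRID</code>, <code>ST_Transform</code>, and <code>ST_AsText</code>) are PostGIS functions. The sketch only constructs the string; running it requires a PostGIS database in which both SRIDs are present in <code>spatial_ref_sys</code>.</p>

```ruby
# Builds a query that reprojects a point from one SRID to another.
# Integer() and Float() reject anything that is not a number, so the
# interpolated values cannot carry arbitrary SQL.
def transform_point_sql(x, y, from_srid, to_srid)
  "SELECT ST_AsText(ST_Transform(" \
    "ST_SetSRID(ST_MakePoint(#{Float(x)}, #{Float(y)}), #{Integer(from_srid)}), " \
    "#{Integer(to_srid)}))"
end

puts transform_point_sql(-122.33, 47.61, 4326, 3785)
```

<p>The resulting string could be passed to any method that runs raw SQL against the database.</p>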
<h2>Accessing the <code>spatial_ref_sys</code> from Ruby</h2>
<p>The <a title="RGeo gem" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">RGeo</a> library provides a convenient way to look up coordinate systems from the <code>spatial_ref_sys</code> table. If you&#8217;re already using PostGIS and activerecord-postgis-adapter, it&#8217;s quite easy:</p>
<pre>srs_database = RGeo::CoordSys::SRSDatabase::ActiveRecordTable.new
entry = srs_database.get(4326)
entry.identifier  # =&gt; 4326
entry.name        # =&gt; "WGS 84"
entry.proj4.to_s  # =&gt; " +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"</pre>
<p>You can now, for example, create a factory using the SRID (from <code>entry.identifier</code>) and the Proj4 object (from <code>entry.proj4</code>). As we saw in <a title="Geo-Rails part 4" href="http://www.daniel-azuma.com/blog/archives/106" target="_blank">part 4</a>, you generally need to provide Proj4 information if you are going to be converting data between coordinate systems.</p>
<p>As a convenient shorthand, some factories let you pass in an <code>srs_database</code> object when you construct a factory. The factory constructor will then go through the process of looking up the coordinate system definition and extracting the Proj4 specification. For example:</p>
<pre>srs_database = RGeo::CoordSys::SRSDatabase::ActiveRecordTable.new
my_factory = RGeo::Geos.factory(:srs_database =&gt; srs_database, :srid =&gt; 3785)
my_factory.proj4.to_s  # =&gt; " +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +units=m +k=1.0 +nadgrids=@null +no_defs"</pre>
<h2>Other SRSDatabase sources</h2>
<p>The <code>spatial_ref_sys</code> table is not the only source of coordinate system information. The Proj4 library itself installs a bunch of data files that contain most of the EPSG codes and other information. There are also online services that you can use to look up coordinate systems; one particularly important one is <a title="spatialreference.org web site" href="http://spatialreference.org" target="_blank">spatialreference.org</a>.</p>
<p>The SRSDatabase mechanism in RGeo supports a common interface for coordinate system databases. In the section above, we looked up EPSG 4326 from the <code>spatial_ref_sys</code> table using the <code>ActiveRecordTable</code> class. Alternatively, we can use the <code>SrOrg</code> class to look up information from spatialreference.org:</p>
<pre>srs_database = RGeo::CoordSys::SRSDatabase::SrOrg.new('EPSG')
entry = srs_database.get(4326)
# ...</pre>
<p>You can choose an appropriate source of coordinate system information based on the requirements of your application. In many cases, however, I recommend using the <code>spatial_ref_sys</code> table because it is convenient and readily available. See the RDocs for RGeo for more information on connecting to different SRSDatabase sources.</p>
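<p>The value of a common interface is that calling code does not need to know which source is behind it. The sketch below shows that idea in plain Ruby. <code>SketchEntry</code>, <code>InMemorySRSDatabase</code>, and <code>describe_srid</code> are hypothetical names and are not part of RGeo; the only thing borrowed from the examples above is the shape of the call, a <code>get</code> method that takes an SRID. Returning nil for an unknown SRID is a choice made for this sketch.</p>

```ruby
# Hypothetical entry and database classes that mimic the shape of the
# interface used in this article; they are not RGeo classes.
SketchEntry = Struct.new(:identifier, :name, :proj4)

class InMemorySRSDatabase
  def initialize(entries)
    @entries = entries
  end

  # Same calling convention as the examples above: pass an SRID and
  # receive an entry. In this sketch an unknown SRID yields nil.
  def get(srid)
    @entries[srid]
  end
end

# Code written against get() does not care which source is behind it.
def describe_srid(srs_database, srid)
  entry = srs_database.get(srid)
  entry ? "#{entry.identifier}: #{entry.name}" : "unknown SRID #{srid}"
end

db = InMemorySRSDatabase.new({
  4326 => SketchEntry.new(4326, "WGS 84", "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
})
puts describe_srid(db, 4326)
puts describe_srid(db, 1234)
```

<p>Any object that responds to <code>get</code> in the same way could be handed to <code>describe_srid</code>, which is what makes it practical to swap sources based on the needs of the application.</p>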
<h2>Where to go from here</h2>
<p>In this article, we covered <code>spatial_ref_sys</code>, one of the standard tables that is included in most spatial databases. For more information on this table and how it is implemented in the <a title="PostGIS spatial database" href="http://www.postgis.org/" target="_blank">PostGIS</a> database, see the <a title="PostGIS documentation" href="http://www.postgis.org/documentation/" target="_blank">PostGIS documentation</a> online. The official specification of the table is available in the OGC <a title="OGC Simple Features for SQL spec" href="http://www.opengeospatial.org/standards/sfs" target="_blank">Simple Features for SQL</a> spec.</p>
<p>RGeo provides basic convenience tools for accessing coordinate system information from the <code>spatial_ref_sys</code> table. We covered a few basic examples in this article. For detailed information, see the RGeo::CoordSys::SRSDatabase module in the <a title="RGeo rdoc reference" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">RGeo documentation</a>.</p>
<p>Observant readers may notice that PostGIS includes one more automatic table, called <code>geometry_columns</code>. I may cover this table in a later article that digs deeper into PostGIS, but if you&#8217;re interested now, it&#8217;s described in the PostGIS documentation.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-azuma.com/blog/archives/239/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Setting the Database Free with ActiveRecord&#8217;s Connection API</title>
		<link>http://www.daniel-azuma.com/blog/archives/216</link>
		<comments>http://www.daniel-azuma.com/blog/archives/216#comments</comments>
		<pubDate>Thu, 26 Jan 2012 01:26:13 +0000</pubDate>
		<dc:creator>Daniel Azuma</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[ActiveRecord]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Rails]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://www.daniel-azuma.com/blog/?p=216</guid>
		<description><![CDATA[TL;DR: ActiveRecord is more than just an ORM. It also provides a convenient common interface for writing direct SQL queries, for those times when you need to access your database&#8217;s advanced features. This article provides an introduction to the low-level &#8230; <a href="http://www.daniel-azuma.com/blog/archives/216">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><strong>TL;DR</strong>: <em>ActiveRecord is more than just an ORM. It also provides a convenient common interface for writing direct SQL queries, for those times when you need to access your database&#8217;s advanced features. This article provides an introduction to the low-level ActiveRecord Connection API, which you can use to bypass ActiveRecord &#8220;models&#8221; and work directly with your database.</em></p>
<h2><span id="more-216"></span>My story</h2>
<p>When I first started working with Rails back in 2006, I followed the common practice of abstracting all my data and business logic into ActiveRecord &#8220;models.&#8221; This worked out for a short time, but soon I realized that it didn&#8217;t scale all that well to complex applications. There were a number of reasons, some of which are now finally being discussed and addressed in the community.</p>
<p>In any case, what followed was the Big Refactor of my Rails development process and architecture. On the one hand, I rediscovered Ruby&#8217;s object-oriented roots, separating behavior from persistence, and building higher-level business objects focused on actually modeling my system. And on the other hand, I rediscovered the database, writing more and more custom SQL to leverage its capabilities.</p>
<p>I still use ActiveRecord where it suits the task at hand. In many cases, it makes sense to treat rows of a table as data objects principally identified by a numeric ID, and in those cases ActiveRecord still shines. However, in other cases I&#8217;m finding the ORM more of a hindrance than a help. Many of my model implementations have begun bypassing ActiveRecord-based persistence altogether, instead creating plain old tables and querying with custom SQL.</p>
<p>The good news is that ActiveRecord provides some amount of support for low-level queries. You can share ActiveRecord&#8217;s database connection, and you don&#8217;t need to know the API of the underlying database driver, making it convenient to add custom SQL incrementally to your Rails application. In this article, I&#8217;ll provide an overview of ActiveRecord&#8217;s connection API, a convenient low-level interface for writing raw SQL and interacting with your database on its own terms.</p>
<h2>Obtaining a Low-Level Connection</h2>
<p>Hidden underneath your ActiveRecord class is a useful low-level object called the <em>ActiveRecord connection adapter</em>. It wraps and abstracts away the underlying database-specific driver, and provides a common interface for database tasks such as creating and destroying databases, creating and modifying tables, inserting, updating, and deleting data, running queries, and managing transactions. Normally, the connection adapter is used internally by ActiveRecord, but you can access it yourself if you want to talk to the database directly without using ActiveRecord &#8220;models.&#8221;</p>
<p>To obtain a connection adapter object, simply call the <code>connection</code> method on your ActiveRecord class or any ActiveRecord object:</p>
<pre>connection = User.connection
obj = User.find(1)
connection = obj.connection</pre>
<p>Which ActiveRecord class should you use? In most Rails applications, you talk to just one database, as defined in your database.yml file. For such an application, every ActiveRecord class, including <code>ActiveRecord::Base</code>, will give you the same connection. So in those cases, it doesn&#8217;t matter which class you use. However, if your Rails application connects to a secondary database for some ActiveRecord classes (using the <code>establish_connection</code> method) then those classes will yield a connection object pointing at the secondary database. In those cases, you will need to pay attention to which database you need to talk to, and ask for a connection from the right class.</p>
<h2>Managing connections and the connection pool</h2>
<p>Each connection adapter object represents a single connection to a database. Rails generally opens several connections at once and manages them in a connection pool. When a task needs a database connection, it checks one out of the pool; when it finishes, it checks the connection back in so that the next task can use it. Connections can run only one SQL statement at a time, so generally one connection is opened per thread.</p>
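<p>The check-out and check-in cycle can be sketched with nothing more than a thread-safe queue. The <code>TinyPool</code> class below is hypothetical and its &#8220;connections&#8221; are plain strings; it is meant to show the pattern, not how ActiveRecord implements its pool.</p>

```ruby
require "thread"

# A toy pool: a thread-safe queue holding a fixed number of "connections".
# TinyPool and its string connections are hypothetical stand-ins.
class TinyPool
  def initialize(size)
    @available = Queue.new
    size.times { |i| @available.push("conn-#{i}") }
  end

  # Check a connection out, yield it, and always check it back in.
  def with_connection
    conn = @available.pop # blocks if every connection is in use
    yield conn
  ensure
    @available.push(conn) if conn
  end

  def available_count
    @available.size
  end
end

pool = TinyPool.new(2)
pool.with_connection do |conn|
  puts "using #{conn}, #{pool.available_count} left in the pool"
end
puts "after the block: #{pool.available_count} available"
```

<p>The <code>ensure</code> clause is the important part: the connection goes back to the pool even if the block raises, so a failing task cannot permanently drain the pool.</p>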
<p>In most Rails applications, this takes place transparently. When you&#8217;re processing an HTTP request, ActiveRecord automatically checks out a connection and memoizes it for the duration of the request. Whether you use the standard ActiveRecord APIs, or obtain a raw connection by calling the <code>connection</code> method described above, you get the request&#8217;s memoized connection. At the end of the request, ActiveRecord&#8217;s Rack middleware automatically checks the connection back in.</p>
<p>However, if you access ActiveRecord outside the context of handling a request via your controller, then you need to do connection management yourself. ActiveRecord will still automatically check out and memoize a connection when you ask for one. However, if you are not using a normal Rails controller, or are otherwise not running inside ActiveRecord&#8217;s <code>ConnectionManagement</code> middleware, you should check in the connection manually when you are done. You can do so using the <code>clear_active_connections!</code> method on any of your ActiveRecord classes:</p>
<pre>User.clear_active_connections!</pre>
<p>Now that we know how to get a connection, let&#8217;s see what we can do with one.</p>
<h2>Running Low-Level Queries</h2>
<p>Most calls in the standard ActiveRecord API return ActiveRecord &#8220;model&#8221; objects. However, there might be cases when you want to bypass the overhead of creating these full ActiveRecord objects, or maybe you want to query data that doesn&#8217;t have a corresponding ActiveRecord class. The connection adapter&#8217;s low-level query methods let you write your own SQL, and return &#8220;plain old data&#8221; as raw result tables.</p>
<p>In this first example, we get the &#8220;name&#8221; value from a single row in our &#8220;users&#8221; table. If we only need the name, and don&#8217;t need the full ActiveRecord object with the rest of its data and capabilities, we can grab the connection object and use the <code>select_value</code> method:</p>
<pre>connection = User.connection
name = connection.select_value("SELECT name FROM users WHERE id=1")
# =&gt; "Dave"</pre>
<p>Use <code>select_value</code> when you need a single value from a single row. The result will be either a string, or nil if no rows match. Note that the returned value is a string even if the column has a different type such as a number or timestamp. You will need to perform your own conversion:</p>
<pre>time_str = connection.select_value(
  "SELECT created_at FROM users WHERE id=1")
# =&gt; "2011-12-24 11:35:02"
time = Time.parse("#{time_str} UTC")</pre>
<p>(This often gets me when I ask for a numeric value from the database. The <code>select_value</code> method returns it as a string, so be sure to call <code>to_i</code> on it if you want to do any math.)</p>
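<p>Here is a small, self-contained sketch of those conversions that needs no database. The helper names <code>to_integer</code> and <code>to_utc_time</code> are hypothetical. The sketch uses <code>Integer()</code> instead of <code>to_i</code> because it raises on malformed input instead of quietly returning a number, which may or may not be what you want.</p>

```ruby
require "time"

# Strict integer conversion: raises ArgumentError on input such as "42abc",
# whereas "42abc".to_i would quietly return 42.
def to_integer(str)
  Integer(str, 10)
end

# The timestamp string carries no zone, so state the zone explicitly.
# This sketch assumes the database stores timestamps in UTC.
def to_utc_time(str)
  Time.parse("#{str} UTC")
end

puts to_integer("42") + 1
puts to_utc_time("2011-12-24 11:35:02").year
```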
<p>If you want a single column from multiple rows, use <code>select_values</code>, which returns an array of strings.</p>
<pre>names = connection.select_values(
  "SELECT name FROM users WHERE created_at&gt;'2012-01-01 00:00:00'")
# =&gt; ['Bob', 'Kathy']</pre>
<p>To obtain multiple columns, you can use <code>select_rows</code>. This method returns an array of arrays, each representing a row in the order of the selected columns. For example:</p>
<pre>rows = connection.select_rows(
  "SELECT id,name FROM users WHERE created_at&gt;'2012-01-01 00:00:00'")
# =&gt; [['2', 'Bob'], ['4', 'Kathy']]</pre>
<p>This returns an array of two-element arrays. Remember that all values are returned as strings. In the example above, the &#8220;id&#8221; column is numeric, so you may need to call <code>to_i</code> if you want to treat the IDs as integers.</p>
<p>There is no <code>select_row</code> method. To select a single row, just use <code>select_rows</code> and take the first element.</p>
<p>Alternatively, you can return rows as hashes (of column name to value) using <code>select_one</code> (for a single row) or <code>select_all</code> (for multiple rows).</p>
<pre>records = connection.select_all(
  "SELECT id,name FROM users WHERE created_at&gt;'2012-01-01 00:00:00'")
# =&gt; [{'id'=&gt;'2', 'name'=&gt;'Bob'}, {'id'=&gt;'4', 'name'=&gt;'Kathy'}]</pre>
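<p>Since every value comes back as a string, it can be handy to convert a whole result set in one pass. The following sketch works on a hard-coded array shaped like the result above, so it runs without a database. <code>cast_records</code> is a hypothetical helper, not part of ActiveRecord.</p>

```ruby
# Rows shaped like the select_all result above: every value is a string.
raw_records = [
  { "id" => "2", "name" => "Bob" },
  { "id" => "4", "name" => "Kathy" }
]

# Hypothetical helper: apply a converter to each named column and leave
# columns without a converter untouched.
def cast_records(records, converters)
  records.map do |record|
    record.each_with_object({}) do |(column, value), typed|
      converter = converters[column]
      typed[column] = converter ? converter.call(value) : value
    end
  end
end

typed = cast_records(raw_records, { "id" => ->(v) { Integer(v, 10) } })
puts typed.first["id"] + 1
puts typed.last["name"]
```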
<h2>Low-Level Data Updates</h2>
<p>Updating data is accomplished using similar calls. Insert rows into the database using the <code>insert</code> method:</p>
<pre>connection.insert(
  "INSERT INTO users (name, created_at) VALUES ('Sally', 'now')")</pre>
<p>Update rows in the database using the <code>update</code> method. This method returns the number of rows affected by the update.</p>
<pre>row_count = connection.update(
  "UPDATE users SET name='Robert' WHERE id=2")
# =&gt; 1</pre>
<p>You can delete rows using <code>delete</code>, which again returns the number of rows deleted.</p>
<pre>row_count = connection.delete(
  "DELETE FROM users WHERE created_at&gt;'2012-01-01 00:00:00'")
# =&gt; 3</pre>
<p>Normally, however, you won&#8217;t be hard-coding values when you insert and update. You&#8217;ll be injecting data obtained from the user or some other source. Therefore, to avoid SQL injection attacks and other issues, you need to quote these values when you construct your SQL statement. Use the <code>quote</code> method for this purpose. It takes a Ruby object and represents it, properly quoted, in the syntax expected by the database. Here are some examples from the PostgreSQL connection adapter. (Connection objects for other databases will yield slightly different results depending on the database.)</p>
<pre>connection.quote("Hello") # =&gt; "'Hello'"
connection.quote("Joe's") # =&gt; "'Joe''s'"
connection.quote(2) # =&gt; "2"
connection.quote(true) # =&gt; "'t'"
connection.quote(Time.now) # =&gt; "'2012-01-23 07:46:41.540033'"</pre>
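<p>To see why quoting defeats injection, it helps to look at what happens to a string that contains a single quote. The function below is a deliberately simplified sketch of string quoting only; <code>sketch_quote_string</code> is a hypothetical name and this is not the adapter&#8217;s implementation. In real code, call <code>connection.quote</code>, which also handles other types and database-specific rules.</p>

```ruby
# Simplified illustration of string quoting. This is NOT the adapter's
# implementation; real code should call connection.quote.
def sketch_quote_string(value)
  # Double every embedded single quote, then wrap the whole value in quotes.
  "'" + value.to_s.gsub("'", "''") + "'"
end

puts sketch_quote_string("Joe's")
puts sketch_quote_string("x'; DROP TABLE users; --")
```

<p>In the second call, the embedded quote is doubled, so the database reads the entire input as one string literal instead of as the end of a literal followed by a new statement.</p>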
<p>Use <code>quote</code> when constructing SQL statements that update the database:</p>
<pre>new_name = "Robert"
row_id = 2
row_count = connection.update(
  "UPDATE users SET name=#{connection.quote(new_name)}"+
  " WHERE id=#{connection.quote(row_id)}")</pre>
<h2>Using Prepared Statements</h2>
<p>Prepared statements are a common database optimization technique. The idea is that often many of the queries run by your application will have a common form. For example, you might find yourself running hundreds of queries of the form &#8220;<code>SELECT name FROM users WHERE id=</code><em>[something]</em>&#8221;. Instead of reparsing the entire statement and rerunning the query planner on every query, you can often instruct the database to parse and plan only once, and re-use that information on subsequent queries. The way to do that is with a prepared statement.</p>
<p>As of Rails 3.1, ActiveRecord includes support for prepared statements. The standard ActiveRecord API will utilize prepared statements transparently. If you are using the low-level connection API, you can set up prepared statements manually.</p>
<p>For running queries, the only one of the methods we&#8217;ve covered that supports prepared statements is <code>select_all</code>. Instead of the full statement, you must send a statement template and a list of values to substitute in. The list of values should be an array of two-element arrays; the second element of each array is the value. The first element is generally set to the ActiveRecord column type governing the type of the value, but you can set it to nil if you&#8217;re confident of the type you&#8217;re sending in. The values must be sent in as the third argument. (The second is a &#8220;name&#8221; for the query, used for annotating the logs. You can set it to nil.) Here&#8217;s the earlier <code>select_all</code> example, rewritten using a prepared statement.</p>
<pre>records = connection.select_all(
  "SELECT id,name FROM users WHERE created_at&gt;$1",
  nil, [[nil, ::Time.utc(2012,1,1)]])
# =&gt; [{'id'=&gt;'2', 'name'=&gt;'Bob'}, {'id'=&gt;'4', 'name'=&gt;'Kathy'}]</pre>
<p>Now, if you have additional requests of the same form (using the same template), they should run faster because the prepared statement (with its prepared SQL and predetermined query plan) is being reused.</p>
<pre>records2 = connection.select_all(
  "SELECT id,name FROM users WHERE created_at&gt;$1",
  nil, [[nil, ::Time.utc(2012,1,2)]])
records3 = connection.select_all(
  "SELECT id,name FROM users WHERE created_at&gt;$1",
  nil, [[nil, ::Time.utc(2012,1,3)]])</pre>
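<p>The reuse described above depends on the template text being identical from call to call. A toy cache keyed by the template string makes that visible. <code>SketchStatementCache</code> is hypothetical, and &#8220;preparing&#8221; is simulated with a counter; how a real adapter caches statements is not shown here.</p>

```ruby
# Toy statement cache keyed by the SQL template. "Preparing" is simulated
# by a counter; the class and its methods are hypothetical.
class SketchStatementCache
  attr_reader :prepare_count

  def initialize
    @statements = {}
    @prepare_count = 0
  end

  # Returns a handle for the template, preparing it only on first use.
  def handle_for(sql_template)
    @statements.fetch(sql_template) do
      @prepare_count += 1
      @statements[sql_template] = "stmt_#{@prepare_count}"
    end
  end
end

cache = SketchStatementCache.new
template = "SELECT id,name FROM users WHERE created_at>$1"
3.times { cache.handle_for(template) }
puts cache.prepare_count
```

<p>Three lookups with the same template result in a single simulated prepare. If the values were interpolated into the SQL instead of passed separately, each distinct string would become its own cache entry and nothing would be reused.</p>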
<p>The <code>insert</code>, <code>update</code>, and <code>delete</code> methods provide similar access to prepared statements. For the <code>update</code> and <code>delete</code> methods, use the same arguments as <code>select_all</code>: the template first, followed by a query name (which can be nil) and then an array of values to inject into the statement:</p>
<pre>row_count = connection.update(
  "UPDATE users SET name=$1 WHERE id=$2",
  nil, [[nil, "Robert"], [nil, 2]])</pre>
<p>The <code>insert</code> method is a little more complex because it takes a series of three more arguments before the values list. (Those extra arguments are for supporting databases that make you manage primary key sequences manually. In most cases, you can set those to nil.) So the values array should be passed as the <em>sixth</em> argument to <code>insert</code>. It&#8217;s a little messy, but this is the way the API is set up as of Rails 3.1.</p>
<pre>connection.insert(
  "INSERT INTO users (name, created_at) VALUES ($1, $2)",
  nil, nil, nil, nil, [[nil, "Sally"], [nil, "now"]])</pre>
<p>Prepared statement support requires ActiveRecord 3.1 or later. On older versions, you will need to construct the entire SQL statement using <code>quote</code> to inject values.</p>
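<p>Writing the values list by hand gets noisy when the column slot is always nil. A one-line helper can build it from plain values. <code>binds_for</code> is a hypothetical convenience, not part of ActiveRecord, and it assumes you are comfortable leaving the column slot nil as in the examples above.</p>

```ruby
# Hypothetical helper: wrap plain values as [column, value] pairs with a
# nil column, the shape used for the values list in this article.
def binds_for(*values)
  values.map { |value| [nil, value] }
end

p binds_for("Robert", 2)
```

<p>With this helper, the values list from the <code>update</code> example above could be written as <code>binds_for("Robert", 2)</code>.</p>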
<h2>Low-Level Migrations and Other Features</h2>
<p>Migrations are a very common case for dropping to low-level SQL statements, since there may be many cases when you&#8217;ll want more control over your schema than ActiveRecord gives you.</p>
<p>When you write an ActiveRecord migration, you actually are already using the connection adapter API. The <code>create_table</code> and similar methods are connection adapter methods; the migration simply delegates them to the adapter. This means you can use all the methods we&#8217;ve been discussing, such as <code>insert</code> and <code>update</code>, directly in your migration if you want to inject data during that process. (With the caveat, of course, that many consider it bad practice to alter data during a migration.)</p>
<p>To perform schema changes using raw SQL, you should generally use the <code>execute</code> method. This method returns the underlying database driver&#8217;s result object; it doesn&#8217;t try to do any postprocessing on the result, so it may be faster than the calls we&#8217;ve been looking at so far. It&#8217;s useful for cases when you want the <em>very</em> low-level driver-specific result, or (as in most migrations) for cases when you don&#8217;t care about the result. Here&#8217;s an example using PostgreSQL&#8217;s flavor of SQL:</p>
<pre>class MyMigration &lt; ActiveRecord::Migration
  def up
    execute &lt;&lt;-SQL
      CREATE TABLE users (
        id SERIAL PRIMARY KEY,
        name CHARACTER VARYING NOT NULL,
        created_at TIMESTAMP NOT NULL)
    SQL
  end
  def down
    execute 'DROP TABLE users'
  end
end</pre>
<p>Nowadays when I write migrations, I use <code>execute</code> for almost everything because I generally want to fine-tune the database schema. It&#8217;s not strictly necessary in the above very simple case, but in more complex applications, you may want to set up constraints, triggers, and other data management features in your database, and you&#8217;ll need it in those cases.</p>
<p>Of course, if you use <code>execute</code>, you will have to write out both the forward and reverse migrations. You can&#8217;t use Rails 3.1&#8217;s &#8220;change&#8221; feature, which attempts to auto-create the backward migration from the forward migration, because Rails isn&#8217;t <em>quite</em> smart enough to figure out how to reverse a change made by arbitrary SQL. But you may find it an acceptable trade-off, as I have.</p>
<p>Another common use for the connection object is to delimit transactions. If you have a set of statements that should be wrapped in a transaction, use the <code>transaction</code> method:</p>
<pre>connection.transaction do
  connection.update("UPDATE users SET name='Robert' WHERE id=2")
  connection.update("UPDATE users SET name='Catherine' WHERE id=4")
end</pre>
<p>Because the connection adapter is shared with ActiveRecord, you can also include high-level ActiveRecord calls in your transaction block, interspersed with your low-level calls. For example, you could write the above code like this:</p>
<pre>connection.transaction do
  connection.update("UPDATE users SET name='Robert' WHERE id=2")
  obj = User.find(4)
  obj.name = "Catherine"
  obj.save
end</pre>
<p>I prefer the first version, however. For starters, it&#8217;s more performant: it doesn&#8217;t require an extra select to get the data, and it doesn&#8217;t require building the ActiveRecord object simply to update a single field. But in my opinion, it&#8217;s also cleaner and more succinct. I know some don&#8217;t like seeing SQL in their web app, but in many cases it&#8217;s a very expressive and readable language for tasks like this. I believe it is simply a case of the right tool for the right job.</p>
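<p>Whichever style you choose, the reason to wrap related statements in a transaction is that they succeed or fail together. The toy model below imitates that all-or-nothing behavior in memory: it takes a snapshot, runs the block, and restores the snapshot if anything raises. <code>SketchStore</code> is hypothetical and has nothing to do with how a database actually implements transactions.</p>

```ruby
# Toy model of all-or-nothing behavior: keep a snapshot and restore it
# if the block raises. Hypothetical and in-memory only.
class SketchStore
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end

  def transaction
    snapshot = Marshal.load(Marshal.dump(@rows)) # deep copy for rollback
    yield self
  rescue StandardError
    @rows = snapshot
    raise
  end

  # Raises KeyError when the id does not exist.
  def update_name(id, name)
    @rows.fetch(id)["name"] = name
  end
end

store = SketchStore.new({ 2 => { "name" => "Bob" }, 4 => { "name" => "Kathy" } })
begin
  store.transaction do |s|
    s.update_name(2, "Robert")
    s.update_name(99, "Nobody") # fails, so the first update is undone too
  end
rescue KeyError
  puts "rolled back"
end
puts store.rows[2]["name"]
```

<p>After the failed block, row 2 still holds its original name, which is the guarantee the <code>transaction</code> method gives you for the real updates.</p>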
<p>You can also obtain various information about the capabilities of the database and the database driver. Here are just a few examples:</p>
<pre>connection.supports_savepoints?       # Database supports savepoints?
connection.supports_statement_cache?  # Supports prepared statements?
connection.table_name_length          # Maximum table name length
connection.columns_per_table          # Maximum columns per table</pre>
<p>Finally, the connection adapter provides access to the underlying driver-specific connection for when you need to access <em>really</em> advanced or database-specific features. Here&#8217;s an example (assuming you&#8217;re using the postgresql database adapter).</p>
<pre>raw_connection = connection.raw_connection  # Returns a PGconn object
pg_server_version = raw_connection.server_version.to_s</pre>
<h2>For More Information</h2>
<p>This has been an overview of the tools that ActiveRecord provides for bypassing the ActiveRecord &#8220;models&#8221; and accessing the advanced features of your database directly. Most of us won&#8217;t write entire Rails applications using <em>only</em> this low-level API. However, it is a good tool to have in the toolbox. For those cases when the ActiveRecord ORM doesn&#8217;t quite do what you need, or does so clumsily or inefficiently, ActiveRecord lets you drop down to SQL quite easily.</p>
<p>The good news is that all these calls are well-documented on <a title="Rails API" href="http://api.rubyonrails.org/" target="_blank">http://api.rubyonrails.org/</a>:</p>
<ul>
<li>The <code>ActiveRecord::ConnectionAdapters::DatabaseStatements</code> module describes most of the low-level methods for querying and updating data that we covered in this article.</li>
<li>The <code>ActiveRecord::ConnectionAdapters::Quoting</code> module describes various methods for quoting data for injection into SQL.</li>
<li>The <code>ActiveRecord::ConnectionAdapters::SchemaStatements</code> module describes the schema-manipulation methods you are probably familiar with using for migrations.</li>
<li>The <code>ActiveRecord::ConnectionAdapters::TableDefinition</code> class includes the methods you can call inside <code>create_table</code>.</li>
<li>The <code>ActiveRecord::ConnectionAdapters::DatabaseLimits</code> module describes a miscellaneous set of informational methods you can call.</li>
</ul>
<h2>About the Author</h2>
<p>Daniel Azuma is a Ruby developer specializing in geospatial technologies, computational geometry, graphics, and related fields. He is the author of <a title="RGeo gem" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">rgeo</a> and related gems for <a title="Geo-Rails series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">geospatial analysis</a> in Ruby and Rails applications. He currently works as Chief Software Architect at <a title="Pirq" href="http://www.pirq.com/" target="_blank">Pirq</a>.</p>
<p><strong>Edits (Fri 27 Jan)&#8212;</strong> I made some corrections to the prepared statement examples so they actually work. Note to self: test examples before publishing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-azuma.com/blog/archives/216/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Geo-Rails part 8: ZCTA Lookup, A Worked Example</title>
		<link>http://www.daniel-azuma.com/blog/archives/191</link>
		<comments>http://www.daniel-azuma.com/blog/archives/191#comments</comments>
		<pubDate>Mon, 16 Jan 2012 08:02:35 +0000</pubDate>
		<dc:creator>Daniel Azuma</dc:creator>
				<category><![CDATA[GeoRails]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[PostGIS]]></category>
		<category><![CDATA[RGeo]]></category>
		<category><![CDATA[segmentation]]></category>
		<category><![CDATA[Shapefile]]></category>
		<category><![CDATA[ZCTA]]></category>

		<guid isPermaLink="false">http://www.daniel-azuma.com/blog/?p=191</guid>
		<description><![CDATA[This week we&#8217;ll put together what we&#8217;ve covered so far in this series by implementing a simple but usable service: looking up the Zip Code Tabulation Area (ZCTA) for a location. This is an actual task I had to do &#8230; <a href="http://www.daniel-azuma.com/blog/archives/191">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>This week we&#8217;ll put together what we&#8217;ve covered so far in this series by implementing a simple but usable service: looking up the Zip Code Tabulation Area (ZCTA) for a location. This is an actual task I had to do for my job at <a title="Pirq" href="http://www.pirq.com/" target="_blank">Pirq</a>, and while I will pare it down for this article, we&#8217;ll go through some of the actual trade-offs and optimization decisions I made in our implementation.</p>
<p>In this article, we will cover:</p>
<ul>
<li>The goals for the service, and what is a ZCTA anyway</li>
<li>Obtaining ZCTA data from the U.S. Census</li>
<li>Developing our own ZCTA database</li>
<li>Querying the database</li>
<li>Improving performance using polygon segmentation</li>
</ul>
<p>This is part 8 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</p>
<h2><span id="more-191"></span>ZCT-what?</h2>
<p>At <a title="Pirq" href="http://www.pirq.com/" target="_blank">Pirq</a>, we do a lot of geospatial analysis. One task we have to perform fairly frequently is to group data into neighborhoods for demographic and spatial analysis.</p>
<p>Now, mapping coordinates to neighborhoods is a more challenging task than you might suspect. There is no &#8220;official&#8221; database, and boundaries on the ground are often imprecise and subject to rapidly evolving local information. One way that you could go about it is to define neighborhoods as zip codes or clusters of zip codes, but this carries its own challenges because (perhaps counter-intuitively) zip code &#8220;boundaries&#8221; are not well-defined either. Zip codes are defined by postal routes, and don&#8217;t necessarily correspond to or respect any other boundary information, including even city and state boundaries. (For further discussion, see the US Census position on zip codes at <a title="Census position on zip code boundaries" href="http://www.census.gov/geo/www/tiger/tigermap.html#ZIP" target="_blank">http://www.census.gov/geo/www/tiger/tigermap.html#ZIP</a>.)</p>
<p>There are two different ways of tackling the problem of mapping neighborhoods or zip codes. One is to use a (paid) service to do the local heavy lifting&#8212;curating and cleaning the data, and tracking and applying local knowledge&#8212;for you. I generally recommend <a title="Maponics" href="http://www.maponics.com/" target="_blank">Maponics</a> for this task, but there are a variety of services available. Alternatively, if you&#8217;re content with approximate data, and/or you have more direct access to some amount of local knowledge, you can build your own database using Zip Code Tabulation Areas (ZCTA).</p>
<p>ZCTA (pronounced &#8220;zik-tuh&#8221;) is a system created by the US Census to address the zip code problem. The idea is this. Sometimes you need to look up the actual zip code for a location for reasons related to postal delivery. In such cases you will need to work directly with the US Postal Service or a third-party curator like Maponics. But other times you don&#8217;t need the <em>actual</em> zip code; you just want something like a zip code as a convenient unit of delineation for geo-analysis, for example to approximate neighborhood boundaries. In this latter case, it doesn&#8217;t matter that the zip code itself isn&#8217;t always 100% accurate. Rather, what&#8217;s important is that the boundaries are stable and make geographic and demographic sense. ZCTA is designed for this latter case.</p>
<p>Each ZCTA is a collection of US Census blocks for which, at the time of the census, the addresses fell largely if not completely within a particular zip code. Because ZCTAs are made up of Census blocks, they are well-defined, stable, and statistically useful. And usually, they approximate the actual zip code boundaries fairly well.</p>
<p>For this article, we&#8217;ll build a simple ZCTA lookup tool: one that, given a location (latitude and longitude), returns the ZCTA that contains it.</p>
<h2>Setting up our database to hold ZCTA data</h2>
<p>I&#8217;ll assume we&#8217;ve set up a Rails project and a <a title="PostGIS spatial database" href="http://www.postgis.org/" target="_blank">PostGIS</a> database as covered in <a title="Geo-Rails part 2" href="http://www.daniel-azuma.com/blog/archives/69" target="_blank">part 2</a>. Let&#8217;s start by creating a model for the ZCTA data. For now we&#8217;ll just create a simple table capable of mapping geometry to ZCTA (zip code). You can of course modify this to add more fields.</p>
<pre>% rails generate model Zcta zcta:integer region:polygon</pre>
<p>This creates the following migration:</p>
<pre>class CreateZctas &lt; ActiveRecord::Migration
  def change
    create_table :zctas do |t|
      t.integer :zcta
      t.polygon :region
      t.timestamps
    end
  end
end</pre>
<p>Before we migrate, we&#8217;ll do a few things here. First, we don&#8217;t need those timestamps so we&#8217;ll get rid of them. Second, we will want to do spatial queries against the <code>:region</code> column, so we&#8217;ll create a spatial index on that column, as we covered in <a title="Geo-Rails part 6" href="http://www.daniel-azuma.com/blog/archives/134" target="_blank">part 6</a>.</p>
<p>Third, we&#8217;ll choose a coordinate system for the <code>:region</code> column. For this service, I&#8217;ll use the EPSG 3785 projection. This coordinate system is often useful because of its affinity with mapping software, its local conformality, and its ability to keep political boundaries straight. It&#8217;s also good to choose a flat projection rather than a geographic coordinate system because we&#8217;re going to do some geometric manipulation. You can read more discussion on choosing a coordinate system for your database in <a title="Geo-Rails part 7" href="http://www.daniel-azuma.com/blog/archives/164" target="_blank">part 7</a>.</p>
<p>Our migration now looks like this:</p>
<pre>class CreateZctas &lt; ActiveRecord::Migration
  def change
    create_table :zctas do |t|
      t.integer :zcta
      t.polygon :region, :srid =&gt; 3785
    end
    change_table :zctas do |t|
      t.index :region, :spatial =&gt; true
    end
  end
end</pre>
<p>Now we can run the migration:</p>
<pre>% rake db:migrate</pre>
<p>As we discussed in <a title="Geo-Rails part 7" href="http://www.daniel-azuma.com/blog/archives/164" target="_blank">part 7</a>, we also set up the ActiveRecord class to use the simple_mercator_factory.</p>
<pre>class Zcta &lt; ActiveRecord::Base
  FACTORY = RGeo::Geographic.simple_mercator_factory
  set_rgeo_factory_for_column(:region, FACTORY.projection_factory)
end</pre>
<h2>Obtaining ZCTA data and populating our database</h2>
<p>A nice feature of ZCTA is that it is public data freely downloadable from the US government. A good starting point for exploring the current (2010) Census ZCTA data is <a title="ZCTA page at the US Census" href="http://www.census.gov/geo/ZCTA/zcta.html" target="_blank">http://www.census.gov/geo/ZCTA/zcta.html</a>. If you want to go straight to the downloads, head over <a title="ZCTA download" href="http://www.census.gov/cgi-bin/geo/shapefiles2010/main" target="_blank">here</a> and choose &#8220;Zip Code Tabulation Areas&#8221; from the menu. You can download shapefiles for individual states, or the entire database as one huge shapefile. (Warning: the combined shapefile download is half a gigabyte compressed.)</p>
<p>For this example, we&#8217;ll download just the state of Washington, but it should be trivial to modify the code to deal with the entire database. When you download the data for Washington, you&#8217;ll end up with a zip file &#8220;tl_2010_53_zcta510.zip&#8221;. Unzipping this file yields a set of five files:</p>
<ul>
<li>tl_2010_53_zcta510.dbf</li>
<li>tl_2010_53_zcta510.prj</li>
<li>tl_2010_53_zcta510.shp</li>
<li>tl_2010_53_zcta510.shp.xml</li>
<li>tl_2010_53_zcta510.shx</li>
</ul>
<p>The .shp extension clues us in that this is a shapefile, one of the formats we covered in <a title="Geo-Rails part 5" href="http://www.daniel-azuma.com/blog/archives/125" target="_blank">part 5</a>. Shapefiles are a very common format for public data sets. They&#8217;re great for downloading data, but not for running spatial searches. So our next task is to import the shapefile into our database.</p>
<p>The shapefile specifies that its geometric information is in the &#8220;NAD83&#8221; geographic coordinate system (EPSG 4269). This is a geographic (latitude-longitude) coordinate system optimized for the United States. It does have very slight differences from the WGS84-based coordinate system (EPSG 4326) that we usually use, but for our purposes, the differences are negligible, so we&#8217;ll treat these as standard WGS84 geographic coordinates.</p>
<p>Now, our database uses the EPSG 3785 projection, so we&#8217;ll need to convert the polygons into the projection. We covered how to use the simple_mercator_factory to perform these projections in <a title="Geo-Rails part 7" href="http://www.daniel-azuma.com/blog/archives/164" target="_blank">part 7</a>. As we saw in <a title="Geo-Rails part 4" href="http://www.daniel-azuma.com/blog/archives/106" target="_blank">part 4</a>, converting lines and polygons between coordinate systems can change their shape if individual sides are long enough. For our purpose, ZCTA areas are small enough that we&#8217;ll ignore the effect.</p>
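<p>For a sense of what that projection does numerically, here is a minimal pure-Ruby sketch of the spherical Mercator formulas that EPSG 3785 is based on. This is not RGeo&#8217;s implementation, and the names <code>EARTH_RADIUS</code> and <code>to_mercator</code> are hypothetical, chosen only for illustration. In the real application we let the simple_mercator_factory do this work.</p>

```ruby
# Spherical Mercator forward projection (the math behind EPSG 3785).
# EARTH_RADIUS and to_mercator are illustrative names, not RGeo API.
EARTH_RADIUS = 6378137.0  # sphere radius in meters

# Convert longitude/latitude in degrees to projected x/y in meters.
def to_mercator(lon, lat)
  x = EARTH_RADIUS * lon * Math::PI / 180.0
  y = EARTH_RADIUS *
    Math.log(Math.tan(Math::PI / 4.0 + lat * Math::PI / 360.0))
  [x, y]
end

x, y = to_mercator(-122.33, 47.61)  # roughly downtown Seattle
puts "x=#{x.round(1)} y=#{y.round(1)}"
```

<p>Note that the projected values are in meters, not degrees, which is why they must never be mixed with raw latitude and longitude in the same geometry.</p>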
<p>One more issue is that the geometries we&#8217;ll read from the shapefile are actually MultiPolygons rather than Polygons. This is because some ZCTAs (such as those that cover islands) may include multiple disjoint areas. Since we defined our database column to store polygons, we&#8217;ll have to break up each MultiPolygon into its constituent parts. This means we may have some ZCTA numbers that are represented by multiple records in the database.</p>
<p>For the ZCTA number, we&#8217;ll take a quick peek at the Census-provided documentation at <a title="TIGER documentation" href="http://www.census.gov/geo/www/tiger/tgrshp2010/documentation.html" target="_blank">http://www.census.gov/geo/www/tiger/tgrshp2010/documentation.html</a>. Here it states that the ZCTA number can be found in the &#8220;<code>ZCTA5CE10</code>&#8221; property field of each shapefile record.</p>
<p>Now we have enough information to write a script to read the shapefile and populate our database!</p>
<pre>require 'rgeo-shapefile'
RGeo::Shapefile::Reader.open('tl_2010_53_zcta510.shp',
    :factory =&gt; Zcta::FACTORY) do |file|
  file.each do |record|
    zcta = record['ZCTA5CE10'].to_i
    # The record geometry is a MultiPolygon. Iterate
    # over its parts.
    record.geometry.projection.each do |poly|
      Zcta.create(:zcta =&gt; zcta, :region =&gt; poly)
    end
  end
end</pre>
<p>Let that run for a minute or two, and now we have a fully populated database of ZCTA data for the state of Washington.</p>
<h2>Running ZCTA queries</h2>
<p>Now let&#8217;s write a simple API for querying the ZCTA for a given location.</p>
<p>First we&#8217;ll write some scopes in the ActiveRecord class to construct the queries. To find the ZCTA that contains a particular point location (latitude and longitude), we can use the <code>ST_Intersects</code> function. We just need to make sure we convert the location to the projected coordinate system, as we covered at the end of <a title="Geo-Rails part 7" href="http://www.daniel-azuma.com/blog/archives/164" target="_blank">part 7</a>.</p>
<p>For best performance, we&#8217;ll write our queries to speak PostGIS&#8217;s native language, which is EWKB (see <a title="Geo-Rails part 5" href="http://www.daniel-azuma.com/blog/archives/125" target="_blank">part 5</a>).</p>
<pre>class Zcta &lt; ActiveRecord::Base

  # ...

  EWKB = RGeo::WKRep::WKBGenerator.new(:type_format =&gt; :ewkb,
    :emit_ewkb_srid =&gt; true, :hex_format =&gt; true)

  def self.containing_latlon(lat, lon)
    ewkb = EWKB.generate(FACTORY.point(lon, lat).projection)
    where("ST_Intersects(region, ST_GeomFromEWKB(E'\\\\x#{ewkb}'))")
  end

end</pre>
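<p>To see roughly what kind of string ends up interpolated into that query, here is a small sketch that builds the hex EWKB for a two-dimensional point by hand, using only <code>Array#pack</code>. The helper <code>ewkb_point_hex</code> and the two constants are hypothetical names for illustration; in the application itself, the <code>WKBGenerator</code> above does this for us and handles every geometry type, not just points.</p>

```ruby
# Build the hex EWKB for a 2D point by hand (little-endian layout).
# ewkb_point_hex is a hypothetical helper, not part of RGeo.
EWKB_SRID_FLAG = 0x20000000  # flag bit meaning "an SRID follows the type"
WKB_POINT_TYPE = 1

def ewkb_point_hex(x, y, srid)
  [
    1,                                # byte order: 1 = little-endian
    WKB_POINT_TYPE | EWKB_SRID_FLAG,  # geometry type with SRID flag set
    srid,                             # SRID as a 32-bit unsigned integer
    x, y                              # coordinates as 64-bit doubles
  ].pack('CVVEE').unpack('H*').first
end

puts ewkb_point_hex(1.0, 2.0, 3785)
# prints 0101000020c90e0000000000000000f03f0000000000000040
```

<p>Reading the output left to right: one byte for the byte order, four for the type (with the SRID flag), four for the SRID (3785 is <code>0x0EC9</code>), and eight each for x and y.</p>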
<p>We could also extend this to support queries by any arbitrary geometry, letting us find the ZCTAs that cover a line or polygon:</p>
<pre>class Zcta &lt; ActiveRecord::Base

  # ...

  def self.containing_geom(geom)
    ewkb = EWKB.generate(FACTORY.project(geom))
    where("ST_Intersects(region, ST_GeomFromEWKB(E'\\\\x#{ewkb}'))")
  end

end</pre>
<p>Now it&#8217;s pretty straightforward to write a web service API wrapper for this function. Here&#8217;s one way it could be done:</p>
<pre>class ZctaController

  def lookup
    lat = params[:lat].to_f
    lon = params[:lon].to_f
    zcta = Zcta.containing_latlon(lat, lon).first
    render(:json =&gt; {:lat =&gt; lat, :lon =&gt; lon,
      :zcta =&gt; zcta ? zcta.zcta : nil})
  end

end</pre>
<h2>Segmenting polygons for improved performance</h2>
<p>Now that we have a basic implementation, let&#8217;s see if we can improve performance a bit. In <a title="Geo-Rails part 6" href="http://www.daniel-azuma.com/blog/archives/134" target="_blank">part 6</a>, we saw that large, complex geometries, such as polygons with many sides, can result in slow queries. The ZCTA data, it turns out, does have some fairly large polygons with side counts in the tens of thousands. Since all we&#8217;re interested in is looking up ZCTA by location, we may be able to improve performance using the segmentation technique.</p>
<p>Segmentation involves breaking up large polygons into smaller polygons with fewer sides. It trades off a smaller polygon size for a larger number of rows in the database. However, the spatial index can help mitigate queries against a large number of rows, so such a trade-off may be a performance win in some situations. (Of course, we should measure so we know for certain&#8212;we&#8217;ll do that below.)</p>
<p>We&#8217;ll segment using four-to-one subdivision as described in <a title="Geo-Rails part 6" href="http://www.daniel-azuma.com/blog/archives/134" target="_blank">part 6</a>. For each polygon, we&#8217;ll count its sides, and if the count is larger than some threshold, we&#8217;ll divide it in half horizontally and vertically. An easy way to accomplish this division is to take the polygon&#8217;s bounding box and divide it four ways into smaller rectangles. Then, take the intersections of the original polygon with those sub-rectangles. These functions are available in the Simple Features interfaces, and are implemented by RGeo, as we covered in <a title="Geo-Rails part 3" href="http://www.daniel-azuma.com/blog/archives/88" target="_blank">part 3</a>.</p>
<p>Note that it is possible for a subdivision to result in multiple disjoint polygons in each quadrant (that is, a MultiPolygon). So we have to handle that case in the code.</p>
<p>We&#8217;ll also perform one more optimization: if the polygon is long and thin, we&#8217;ll divide it in two rather than in four, in order to make the pieces closer to square.</p>
<p>Ready? Here&#8217;s our implementation:</p>
<pre>MAX_SIZE = 500
MAX_DEPTH = 12

require 'rgeo-shapefile'

# Handle a geometry of any type
def handle_geometry(depth, geom, zcta)
  case geom
  when ::RGeo::Feature::Polygon
    handle_polygon(depth, geom, zcta)
  when ::RGeo::Feature::MultiPolygon
    geom.each{ |polygon| handle_polygon(depth, polygon, zcta) }
  end
end

# Handle a polygon
def handle_polygon(depth, polygon, zcta)
  # Check the number of sides. We'll combine the number of sides for
  # the "outer edge" and any "holes" that the polygon might have.
  # A polygon boundary consists of a LineString that is closed, so
  # the first and last points are the same. Therefore, to count the
  # sides, count the number of vertices and subtract 1.
  sides = polygon.exterior_ring.num_points - 1
  polygon.interior_rings.each{ |ring| sides += ring.num_points - 1 }
  if depth &gt;= MAX_DEPTH || sides &lt;= MAX_SIZE
    # The polygon is small enough, or we recursed as far as we're
    # willing. Just add the polygon.
    Zcta.create(:zcta =&gt; zcta, :region =&gt; polygon)
  else
    # Split the polygon 4-to-1 and recurse
    depth = depth + 1
    # Find the bounding box for the polygon
    envelope = polygon.envelope.exterior_ring
    p1 = envelope.point_n(0)
    p2 = envelope.point_n(2)
    min_x = p1.x
    max_x = p2.x
    min_x, max_x = max_x, min_x if min_x &gt; max_x
    min_y = p1.y
    max_y = p2.y
    min_y, max_y = max_y, min_y if min_y &gt; max_y
    # dx and dy are the size of the bounding box.
    # cx and cy are the center point.
    dx = max_x - min_x
    dy = max_y - min_y
    cx = (min_x + max_x) * 0.5
    cy = (min_y + max_y) * 0.5
    # Check the aspect ratio of the bounding box. If it's very wide
    # or very tall, then only split in half. Otherwise, split in 4.
    if dy &gt; dx * 2
      # The bounding box is tall, so split in half
      handle_quadrant(depth, polygon, min_x, min_y, max_x, cy, zcta)
      handle_quadrant(depth, polygon, min_x, cy, max_x, max_y, zcta)
    elsif dx &gt; dy * 2
      # The bounding box is wide, so split in half
      handle_quadrant(depth, polygon, min_x, min_y, cx, max_y, zcta)
      handle_quadrant(depth, polygon, cx, min_y, max_x, max_y, zcta)
    else
      # The bounding box is close to square so split in four
      handle_quadrant(depth, polygon, min_x, min_y, cx, cy, zcta)
      handle_quadrant(depth, polygon, cx, min_y, max_x, cy, zcta)
      handle_quadrant(depth, polygon, min_x, cy, cx, max_y, zcta)
      handle_quadrant(depth, polygon, cx, cy, max_x, max_y, zcta)
    end
  end
end

# Take a polygon and a box. Run the algorithm on the part of the
# polygon that falls within the box.
def handle_quadrant(depth, polygon, min_x, min_y, max_x, max_y, zcta)
  # We do this by creating a rectangle for the box, and computing
  # the intersection with the input polygon. The result could be a
  # polygon, a MultiPolygon, or an empty geometry.
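  # The polygon is in projected coordinates, so build the box with
  # the projection factory rather than the geographic factory.
  factory = Zcta::FACTORY.projection_factory
  box = factory.polygon(factory.linear_ring([
    factory.point(min_x, min_y),
    factory.point(min_x, max_y),
    factory.point(max_x, max_y),
    factory.point(max_x, min_y)]))
  handle_geometry(depth, polygon.intersection(box), zcta)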
end

# The main shapefile reader.
RGeo::Shapefile::Reader.open('tl_2010_53_zcta510.shp',
    :factory =&gt; Zcta::FACTORY) do |file|
  file.each do |record|
    # For each MultiPolygon, analyze it and add to the database
    handle_geometry(0, record.geometry.projection,
      record['ZCTA5CE10'].to_i)
  end
end</pre>
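<p>The split rule is easier to check in isolation, away from RGeo and the database. Here is a minimal sketch of just the bounding-box arithmetic, using the same thresholds as the implementation above. The helper <code>split_box</code> is a hypothetical name, and each box is represented as a plain array of <code>[min_x, min_y, max_x, max_y]</code>.</p>

```ruby
# Return the sub-boxes for one subdivision step. Each box is an array
# [min_x, min_y, max_x, max_y]. split_box is a hypothetical helper.
def split_box(min_x, min_y, max_x, max_y)
  dx = max_x - min_x
  dy = max_y - min_y
  cx = (min_x + max_x) * 0.5
  cy = (min_y + max_y) * 0.5
  if dy > dx * 2
    # Tall box: cut across the middle into bottom and top halves
    [[min_x, min_y, max_x, cy], [min_x, cy, max_x, max_y]]
  elsif dx > dy * 2
    # Wide box: cut down the middle into left and right halves
    [[min_x, min_y, cx, max_y], [cx, min_y, max_x, max_y]]
  else
    # Close to square: cut into four quadrants
    [[min_x, min_y, cx, cy], [cx, min_y, max_x, cy],
     [min_x, cy, cx, max_y], [cx, cy, max_x, max_y]]
  end
end

p split_box(0, 0, 10, 30)  # tall box, two pieces
p split_box(0, 0, 10, 10)  # square box, four pieces
```

<p>A box that is more than twice as tall as it is wide is only cut in two, so the pieces move closer to square with each level of recursion.</p>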
<p>Now whenever we consider an optimization, we have to measure its effect. Does it actually work? And if it does, what value of MAX_SIZE should we use?</p>
<p>To find out, I ran the segmentation on the Washington state ZCTA data with different values of MAX_SIZE, and then ran a simple benchmark on each segmentation. The benchmark consisted of 50,000 queries randomly distributed across the state. I timed the results on my laptop (an early 2011 MacBook Pro running OS X 10.6.8, Ruby 1.9.2, PostgreSQL 9.0.6, and PostGIS 1.5.3).</p>
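<p>A benchmark of this kind could be sketched along the following lines. This is not the exact script behind the numbers in this article. The bounding box values for Washington are rough assumptions, the helper names <code>random_locations</code> and <code>time_lookups</code> are hypothetical, and sampling uniformly from a bounding box will include some points that fall outside the state.</p>

```ruby
require 'benchmark'

# Approximate lat/lon bounding box for Washington state (assumed values).
WA_BOUNDS = { :min_lat => 45.5, :max_lat => 49.0,
              :min_lon => -124.8, :max_lon => -116.9 }

# Generate n random [lat, lon] pairs inside the bounding box.
# A fixed seed keeps runs comparable across different MAX_SIZE values.
def random_locations(n, bounds, rng = Random.new(42))
  Array.new(n) do
    lat = bounds[:min_lat] + rng.rand * (bounds[:max_lat] - bounds[:min_lat])
    lon = bounds[:min_lon] + rng.rand * (bounds[:max_lon] - bounds[:min_lon])
    [lat, lon]
  end
end

# Time a block that performs one lookup per location; returns seconds.
def time_lookups(locations)
  Benchmark.realtime do
    locations.each { |lat, lon| yield(lat, lon) }
  end
end

locations = random_locations(1000, WA_BOUNDS)
# In the Rails app, the block body would be something like:
#   Zcta.containing_latlon(lat, lon).first
elapsed = time_lookups(locations) { |lat, lon| [lat, lon] }
puts "1000 lookups took #{elapsed} seconds"
```

<p>Generating the locations before starting the timer keeps the cost of random number generation out of the measurement.</p>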
<p>This first graph shows the total number of polygons (database rows) created by the segmentation process, plotted against the MAX_SIZE parameter. The original database had 622 polygons, with a maximum of 9893 sides. As our threshold on the number of sides approaches the low hundreds and smaller, the number of polygons (and hence the number of rows in the database) gets very large.</p>
<p><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/polygons_my_maxsize.png"><img class="aligncenter size-full wp-image-195" title="polygons_my_maxsize" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/polygons_my_maxsize.png" alt="" width="486" height="324" /></a></p>
<p>This second graph shows the total time taken by 50,000 queries against the segmented database, plotted against the MAX_SIZE parameter. The benchmark against the original database took 18.15 seconds, or about 0.36 milliseconds per query. As we can see, decreasing the size of each polygon (by running more subdivisions) improves query performance only up to a point; beyond it, the cost of the larger number of rows becomes significant.</p>
<p><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/time_by_maxsize.png"><img class="aligncenter size-full wp-image-196" title="time_by_maxsize" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/time_by_maxsize.png" alt="" width="465" height="321" /></a></p>
<p>The sweet spot seems to be around the 300-500 side range. At that point, our optimization has cut average query times roughly in half. When I segmented the entire US database of ZCTAs for <a title="Pirq" href="http://www.pirq.com/" target="_blank">Pirq</a>, we set MAX_SIZE to 500.</p>
<p>A more difficult question is, is the benefit we measured worth the extra complexity introduced by segmenting? That will depend. At Pirq, we sometimes run these queries in an inner loop, so every optimization matters. We also loaded a few much larger polygons, for which the segmentation procedure had a much more dramatic effect. So for our application, we determined that it was worth it. However, as with all benchmarking, it&#8217;s important to do your own measurements for your application.</p>
<h2>Where to go from here</h2>
<p>This article concludes my original outline for this series on geospatial development with Rails. But not to worry&#8212;it&#8217;s not the end. There&#8217;s still plenty of material to cover and plenty of discussion to be had. I just don&#8217;t have the next few parts already planned out yet. So this is your chance to direct where this series goes from here. If you have a topic you&#8217;d like covered, leave me a comment!</p>
<p>That said, there are a couple of major topics that I haven&#8217;t yet covered.</p>
<ul>
<li>I&#8217;ve covered <em>vector</em> data (i.e. points, lines, and polygons) but not <em>raster</em> data (i.e. image overlays). This is for several reasons. First, raster support in PostGIS is relatively new and not yet that mature. (In fact, unless you are using prereleases of PostGIS 2.0, you have to install another third-party library for raster support.) Second, the Ruby tools, notably RGeo, don&#8217;t yet support raster data either. And third, you may have noticed that a major theme of these first eight articles has been understanding coordinate systems and projections. This is critical background knowledge for handling raster data, so I thought it was important to cover it first.</li>
<li>I haven&#8217;t covered much on visualization tools. This is mainly because my own work has been focused on the back-end, so I don&#8217;t yet have a lot to contribute on the view side.</li>
</ul>
<p>I will write something on those topics in the future, but I&#8217;m not sure when I&#8217;ll get to the point where I have enough useful material. In the meantime, the floor is open for other topics!</p>
<p>For now I&#8217;ll conclude with links to resources on the tools that we&#8217;ve been working with during these articles.</p>
<p><a title="PostGIS" href="http://www.postgis.org/" target="_blank">PostGIS</a> is the open source geospatial database of choice. It is an add-on library to the venerable <a title="PostgreSQL" href="http://www.postgresql.org/" target="_blank">PostgreSQL</a> open source database. For more information on PostGIS, see the <a title="PostGIS documentation" href="http://www.postgis.org/documentation/" target="_blank">documentation</a> online, and sign up for the <a title="postgis-users mailing list" href="http://postgis.refractions.net/mailman/listinfo/postgis-users" target="_blank">postgis-users</a> mailing list.</p>
<p><a title="RGeo" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">RGeo</a> provides the core geospatial vector data types for Ruby. It is installed as the gem &#8220;<a title="rgeo gem" href="http://rubygems.org/gems/rgeo" target="_blank">rgeo</a>&#8221;. For more information on RGeo, see the <a title="RGeo documentation" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">documentation</a> online, report <a title="RGeo issues" href="http://github.com/dazuma/rgeo/issues" target="_blank">issues</a> and contribute to the <a title="RGeo source" href="http://github.com/dazuma/rgeo" target="_blank">source</a> at GitHub, and sign up for the <a title="rgeo-users mailing list" href="http://groups.google.com/group/rgeo-users" target="_blank">rgeo-users</a> mailing list.</p>
<p>We have also covered several other Ruby gems that are used for more specialized tasks. These include:</p>
<ul>
<li><a title="rgeo-shapefile gem" href="http://rubygems.org/gems/rgeo-shapefile" target="_blank">rgeo-shapefile</a> (reading ESRI shapefiles). <a title="rgeo-shapefile documentation" href="http://virtuoso.rubyforge.org/rgeo-shapefile/README_rdoc.html" target="_blank">documentation</a> / <a title="rgeo-shapefile issues" href="http://github.com/dazuma/rgeo-shapefile/issues" target="_blank">issues</a> / <a title="rgeo-shapefile on Github" href="http://github.com/dazuma/rgeo-shapefile" target="_blank">source</a></li>
<li><a title="rgeo-geojson gem" href="http://rubygems.org/gems/rgeo-geojson" target="_blank">rgeo-geojson</a> (reading and writing GeoJSON). <a title="rgeo-geojson documentation" href="http://virtuoso.rubyforge.org/rgeo-geojson/README_rdoc.html" target="_blank">documentation</a> / <a title="rgeo-geojson issues" href="http://github.com/dazuma/rgeo-geojson/issues" target="_blank">issues</a> / <a title="rgeo-geojson on Github" href="http://github.com/dazuma/rgeo-geojson" target="_blank">source</a></li>
<li><a title="activerecord-postgis-adapter gem" href="http://rubygems.org/gems/activerecord-postgis-adapter" target="_blank">activerecord-postgis-adapter</a> (ActiveRecord adapter for PostGIS). <a title="activerecord-postgis-adapter documentation" href="http://virtuoso.rubyforge.org/activerecord-postgis-adapter/README_rdoc.html" target="_blank">documentation</a> / <a title="activerecord-postgis-adapter issues" href="http://github.com/dazuma/activerecord-postgis-adapter/issues" target="_blank">issues</a> / <a title="activerecord-postgis-adapter on Github" href="http://github.com/dazuma/activerecord-postgis-adapter" target="_blank">source</a></li>
</ul>
<p>There are, of course, many other Ruby libraries for other related tasks such as geocoding. Some of these will likely be the subject of future articles.</p>
<p>Finally, I started a mailing list for general geospatial Rails discussion, the &#8220;<a title="GeoRails google group" href="http://groups.google.com/group/georails" target="_blank">georails</a>&#8221; Google group. Sign up if you&#8217;re interested in more community discussion.</p>
<p><em>This is part 8 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-azuma.com/blog/archives/191/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Geo-Rails part 7: Geometry vs. Geography, or, How I Learned To Stop Worrying And Love Projections</title>
		<link>http://www.daniel-azuma.com/blog/archives/164</link>
		<comments>http://www.daniel-azuma.com/blog/archives/164#comments</comments>
		<pubDate>Mon, 09 Jan 2012 08:04:32 +0000</pubDate>
		<dc:creator>Daniel Azuma</dc:creator>
				<category><![CDATA[GeoRails]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Maps]]></category>
		<category><![CDATA[PostGIS]]></category>
		<category><![CDATA[Projections]]></category>
		<category><![CDATA[Rails]]></category>
		<category><![CDATA[RGeo]]></category>

		<guid isPermaLink="false">http://www.daniel-azuma.com/blog/?p=164</guid>
		<description><![CDATA[This week we&#8217;re going to look at how to choose a coordinate system for your database. In PostGIS, this includes the choice of geometry vs geography columns, as well as which projection (if any) to use, and how to interact &#8230; <a href="http://www.daniel-azuma.com/blog/archives/164">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>This week we&#8217;re going to look at how to choose a coordinate system for your database. In PostGIS, this includes the choice of geometry vs geography columns, as well as which projection (if any) to use, and how to interact with it from Rails.</p>
<p>In this article, we&#8217;ll:</p>
<ul>
<li>Review geographic and projected coordinate systems</li>
<li>Discuss the pros and cons of using the PostGIS geographic type</li>
<li>See why I typically store data in a projection</li>
<li>Look at some specific projections I recommend using (or avoiding)</li>
<li>Learn how to handle projected data in Rails</li>
</ul>
<p>My original series plan for this week called for a worked example of a location-based web service, bringing together much of the material that we&#8217;ve covered so far. But as I was writing it, I realized there was one more topic we probably ought to cover first. So I&#8217;ll publish the example next week.</p>
<p>This is part 7 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</p>
<h2><span id="more-164"></span>A tale of two coordinate systems</h2>
<p>In <a title="Geo-Rails part 4" href="http://www.daniel-azuma.com/blog/archives/106" target="_blank">part 4</a>, we took a first look at coordinate systems. We saw that coordinate systems are different ways of assigning <em>meaning</em> to coordinate values. Or, put another way, any particular meaning (such as a location) can be described in multiple ways. Each of those ways would use a different set of values, according to a different coordinate system.</p>
<p>Locations on the earth&#8217;s surface are typically specified using one of two general types of coordinate systems: <em>geographic</em> coordinate systems and <em>projected</em> coordinate systems. Geographic coordinate systems usually use some notion of latitude and longitude, measuring angles along the surface of the earth. They are also embedded in a curved domain. What this means is, you can&#8217;t technically show latitude and longitude on a flat piece of paper or computer screen. Objects described in latitude and longitude are always curved like the surface of the earth; distances measured between latitudes and longitudes are always measured along a curved surface.</p>
<p>Projected coordinate systems are formed by &#8220;flattening&#8221; the earth&#8217;s surface into a flat domain. Coordinates in a projected system are not in latitude and longitude. They do not measure angles. Instead, they measure distance and position along that flattened surface. Because of this, the actual coordinate values in a projection may not be immediately recognizable. However, the benefit is that objects in a projected coordinate system are flat, so you can draw them on a flat piece of paper or computer screen, and you can perform analysis and calculations the way you are used to from your high school geometry class.</p>
<p>Here are two sets of coordinates for the Space Needle in Seattle. The first uses a geographic coordinate system, and the values are the familiar longitude and latitude. The second, called &#8220;NAD83 / Washington North&#8221;, is the <em>state plane</em> projected coordinate system for northern Washington state. The coordinates in this projection may not be immediately recognizable, but they point to the same location.</p>
<pre>POINT(-122.34978 47.620578)  -- geographic
POINT(1266457.58 230052.50)  -- projected</pre>
<p>In the beginning of <a title="Geo-Rails part 4" href="http://www.daniel-azuma.com/blog/archives/106" target="_blank">part 4</a>, we looked at some of the ramifications of using different coordinate systems. They can drastically change the way that objects are shaped or computations are done. Now we&#8217;ll look at some practical advice regarding choosing coordinate systems to use.</p>
<h2>The PostGIS geographic type</h2>
<p>The <a title="PostGIS geospatial database" href="http://www.postgis.org/" target="_blank">PostGIS</a> database provides two different types of spatial columns: geometric and geographic. We saw in <a title="Geo-Rails part 2" href="http://www.daniel-azuma.com/blog/archives/69" target="_blank">part 2</a> that we can specify which type to use in our Rails migrations, through the use of the <code>:geographic</code> modifier:</p>
<pre>class CreateLocations &lt; ActiveRecord::Migration
  def change
    create_table :locations do |t|
      t.string :name
      t.point :latlon, :geographic =&gt; true
      t.timestamps
    end
  end
end</pre>
<p>Geographic columns use a geographic coordinate system (latitude and longitude on a curved domain). Geometric columns use a projected coordinate system (on a flat domain). But which should you use for your application? To answer this question, we need to unpack what the coordinate system differences mean in the context of PostGIS.</p>
<p>Let&#8217;s start with the obvious. Geographic types use units of latitude and longitude. Since these are familiar concepts, we can put them directly into the database and pull them out for display without having to perform any transformations on the values. This makes the geographic type very convenient for many simple applications.</p>
<p>Second, the shape of lines and polygons in geographic columns will follow the curvature of the earth. We saw a dramatic demonstration of this in the beginning of <a title="Geo-Rails part 4" href="http://www.daniel-azuma.com/blog/archives/106" target="_blank">part 4</a>: a &#8220;straight line&#8221; from San Francisco to Athens passes over Iceland in a geographic coordinate system, even though Iceland is far to the north of either endpoint.</p>
<p>Third, as a corollary to the previous point, geographic coordinates for the most part let you ignore seams and singularities. Take a short line segment from <code>POINT(179 0)</code> to <code>POINT(-179 0)</code>. On the globe, in a geographic coordinate system, this is a short line that crosses the International Date Line. Projections, in contrast, have to flatten the earth, and in order to do so, they have to &#8220;cut&#8221; the globe someplace. This cut becomes the edge of the map. Many projections perform this cut along the Date Line. Hence, if we take our two points on either side of the Date Line, and draw a line segment between them in such a projection, that line would run the other way, crossing most of the world.</p>
<div id="attachment_167" class="wp-caption aligncenter" style="width: 401px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/segment_geographic.png"><img class="size-full wp-image-167" title="segment_geographic" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/segment_geographic.png" alt="" width="391" height="387" /></a><p class="wp-caption-text">A line segment connecting two points on either side of the Date Line, in a geographic coordinate system.</p></div>
<div id="attachment_168" class="wp-caption aligncenter" style="width: 490px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/segment_projected.png"><img class="size-full wp-image-168" title="segment_projected" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/segment_projected.png" alt="" width="480" height="244" /></a><p class="wp-caption-text">A line segment connecting the same endpoints, in some projections, may cross the entire world.</p></div>
<p>Similarly, the north and south poles also cause problems for many projections. As a result, if you deal with objects that cross the Date Line or live near or especially surrounding the poles, you may have to deal with these (literal) edge cases specially. Generally, the geographic type lets you avoid having to think about these special cases because a globe has no edges.</p>
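<p>To make the seam problem concrete, here is a minimal sketch in plain Ruby (no RGeo involved; the method names are hypothetical and exist only for this illustration). It compares the longitude span of the segment from <code>POINT(179 0)</code> to <code>POINT(-179 0)</code> when longitude is treated as a flat <em>x</em> value versus when it is allowed to wrap around the globe.</p>
<pre># Longitude span when longitude is treated as a flat x coordinate.
def flat_lon_span(lon1, lon2)
  (lon2 - lon1).abs
end

# Longitude span when the globe is allowed to wrap at the Date Line.
def wrapped_lon_span(lon1, lon2)
  span = (lon2 - lon1).abs % 360.0
  [span, 360.0 - span].min
end

puts flat_lon_span(179.0, -179.0)     # 358.0 degrees: the long way around
puts wrapped_lon_span(179.0, -179.0)  # 2.0 degrees: straight across the Date Line</pre>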
<p>Now the bad news. Computations across a curved surface are more complex than across a flat surface. Distance calculation, intersections, and so forth, will be slower on geographic types than on projections. In fact, some computations will not be available at all. In <a title="Geo-Rails part 6" href="http://www.daniel-azuma.com/blog/archives/134" target="_blank">part 6</a>, we considered an example &#8220;counties&#8221; table, in which we chose to use a projected coordinate system to store polygons. The reason I did that is that I wanted to cover <code>ST_Relate</code>, a function that PostGIS supports for geometric types but not geographic types.</p>
<p>Finally, geographic types are also subject to the model of the earth that you are using. The earth is actually not a perfect sphere, but is slightly flattened along its axis of rotation. In order to perform computations across a large area with a high degree of accuracy, you need to take that flattening into account. Unfortunately, the flattening makes the already complex computations maddeningly complex (and correspondingly slower). Because of this, PostGIS gives you the option of choosing whether to perform computations using the spherical or flattened shape, trading off speed for accuracy. Each function that supports geographic inputs performs the more accurate computations by default, but you can change it to use the faster spherical formulas by passing FALSE as an optional final parameter.</p>
<pre>ST_Distance(pt1, pt2)         -- Uses more accurate computation
ST_Distance(pt1, pt2, FALSE)  -- Uses faster spherical computation</pre>
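<p>For a feel of what a &#8220;spherical&#8221; computation involves, here is a rough sketch of the classic haversine formula in plain Ruby. This is not PostGIS&#8217;s implementation; the method name and the mean earth radius constant are assumptions made for this illustration. The flattened (spheroidal) version needs an iterative algorithm and is considerably more involved, which is where the extra cost comes from.</p>
<pre># Mean earth radius in meters (an assumed value for this sketch).
EARTH_RADIUS_M = 6_371_008.8

# Great-circle distance in meters between two lon/lat points (in degrees),
# treating the earth as a perfect sphere.
def spherical_distance(lon1, lat1, lon2, lat2)
  to_rad = Math::PI / 180.0
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a))
end

# One degree of longitude along the Equator is roughly 111 km.
puts spherical_distance(0.0, 0.0, 1.0, 0.0)</pre>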
<h2>A case for projections</h2>
<p>So which type should you use? There will be some cases when the decision is clear. If you need to perform computations across large sections of the globe, for example, you will usually want to use the geographic type. However, my experience has been that, for <em>most</em> use cases that you&#8217;re likely to encounter in a Rails application, you&#8217;ll get better results by choosing a reasonable projection.</p>
<p>Why do I say that?</p>
<p><strong>Spatial data storage should match its usage.</strong> This is, I think, the most important but most overlooked consideration. Often, your application will lend itself to a particular projection based on what it <em>does</em> with the data, and it is almost always beneficial to structure your data storage accordingly. I know as engineers we often want to abstract our data representation from our application functionality. But you don&#8217;t always have that luxury with big data&#8212;whether you like it or not, you have to accommodate the resource and performance needs of your database. This goes double with geospatial data, because the queries and analysis can get quite expensive.</p>
<p>One very common application is simply the display of your database objects on a Google Map or similar visualization tool. In such an application, most of your queries might be of the form: <em>Give me all the objects that appear within this rectangle on a Google Map.</em> If your data is stored and queried in the same coordinate system as that used by Google Maps, then those rectangular map areas will translate directly into simple rectangular queries in your database. If, however, your database uses a geographic coordinate system or a different projection, your query may map into a distorted or non-rectangular area in your database&#8217;s coordinate system, resulting in more complex code and/or decreased performance.</p>
<p><strong>Many shapes are best represented in a (particular) projection.</strong> Let&#8217;s take a look at a shape that should be familiar to most readers, the outline of the United States:</p>
<div id="attachment_175" class="wp-caption aligncenter" style="width: 388px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/us_lambert2.png"><img class="size-full wp-image-175" title="us_lambert" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/us_lambert2.png" alt="" width="378" height="240" /></a><p class="wp-caption-text">The United States, in a Lambert Conformal Conic projection. (credit: http://csanet.org/newsletter/winter03/nlw0303.html)</p></div>
<p>Now, much of the northern border with Canada follows a line of latitude, the &#8220;49th parallel&#8221;. A straight line. Except, in the above image, it&#8217;s not straight; it&#8217;s curved slightly. This map is in a <em>Lambert Conformal Conic</em> projection, very commonly used for US national and state maps. To represent the northern border of the country in this projection, you would need a curved line (or, in practice, a bunch of short straight lines that together approximate a curved line). But in some other projections&#8212;for example a <em>Mercator</em> projection&#8212;lines of latitude are straight, making the shape much easier and more efficient to represent.</p>
<div id="attachment_176" class="wp-caption aligncenter" style="width: 415px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/us_mercator1.png"><img class="size-full wp-image-176" title="us_mercator" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/us_mercator1.png" alt="" width="405" height="231" /></a><p class="wp-caption-text">The United States, in a Mercator projection. (credit: http://csanet.org/newsletter/winter03/nlw0303.html)</p></div>
<p>East-west and north-south lines in most political boundaries tend to follow lines of latitude and longitude, respectively, and so are best represented in a projection (such as Mercator) that preserves those lines as straight. Remember, most lines of latitude are <em>not</em> straight in a geographic coordinate system, so a geographic latitude-longitude coordinate system is <em>not</em> particularly well-suited for large political boundaries such as states and countries.</p>
<p><strong>Most data is hyperlocal</strong>. The geographic type&#8217;s advantages come to the foreground when you&#8217;re dealing with data spread over the entire globe, or when you need to deal with objects covering large areas or distances covering significant portions of the globe. However, in practice I&#8217;ve found there are very few applications like that. In most cases, you&#8217;ll be dealing with primarily point data, or if you do have line or polygonal data, the individual objects are small: streets, parcel boundaries, municipal and statistical boundaries, and so forth. Furthermore, in most cases, your data will be limited to a particular part of the world, or at least you&#8217;ll seldom need to handle data that crosses seams such as poles or the Date Line. So in practice, you seldom actually run into the problems that would be solved by using the geographic type.</p>
<p><strong>Performance does matter</strong>. Many operations gain a substantial performance improvement from using the PostGIS geometry type rather than the geography type. Furthermore, using geometry saves you from having to think about which functions are available and which are not.</p>
<h2>A projection to avoid and a projection to consider</h2>
<p>You might be tempted to store latitude and longitude in a geometry type column. That is, to set up your PostGIS column with a geometry type, but use SRID=4326 (which is the EPSG number for WGS 84 latitude and longitude).</p>
<p>Don&#8217;t do this.</p>
<p>I did this a few times in my naive youth, and it came back to bite me. What you&#8217;re really doing here is employing a particular projection called <em>Plate Carree</em>, which simply maps latitude and longitude directly to <em>x</em> and <em>y</em> on the plane. Remember, any time you use geometry rather than geography, you are working with a flat coordinate system, and thus a projection. You might think you&#8217;re working with latitude and longitude, but you&#8217;re actually not.</p>
<div id="attachment_173" class="wp-caption aligncenter" style="width: 490px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/plate_carree.png"><img class="size-full wp-image-173" title="plate_carree" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/plate_carree.png" alt="" width="480" height="238" /></a><p class="wp-caption-text">The Plate-Carree projection. (Credit: http://kartoweb.itc.nl/geometrics/Map%20projections/body.htm)</p></div>
<p>Plate Carree is not a particularly useful projection (except that it is trivial to compute). It doesn&#8217;t preserve distances, angles, directions, areas, or any other cartographically useful properties, and its distortion in polar regions is severe. In almost all cases, you can do much better with a different projection.</p>
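<p>As a rough illustration of that distortion, the following plain Ruby sketch computes how much ground one degree of longitude actually covers at a given latitude. The method name and the equatorial constant are assumptions for this sketch, and the earth is treated as a sphere. Plate Carree draws every degree of longitude at the same width, yet on the ground that degree shrinks to about two thirds of its equatorial length at Seattle&#8217;s latitude and to less than a fifth at 80 degrees north.</p>
<pre># Approximate ground length of one degree of longitude at the Equator,
# in meters (spherical earth assumed).
METERS_PER_DEGREE_AT_EQUATOR = 111_320.0

# Ground length of one degree of longitude at the given latitude.
def meters_per_degree_of_longitude(lat_deg)
  METERS_PER_DEGREE_AT_EQUATOR * Math.cos(lat_deg * Math::PI / 180.0)
end

[0.0, 47.6, 80.0].each do |lat|
  puts "latitude #{lat}: #{meters_per_degree_of_longitude(lat).round} m per degree"
end</pre>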
<p>The projection I tend to recommend for many applications is Mercator. In particular, a minor variation on Mercator that is used by Google and Bing Maps:</p>
<div id="attachment_172" class="wp-caption aligncenter" style="width: 448px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/google_world.png"><img class="size-full wp-image-172" title="google_world" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/google_world.png" alt="" width="438" height="322" /></a><p class="wp-caption-text">The Google world map, a slight variation on a Mercator projection</p></div>
<p>This coordinate system has EPSG number 3785, and has a number of helpful properties.</p>
<ul>
<li>It&#8217;s used by <a title="Google maps" href="http://maps.google.com/" target="_blank">Google maps</a> and <a title="Bing maps" href="http://www.bing.com/maps/" target="_blank">Bing maps</a> (and possibly other mapping systems as well), so if you use those systems for visualization, you have a good match between your data storage and application.</li>
<li>It preserves angles and shapes locally. (In cartographic terms, it is <em>conformal</em>.) This means if you zoom into any part of the map, the shapes and aspect ratio will closely match the real shapes on the globe. This is, I think, the primary reason it is popular with mapping visualizations.</li>
<li>Lines of latitude and longitude are straight, so political boundaries tend to work well.</li>
<li>It&#8217;s relatively simple to compute.</li>
</ul>
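<p>To back up that last point, here is a minimal sketch of the spherical Mercator formulas in plain Ruby. The method names are hypothetical; in a real application you would normally let RGeo or PostGIS do this for you. The radius is the sphere radius used by the Google-style projection.</p>
<pre># Sphere radius in meters used by the Google-style Mercator projection.
SPHERE_RADIUS_M = 6_378_137.0

# Convert longitude/latitude in degrees to projected x/y in meters.
def mercator_project(lon_deg, lat_deg)
  lon = lon_deg * Math::PI / 180.0
  lat = lat_deg * Math::PI / 180.0
  x = SPHERE_RADIUS_M * lon
  y = SPHERE_RADIUS_M * Math.log(Math.tan(Math::PI / 4 + lat / 2))
  [x, y]
end

# Convert projected x/y in meters back to longitude/latitude in degrees.
def mercator_unproject(x, y)
  lon = x / SPHERE_RADIUS_M
  lat = 2 * Math.atan(Math.exp(y / SPHERE_RADIUS_M)) - Math::PI / 2
  [lon * 180.0 / Math::PI, lat * 180.0 / Math::PI]
end

x, y = mercator_project(-122.34978, 47.620578)  # the Space Needle
lon, lat = mercator_unproject(x, y)             # round trip back to degrees
puts "projected: #{x}, #{y}"
puts "geographic: #{lon}, #{lat}"</pre>
<p>Note that the formula breaks down at the poles, where the latitude term grows without bound. This is one reason the Google-style map stops at roughly 85 degrees north and south.</p>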
<p>As with any projection, there will be times when this one is not appropriate. By now, you should have enough understanding to identify many of these cases. However, a few of the common objections you might encounter are not as important as they sound, and I think I should say a few words about them.</p>
<p>You might hear people object to using EPSG 3785 on the grounds that it contains a simplification that introduces cartographic inaccuracies. (Specifically, it treats its underlying geography as a sphere rather than a flattened ellipsoid.) In most cases, this argument makes too much of too little. <em>All</em> projections rely on simplifications that introduce inaccuracies in one form or another. If your application is to bounce a laser across a continent, then by all means dig deep into the corrective factors. But for most web applications, 3785 should be more than sufficient. Indeed, the inaccuracies in most of the data you will gather, including GPS and geocoded data, will far outweigh most of what can be introduced by the projection.</p>
<p>You also might hear people object to using the Mercator projection at all, on the grounds that it gives a distorted picture of the nature of the world. Because the projection magnifies areas further from the Equator, it generates map images that appear to privilege richer countries in higher latitudes while downgrading the importance of poorer countries closer to the Equator. In 1989, a well-publicized resolution, signed by a number of prominent geographers, was published in <em>American Cartographer</em>, decrying the use of Mercator and similar rectangular projections for these and other reasons. This point is well-taken, and if you are displaying a full world map, I generally do not recommend Mercator if you can help it. However, here we are talking specifically about database structure, not visualization, so for our purposes I think the point is moot.</p>
<h2>Working with projected data in Rails</h2>
<p>So let&#8217;s see some code! I&#8217;ll demonstrate how to set up your PostGIS database to store data using EPSG 3785, and how to read and write data using ActiveRecord.</p>
<p>We&#8217;ll use our code from <a title="Geo-Rails part 2" href="http://www.daniel-azuma.com/blog/archives/69" target="_blank">part 2</a> as a starting point. But now, in our migration, we no longer set <code>:geographic</code>, but instead use a geometric (flat) coordinate system with SRID = 3785, as follows. (We&#8217;ll also set up a spatial index, as we saw in <a title="Geo-Rails part 6" href="http://www.daniel-azuma.com/blog/archives/134" target="_blank">part 6</a>.)</p>
<pre>class CreateLocations &lt; ActiveRecord::Migration
  def change
    create_table :locations do |t|
      t.string :name
      t.point :loc, :srid =&gt; 3785
      t.timestamps
    end
    change_table :locations do |t|
      t.index :loc, :spatial =&gt; true
    end
  end
end</pre>
<p>We also need to specify a corresponding factory in our ActiveRecord class. Here I&#8217;m going to introduce a rather dirty little feature of RGeo: &#8220;projected geographic&#8221; factories. Now, if you cringed a little at that description, then you&#8217;re getting the hang of coordinate systems. Geographic coordinate systems are by definition <em>not</em> projected! However, sometimes when you&#8217;re working with a projection, you&#8217;ll want a quick way to interact with the data in latitude and longitude&#8212;a quick way to transform individual points to geographic coordinates and back again. This is where RGeo&#8217;s projected geographic factories come in handy.</p>
<p>These factories really use a projected coordinate system under the hood. In fact, they reference a full Cartesian factory internally, and you can gain access to that &#8220;real&#8221; projected factory by calling the <code>projection_factory</code> method. However, they provide you with a convenience interface that lets you look at the data as latitudes and longitudes, as if it were a geographic factory.</p>
<p>The &#8220;simple_mercator&#8221; factory is a useful example. Its &#8220;real&#8221; internal factory has SRID 3785, indicating the Google Maps style Mercator projection, but the wrapper factory reports latitudes and longitudes. In this way, it mirrors the Google Maps Javascript API. It talks latitudes and longitudes on the outside, but converts them internally to the projection for use with the map.</p>
<p>In our ActiveRecord class, we&#8217;ll set up the factory so it correctly interacts with the database in projected coordinates.</p>
<pre>class Location &lt; ActiveRecord::Base

  # Create a simple mercator factory. This factory itself is
  # geographic (latitude-longitude) but it also contains a
  # companion projection factory that uses EPSG 3785.
  FACTORY = RGeo::Geographic.simple_mercator_factory

  # We're storing data in the database in the projection.
  # So data gotten straight from the "loc" attribute will be in
  # projected coordinates.
  set_rgeo_factory_for_column(:loc, FACTORY.projection_factory)

  # To interact in projected coordinates, just use the "loc"
  # attribute directly.
  def loc_projected
    self.loc
  end
  def loc_projected=(value)
    self.loc = value
  end

  # To use geographic (lat/lon) coordinates, convert them using
  # the wrapper factory.
  def loc_geographic
    FACTORY.unproject(self.loc)
  end
  def loc_geographic=(value)
    self.loc = FACTORY.project(value)
  end

end</pre>
<p>Now let&#8217;s do an example query. Suppose our basic query is a simple map search where we want to return all the locations in a given rectangle on our map visualization. Since our data is in the same projection as the original map, a rectangular query in the map translates into a rectangular query in our database. So we&#8217;ll take the latitudes and longitudes of the rectangle edges as parameters, and convert them to projected coordinates. Once there, we can use a simple PostGIS box intersection to run the query itself. It&#8217;s a simple query that can be accelerated using the spatial index.</p>
<p>We&#8217;ll add a scope to our class as follows:</p>
<pre>class Location &lt; ActiveRecord::Base

  # ...

  # w,s,e,n are in latitude-longitude
  def self.in_rect(w, s, e, n)
    # Create lat-lon points, and then get the projections.
    sw = FACTORY.point(w, s).projection
    ne = FACTORY.point(e, n).projection
    # Now we can create a scope for this query.
    where("loc &amp;&amp; '#{sw.x},#{sw.y},#{ne.x},#{ne.y}'::box")
  end

end</pre>
<p>Now rectangle searches are simple:</p>
<pre>locations = Location.in_rect(-122, 47, -121, 48).all</pre>
<h2>Where to go from here</h2>
<p>In this article, we saw some of the pros and cons of using different coordinate systems for your database. The right coordinate system will depend on your application, but I&#8217;ve found that for many applications, using a projection&#8212;often the specific projection EPSG 3785&#8212;produces good results.</p>
<p>It may be useful at this point to gain a general feel for the different types of projections, how they work, and what their pros and cons are. A very good online resource for this is provided <a title="USGS introduction to map projections" href="http://egsc.usgs.gov/isb/pubs/MapProjections/projections.html" target="_blank">here</a> by the USGS.</p>
<p>The <code>RGeo::Geographic.simple_mercator_factory</code> is useful for storing data in EPSG 3785. However, if you want to use a different projection under the hood, you can use a more powerful method, <code>RGeo::Geographic.projected_factory</code>, which lets you specify arbitrary projections using Proj4. Read about it in the <a title="RGeo documentation" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">RGeo documentation</a>.</p>
<p>Next time, I will get to the worked example I promised last week. Stay tuned, and let&#8217;s bring Rails down to earth!</p>
<p><em>This is part 7 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-azuma.com/blog/archives/164/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Geo-Rails part 6: Scaling Spatial Applications</title>
		<link>http://www.daniel-azuma.com/blog/archives/134</link>
		<comments>http://www.daniel-azuma.com/blog/archives/134#comments</comments>
		<pubDate>Mon, 02 Jan 2012 10:24:02 +0000</pubDate>
		<dc:creator>Daniel Azuma</dc:creator>
				<category><![CDATA[GeoRails]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[PostGIS]]></category>
		<category><![CDATA[RGeo]]></category>

		<guid isPermaLink="false">http://www.daniel-azuma.com/blog/?p=134</guid>
		<description><![CDATA[Scaling, scaling, scaling. Can Rails really scale? It&#8217;s been a source of FUD and the butt of running jokes. But scaling is a serious matter when it comes to large data sets, and it&#8217;s something we need to pay attention &#8230; <a href="http://www.daniel-azuma.com/blog/archives/134">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Scaling, scaling, scaling. Can Rails really scale? It&#8217;s been a source of FUD and the butt of <a title="Can Rails Scale - Joke" href="http://canrailsscale.com/" target="_blank">running jokes</a>. But scaling is a serious matter when it comes to large data sets, and it&#8217;s something we need to pay attention to in the geospatial realm where big data is commonplace.</p>
<p>In this week&#8217;s article, I&#8217;ll go over the basic issues every geospatial programmer should know about scaling, and provide tips for writing your geospatial Rails application so it doesn&#8217;t fall over when you go national. We will cover:</p>
<ul>
<li>The bottom line regarding scaling</li>
<li>Building spatial indexes for your database</li>
<li>Writing queries to take advantage of indexes</li>
<li>Simplification and segmentation of large objects</li>
</ul>
<p>This is part 6 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</p>
<h2><span id="more-134"></span>Scaling is complex but not difficult</h2>
<p>Scaling is a high-profile issue. We all notice when our blog gets slashdotted, and when Twitter or Amazon goes down it makes national news. Failures happen to everyone, and seem almost inevitable. Are they?</p>
<p>Now, I don&#8217;t want to downplay the complexity of the scaling task, but we do have to start with an important observation. Scaling, in its essence, is a solved problem. The techniques involved have been well-understood for decades. We all learned about logarithmic searching, branch and bound, and similar algorithms in our computer science classes. And as web developers, we should already know what these algorithms look like in practice: database indexes, sharding, caching, replication, load balancing, and so forth.</p>
<p>So why the hoopla?</p>
<p>Because scaling, though a solved problem, is not an <em>automatically</em> solved problem. It requires our attention. More to the point, it requires that we understand every aspect of the system we are building, how the various components work and how they interact. If we need our database to scale, at some level we need to understand how it works, how we&#8217;re using it, and thus what we need to <em>do</em> to make it scale.</p>
<p>This is probably the main reason why Rails has historically had a negative reputation about scaling. Rails purports to make web development simple. But that&#8217;s a bit misleading. If you think about it, a website is a complex system with a lot of moving parts: the network, the server, the MVC control flow, databases, caches, client-side code, control flow from one page to another, interactions with external systems, security layers&#8230;, and that doesn&#8217;t even include your application logic with all of its complexities. It&#8217;s hardly simple. Rails tries to deal with this complexity by hiding elements, at least partly. It hides the database behind ActiveRecord, and in doing so trains us not to think about the database. But that&#8217;s a deceit. We have to think about the database if we&#8217;re going to scale it.</p>
<p>And that is one of the key motivations behind this entire series. I&#8217;ve heard some comments that this material is difficult, that there&#8217;s no <em>TL;DR</em>. Yes, the material is difficult, and I&#8217;m not going to try to hide or gloss over that fact. I could try to make geospatial programming really easy, creating a one-size-fits-all tool or recipe for everyone to cargo-cult. But that is the wide road that leads to destruction. Eventually, you&#8217;ll need to figure out how to scale, and at that time, if you don&#8217;t have some understanding of what&#8217;s going on under the hood, you&#8217;ll get very stuck very quickly.</p>
<p>Now, that said, dealing with geospatial features does not fundamentally change the scaling task. Scaling is still a solved problem. As we prepare to scale our applications, there is a well-known, systematic process we all go through. We measure, find the bottlenecks, apply well-understood techniques to address those bottlenecks, and repeat. It can be a tedious process, and (believe me, I know) it is sometimes difficult to sell our business partners on the fact that we need to spend time on it before it&#8217;s too late. But it&#8217;s not like we don&#8217;t know what we&#8217;re doing. Scaling is complex, but not difficult. It simply requires that we have a general understanding of how spatial data works.</p>
<p>So, sermon over, let&#8217;s dive in.</p>
<h2>About spatial database indexes</h2>
<p>Spatial data is often big data, and as with any big data, our basic scaling task involves making it smaller.</p>
<p>In a database, this is generally accomplished by judicious use of <em>indexes</em>. An index provides a fast way to look up data by some criteria, without having to read and compare against every single row. For example, if your table has a million rows, each identified by a numeric ID, you can generally speed up ID lookups by creating an index on that column.</p>
<p>Similarly, <em>spatial database indexes</em> can accelerate queries that include spatial criteria. If your million-row table also contains a latitude-longitude coordinate, and you want to find rows whose coordinate falls within a certain region, you should consider building a spatial index on the coordinate column. This allows your spatial search to avoid checking every row in the database, thus speeding up your queries.</p>
<p>The important thing to understand about spatial indexes is that although they are conceptually the same as &#8220;standard&#8221; database indexes, they are implemented differently under the hood. In most databases, a simple index on an ID column will use a data structure known as a <em>B-tree</em>. Such an index relies on a global ordering of the data, and builds a balanced search tree, which, as we remember from computer science, lets us do lookups in logarithmic time.</p>
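<p>To make the logarithmic-time claim concrete, here is a minimal Ruby sketch that compares a full scan with a binary search over a million sorted IDs. A sorted array is only a stand-in for a B-tree (a real database index is a multi-way tree stored on disk), and the variable names are made up for illustration, but the difference in the number of comparisons has the same flavor.</p>

```ruby
# A million sorted IDs stand in for an indexed column.
ids = (1..1_000_000).to_a
target = 765_432

# Without an index: compare against rows one by one until we find a match.
scan_steps = 0
ids.each do |id|
  scan_steps += 1
  break if id == target
end

# With an ordering: binary search halves the remaining range at every step.
search_steps = 0
found = ids.bsearch do |id|
  search_steps += 1
  id >= target
end

puts "full scan: #{scan_steps} comparisons"
puts "binary search: #{search_steps} comparisons (found #{found})"
```

<p>The scan needs hundreds of thousands of comparisons, while the binary search needs about twenty. That gap is what an index buys you, and it widens as the table grows.</p>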
<p>Spatial data, however, has some important differences from normal scalar data. A simple numeric ID is an infinitesimal point on a one-dimensional number line, whereas a polygon is a finite area on a two-dimensional surface. For data that covers finite areas or lives in more than one dimension, a B-tree does not work. We have to resort to a more complex algorithm, usually a variant on what is known as an <em>R-tree</em>.</p>
<p>I&#8217;ll save the gory details on R-trees for a later article on spatial index design, but there is one upshot you&#8217;ll need to understand: spatial indexes are heavier and more expensive than standard indexes. An R-tree takes up more space in memory and on disk than a similarly-sized B-tree. Queries against an R-tree can be a little slower than against a B-tree, and R-tree updates can be considerably slower. However, R-trees still provide logarithmic-time queries, and so will still give you speed-ups in many situations. So the usual database mantra still applies, and indeed goes double for spatial indexes: Index your common queries but don&#8217;t index everything. And of course, Measure, Measure, Measure.</p>
<p>Because R-tree updates can be slow, it is also usually a good idea to remove or disable a spatial index if you are going to be loading a lot of data, and then turn it back on once you are done. In this way, you pay the cost of building the index only once at the end, rather than having to incrementally update it on every insert.</p>
<h2>Creating and using spatial indexes</h2>
<p>Because a spatial index is constructed differently from most indexes, creating one usually requires a special syntax. For a Rails project, you can usually let RGeo&#8217;s ActiveRecord adapters handle this for you. Create a spatial index in a migration simply by providing the <code>:spatial</code> option. The following is a snippet from a migration that creates a &#8220;counties&#8221; table with polygons, along with a spatial index on the polygons. (Here we&#8217;ll use a geometric column with the &#8220;NAD83 / Washington North&#8221; projection, which has SRID 2285; see <a title="Geo-Rails part 4" href="http://www.daniel-azuma.com/blog/archives/106" target="_blank">part 4</a>. If you&#8217;re using PostGIS, indexes do work for the geographic type, but have some limitations.)</p>
<pre>create_table :counties do |t|
  t.string :name
  t.polygon :poly, :srid =&gt; 2285
end
change_table :counties do |t|
  t.index :poly, :spatial =&gt; true
end</pre>
<p>If you are managing your schema manually, you&#8217;ll need to use the database&#8217;s particular syntax. In <a title="PostGIS" href="http://www.postgis.org/" target="_blank">PostGIS</a>, spatial indexes use the GiST framework, so you denote a spatial index with &#8220;USING gist&#8221;:</p>
<pre>CREATE INDEX "counties_poly_idx" ON "counties" USING gist ("poly");</pre>
<p><a title="MySQL" href="http://dev.mysql.com/" target="_blank">MySQL</a>&#8217;s spatial extension defines a separate index type:</p>
<pre>CREATE SPATIAL INDEX `counties_poly_index` ON `counties` (`poly`);</pre>
<p>In <a title="SpatiaLite" href="http://www.gaia-gis.it/fossil/libspatialite/index" target="_blank">SpatiaLite</a>, a spatial index is actually a separate table that you must join to. Creating a spatial index involves calling a special function provided by the SpatiaLite library:</p>
<pre>SELECT CreateSpatialIndex('counties', 'poly');</pre>
<p>Once you&#8217;ve created a spatial index, it is usually a good idea to verify that your queries will take advantage of it. You&#8217;ll want to make sure the database&#8217;s <em>query planner</em>, the component that analyzes a query and decides how to attack it, is producing an optimal plan. This is generally good practice for all your database design, but more so for spatial queries because they are less commonly used, and query planners do not always do as good a job with them as we would like.</p>
<p>Your best tool for interacting with the query planner, whether or not you&#8217;re using a spatial database, is <code>EXPLAIN</code>. This SQL command takes a query and returns the query planner&#8217;s plan of attack for that query, usually including which indexes it intends to use and its estimate of how expensive the query will be.</p>
<p>For most databases, you can invoke the EXPLAIN command simply by prefixing your query with &#8220;<code>EXPLAIN</code>&#8221;. For example, using PostGIS, let&#8217;s see what the query planner does with a query asking for the county containing the Seattle Space Needle:</p>
<pre>EXPLAIN
  SELECT "name" FROM "counties" WHERE
    ST_Intersects("poly", ST_GeomFromEWKT('SRID=2285;POINT(1266457.58 230052.50)'));</pre>
<p>Postgres will return a query plan that looks something like this:</p>
<pre>QUERY PLAN
----------------------------------------------------------------------------------------------
Index Scan using counties_poly_idx on counties (cost=0.00..8.52 rows=1 width=68)
  Index Cond: (poly &amp;&amp; '0101000020ED08000048E17A94195333410000000024150C41'::geometry)
  Filter: _st_intersects(poly, '0101000020ED08000048E17A94195333410000000024150C41'::geometry)</pre>
<p>Note that it&#8217;s using the &#8220;<code>counties_poly_idx</code>&#8221; index that we created. PostGIS is currently quite good about knowing how to use a spatial index for most queries. With the EXPLAIN command, we can be sure this query will use our spatial index for maximum efficiency.</p>
<h2>Optimizing difficult queries in PostGIS</h2>
<p>Unfortunately, there are a few cases when the query planner won&#8217;t be able to figure out by itself that an index is useful. For example, suppose we want to perform a sanity check of our counties database, making sure we don&#8217;t have any overlapping polygons. More precisely, while we expect that county polygons will &#8220;touch&#8221;&#8212;that is, share boundaries&#8212;we don&#8217;t want counties to actually share <em>interior</em> points. That could mean a problem in our data.</p>
<p><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/county_boundaries.png"><img class="aligncenter size-full wp-image-154" title="county_boundaries" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/county_boundaries.png" alt="" width="475" height="273" /></a></p>
<p>Unfortunately, PostGIS doesn&#8217;t provide this kind of &#8220;interior intersection&#8221; function out of the box. The <code>ST_Intersects</code> function we used earlier will flag the &#8220;touch&#8221; case as well, and we don&#8217;t want that.</p>
<p>But we <em>can</em> build &#8220;interior intersection&#8221; using the function <code>ST_Relate</code>. This powerful function lets you test a wide variety of relationships using the <em>Dimensionally Extended Nine-Intersection Model</em>. I won&#8217;t cover this model in detail for now; you can read about it in the <a title="OGC Simple Feature Access spec" href="http://www.opengeospatial.org/standards/sfa" target="_blank">Simple Features Spec</a>. For our purposes, what&#8217;s important is that, by giving it a particular specification string, &#8220;<code>T********</code>&#8221;, it can implement the relationship we want to test.</p>
<p>Unfortunately, because <code>ST_Relate</code> is such a powerful and general tool, the query planner can&#8217;t optimize it very well, and tends to fall back on the lowest common denominator, which is a sequential scan.</p>
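<p>If the specification string looks cryptic, the following Ruby sketch may help. It only shows how a pattern is compared against a nine-character intersection matrix; it does not compute the matrix, which is the database&#8217;s job. The helper name <code>relate_matches?</code> is hypothetical, and the two matrices are the usual textbook values for a pair of overlapping squares and a pair of squares that share an edge.</p>

```ruby
# Compare a DE-9IM matrix (nine cells, each "F", "0", "1" or "2") with a pattern.
# In a pattern, "*" accepts anything, "T" accepts any non-empty intersection,
# and "F", "0", "1", "2" must match the cell exactly.
def relate_matches?(matrix, pattern)
  matrix.chars.zip(pattern.chars).all? do |cell, want|
    case want
    when "*" then true
    when "T" then cell != "F"
    else cell == want
    end
  end
end

overlapping = "212101212"  # interiors intersect in an area (first cell is "2")
touching    = "FF2F11212"  # interiors do not meet (first cell is "F")

puts relate_matches?(overlapping, "T********")  # interior intersection
puts relate_matches?(touching, "T********")     # shared boundary only
```

<p>The first cell of the matrix describes how the two interiors intersect, so a pattern that demands &#8220;T&#8221; there and ignores everything else accepts overlapping polygons and rejects polygons that merely touch.</p>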
<pre>EXPLAIN
  SELECT c1.name, c2.name FROM counties AS c1 INNER JOIN counties AS c2
    ON c1.id != c2.id AND ST_Relate(c1.poly, c2.poly, 'T********');</pre>
<pre>QUERY PLAN
----------------------------------------------------------------------
Nested Loop (cost=0.00..10372.17 rows=229633 width=64)
  Join Filter: st_relate(c1.poly, c2.poly, 'T********'::text)
  -&gt; Seq Scan on counties c1 (cost=0.00..18.30 rows=830 width=68)
  -&gt; Materialize (cost=0.00..22.45 rows=830 width=68)
       -&gt; Seq Scan on counties c2 (cost=0.00..18.30 rows=830 width=64)</pre>
<p>Ouch! That&#8217;s an unfortunate query plan. It does nested sequential scans, comparing every county polygon with every other county polygon, an <em>n-squared</em> operation. In a table with thousands of counties, this can be slow.</p>
<p>But it turns out we can do better. The &#8220;interior intersection&#8221; operation actually <em>can</em> be optimized using the index. The query planner doesn&#8217;t realize this, so we need to give it some help.</p>
<p>Here a bit of trivia about spatial indexes will help us. In general, the &#8220;native&#8221; operation for an R-tree index is <em>bounding box intersection</em>. It can take the bounding box of an input geometry, and determine which geometries in the table have bounding boxes that intersect the input. At the most basic level, the query planner for PostGIS works by looking for opportunities to apply this native operation. It asks, &#8220;How can I reduce the search space by applying a bounding box intersection?&#8221;</p>
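<p>Here is a rough Ruby sketch of what a bounding box intersection test looks like. The <code>BBox</code> struct and the sample triangles are made up for illustration and are not part of RGeo or PostGIS. Note that the test is cheap, but it can report a match for shapes that do not actually intersect.</p>

```ruby
# An axis-aligned bounding box (hypothetical helper, not an RGeo class).
BBox = Struct.new(:min_x, :min_y, :max_x, :max_y) do
  # Smallest box containing all of the given [x, y] points.
  def self.around(points)
    xs = points.map { |x, _| x }
    ys = points.map { |_, y| y }
    new(xs.min, ys.min, xs.max, ys.max)
  end

  # Two boxes intersect when their ranges overlap on both axes.
  def intersects?(other)
    min_x <= other.max_x && other.min_x <= max_x &&
      min_y <= other.max_y && other.min_y <= max_y
  end
end

lower_left  = [[0, 0], [4, 0], [0, 4]]       # triangle hugging the origin
upper_right = [[3, 3], [5, 3], [5, 5]]       # nearby triangle, disjoint from the first
far_away    = [[10, 10], [12, 10], [10, 12]]

box = BBox.around(lower_left)
puts box.intersects?(BBox.around(upper_right))  # boxes overlap, triangles do not
puts box.intersects?(BBox.around(far_away))     # boxes are disjoint
```

<p>The first pair is a false positive: the boxes overlap even though the triangles are disjoint. That is why a bounding box test can narrow down the candidates but cannot replace the exact geometric test.</p>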
<p>In our first example above, when we used <code>ST_Intersects</code> to find the polygon containing the Space Needle, the query planner reasoned thus: <em>Every time two geometries intersect, their bounding boxes also intersect. So I can add a bounding box intersection and not change the result of the query. I like bounding box intersections because they let me use the index.</em> So what actually happened behind the scenes was that PostGIS rewrote our query from:</p>
<pre>SELECT "name" FROM "counties" WHERE
  ST_Intersects("poly", ST_GeomFromEWKT('SRID=2285;POINT(1266457.58 230052.50)'));</pre>
<p>to:</p>
<pre>SELECT "name" FROM "counties" WHERE
  "poly" &amp;&amp; ST_GeomFromEWKT('SRID=2285;POINT(1266457.58 230052.50)') AND
  ST_Intersects("poly", ST_GeomFromEWKT('SRID=2285;POINT(1266457.58 230052.50)'));</pre>
<p>&#8230;using the PostgreSQL operator for bounding box intersection: &#8220;<code>&amp;&amp;</code>&#8221;. Now, when it creates the actual query plan, it uses the spatial index to optimize the bounding box intersection. Let&#8217;s take another look at that query plan:</p>
<pre>QUERY PLAN
----------------------------------------------------------------------------------------------
Index Scan using counties_poly_idx on counties (cost=0.00..8.52 rows=1 width=68)
  Index Cond: (poly &amp;&amp; '0101000020ED08000048E17A94195333410000000024150C41'::geometry)
  Filter: _st_intersects(poly, '0101000020ED08000048E17A94195333410000000024150C41'::geometry)</pre>
<p>See the bounding box intersection &#8220;&amp;&amp;&#8221; in the Index Condition? That wasn&#8217;t in our original query, but PostGIS rewrote our query and put it there so it could use the index. Pretty clever, PostGIS is.</p>
<p>Well, sometimes PostGIS isn&#8217;t quite clever enough, and we have to give it some help. In our &#8220;interior intersection&#8221; example, we can improve the query plan by going through this process manually. We reason thus: <em>PostGIS doesn&#8217;t realize this, but every time two geometries have an &#8220;interior intersection&#8221; using ST_Relate, their bounding boxes also intersect. So I can add a bounding box intersection and not change the result of the query. Bounding box intersections are good because they let me use the index.</em></p>
<p>So let&#8217;s manually rewrite our query from:</p>
<pre>SELECT c1.name, c2.name FROM counties AS c1 INNER JOIN counties AS c2
  ON c1.id != c2.id AND ST_Relate(c1.poly, c2.poly, 'T********');</pre>
<p>to:</p>
<pre>SELECT c1.name, c2.name FROM counties AS c1 INNER JOIN counties AS c2
  ON c1.poly &amp;&amp; c2.poly AND
  c1.id != c2.id AND ST_Relate(c1.poly, c2.poly, 'T********');</pre>
<p>Now we give this to PostGIS, and <em>voila</em>! The query planner now uses the index:</p>
<pre>EXPLAIN
  SELECT c1.name, c2.name FROM counties AS c1 INNER JOIN counties AS c2
    ON c1.poly &amp;&amp; c2.poly AND
    c1.id != c2.id AND ST_Relate(c1.poly, c2.poly, 'T********');</pre>
<pre>QUERY PLAN
----------------------------------------------------------------------------------------
Nested Loop (cost=0.00..296.81 rows=1 width=64)
  Join Filter: st_relate(c1.poly, c2.poly, 'T********'::text)
  -&gt; Seq Scan on counties c1 (cost=0.00..18.30 rows=830 width=68)
  -&gt; Index Scan using counties_poly_idx on counties c2 (cost=0.00..0.32 rows=1 width=68)
       Index Cond: (c1.poly &amp;&amp; c2.poly)</pre>
<p>This improved plan still does one full sequential scan, because it still has to check every county in the database. But the nested scan, which checks whether that county overlaps any other county, is now accelerated using the index. We&#8217;ve reduced the <em>n-squared</em> query to an <em>n log n</em> query. The computer scientist in us rejoices!</p>
<p>Going through this process does require some creativity, and it helps to have a bit of experience. The good news is that PostGIS is smart enough to handle most cases automatically. But you should still make liberal use of the EXPLAIN tool and look carefully at the query plan that is generated, to see if it&#8217;s doing as well as you think it ought. There may be opportunities to improve your query performance dramatically just by giving it a little bit of help.</p>
<h2>Indexing and queries in MySQL and SpatiaLite</h2>
<p>Generally, I recommend <a title="PostGIS" href="http://www.postgis.org/" target="_blank">PostGIS</a> as an open source spatial database. But there are others out there that you may need to use from time to time, and each one will have its quirks.</p>
<p>As we&#8217;ve seen, the spatial extensions to <a title="MySQL" href="http://dev.mysql.com" target="_blank">MySQL</a> do support spatial indexes. However, there are some significant limitations in comparison with PostGIS.</p>
<p>First, spatial indexes currently work only on MyISAM tables. This means you can&#8217;t use spatial indexes and get the transaction safety benefits of InnoDB on the same table. Ugh.</p>
<p>Second, MySQL supports only a very limited set of spatial relationship functions. In particular, nearly all MySQL&#8217;s functions work on bounding boxes (which MySQL calls <em>Minimum Bounding Rectangles</em>, or <em>MBR</em>) rather than the geometry itself. So for example, MySQL&#8217;s <code>Intersects</code> function is actually only an alias to <code>MBRIntersects</code>, which tests the bounding boxes for intersection. If you want to test actual geometric intersection, you&#8217;ll have to do some post-filtering on the result set (which you can do using <a title="RGeo" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">RGeo</a>).</p>
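<p>To illustrate what that post-filtering step involves, here is a small self-contained Ruby sketch. It uses a hand-rolled ray-casting test as a stand-in for an exact predicate from a geometry library such as RGeo, and the helper names and sample data are invented for the example. The bounding rectangle check plays the role of the database query, and the exact test then weeds out the false positives.</p>

```ruby
# Exact test by ray casting. Points lying exactly on the boundary are not
# handled specially in this sketch.
def point_in_polygon?(px, py, ring)
  inside = false
  ring.each_with_index do |(x1, y1), i|
    x2, y2 = ring[i - 1]                      # previous vertex (wraps around)
    next unless (y1 > py) != (y2 > py)        # edge must straddle the ray
    cross_x = x1 + (py - y1) * (x2 - x1) / (y2 - y1).to_f
    inside = !inside if px < cross_x
  end
  inside
end

# Coarse test: is the point inside the minimum bounding rectangle?
def point_in_mbr?(px, py, ring)
  xs = ring.map { |x, _| x }
  ys = ring.map { |_, y| y }
  px.between?(xs.min, xs.max) && py.between?(ys.min, ys.max)
end

# An L-shaped polygon: its bounding rectangle covers a notch it does not occupy.
l_shape = [[0, 0], [4, 0], [4, 1], [1, 1], [1, 4], [0, 4]]
points = { "in the notch" => [3, 3], "in the arm" => [0.5, 3], "far away" => [5, 5] }

candidates = points.select { |_, (x, y)| point_in_mbr?(x, y, l_shape) }
matches = candidates.select { |_, (x, y)| point_in_polygon?(x, y, l_shape) }

puts "MBR candidates: #{candidates.keys.inspect}"
puts "exact matches:  #{matches.keys.inspect}"
```

<p>The point in the notch passes the bounding rectangle check but fails the exact test, which is precisely the kind of row you would need to discard after an MBR-only query.</p>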
<p>I generally don&#8217;t recommend using MySQL Spatial unless you&#8217;re already using MySQL. But then I typically don&#8217;t recommend using MySQL in general either&#8230;</p>
<p><a title="Spatialite" href="http://www.gaia-gis.it/fossil/libspatialite/index" target="_blank">SpatiaLite</a> is a set of spatial extensions to the popular <a title="SQLite" href="http://www.sqlite.org/" target="_blank">SQLite</a> database. I haven&#8217;t used SpatiaLite much. It does seem to have a fairly complete feature set, at least in comparison with MySQL Spatial, though it doesn&#8217;t compare in maturity with PostGIS.</p>
<p>Spatial indexes in SpatiaLite are a bit of a pain, however. They are implemented as a separate set of managed join tables tied to your main table using triggers. All this is handled fairly transparently, except for queries. When you want to write a query that takes advantage of a spatial index in SpatiaLite, you must explicitly join to the index table.</p>
<p>For the sake of space, I won&#8217;t go into the details here. Instead, I highly recommend an excellent online publication by the author of SpatiaLite, the <a title="SpatiaLite Cookbook" href="http://www.gaia-gis.it/gaia-sins/spatialite-cookbook/index.html" target="_blank">SpatiaLite Cookbook</a>, which serves as the user&#8217;s manual for SpatiaLite, and provides a number of very helpful examples.</p>
<h2>Simplifying and segmenting data</h2>
<p>But wait&#8212;there&#8217;s more!</p>
<p>Remember that the basic scaling task is to make big data smaller. If we have big data, we try to do clever things, such as applying indexes, so that we don&#8217;t have to analyze all the data at once.</p>
<p>Now, there are two ways in which spatial data can be &#8220;big&#8221;. First, there may be a lot of objects, lots of rows. In this case, we can often speed up queries by adding a spatial index, as we have seen.</p>
<p>However, individual objects can also be &#8220;big&#8221;, particularly when you&#8217;re dealing with polygons. Take our table of county polygons. Some county boundaries are simple polygons with just a few sides, but many others have complex, crinkly boundaries that follow rivers, coastlines, mountain divides, or other natural features. The number of sides in such polygons can quickly rise into the thousands or more. When you want to compute, say, an intersection with such a polygon, it can be slow.</p>
<p>There are several different strategies you can use to address this problem. I&#8217;m just going to summarize a couple of the important ones here. But first I need to emphasize one thing. There is no one-size-fits-all solution. Each approach has its pros and cons, and your choice will depend on the requirements of your particular application.</p>
<p>To this end, <em>measurement</em> is absolutely critical. Before, during, and after applying any optimization technique, run a benchmark and make sure that (1) you&#8217;re addressing the right problem, and (2) the performance is going in the right direction. This is doubly important when dealing with spatial data, because the algorithms involved are somewhat more complex, and it may surprise you what&#8217;s fast and what&#8217;s slow.</p>
<p>There are two general techniques for dealing with large polygons: <em>simplification</em> and <em>segmentation</em>.</p>
<p><strong>Simplification</strong> can be applied when you&#8217;re more concerned with speed than accuracy. For example, you might have a polygon with a thousand sides, but if you&#8217;re going to be displaying it in a relatively small area on a map, or you&#8217;re running some spatial queries where you don&#8217;t care too much if you&#8217;re a little off, then you can probably get away with an approximation of the polygon with fewer sides.</p>
<div id="attachment_139" class="wp-caption aligncenter" style="width: 532px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/simplification.png"><img class=" wp-image-139 " title="simplification" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/simplification.png" alt="" width="522" height="400" /></a><p class="wp-caption-text">An example of polygon simplification for part of the coast of France. (Credit: http://vis4.net/blog/posts/rendering_country_maps/)</p></div>
<p>There are a number of polygon simplification techniques out there, useful for different circumstances. I don&#8217;t have space here for a full discussion, but I may write a specialized article on simplification at a later time, because it&#8217;s an interesting (and sometimes tricky) problem.</p>
<p><strong>Segmentation</strong> is often useful for speeding up queries against big polygons, when you care not about the shape of the polygon itself but merely whether you&#8217;re intersecting it. Segmentation, for example, might be useful in our county boundary example.</p>
<p>The idea is to break up large polygons into smaller polygons that can be stored in separate rows in your database. That is, we trade &#8220;width&#8221; of the data (i.e. how big each object is, in terms of number of vertices) for &#8220;length&#8221; (i.e. how many objects there are). Since we have spatial indexes that mitigate big &#8220;length&#8221;, this trade-off can be a win for us.</p>
<p>There are many ways to break up a large polygon. The simplest approach, usually good enough in practice, is to perform <em>recursive four-to-one subdivision</em>. Don&#8217;t be scared off by the name; it&#8217;s actually quite straightforward. The idea is to take your large polygon with many sides, and split it down the middle horizontally and vertically. This will typically result in four polygons, each covering about a quarter of the area and containing about a quarter of the number of sides:</p>
<div id="attachment_158" class="wp-caption aligncenter" style="width: 498px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/split.png"><img class="size-full wp-image-158" title="split" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/split.png" alt="" width="488" height="168" /></a><p class="wp-caption-text">One four-to-one split of a province in France.</p></div>
<p>&nbsp;</p>
<p>Now, if any of the four polygons still has too many sides, you can do the same thing again, recursively, and so forth until you reach a number of sides that you&#8217;re comfortable with. Once you&#8217;re done, the <em>union</em> of all the resulting polygons will still be your original polygon. So, in our counties example, a county now <code>has_many</code> polygons, and to find the county containing a particular point, do a spatial query for the polygon containing that point and map back to the county.</p>
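<p>The recursion can be sketched in plain Ruby as follows. This is only an illustration under some simplifying assumptions: polygons are arrays of <code>[x, y]</code> vertices without holes, the cutting is done with a basic Sutherland-Hodgman clip (which can leave degenerate edges along the cut for strongly concave shapes), and the function names and the depth limit are my own. For real data you would more likely let your spatial database or geometry library compute the intersections.</p>

```ruby
# Shoelace formula: area of a polygon given as [x, y] vertices.
def area(poly)
  sum = 0.0
  poly.each_with_index do |(x1, y1), i|
    x2, y2 = poly[(i + 1) % poly.size]
    sum += x1 * y2 - x2 * y1
  end
  sum.abs / 2.0
end

# Sutherland-Hodgman clip against one axis-aligned half-plane.
# axis is 0 for x or 1 for y; sign +1 keeps coord >= value, -1 keeps coord <= value.
def clip(poly, axis, value, sign)
  out = []
  poly.each_with_index do |cur, i|
    prev = poly[i - 1]
    cur_in = sign * (cur[axis] - value) >= 0
    prev_in = sign * (prev[axis] - value) >= 0
    if cur_in != prev_in
      # The edge crosses the cut line, so add the crossing point.
      t = (value - prev[axis]) / (cur[axis] - prev[axis]).to_f
      out << [prev[0] + t * (cur[0] - prev[0]), prev[1] + t * (cur[1] - prev[1])]
    end
    out << cur if cur_in
  end
  out
end

# Split through the middle of the bounding box until every piece is small enough.
def subdivide(poly, max_sides, depth = 8)
  return [] if poly.size < 3
  return [poly] if poly.size <= max_sides || depth.zero?
  xs = poly.map { |x, _| x }
  ys = poly.map { |_, y| y }
  mid_x = (xs.min + xs.max) / 2.0
  mid_y = (ys.min + ys.max) / 2.0
  [-1, 1].product([-1, 1]).flat_map do |sx, sy|
    quadrant = clip(clip(poly, 0, mid_x, sx), 1, mid_y, sy)
    subdivide(quadrant, max_sides, depth - 1)
  end
end

# A 256-sided approximation of a circle stands in for a crinkly county.
circle = (0...256).map { |k| a = 2 * Math::PI * k / 256; [Math.cos(a), Math.sin(a)] }
pieces = subdivide(circle, 40)
puts "#{pieces.size} pieces, largest has #{pieces.map(&:size).max} sides"
puts "area before: #{area(circle)}, after: #{pieces.sum { |piece| area(piece) }}"
```

<p>The total area of the pieces matches the area of the original polygon, which is a handy sanity check to keep around when you segment real data.</p>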
<p>So how deeply should you segment a polygon? What&#8217;s the &#8220;sweet spot&#8221; in the number of sides? That&#8217;s where you have to test and measure, because it will depend on many factors. In one recent project in which I did some polygon segmentation, I measured the optimal number of sides at between three hundred and five hundred. But your mileage will vary.</p>
<h2>Where to go from here</h2>
<p>It is worth diving into the manual for your spatial database for tips on the effective use of spatial indexes. The PostGIS manual is <a title="PostGIS documentation" href="http://www.postgis.org/documentation" target="_blank">online</a>.</p>
<p>EXPLAIN is a very powerful tool for studying and optimizing your database performance in general, not only when you&#8217;re working with spatial data. I highly recommend getting familiar with using it in your database. For PostgreSQL, a good place to start is the <a title="Using Explain in PostgreSQL" href="http://www.postgresql.org/docs/current/static/using-explain.html" target="_blank">Using Explain</a> page in the manual. SQLite and MySQL also have sections in their manuals. Additionally, it looks like Rails 3.2 will <a title="Explain in Rails 3.2" href="http://weblog.rubyonrails.org/2011/12/6/what-s-new-in-edge-rails-explain" target="_blank">include</a> some useful EXPLAIN-based tools out of the box.</p>
<p>This week&#8217;s article didn&#8217;t include a lot of code. This was because I had a lot of material to get through, and I decided it was more important to cover the concepts at this stage, rather than encourage code cargo-culting. In next week&#8217;s article, however, I&#8217;ll go through a fully worked example, with code, that mirrors an actual task I recently had to do for my job at <a title="Pirq" href="http://www.pirq.com/" target="_blank">Pirq</a>.</p>
<p>Until then, have fun and let&#8217;s bring Rails down to earth!</p>
<p><em>This is part 6 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-azuma.com/blog/archives/134/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>RGeo 0.3.3 Released</title>
		<link>http://www.daniel-azuma.com/blog/archives/132</link>
		<comments>http://www.daniel-azuma.com/blog/archives/132#comments</comments>
		<pubDate>Wed, 21 Dec 2011 08:05:16 +0000</pubDate>
		<dc:creator>Daniel Azuma</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[RGeo]]></category>

		<guid isPermaLink="false">http://www.daniel-azuma.com/blog/?p=132</guid>
		<description><![CDATA[I&#8217;ve released RGeo 0.3.3. This is a bug fix release with several important fixes, and upgrading is highly recommended. Changes include: The WKRep WKT parser recognizes MultiPoint WKTs in which individual points are not contained in parens. This syntax is technically &#8230; <a href="http://www.daniel-azuma.com/blog/archives/132">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve released <a title="RGeo" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">RGeo</a> 0.3.3. This is a bug fix release with several important fixes, and upgrading is highly recommended.</p>
<p>Changes include:</p>
<ul>
<li>The WKRep WKT parser recognizes MultiPoint WKTs in which individual points are not contained in parens. This syntax is technically incorrect, but we are now supporting it because there was some ambiguity due to an error in early versions of the spec, and apparently there are now examples in the wild. (Reported by J Smith.)</li>
<li>The Geos CAPI implementation sometimes returned the wrong result from <code>GeometryCollection#geometry_n</code>. Fixed.</li>
<li>Fixed a hang when validating certain projected linestrings. (Patch contributed by Toby Rahilly.)</li>
<li>Several rdoc updates (including a contribution by Andy Allan).</li>
<li>Separated declarations and code in the C extensions to avert warnings on some compilers.</li>
</ul>
<p>RGeo is a spatial data library for Ruby, providing full implementations of the standard spatial data types. It is the basis for a suite of useful gems for writing geospatial applications in Ruby and Rails. For more information, see the documentation at <a title="RGeo documentation" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">http://virtuoso.rubyforge.org/rgeo/README_rdoc.html</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-azuma.com/blog/archives/132/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Geo-Rails part 5: Spatial Data Formats</title>
		<link>http://www.daniel-azuma.com/blog/archives/125</link>
		<comments>http://www.daniel-azuma.com/blog/archives/125#comments</comments>
		<pubDate>Mon, 19 Dec 2011 09:36:32 +0000</pubDate>
		<dc:creator>Daniel Azuma</dc:creator>
				<category><![CDATA[GeoRails]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[GeoJSON]]></category>
		<category><![CDATA[PostGIS]]></category>
		<category><![CDATA[RGeo]]></category>
		<category><![CDATA[Shapefile]]></category>
		<category><![CDATA[WKB]]></category>
		<category><![CDATA[WKT]]></category>

		<guid isPermaLink="false">http://www.daniel-azuma.com/blog/?p=125</guid>
		<description><![CDATA[The location revolution is a revolution of data. Ubiquitous data, from mobile GPS and user input as well as from census and other datasets, is what makes location-aware applications possible. And so the first task of many geospatial projects is &#8230; <a href="http://www.daniel-azuma.com/blog/archives/125">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The location revolution is a revolution of data. Ubiquitous data, from mobile GPS and user input as well as from census and other datasets, is what makes location-aware applications possible. And so the first task of many geospatial projects is to determine how to find and utilize (and, in some cases, produce) external data.</p>
<p>In this article, we will survey some of the important spatial data formats, including serialization, file formats, and API-oriented formats. Specifically, we will look at:</p>
<ul>
<li>Basic serialization using WKT and WKB</li>
<li>Variants on WKT and WKB</li>
<li>Reading public datasets from shapefiles</li>
<li>Web service oriented formats such as GeoJSON</li>
<li>XML-based formats commonly used in web services</li>
</ul>
<p>We will also go over a few quick examples using Ruby and <a title="RGeo" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">RGeo</a>. This will be a fairly high-level overview and we won&#8217;t go into a lot of detail. We&#8217;ll take deeper looks at some of these formats in future articles.</p>
<p>This is part 5 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</p>
<h2><span id="more-125"></span>The standard OGC serialization formats</h2>
<p>If, after reading <a title="Geo-Rails part 3" href="http://www.daniel-azuma.com/blog/archives/88" target="_blank">part 3</a>, you looked through the Simple Feature interfaces (or the corresponding RGeo interfaces), you may have noticed two serialization methods provided for geometries: <code>as_text</code> and <code>as_binary</code>. These methods respectively output the &#8220;Well-Known Text&#8221; and &#8220;Well-Known Binary&#8221; representations of the geometry. These two standard serialization formats are defined by the <a title="Simple Features Spec" href="http://www.opengeospatial.org/standards/sfa" target="_blank">OGC Simple Feature Access specification</a>, and are supported by most GIS systems.</p>
<p>Well-Known Text (often abbreviated WKT) is a human-readable and parseable text-based format for all geometry objects. You can read the exact format specification in the Simple Features Spec, but a few examples are probably sufficient to get the general hang of it.</p>
<pre>Point(-122.1 47.2)
LineString(2 4, 5 4, 5 8, 2 4)
Polygon((0 0, 5 0, 5 5, 0 5, 0 0), (2 2, 2 3, 3 3, 3 2, 2 2))
MultiPoint((-122.1 47.2), (-93.5 39.4))
GeometryCollection(Point(3 5), LineString(-2 0, -3 -4))
MultiLineString EMPTY</pre>
<p>Don&#8217;t confuse the simple features WKT format with the coordinate system WKT format we covered in <a title="Geo-Rails part 4" href="http://www.daniel-azuma.com/blog/archives/106" target="_blank">part 4</a>. Unfortunately, both are commonly known as Well-Known Text (WKT), but they are distinct formats: one represents geometric objects whereas the other represents coordinate systems.</p>
<p>Well-Known Binary (often abbreviated WKB) is a binary format that uses numeric codes and IEEE floating-point representations. It is not human-readable but is much more compact than WKT.</p>
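<p>To make the binary layout concrete, here is a minimal sketch that builds the WKB for a two-dimensional point by hand, using only Ruby&#8217;s <code>Array#pack</code>. It assumes little-endian byte order and the standard type code 1 for a point; the helper name <code>point_wkb</code> is made up for this illustration and is not part of RGeo.</p>
<pre># Hand-rolled WKB for a 2D point (illustration only; not an RGeo API).
def point_wkb(x, y)
  byte_order = 1   # 1 means little-endian (NDR)
  type_code  = 1   # 1 means Point
  # C = one unsigned byte, V = 32-bit little-endian unsigned integer,
  # E = 64-bit little-endian IEEE double
  [byte_order, type_code, x.to_f, y.to_f].pack('CVEE')
end

wkb = point_wkb(1, 2)
puts wkb.bytesize          # 21 bytes: 1 + 4 + 8 + 8
puts wkb.unpack('H*')[0]   # the same bytes as a hex string</pre>
<p>In practice you would let RGeo or your database produce WKB; the point here is only that the format is a byte-order flag, a numeric type code, and raw floating-point values.</p>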
<p>Using RGeo, you can obtain the WKT and WKB representations of a geometric object by calling <code>as_text</code> and <code>as_binary</code>, respectively. Factory objects will provide methods to parse WKT and WKB format and recover the geometric object.</p>
<pre>require 'rgeo'

factory = RGeo::Cartesian.factory
point = factory.point(1, 2)
wkt = point.as_text   # =&gt; WKT such as "POINT (1.0 2.0)"; exact formatting may vary
point2 = factory.parse_wkt(wkt)
point == point2       # =&gt; true</pre>
<h2>Variants on WKT and WKB</h2>
<p>As simple and well-supported as they are, WKT and WKB have several important weaknesses that have caused headaches for spatial databases and applications. In <a title="Geo-Rails part 4" href="http://www.daniel-azuma.com/blog/archives/106" target="_blank">part 4</a>, we saw that to properly interpret a geometric object, you need to know the coordinate system, which is usually specified by a spatial reference ID (SRID). Unfortunately, neither WKT nor WKB includes a way to represent the SRID. They expect it to be specified or implied elsewhere, which is not always the case.</p>
<p>Furthermore, some applications use additional coordinates in their geometric data. Applications that store altitude or other third-dimensional data may include a &#8220;Z&#8221; coordinate in their geometries. Other applications may include a measurement (such as temperature or population) stored in an &#8220;M&#8221; coordinate. Version 1.1 of the Simple Features Spec (and the corresponding WKT and WKB specifications) does not directly support these extra coordinates (although version 1.2 does address this, as we will see).</p>
<p>Finally, neither WKT nor WKB by themselves provide a way to associate metadata, such as object names or other properties, with geometric objects. This limits their usefulness as a complete format for data transmission.</p>
<p>Because of these limitations, several variants have appeared that you should be aware of. The <a title="PostGIS" href="http://www.postgis.org/" target="_blank">PostGIS</a> database supports an extension to WKT called &#8220;EWKT&#8221;, which supports SRID as well as &#8220;Z&#8221; and &#8220;M&#8221; coordinates. The SRID, if present, appears at the front of the EWKT string:</p>
<pre>SRID=4326;Point(-122.34978 47.62058)</pre>
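<p>If you need to handle that prefix without a full EWKT parser, a few lines of plain Ruby are enough. This is only a sketch: the helper name <code>split_ewkt</code> is hypothetical, and it assumes the SRID is a plain non-negative integer written exactly as in the example above.</p>
<pre># Split an optional "SRID=n;" prefix from an EWKT string.
# Returns [srid, wkt]; srid is nil when there is no prefix.
def split_ewkt(ewkt)
  match = /\ASRID=(\d+);(.*)\z/m.match(ewkt)
  match ? [match[1].to_i, match[2]] : [nil, ewkt]
end

srid, wkt = split_ewkt('SRID=4326;Point(-122.34978 47.62058)')
puts srid   # 4326
puts wkt    # Point(-122.34978 47.62058)</pre>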
<p>EWKT allows &#8220;Z&#8221; and &#8220;M&#8221; coordinates to be appended to each pair of coordinates as third and fourth coordinate values. When both &#8220;Z&#8221; and &#8220;M&#8221; are present (i.e. four coordinate values per point), the third coordinate is used for &#8220;Z&#8221; while the fourth is used for &#8220;M&#8221;. If only one is used (i.e. three coordinate values per point), you must specify whether it is &#8220;Z&#8221; or &#8220;M&#8221;. Here are some examples:</p>
<pre>Point(-122.34978 47.62057 20.0 -3)  # X,Y,Z,M in EWKT
PointM(-122.34978 47.62057 -3)      # X,Y,M
PointZ(-122.34978 47.62057 20.0)    # X,Y,Z</pre>
<p>PostGIS also defines a corresponding &#8220;EWKB&#8221; format with appropriate extensions to the binary format to support SRID as well as Z and M. EWKB is (or at least appears to be) the native internal format used by PostGIS to represent geometric data.</p>
<p>More recent versions of the OGC Simple Features Spec (version 1.2 and later) also provide support for Z and M. However, beware that the OGC format is not the same as the PostGIS EWKT and EWKB. The WKT update expects a space between the geometry type and the Z/M specifier, and it also requires the modifier in the &#8220;four-dimensional&#8221; ZM case:</p>
<pre>Point ZM(-122.34978 47.62057 20.0 -3)  # X,Y,Z,M in WKT 1.2
Point M(-122.34978 47.62057 -3)        # X,Y,M
Point Z(-122.34978 47.62057 20.0)      # X,Y,Z</pre>
<p>Furthermore, the updated WKT format still does not support an SRID. The updated WKB similarly supports Z and M (but not SRID), but uses different binary codes than those used by EWKB. Hence, these two extensions are not fully compatible with each other.</p>
<p>Because of this fragmentation, neither of these extensions is, in practice, used frequently for long-term serialization. However, you will likely need to work with EWKT at some point if you use PostGIS, so it is important to be familiar with it.</p>
<p>RGeo provides support for parsing and generating both variants in the <code>RGeo::WKRep</code> module. See the <a title="RGeo documentation" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">rdocs</a> for more details. Here&#8217;s a really quick code example as a starting point:</p>
<pre>parser = RGeo::WKRep::WKTParser.new(nil, :support_ewkt =&gt; true)
point = parser.parse('SRID=4326;Point(-122.1 47.3)')
point.srid   # =&gt; 4326</pre>
<h2>Shapefiles and public datasets</h2>
<p>Location is driven by data, and a lot of the data you will need to work with will likely come in the form of <em>shapefiles</em>. The shapefile is a flat file format for geospatial data originally developed by <a title="ESRI" href="http://www.esri.com/" target="_blank">ESRI</a> for storing sets of geographic features. It supports certain vector shapes&#8211; points, lines, and polygons&#8211; along with associated attributes. Although shapefile began as a proprietary format, the format specification is readily available, and it is now a <em>de facto</em> standard for large datasets, including those provided by government agencies such as the <a title="Census" href="http://www.census.gov/" target="_blank">US Census Bureau</a>.</p>
<p>A shapefile actually consists of three (and sometimes more) related files, each with the same base filename but different extensions. The main file has the extension &#8220;<code>.shp</code>&#8221; and contains the geometric data itself in a binary format. An auxiliary &#8220;<code>.shx</code>&#8221; file provides a simple flat index allowing random access into the shapefile. A second auxiliary &#8220;<code>.dbf</code>&#8221; file provides the attribute data in <a title="dBASE" href="http://dbase.com/" target="_blank">dBASE</a> format. All shapefiles should have those three core files, although some shapefiles may include additional files containing coordinate system, spatial index, or other related information.</p>
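<p>Before handing a shapefile to a parser, it can be worth checking that the three core files are all present. The following sketch uses only the Ruby standard library; the method name <code>missing_shapefile_parts</code> is hypothetical, and it assumes lowercase file extensions.</p>
<pre># Report which of the three core files are missing for a given .shp path.
def missing_shapefile_parts(shp_path)
  base = shp_path.sub(/\.shp\z/, '')
  %w[.shp .shx .dbf].reject { |ext| File.exist?(base + ext) }
end

missing = missing_shapefile_parts('myfile.shp')
if missing.empty?
  puts 'All core files are present.'
else
  puts "Missing: #{missing.join(', ')}"
end</pre>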
<p>Most Rails applications will not read a shapefile directly, but will instead transfer the data to a spatial database such as PostGIS for rapid query and data retrieval. In Ruby, you can use the <a title="rgeo-shapefile gem" href="http://virtuoso.rubyforge.org/rgeo-shapefile/README_rdoc.html" target="_blank">rgeo-shapefile</a> gem to help with this task. This gem does the heavy lifting involved with parsing and analyzing a shapefile, and exposes the data to you as RGeo geometric objects. You should also install the <a title="DBF gem" href="https://rubygems.org/gems/dbf" target="_blank">dbf</a> gem, which lets you read the dBASE attributes in the shapefile.</p>
<pre>% gem install rgeo-shapefile
% gem install dbf</pre>
<p>Once you have the gems installed, and you&#8217;ve downloaded and unpacked a shapefile, use the <code>RGeo::Shapefile::Reader</code> class to open and read the file. The following example reads objects sequentially:</p>
<pre>require 'rgeo/shapefile'

factory = RGeo::Geographic.spherical_factory(:srid =&gt; 4326)
RGeo::Shapefile::Reader.open('myfile.shp', :factory =&gt; factory) do |file|
  file.each do |record|
    geom = record.geometry
    # geom is now an RGeo geometry object.
    name = record['Name']
    # You can read any other attribute similarly.
    # Now, you can do whatever you want with the data,
    # such as inserting rows into your database...
  end
end</pre>
<p>Notice that we provide a factory for the objects being read. Shapefiles generally do not provide an SRID, so we must supply that. The above example assumes the shapefile contains latitude-longitude coordinates in WGS84.</p>
<p>The <code>RGeo::Shapefile::Reader</code> class also lets you do random access reads, and get other information about the shapefile&#8217;s contents. See the <a title="rgeo-shapefile rdocs" href="http://virtuoso.rubyforge.org/rgeo-shapefile/README_rdoc.html" target="_blank">rdocs</a> for more details. The gem does not currently support writing shapefiles, but that feature is on the roadmap.</p>
<p>For more information on the shapefile format itself, you can find the original ESRI specification at <a title="Shapefile specification" href="http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf" target="_blank">http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf</a>. Another common (C-based) implementation of shapefile is Shapelib, which you can find at <a title="ShapeLib" href="http://shapelib.maptools.org/" target="_blank">http://shapelib.maptools.org/</a>.</p>
<h2>Web services and GeoJSON</h2>
<p>Another way to obtain location data is to call a web service such as <a title="Google Places API" href="http://code.google.com/apis/maps/documentation/places/" target="_blank">Google Places</a>, <a title="SimpleGeo" href="https://simplegeo.com/" target="_blank">SimpleGeo</a>, or <a title="Factual" href="http://www.factual.com/" target="_blank">Factual</a>. These services do the heavy lifting of curating, deduping, and managing location data, and generally provide an HTTP REST API letting you query for location information of interest.</p>
<p>There are a number of different types of web services, including geocoders, point of interest search, location properties, and others. I&#8217;ll write up a survey of useful location-oriented web services in a later article. For this current article, however, we are interested in data formats that would typically be returned from a point of interest search. When you make a query, what sort of data can you expect to get?</p>
<p>In many cases, the web service will define its own schema for the returned data. You must then parse the returned document yourself to extract the information you want. There are well-known gems available for this task, such as <a title="JSON gem" href="http://flori.github.com/json/" target="_blank">json</a> for parsing <a title="JSON format" href="http://www.json.org/" target="_blank">JSON</a>, and <a title="Nokogiri XML parser" href="http://nokogiri.org/" target="_blank">nokogiri</a> for parsing XML. There are also, however, a few semi-standard schemas commonly used by a number of web services. Here we will take a quick tour of some of these formats and how you can go about using them.</p>
<p><strong>GeoJSON</strong> is an important emerging standard commonly used by SimpleGeo and similar modern APIs. It provides a standard JSON representation for each geometric type, as well as support for bounding boxes, coordinate systems, and a set of properties. Following is an example of GeoJSON, lifted out of the specification:</p>
<pre>{ "type": "FeatureCollection",
  "features": [
    { "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [102.0, 0.5]},
      "properties": {"prop0": "value0"}
      },
    { "type": "Feature",
      "geometry": {
        "type": "LineString",
        "coordinates": [
          [102.0, 0.0], [103.0, 1.0], [104.0, 0.0], [105.0, 1.0]
          ]
        },
      "properties": {
        "prop0": "value0",
        "prop1": 0.0
        }
      },
    { "type": "Feature",
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0],
            [100.0, 1.0], [100.0, 0.0] ]
          ]
      },
      "properties": {
        "prop0": "value0",
        "prop1": {"this": "that"}
        }
      }
    ]
  }</pre>
<p>The core object type in GeoJSON is the <em>Feature</em>, which consists of a geometry and a set of properties. The geometry can be any of the OGC types, and its internal representation is closely modeled on WKT. Properties are simply named key-value pairs whose values can be any JSON object.</p>
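<p>Because GeoJSON is ordinary JSON, you can inspect this structure with nothing but a JSON parser. The sketch below uses Ruby&#8217;s standard <code>json</code> library on a one-feature collection trimmed from the example above; it does not validate the document or build geometry objects.</p>
<pre>require 'json'

doc = '{"type":"FeatureCollection","features":[' \
      '{"type":"Feature",' \
      '"geometry":{"type":"Point","coordinates":[102.0,0.5]},' \
      '"properties":{"prop0":"value0"}}]}'

collection = JSON.parse(doc)
collection['features'].each do |feature|
  geometry = feature['geometry']
  x, y = geometry['coordinates']   # a Point holds a single [x, y] pair
  props = feature['properties']
  puts "#{geometry['type']} at x=#{x}, y=#{y}; properties: #{props.keys.join(', ')}"
end</pre>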
<p>GeoJSON is simple and highly versatile, and is often an ideal format both for consuming and producing geospatial data. From Ruby, you can use the <a title="rgeo-geojson gem" href="http://virtuoso.rubyforge.org/rgeo-geojson/README_rdoc.html" target="_blank">rgeo-geojson</a> gem to read and write GeoJSON. Here are some quick examples to get you started:</p>
<pre>require 'rgeo/geo_json'

str1 = '{"type":"Point","coordinates":[1,2]}'
geom = RGeo::GeoJSON.decode(str1, :json_parser =&gt; :json)
geom.as_text              # =&gt; "POINT(1.0 2.0)"

str2 = '{"type":"Feature","geometry":{"type":"Point","coordinates":' +
  '[2.5,4.0]},"properties":{"color":"red"}}'
feature = RGeo::GeoJSON.decode(str2, :json_parser =&gt; :json)
feature['color']          # =&gt; 'red'
feature.geometry.as_text  # =&gt; "POINT(2.5 4.0)"

hash = RGeo::GeoJSON.encode(feature)
hash.to_json == str2      # =&gt; true</pre>
<p>For more information on GeoJSON, see <a title="GeoJSON" href="http://geojson.org/" target="_blank">http://geojson.org/</a>. The actual spec hosted on the website is quite short and very readable. You can find more information on the rgeo-geojson gem from its <a title="GeoJSON rdoc" href="http://virtuoso.rubyforge.org/rgeo-geojson/README_rdoc.html" target="_blank">rdocs</a>.</p>
<h2>XML-based formats</h2>
<p>Although JSON is often a format of choice for many modern web services because of its simplicity and its close affinity with JavaScript and similar high-level languages, XML is still the established standard in many fields and applications. GIS services, in particular, have a long tradition of XML-based representation, and there are a number of XML-based geospatial formats you may encounter when writing location-aware applications. Among them:</p>
<p><strong>GeoRSS</strong> is a family of RSS extensions for embedding geospatial data into RSS or Atom feeds, often used to spatially tag feed entries. It comes in two flavors, <em>Simple GeoRSS</em> and <em>GML GeoRSS</em>. Simple GeoRSS is designed for simplicity, and supports a limited set of features. Notably, not all the OGC geometric types can be represented, and coordinate system is limited to WGS84 latitude/longitude. GML GeoRSS is a more full-featured but much more complex format, essentially a profile of <em>GML</em>, which we will cover below. Most actual implementations of GeoRSS are of the Simple flavor.</p>
<p>Below are a couple of examples of a basic GeoRSS element from an RSS feed, first in the Simple flavor and then in the GML flavor.</p>
<pre>&lt;georss:point&gt;47.604828 -122.330779&lt;/georss:point&gt;</pre>
<pre>&lt;georss:where&gt;
  &lt;gml:Point&gt;
    &lt;gml:pos&gt;47.604828 -122.330779&lt;/gml:pos&gt;
  &lt;/gml:Point&gt;
&lt;/georss:where&gt;</pre>
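<p>Note that both examples list latitude first and longitude second, the reverse of the X-then-Y order used in the WKT examples earlier in this article. A minimal sketch of converting the text of a <code>georss:point</code> into WKT follows; the helper name <code>georss_point_to_wkt</code> is made up for illustration, and it assumes the text has already been extracted from the feed as a string.</p>
<pre># Convert the "lat lon" text of a Simple GeoRSS point into WKT,
# which expects "x y" (that is, longitude then latitude).
def georss_point_to_wkt(text)
  lat, lon = text.split.map { |value| Float(value) }
  "Point(#{lon} #{lat})"
end

puts georss_point_to_wkt('47.604828 -122.330779')
# prints Point(-122.330779 47.604828)</pre>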
<p>As of this writing, the georss.org website appears to be unmaintained and possibly hacked. The best starting point I can recommend for GeoRSS is an OGC whitepaper at <a title="GeoRSS whitepaper from OGC" href="http://portal.opengeospatial.org/files/?artifact_id=15755" target="_blank">http://portal.opengeospatial.org/files/?artifact_id=15755</a>.</p>
<p>I&#8217;m not currently aware of an RGeo-based Ruby implementation of GeoRSS. The older <a title="GeoRuby gem" href="http://georuby.rubyforge.org/" target="_blank">GeoRuby</a> gem, however, does have basic support for GeoRSS.</p>
<p><strong>Geography Markup Language</strong> (or <strong>GML</strong>) is an XML-based object model intended to describe geographic information. Its specification is maintained by the Open Geospatial Consortium. GML by itself is a highly general and flexible model that can represent not only geometric objects and coordinate systems such as we have looked at so far in this article series, but also observations, topological information, temporal information, and various other related entities.</p>
<p>You generally don&#8217;t work with GML directly, but instead use an application XML schema that references GML internally. Furthermore, most application schemas don&#8217;t utilize the entire GML specification, but a relevant subset, known as a <em>GML profile</em>. For example, GML GeoRSS is an application schema referencing a GML profile relevant to geotagging feed entries.</p>
<p>Another common GML-based XML schema is <em>CityGML</em> (<a title="CityGML website" href="http://www.citygml.org/" target="_blank">http://www.citygml.org/</a>), which is designed to model urban objects. CityGML is commonly used, for example, to model 3D visualizations of cities.</p>
<p>For more information on GML as a whole, you can review the OGC spec at <a title="OGC GML spec" href="http://www.opengeospatial.org/standards/gml" target="_blank">http://www.opengeospatial.org/standards/gml</a>.</p>
<p>I&#8217;m not currently aware of any specific Ruby support for GML or its various dialects.</p>
<p><strong>KML</strong> (or <strong>Keyhole Markup Language</strong>) is an XML schema that originated at Google for describing features in <a title="Google Earth" href="http://earth.google.com/" target="_blank">Google Earth</a>, but was later standardized by the OGC. Although it does have some overlap with GML, KML is often seen as complementary because of its particular emphasis on visualization. Its intended use is to describe how to display features within a Google Earth style application. You can, for example, open a KML file with Google Earth to display its contents.</p>
<p>For more information on KML, see the Google documentation at <a title="KML documentation from Google" href="http://code.google.com/apis/kml/documentation/" target="_blank">http://code.google.com/apis/kml/documentation/</a> or the OGC specification at <a title="OGC KML spec" href="http://www.opengeospatial.org/standards/kml" target="_blank">http://www.opengeospatial.org/standards/kml</a>.</p>
<p>I&#8217;m not currently aware of any specific Ruby support for KML.</p>
<h2>Where to go from here</h2>
<p>This article has covered just a few of the most common and/or promising major spatial data formats. There are a number of others currently in use, including many locale or application-specific forms. But as you can see, Ruby support for even the major formats is currently rather thin. We still have much work to do on our tools.</p>
<p>As the principal author of RGeo, I&#8217;m looking for help in this area. I released the rgeo-geojson and rgeo-shapefile gems based on work I&#8217;ve done to integrate my own applications with those formats. However, I haven&#8217;t yet had the need to actually use one of the XML formats, and as a result I haven&#8217;t written any tools to help with them. There is currently quite a bit of room to contribute to the community in this area.</p>
<p>Next week I&#8217;m going to take a break for the holidays, but I expect to release the next planned article on scaling spatial applications with the new year. Stay tuned, and let&#8217;s bring Rails down to earth!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-azuma.com/blog/archives/125/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Geo-Rails part 4: Coordinate Systems and Projections</title>
		<link>http://www.daniel-azuma.com/blog/archives/106</link>
		<comments>http://www.daniel-azuma.com/blog/archives/106#comments</comments>
		<pubDate>Tue, 13 Dec 2011 05:31:45 +0000</pubDate>
		<dc:creator>Daniel Azuma</dc:creator>
				<category><![CDATA[GeoRails]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.daniel-azuma.com/blog/?p=106</guid>
		<description><![CDATA[When people speak of a learning curve in geospatial programming, they&#8217;re usually referring to handling coordinate systems. It&#8217;s true that many spatial applications require close attention to the coordinate system, and it&#8217;s true that there are some difficult concepts involved. &#8230; <a href="http://www.daniel-azuma.com/blog/archives/106">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>When people speak of a learning curve in geospatial programming, they&#8217;re usually referring to handling coordinate systems. It&#8217;s true that many spatial applications require close attention to the coordinate system, and it&#8217;s true that there are some difficult concepts involved. However, it&#8217;s been my experience that once the light bulb turns on, it opens up a lot of the power and potential of geodata.</p>
<p>In this article, we&#8217;ll take a first look at coordinate systems and geographic projections. We will:</p>
<ul>
<li>Examine the importance and effect of coordinate system differences</li>
<li>Survey the various coordinate systems used for geospatial data</li>
<li>Become familiar with coordinate system representations and SRIDs</li>
<li>Specify coordinate systems in RGeo factories</li>
<li>Use RGeo to convert data between coordinate systems</li>
<li>Learn how to handle coordinate systems in Rails</li>
</ul>
<p>This is part 4 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails article series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</p>
<p><span id="more-106"></span></p>
<h2>Why coordinate systems are important: a cautionary fable</h2>
<p>Once upon a time, a plane took off from San Francisco, California on a routine flight to Athens, Greece. The captain, being both valiant and diligent with his passengers&#8217; safety, sent a Twitter message to air traffic control. He wanted to ensure that his flight route would not be disrupted, for he had read on Hacker News that the diabolical EvilVolcano in Iceland was erupting, sending a deadly ash cloud high into the air lanes.</p>
<p>Meanwhile, an air traffic technician, having just finished installing a brand new Rails-based flight planning application, received the tweet. &#8220;<code>@air_traffic_control pls chk flt path SFO-ATH. Far enuf fr #EvilVolcano?</code>&#8221;</p>
<p>Excited to use his new tool, the technician got busy with his analysis. He looked up the latitude-longitude coordinates of San Francisco and Athens, and plotted a straight line between them, using Google Maps to verify that his path looked correct. Then he looked up the latitude-longitude coordinates of the diabolical EvilVolcano, and calculated the distance between it and the plane&#8217;s expected path.</p>
<p>&#8220;Lo!&#8221; he exclaimed. &#8220;Surely the flight path shall follow a straight line between two points, right along the 38-degree latitude line. But look&#8211; Iceland is nowhere near that path. The safety of our valiant pilot is assured!&#8221;</p>
<div id="attachment_111" class="wp-caption aligncenter" style="width: 522px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/wrong_path.png"><img class="size-full wp-image-111" title="wrong_path" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/wrong_path.png" alt="" width="512" height="280" /></a><p class="wp-caption-text">The flight path as seen by the air traffic technician</p></div>
<p>The technician proudly tweeted back his findings: &#8220;<code>@valiant_pilot Flt path safe dist fr #EvilVolcano. Have pleasant journey.</code>&#8221;</p>
<p>Having received this response on Twitter, the pilot proceeded to take off on his planned route. The last tweet received from the ill-fated flight, some seven hours later, read as follows: &#8220;<code>Oceanic Air flt 815 encountering ash #SmokeMonster from #EvilVolcano. #Mayday #Mayday</code>&#8221;. And the rest is history.</p>
<p>What went wrong?</p>
<p>For the most part, the air traffic technician did the right thing. He looked up the latitude-longitude coordinates of the departure and arrival cities, drew a straight line path between them, and then computed the distance between that path and the EvilVolcano in Iceland. That is, he computed:</p>
<pre>Distance(LineString(-122.4 37.8, 23.7 37.9), Point(-19.6 63.6))</pre>
<p>However, he missed one thing. The shape of a &#8220;straight&#8221; line drawn between two points, and thus the distance calculated, may be vastly different depending on what coordinate system you are using, even though the latitude and longitude coordinates are the same!</p>
<p>Google Maps uses a <em>Mercator Projection</em> to display a world map. In this flat coordinate system, a straight line between San Francisco and Athens follows the 38-degree latitude line, passing through muggy Virginia and sunny southern Spain on its way to balmy (albeit bankrupt) Athens.</p>
<p>But the earth itself is not flat. Using a spherical coordinate system, the straight line shortest path between the two cities across the surface of the globe passes further north, directly over chilly and volcano-infested Iceland. This is the actual flight path of the ill-fated valiant pilot.</p>
<div id="attachment_112" class="wp-caption aligncenter" style="width: 448px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/actual_path.png"><img class="size-full wp-image-112" title="actual_path" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/actual_path.png" alt="" width="438" height="374" /></a><p class="wp-caption-text">The actual shortest straight-line path from San Francisco to Athens</p></div>
<p>Simply by using the wrong coordinate system, the air traffic technician got a grossly inaccurate answer, resulting in dozens of innocent passengers becoming instant prime-time celebrities.</p>
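<p>You can check the fable&#8217;s arithmetic with a few lines of trigonometry. The sketch below assumes a perfectly spherical earth and uses the standard great-circle midpoint formula; the method name <code>great_circle_midpoint</code> is made up for this illustration and is not an RGeo call. It compares the halfway point of the spherical path with the halfway point of the flat path.</p>
<pre># Midpoint of the great-circle path between two lat/lon points (degrees),
# assuming a spherical earth.
def great_circle_midpoint(lat1, lon1, lat2, lon2)
  to_rad = Math::PI / 180.0
  phi1 = lat1 * to_rad
  phi2 = lat2 * to_rad
  dlon = (lon2 - lon1) * to_rad
  bx = Math.cos(phi2) * Math.cos(dlon)
  by = Math.cos(phi2) * Math.sin(dlon)
  lat = Math.atan2(Math.sin(phi1) + Math.sin(phi2),
                   Math.sqrt((Math.cos(phi1) + bx)**2 + by**2))
  lon = lon1 * to_rad + Math.atan2(by, Math.cos(phi1) + bx)
  [lat / to_rad, lon / to_rad]
end

sfo = [37.8, -122.4]   # latitude, longitude
ath = [37.9, 23.7]

flat_mid_lat = (sfo[0] + ath[0]) / 2.0
sphere_mid = great_circle_midpoint(sfo[0], sfo[1], ath[0], ath[1])

puts "Flat midpoint latitude:      #{flat_mid_lat.round(1)}"
puts "Spherical midpoint latitude: #{sphere_mid[0].round(1)}"</pre>
<p>The flat midpoint sits at about 38 degrees north, while the spherical midpoint comes out near 69 degrees north, nowhere near the 38-degree line the technician assumed.</p>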
<h2>Coordinate systems for geo data</h2>
<p>At a basic level, a coordinate system can be thought of as providing &#8220;meaning&#8221; to a set of coordinates. When you see the location &#8220;<code>Point(-19.6 63.6)</code>&#8221;, how do you interpret those numbers? They could be the latitude and longitude of the Iceland volcano, but they could equally be measurements in feet from your front door, or light years from Alpha Centauri. The coordinate system is what differentiates these cases.</p>
<p>Location applications generally work with coordinate systems related to the earth&#8217;s surface, and these coordinate systems fall into three types.</p>
<p><strong>Geocentric</strong> coordinate systems are three-dimensional coordinate systems with the origin located at the earth&#8217;s center. You won&#8217;t generally see much data in a geocentric coordinate system, but it is sometimes a convenient coordinate system to use for computational geometry and analysis algorithms.</p>
<div id="attachment_113" class="wp-caption aligncenter" style="width: 361px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/geocentric.png"><img class="size-full wp-image-113 " title="geocentric" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/geocentric.png" alt="" width="351" height="321" /></a><p class="wp-caption-text">Geocentric coordinates measure X, Y, and Z distances from the center of the earth. (Credit: http://kartoweb.itc.nl/geometrics/Coordinate%20systems/coordsys.html)</p></div>
<p><strong>Geographic</strong> coordinate systems are the familiar latitude-longitude systems identifying points on the earth&#8217;s surface in terms of degrees. The most common geographic coordinate system, the one used by GPS and expected by most mapping applications, is known as &#8220;<em>WGS 84</em>&#8221;.</p>
<div id="attachment_114" class="wp-caption aligncenter" style="width: 399px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/geographic.png"><img class="size-full wp-image-114 " title="geographic" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/geographic.png" alt="" width="389" height="379" /></a><p class="wp-caption-text">Geographic coordinates measure our familiar latitude and longitude. (Credit: http://kartoweb.itc.nl/geometrics/Coordinate%20systems/coordsys.html)</p></div>
<p><strong>Projected</strong> coordinate systems involve taking a portion of the earth&#8217;s surface and &#8220;flattening&#8221; it. These coordinate systems are planar and generally Cartesian for easy display and computation; however, they introduce various kinds and amounts of distortion. Whenever you see a map displayed on a piece of paper, a computer screen, or other flat medium (basically anything other than a globe), you are looking at a projection.</p>
<p>There are hundreds of different projections in use, from common projections used for world maps, to special-purpose projections used for specific regions. When you view a Google Map, you are looking at a <em>Mercator Projection</em>. This is a projection designed to preserve shapes and straight-line directions, at the expense of distorting sizes and distances away from the Equator. A Google Map, for instance, implies that Greenland is larger than Africa, when in fact it is much, much smaller.</p>
<div id="attachment_115" class="wp-caption aligncenter" style="width: 448px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/google_world.png"><img class="size-full wp-image-115" title="google_world" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/google_world.png" alt="" width="438" height="322" /></a><p class="wp-caption-text">Google Maps uses a Mercator Projection</p></div>
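<p>The size of this distortion is easy to quantify. On a spherical Mercator projection, lengths at a given latitude are stretched by a factor of 1/cos(latitude), and areas by the square of that factor. The sketch below simply evaluates that formula; the spherical-earth model and the method name <code>mercator_scale</code> are illustrative assumptions.</p>
<pre># Linear stretch factor of a spherical Mercator projection at a latitude.
def mercator_scale(lat_degrees)
  1.0 / Math.cos(lat_degrees * Math::PI / 180.0)
end

[0, 38, 60, 72].each do |lat|
  k = mercator_scale(lat)
  puts format('latitude %2d: lengths x%.2f, areas x%.2f', lat, k, k * k)
end</pre>
<p>At 60 degrees the stretch factor is 2, so areas appear four times too large; at Greenland&#8217;s latitudes the exaggeration is greater still.</p>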
<p>The United Nations logo includes a <em>polar projection</em> putting the north pole at the center. In this projection the pole is the least distorted region of the world, and everything else revolves around it, symbolizing the ideal of a world community privileging no particular country (except possibly the northern hemisphere, but we&#8217;ll ignore the politics for our purposes&#8230;)</p>
<div id="attachment_116" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/un_logo.png"><img class="size-full wp-image-116" title="un_logo" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/un_logo.png" alt="" width="300" height="255" /></a><p class="wp-caption-text">The United Nations logo includes a north polar projection</p></div>
<p>Finally, whenever you look at a folding map&#8211; whether a street map for a city, a state map, or any other map that shows a limited area&#8211; you are usually looking at a <em>local projection</em>, one specifically tailored to that particular limited area. Such projections define a particular area in which they make sense. Objects inside that boundary can usually be displayed with minimal distortion, while objects outside that boundary often cannot be described at all.</p>
<h2>Representing and specifying coordinate systems</h2>
<p>Whenever you receive location data&#8211; whether from geocoding, user input, an external database, or any other source&#8211; the data should come with a coordinate system. Currently, there are two common ways a coordinate system can be defined:</p>
<p><strong>Proj4 Syntax</strong>. One common way to specify a coordinate system is through a syntax defined by the <a title="Proj" href="http://proj.osgeo.org/" target="_blank">Proj</a> library. We won&#8217;t go into detail on the syntax here, but it is intended to describe how to convert data to and from that coordinate system. That is, if you receive data in a projected coordinate system and you have the Proj4 syntax for that projection, the Proj library can convert the data back into latitude and longitude, and vice versa. Because of this useful property, Proj4 syntax is very widely used. Below is an example of the Proj4 syntax for &#8220;NAD83 / Washington North&#8221;, a local projection commonly used for topographic mapping in northern Washington state. For now, don&#8217;t worry if you don&#8217;t understand every field. This is just an example so that you can recognize Proj4 syntax when you see it.</p>
<pre>+proj=lcc +lat_1=48.73333333333333 +lat_2=47.5 +lat_0=47
  +lon_0=-120.8333333333333 +x_0=500000.0001016001 +y_0=0
  +ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs</pre>
<p><strong>OGC well-known-text</strong> (or <strong>WKT</strong>). The <a title="OGC" href="http://www.opengeospatial.org/" target="_blank">Open Geospatial Consortium</a>, the standards body that developed the <a title="OGC simple features spec" href="http://www.opengeospatial.org/standards/sfa" target="_blank">Simple Feature Access</a> specification described in <a title="Geo-Rails part 3" href="http://www.daniel-azuma.com/blog/archives/88" target="_blank">part 3</a>, also developed a syntax for representing <a title="OGC coordinate transformation spec" href="http://www.opengeospatial.org/standards/ct" target="_blank">coordinate systems and transformations</a>. Below is the well-known-text for &#8220;NAD83 / Washington North&#8221;. Again, for now you don&#8217;t need to understand every field here, but you should be able to recognize WKT format when you see it.</p>
<pre>PROJCS["NAD83 / Washington North (ftUS)",
  GEOGCS["NAD83",
    DATUM["North_American_Datum_1983",
      SPHEROID["GRS 1980",6378137,298.257222101,
        AUTHORITY["EPSG","7019"]],
      AUTHORITY["EPSG","6269"]],
    PRIMEM["Greenwich",0,
      AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.01745329251994328,
      AUTHORITY["EPSG","9122"]],
    AUTHORITY["EPSG","4269"]],
  UNIT["US survey foot",0.3048006096012192,
    AUTHORITY["EPSG","9003"]],
  PROJECTION["Lambert_Conformal_Conic_2SP"],
  PARAMETER["standard_parallel_1",48.73333333333333],
  PARAMETER["standard_parallel_2",47.5],
  PARAMETER["latitude_of_origin",47],
  PARAMETER["central_meridian",-120.8333333333333],
  PARAMETER["false_easting",1640416.667],
  PARAMETER["false_northing",0],
  AUTHORITY["EPSG","2285"],
  AXIS["X",EAST],
  AXIS["Y",NORTH]]</pre>
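<p>One detail worth noticing when you compare the two representations: the Proj4 string gives its false easting (<code>+x_0</code>) in meters, while the WKT gives <code>false_easting</code> in US survey feet. The following plain-Ruby sketch checks that the two numbers describe the same offset, using the conversion factor that appears in both strings. The variable names are mine, chosen purely for illustration.</p>
<pre># Values copied from the two representations above.
proj4_x0_meters    = 500000.0001016001    # +x_0 from the Proj4 string
wkt_easting_feet   = 1640416.667          # false_easting from the WKT
meters_per_us_foot = 0.3048006096012192   # +to_meter, and the WKT UNIT entry

# Convert the WKT value from US survey feet to meters.
wkt_easting_meters = wkt_easting_feet * meters_per_us_foot

# The two values should agree to well under a millimeter.
difference = (wkt_easting_meters - proj4_x0_meters).abs
puts format('%.4f m vs %.4f m (difference %.9f m)',
            wkt_easting_meters, proj4_x0_meters, difference)</pre>
<p>So the two formats really are describing the same projection; they simply express some of its parameters in different units.</p>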
<p>As we saw above, each piece of geometry needs a corresponding coordinate system in order to specify the meaning of its coordinates and thus how to handle its data. Instead of attaching an entire Proj4 or WKT formatted string to every latitude-longitude point in a system, most geospatial systems provide a database of coordinate systems, each identified by an ID known as the <em>Spatial Reference ID</em> (or <em>SRID</em>). Each geometric data object then includes an SRID field referencing an entry in that database.</p>
<p>Technically, a geospatial system can provide its own spatial reference database and set its own SRIDs. However, in practice, many systems use a <em>de facto</em> standard dataset known as the <em>EPSG dataset</em>. This is a database of several thousand coordinate systems managed by the <a title="OGP" href="http://www.ogp.org.uk/" target="_blank">International Association of Oil &amp; Gas Producers</a>. The EPSG dataset is ubiquitous enough that most spatial database tools include a copy of it. A spatially-enabled <a title="PostGIS" href="http://www.postgis.org/" target="_blank">PostGIS</a> database, for example, automatically includes a table called <code>spatial_ref_sys</code> that is typically prepopulated with the EPSG dataset. You can look up SRIDs and get the coordinate system name, the WKT representation, and sometimes the Proj4 representation. The &#8220;NAD83 / Washington North&#8221; coordinate system example above has SRID 2285 in the EPSG database.</p>
<p>One important EPSG-specified SRID that you will encounter often is 4326. 4326 refers to the &#8220;WGS 84&#8221; geographic (latitude-longitude) coordinate system we mentioned earlier&#8211; the coordinate system used by the Global Positioning System (GPS). Typically, when you get a latitude-longitude coordinate from a GPS receiver, from a geocoder, from a Google map input, or from most other common sources, it will implicitly be in the EPSG 4326 coordinate system. This is so universally true that PostGIS currently mandates that the SRID must be set to 4326 when you create a column of type &#8220;geography&#8221; (that is, a latitude-longitude column), as we saw in <a title="Geo-Rails part 2" href="http://www.daniel-azuma.com/blog/archives/69" target="_blank">part 2</a>.</p>
<p>You can browse the EPSG database at <a title="spatialreference.org" href="http://www.spatialreference.org/" target="_blank">http://www.spatialreference.org/</a>.</p>
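<p>To make the idea of a spatial reference database concrete, here is a toy sketch in plain Ruby. It is not how PostGIS or RGeo store things internally; the hash and the <code>point_class</code> struct are hypothetical stand-ins I made up for illustration. The point is only that each geometry carries a small integer SRID, and the full definition is looked up when it is needed.</p>
<pre># A toy stand-in for a spatial reference database: SRID mapped to a
# Proj4 definition. A real database holds thousands of entries.
spatial_refs = {}
spatial_refs[4326] = '+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs'
spatial_refs[2285] = '+proj=lcc +lat_1=48.73333333333333 +lat_2=47.5 ' +
  '+lat_0=47 +lon_0=-120.8333333333333 +x_0=500000.0001016001 ' +
  '+y_0=0 +ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 ' +
  '+no_defs'

# Hypothetical geometry record: coordinates plus an SRID, rather than
# a copy of the whole definition string on every point.
point_class = Struct.new(:x, :y, :srid)
some_point = point_class.new(1266457.58, 230052.50, 2285)

# Look up the coordinate system only when it is actually needed.
puts spatial_refs.fetch(some_point.srid)</pre>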
<h2>Coordinate systems for RGeo factories</h2>
<p>In <a title="Geo-Rails part 3" href="http://www.daniel-azuma.com/blog/archives/88" target="_blank">part 3</a>, we discussed how <a title="RGeo" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">RGeo</a> manages coordinate systems through factories. Now that we understand coordinate systems in more detail, we can take a closer look at how to handle coordinate systems in Ruby.</p>
<p>RGeo supports two main types of factories: <em>Cartesian factories</em> and <em>geographic factories</em>. Cartesian factories are ideal for handling projected coordinate systems, in which the domain is flat. Geographic factories are useful for geographic coordinate systems representing latitude and longitude, in which the domain is curved like the surface of the earth.</p>
<p>When you create an RGeo factory, you may specify the exact coordinate system by passing arguments to the factory constructor. In most cases, you should provide an SRID using the <code>:srid</code> parameter. You may also provide Proj4 and/or WKT representations for the coordinate system using the <code>:proj4</code> and <code>:coord_sys</code> parameters, respectively. For example, here&#8217;s how you could create a factory designed to handle the &#8220;NAD83 / Washington North&#8221; projection we discussed earlier.</p>
<pre>north_wa_proj4 = '+proj=lcc +lat_1=48.73333333333333 +lat_2=47.5 ' +
  '+lat_0=47 +lon_0=-120.8333333333333 +x_0=500000.0001016001 ' +
  '+y_0=0 +ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 ' +
  '+no_defs'
north_wa_wkt = &lt;&lt;WKT
  PROJCS["NAD83 / Washington North (ftUS)",
    GEOGCS["NAD83",
      DATUM["North_American_Datum_1983",
        SPHEROID["GRS 1980",6378137,298.257222101,
          AUTHORITY["EPSG","7019"]],
        AUTHORITY["EPSG","6269"]],
      PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
      UNIT["degree",0.01745329251994328,
        AUTHORITY["EPSG","9122"]],
      AUTHORITY["EPSG","4269"]],
    UNIT["US survey foot",0.3048006096012192,
      AUTHORITY["EPSG","9003"]],
    PROJECTION["Lambert_Conformal_Conic_2SP"],
    PARAMETER["standard_parallel_1",48.73333333333333],
    PARAMETER["standard_parallel_2",47.5],
    PARAMETER["latitude_of_origin",47],
    PARAMETER["central_meridian",-120.8333333333333],
    PARAMETER["false_easting",1640416.667],
    PARAMETER["false_northing",0],
    AUTHORITY["EPSG","2285"],
    AXIS["X",EAST],
    AXIS["Y",NORTH]]
WKT
north_wa_factory = RGeo::Cartesian.factory(:srid =&gt; 2285,
  :proj4 =&gt; north_wa_proj4, :coord_sys =&gt; north_wa_wkt)</pre>
<p>Notice that, since the coordinate system is a projection, we&#8217;re using a Cartesian factory that will perform computations in a flat domain.</p>
<p>When you work with a projected coordinate system like this one, the coordinates themselves are expressed in the projection rather than in latitude and longitude. For example, the location of the Space Needle in Seattle, which is at latitude 47.620578, longitude -122.34978, is expressed as <code>(1266457.58, 230052.50)</code> in this coordinate system.</p>
<pre>space_needle = north_wa_factory.point(1266457.58, 230052.50)</pre>
<p>Let&#8217;s consider another example. If you want to work with latitudes and longitudes, say from a GPS receiver, then you should use the &#8220;WGS 84&#8221; coordinate system, which has SRID 4326. Looking up this coordinate system in <a title="Spatialreference.org" href="http://www.spatialreference.org/" target="_blank">spatialreference.org</a>, we can find its Proj4 and WKT forms:</p>
<pre>wgs84_proj4 = '+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs'
wgs84_wkt = &lt;&lt;WKT
  GEOGCS["WGS 84",
    DATUM["WGS_1984",
      SPHEROID["WGS 84",6378137,298.257223563,
        AUTHORITY["EPSG","7030"]],
      AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0,
      AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.01745329251994328,
      AUTHORITY["EPSG","9122"]],
    AUTHORITY["EPSG","4326"]]
WKT</pre>
<p>To create a factory for this coordinate system, we should use one of the geographic factories provided by RGeo. These factories perform computations over a curved earth rather than a flat earth. For example, they will correctly draw the line between San Francisco and Athens so that it passes through Iceland. You can use <code>RGeo::Geographic.spherical_factory</code> to create a factory that performs computations on a spherical earth:</p>
<pre>wgs84_factory = RGeo::Geographic.spherical_factory(:srid =&gt; 4326,
  :proj4 =&gt; wgs84_proj4, :coord_sys =&gt; wgs84_wkt)</pre>
<p>We now have a factory that will correctly manage features using latitude/longitude.</p>
<pre>space_needle = wgs84_factory.point(-122.34978, 47.620578)</pre>
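<p>To get a feel for why computing over a curved earth matters, here is a small plain-Ruby sketch that finds the midpoint of the great-circle path between two latitude-longitude points on a perfect sphere. This is a textbook spherical formula, not RGeo&#8217;s own code, and the method name and the approximate city coordinates are my own assumptions for illustration.</p>
<pre># Midpoint of the great-circle path between two latitude/longitude
# points, treating the earth as a perfect sphere. Angles in degrees.
def great_circle_midpoint(lat1, lon1, lat2, lon2)
  to_rad = Math::PI / 180.0
  phi1 = lat1 * to_rad
  phi2 = lat2 * to_rad
  dlon = (lon2 - lon1) * to_rad

  bx = Math.cos(phi2) * Math.cos(dlon)
  by = Math.cos(phi2) * Math.sin(dlon)

  mid_lat = Math.atan2(Math.sin(phi1) + Math.sin(phi2),
                       Math.sqrt((Math.cos(phi1) + bx)**2 + by**2))
  mid_lon = lon1 * to_rad + Math.atan2(by, Math.cos(phi1) + bx)

  [mid_lat / to_rad, mid_lon / to_rad]
end

# Approximate city coordinates (latitude, longitude), for illustration.
san_francisco = [37.7749, -122.4194]
athens        = [37.9838, 23.7275]

mid_lat, mid_lon = great_circle_midpoint(*san_francisco, *athens)
flat_mid_lat = (san_francisco[0] + athens[0]) / 2.0

puts format('great-circle midpoint: %.1f, %.1f', mid_lat, mid_lon)
puts format('flat-map midpoint latitude: %.1f', flat_mid_lat)</pre>
<p>With these inputs, the great-circle midpoint lands far to the north of either city, at roughly 69 to 70 degrees latitude, whereas simply averaging the two latitudes gives about 38 degrees. That gap is the kind of difference a geographic factory accounts for and a Cartesian factory does not.</p>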
<p>Technically, the earth is not a perfect sphere, but is slightly flattened. The WGS84 coordinate system actually uses a flattened ellipsoid to more closely model this shape. However, RGeo does not yet support ellipsoidal computations, so we used a spherical factory instead as an approximation. In many cases, this will be good enough, but distances computed on a sphere can differ from ellipsoidal results by a fraction of a percent, so if you need high accuracy, this is something to be aware of.</p>
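<p>You can see how flattened the WGS 84 ellipsoid is directly from the <code>SPHEROID</code> entry in the WKT above. Its two numbers are the equatorial radius in meters and the inverse flattening. The short plain-Ruby sketch below (variable names are mine) derives the polar radius from them using the standard relation between the two.</p>
<pre># SPHEROID parameters copied from the WGS 84 WKT above.
semi_major_axis    = 6378137.0        # equatorial radius, in meters
inverse_flattening = 298.257223563

# Polar radius implied by those two numbers: b = a * (1 - 1/f_inv).
semi_minor_axis = semi_major_axis * (1.0 - 1.0 / inverse_flattening)

puts format('polar radius: %.3f m', semi_minor_axis)
puts format('equatorial minus polar: %.3f m',
            semi_major_axis - semi_minor_axis)</pre>
<p>The difference works out to a little over 21 kilometers, or about a third of a percent of the radius.</p>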
<h2>Converting data between coordinate systems</h2>
<p>The above examples may have sparked an important question. In the examples in parts 2 and 3, we created factories without the benefit of the proj4 and wkt strings, and in some cases we also omitted the SRID. Under what circumstances are proj4, wkt, and/or SRID needed when you create an RGeo factory, and under what circumstances can you leave them out?</p>
<p>SRID is generally needed when you want to store your data in a PostGIS database. This is because PostGIS generally puts an SRID constraint on spatial columns that you create, in order to make sure you don&#8217;t mismatch coordinate systems. So, if you are storing data in or pulling data from PostGIS, you should always specify the SRID in the factory.</p>
<p>The proj4 string has a different purpose: it is used by the Proj library to describe how to convert coordinates between coordinate systems. Here is how this works.</p>
<p>Suppose you are reading a set of points from a data source that uses the &#8220;NAD83 / Washington North&#8221; coordinate system, and you want to convert them to latitude and longitude. To perform this conversion, you need two factories, one in the source coordinate system and one in the destination coordinate system. Both factories need to have their <code>:proj4</code> string set. This lets the Proj library understand both coordinate systems so it can figure out how to convert from one to the other.</p>
<p>In the above examples when we created our <code>north_wa_factory</code> and <code>wgs84_factory</code>, we provided the needed proj4 strings. So these factories are ready to be used.</p>
<p>Now, to actually convert the data, load it using the source factory, and then use RGeo&#8217;s &#8220;cast&#8221; mechanism to cast it to the other factory. Call <code>RGeo::Feature.cast</code>, pass it the original object and the destination factory, and set the <code>:project</code> argument to true to tell it to transform coordinates.</p>
<pre>space_needle = north_wa_factory.point(1266457.58, 230052.50)
space_needle_latlon = RGeo::Feature.cast(space_needle,
  :factory =&gt; wgs84_factory, :project =&gt; true)</pre>
<p>This code tells RGeo to take the <code>space_needle</code> object (which is in the projected coordinate system) and convert it to the WGS84 factory, while transforming (projecting) its coordinates so they are correct in the new factory&#8217;s coordinate system. As a result, <code>space_needle_latlon</code> is created, containing latitude and longitude coordinates, and with its factory set to <code>wgs84_factory</code>.</p>
<p>What about the WKT string? Currently, RGeo does not have any functional use for the WKT string; it is just an informational field. So you can omit it if you want. However, it turns out that in many cases you can theoretically use the WKT to transform coordinates in the same way as the Proj4 string. RGeo will likely provide this capability in the future&#8211; you will be able to pass WKT instead of Proj4 to allow factories to transform coordinates.</p>
<h2>Using Coordinate Systems in Rails</h2>
<p>In <a title="Geo-Rails part 2" href="http://www.daniel-azuma.com/blog/archives/69" target="_blank">part 2</a>, we looked at an example in which we created a PostGIS column that stored point data in geographic (latitude-longitude) coordinates. We did this by installing the activerecord-postgis-adapter, which extends ActiveRecord migrations to create spatial columns, and which exposes spatial attributes as RGeo data objects.</p>
<p>Now let&#8217;s consider another example. Suppose we obtained a set of polygons (say, zip code boundaries) from a data source that uses the &#8220;NAD83 / Washington North&#8221; projection. We could convert the polygons to latitude-longitude, but remember that doing so can actually change the shape of the polygon. So let&#8217;s suppose we decided this was unacceptable, and we need to store the polygons in the database in the projected coordinate system. Here&#8217;s the migration for this case:</p>
<pre>class CreateZipCodes &lt; ActiveRecord::Migration
  def change
    create_table :zip_codes do |t|
      t.integer :zip
      t.polygon :boundary, :srid =&gt; 2285
    end
  end
end</pre>
<p>The name of our column is &#8220;<code>boundary</code>&#8221;, and we set its type to &#8220;<code>polygon</code>&#8221;. Now, in the example in part 2, we had set the <code>:geographic</code> property of the column, indicating that it was storing latitude and longitude and that it should use the PostGIS features designed for that case. In this new example, we are storing projected coordinates, so we do not set <code>:geographic</code>. Instead, we just set the SRID to match the coordinate system that we are using. Setting an SRID on the database column actually sets up a database constraint: it allows only geometries with a matching SRID to be stored in the column. This is PostGIS&#8217;s way of helping you avoid the mistake of our unfortunate air traffic control technician: that of mismatching coordinate systems.</p>
<p>As we did in part 2, we also need to set the factory for this column so that Rails knows what factory to use when it reads geometries from the database.</p>
<pre>class ZipCode &lt; ActiveRecord::Base
  north_wa_factory = ... # use the factory we created earlier
  set_rgeo_factory_for_column(:boundary, north_wa_factory)
end</pre>
<p>In general, it is important that the factory you set in your ActiveRecord class matches the constraints in the database column, so that both sides handle the data in the same way. In particular, the SRIDs should match, and either both should be Cartesian or both should be geographic.</p>
<p>Now we can read and write the polygon data. We just need to remember that the data is not in latitude-longitude, but in the projected coordinate system. So when we write polygons to the database, we must supply projected coordinates, and when we read them back, we will receive projected coordinates.</p>
<p>This example, of course, brings up an important question. We need to decide up front whether the database should contain projected data or latitude-longitude data. How do we choose? This can be a somewhat complicated question, and I will dive more deeply into the pros and cons of different strategies in a later article. However, for now we know enough to understand a few of the issues.</p>
<p>In many simple cases&#8211; if you are working only with points, or with small features that do not cover a large part of the globe, or if extreme accuracy is not important&#8211; you may find it easiest to think in latitude and longitude. In those cases, you can just create a <code>:geographic</code> column in the database and convert everything to a geographic coordinate system. Just be aware that there are potential issues whenever you have to convert data from one coordinate system to another. As we saw in our story, lines and polygons that span a large area can change their shape dramatically when switching coordinate systems. So if accuracy is essential, it may be desirable for your database to use the same coordinate system as your data source. You also should consider what type of spatial queries you are likely to run against your data. Remember that coordinates in a query must match the SRID and coordinate system of the data in your database.</p>
<h2>Where to go from here</h2>
<p>Congratulations on making it through this article! Understanding coordinate systems can be tricky, but it is essential for building nontrivial applications. In this discussion, I&#8217;ve deliberately left out a number of more advanced topics that I&#8217;ll probably cover in a later article. But you should now have enough information so you won&#8217;t get lost when people start talking about projections and SRIDs.</p>
<p>If you&#8217;d like to explore more about map projections and geospatial coordinate systems, the articles on Wikipedia are not too bad. I&#8217;m not aware of any good books on this material geared towards web developers, but if you know of any, please drop me a line. For reference, you&#8217;ll probably find yourself going to <a title="Spatialreference.org" href="http://www.spatialreference.org/" target="_blank">spatialreference.org</a> frequently when you need information on the coordinate system referenced by a particular SRID.</p>
<p>For the next article, I&#8217;m currently planning on covering common file and data interchange formats used for geospatial data. Stay tuned, and let&#8217;s bring Rails down to earth!</p>
<p><em>This is part 4 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails article series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-azuma.com/blog/archives/106/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Geo-Rails part 3: Spatial Data Types with RGeo</title>
		<link>http://www.daniel-azuma.com/blog/archives/88</link>
		<comments>http://www.daniel-azuma.com/blog/archives/88#comments</comments>
		<pubDate>Mon, 05 Dec 2011 08:03:07 +0000</pubDate>
		<dc:creator>Daniel Azuma</dc:creator>
				<category><![CDATA[GeoRails]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[RGeo]]></category>

		<guid isPermaLink="false">http://www.daniel-azuma.com/blog/?p=88</guid>
		<description><![CDATA[RGeo is a library and framework for handling spatial data in a Ruby application. It&#8217;s currently designed more for completeness than ease of use, so there&#8217;s a bit of an initial learning curve. This article is an attempt to smooth &#8230; <a href="http://www.daniel-azuma.com/blog/archives/88">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a title="RGeo" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">RGeo</a> is a library and framework for handling spatial data in a Ruby application. It&#8217;s currently designed more for completeness than ease of use, so there&#8217;s a bit of an initial learning curve. This article is an attempt to smooth that learning curve a bit. It contains a tutorial introduction to RGeo, covering the basics that every RGeo user needs to know, and a bit of discussion of where the library came from. It covers:</p>
<div>
<ul>
<li>An introduction to the industry standard spatial data types</li>
<li>Working with spatial data objects in RGeo</li>
<li>Factories: why RGeo uses them and what they&#8217;re for</li>
<li>A comparison with GeoRuby</li>
<li>A guide to the RDocs</li>
</ul>
</div>
<p>RGeo includes a number of advanced features which I&#8217;ll cover in future articles. But for now, I think these are the important topics that will get you started.</p>
<p>This is part 3 of my series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails article series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</p>
<p><span id="more-88"></span></p>
<h2>Standard Spatial Data Types</h2>
<p>Most serious geospatial systems operate on a standard set of spatial data types specified by a standard known as the <a title="SFA Spec" href="http://www.opengeospatial.org/standards/sfa" target="_blank">Simple Feature Access Specification</a>, which is maintained by the <a title="OGC" href="http://www.opengeospatial.org/" target="_blank">Open Geospatial Consortium</a>. This spec (which I&#8217;ll abbreviate SFS) defines a suite of seven concrete data types capable of representing points and piecewise linear objects in two-dimensional space, along with a set of standard operations that can be performed on them.</p>
<p>The SFS has gone through several iterations. Most current production systems are based on version 1.1 of the SFS, although newer versions have added a few more data subtypes. Since 1.1 is the most commonly supported revision, it is what RGeo implements and what I will cover here.</p>
<p>The seven data types defined by the SFS include three geometric types, and four collection types. They are as follows.</p>
<p><strong>Point</strong>. This is a simple point in two-dimensional space, identified by an x and a y coordinate. Often, Points are used to represent locations on the surface of the earth, and sometimes (but not always) the x and y coordinates are interpreted as longitude and latitude, respectively. In other cases, a Point could simply represent a point on the X-Y plane.</p>
<p><strong>LineString</strong>. This is a set of one or more straight line segments connected end to end. A common use for a LineString might be a set of driving directions. LineStrings may be self-intersecting, and some special LineStrings may be closed loops where the start point is the same as the end point. Below are a few examples of LineStrings. (I lifted this diagram straight out of the SFS document.)</p>
<p style="text-align: center;"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/georails-3-spec-linestrings.png"><img class="size-full wp-image-89 aligncenter" title="georails-3-spec-linestrings" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/georails-3-spec-linestrings.png" alt="" width="428" height="164" /></a></p>
<p><strong>Polygon</strong>. This is a contiguous area in the plane, with piecewise linear borders. Polygons can also have holes. A common use for a Polygon might be a city or country boundary. Below are a few examples of Polygons (again lifted out of the SFS document).</p>
<p><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/georails-3-spec-polygons.png"><img class="aligncenter size-full wp-image-90" title="georails-3-spec-polygons" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/georails-3-spec-polygons.png" alt="" width="411" height="143" /></a></p>
<p>For each of the above three types, there is a corresponding collection type that can represent zero or more of that type of object. So <strong>MultiPoint</strong> may include zero or more Points, <strong>MultiLineString</strong> may include zero or more separate LineStrings, and <strong>MultiPolygon</strong> may include zero or more nonoverlapping Polygons.</p>
<p>Finally, there is a generic <strong>GeometryCollection</strong> type that may contain zero or more of any type of object, without any restrictions.</p>
<p>The SFS arranges these spatial types in a class hierarchy. A number of operations (such as intersection and distance) are defined across all types, but a few (such as area) are specific to certain types. The operations are defined in such a way as to make them more or less language-agnostic. RGeo, at its heart, can be thought of as a Ruby implementation of these SFS types.</p>
<h2>Working with Spatial Data in RGeo</h2>
<p>Now we&#8217;ll go through some basic examples of handling spatial data in RGeo. This assumes you have RGeo installed along with Geos and Proj4. Please refer to <a href="http://www.daniel-azuma.com/blog/archives/60" target="_blank">part 1</a> (as well as the RGeo <a href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">README</a>) for instructions on installing RGeo if you are having difficulty.</p>
<p>In these examples, we&#8217;ll work with simple planar data. RGeo refers to planar data as &#8220;Cartesian&#8221;, and provides a factory object for creating planar objects.</p>
<pre>factory = RGeo::Cartesian.factory</pre>
<p>Factories are discussed in more detail below; for now, you simply create spatial data objects using the factory. Let&#8217;s create some <em>Points</em>:</p>
<pre>point1 = factory.point(1, 0)
point2 = factory.point(1, 4)
point3 = factory.point(-2, 0)
point4 = factory.point(-2, 4)</pre>
<div id="attachment_91" class="wp-caption aligncenter" style="width: 327px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/georails-3-example-points.png"><img class="size-full wp-image-91" title="georails-3-example-points" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/georails-3-example-points.png" alt="" width="317" height="282" /></a><p class="wp-caption-text">Four points plotted on the X-Y plane</p></div>
<p>You can extract the coordinates of a point.</p>
<pre>point1.x # =&gt; 1.0
point1.y # =&gt; 0.0</pre>
<p>You can also perform a rich set of spatial operations on points. Distance is a pretty common one:</p>
<pre>point2.distance(point3) # =&gt; 5.0</pre>
<p>Create <em>LineString</em> objects by providing a series of points, indicating the endpoints of its segments. This first example has two segments specified using three points:</p>
<pre>line_string1 = factory.line_string([point1, point2, point3])</pre>
<p>You can extract the individual points that make up the LineString.</p>
<pre>line_string1.num_points # =&gt; 3
line_string1.point_n(0) == point1 # =&gt; true
line_string1.end_point == point3 # =&gt; true</pre>
<p>Here we create a new LineString and determine whether the two LineStrings intersect:</p>
<pre>point5 = factory.point(0, 1)
line_string2 = factory.line_string([point4, point5])
line_string1.intersects?(line_string2) # =&gt; true</pre>
<div id="attachment_92" class="wp-caption aligncenter" style="width: 323px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/georails-3-example-linestrings.png"><img class="size-full wp-image-92" title="georails-3-example-linestrings" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/georails-3-example-linestrings.png" alt="" width="313" height="277" /></a><p class="wp-caption-text">LineString 2, in green, intersects LineString 1, in blue</p></div>
<p>To create a <em>Polygon</em> object, provide the boundary as a LineString.</p>
<pre>large_triangle = factory.polygon(line_string1)</pre>
<p>To create a polygon with holes, provide the boundaries of the holes in the optional second argument.</p>
<pre>point6 = factory.point(0, 2)
point7 = factory.point(-1, 1)
line_string3 = factory.line_string([point5, point6, point7])
triangle_with_hole = factory.polygon(line_string1, [line_string3])</pre>
<div id="attachment_93" class="wp-caption aligncenter" style="width: 320px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/georails-3-example-polygons.png"><img class="size-full wp-image-93" title="georails-3-example-polygons" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/georails-3-example-polygons.png" alt="" width="310" height="275" /></a><p class="wp-caption-text">The polygon triangle_with_hole</p></div>
<p>You can also create that triangle with a hole using a spatial operation, by subtracting the small triangle from the larger one.</p>
<pre>small_triangle = factory.polygon(line_string3)
triangle_with_hole = large_triangle - small_triangle</pre>
<p>To create a collection, provide the elements as an enumeration. <em>MultiPoint</em>, <em>MultiLineString</em>, and <em>MultiPolygon</em> restrict the types of their elements; <em>GeometryCollection</em> has no restriction.</p>
<pre>four_points = factory.multi_point([point1, point2, point3, point4])
general_collection = factory.collection([line_string1, point5])</pre>
<p>In addition to the basic spatial operations, collections implement Enumerable:</p>
<pre>four_points.each{ |p| ... }</pre>
<p>There&#8217;s a lot of depth in the SFS spatial classes and the operations and analysis you can perform on them. I&#8217;ll cover more advanced topics in later articles. But first, we should address a burning question.</p>
<h2>RGeo Factories</h2>
<p>In the example above, we created geometric objects using a factory. Now, for some of us with a Java background, this might conjure up some less-than-pleasant memories. Factories? How un-Ruby-like!</p>
<p>I must admit, I struggled with this while designing RGeo. But in the end, in RGeo&#8217;s case, I decided they were appropriate. (Or at least a necessary evil.)</p>
<p>In the above examples, we were working with points on the Cartesian X-Y plane. The geometric objects we worked with follow the rules of Euclidean geometry that you&#8217;re probably familiar with from high school mathematics classes. The distance between two points, for example, can be determined using the <a href="http://en.wikipedia.org/wiki/Pythagorean_theorem" target="_blank">Pythagorean Theorem</a>.</p>
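<p>As a quick sanity check, here is that calculation done by hand in plain Ruby for <code>point2</code> and <code>point3</code> from the earlier example. This is only a sketch of the underlying arithmetic, not a description of how RGeo computes distances internally.</p>
<pre># Coordinates of point2 and point3 from the examples above.
x2, y2 = 1.0, 4.0
x3, y3 = -2.0, 0.0

# Pythagorean Theorem: the distance is the hypotenuse of a right
# triangle whose legs are the differences in x and y.
distance = Math.sqrt((x2 - x3)**2 + (y2 - y3)**2)
puts distance</pre>
<p>This prints 5.0, which matches the result of <code>point2.distance(point3)</code> shown earlier.</p>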
<p>However, we&#8217;re not always going to be handling Cartesian objects, especially when we&#8217;re working with location data. Location is generally measured across the surface of the earth, and the surface of the earth is not flat. This means our familiar theorems and formulas for Euclidean geometry may not work, especially for objects covering large areas.</p>
<p>So when RGeo measures a distance, computes an intersection, or performs almost any kind of spatial operation, it needs to know the context: whether you&#8217;re working with points on an X-Y plane, or with latitude and longitude. And even in the latter case, it actually needs to know <em>which</em> latitude-longitude, since there are in fact a number of different ways to define latitude and longitude.</p>
<p>A factory provides this context. It knows whether the coordinate system is an X-Y Cartesian coordinate system, or whether it is latitude and longitude, or something else. It is basically a set of preferences directing how RGeo handles data and performs computations. All the spatial objects created by a factory inherit its preferences.</p>
<p>Or here&#8217;s another way to put it. A point may have coordinates (2, 3). The factory tells you what the &#8220;2&#8221; and the &#8220;3&#8221; actually <em>mean</em> and how they relate to the real world. Are they degrees, feet, or light years? Which direction are they? And what assumptions about the nature of reality do they imply?</p>
<p>Another aspect controlled by RGeo&#8217;s factories is the implementation. When RGeo works with Cartesian coordinates, its factory calls into the Geos library to handle most of the computational geometry. However, sometimes Geos may not be available on your system. In this case, you can use a different factory that also computes Cartesian geometry but uses a pure Ruby implementation. This alternate factory is not as fast as Geos and is currently missing a number of capabilities, but it is available in case you cannot install Geos.</p>
<p>You can obtain the factory object providing the context for any geographic object by calling its &#8220;factory&#8221; method.</p>
<pre>triangle_with_hole.factory # =&gt; factory</pre>
<p>Generally, when you cause two objects to interact by comparing them or performing some binary operation on them, they must have the same factory and live in the same context. It makes sense to find the distance between two points &#8212; say, (2, 3) and (4, 5) &#8212; on the Cartesian X-Y plane, but it doesn&#8217;t make sense to find the &#8220;distance&#8221; between the point (2, 3) on the X-Y plane, and the point at latitude 47.606, longitude -122.332.</p>
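<p>The idea of a factory as shared context can be sketched with a toy model in plain Ruby. To be clear, <code>ToyFactory</code> and <code>ToyPoint</code> are hypothetical classes invented for this illustration; they are not RGeo&#8217;s classes, and RGeo&#8217;s real behavior on a factory mismatch may differ. Each toy factory carries its own distance rule, each point remembers the factory that made it, and two points refuse to interact unless they share a factory.</p>

```ruby
# Toy illustration only: hypothetical classes, NOT RGeo's API.
class ToyFactory
  attr_reader :name

  # distance_rule is a lambda taking two points and returning a number.
  def initialize(name, distance_rule)
    @name = name
    @distance_rule = distance_rule
  end

  # Every point created here inherits this factory as its context.
  def point(x, y)
    ToyPoint.new(self, x, y)
  end

  def distance(a, b)
    @distance_rule.call(a, b)
  end
end

class ToyPoint
  attr_reader :factory, :x, :y

  def initialize(factory, x, y)
    @factory = factory
    @x = x
    @y = y
  end

  def distance(other)
    unless other.factory.equal?(factory)
      raise ArgumentError, "points come from different factories"
    end
    factory.distance(self, other)
  end
end

# Flat plane: plain Pythagorean distance.
cartesian = ToyFactory.new("cartesian", lambda { |a, b|
  Math.hypot(a.x - b.x, a.y - b.y)
})

# Sphere: x is longitude and y is latitude in degrees; result in meters.
# Assumes a spherical earth of radius 6,371 km.
geographic = ToyFactory.new("geographic", lambda { |a, b|
  rad = Math::PI / 180.0
  cosine = Math.sin(a.y * rad) * Math.sin(b.y * rad) +
           Math.cos(a.y * rad) * Math.cos(b.y * rad) * Math.cos((a.x - b.x) * rad)
  6_371_000.0 * Math.acos(cosine.clamp(-1.0, 1.0))
})

a = cartesian.point(2, 3)
b = cartesian.point(4, 5)
puts a.distance(b)

seattle = geographic.point(-122.332, 47.606)
begin
  a.distance(seattle)
rescue ArgumentError => e
  puts "Refused: #{e.message}"
end
```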
<p>I will say more about coordinate systems and the different factories available in RGeo in later articles. For now, two factories you will probably use often are the Cartesian factory we saw above; and the &#8220;spherical&#8221; geographic factory. This latter factory handles latitudes and longitudes, and supports basic spatial operations but is currently missing some of the more complex operations.</p>
<pre>geographic_factory = RGeo::Geographic.spherical_factory</pre>
<h2>RGeo Factories and Rails</h2>
<p>In <a title="Geo-Rails part 2" href="http://www.daniel-azuma.com/blog/archives/69" target="_blank">part 2</a> of this series, we saw that <code>activerecord-postgis-adapter</code> exposes spatial column values as RGeo objects. Each of these objects, of course, has a factory that provides its coordinate system and context. Now we can look a little more closely at this process.</p>
<p>In the tutorial for part 2, we added this line to the Location model:</p>
<pre>class Location &lt; ActiveRecord::Base
  set_rgeo_factory_for_column(:latlon,
    RGeo::Geographic.spherical_factory(:srid =&gt; 4326))
end</pre>
<p>What this does is provide a specific factory for the latlon attribute of the Location model. In this case, we use the spherical geographic factory discussed above. When you get a Location from the database and read the latlon attribute, it returns a Point created by that factory. The &#8220;srid&#8221; argument controls the Spatial Reference ID, which must be set to 4326 for the PostGIS geographic type. We will cover SRIDs in a later article; for now, just think of it as a required parameter.</p>
<p>You can ask the model for the factory as follows:</p>
<pre>latlon_factory = Location.rgeo_factory_for_column(:latlon)</pre>
<p>Now you can use this factory to create values. In particular, if you do not want to use WKT to set a latlon value, you can set it directly from a point object created from this factory.</p>
<pre>loc3 = Location.create(:name =&gt; 'Columbia Tower')
loc3.latlon = latlon_factory.point(-122.330779, 47.604828)
loc3.save</pre>
<h2>RGeo vs. GeoRuby</h2>
<p>One question I am asked quite a bit is how RGeo compares with <a title="GeoRuby" href="http://georuby.rubyforge.org/" target="_blank">GeoRuby</a>. GeoRuby is an older Ruby library that provides classes for the SFS geometry objects. It is considerably smaller than RGeo, and somewhat easier to get started with. Indeed, I also started off using GeoRuby once upon a time, but I quickly decided that a fundamental redesign was necessary in order to support the functionality I needed. Among the reasons:</p>
<ul>
<li>GeoRuby provides a small subset of the spatial operations defined by the SFS. For example, it computes distance between points but not distances involving lines or polygons, and it doesn&#8217;t do intersections or other such geometric operations. RGeo implements the entire SFS &#8212; every single operation. To accomplish this, it uses Geos, the same industry standard computational geometry library that PostGIS uses internally, so you can be confident of its speed and stability.</li>
<li>GeoRuby assumes most objects are in a flat Cartesian coordinate system; it generally does not handle different coordinate systems. The sole exception is that it provides specialized methods to measure distance across the globe, but they require that you keep track of the coordinate system yourself. RGeo automatically ensures that computations take place in the right coordinate system, and provides rich tools for managing and converting coordinate systems.</li>
<li>The original GeoRuby project has not been updated for a long time. There is a recent <a title="Nofxx fork of GeoRuby" href="http://github.com/nofxx/georuby" target="_blank">fork</a> that is being maintained somewhat more actively. But even the fork doesn&#8217;t look like it is likely to have the basic capabilities many non-trivial applications require, at least not anytime soon.</li>
</ul>
<p>That said, some of the early inspiration for RGeo did come from GeoRuby. Although RGeo&#8217;s design is markedly different, it was created to solve some of the same basic problems and so is something of a spiritual descendant.</p>
<h2>RGeo Documentation</h2>
<p>One thing missing right now with RGeo is a really good tutorial and/or user&#8217;s guide. However, the RDocs are fairly extensive and should often provide you with enough information to get started.</p>
<p>Most of the APIs that you will work with are documented as modules within the <a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Feature.html" target="_blank">RGeo::Feature</a> namespace. Factories should follow the API defined by <a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Feature/Factory.html" target="_blank">RGeo::Feature::Factory</a>, which specifies a method for constructing each type of spatial object. Each object type, in turn, has its own corresponding interface &#8212; for example, <a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Feature/Point.html" target="_blank">RGeo::Feature::Point</a> defines the interface for point objects. All these interfaces inherit from the base interface <a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Feature/Geometry.html" target="_blank">RGeo::Feature::Geometry</a>, which defines methods common to all spatial objects.</p>
<p>One important thing to note is that the interface modules in RGeo::Feature may not necessarily be included in the objects themselves. That is, it is not necessarily true that:</p>
<pre>point1.is_a?(RGeo::Feature::Point) # may not be true
factory.is_a?(RGeo::Feature::Factory) # also may not be true</pre>
<p>However, the objects will still &#8220;duck-type&#8221; (that is, implement the same methods as) the interface modules, so to find documentation on a particular object, you need only look at the RDocs for the relevant interface modules.</p>
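<p>Here is a small plain-Ruby sketch of what that distinction looks like in practice. The module and class below are hypothetical stand-ins, not RGeo&#8217;s own, and the helper name is made up; the point is only that an object can implement every method an interface module documents without ever including that module.</p>

```ruby
# Hypothetical stand-ins for illustration; not RGeo's real modules or classes.
module PointInterface
  # Documents the expected methods. Deliberately never included below.
  def x
    raise NotImplementedError
  end

  def y
    raise NotImplementedError
  end
end

# Implements the same methods, but does not include PointInterface.
class DuckPoint
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

# True if the object responds to every method the interface declares.
def quacks_like?(object, interface)
  interface.instance_methods(false).all? { |name| object.respond_to?(name) }
end

point = DuckPoint.new(2, 3)
puts point.is_a?(PointInterface)          # the module is not in the ancestry
puts quacks_like?(point, PointInterface)  # but the methods are all there
```

<p>If your own code needs to branch on the kind of spatial object it has been handed, a method-based check along these lines is safer than <code>is_a?</code> against the interface modules.</p>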
<p>Here&#8217;s a map of the important interfaces. First, the factory interface:</p>
<ul>
<li><a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Feature/Factory.html" target="_blank">RGeo::Feature::Factory</a></li>
</ul>
<p>Next, the interfaces corresponding to the types and subtypes defined by the SFS:</p>
<ul>
<li><a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Feature/Geometry.html" target="_blank">RGeo::Feature::Geometry</a></li>
<li><a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Feature/Point.html" target="_blank">RGeo::Feature::Point</a></li>
<li><a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Feature/Curve.html" target="_blank">RGeo::Feature::Curve</a></li>
<li><a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Feature/LineString.html" target="_blank">RGeo::Feature::LineString</a></li>
<li><a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Feature/Line.html" target="_blank">RGeo::Feature::Line</a></li>
<li><a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Feature/LinearRing.html" target="_blank">RGeo::Feature::LinearRing</a></li>
<li><a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Feature/Surface.html" target="_blank">RGeo::Feature::Surface</a></li>
<li><a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Feature/Polygon.html" target="_blank">RGeo::Feature::Polygon</a></li>
<li><a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Feature/GeometryCollection.html" target="_blank">RGeo::Feature::GeometryCollection</a></li>
<li><a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Feature/MultiPoint.html" target="_blank">RGeo::Feature::MultiPoint</a></li>
<li><a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Feature/MultiCurve.html" target="_blank">RGeo::Feature::MultiCurve</a></li>
<li><a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Feature/MultiLineString.html" target="_blank">RGeo::Feature::MultiLineString</a></li>
<li><a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Feature/MultiSurface.html" target="_blank">RGeo::Feature::MultiSurface</a></li>
<li><a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Feature/MultiPolygon.html" target="_blank">RGeo::Feature::MultiPolygon</a></li>
</ul>
<p>How do you create a factory in the first place? For the most part, you will use module methods provided for this purpose. The following modules contain methods for getting factories:</p>
<ul>
<li><a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Cartesian.html" target="_blank">RGeo::Cartesian</a></li>
<li><a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Geos.html" target="_blank">RGeo::Geos</a></li>
<li><a href="http://virtuoso.rubyforge.org/rgeo/RGeo/Geographic.html" target="_blank">RGeo::Geographic</a></li>
</ul>
<h2>Where to go from here</h2>
<p>We have taken a whirlwind tour of the basic features of RGeo. RGeo provides a deep set of tools, and at this point you should have enough background to do some pretty interesting geospatial analysis.</p>
<p>In addition to the RDocs for RGeo, I think the actual Simple Features Spec is essential background reading. It&#8217;s quite accessible and provides a useful overview of the data types and computations available.</p>
<p>The next topic for this series is likely to be an introduction to coordinate systems and projections. Stay tuned!</p>
<p><em>This is part 3 of my series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails article series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-azuma.com/blog/archives/88/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Geo-Rails part 2: Setting Up a Geospatial Rails App</title>
		<link>http://www.daniel-azuma.com/blog/archives/69</link>
		<comments>http://www.daniel-azuma.com/blog/archives/69#comments</comments>
		<pubDate>Mon, 28 Nov 2011 08:21:03 +0000</pubDate>
		<dc:creator>Daniel Azuma</dc:creator>
				<category><![CDATA[GeoRails]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[PostGIS]]></category>
		<category><![CDATA[Rails]]></category>
		<category><![CDATA[RGeo]]></category>

		<guid isPermaLink="false">http://www.daniel-azuma.com/blog/?p=69</guid>
		<description><![CDATA[Before going in depth into any particular topic, I thought it would be useful to write a getting-started tutorial, walking through setting up and working with a simple example Rails app using RGeo. In this tutorial, we will: Install the &#8230; <a href="http://www.daniel-azuma.com/blog/archives/69">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Before going in depth into any particular topic, I thought it would be useful to write a getting-started tutorial, walking through setting up and working with a simple example Rails app using RGeo. In this tutorial, we will:</p>
<ul>
<li>Install the main software components we need for a geospatial application, including a spatial database and the needed Ruby libraries.</li>
<li>Set up a new Rails application configured to use the spatial database.</li>
<li>Create an ActiveRecord model with a spatial attribute.</li>
<li>Experiment with location data in the model.</li>
<li>Perform simple spatial queries.</li>
</ul>
<p>This should help you get started writing basic location features in Rails, giving you a feel for what the tools are and how they fit together.</p>
<p>This is part 2 of my series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails article series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</p>
<p><span id="more-69"></span></p>
<h2>Installing the software</h2>
<p>First, you&#8217;ll need a spatial database. Recent versions of <a title="MySQL" href="http://mysql.com/" target="_blank">MySQL</a> have basic spatial features built in, but I generally recommend using <a title="PostgreSQL" href="http://www.postgresql.org/" target="_blank">PostgreSQL</a> and installing the <a title="PostGIS" href="http://www.postgis.org/" target="_blank">PostGIS</a> add-on, because PostGIS is a much more feature complete and better supported product. For the rest of this example, I&#8217;ll assume you&#8217;re using PostGIS.</p>
<p>If you have a package manager such as apt-get, yum, <a title="MacPorts" href="http://www.macports.org/" target="_blank">MacPorts</a>, or <a title="Homebrew" href="http://mxcl.github.com/homebrew/" target="_blank">Homebrew</a>, that&#8217;s usually the easiest way to install PostGIS. Your package manager should also install PostGIS&#8217;s dependencies, including PostgreSQL itself and two important libraries: <a title="GEOS" href="http://geos.osgeo.org/" target="_blank">GEOS</a> and <a title="Proj" href="http://proj.osgeo.org/" target="_blank">Proj</a>.</p>
<p>If you need to build and install from source, you can download PostGIS from <a href="http://www.postgis.org/" target="_blank">http://www.postgis.org/</a> and review the instructions in the documentation section there. You will first need to build and install PostgreSQL (<a href="http://www.postgresql.org/" target="_blank">http://www.postgresql.org/</a>), GEOS (<a href="http://geos.osgeo.org/" target="_blank">http://geos.osgeo.org/</a>) and Proj (<a href="http://proj.osgeo.org/" target="_blank">http://proj.osgeo.org/</a>). The build configuration for PostGIS requires that you provide the paths to your installations of those three dependencies.</p>
<p>We&#8217;ll use the latest Rails 3.1 release for this example. Most of the gems you require will come with a simple <code>gem install rails</code>. However, to hook up with PostGIS, you&#8217;ll need a few more gems. Installing these gems will require you to know where the database software was installed. Specifically, you need the <code>--prefix</code> for the GEOS and Proj libraries, and you need the path to the <code>pg_config</code> binary in your PostgreSQL installation.</p>
<p>Obviously, these locations will depend on how you installed the software. If you used MacPorts on Mac OS X, things get installed in <code>/opt/local</code>. So here are the gem install commands for that case. If you used a different system, you&#8217;ll need to modify the paths accordingly.</p>
<pre><em>%</em> gem install pg -- --with-pg-config=/opt/local/bin/pg_config
<em>%</em> gem install rgeo -- --with-geos-dir=/opt/local \
    --with-proj-dir=/opt/local
<em>%</em> gem install activerecord-postgis-adapter</pre>
<p>The main gem you just installed is <code>activerecord-postgis-adapter</code>, which provides a special ActiveRecord adapter for talking to PostGIS and handling geospatial data.</p>
<h2>Setting up the Rails app</h2>
<p>Now, we&#8217;ll build our Rails app. I&#8217;ll assume you&#8217;re using the latest Rails 3.1.</p>
<pre><em>%</em> rails new geo_rails_test -d postgresql</pre>
<p>Rails out of the box doesn&#8217;t know about PostGIS, but if we set the database initially to PostgreSQL, Rails will be able to set up most of the configuration for you. Now we just need to do some tweaking.</p>
<p>Add the PostGIS ActiveRecord adapter to your <code>Gemfile</code>, by inserting this line:</p>
<pre>gem 'activerecord-postgis-adapter'</pre>
<p>Install the adapter into your Rails app by inserting this line into <code>config/application.rb</code>, just after the <code>require 'rails/all'</code> line:</p>
<pre>require 'active_record/connection_adapters/postgis_adapter/railtie'</pre>
<p>Now we need to modify <code>config/database.yml</code> to use the PostGIS adapter. There are several different strategies for handling the database, but I&#8217;ll choose a simple one to get you started. For each of the environments in your <code>database.yml</code>, do the following:</p>
<p>First, change the &#8220;<code>adapter</code>&#8221; from &#8220;<code>postgresql</code>&#8221; to &#8220;<code>postgis</code>&#8221;.</p>
<p>Next, add the following line to each environment:</p>
<pre>schema_search_path: "public,postgis"</pre>
<p>Next, add the following line to your development and test environments. The path should match the scripts directory installed by PostGIS. Assuming you used MacPorts and installed PostGIS 1.5.x, the path will probably be as follows, but you should modify it depending on where PostGIS was installed:</p>
<pre>script_dir: /opt/local/share/contrib/postgis-1.5</pre>
<p>You&#8217;ll set up <em>two</em> users in the database. One is the normal database user that your Rails app will connect as. The other needs to be a superuser, and is used to create the databases initially (i.e. using <code>rake db:create</code>). Creating a PostGIS database requires superuser privileges unless you set up a special template for it, and for simplicity, we won&#8217;t do that right now.</p>
<p>Your normal user should be specified by the &#8220;username&#8221; and &#8220;password&#8221; properties:</p>
<pre>username: geo_rails_test
password: &lt;your-password&gt;</pre>
<p>The superuser for creating the database should be specified by the &#8220;su_username&#8221; and &#8220;su_password&#8221; properties:</p>
<pre>su_username: geo_rails_test_creator
su_password: &lt;your-password&gt;</pre>
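<p>Putting those settings together, a development entry might end up looking roughly like the output of the following sketch. It simply builds the settings as a Ruby hash and prints them as YAML so you can compare against your own <code>database.yml</code>. The database name is an assumption (it is what Rails typically generates for an app named geo_rails_test), <code>CHANGE_ME</code> is a placeholder for your own passwords, and the <code>script_dir</code> is the MacPorts path used above.</p>

```ruby
require 'yaml'

# Settings described above, gathered for one environment.
development = {
  'adapter'            => 'postgis',
  'database'           => 'geo_rails_test_development', # assumed Rails default name
  'username'           => 'geo_rails_test',
  'password'           => 'CHANGE_ME',                  # placeholder
  'su_username'        => 'geo_rails_test_creator',
  'su_password'        => 'CHANGE_ME',                  # placeholder
  'schema_search_path' => 'public,postgis',
  'script_dir'         => '/opt/local/share/contrib/postgis-1.5'
}

database_yml = { 'development' => development }.to_yaml
puts database_yml
```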
<p>You will, of course, also need to create these users in the database. You can do so by running the PostgreSQL &#8220;<code>createuser</code>&#8221; command:</p>
<pre><em>%</em> createuser --pwprompt --superuser geo_rails_test_creator
<em>%</em> createuser --pwprompt geo_rails_test</pre>
<p>Now, you should be able to create the database. A PostGIS database has several internal tables and a bunch of functions and other objects built in. If the configuration above has been set up correctly, the <code>activerecord-postgis-adapter</code> should be able to create all of this internal structure for you automatically.</p>
<pre><em>%</em> rake db:create</pre>
<p>The hard part is over! Now we can start building models with location features.</p>
<h2>A model with location</h2>
<p>Most databases will support the familiar data types such as strings, integers, floating-point values, booleans, and the like. Spatial databases are simply normal databases that include support for several additional data types commonly used to represent location. These data types can include points on the surface of the earth (latitude and longitude) as well as lines, polygons, and other shapes. RGeo, together with the <code>activerecord-postgis-adapter</code>, lets you interact with these data types just like you would any other attribute on an ActiveRecord object.</p>
<p>For this tutorial, we will create a simple model that includes a location. If you have represented location in a database before, you may have simply added two columns to your database, one for latitude and one for longitude. In a spatial database, you represent a location with a single column whose type is &#8220;point&#8221;. A point type is simply an ordered pair of latitude and longitude.</p>
<pre><em>%</em> rails generate model Location name:string latlon:point</pre>
<p>Before we migrate the database, let&#8217;s look at the migration that was generated. It should look something like this:</p>
<pre>class CreateLocations &lt; ActiveRecord::Migration
  def change
    create_table :locations do |t|
      t.string :name
      t.point :latlon
      t.timestamps
    end
  end
end</pre>
<p>Notice that the &#8220;<code>latlon</code>&#8221; column is of type &#8220;point&#8221;. Spatial data types such as &#8220;point&#8221; often have further configurations that you may need to apply, similar to how, for example, string columns can have length limits, or other types can have various constraints. In our case we want the latlon column to contain not just any two-dimensional coordinate, but specifically a latitude and longitude. To specify this, we will add the &#8220;geographic&#8221; constraint to it. Change the &#8220;<code>latlon</code>&#8221; column in the migration and add &#8220;<code>:geographic =&gt; true</code>&#8221;. Now your migration should look like:</p>
<pre>class CreateLocations &lt; ActiveRecord::Migration
  def change
    create_table :locations do |t|
      t.string :name
      t.point :latlon, :geographic =&gt; true
      t.timestamps
    end
  end
end</pre>
<p>One last thing we should do is configure the model class so it knows what kind of geospatial data you are storing. The <code>activerecord-postgis-adapter</code> is able to infer most of this information from the database. However, this inference is not perfect, and anyway it is generally good practice to set this explicitly, for documentation&#8217;s sake if nothing else.</p>
<p>Open your <code>app/models/location.rb</code> and add a line to the Location class:</p>
<pre>class Location &lt; ActiveRecord::Base
  set_rgeo_factory_for_column(:latlon,
    RGeo::Geographic.spherical_factory(:srid =&gt; 4326))
end</pre>
<p>That line says, for the &#8220;<code>latlon</code>&#8221; field, use a spherical geographic coordinate system with spatial reference ID 4326. This means that computations done in Ruby will assume a spherical earth, and that the spatial reference ID is set to 4326 to match what PostGIS expects for a &#8220;geographic&#8221; column. In many cases, you can configure each geographic column in this same way. For now, don&#8217;t worry too much about the details. We&#8217;ll cover coordinate systems and spatial references in a later article.</p>
<p>Now run the migration to get this table into your database.</p>
<pre><em>%</em> rake db:migrate</pre>
<p>Now that we&#8217;ve got the model set up and the database migrated, let&#8217;s take a look into the actual location data in the model.</p>
<h2>Working with location data</h2>
<p>For simplicity, let&#8217;s dive into the rails console to start playing with our new model.</p>
<pre><em>%</em> rails console
<em>Loading development environment (Rails 3.1.3)</em>
<em>ruby-1.9.3-p0 :001 &gt;</em></pre>
<p>An ActiveRecord model with spatial data is just the same as any other ActiveRecord model. We can create and start working with it directly in the console:</p>
<pre><em>ruby-1.9.3-p0 :001 &gt;</em> loc = Location.create
 <em>=&gt; #&lt;Location id: 1, name: nil, latlon: nil, created_at: "2011-11-28 02:52:10",</em>
     <em>updated_at: "2011-11-28 02:52:10"&gt;</em></pre>
<p>Our model has two attributes, the &#8220;<code>name</code>&#8221; string and the &#8220;<code>latlon</code>&#8221; point. They started off as nil, but we can set them.</p>
<pre><em>ruby-1.9.3-p0 :002 &gt;</em> loc.name = "Pirq Headquarters"
<em>ruby-1.9.3-p0 :003 &gt;</em> loc.latlon = "POINT(-122.193963 47.675086)"</pre>
<p>Note the string that we used to set the latlon field. This is a standard syntax called &#8220;WKT&#8221; (Well-Known Text), which is commonly used in spatial applications. In the WKT representation of a location, notice that the longitude comes first, and there is no comma between longitude and latitude. The model understands WKT syntax when you set data, but internally converts it to a &#8220;point&#8221; object.</p>
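<p>If the WKT ordering is new to you, the following plain-Ruby sketch may help. It is not how RGeo parses WKT (RGeo has its own full parser); it is a deliberately tiny, hypothetical helper that only understands a two-dimensional <code>POINT</code> and makes the &#8220;longitude first, no comma&#8221; rule explicit.</p>

```ruby
# Matches a simple 2D point such as "POINT(-122.193963 47.675086)".
# Illustration only: real WKT allows more forms than this pattern accepts.
POINT_WKT = /\A\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*\z/i

# Hypothetical helper; returns the coordinates with explicit names.
def parse_point_wkt(wkt)
  match = POINT_WKT.match(wkt)
  raise ArgumentError, "not a simple 2D POINT: #{wkt.inspect}" unless match

  # WKT order is X then Y, which for geographic data means
  # longitude first, then latitude.
  { :longitude => Float(match[1]), :latitude => Float(match[2]) }
end

p parse_point_wkt("POINT(-122.193963 47.675086)")

begin
  # A comma between the coordinates is a common mistake.
  parse_point_wkt("POINT(-122.193963, 47.675086)")
rescue ArgumentError => e
  puts "Rejected: #{e.message}"
end
```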
<pre><em>ruby-1.9.3-p0 :004 &gt;</em> loc.latlon
 <em>=&gt; #&lt;RGeo::Geographic::SphericalPointImpl:0x817d61f4</em>
     <em>"POINT (-122.193963 47.675086)"&gt;</em></pre>
<p>A point object is one of the spatial Ruby classes provided by RGeo. Using these spatial classes, you can perform powerful geometric and geographic computations and analyses, or you can simply use them to pass data around. We will cover some of their capabilities in later entries in this blog series. For now, here&#8217;s a quick example, measuring the distance between two Location objects:</p>
<pre><em>ruby-1.9.3-p0 :005 &gt;</em> loc2 = Location.create(:name =&gt; 'Space Needle',
                        :latlon =&gt; 'POINT(-122.349341 47.620471)')
<em>ruby-1.9.3-p0 :006 &gt;</em> puts "Distance is %.02f meters" %
                      loc.latlon.distance(loc2.latlon)
<em>Distance is 13143.18 meters</em></pre>
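<p>As a sanity check on that number, here is a rough plain-Ruby sketch of a great-circle distance using the haversine formula. This is not RGeo&#8217;s code. It assumes a spherical earth whose radius equals the WGS 84 equatorial radius of 6,378,137 meters; whether the spherical factory uses exactly that radius and formula is my assumption, not something shown above.</p>

```ruby
# Assumed sphere radius in meters (WGS 84 equatorial radius).
SPHERE_RADIUS_M = 6_378_137.0

def to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Great-circle distance in meters between two lon/lat pairs (haversine).
def haversine_m(lon1, lat1, lon2, lat2, radius = SPHERE_RADIUS_M)
  dlat = to_radians(lat2 - lat1)
  dlon = to_radians(lon2 - lon1)
  a = Math.sin(dlat / 2)**2 +
      Math.cos(to_radians(lat1)) * Math.cos(to_radians(lat2)) *
      Math.sin(dlon / 2)**2
  2 * radius * Math.asin(Math.sqrt(a))
end

# Coordinates from the examples above, longitude first as in WKT.
pirq_hq      = [-122.193963, 47.675086]
space_needle = [-122.349341, 47.620471]

distance = haversine_m(*pirq_hq, *space_needle)
puts "Distance is %.02f meters" % distance
```

<p>Under those assumptions the result lands at roughly 13.14 kilometers, in line with the figure reported in the console session.</p>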
<p>You do not have to set a spatial field using WKT; you can also set it directly using a spatial object such as a point object. For example, you can set the Pirq Headquarters location to be the same as the Space Needle location:</p>
<pre><em>ruby-1.9.3-p0 :007 &gt;</em> loc.latlon = loc2.latlon
 <em>=&gt; #&lt;RGeo::Geographic::SphericalPointImpl:0x8175f234</em>
     <em>"POINT (-122.349341 47.620471)"&gt;</em></pre>
<p>Spatial attributes are loaded and saved in the same way as any other attribute on your model. So until you save the &#8220;loc&#8221; object, the latlon value in the database remains unchanged.</p>
<h2>Querying by location</h2>
<p>The real power of a spatial database such as PostGIS comes from its query capabilities. Spatial databases typically provide a rich set of SQL functions that you can use to build a wide variety of location-based queries.</p>
<p>Let&#8217;s go through a couple of simple examples, querying against the two locations we just created. (We&#8217;ll assume you didn&#8217;t save loc in the previous example, so the two model objects still have their different latlon values.)</p>
<p>These first two queries find the objects that are, respectively, less than and greater than 10 kilometers from a particular point (the location of the Columbia Tower in downtown Seattle).</p>
<pre><em>ruby-1.9.3-p0 :008 &gt;</em> Location.where("ST_Distance(latlon, "+
                   "'POINT(-122.330779 47.604828)') &lt; 10000").
                   map{ |ar| ar.name }
 <em>=&gt; ["Space Needle"]</em>
<em>ruby-1.9.3-p0 :009 &gt;</em> Location.where("ST_Distance(latlon, "+
                   "'POINT(-122.330779 47.604828)') &gt; 10000").
                   map{ |ar| ar.name }
 <em>=&gt; ["Pirq Headquarters"]</em></pre>
<p>The following query draws a triangle and finds the objects within the triangle.</p>
<pre><em>ruby-1.9.3-p0 :010 &gt;</em> Location.where("ST_Intersects(latlon, "+
                   "'POLYGON((-122.19 47.68, -122.2 47.675, "+
                   "-122.19 47.67, -122.19 47.68))')").
                   map{ |ar| ar.name }
 <em>=&gt; ["Pirq Headquarters"]</em></pre>
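<p>To see why only one location matches, you can reproduce the test approximately in plain Ruby. The sketch below treats longitude and latitude as flat X and Y and checks which side of each triangle edge a point falls on. That is a simplification: PostGIS evaluates geographic columns on a curved surface, so this planar check is only a reasonable stand-in for a triangle this small. The helper names are made up for the example.</p>

```ruby
# Twice the signed area of triangle (o, a, b); the sign tells which side
# of the line o->a the point b lies on. Points are [x, y] arrays.
def cross(o, a, b)
  (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
end

# Planar point-in-triangle test: the point is inside (or on the boundary)
# when it lies on the same side of all three edges.
def in_triangle?(point, a, b, c)
  sides = [cross(a, b, point), cross(b, c, point), cross(c, a, point)]
  sides.all? { |s| s >= 0 } || sides.all? { |s| s <= 0 }
end

# Triangle vertices from the query above, as [longitude, latitude].
triangle = [[-122.19, 47.68], [-122.2, 47.675], [-122.19, 47.67]]

locations = {
  "Pirq Headquarters" => [-122.193963, 47.675086],
  "Space Needle"      => [-122.349341, 47.620471]
}

inside = locations.select { |_name, lonlat| in_triangle?(lonlat, *triangle) }.keys
p inside
```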
<p>See the PostGIS documentation for a full set of the various spatial SQL functions you can use in your queries.</p>
<p>Once you have a lot of data in your database, you&#8217;ll need to add a spatial index to speed up location queries, similar to how you index any other column. However, there are some nuances in how you construct a spatial index, and especially how you should write queries to take advantage of a spatial index. This involves a separate discussion that I&#8217;ll cover in a later article.</p>
<h2>Where to go from here</h2>
<p>This has been a bit of a long entry, but an important first step towards working with spatial data in Rails. We&#8217;ve gone through setting up a Rails application with spatial capabilities, and investigated a few of the basic ways in which we can store, manipulate, and query spatial data.</p>
<p>You may find it useful to start looking through the <a title="PostGIS online documentation" href="http://postgis.org/documentation/manual-1.5/" target="_blank">documentation</a> for PostGIS. This will give more detailed information about how the database handles and queries spatial data. The <a title="readme" href="http://virtuoso.rubyforge.org/activerecord-postgis-adapter/README_rdoc.html" target="_blank">readme</a> for the activerecord-postgis-adapter gem provides more information about configuring the database connection from Rails.</p>
<p>In upcoming articles, we&#8217;ll start looking a little more deeply at the various topics we&#8217;ve touched on here, including working with RGeo&#8217;s spatial Ruby classes, working with coordinate systems, and setting up spatial indexes. Stay tuned, and let&#8217;s bring Rails down to earth!</p>
<p><em>This is part 2 of my series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails article series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-azuma.com/blog/archives/69/feed</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
	</channel>
</rss>

