<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Daniel Azuma</title>
	<atom:link href="http://www.daniel-azuma.com/blog/feed" rel="self" type="application/rss+xml" />
	<link>http://www.daniel-azuma.com/blog</link>
	<description>Theology and software development</description>
	<lastBuildDate>Sat, 05 May 2012 00:43:44 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>RailsConf 2012 Talk Notes</title>
		<link>http://www.daniel-azuma.com/blog/archives/256</link>
		<comments>http://www.daniel-azuma.com/blog/archives/256#comments</comments>
		<pubDate>Mon, 23 Apr 2012 19:03:47 +0000</pubDate>
		<dc:creator>Daniel Azuma</dc:creator>
				<category><![CDATA[GeoRails]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[RailsConf]]></category>

		<guid isPermaLink="false">http://www.daniel-azuma.com/blog/?p=256</guid>
		<description><![CDATA[In this post is a collection of useful material related to my RailsConf 2012 talk, entitled Getting Down To Earth: Geospatial Analysis With Rails. You can download the slides for the talk, and I&#8217;ve provided links to all the software &#8230; <a href="http://www.daniel-azuma.com/blog/archives/256">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>In this post is a collection of useful material related to my RailsConf 2012 talk, entitled <a title="Getting Down To Earth: Geospatial Analysis With Rails" href="http://railsconf2012.com/sessions/19" target="_blank">Getting Down To Earth: Geospatial Analysis With Rails</a>. You can download the slides for the talk, and I&#8217;ve provided links to all the software mentioned (as well as some that wasn&#8217;t mentioned but that I think is useful or at least interesting). Finally, I&#8217;ve curated links to a bunch of articles, reference material, and other information for those interested in digging deeper into geospatial features.</p>
<p>This is by no means a complete list, and I&#8217;ll be adding more links over the next few days so check back often. And please leave me a note if you have any suggestions!</p>
<p><em><strong>Update:</strong></em> thanks to Confreaks, the video is now up as well.</p>
<h1><span id="more-256"></span>A few notes post-talk&#8230;</h1>
<p>Thank you to those of you who attended my talk this morning! A few quick notes before getting into the links and downloads.</p>
<ul>
<li>One thing I should have made clearer is that your choice of coordinate system is not limited to just lat/long or Mercator. Those two will be useful for many cases, but there is a plethora of other coordinate systems, any one or more of which may be appropriate for your application.</li>
<li>We covered PostGIS in this talk because it&#8217;s currently probably the best and most feature-complete open source spatial database, and because it&#8217;s the best supported from our stack. But there are alternatives that may be more or less useful. MongoDB does have limited spatial capabilities, as does MySQL. Oracle and MS SQL Server are supposedly pretty full-featured in their spatial capability. There&#8217;s even a plugin for SQLite that works quite well.</li>
<li>I&#8217;m serious about Squeel. Use it. It&#8217;s awesome. Last year I was working on coming up with a solution for writing these complex spatial queries by extending Arel. But then I found Squeel, which does it a dozen times better than I could have come up with. Thank you Ernie Miller for this tool!</li>
</ul>
<p>Now on to the links&#8230;</p>
<h1>Talk-Related Material</h1>
<ul>
<li><a title="Talk slides on SpeakerDeck" href="http://speakerdeck.com/u/dazuma/p/getting-down-to-earth-geospatial-analysis-with-rails" target="_blank">Talk slides on SpeakerDeck</a></li>
<li>Talk slides in PDF format &#8212; <a title="Talk slides full size" href="http://www.daniel-azuma.com/files/railsconf2012-georails.pdf" target="_blank">Full size</a> (about 20 megs) or <a title="Talk slides compressed" href="http://www.daniel-azuma.com/files/railsconf2012-georails-small.pdf" target="_blank">compressed</a> (about 3.6 megs)</li>
<li><a title="Talk video from Confreaks" href="http://www.confreaks.com/videos/856-railsconf2012-getting-down-to-earth-geospatial-analysis-with-rails" target="_blank">Video from Confreaks</a></li>
<li><a title="Talk notes" href="http://github.com/newhavenrb/railsconf2012/blob/master/Getting-Down-To-Earth:-Geospatial-Analysis-With-Rails.md" target="_blank">Talk notes</a> &#8212; Posted by NewHaven.rb</li>
</ul>
<h1>Software &#8212; Databases</h1>
<ul>
<li><a title="PostGIS" href="http://www.postgis.org/" target="_blank">PostGIS</a> &#8212; spatial plugin for PostgreSQL</li>
<li><a title="Spatialite" href="http://www.gaia-gis.it/gaia-sins/" target="_blank">Spatialite</a> &#8212; spatial plugin for Sqlite3</li>
<li><a title="GeoCouch" href="https://github.com/couchbase/geocouch" target="_blank">GeoCouch</a> &#8212; spatial plugin for CouchDB and Couchbase</li>
<li><a title="Solr Spatial Search" href="http://wiki.apache.org/solr/SpatialSearch" target="_blank">Solr Spatial Search</a> &#8212; available in recent versions of Solr</li>
</ul>
<h1>Software &#8212; Ruby</h1>
<ul>
<li><a title="rgeo" href="http://github.com/dazuma/rgeo" target="_blank">rgeo</a> &#8212; Spatial data types</li>
<li><a title="rgeo-shapefile" href="http://github.com/dazuma/rgeo-shapefile" target="_blank">rgeo-shapefile</a> &#8212; Shapefile reader</li>
<li><a title="rgeo-geojson" href="http://github.com/dazuma/rgeo-geojson" target="_blank">rgeo-geojson</a> &#8212; GeoJSON reader/writer</li>
<li><a title="ffi-geos" href="http://github.com/dark-panda/ffi-geos" target="_blank">ffi-geos</a> &#8212; Low-level ruby bindings to GEOS library</li>
<li><a title="Squeel" href="http://erniemiller.org/projects/squeel/" target="_blank">squeel</a> &#8212; ActiveRecord enhancement for query writing</li>
<li><a title="activerecord-postgis-adapter" href="http://github.com/dazuma/activerecord-postgis-adapter" target="_blank">activerecord-postgis-adapter</a> &#8212; ActiveRecord adapter for PostGIS</li>
<li><a title="activerecord-spatialite-adapter" href="http://github.com/dazuma/activerecord-spatialite-adapter" target="_blank">activerecord-spatialite-adapter</a> &#8212; ActiveRecord adapter for Spatialite</li>
<li><a title="mongoid_geospatial" href="http://github.com/nofxx/mongoid_geospatial" target="_blank">mongoid_geospatial</a> &#8212; Mongoid extension with geospatial capabilities</li>
<li><a title="Ruby Geocoder" href="http://www.rubygeocoder.com/" target="_blank">ruby geocoder</a> &#8212; Integration with geocoding services</li>
<li><a title="GeoRuby" href="http://georuby.rubyforge.org/" target="_blank">GeoRuby</a> and <a title="SpatialAdapter" href="http://github.com/fragility/spatial_adapter" target="_blank">SpatialAdapter</a> &#8212; Older libraries for spatial data types and ActiveRecord integration (not compatible with rgeo)</li>
</ul>
<h1>Software &#8212; Client Side</h1>
<div>
<ul>
<li><a title="Heatmap.js" href="http://www.patrick-wied.at/static/heatmapjs/" target="_blank">heatmap.js</a> &#8212; A heatmap implementation for JavaScript</li>
<li><a title="Thermo.js" href="http://github.com/dazuma/thermo.js" target="_blank">thermo.js</a> &#8212; Another heatmap implementation for JavaScript</li>
<li><a title="Heatcanvas.js" href="http://github.com/sunng87/heatcanvas" target="_blank">heatcanvas.js</a> &#8212; Yet another heatmap implementation for JavaScript</li>
<li><a title="OpenLayers" href="http://openlayers.org/" target="_blank">OpenLayers</a> &#8212; An open-source map/visualization tool; an alternative to Google Maps</li>
</ul>
</div>
<h1>Software &#8212; Other Libraries</h1>
<ul>
<li><a title="libgeos" href="http://trac.osgeo.org/geos/" target="_blank">libgeos</a> &#8212; C library for geometric analysis</li>
<li><a title="libproj" href="http://trac.osgeo.org/proj/" target="_blank">libproj</a> &#8212; C library for coordinate transforms</li>
<li><a title="GDAL" href="http://www.gdal.org/" target="_blank">libgdal</a> &#8212; C library for rasters</li>
</ul>
<h1>Articles/Blogs/Resources Online</h1>
<ul>
<li><a title="Geo-Rails Series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">Geo-Rails series</a> &#8212; Nine (and counting) articles on geospatial development with Rails</li>
<li><a title="Map Projections" href="http://www.progonos.com/furuti/MapProj/Normal/TOC/cartTOC.html" target="_blank">Cartographical Map Projections</a> &#8212; A good introduction to projected coordinate systems</li>
<li><a title="SpatialReference.org" href="http://spatialreference.org/" target="_blank">SpatialReference.org</a> &#8212; Source for coordinate system information.</li>
<li><a title="Shapefile spec" href="http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf" target="_blank">Shapefile spec</a> from ESRI</li>
<li><a title="SimpleGeo migration options" href="https://support.urbanairship.com/customer/portal/articles/311996-simplegeo-migration-options" target="_blank">SimpleGeo migration guide</a> &#8212; A very useful set of links put together by Urban Airship after they bought (and dismantled) SimpleGeo, including sources for data, services, and implementation guides.</li>
</ul>
<h1>Books and Other Reading</h1>
<ul>
<li><a title="GIS A Computing Perspective" href="http://www.amazon.com/GIS-Computing-Perspective-Second-Edition/dp/0415283752" target="_blank">GIS: A Computing Perspective</a> by Michael Worboys and Matt Duckham &#8212; Very useful overview of the major computing challenges involving geometry, rasters, databases, and so forth, including discussions on modeling uncertainty and architectural considerations. Highly recommended.</li>
<li><a title="Geographic Information Analysis" href="http://www.amazon.com/Geographic-Information-Analysis-David-OSullivan/dp/0470288574/" target="_blank">Geographic Information Analysis</a> by David O&#8217;Sullivan and David J. Unwin &#8212; Good source for methods of statistical analysis and visualization of geo-data.</li>
<li><a title="GIS For Web Developers" href="http://pragprog.com/book/sdgis/gis-for-web-developers" target="_blank">GIS For Web Developers</a> by Scott Davis (published by Pragmatic Programmers) &#8212; Somewhat dated, and based on Java. But still a lot of useful conceptual material. Paper book is out of print, but eBook is available.</li>
</ul>
<h1>Data Sources</h1>
<ul>
<li><a title="TZ Timezone Shapefiles" href="http://efele.net/maps/tz/world/" target="_blank">TZ Timezone Shapefiles</a> &#8212; Polygon boundaries of world timezones</li>
<li><a title="TIGER/Line files" href="http://www.census.gov/geo/www/tiger/shp.html" target="_blank">TIGER/Line files</a> from the US Census &#8212; lots of legal and statistical geographic data</li>
</ul>
<h1>Mailing Lists</h1>
<ul>
<li><a title="rgeo-users mailing list" href="http://groups.google.com/group/rgeo-users" target="_blank">rgeo-users</a> mailing list &#8212; Help and discussion group for RGeo users</li>
<li><a title="GeoRails mailing list" href="http://groups.google.com/group/georails" target="_blank">georails</a> mailing list &#8212; Discussion for Rails and geospatial dev (not very active)</li>
<li><a title="GeoRuby mailing list" href="http://groups.google.com/group/georuby" target="_blank">georuby</a> mailing list &#8212; Discussion for Ruby and geospatial dev (not very active)</li>
</ul>
<h1>Organizations and Conferences</h1>
<ul>
<li><a title="OGC" href="http://www.opengeospatial.org/" target="_blank">Open Geospatial Consortium</a> &#8212; Lots of standards and info</li>
<li><a title="OSGeo" href="http://www.osgeo.org/" target="_blank">Open Source Geospatial Foundation</a> &#8212; Umbrella organization for lots of geospatial software</li>
<li><a title="Foss4g" href="http://foss4g.org/" target="_blank">FOSS4G</a> &#8212; International open source geospatial software conference</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-azuma.com/blog/archives/256/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>RGeo updates for Rails 3.2</title>
		<link>http://www.daniel-azuma.com/blog/archives/247</link>
		<comments>http://www.daniel-azuma.com/blog/archives/247#comments</comments>
		<pubDate>Thu, 23 Feb 2012 00:33:50 +0000</pubDate>
		<dc:creator>Daniel Azuma</dc:creator>
				<category><![CDATA[GeoRails]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Rails]]></category>
		<category><![CDATA[RGeo]]></category>

		<guid isPermaLink="false">http://www.daniel-azuma.com/blog/?p=247</guid>
		<description><![CDATA[I just released several updates to RGeo and related libraries, focusing on bug fixes and Rails 3.2 compatibility issues. rgeo 0.3.4 fixes a segfault under Ruby 1.8.7, and supports prepared geometries in the FFI-Geos factory. activerecord-postgis-adapter 0.4.1 fixes some Rails &#8230; <a href="http://www.daniel-azuma.com/blog/archives/247">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I just released several updates to RGeo and related libraries, focusing on bug fixes and Rails 3.2 compatibility issues.</p>
<ul>
<li><strong>rgeo 0.3.4</strong> fixes a segfault under Ruby 1.8.7, and supports prepared geometries in the FFI-Geos factory.</li>
<li><strong>activerecord-postgis-adapter 0.4.1</strong> fixes some Rails 3.2 compatibility issues.</li>
<li><strong>activerecord-mysql2spatial-adapter 0.4.2</strong> also fixes some Rails 3.2 compatibility issues.</li>
</ul>
<p>Just update your gems to get these fixes. Many thanks to those who reported issues and submitted patches on Github. They were very helpful.</p>
<p>I&#8217;m still investigating a couple of issues in the spatialite adapter. I hope to get those resolved in a few days.</p>
<p>One other note. I&#8217;m not using Rails 3.2 on my own projects yet. (In fact, for the most part I&#8217;m still on 3.0.x.) So there may still be regressions and compatibility issues on Rails 3.2 that I haven&#8217;t found yet. Please comment here or email me if you find anything.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-azuma.com/blog/archives/247/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Geo-Rails part 9: The PostGIS spatial_ref_sys Table and You</title>
		<link>http://www.daniel-azuma.com/blog/archives/239</link>
		<comments>http://www.daniel-azuma.com/blog/archives/239#comments</comments>
		<pubDate>Mon, 06 Feb 2012 07:31:10 +0000</pubDate>
		<dc:creator>Daniel Azuma</dc:creator>
				<category><![CDATA[GeoRails]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[PostGIS]]></category>

		<guid isPermaLink="false">http://www.daniel-azuma.com/blog/?p=239</guid>
		<description><![CDATA[When you create a spatial database using PostGIS, you may notice that PostGIS automatically installs a table called &#8220;spatial_ref_sys&#8221;. This is a standard table for spatial databases, as required by the Open Geospatial Consortium&#8217;s specification. It defines which SRIDs are &#8230; <a href="http://www.daniel-azuma.com/blog/archives/239">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>When you create a spatial database using PostGIS, you may notice that PostGIS automatically installs a table called &#8220;spatial_ref_sys&#8221;. This is a standard table for spatial databases, as required by the Open Geospatial Consortium&#8217;s specification. It defines which SRIDs are allowed in your geometries, and provides information about the corresponding coordinate systems.</p>
<p>In this article, we&#8217;ll take a brief look at the <code>spatial_ref_sys</code> table and how you can use it in your application. We&#8217;ll cover:</p>
<ul>
<li>What&#8217;s useful about the <code>spatial_ref_sys</code> table</li>
<li>Where the <code>spatial_ref_sys</code> data comes from, and how you can add your own custom data</li>
<li>Accessing <code>spatial_ref_sys</code> data from Ruby using RGeo&#8217;s <code>SRSDatabase</code></li>
</ul>
<p>This is part 9 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</p>
<h2><span id="more-239"></span>So what&#8217;s in the <code>spatial_ref_sys</code> table anyway?</h2>
<p>The <code>spatial_ref_sys</code> table is defined by an OGC specification entitled <a title="Simple Features for SQL specification" href="http://www.opengeospatial.org/standards/sfs" target="_blank">Simple Feature Access Part 2: SQL Option</a>. This is a companion to the Simple Features <a title="Simple Features Access specification" href="http://www.opengeospatial.org/standards/sfa" target="_blank">specification</a> we covered in earlier articles. It takes the standard data types and operations and specifies how such data should appear in an SQL-based relational database. Many of the &#8220;<code>ST_*</code>&#8221; functions that we&#8217;ve used when interacting with PostGIS are defined in this specification.</p>
<p>Recall from <a title="Geo-Rails part 4" href="http://www.daniel-azuma.com/blog/archives/106" target="_blank">part 4</a> that when you&#8217;re working with spatial data, every coordinate references a coordinate system that defines what the coordinate means&#8212;whether it is latitude and longitude, or feet from your front door, or light-years from Alpha Centauri. Those coordinate systems are generally identified by a numeric ID known as the SRID. The <code>spatial_ref_sys</code> table is simply a list of known coordinate systems keyed by their SRID. According to the spec:</p>
<p style="padding-left: 30px;"><em>Every Geometry Column is associated with a Spatial Reference System. The Spatial Reference System identifies the coordinate system for all geometric objects stored in the column, and gives meaning to the numeric coordinate values for any geometric object stored in the column. &#8230; The Spatial Reference System Identifier (SRID) constitutes a unique integer key for a Spatial Reference System within a database.</em></p>
<p>How is this useful? Generally, <code>spatial_ref_sys</code> will provide an actual definition of the coordinate system for each SRID. This theoretically provides enough information to correctly interpret every piece of coordinate data in your database, and even&#8212;where possible&#8212;to convert the data into whatever coordinate system you need.</p>
<p>According to the spec, all implementations of <code>spatial_ref_sys</code> include these columns:</p>
<ul>
<li><strong>srid</strong>: The numeric SRID. This should be the table&#8217;s primary key.</li>
<li><strong>auth_name</strong>: An authority name as a string. This is set if this coordinate system is specified by an outside authority such as EPSG.</li>
<li><strong>auth_srid</strong>: The numeric ID of the coordinate system in the above authority&#8217;s catalog.</li>
<li><strong>srtext</strong>: The Well-Known-Text (WKT) representation of the coordinate system (as we described in part 4).</li>
</ul>
<p>If you are using PostGIS, you&#8217;ll notice <code>spatial_ref_sys</code> has one more non-standard but very useful column:</p>
<ul>
<li><strong>proj4text</strong>: The Proj4 representation of the coordinate system.</li>
</ul>
<h2>Where does the data come from and what does it do?</h2>
<p>In many cases, a spatial database will prepopulate <code>spatial_ref_sys</code> for you with a standard set of EPSG data. If you are using PostGIS, this is handled by the <code>spatial_ref_sys.sql</code> script, which gets run when you create a spatial database. If you are using Rails and follow the steps in <a title="Geo-Rails part 2" href="http://www.daniel-azuma.com/blog/archives/69" target="_blank">part 2</a>, the activerecord-postgis-adapter will do this for you automatically when you create your database.</p>
<p>The EPSG spatial reference database is such a universal standard that in most cases it is probably best to use one of its coordinate systems and its corresponding SRID. This will maximize the chances that your SRIDs will match those used by any external data sources you will interact with&#8212;meaning your data will be easily portable. However, you may run into a case where you need to define your own coordinate system, perhaps because you&#8217;re using an unusual map projection. In such cases, you can add rows to the <code>spatial_ref_sys</code> table. You can also delete SRID rows that you don&#8217;t need. It is, after all, merely a table in your database.</p>
<p>The main caveat is that most spatial databases (including PostGIS) will establish a foreign key constraint between SRIDs and this table. This means that your data can&#8217;t just choose any SRID it wants. The SRID has to <em>exist</em>&#8212;and by &#8220;exist&#8221;, we simply mean it has to be present in the <code>spatial_ref_sys</code> table. So you just need to make sure that you don&#8217;t delete any SRIDs that you need, and you add any that aren&#8217;t provided by default.</p>
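<p>As a rough sketch of what adding a custom row might look like, the snippet below builds the <code>INSERT</code> statement in plain Ruby. The helper names (<code>sql_literal</code> and <code>custom_srs_insert_sql</code>), the SRID 990001, the &#8220;myorg&#8221; authority, and the placeholder <code>srtext</code> are all made up for illustration. Substitute a real WKT definition and an SRID that is unused in your database.</p>
<pre># Quote a string as an SQL literal by doubling embedded single quotes.
# (With a live connection you would normally use the adapter's own quoting.)
def sql_literal(value)
  "'" + value.to_s.gsub("'", "''") + "'"
end

# Build an INSERT statement for one custom spatial_ref_sys row.
# Column names follow the PostGIS layout described above.
def custom_srs_insert_sql(srid, auth_name, auth_srid, srtext, proj4text)
  values = [
    Integer(srid),
    sql_literal(auth_name),
    Integer(auth_srid),
    sql_literal(srtext),
    sql_literal(proj4text)
  ]
  "INSERT INTO spatial_ref_sys (srid, auth_name, auth_srid, srtext, proj4text) " +
    "VALUES (" + values.join(", ") + ")"
end

# Hypothetical values: 990001 is an arbitrary SRID, "myorg" is a made-up
# authority, and the srtext is a placeholder for a full WKT definition.
sql = custom_srs_insert_sql(
  990001, "myorg", 1,
  'PROJCS["My custom projection", ...]',
  "+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs"
)
puts sql</pre>
<p>You would then run the resulting statement against your database, for example from a migration. With a live connection, it&#8217;s safer to rely on the adapter&#8217;s own quoting than on the hand-rolled <code>sql_literal</code> shown here.</p>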
<p>Some databases will provide additional tools that leverage the <code>spatial_ref_sys</code> information. PostGIS, for example, provides the SQL function <code>ST_Transform()</code>, which lets you transform a geometry from one coordinate system to another. When you call it, you provide the SRID of the desired coordinate system. PostGIS then looks up both the original and the target SRIDs in <code>spatial_ref_sys</code>, and uses the coordinate system information there to figure out how to compute the transformation.</p>
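<p>To make that concrete, here is a small sketch that builds an <code>ST_Transform()</code> query for a single point. The helper name <code>transform_point_sql</code> and the sample coordinates are my own choices for illustration, and the query assumes that both SRIDs are present in your <code>spatial_ref_sys</code> table.</p>
<pre># Build a query that reprojects a single point with ST_Transform.
# Float() and Integer() reject anything that is not numeric.
def transform_point_sql(x, y, from_srid, to_srid)
  point = "ST_SetSRID(ST_MakePoint(#{Float(x)}, #{Float(y)}), #{Integer(from_srid)})"
  "SELECT ST_AsText(ST_Transform(#{point}, #{Integer(to_srid)}))"
end

# A longitude/latitude pair (roughly Seattle) from EPSG 4326 to SRID 3785.
puts transform_point_sql(-122.33, 47.61, 4326, 3785)</pre>
<p>The inner <code>ST_SetSRID()</code> call tags the point with its original SRID, which is what lets PostGIS look up the source coordinate system before transforming.</p>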
<h2>Accessing the <code>spatial_ref_sys</code> from Ruby</h2>
<p>The <a title="RGeo gem" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">RGeo</a> library provides a convenient way to look up coordinate systems from the <code>spatial_ref_sys</code> table. If you&#8217;re already using PostGIS and activerecord-postgis-adapter, it&#8217;s quite easy:</p>
<pre>srs_database = RGeo::CoordSys::SRSDatabase::ActiveRecordTable.new
entry = srs_database.get(4326)
entry.identifier  # =&gt; 4326
entry.name        # =&gt; "WGS 84"
entry.proj4.to_s  # =&gt; " +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"</pre>
<p>You can now, for example, create a factory using the SRID (from <code>entry.identifier</code>) and the Proj4 object (from <code>entry.proj4</code>). As we saw in <a title="Geo-Rails part 4" href="http://www.daniel-azuma.com/blog/archives/106" target="_blank">part 4</a>, you generally need to provide Proj4 information if you are going to be converting data between coordinate systems.</p>
<p>As a convenient shorthand, some factories let you pass in an <code>srs_database</code> object when you construct a factory. The factory constructor will then go through the process of looking up the coordinate system definition and extracting the Proj4 specification. For example:</p>
<pre>srs_database = RGeo::CoordSys::SRSDatabase::ActiveRecordTable.new
my_factory = RGeo::Geos.factory(:srs_database =&gt; srs_database, :srid =&gt; 3785)
my_factory.proj4.to_s  # =&gt; " +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +units=m +k=1.0 +nadgrids=@null +no_defs"</pre>
<h2>Other SRSDatabase sources</h2>
<p>The <code>spatial_ref_sys</code> table is not the only source of coordinate system information. The Proj4 library itself installs a bunch of data files that contain most of the EPSG codes and other information. There are also online services that you can use to look up coordinate systems; one particularly important one is <a title="spatialreference.org web site" href="http://spatialreference.org" target="_blank">spatialreference.org</a>.</p>
<p>The SRSDatabase mechanism in RGeo supports a common interface for coordinate system databases. In the section above, we looked up EPSG 4326 from the <code>spatial_ref_sys</code> table using the <code>ActiveRecordTable</code> class. Alternatively, we can use the <code>SrOrg</code> class to look up information from spatialreference.org:</p>
<pre>srs_database = RGeo::CoordSys::SRSDatabase::SrOrg.new('EPSG')
entry = srs_database.get(4326)
# ...</pre>
<p>You can choose an appropriate source of coordinate system information based on the requirements of your application. In many cases, however, I recommend using the <code>spatial_ref_sys</code> table because it is convenient and readily available. See the RDocs for RGeo for more information on connecting to different SRSDatabase sources.</p>
<h2>Where to go from here</h2>
<p>In this article, we covered <code>spatial_ref_sys</code>, one of the standard tables that is included in most spatial databases. For more information on this table and how it is implemented in the <a title="PostGIS spatial database" href="http://www.postgis.org/" target="_blank">PostGIS</a> database, see the <a title="PostGIS documentation" href="http://www.postgis.org/documentation/" target="_blank">PostGIS documentation</a> online. The official specification of the table is available in the OGC <a title="OGC Simple Features for SQL spec" href="http://www.opengeospatial.org/standards/sfs" target="_blank">Simple Features for SQL</a> spec.</p>
<p>RGeo provides basic convenience tools for accessing coordinate system information from the <code>spatial_ref_sys</code> table. We covered a few basic examples in this article. For detailed information, see the RGeo::CoordSys::SRSDatabase module in the <a title="RGeo rdoc reference" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">RGeo documentation</a>.</p>
<p>Observant readers may notice that PostGIS includes one more automatic table, called <code>geometry_columns</code>. I may cover this table in a later article that digs deeper into PostGIS, but if you&#8217;re interested now, it&#8217;s described in the PostGIS documentation.</p>
<p><em>This is part 9 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-azuma.com/blog/archives/239/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Setting the Database Free with ActiveRecord&#8217;s Connection API</title>
		<link>http://www.daniel-azuma.com/blog/archives/216</link>
		<comments>http://www.daniel-azuma.com/blog/archives/216#comments</comments>
		<pubDate>Thu, 26 Jan 2012 01:26:13 +0000</pubDate>
		<dc:creator>Daniel Azuma</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[ActiveRecord]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Rails]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://www.daniel-azuma.com/blog/?p=216</guid>
		<description><![CDATA[TL;DR: ActiveRecord is more than just an ORM. It also provides a convenient common interface for writing direct SQL queries, for those times when you need to access your database&#8217;s advanced features. This article provides an introduction to the low-level &#8230; <a href="http://www.daniel-azuma.com/blog/archives/216">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><strong>TL;DR</strong>: <em>ActiveRecord is more than just an ORM. It also provides a convenient common interface for writing direct SQL queries, for those times when you need to access your database&#8217;s advanced features. This article provides an introduction to the low-level ActiveRecord Connection API, which you can use to bypass ActiveRecord &#8220;models&#8221; and work directly with your database.</em></p>
<h2><span id="more-216"></span>My story</h2>
<p>When I first started working with Rails back in 2006, I followed the common practice of abstracting all my data and business logic into ActiveRecord &#8220;models.&#8221; This worked out for a short time, but soon I realized that it didn&#8217;t scale all that well to complex applications. There were a number of reasons, some of which are now finally being discussed and addressed in the community.</p>
<p>In any case, what followed was the Big Refactor of my Rails development process and architecture. On the one hand, I rediscovered Ruby&#8217;s object-oriented roots, separating behavior from persistence, and building higher-level business objects focused on actually modeling my system. And on the other hand, I rediscovered the database, writing more and more custom SQL to leverage its capabilities.</p>
<p>I still use ActiveRecord where it suits the task at hand. In many cases, it makes sense to treat rows of a table as data objects principally identified by a numeric ID, and in those cases ActiveRecord still shines. However, in other cases I&#8217;m finding the ORM more of a hindrance than a help. Many of my model implementations have begun bypassing ActiveRecord-based persistence altogether, instead creating plain old tables and querying with custom SQL.</p>
<p>The good news is that ActiveRecord provides some amount of support for low-level queries. You can share ActiveRecord&#8217;s database connection, and you don&#8217;t need to know the API of the underlying database driver, making it convenient to add custom SQL incrementally to your Rails application. In this article, I&#8217;ll provide an overview of ActiveRecord&#8217;s connection API, a convenient low-level interface for writing raw SQL and interacting with your database on its own terms.</p>
<h2>Obtaining a Low-Level Connection</h2>
<p>Hidden underneath your ActiveRecord class is a useful low-level object called the <em>ActiveRecord connection adapter</em>. It wraps and abstracts away the underlying database-specific driver, and provides a common interface for database tasks such as creating and destroying databases, creating and modifying tables, inserting, updating, and deleting data, running queries, and managing transactions. Normally, the connection adapter is used internally by ActiveRecord, but you can access it yourself if you want to talk to the database directly without using ActiveRecord &#8220;models.&#8221;</p>
<p>To obtain a connection adapter object, simply call the <code>connection</code> method on your ActiveRecord class or any ActiveRecord object:</p>
<pre>connection = User.connection
obj = User.find(1)
connection = obj.connection</pre>
<p>Which ActiveRecord class should you use? In most Rails applications, you talk to just one database, as defined in your database.yml file. For such an application, every ActiveRecord class, including <code>ActiveRecord::Base</code>, will give you the same connection. So in those cases, it doesn&#8217;t matter which class you use. However, if your Rails application connects to a secondary database for some ActiveRecord classes (using the <code>establish_connection</code> method) then those classes will yield a connection object pointing at the secondary database. In those cases, you will need to pay attention to which database you need to talk to, and ask for a connection from the right class.</p>
<h2>Managing Connections and the Connection Pool</h2>
<p>Each connection adapter object represents a single connection to a database. Rails generally opens several connections at once and manages them in a connection pool. When a task needs a database connection, it checks one out of the pool; when it finishes, it checks the connection back in so that the next task can use it. Connections can run only one SQL statement at a time, so generally one connection is opened per thread.</p>
<p>In most Rails applications, this takes place transparently. When you&#8217;re processing an HTTP request, ActiveRecord automatically checks out a connection and memoizes it for the duration of the request. Whether you use the standard ActiveRecord APIs, or obtain a raw connection by calling the <code>connection</code> method described above, you get the request&#8217;s memoized connection. At the end of the request, ActiveRecord&#8217;s Rack middleware automatically checks the connection back in.</p>
<p>However, if you access ActiveRecord outside the context of handling a request via your controller, then you need to do connection management yourself. ActiveRecord will still automatically check out and memoize a connection when you ask for one. However, if you are not using a normal Rails controller, or are otherwise not running inside ActiveRecord&#8217;s ConnectionManager middleware, you should check in the connection manually when you are done. You can do so using the <code>clear_active_connections!</code> method on any of your ActiveRecord classes:</p>
<pre>User.clear_active_connections!</pre>
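<p>One way to make sure the check-in always happens, even when the work in between raises an exception, is to wrap it in a small helper that uses <code>ensure</code>. This is only a minimal sketch and not part of ActiveRecord: the helper name <code>with_checked_in_connection</code> is hypothetical, and it assumes only that the class you pass in responds to <code>connection</code> and <code>clear_active_connections!</code> as described above.</p>
<pre># Hypothetical helper (not an ActiveRecord API): yields the memoized
# connection for model_class, then checks it back in when the block
# finishes, whether it returned normally or raised.
def with_checked_in_connection(model_class)
  yield model_class.connection
ensure
  model_class.clear_active_connections!
end</pre>
<p>In a background script, you might call it as <code>with_checked_in_connection(User) { |conn| conn.select_value("SELECT COUNT(*) FROM users") }</code>. The helper returns whatever the block returns.</p>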
<p>Now that we know how to get a connection, let&#8217;s see what we can do with one.</p>
<h2>Running Low-Level Queries</h2>
<p>Most calls in the standard ActiveRecord API return ActiveRecord &#8220;model&#8221; objects. However, there might be cases when you want to bypass the overhead of creating these full ActiveRecord objects, or maybe you want to query data that doesn&#8217;t have a corresponding ActiveRecord class. The connection adapter&#8217;s low-level query methods let you write your own SQL, and return &#8220;plain old data&#8221; as raw result tables.</p>
<p>In this first example, we get the &#8220;name&#8221; value from a single row in our &#8220;users&#8221; table. If we only need the name, and don&#8217;t need the full ActiveRecord object with the rest of its data and capabilities, we can grab the connection object and use the <code>select_value</code> method:</p>
<pre>connection = User.connection
name = connection.select_value("SELECT name FROM users WHERE id=1")
# =&gt; "Dave"</pre>
<p>Use <code>select_value</code> when you need a single value from a single row. The result will be either a string, or nil if no rows match. Note that the returned value is a string even if the column has a different type such as a number or timestamp. You will need to perform your own conversion:</p>
<pre>time_str = connection.select_value(
  "SELECT created_at FROM users WHERE id=1")
# =&gt; "2011-12-24 11:35:02"
time = Time.parse("#{time_str} UTC")</pre>
<p>(This often gets me when I ask for a numeric value from the database. The <code>select_value</code> method returns it as a string, so be sure to call <code>to_i</code> on it if you want to do any math.)</p>
<p>If you want a single column from multiple rows, use <code>select_values</code>, which returns an array of strings.</p>
<pre>names = connection.select_values(
  "SELECT name FROM users WHERE created_at&gt;'2012-01-01 00:00:00'")
# =&gt; ['Bob', 'Kathy']</pre>
<p>To obtain multiple columns, you can use <code>select_rows</code>. This method returns an array of arrays, each representing a row in the order of the selected columns. For example:</p>
<pre>rows = connection.select_rows(
  "SELECT id,name FROM users WHERE created_at&gt;'2012-01-01 00:00:00'")
# =&gt; [['2', 'Bob'], ['4', 'Kathy']]</pre>
<p>This returns an array of two-element arrays. Remember that all values are returned as strings. In the example above, the &#8220;id&#8221; column is numeric, so you may need to call <code>to_i</code> if you want to treat the IDs as integers.</p>
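<p>As a rough sketch of that conversion step, the snippet below turns the string IDs into integers. The sample array is hard-coded to match the result shown above (in an application it would come from <code>select_rows</code>), and the helper name <code>with_integer_ids</code> is hypothetical. It uses <code>Integer(id, 10)</code> rather than <code>to_i</code> so that unexpected non-numeric input raises an error instead of silently becoming 0.</p>
<pre># Sample data shaped like the select_rows result shown above.
rows = [['2', 'Bob'], ['4', 'Kathy']]

# Hypothetical helper: converts the first column of each row to an
# Integer, parsing strictly in base 10.
def with_integer_ids(rows)
  rows.map { |id, name| [Integer(id, 10), name] }
end

users = with_integer_ids(rows)
# users is now [[2, 'Bob'], [4, 'Kathy']]</pre>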
<p>There is no <code>select_row</code> method. To select a single row, just use <code>select_rows</code> and take the first element.</p>
<p>Alternatively, you can return rows as hashes (of column name to value) using <code>select_one</code> (for a single row) or <code>select_all</code> (for multiple rows).</p>
<pre>records = connection.select_all(
  "SELECT id,name FROM users WHERE created_at&gt;'2012-01-01 00:00:00'")
# =&gt; [{'id'=&gt;'2', 'name'=&gt;'Bob'}, {'id'=&gt;'4', 'name'=&gt;'Kathy'}]</pre>
<h2>Low-Level Data Updates</h2>
<p>Updating data is accomplished using similar calls. Insert rows into the database using the <code>insert</code> method:</p>
<pre>connection.insert(
  "INSERT INTO users (name, created_at) VALUES ('Sally', 'now')")</pre>
<p>Update rows in the database using the <code>update</code> method. This method returns the number of rows affected by the update.</p>
<pre>row_count = connection.update(
  "UPDATE users SET name='Robert' WHERE id=2")
# =&gt; 1</pre>
<p>You can delete rows using <code>delete</code>, which again returns the number of rows deleted.</p>
<pre>row_count = connection.delete(
  "DELETE FROM users WHERE created_at&gt;'2012-01-01 00:00:00'")
# =&gt; 3</pre>
<p>Normally, however, you won&#8217;t be hard-coding values when you insert and update. You&#8217;ll be injecting data obtained from the user or some other source. Therefore, to avoid SQL injection attacks and other issues, you need to quote these values when you construct your SQL statement. Use the <code>quote</code> method for this purpose. It takes a Ruby object and represents it, properly quoted, in the syntax expected by the database. Here are some examples from the PostgreSQL connection adapter. (Connection objects for other databases will yield slightly different results depending on the database.)</p>
<pre>connection.quote("Hello") # =&gt; "'Hello'"
connection.quote("Joe's") # =&gt; "'Joe''s'"
connection.quote(2) # =&gt; "2"
connection.quote(true) # =&gt; "'t'"
connection.quote(Time.now) # =&gt; "'2012-01-23 07:46:41.540033'"</pre>
<p>Use <code>quote</code> when constructing SQL statements that update the database:</p>
<pre>new_name = "Robert"
row_id = 2
row_count = connection.update(
  "UPDATE users SET name=#{connection.quote(new_name)}"+
  " WHERE id=#{connection.quote(row_id)}")</pre>
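<p>To see why quoting matters, here is a deliberately simplified illustration of what happens to a string containing a single quote. The function <code>sketch_quote</code> is hypothetical and covers only strings, integers, and nil; the real <code>quote</code> method handles many more types and database-specific rules, so always call <code>connection.quote</code> in an application.</p>
<pre># Simplified illustration only, not ActiveRecord's implementation.
def sketch_quote(value)
  case value
  when String  then "'#{value.gsub("'", "''")}'"  # double embedded quotes
  when Integer then value.to_s
  when nil     then "NULL"
  else raise ArgumentError, "type not covered by this sketch"
  end
end

malicious = "x'; DROP TABLE users; --"
sql = "UPDATE users SET name=#{sketch_quote(malicious)}" +
      " WHERE id=#{sketch_quote(2)}"
# The embedded quote is doubled, so the whole input stays inside one
# SQL string literal instead of terminating it early.</pre>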
<h2>Using Prepared Statements</h2>
<p>Prepared statements are a common database optimization technique. The idea is that often many of the queries run by your application will have a common form. For example, you might find yourself running hundreds of queries of the form &#8220;<code>SELECT name FROM users WHERE id=</code><em>[something]</em>&#8221;. Instead of reparsing the entire statement and rerunning the query planner on every query, you can often instruct the database to parse and plan only once, and re-use that information on subsequent queries. The way to do that is with a prepared statement.</p>
<p>As of Rails 3.1, ActiveRecord includes support for prepared statements. The standard ActiveRecord API will utilize prepared statements transparently. If you are using the low-level connection API, you can set up prepared statements manually.</p>
<p>For running queries, the only one of the methods we&#8217;ve covered that supports prepared statements is <code>select_all</code>. Instead of the full statement, you must send a statement template and a list of values to substitute in. The list of values should be an array of two-element arrays; the second element of each array is the value. The first element is generally set to the ActiveRecord column type governing the type of the value, but you can set it to nil if you&#8217;re confident of the type you&#8217;re sending in. The values must be sent in as the third argument. (The second is a &#8220;name&#8221; for the query, used for annotating the logs. You can set it to nil.) Here&#8217;s the earlier <code>select_all</code> example, rewritten using a prepared statement.</p>
<pre>records = connection.select_all(
  "SELECT id,name FROM users WHERE created_at&gt;$1",
  nil, [[nil, ::Time.utc(2012,1,1)]])
# =&gt; [{'id'=&gt;'2', 'name'=&gt;'Bob'}, {'id'=&gt;'4', 'name'=&gt;'Kathy'}]</pre>
<p>Now, if you have additional requests of the same form (using the same template), they should run faster because the prepared statement (with its prepared SQL and predetermined query plan) is being reused.</p>
<pre>records2 = connection.select_all(
  "SELECT id,name FROM users WHERE created_at&gt;$1",
  nil, [[nil, ::Time.utc(2012,1,2)]])
records3 = connection.select_all(
  "SELECT id,name FROM users WHERE created_at&gt;$1",
  nil, [[nil, ::Time.utc(2012,1,3)]])</pre>
<p>The <code>insert</code>, <code>update</code>, and <code>delete</code> methods provide similar access to prepared statements. For the <code>update</code> and <code>delete</code> methods, use the same arguments as <code>select_all</code>: the template first, followed by a query name (which can be nil) and then an array of values to inject into the statement:</p>
<pre>row_count = connection.update(
  "UPDATE users SET name=$1 WHERE id=$2",
  nil, [[nil, "Robert"], [nil, 2]])</pre>
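<p>If you find yourself writing many of these nested arrays by hand, a tiny helper can build them. This is just a sketch: the name <code>untyped_binds</code> is hypothetical, it always leaves the column slot nil, and it assumes the two-element format described above for Rails 3.1.</p>
<pre># Hypothetical helper: wraps plain values in the [column, value] pairs
# described above, leaving the column slot nil.
def untyped_binds(*values)
  values.map { |value| [nil, value] }
end

binds = untyped_binds("Robert", 2)
# binds is [[nil, "Robert"], [nil, 2]]</pre>
<p>The update above could then be written as <code>connection.update("UPDATE users SET name=$1 WHERE id=$2", nil, untyped_binds("Robert", 2))</code>.</p>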
<p>The <code>insert</code> method is a little more complex because it takes a series of three more arguments before the values list. (Those extra arguments are for supporting databases that make you manage primary key sequences manually. In most cases, you can set those to nil.) So the values array should be passed as the <em>sixth</em> argument to <code>insert</code>. It&#8217;s a little messy, but this is the way the API is set up as of Rails 3.1.</p>
<pre>connection.insert(
  "INSERT INTO users (name, created_at) VALUES ($1, $2)",
  nil, nil, nil, nil, [[nil, "Sally"], [nil, "now"]])</pre>
<p>Prepared statement support requires ActiveRecord 3.1 or later. On older versions, you will need to construct the entire SQL statement using <code>quote</code> to inject values.</p>
<h2>Low-Level Migrations and Other Features</h2>
<p>Migrations are a very common case for dropping to low-level SQL statements, since there may be many cases when you&#8217;ll want more control over your schema than ActiveRecord gives you.</p>
<p>When you write an ActiveRecord migration, you actually are already using the connection adapter API. The <code>create_table</code> and similar methods are connection adapter methods; the migration simply delegates them to the adapter. This means you can use all the methods we&#8217;ve been discussing, such as insert and update, directly in your migration if you wanted to inject data during that process. (With the caveat, of course, that many consider it bad practice to alter data during a migration.)</p>
<p>To perform schema changes using raw SQL, you should generally use the <code>execute</code> method. This method returns the underlying database driver&#8217;s result object; it doesn&#8217;t try to do any postprocessing on the result, and so it generally may be faster than the calls we&#8217;ve been looking at so far. It&#8217;s useful for cases when you want the <em>very</em> low-level driver-specific result, or (as in most migrations) for cases when you don&#8217;t care about the result. Here&#8217;s an example using PostgreSQL&#8217;s flavor of SQL:</p>
<pre>class MyMigration &lt; ActiveRecord::Migration
  def up
    execute &lt;&lt;-SQL
      CREATE TABLE users (
        id SERIAL PRIMARY KEY,
        name CHARACTER VARYING NOT NULL,
        created_at TIMESTAMP NOT NULL)
    SQL
  end
  def down
    execute 'DROP TABLE users'
  end
end</pre>
<p>Nowadays when I write migrations, I use <code>execute</code> for almost everything because I generally want to fine-tune the database schema. It&#8217;s not strictly necessary in the above very simple case, but in more complex applications, you may want to set up constraints, triggers, and other data management features in your database, and you&#8217;ll need it in those cases.</p>
<p>Of course, if you use <code>execute</code>, you will have to write out both the forward and reverse migrations. You can&#8217;t use Rails 3.1&#8217;s &#8220;change&#8221; feature, which attempts to auto-create the backward migration from the forward migration, because Rails isn&#8217;t <em>quite</em> smart enough to figure out how to reverse a change made by arbitrary SQL. But you may find it an acceptable trade-off, as I have.</p>
<p>Another common use for the connection object is to delimit transactions. If you have a set of statements that should be wrapped in a transaction, use the <code>transaction</code> method:</p>
<pre>connection.transaction do
  connection.update("UPDATE users SET name='Robert' WHERE id=2")
  connection.update("UPDATE users SET name='Catherine' WHERE id=4")
end</pre>
<p>Because the connection adapter is shared with ActiveRecord, you can also include high-level ActiveRecord calls in your transaction block, interspersed with your low-level calls. For example, you could write the above code like this:</p>
<pre>connection.transaction do
  connection.update("UPDATE users SET name='Robert' WHERE id=2")
  obj = User.find(4)
  obj.name = "Catherine"
  obj.save
end</pre>
<p>I prefer the first version, however. For starters, it&#8217;s more performant: it doesn&#8217;t require an extra select to get the data, and it doesn&#8217;t require building the ActiveRecord object simply to update a single field. But in my opinion, it&#8217;s also cleaner and more succinct. I know some don&#8217;t like seeing SQL in their web app, but in many cases it&#8217;s a very expressive and readable language for tasks like this. I believe it is simply a case of the right tool for the right job.</p>
<p>You can also obtain various information about the capabilities of the database and the database driver. Here are just a few examples:</p>
<pre>connection.supports_savepoints?       # Database supports savepoints?
connection.supports_statement_cache?  # Supports prepared statements?
connection.table_name_length          # Maximum table name length
connection.columns_per_table          # Maximum columns per table</pre>
<p>Finally, the connection adapter provides access to the underlying driver-specific connection for when you need to access <em>really</em> advanced or database-specific features. Here&#8217;s an example (assuming you&#8217;re using the postgresql database adapter).</p>
<pre>raw_connection = connection.raw_connection  # Returns a PGconn object
pg_server_version = raw_connection.server_version.to_s</pre>
<h2>For More Information</h2>
<p>This has been an overview of the tools that ActiveRecord provides for bypassing the ActiveRecord &#8220;models&#8221; and accessing the advanced features of your database directly. Most of us won&#8217;t write entire Rails applications using <em>only</em> this low-level API. However, it is a good tool to have in the toolbox. For those cases when the ActiveRecord ORM doesn&#8217;t quite do what you need, or does so clumsily or inefficiently, ActiveRecord lets you drop down to SQL quite easily.</p>
<p>The good news is that all these calls are well-documented on <a title="Rails API" href="http://api.rubyonrails.org/" target="_blank">http://api.rubyonrails.org/</a>:</p>
<ul>
<li>The <code>ActiveRecord::ConnectionAdapters::DatabaseStatements</code> module describes most of the low-level methods for querying and updating data that we covered in this article.</li>
<li>The <code>ActiveRecord::ConnectionAdapters::Quoting</code> module describes various methods for quoting data for injection into SQL.</li>
<li>The <code>ActiveRecord::ConnectionAdapters::SchemaStatements</code> module describes the schema-manipulation methods you are probably familiar with using for migrations.</li>
<li>The <code>ActiveRecord::ConnectionAdapters::TableDefinition</code> class includes the methods you can call inside <code>create_table</code>.</li>
<li>The <code>ActiveRecord::ConnectionAdapters::DatabaseLimits</code> module describes a miscellaneous set of informational methods you can call.</li>
</ul>
<h2>About the Author</h2>
<p>Daniel Azuma is a Ruby developer specializing in geospatial technologies, computational geometry, graphics, and related fields. He is the author of <a title="RGeo gem" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">rgeo</a> and related gems for <a title="Geo-Rails series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">geospatial analysis</a> in Ruby and Rails applications. He currently works as Chief Software Architect at <a title="Pirq" href="http://www.pirq.com/" target="_blank">Pirq</a>.</p>
<p><strong>Edits (Fri 27 Jan)&#8212;</strong> I made some corrections to the prepared statement examples so they actually work. Note to self: test examples before publishing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-azuma.com/blog/archives/216/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Geo-Rails part 8: ZCTA Lookup, A Worked Example</title>
		<link>http://www.daniel-azuma.com/blog/archives/191</link>
		<comments>http://www.daniel-azuma.com/blog/archives/191#comments</comments>
		<pubDate>Mon, 16 Jan 2012 08:02:35 +0000</pubDate>
		<dc:creator>Daniel Azuma</dc:creator>
				<category><![CDATA[GeoRails]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[PostGIS]]></category>
		<category><![CDATA[RGeo]]></category>
		<category><![CDATA[segmentation]]></category>
		<category><![CDATA[Shapefile]]></category>
		<category><![CDATA[ZCTA]]></category>

		<guid isPermaLink="false">http://www.daniel-azuma.com/blog/?p=191</guid>
		<description><![CDATA[This week we&#8217;ll put together what we&#8217;ve covered so far in this series by implementing a simple but usable service: looking up the Zip Code Tabulation Area (ZCTA) for a location. This is an actual task I had to do &#8230; <a href="http://www.daniel-azuma.com/blog/archives/191">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>This week we&#8217;ll put together what we&#8217;ve covered so far in this series by implementing a simple but usable service: looking up the Zip Code Tabulation Area (ZCTA) for a location. This is an actual task I had to do for my job at <a title="Pirq" href="http://www.pirq.com/" target="_blank">Pirq</a>, and while I will pare it down for this article, we&#8217;ll go through some of the actual trade-offs and optimization decisions I made in our implementation.</p>
<p>In this article, we will cover:</p>
<ul>
<li>The goals for the service, and what is a ZCTA anyway</li>
<li>Obtaining ZCTA data from the U.S. Census</li>
<li>Developing our own ZCTA database</li>
<li>Querying the database</li>
<li>Improving performance using polygon segmentation</li>
</ul>
<p>This is part 8 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</p>
<h2><span id="more-191"></span>ZCT-what?</h2>
<p>At <a title="Pirq" href="http://www.pirq.com/" target="_blank">Pirq</a>, we do a lot of geospatial analysis. One task we have to perform fairly frequently is to group data into neighborhoods for demographic and spatial analysis.</p>
<p>Now, mapping coordinates to neighborhoods is a more challenging task than you might suspect. There is no &#8220;official&#8221; database, and boundaries on the ground are often imprecise and subject to rapidly evolving local information. One way that you could go about it is to define neighborhoods as zip codes or clusters of zip codes, but this carries its own challenges because (perhaps counter-intuitively) zip code &#8220;boundaries&#8221; are not well-defined either. Zip codes are defined by postal routes, and don&#8217;t necessarily correspond to or respect any other boundary information, including even city and state boundaries. (For further discussion, see the US Census position on zip codes at <a title="Census position on zip code boundaries" href="http://www.census.gov/geo/www/tiger/tigermap.html#ZIP" target="_blank">http://www.census.gov/geo/www/tiger/tigermap.html#ZIP</a>.)</p>
<p>There are two different ways of tackling the problem of mapping neighborhoods or zip codes. One is to use a (paid) service to do the local heavy lifting&#8212;curating and cleaning the data, and tracking and applying local knowledge&#8212;for you. I generally recommend <a title="Maponics" href="http://www.maponics.com/" target="_blank">Maponics</a> for this task, but there are a variety of services available. Alternatively, if you&#8217;re content with approximate data, and/or you have more direct access to some amount of local knowledge, you can build your own database using Zip Code Tabulation Areas (ZCTA).</p>
<p>ZCTA (pronounced &#8220;zik-tuh&#8221;) is a system created by the US Census to address the zip code problem. The idea is this. Sometimes you need to look up the actual zip code for a location for reasons related to postal delivery. In such cases you will need to work directly with the US Postal Service or a third-party curator like Maponics. But other times you don&#8217;t necessarily need the <em>actual</em> zip code, but you just want to use something like a zip code as a convenient unit of delineation for geo-analysis&#8212;for example, to approximate neighborhood boundaries. In this latter case, it doesn&#8217;t matter that the zip code itself isn&#8217;t always 100% accurate. Rather, what&#8217;s important is that the boundaries are stable and make geographic and demographic sense. ZCTA is designed for this latter case.</p>
<p>Each ZCTA is a collection of US Census blocks for which, at the time of the census, the addresses fell largely if not completely within a particular zip code. Because ZCTAs are made up of Census blocks, they are well-defined, stable, and statistically useful. And usually, they approximate the actual zip code boundaries fairly well.</p>
<p>For this article, we&#8217;ll build a simple ZCTA lookup tool: one that will let you query a location (latitude and longitude) for which ZCTA contains it.</p>
<h2>Setting up our database to hold ZCTA data</h2>
<p>I&#8217;ll assume we&#8217;ve set up a Rails project and a <a title="PostGIS spatial database" href="http://www.postgis.org/" target="_blank">PostGIS</a> database as covered in <a title="Geo-Rails part 2" href="http://www.daniel-azuma.com/blog/archives/69" target="_blank">part 2</a>. Let&#8217;s start by creating a model for the ZCTA data. For now we&#8217;ll just create a simple table capable of mapping geometry to ZCTA (zip code). You can of course modify this to add more fields.</p>
<pre>% rails generate model Zcta zcta:integer region:polygon</pre>
<p>This creates the following migration:</p>
<pre>class CreateZctas &lt; ActiveRecord::Migration
  def change
    create_table :zctas do |t|
      t.integer :zcta
      t.polygon :region
      t.timestamps
    end
  end
end</pre>
<p>Before we migrate, we&#8217;ll do a few things here. First, we don&#8217;t need those timestamps so we&#8217;ll get rid of them. Second, we will want to do spatial queries against the <code>:region</code> column, so we&#8217;ll create a spatial index on that column, as we covered in <a title="Geo-Rails part 6" href="http://www.daniel-azuma.com/blog/archives/134" target="_blank">part 6</a>.</p>
<p>Third, we&#8217;ll choose a coordinate system for the <code>:region</code> column. For this service, I&#8217;ll use the EPSG 3785 projection. This coordinate system is often useful because of its affinity with mapping software, its local conformality, and its ability to keep political boundaries straight. It&#8217;s also good to choose a flat projection rather than a geographic coordinate system because we&#8217;re going to do some geometric manipulation. You can read more discussion on choosing a coordinate system for your database in <a title="Geo-Rails part 7" href="http://www.daniel-azuma.com/blog/archives/164" target="_blank">part 7</a>.</p>
<p>Our migration now looks like this:</p>
<pre>class CreateZctas &lt; ActiveRecord::Migration
  def change
    create_table :zctas do |t|
      t.integer :zcta
      t.polygon :region, :srid =&gt; 3785
    end
    change_table :zctas do |t|
      t.index :region, :spatial =&gt; true
    end
  end
end</pre>
<p>Now we can run the migration:</p>
<pre>% rake db:migrate</pre>
<p>As we discussed in <a title="Geo-Rails part 7" href="http://www.daniel-azuma.com/blog/archives/164" target="_blank">part 7</a>, we also set up the ActiveRecord class to use the simple_mercator_factory.</p>
<pre>class Zcta &lt; ActiveRecord::Base
  FACTORY = RGeo::Geographic.simple_mercator_factory
  set_rgeo_factory_for_column(:region, FACTORY.projection_factory)
end</pre>
<h2>Obtaining ZCTA data and populating our database</h2>
<p>A nice feature of ZCTA is that it is public data freely downloadable from the US government. A good starting point for exploring the current (2010) Census ZCTA data is <a title="ZCTA page at the US Census" href="http://www.census.gov/geo/ZCTA/zcta.html" target="_blank">http://www.census.gov/geo/ZCTA/zcta.html</a>. If you want to go straight to the downloads, head over <a title="ZCTA download" href="http://www.census.gov/cgi-bin/geo/shapefiles2010/main" target="_blank">here</a> and choose &#8220;Zip Code Tabulation Areas&#8221; from the menu. You can download shapefiles for individual states, or the entire database as one huge shapefile. (Warning: the combined shapefile download is half a gigabyte compressed.)</p>
<p>For this example, we&#8217;ll download just the state of Washington, but it should be trivial to modify the code to deal with the entire database. When you download the data for Washington, you&#8217;ll end up with a zip file &#8220;tl_2010_53_zcta510.zip&#8221;. Unzipping this file yields a set of five files:</p>
<ul>
<li>tl_2010_53_zcta510.dbf</li>
<li>tl_2010_53_zcta510.prj</li>
<li>tl_2010_53_zcta510.shp</li>
<li>tl_2010_53_zcta510.shp.xml</li>
<li>tl_2010_53_zcta510.shx</li>
</ul>
<p>The .shp extension clues us in that this is a shapefile, one of the formats we covered in <a title="Geo-Rails part 5" href="http://www.daniel-azuma.com/blog/archives/125" target="_blank">part 5</a>. Shapefiles are a very common format for public data sets. They&#8217;re great for downloading data, but not for running spatial searches. So our next task is to import the shapefile into our database.</p>
<p>The shapefile specifies that its geometric information is in the &#8220;NAD83&#8221; geographic coordinate system (EPSG 4269). This is a geographic (latitude-longitude) coordinate system optimized for the United States. It does have very slight differences from the WGS84-based coordinate system (EPSG 4326) that we usually use, but for our purposes, the differences are negligible, so we&#8217;ll treat these as standard WGS84 geographic coordinates.</p>
<p>Now, our database uses the EPSG 3785 projection, so we&#8217;ll need to convert the polygons into the projection. We covered how to use the simple_mercator_factory to perform these projections in <a title="Geo-Rails part 7" href="http://www.daniel-azuma.com/blog/archives/164" target="_blank">part 7</a>. As we saw in <a title="Geo-Rails part 4" href="http://www.daniel-azuma.com/blog/archives/106" target="_blank">part 4</a>, converting lines and polygons between coordinate systems can change their shape if individual sides are long enough. For our purpose, ZCTA areas are small enough that we&#8217;ll ignore the effect.</p>
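<p>For reference, the arithmetic behind this projection is short. The sketch below applies the standard spherical Mercator formulas with a sphere radius of 6378137 meters. The function name <code>mercator_project</code> is hypothetical; in the actual import script the simple_mercator_factory does this work for us, so this is only meant to show what the conversion computes.</p>
<pre># Projects a longitude/latitude pair (in degrees) to spherical Mercator
# x/y coordinates (in meters). Illustrative only; RGeo handles this.
def mercator_project(lon, lat, radius = 6378137.0)
  lon_rad = lon * Math::PI / 180.0
  lat_rad = lat * Math::PI / 180.0
  x = radius * lon_rad
  y = radius * Math.log(Math.tan(Math::PI / 4.0 + lat_rad / 2.0))
  [x, y]
end

x, y = mercator_project(-122.33, 47.61)  # roughly downtown Seattle</pre>
<p>Note that y grows without bound as the latitude approaches the poles, which is one reason this projection is only used for a limited latitude range.</p>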
<p>One more issue is that the geometries we&#8217;ll read from the shapefile are actually MultiPolygons rather than Polygons. This is because some ZCTAs (such as those that cover islands) may include multiple disjoint areas. Since we defined our database column to store polygons, we&#8217;ll have to break up each MultiPolygon into its constituent parts. This means we may have some ZCTA numbers that are represented by multiple records in the database.</p>
<p>For the ZCTA number, we&#8217;ll take a quick peek at the Census-provided documentation at <a title="TIGER documentation" href="http://www.census.gov/geo/www/tiger/tgrshp2010/documentation.html" target="_blank">http://www.census.gov/geo/www/tiger/tgrshp2010/documentation.html</a>. Here it states that the ZCTA number can be found in the &#8220;<code>ZCTA5CE10</code>&#8221; property field of each shapefile record.</p>
<p>Now we have enough information to write a script to read the shapefile and populate our database!</p>
<pre>require 'rgeo-shapefile'
RGeo::Shapefile::Reader.open('tl_2010_53_zcta510.shp',
    :factory =&gt; Zcta::FACTORY) do |file|
  file.each do |record|
    zcta = record['ZCTA5CE10'].to_i
    # The record geometry is a MultiPolygon. Iterate
    # over its parts.
    record.geometry.projection.each do |poly|
      Zcta.create(:zcta =&gt; zcta, :region =&gt; poly)
    end
  end
end</pre>
<p>Let that run for a minute or two, and now we have a fully populated database of ZCTA data for the state of Washington.</p>
<h2>Running ZCTA queries</h2>
<p>Now let&#8217;s write a simple API for querying the ZCTA for a given location.</p>
<p>First we&#8217;ll write some scopes in the ActiveRecord class to construct the queries. To find the ZCTA that contains a particular point location (latitude and longitude), we can use the <code>ST_Intersects</code> function. We just need to make sure we convert the location to the projected coordinate system, as we covered at the end of <a title="Geo-Rails part 7" href="http://www.daniel-azuma.com/blog/archives/164" target="_blank">part 7</a>.</p>
<p>For best performance, we&#8217;ll write our queries to speak PostGIS&#8217;s native language, which is EWKB (see <a title="Geo-Rails part 5" href="http://www.daniel-azuma.com/blog/archives/125" target="_blank">part 5</a>).</p>
<pre>class Zcta &lt; ActiveRecord::Base

  # ...

  EWKB = RGeo::WKRep::WKBGenerator.new(:type_format =&gt; :ewkb,
    :emit_ewkb_srid =&gt; true, :hex_format =&gt; true)

  def self.containing_latlon(lat, lon)
    ewkb = EWKB.generate(FACTORY.point(lon, lat).projection)
    where("ST_Intersects(region, ST_GeomFromEWKB(E'\\\\x#{ewkb}'))")
  end

end</pre>
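<p>The run of backslashes in that query string can be confusing, because two layers of escaping are involved. The short sketch below builds the same kind of SQL fragment and prints it, so you can see what is actually sent to the database. The hex string here is a truncated placeholder, not a complete geometry.</p>
<pre>hex = "0101000020110F0000"  # placeholder hex, not a complete geometry
fragment = "ST_GeomFromEWKB(E'\\\\x#{hex}')"
puts fragment
# Ruby collapses the four backslashes in the source to two in the
# string. PostgreSQL's E'' escape-string syntax then collapses those
# two to one, leaving the bytea hex literal \x0101000020110F0000.</pre>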
<p>We could also extend this to support queries by any arbitrary geometry, letting us find the ZCTAs that cover a line or polygon:</p>
<pre>class Zcta &lt; ActiveRecord::Base

  # ...

  def self.containing_geom(geom)
    ewkb = EWKB.generate(FACTORY.project(geom))
    where("ST_Intersects(region, ST_GeomFromEWKB(E'\\\\x#{ewkb}'))")
  end

end</pre>
<p>Now it&#8217;s pretty straightforward to write a web service API wrapper for this function. Here&#8217;s one way it could be done:</p>
<pre>class ZctaController &lt; ApplicationController

  def lookup
    lat = params[:lat].to_f
    lon = params[:lon].to_f
    zcta = Zcta.containing_latlon(lat, lon).first
    render(:json =&gt; {:lat =&gt; lat, :lon =&gt; lon,
      :zcta =&gt; zcta ? zcta.zcta : nil})
  end

end</pre>
<h2>Segmenting polygons for improved performance</h2>
<p>Now that we have a basic implementation, let&#8217;s see if we can improve performance a bit. In <a title="Geo-Rails part 6" href="http://www.daniel-azuma.com/blog/archives/134" target="_blank">part 6</a>, we saw that large, complex geometries, such as polygons with many sides, can result in slow queries. The ZCTA data, it turns out, does have some fairly large polygons with side counts in the tens of thousands. Since all we&#8217;re interested in is looking up ZCTA by location, we may be able to improve performance using the segmentation technique.</p>
<p>Segmentation involves breaking up large polygons into smaller polygons with fewer sides. It trades off a smaller polygon size for a larger number of rows in the database. However, the spatial index can help mitigate queries against a large number of rows, so such a trade-off may be a performance win in some situations. (Of course, we should measure so we know for certain&#8212;we&#8217;ll do that below.)</p>
<p>We&#8217;ll segment using four-to-one subdivision as described in <a title="Geo-Rails part 6" href="http://www.daniel-azuma.com/blog/archives/134" target="_blank">part 6</a>. For each polygon, we&#8217;ll count its sides, and if the count is larger than some threshold, we&#8217;ll divide it in half horizontally and vertically. An easy way to accomplish this division is to take the polygon&#8217;s bounding box and divide it four ways into smaller rectangles. Then, take the intersections of the original polygon with those sub-rectangles. These functions are available in the Simple Features interfaces, and are implemented by RGeo, as we covered in <a title="Geo-Rails part 3" href="http://www.daniel-azuma.com/blog/archives/88" target="_blank">part 3</a>.</p>
<p>Note that it is possible for a subdivision to result in multiple disjoint polygons in each quadrant (that is, a MultiPolygon). So we have to handle that case in the code.</p>
<p>We&#8217;ll also perform one more optimization: if the polygon is long and thin, we&#8217;ll divide it in two rather than in four, in order to make the pieces closer to square.</p>
<p>Ready? Here&#8217;s our implementation:</p>
<pre>MAX_SIZE = 500
MAX_DEPTH = 12

require 'rgeo-shapefile'

# Handle a geometry of any type
def handle_geometry(depth, geom, zcta)
  case geom
  when ::RGeo::Feature::Polygon
    handle_polygon(depth, geom, zcta)
  when ::RGeo::Feature::MultiPolygon
    geom.each{ |polygon| handle_polygon(depth, polygon, zcta) }
  end
end

# Handle a polygon
def handle_polygon(depth, polygon, zcta)
  # Check the number of sides. We'll combine the number of sides for
  # the "outer edge" and any "holes" that the polygon might have.
  # A polygon boundary consists of a LineString that is closed, so
  # the first and last points are the same. Therefore, to count the
  # sides, count the number of vertices and subtract 1.
  sides = polygon.exterior_ring.num_points - 1
  polygon.interior_rings.each{ |ring| sides += ring.num_points - 1 }
  if depth &gt;= MAX_DEPTH || sides &lt;= MAX_SIZE
    # The polygon is small enough, or we recursed as far as we're
    # willing. Just add the polygon.
    Zcta.create(:zcta =&gt; zcta, :region =&gt; polygon)
  else
    # Split the polygon 4-to-1 and recurse
    depth = depth + 1
    # Find the bounding box for the polygon
    envelope = polygon.envelope.exterior_ring
    p1 = envelope.point_n(0)
    p2 = envelope.point_n(2)
    min_x = p1.x
    max_x = p2.x
    min_x, max_x = max_x, min_x if min_x &gt; max_x
    min_y = p1.y
    max_y = p2.y
    min_y, max_y = max_y, min_y if min_y &gt; max_y
    # dx and dy are the size of the bounding box.
    # cx and cy are the center point.
    dx = max_x - min_x
    dy = max_y - min_y
    cx = (min_x + max_x) * 0.5
    cy = (min_y + max_y) * 0.5
    # Check the aspect ratio of the bounding box. If it's very wide
    # or very tall, then only split in half. Otherwise, split in 4.
    if dy &gt; dx * 2
      # The bounding box is tall, so split in half
      handle_quadrant(depth, polygon, min_x, min_y, max_x, cy, zcta)
      handle_quadrant(depth, polygon, min_x, cy, max_x, max_y, zcta)
    elsif dx &gt; dy * 2
      # The bounding box is wide, so split in half
      handle_quadrant(depth, polygon, min_x, min_y, cx, max_y, zcta)
      handle_quadrant(depth, polygon, cx, min_y, max_x, max_y, zcta)
    else
      # The bounding box is close to square so split in four
      handle_quadrant(depth, polygon, min_x, min_y, cx, cy, zcta)
      handle_quadrant(depth, polygon, cx, min_y, max_x, cy, zcta)
      handle_quadrant(depth, polygon, min_x, cy, cx, max_y, zcta)
      handle_quadrant(depth, polygon, cx, cy, max_x, max_y, zcta)
    end
  end
end

# Take a polygon and a box. Run the algorithm on the part of the
# polygon that falls within the box.
def handle_quadrant(depth, polygon, min_x, min_y, max_x, max_y, zcta)
  # We do this by creating a rectangle for the box, and computing
  # the intersection with the input polygon. The result could be a
  # polygon, a MultiPolygon, or an empty geometry.
  box = Zcta::FACTORY.polygon(Zcta::FACTORY.linear_ring([
    Zcta::FACTORY.point(min_x, min_y),
    Zcta::FACTORY.point(min_x, max_y),
    Zcta::FACTORY.point(max_x, max_y),
    Zcta::FACTORY.point(max_x, min_y)]))
  handle_geometry(depth, polygon.intersection(box), zcta)
end

# The main shapefile reader.
RGeo::Shapefile::Reader.open('tl_2010_53_zcta510.shp',
    :factory =&gt; Zcta::FACTORY) do |file|
  file.each do |record|
    # For each MultiPolygon, analyze it and add to the database
    handle_geometry(0, record.geometry.projection,
      record['ZCTA5CE10'].to_i)
  end
end</pre>
<p>Now whenever we consider an optimization, we have to measure its effect. Does it actually work? And if it does, what value of MAX_SIZE should we use?</p>
<p>To find out, I ran the segmentation on the Washington state ZCTA data with different values of MAX_SIZE, and then ran a simple benchmark on each segmentation. The benchmark consisted of 50,000 queries randomly distributed across the state. I timed the results on my laptop (an early 2011 MacBook Pro running OS X 10.6.8, Ruby 1.9.2, PostgreSQL 9.0.6, and PostGIS 1.5.3).</p>
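<p>For readers who want to run a similar measurement, here is a minimal sketch of such a benchmark harness. It is not the exact script I used. The bounding box for Washington state is a rough assumption, so some random points will land in the ocean or in neighboring states, and the helper names <code>random_latlon</code> and <code>benchmark_lookups</code> are made up for this illustration.</p>
<pre>require 'benchmark'

# Approximate bounding box for Washington state, in degrees.
# These bounds are a rough assumption for illustration only.
WA_LAT_RANGE = (45.5..49.0)
WA_LON_RANGE = (-124.8..-116.9)

# Generate a random [lat, lon] pair inside the bounding box.
def random_latlon(rng)
  [rng.rand(WA_LAT_RANGE), rng.rand(WA_LON_RANGE)]
end

# Time `count` lookups and return the elapsed seconds. The block
# receives lat and lon and should run one query. The points are
# generated up front so that only the lookups are timed.
def benchmark_lookups(count, seed = 42)
  rng = Random.new(seed)
  points = Array.new(count) { random_latlon(rng) }
  Benchmark.realtime do
    points.each { |lat, lon| yield(lat, lon) }
  end
end

# In a Rails console, usage might look like this:
#   benchmark_lookups(50_000) { |lat, lon|
#     Zcta.containing_latlon(lat, lon).first
#   }</pre>
<p>Using a fixed seed means each segmentation is queried with the same set of points, which keeps the comparison between MAX_SIZE values fair.</p>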
<p>This first graph shows the total number of polygons (database rows) created by the segmentation process, plotted against the MAX_SIZE parameter. The original database had 622 polygons, with a maximum of 9893 sides. As our threshold on the number of sides approaches the low hundreds and smaller, the number of polygons (and hence the number of rows in the database) gets very large.</p>
<p><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/polygons_my_maxsize.png"><img class="aligncenter size-full wp-image-195" title="polygons_my_maxsize" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/polygons_my_maxsize.png" alt="" width="486" height="324" /></a></p>
<p>This second graph shows the total time taken by 50,000 queries against the segmented database, plotted against the MAX_SIZE parameter. The benchmark against the original database took 18.15 seconds. As we can see, decreasing the size of each polygon (by running more subdivisions) improves our query performance up to a point, where the larger number of rows becomes significant.</p>
<p><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/time_by_maxsize.png"><img class="aligncenter size-full wp-image-196" title="time_by_maxsize" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/time_by_maxsize.png" alt="" width="465" height="321" /></a></p>
<p>The sweet spot seems to be around the 300-500 side range. At that point, our optimization has cut average query times roughly in half. When I segmented the entire US database of ZCTAs for <a title="Pirq" href="http://www.pirq.com/" target="_blank">Pirq</a>, we set MAX_SIZE to 500.</p>
<p>A more difficult question is whether the benefit we measured is worth the extra complexity introduced by segmenting. That will depend on your application. At Pirq, we sometimes run these queries in an inner loop, so every optimization matters. We also loaded a few much larger polygons, for which the segmentation procedure had a much more dramatic effect. So for our application, we determined that it was worth it. However, as with all benchmarking, it&#8217;s important to do your own measurements for your application.</p>
<h2>Where to go from here</h2>
<p>This article concludes my original outline for this series on geospatial development with Rails. But not to worry&#8212;it&#8217;s not the end. There&#8217;s still plenty of material to cover and plenty of discussion to be had. I just don&#8217;t have the next few parts already planned out yet. So this is your chance to direct where this series goes from here. If you have a topic you&#8217;d like covered, leave me a comment!</p>
<p>That said, there are a couple of major topics that I haven&#8217;t yet covered.</p>
<ul>
<li>I&#8217;ve covered <em>vector</em> data (i.e. points, lines, and polygons) but not <em>raster</em> data (i.e. image overlays). This is for several reasons. First, raster support in PostGIS is relatively new and not yet that mature. (In fact, unless you are using prereleases of PostGIS 2.0, you have to install another third-party library for raster support.) Second, the Ruby tools, notably RGeo, don&#8217;t yet support raster data either. And third, you may have noticed that a major theme of these first eight articles has been understanding coordinate systems and projections. This is critical background knowledge for handling raster data, so I thought it was important to cover it first.</li>
<li>I haven&#8217;t covered much on visualization tools. This is mostly because my own work has been focused on the back end, so I don&#8217;t yet have a lot to contribute on the view side.</li>
</ul>
<p>I will write something on those topics in the future, but I&#8217;m not sure when I&#8217;ll get to the point where I have enough useful material. In the meantime, the floor is open for other topics!</p>
<p>For now I&#8217;ll conclude with links to resources on the tools that we&#8217;ve been working with during these articles.</p>
<p><a title="PostGIS" href="http://www.postgis.org/" target="_blank">PostGIS</a> is the open source geospatial database of choice. It is an add-on library to the venerable <a title="PostgreSQL" href="http://www.postgresql.org/" target="_blank">PostgreSQL</a> open source database. For more information on PostGIS, see the <a title="PostGIS documentation" href="http://www.postgis.org/documentation/" target="_blank">documentation</a> online, and sign up for the <a title="postgis-users mailing list" href="http://postgis.refractions.net/mailman/listinfo/postgis-users" target="_blank">postgis-users</a> mailing list.</p>
<p><a title="RGeo" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">RGeo</a> provides the core geospatial vector data types for Ruby. It is installed as the gem &#8220;<a title="rgeo gem" href="http://rubygems.org/gems/rgeo" target="_blank">rgeo</a>&#8221;. For more information on RGeo, see the <a title="RGeo documentation" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">documentation</a> online, report <a title="RGeo issues" href="http://github.com/dazuma/rgeo/issues" target="_blank">issues</a> and contribute to the <a title="RGeo source" href="http://github.com/dazuma/rgeo" target="_blank">source</a> at GitHub, and sign up for the <a title="rgeo-users mailing list" href="http://groups.google.com/group/rgeo-users" target="_blank">rgeo-users</a> mailing list.</p>
<p>We have also covered several other Ruby gems that are used for more specialized tasks. These include:</p>
<ul>
<li><a title="rgeo-shapefile gem" href="http://rubygems.org/gems/rgeo-shapefile" target="_blank">rgeo-shapefile</a> (reading ESRI shapefiles). <a title="rgeo-shapefile documentation" href="http://virtuoso.rubyforge.org/rgeo-shapefile/README_rdoc.html" target="_blank">documentation</a> / <a title="rgeo-shapefile issues" href="http://github.com/dazuma/rgeo-shapefile/issues" target="_blank">issues</a> / <a title="rgeo-shapefile on Github" href="http://github.com/dazuma/rgeo-shapefile" target="_blank">source</a></li>
<li><a title="rgeo-geojson gem" href="http://rubygems.org/gems/rgeo-geojson" target="_blank">rgeo-geojson</a> (reading and writing GeoJSON). <a title="rgeo-geojson documentation" href="http://virtuoso.rubyforge.org/rgeo-geojson/README_rdoc.html" target="_blank">documentation</a> / <a title="rgeo-geojson issues" href="http://github.com/dazuma/rgeo-geojson/issues" target="_blank">issues</a> / <a title="rgeo-geojson on Github" href="http://github.com/dazuma/rgeo-geojson" target="_blank">source</a></li>
<li><a title="activerecord-postgis-adapter gem" href="http://rubygems.org/gems/activerecord-postgis-adapter" target="_blank">activerecord-postgis-adapter</a> (ActiveRecord adapter for PostGIS). <a title="activerecord-postgis-adapter documentation" href="http://virtuoso.rubyforge.org/activerecord-postgis-adapter/README_rdoc.html" target="_blank">documentation</a> / <a title="activerecord-postgis-adapter issues" href="http://github.com/dazuma/activerecord-postgis-adapter/issues" target="_blank">issues</a> / <a title="activerecord-postgis-adapter on Github" href="http://github.com/dazuma/activerecord-postgis-adapter" target="_blank">source</a></li>
</ul>
<p>There are, of course, many other Ruby libraries for other related tasks such as geocoding. Some of these will likely be the subject of future articles.</p>
<p>Finally, I started a mailing list for general geospatial Rails discussion, the &#8220;<a title="GeoRails google group" href="http://groups.google.com/group/georails" target="_blank">georails</a>&#8221; Google group. Sign up if you&#8217;re interested in more community discussion.</p>
<p><em>This is part 8 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-azuma.com/blog/archives/191/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Geo-Rails part 7: Geometry vs. Geography, or, How I Learned To Stop Worrying And Love Projections</title>
		<link>http://www.daniel-azuma.com/blog/archives/164</link>
		<comments>http://www.daniel-azuma.com/blog/archives/164#comments</comments>
		<pubDate>Mon, 09 Jan 2012 08:04:32 +0000</pubDate>
		<dc:creator>Daniel Azuma</dc:creator>
				<category><![CDATA[GeoRails]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Maps]]></category>
		<category><![CDATA[PostGIS]]></category>
		<category><![CDATA[Projections]]></category>
		<category><![CDATA[Rails]]></category>
		<category><![CDATA[RGeo]]></category>

		<guid isPermaLink="false">http://www.daniel-azuma.com/blog/?p=164</guid>
		<description><![CDATA[This week we&#8217;re going to look at how to choose a coordinate system for your database. In PostGIS, this includes the choice of geometry vs geography columns, as well as which projection (if any) to use, and how to interact &#8230; <a href="http://www.daniel-azuma.com/blog/archives/164">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>This week we&#8217;re going to look at how to choose a coordinate system for your database. In PostGIS, this includes the choice of geometry vs geography columns, as well as which projection (if any) to use, and how to interact with it from Rails.</p>
<p>In this article, we&#8217;ll:</p>
<ul>
<li>Review geographic and projected coordinate systems</li>
<li>Discuss the pros and cons of using the PostGIS geographic type</li>
<li>See why I typically store data in a projection</li>
<li>Look at some specific projections I recommend using (or avoiding)</li>
<li>Learn how to handle projected data in Rails</li>
</ul>
<p>My original series plan for this week called for a worked example of a location-based web service, bringing together much of the material that we&#8217;ve covered so far. But as I was writing it, I realized there was one more topic we probably ought to cover first. So I&#8217;ll publish the example next week.</p>
<p>This is part 7 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</p>
<h2><span id="more-164"></span>A tale of two coordinate systems</h2>
<p>In <a title="Geo-Rails part 4" href="http://www.daniel-azuma.com/blog/archives/106" target="_blank">part 4</a>, we took a first look at coordinate systems. We saw that coordinate systems are different ways of assigning <em>meaning</em> to coordinate values. Or, put another way, any particular meaning (such as a location) can be described in multiple ways. Each of those ways would use a different set of values, according to a different coordinate system.</p>
<p>Locations on the earth&#8217;s surface are typically specified using one of two general types of coordinate systems: <em>geographic</em> coordinate systems and <em>projected</em> coordinate systems. Geographic coordinate systems usually use some notion of latitude and longitude, measuring angles along the surface of the earth. They are also embedded in a curved domain. What this means is, you can&#8217;t technically show latitude and longitude on a flat piece of paper or computer screen. Objects described in latitude and longitude are always curved like the surface of the earth; distances measured between latitudes and longitudes are always measured along a curved surface.</p>
<p>Projected coordinate systems are formed by &#8220;flattening&#8221; the earth&#8217;s surface into a flat domain. Coordinates in a projected system are not in latitude and longitude. They do not measure angles. Instead, they measure distance and position along that flattened surface. Because of this, the actual coordinate values in a projection may not be immediately recognizable. However, the benefit is that objects in a projected coordinate system are flat, so you can draw them on a flat piece of paper or computer screen, and you can perform analysis and calculations the way you are used to from your high school geometry class.</p>
<p>Here are two sets of coordinates for the Space Needle in Seattle. The first uses a geographic coordinate system, and the values are the familiar longitude and latitude. The second, called &#8220;NAD83 / Washington North&#8221;, is the <em>state plane</em> projected coordinate system for northern Washington state. The coordinates in this projection may not be immediately recognizable, but they point to the same location.</p>
<pre>POINT(-122.34978 47.620578)  -- geographic
POINT(1266457.58 230052.50)  -- projected</pre>
<p>In the beginning of <a title="Geo-Rails part 4" href="http://www.daniel-azuma.com/blog/archives/106" target="_blank">part 4</a>, we looked at some of the ramifications of using different coordinate systems. They can drastically change the way that objects are shaped or computations are done. Now we&#8217;ll look at some practical advice regarding choosing coordinate systems to use.</p>
<h2>The PostGIS geographic type</h2>
<p>The <a title="PostGIS geospatial database" href="http://www.postgis.org/" target="_blank">PostGIS</a> database provides two different types of spatial columns: geometric and geographic. We saw in <a title="Geo-Rails part 2" href="http://www.daniel-azuma.com/blog/archives/69" target="_blank">part 2</a> that we can specify which type to use in our Rails migrations, through the use of the <code>:geographic</code> modifier:</p>
<pre>class CreateLocations &lt; ActiveRecord::Migration
  def change
    create_table :locations do |t|
      t.string :name
      t.point :latlon, :geographic =&gt; true
      t.timestamps
    end
  end
end</pre>
<p>Geographic columns use a geographic coordinate system (latitude and longitude on a curved domain). Geometric columns use a projected coordinate system (on a flat domain). But which should you use for your application? To answer this question, we need to unpack what the coordinate system differences mean in the context of PostGIS.</p>
<p>Let&#8217;s start with the obvious. Geographic types use units of latitude and longitude. Since these are familiar concepts, we can put them directly into the database and pull them out for display without having to perform any transformations on the values. This makes the geographic type very convenient for many simple applications.</p>
<p>Second, the shape of lines and polygons in geographic columns will follow the curvature of the earth. We saw a dramatic demonstration of this in the beginning of <a title="Geo-Rails part 4" href="http://www.daniel-azuma.com/blog/archives/106" target="_blank">part 4</a>: a &#8220;straight line&#8221; from San Francisco to Athens passes over Iceland in a geographic coordinate system, even though Iceland is far to the north of either endpoint.</p>
<p>Third, as a corollary to the previous point, geographic coordinates for the most part let you ignore seams and singularities. Take a short line segment from <code>POINT(179 0)</code> to <code>POINT(-179 0)</code>. On the globe, in a geographic coordinate system, this is a short line that crosses the International Date Line. Projections, in contrast, have to flatten the earth, and in order to do so, they have to &#8220;cut&#8221; the globe someplace. This cut becomes the edge of the map. Many projections perform this cut along the Date Line. Hence, if we take our two points on either side of the Date Line, and draw a line segment between them in such a projection, that line would run the other way, crossing most of the world.</p>
<div id="attachment_167" class="wp-caption aligncenter" style="width: 401px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/segment_geographic.png"><img class="size-full wp-image-167" title="segment_geographic" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/segment_geographic.png" alt="" width="391" height="387" /></a><p class="wp-caption-text">A line segment connecting two points on either side of the Date Line, in a geographic coordinate system.</p></div>
<div id="attachment_168" class="wp-caption aligncenter" style="width: 490px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/segment_projected.png"><img class="size-full wp-image-168" title="segment_projected" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/segment_projected.png" alt="" width="480" height="244" /></a><p class="wp-caption-text">A line segment connecting the same endpoints, in some projections, may cross the entire world.</p></div>
<p>Similarly, the north and south poles also cause problems for many projections. As a result, if you deal with objects that cross the Date Line, or that lie near (or especially surround) the poles, you may have to handle these (literal) edge cases specially. Generally, the geographic type lets you avoid having to think about these special cases because a globe has no edges.</p>
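<p>The Date Line example above can be checked with a little arithmetic. The following is a minimal sketch in plain Ruby, not how PostGIS does it internally. It assumes a spherical earth with a mean radius of about 6,371 km and uses the standard haversine formula. The helper names are made up for this illustration.</p>
<pre># Mean earth radius in meters (a spherical approximation).
EARTH_RADIUS = 6_371_008.8

def to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Great-circle distance in meters between two lon/lat points,
# using the haversine formula on a sphere.
def great_circle_distance(lon1, lat1, lon2, lat2)
  phi1 = to_radians(lat1)
  phi2 = to_radians(lat2)
  dphi = phi2 - phi1
  dlambda = to_radians(lon2 - lon1)
  a = Math.sin(dphi / 2)**2 +
      Math.cos(phi1) * Math.cos(phi2) * Math.sin(dlambda / 2)**2
  2 * EARTH_RADIUS * Math.asin(Math.sqrt(a))
end

# Length in degrees of the straight segment between the same points
# when longitude and latitude are treated as flat x and y.
def flat_span_degrees(lon1, lat1, lon2, lat2)
  Math.hypot(lon2 - lon1, lat2 - lat1)
end

globe = great_circle_distance(179, 0, -179, 0)
flat = flat_span_degrees(179, 0, -179, 0)
puts "On the globe: about #{(globe / 1000).round} km"
puts "On a flat map cut at the Date Line: #{flat} degrees"</pre>
<p>On the globe the two points are roughly 222 km apart, while the flat segment spans 358 degrees of longitude, which is nearly the full width of the map.</p>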
<p>Now the bad news. Computations across a curved surface are more complex than across a flat surface. Distance calculation, intersections, and so forth, will be slower on geographic types than on projections. In fact, some computations will not be available at all. In <a title="Geo-Rails part 6" href="http://www.daniel-azuma.com/blog/archives/134" target="_blank">part 6</a>, we considered an example &#8220;counties&#8221; table, in which we chose to use a projected coordinate system to store polygons. The reason I did that is that I wanted to cover <code>ST_Relate</code>, a function that PostGIS supports for geometric types but not geographic types.</p>
<p>Finally, geographic types are also subject to the model of the earth that you are using. The earth is actually not a perfect sphere, but is slightly flattened along its axis of rotation. In order to perform computations across a large area with a high degree of accuracy, you need to take that flattening into account. Unfortunately, the flattening makes the already complex computations maddeningly complex (and correspondingly slower). Because of this, PostGIS gives you the option of choosing whether to perform computations using the spherical or flattened shape, trading off speed for accuracy. Measurement functions that support geographic inputs, such as <code>ST_Distance</code>, <code>ST_Length</code>, and <code>ST_Area</code>, perform the more accurate computations by default, but you can switch them to the faster spherical formulas by passing FALSE as an optional final parameter.</p>
<pre>ST_Distance(pt1, pt2)         -- Uses more accurate computation
ST_Distance(pt1, pt2, FALSE)  -- Uses faster spherical computation</pre>
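<p>To give a sense of how slight the flattening is, here is a short sketch that computes the polar radius from the published WGS 84 ellipsoid parameters. The constant names are my own. The two input numbers are the standard WGS 84 equatorial radius and inverse flattening.</p>
<pre># WGS 84 ellipsoid parameters.
EQUATORIAL_RADIUS = 6_378_137.0      # meters
INVERSE_FLATTENING = 298.257223563

# Derived values: the flattening ratio and the polar radius.
FLATTENING = 1.0 / INVERSE_FLATTENING
POLAR_RADIUS = EQUATORIAL_RADIUS * (1.0 - FLATTENING)

difference_km = (EQUATORIAL_RADIUS - POLAR_RADIUS) / 1000
puts "Polar radius: #{POLAR_RADIUS.round(1)} m"
puts "Difference:   #{difference_km.round(1)} km"
puts "Flattening:   #{(FLATTENING * 100).round(3)} percent"</pre>
<p>The difference between the two radii is about 21 km, or roughly a third of a percent. That is small enough to ignore for many applications, but it adds up over continental distances.</p>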
<h2>A case for projections</h2>
<p>So which type should you use? There will be some cases when the decision is clear. If you need to perform computations across large sections of the globe, for example, you will usually want to use the geographic type. However, my experience has been that, for <em>most</em> use cases that you&#8217;re likely to encounter in a Rails application, you&#8217;ll get better results by choosing a reasonable projection.</p>
<p>Why do I say that?</p>
<p><strong>Spatial data storage should match its usage.</strong> This is, I think, the most important but most overlooked consideration. Often, your application will lend itself to a particular projection based on what it <em>does</em> with the data, and it is almost always beneficial to structure your data storage accordingly. I know as engineers we often want to abstract our data representation from our application functionality. But you don&#8217;t always have that luxury with big data: whether you like it or not, you have to accommodate the resource and performance needs of your database. This goes double with geospatial data, because the queries and analysis can get quite expensive.</p>
<p>One very common application is simply the display of your database objects on a Google Map or similar visualization tool. In such an application, most of your queries might be of the form: <em>Give me all the objects that appear within this rectangle on a Google Map.</em> If your data is stored and queried in the same coordinate system as that used by Google Maps, then those rectangular map areas will translate directly into simple rectangular queries in your database. If, however, your database uses a geographic coordinate system or a different projection, your query may map into a distorted or non-rectangular area in your database&#8217;s coordinate system, resulting in more complex code and/or decreased performance.</p>
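<p>As a rough sketch of what such a rectangular query could look like, the helper below builds the arguments for an ActiveRecord <code>where</code> call using the PostGIS functions <code>ST_Intersects</code> and <code>ST_MakeEnvelope</code>. The helper name, the <code>location</code> column, the <code>Place</code> model, and the sample bounds are all hypothetical, and the bounds are assumed to be already expressed in the same projection as the stored data.</p>
<pre># Build arguments for an ActiveRecord `where` call that selects rows
# whose geometry intersects a rectangle. The column name is inserted
# directly into the SQL, so it must come from trusted code, never
# from user input. The coordinates and SRID are bound as parameters.
def bbox_condition(column, min_x, min_y, max_x, max_y, srid)
  ["ST_Intersects(#{column}, ST_MakeEnvelope(?, ?, ?, ?, ?))",
   min_x, min_y, max_x, max_y, srid]
end

# Illustrative map bounds, in the units of the storage projection.
condition = bbox_condition('location', -13_630_000, 6_030_000,
                           -13_610_000, 6_050_000, 3785)

# In a Rails app with a hypothetical Place model:
#   Place.where(*condition)</pre>
<p>Because the rectangle is axis-aligned in the storage coordinate system, the database can answer this kind of query largely from the spatial index.</p>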
<p><strong>Many shapes are best represented in a (particular) projection.</strong> Let&#8217;s take a look at a shape that should be familiar to most readers, the outline of the United States:</p>
<div id="attachment_175" class="wp-caption aligncenter" style="width: 388px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/us_lambert2.png"><img class="size-full wp-image-175" title="us_lambert" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/us_lambert2.png" alt="" width="378" height="240" /></a><p class="wp-caption-text">The United States, in a Lambert Conformal Conic projection. (credit: http://csanet.org/newsletter/winter03/nlw0303.html)</p></div>
<p>Now, much of the northern border with Canada follows a line of latitude, the &#8220;49th parallel&#8221;. A straight line. Except, in the above image, it&#8217;s not straight; it&#8217;s curved slightly. This map is in a <em>Lambert Conformal Conic</em> projection, very commonly used for US national and state maps. To represent the northern border of the country in this projection, you would need a curved line (or, in practice, a bunch of short straight lines that together approximate a curved line). But in some other projections, for example a <em>Mercator</em> projection, lines of latitude are straight, making the shape much easier and more efficient to represent.</p>
<div id="attachment_176" class="wp-caption aligncenter" style="width: 415px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/us_mercator1.png"><img class="size-full wp-image-176" title="us_mercator" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/us_mercator1.png" alt="" width="405" height="231" /></a><p class="wp-caption-text">The United States, in a Mercator projection. (credit: http://csanet.org/newsletter/winter03/nlw0303.html)</p></div>
<p>East-west and north-south lines in most political boundaries tend to follow lines of latitude and longitude, respectively, and so are best represented in a projection (such as Mercator) that preserves those lines as straight. Remember, most lines of latitude are <em>not</em> straight in a geographic coordinate system, so a geographic latitude-longitude coordinate system is <em>not</em> particularly well-suited for large political boundaries such as states and countries.</p>
<p><strong>Most data is hyperlocal</strong>. The geographic type&#8217;s advantages come to the foreground when you&#8217;re dealing with data spread over the entire globe, or when you need to deal with objects covering large areas or distances covering significant portions of the globe. However, in practice I&#8217;ve found there are very few applications like that. In most cases, you&#8217;ll be dealing with primarily point data, or if you do have line or polygonal data, the individual objects are small: streets, parcel boundaries, municipal and statistical boundaries, and so forth. Furthermore, in most cases, your data will be limited to a particular part of the world, or at least you&#8217;ll seldom need to handle data that crosses seams such as poles or the Date Line. So in practice, you seldom actually run into the problems that would be solved by using the geographic type.</p>
<p><strong>Performance does matter</strong>. Many operations gain a substantial performance improvement from using the PostGIS geometry type rather than the geography type. Furthermore, using geometry saves you from having to think about which functions are available and which are not.</p>
<h2>A projection to avoid and a projection to consider</h2>
<p>You might be tempted to store latitude and longitude in a geometry type column. That is, to set up your PostGIS column with a geometry type, but use SRID=4326 (which is the EPSG number for WGS 84 latitude and longitude).</p>
<p>Don&#8217;t do this.</p>
<p>I did this a few times in my naive youth, and it came back to bite me. What you&#8217;re really doing here is employing a particular projection called <em>Plate Carree</em>, which simply maps latitude and longitude directly to <em>x</em> and <em>y</em> on the plane. Remember, any time you use geometry rather than geography, you are working with a flat coordinate system, and thus a projection. You might think you&#8217;re working with latitude and longitude, but you&#8217;re actually not.</p>
<div id="attachment_173" class="wp-caption aligncenter" style="width: 490px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/plate_carree.png"><img class="size-full wp-image-173" title="plate_carree" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/plate_carree.png" alt="" width="480" height="238" /></a><p class="wp-caption-text">The Plate-Carree projection. (Credit: http://kartoweb.itc.nl/geometrics/Map%20projections/body.htm)</p></div>
<p>Plate Carree is not a particularly useful projection (except that it is trivial to compute). It doesn&#8217;t preserve distances, angles, directions, areas, or any other cartographically useful properties, and its distortion in polar regions is severe. In almost all cases, you can do much better with a different projection.</p>
<p>The projection I tend to recommend for many applications is Mercator. In particular, a minor variation on Mercator that is used by Google and Bing Maps:</p>
<div id="attachment_172" class="wp-caption aligncenter" style="width: 448px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/google_world.png"><img class="size-full wp-image-172" title="google_world" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/google_world.png" alt="" width="438" height="322" /></a><p class="wp-caption-text">The Google world map, a slight variation on a Mercator projection</p></div>
<p>This coordinate system has EPSG number 3785, and has a number of helpful properties.</p>
<ul>
<li>It&#8217;s used by <a title="Google maps" href="http://maps.google.com/" target="_blank">Google maps</a> and <a title="Bing maps" href="http://www.bing.com/maps/" target="_blank">Bing maps</a> (and possibly other mapping systems as well), so if you use those systems for visualization, you have a good match between your data storage and application.</li>
<li>It preserves angles and shapes locally. (In cartographic terms, it is <em>conformal</em>.) This means if you zoom into any part of the map, the shapes and aspect ratio will closely match the real shapes on the globe. This is, I think, the primary reason it is popular with mapping visualizations.</li>
<li>Lines of latitude and longitude are straight, so political boundaries that follow them stay straight and are easy to represent.</li>
<li>It&#8217;s relatively simple to compute.</li>
</ul>
<p>As with any projection, there will be times when this one is not appropriate. By now, you should have enough understanding to identify many of these cases. However, a few of the common objections you might encounter are not as important as they sound, and I think I should say a few words about them.</p>
<p>You might hear people object to using EPSG 3785 on the grounds that it contains a simplification that introduces cartographic inaccuracies. (Specifically, it treats its underlying geography as a sphere rather than a flattened ellipsoid.) In most cases, this argument makes too much of too little. <em>All</em> projections rely on simplifications that introduce inaccuracies in one form or another. If your application is to bounce a laser across a continent, then by all means dig deep into the corrective factors. But for most web applications, 3785 should be more than sufficient. Indeed, the inaccuracies in most of the data you will gather, including GPS and geocoded data, will far outweigh most of what can be introduced by the projection.</p>
<p>You also might hear people object to using the Mercator projection at all, on the grounds that it gives a distorted picture of the nature of the world. Because the projection magnifies areas further from the Equator, it generates map images that appear to privilege richer countries in higher latitudes while downgrading the importance of poorer countries closer to the Equator. In 1989, a well-publicized resolution, signed by a number of prominent geographers, was published in <em>American Cartographer</em>, decrying the use of Mercator and similar rectangular projections for these and other reasons. This point is well-taken, and if you are displaying a full world map, I generally do not recommend Mercator if you can help it. However, here we are talking specifically about database structure, not visualization, so for our purposes I think the point is moot.</p>
<h2>Working with projected data in Rails</h2>
<p>So let&#8217;s see some code! I&#8217;ll demonstrate how to set up your PostGIS database to store data using EPSG 3785, and how to read and write data using ActiveRecord.</p>
<p>We&#8217;ll use our code from <a title="Geo-Rails part 2" href="http://www.daniel-azuma.com/blog/archives/69" target="_blank">part 2</a> as a starting point. But now, in our migration, we no longer set <code>:geographic</code>, but instead use a geometric (flat) coordinate system with SRID = 3785, as follows. (We&#8217;ll also set up a spatial index, as we saw in <a title="Geo-Rails part 6" href="http://www.daniel-azuma.com/blog/archives/134" target="_blank">part 6</a>.)</p>
<pre>class CreateLocations &lt; ActiveRecord::Migration
  def change
    create_table :locations do |t|
      t.string :name
      t.point :loc, :srid =&gt; 3785
      t.timestamps
    end
    change_table :locations do |t|
      t.index :loc, :spatial =&gt; true
    end
  end
end</pre>
<p>We also need to specify a corresponding factory in our ActiveRecord class. Here I&#8217;m going to introduce a rather dirty little feature of RGeo: &#8220;projected geographic&#8221; factories. Now, if you cringed a little at that description, then you&#8217;re getting the hang of coordinate systems. Geographic coordinate systems are by definition <em>not</em> projected! However, sometimes when you&#8217;re working with a projection, you&#8217;ll want a quick way to interact with the data in latitude and longitude&#8212;a quick way to transform individual points to geographic coordinates and back again. This is where RGeo&#8217;s projected geographic factories come in handy.</p>
<p>These factories really use a projected coordinate system under the hood. In fact, they reference a full Cartesian factory internally, and you can gain access to that &#8220;real&#8221; projected factory by calling the <code>projection_factory</code> method. However, they provide you with a convenience interface that lets you look at the data as latitudes and longitudes, as if it were a geographic factory.</p>
<p>The &#8220;simple_mercator&#8221; factory is a useful example. Its &#8220;real&#8221; internal factory has SRID 3785, indicating the Google Maps-style Mercator projection, but the wrapper factory reports latitudes and longitudes. In this way, it mirrors the Google Maps JavaScript API. It talks latitudes and longitudes on the outside, but converts them internally to the projection for use with the map.</p>
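<p>If you are curious what that conversion involves, the following is a rough sketch of the standard spherical Mercator formulas. This is my own illustration, not RGeo&#8217;s implementation. The module name <code>SphericalMercatorSketch</code> is hypothetical, and I am assuming a sphere of radius 6378137 meters.</p>

```ruby
# Spherical Mercator formulas, for illustration only.
# Assumption: a sphere of radius 6378137 meters.
module SphericalMercatorSketch
  RADIUS = 6378137.0

  # Longitude/latitude in degrees to projected x/y in meters.
  def self.project(lon, lat)
    x = RADIUS * lon * Math::PI / 180.0
    y = RADIUS * Math.log(Math.tan(Math::PI / 4.0 + lat * Math::PI / 360.0))
    [x, y]
  end

  # Projected x/y in meters back to longitude/latitude in degrees.
  def self.unproject(x, y)
    lon = x / RADIUS * 180.0 / Math::PI
    lat = (2.0 * Math.atan(Math.exp(y / RADIUS)) - Math::PI / 2.0) * 180.0 / Math::PI
    [lon, lat]
  end
end

x, y = SphericalMercatorSketch.project(-122.0, 47.0)
lon, lat = SphericalMercatorSketch.unproject(x, y)
puts "projected: #{x.round(2)}, #{y.round(2)}"
puts "round trip: #{lon.round(6)}, #{lat.round(6)}"
```

<p>Note that <em>x</em> depends only on longitude and <em>y</em> only on latitude, which is why a latitude-longitude rectangle stays a rectangle in the projection.</p>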
<p>In our ActiveRecord class, we&#8217;ll set up the factory so it correctly interacts with the database in projected coordinates.</p>
<pre>class Location &lt; ActiveRecord::Base

  # Create a simple mercator factory. This factory itself is
  # geographic (latitude-longitude) but it also contains a
  # companion projection factory that uses EPSG 3785.
  FACTORY = RGeo::Geographic.simple_mercator_factory

  # We're storing data in the database in the projection.
  # So data gotten straight from the "loc" attribute will be in
  # projected coordinates.
  set_rgeo_factory_for_column(:loc, FACTORY.projection_factory)

  # To interact in projected coordinates, just use the "loc"
  # attribute directly.
  def loc_projected
    self.loc
  end
  def loc_projected=(value)
    self.loc = value
  end

  # To use geographic (lat/lon) coordinates, convert them using
  # the wrapper factory.
  def loc_geographic
    FACTORY.unproject(self.loc)
  end
  def loc_geographic=(value)
    self.loc = FACTORY.project(value)
  end

end</pre>
<p>Now let&#8217;s do an example query. Suppose our basic query is a simple map search where we want to return all the locations in a given rectangle on our map visualization. Since our data is in the same projection as the original map, a rectangular query in the map translates into a rectangular query in our database. So we&#8217;ll take the latitudes and longitudes of the rectangle edges as parameters, and convert them to projected coordinates. Once there, we can use a simple PostGIS box intersection to run the query itself. It&#8217;s a simple query that can be accelerated using the spatial index.</p>
<p>We&#8217;ll add a scope to our class as follows:</p>
<pre>class Location &lt; ActiveRecord::Base

  # ...

  # w,s,e,n are in latitude-longitude
  def self.in_rect(w, s, e, n)
    # Create lat-lon points, and then get the projections.
    sw = FACTORY.point(w, s).projection
    ne = FACTORY.point(e, n).projection
    # Now we can create a scope for this query.
    where("loc &amp;&amp; '#{sw.x},#{sw.y},#{ne.x},#{ne.y}'::box")
  end

end</pre>
<p>Now rectangle searches are simple:</p>
<pre>locations = Location.in_rect(-122, 47, -121, 48).all</pre>
<h2>Where to go from here</h2>
<p>In this article, we saw some of the pros and cons of using different coordinate systems for your database. The right coordinate system will depend on your application, but I&#8217;ve found that for many applications, using a projection&#8212;often the specific projection EPSG 3785&#8212;produces good results.</p>
<p>It may be useful at this point to gain a general feel for the different types of projections, how they work, and what their pros and cons are. A very good online resource for this is provided <a title="USGS introduction to map projections" href="http://egsc.usgs.gov/isb/pubs/MapProjections/projections.html" target="_blank">here</a> by the USGS.</p>
<p>The <code>RGeo::Geographic.simple_mercator_factory</code> is useful for storing data in EPSG 3785. However, if you want to use a different projection under the hood, you can use a more powerful method, <code>RGeo::Geographic.projected_factory</code>, which lets you specify arbitrary projections using Proj4. Read about it in the <a title="RGeo documentation" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">RGeo documentation</a>.</p>
<p>Next time, I will get to the worked example I promised last week. Stay tuned, and let&#8217;s bring Rails down to earth!</p>
<p><em>This is part 7 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-azuma.com/blog/archives/164/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Geo-Rails part 6: Scaling Spatial Applications</title>
		<link>http://www.daniel-azuma.com/blog/archives/134</link>
		<comments>http://www.daniel-azuma.com/blog/archives/134#comments</comments>
		<pubDate>Mon, 02 Jan 2012 10:24:02 +0000</pubDate>
		<dc:creator>Daniel Azuma</dc:creator>
				<category><![CDATA[GeoRails]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[PostGIS]]></category>
		<category><![CDATA[RGeo]]></category>

		<guid isPermaLink="false">http://www.daniel-azuma.com/blog/?p=134</guid>
		<description><![CDATA[Scaling, scaling, scaling. Can Rails really scale? It&#8217;s been a source of FUD and the butt of running jokes. But scaling is a serious matter when it comes to large data sets, and it&#8217;s something we need to pay attention &#8230; <a href="http://www.daniel-azuma.com/blog/archives/134">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Scaling, scaling, scaling. Can Rails really scale? It&#8217;s been a source of FUD and the butt of <a title="Can Rails Scale - Joke" href="http://canrailsscale.com/" target="_blank">running jokes</a>. But scaling is a serious matter when it comes to large data sets, and it&#8217;s something we need to pay attention to in the geospatial realm where big data is commonplace.</p>
<p>In this week&#8217;s article, I&#8217;ll go over the basic issues every geospatial programmer should know about scaling, and provide tips for writing your geospatial Rails application so it doesn&#8217;t fall over when you go national. We will cover:</p>
<ul>
<li>The bottom line regarding scaling</li>
<li>Building spatial indexes for your database</li>
<li>Writing queries to take advantage of indexes</li>
<li>Simplification and segmentation of large objects</li>
</ul>
<p>This is part 6 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</p>
<h2><span id="more-134"></span>Scaling is complex but not difficult</h2>
<p>Scaling is a high-profile issue. We all notice when our blog gets slashdotted, and when Twitter or Amazon goes down it makes national news. Failures happen to everyone, and seem almost inevitable. Are they?</p>
<p>Now, I don&#8217;t want to downplay the complexity of the scaling task, but we do have to start with an important observation. Scaling, in its essence, is a solved problem. The techniques involved have been well-understood for decades. We all learned about logarithmic searching, branch and bound, and similar algorithms in our computer science classes. And as web developers, we should already know what these algorithms look like in practice: database indexes, sharding, caching, replication, load balancing, and so forth.</p>
<p>So why the hoopla?</p>
<p>Because scaling, though a solved problem, is not an <em>automatically</em> solved problem. It requires our attention. More to the point, it requires that we understand every aspect of the system we are building, how the various components work and how they interact. If we need our database to scale, at some level we need to understand how it works, how we&#8217;re using it, and thus what we need to <em>do</em> to make it scale.</p>
<p>This is probably the main reason why Rails has historically had a negative reputation about scaling. Rails purports to make web development simple. But that&#8217;s a bit misleading. If you think about it, a website is a complex system with a lot of moving parts: the network, the server, the MVC control flow, databases, caches, client-side code, control flow from one page to another, interactions with external systems, security layers&#8230;, and that doesn&#8217;t even include your application logic with all of its complexities. It&#8217;s hardly simple. Rails tries to deal with this complexity by hiding elements, at least partly. It hides the database behind ActiveRecord, and in doing so trains us not to think about the database. But that&#8217;s a deceit. We have to think about the database if we&#8217;re going to scale it.</p>
<p>And that is one of the key motivations behind this entire series. I&#8217;ve heard some comments that this material is difficult, that there&#8217;s no <em>TL;DR</em>. Yes, the material is difficult, and I&#8217;m not going to try to hide or gloss over that fact. I could try to make geospatial programming really easy, creating a one-size-fits-all tool or recipe for everyone to cargo-cult. But that is the wide road that leads to destruction. Eventually, you&#8217;ll need to figure out how to scale, and at that time, if you don&#8217;t have some understanding of what&#8217;s going on under the hood, you&#8217;ll get very stuck very quickly.</p>
<p>Now, that said, dealing with geospatial features does not fundamentally change the scaling task. Scaling is still a solved problem. As we prepare to scale our applications, there is a well-known, systematic process we all go through. We measure, find the bottlenecks, apply well-understood techniques to address those bottlenecks, and repeat. It can be a tedious process, and (believe me, I know) it is sometimes difficult to sell our business partners on the fact that we need to spend time on it before it&#8217;s too late. But it&#8217;s not like we don&#8217;t know what we&#8217;re doing. Scaling is complex, but not difficult. It simply requires that we have a general understanding of how spatial data works.</p>
<p>So, sermon over, let&#8217;s dive in.</p>
<h2>About spatial database indexes</h2>
<p>Spatial data is often big data, and as with any big data, our basic scaling task involves making it smaller.</p>
<p>In a database, this is generally accomplished by judicious use of <em>indexes</em>. An index provides a fast way to look up data by some criteria, without having to read and compare against every single row. For example, if your table has a million rows, each identified by a numeric ID, you can generally speed up ID lookups by creating an index on that column.</p>
<p>Similarly, <em>spatial database indexes</em> can accelerate queries that include spatial criteria. If your million-row table also contains a latitude-longitude coordinate column, and you want to find rows whose coordinates fall within a certain region, you should consider building a spatial index on that column. This allows your spatial search to avoid checking every row in the table, thus speeding up your queries.</p>
<p>The important thing to understand about spatial indexes is that although they are conceptually the same as &#8220;standard&#8221; database indexes, they are implemented differently under the hood. In most databases, a simple index on an ID column will use a data structure known as a <em>B-tree</em>. Such an index relies on a global ordering of the data, and builds a balanced search tree, which, as we remember from computer science, lets us do lookups in logarithmic time.</p>
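<p>As a quick reminder of what &#8220;logarithmic time&#8221; buys you, here is a small sketch of my own. A sorted Ruby array stands in for an ordered index (a real B-tree is more elaborate), and we count comparisons for a sequential scan versus an ordered search.</p>

```ruby
# A sorted array stands in for the ordered keys of an index.
ids = (1..1_000_000).to_a
target = 876_543

# Sequential scan: compare against rows one at a time until we hit the target.
scan_comparisons = ids.index(target) + 1

# Ordered lookup: each comparison halves the remaining search range.
search_comparisons = 0
found = ids.bsearch do |id|
  search_comparisons += 1
  id >= target
end

puts "found #{found}"
puts "sequential scan comparisons: #{scan_comparisons}"
puts "ordered search comparisons:  #{search_comparisons}"
```

<p>The ordered search needs on the order of twenty comparisons for a million rows, because it depends on the data having a single global ordering.</p>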
<p>Spatial data, however, has some important differences from normal scalar data. A simple numeric ID is an infinitesimal point on a one-dimensional number line, whereas a polygon is a finite area on a two-dimensional surface. For data that covers finite areas or lives in more than one dimension, a B-tree does not work. We have to resort to a more complex algorithm, usually a variant on what is known as an <em>R-tree</em>.</p>
<p>I&#8217;ll save the gory details on R-trees for a later article on spatial index design, but there is one upshot you&#8217;ll need to understand: spatial indexes are heavier and more expensive than standard indexes. An R-tree takes up more space in memory and on disk than a similarly-sized B-tree. Queries against an R-tree can be a little slower than against a B-tree, and R-tree updates can be considerably slower. However, R-trees still provide logarithmic-time queries, and so will still give you speed-ups in many situations. So the usual database mantra still applies, and indeed goes double for spatial indexes: Index your common queries but don&#8217;t index everything. And of course, Measure, Measure, Measure.</p>
<p>Because R-tree updates can be slow, it is also usually a good idea to remove or disable a spatial index if you are going to be loading a lot of data, and then turn it back on once you are done. In this way, you pay the cost of building the index only once at the end, rather than having to incrementally update it on every insert.</p>
<h2>Creating and using spatial indexes</h2>
<p>Because a spatial index is constructed differently from most indexes, creating one usually requires a special syntax. For a Rails project, you can usually let RGeo&#8217;s ActiveRecord adapters handle this for you. Create a spatial index in a migration simply by providing the <code>:spatial</code> option. Following is a snippet from a migration that creates a &#8220;counties&#8221; table with polygons, along with a spatial index on the polygons. (Here we&#8217;ll use a geometric column with the &#8220;NAD83 / Washington North&#8221; projection, which has SRID 2285; see <a title="Geo-Rails part 4" href="http://www.daniel-azuma.com/blog/archives/106" target="_blank">part 4</a>. If you&#8217;re using PostGIS, indexes do work for the geographic type, but have some limitations.)</p>
<pre>create_table :counties do |t|
  t.string :name
  t.polygon :poly, :srid =&gt; 2285
end
change_table :counties do |t|
  t.index :poly, :spatial =&gt; true
end</pre>
<p>If you are managing your schema manually, you&#8217;ll need to use the database&#8217;s particular syntax. In <a title="PostGIS" href="http://www.postgis.org/" target="_blank">PostGIS</a>, spatial indexes use the GiST framework, so you denote a spatial index with &#8220;USING gist&#8221;:</p>
<pre>CREATE INDEX "counties_poly_idx" ON "counties" USING gist ("poly");</pre>
<p><a title="MySQL" href="http://dev.mysql.com/" target="_blank">MySQL</a>&#8217;s spatial extension defines a separate index type:</p>
<pre>CREATE SPATIAL INDEX `counties_poly_index` ON `counties` (`poly`);</pre>
<p>In <a title="SpatiaLite" href="http://www.gaia-gis.it/fossil/libspatialite/index" target="_blank">SpatiaLite</a>, a spatial index is actually a separate table that you must join to. Creating a spatial index involves calling a special function provided by the SpatiaLite library:</p>
<pre>SELECT CreateSpatialIndex('counties', 'poly');</pre>
<p>Once you&#8217;ve created a spatial index, it is usually a good idea to verify that your queries will take advantage of it. You&#8217;ll want to make sure the database&#8217;s <em>query planner</em>, the component that analyzes a query and decides how to attack it, is producing an optimal plan. This is generally good practice for all your database design, but more so for spatial queries because they are less commonly used, and query planners do not always do as good a job with them as we would like.</p>
<p>Your best tool for interacting with the query planner, whether or not you&#8217;re using a spatial database, is <code>EXPLAIN</code>. This SQL command takes a query and returns the query planner&#8217;s plan of attack for that query, usually including which indexes it intends to use and its estimate of how expensive the query will be.</p>
<p>For most databases, you can invoke the EXPLAIN command simply by prefixing your query with &#8220;<code>EXPLAIN</code>&#8221;. For example, using PostGIS, let&#8217;s see what the query planner does with a query asking for the county containing the Seattle Space Needle:</p>
<pre>EXPLAIN
  SELECT "name" FROM "counties" WHERE
    ST_Intersects("poly", ST_GeomFromEWKT('SRID=2285;POINT(1266457.58 230052.50)'));</pre>
<p>Postgres will return a query plan that looks something like this:</p>
<pre>QUERY PLAN
----------------------------------------------------------------------------------------------
Index Scan using counties_poly_idx on counties (cost=0.00..8.52 rows=1 width=68)
  Index Cond: (poly &amp;&amp; '0101000020ED08000048E17A94195333410000000024150C41'::geometry)
  Filter: _st_intersects(poly, '0101000020ED08000048E17A94195333410000000024150C41'::geometry)</pre>
<p>Note that it&#8217;s using the &#8220;<code>counties_poly_idx</code>&#8221; index that we created. PostGIS is currently quite good about knowing how to use a spatial index for most queries. With the EXPLAIN command, we can be sure this query will use our spatial index for maximum efficiency.</p>
<h2>Optimizing difficult queries in PostGIS</h2>
<p>Unfortunately, there are a few cases when the query planner won&#8217;t be able to figure out by itself that an index is useful. For example, suppose we want to perform a sanity check of our counties database, making sure we don&#8217;t have any overlapping polygons. More precisely, while we expect that county polygons will &#8220;touch&#8221;&#8212;that is, share boundaries&#8212;we don&#8217;t want counties to actually share <em>interior</em> points. That could mean a problem in our data.</p>
<p><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/county_boundaries.png"><img class="aligncenter size-full wp-image-154" title="county_boundaries" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/county_boundaries.png" alt="" width="475" height="273" /></a></p>
<p>Unfortunately, PostGIS doesn&#8217;t provide this kind of &#8220;interior intersection&#8221; function out of the box. The <code>ST_Intersects</code> function we used earlier will flag the &#8220;touch&#8221; case as well, and we don&#8217;t want that.</p>
<p>But we <em>can</em> build &#8220;interior intersection&#8221; using the function <code>ST_Relate</code>. This powerful function lets you test a wide variety of relationships using the <em>Dimensionally Extended Nine-Intersection Model</em>. I won&#8217;t cover this model in detail for now; you can read about it in the <a title="OGC Simple Feature Access spec" href="http://www.opengeospatial.org/standards/sfa" target="_blank">Simple Features Spec</a>. For our purposes, what&#8217;s important is that, by giving it a particular specification string, &#8220;<code>T********</code>&#8221;, it can implement the relationship we want to test.</p>
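<p>The first character of that string refers to the interior-interior cell of the model, and <code>T</code> there means the two interiors share at least one point. To make the distinction from plain intersection concrete, here is a small sketch of my own that uses axis-aligned rectangles as stand-ins for counties. The <code>Rect</code> struct and method names are hypothetical, and the rectangles are assumed to have positive area.</p>

```ruby
# Axis-aligned rectangle, used here as a stand-in for a county polygon.
Rect = Struct.new(:min_x, :min_y, :max_x, :max_y)

# True if the rectangles share any point at all, including boundary points.
def intersects?(a, b)
  a.min_x <= b.max_x && b.min_x <= a.max_x &&
    a.min_y <= b.max_y && b.min_y <= a.max_y
end

# True only if the rectangles share interior points (overlap with positive area).
def interiors_intersect?(a, b)
  a.min_x < b.max_x && b.min_x < a.max_x &&
    a.min_y < b.max_y && b.min_y < a.max_y
end

county    = Rect.new(0, 0, 2, 2)
neighbor  = Rect.new(2, 0, 4, 2)  # shares only the edge x = 2
bad_data  = Rect.new(1, 1, 3, 3)  # overlaps the interior of county

puts intersects?(county, neighbor)          # touching counts as intersecting
puts interiors_intersect?(county, neighbor) # but the interiors do not meet
puts interiors_intersect?(county, bad_data) # a genuine overlap
```

<p>The touching neighbor passes the plain intersection test but fails the interior test, which is exactly the behavior we want from our sanity check.</p>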
<p>Unfortunately, because <code>ST_Relate</code> is such a powerful and general tool, the query planner can&#8217;t optimize it very well, and tends to fall back on the lowest common denominator, which is sequential scan.</p>
<pre>EXPLAIN
  SELECT c1.name, c2.name FROM counties AS c1 INNER JOIN counties AS c2
    ON c1.id != c2.id AND ST_Relate(c1.poly, c2.poly, 'T********');</pre>
<pre>QUERY PLAN
----------------------------------------------------------------------
Nested Loop (cost=0.00..10372.17 rows=229633 width=64)
  Join Filter: st_relate(c1.poly, c2.poly, 'T********'::text)
  -&gt; Seq Scan on counties c1 (cost=0.00..18.30 rows=830 width=68)
  -&gt; Materialize (cost=0.00..22.45 rows=830 width=68)
       -&gt; Seq Scan on counties c2 (cost=0.00..18.30 rows=830 width=64)</pre>
<p>Ouch! That&#8217;s an unfortunate query plan. It does nested sequential scans, comparing every county polygon with every other county polygon, an <em>n-squared</em> operation. In a table with thousands of counties, this can be slow.</p>
<p>But it turns out we can do better. The &#8220;interior intersection&#8221; operation actually <em>can</em> be optimized using the index. The query planner doesn&#8217;t realize this, so we need to give it some help.</p>
<p>Here a bit of trivia about spatial indexes will help us. In general, the &#8220;native&#8221; operation for an R-tree index is <em>bounding box intersection</em>. It can take the bounding box of an input geometry, and determine which geometries in the table have bounding boxes that intersect the input. At the most basic level, the query planner for PostGIS works by looking for opportunities to apply this native operation. It asks, &#8220;How can I reduce the search space by applying a bounding box intersection?&#8221;</p>
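<p>Here is a minimal pure-Ruby sketch of that native operation, written for illustration only (the method names are my own, and this is not how PostGIS implements it). It computes bounding boxes from vertex lists and tests them for intersection.</p>

```ruby
# Bounding box of a list of [x, y] vertices, as [min_x, min_y, max_x, max_y].
def bounding_box(points)
  xs = points.map { |x, _| x }
  ys = points.map { |_, y| y }
  [xs.min, ys.min, xs.max, ys.max]
end

# True if two bounding boxes share at least one point.
def boxes_intersect?(a, b)
  a[0] <= b[2] && b[0] <= a[2] && a[1] <= b[3] && b[1] <= a[3]
end

triangle_a = [[0, 0], [4, 0], [0, 4]]
triangle_b = [[3.5, 3.5], [5, 3.5], [5, 5]]  # disjoint from triangle_a
triangle_c = [[10, 10], [11, 10], [10, 11]]  # far away from both

box_a = bounding_box(triangle_a)
puts boxes_intersect?(box_a, bounding_box(triangle_b))
puts boxes_intersect?(box_a, bounding_box(triangle_c))
```

<p>Notice that the boxes of the first two triangles intersect even though the triangles themselves do not. The bounding box test can only rule candidates out. It narrows the search space cheaply, and an exact test still has to run on whatever survives.</p>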
<p>In our first example above, when we used <code>ST_Intersects</code> to find the polygon containing the Space Needle, the query planner reasoned thus: <em>Every time two geometries intersect, their bounding boxes also intersect. So I can add a bounding box intersection and not change the result of the query. I like bounding box intersections because they let me use the index.</em> So what actually happened behind the scenes was that PostGIS rewrote our query from:</p>
<pre>SELECT "name" FROM "counties" WHERE
  ST_Intersects("poly", ST_GeomFromEWKT('SRID=2285;POINT(1266457.58 230052.50)'));</pre>
<p>to:</p>
<pre>SELECT "name" FROM "counties" WHERE
  "poly" &amp;&amp; ST_GeomFromEWKT('SRID=2285;POINT(1266457.58 230052.50)') AND
  ST_Intersects("poly", ST_GeomFromEWKT('SRID=2285;POINT(1266457.58 230052.50)'));</pre>
<p>&#8230;using the PostgreSQL operator for bounding box intersection: &#8220;<code>&amp;&amp;</code>&#8221;. Now, when it creates the actual query plan, it uses the spatial index to optimize the bounding box intersection. Let&#8217;s take another look at that query plan:</p>
<pre>QUERY PLAN
----------------------------------------------------------------------------------------------
Index Scan using counties_poly_idx on counties (cost=0.00..8.52 rows=1 width=68)
  Index Cond: (poly &amp;&amp; '0101000020ED08000048E17A94195333410000000024150C41'::geometry)
  Filter: _st_intersects(poly, '0101000020ED08000048E17A94195333410000000024150C41'::geometry)</pre>
<p>See the bounding box intersection &#8220;&amp;&amp;&#8221; in the Index Condition? That wasn&#8217;t in our original query, but PostGIS rewrote our query and put it there so it could use the index. Pretty clever, PostGIS is.</p>
<p>Well, sometimes PostGIS isn&#8217;t quite clever enough, and we have to give it some help. In our &#8220;interior intersection&#8221; example, we can improve the query plan by going through this process manually. We reason thus: <em>PostGIS doesn&#8217;t realize this, but every time two geometries have an &#8220;interior intersection&#8221; using ST_Relate, their bounding boxes also intersect. So I can add a bounding box intersection and not change the result of the query. Bounding box intersections are good because they let me use the index.</em></p>
<p>So let&#8217;s manually rewrite our query from:</p>
<pre>SELECT c1.name, c2.name FROM counties AS c1 INNER JOIN counties AS c2
  ON c1.id != c2.id AND ST_Relate(c1.poly, c2.poly, 'T********');</pre>
<p>to:</p>
<pre>SELECT c1.name, c2.name FROM counties AS c1 INNER JOIN counties AS c2
  ON c1.poly &amp;&amp; c2.poly AND
  c1.id != c2.id AND ST_Relate(c1.poly, c2.poly, 'T********');</pre>
<p>Now we give this to PostGIS, and <em>voila</em>! The query planner now uses the index:</p>
<pre>EXPLAIN
  SELECT c1.name, c2.name FROM counties AS c1 INNER JOIN counties AS c2
    ON c1.poly &amp;&amp; c2.poly AND
    c1.id != c2.id AND ST_Relate(c1.poly, c2.poly, 'T********');</pre>
<pre>QUERY PLAN
----------------------------------------------------------------------------------------
Nested Loop (cost=0.00..296.81 rows=1 width=64)
  Join Filter: st_relate(c1.poly, c2.poly, 'T********'::text)
  -&gt; Seq Scan on counties c1 (cost=0.00..18.30 rows=830 width=68)
  -&gt; Index Scan using counties_poly_idx on counties c2 (cost=0.00..0.32 rows=1 width=68)
       Index Cond: (c1.poly &amp;&amp; c2.poly)</pre>
<p>This improved plan still does one full sequential scan, because it still has to check every county in the database. But the nested scan, which checks whether that county overlaps any other county, is now accelerated using the index. We&#8217;ve reduced the <em>n-squared</em> query to an <em>n log n</em> query. The computer scientist in us rejoices!</p>
<p>Going through this process does require some creativity, and it helps to have a bit of experience. The good news is that PostGIS is smart enough to handle most cases automatically. But you should still make liberal use of the EXPLAIN tool and look carefully at the query plan that is generated, to see if it&#8217;s doing as well as you think it ought. There may be opportunities to improve your query performance dramatically just by giving it a little bit of help.</p>
<h2>Indexing and queries in MySQL and SpatiaLite</h2>
<p>Generally, I recommend <a title="PostGIS" href="http://www.postgis.org/" target="_blank">PostGIS</a> as an open source spatial database. But there are others out there that you may need to use from time to time, and each one will have its quirks.</p>
<p>As we&#8217;ve seen, the spatial extensions to <a title="MySQL" href="http://dev.mysql.com" target="_blank">MySQL</a> do support spatial indexes. However, there are some significant limitations in comparison with PostGIS.</p>
<p>First, spatial indexes currently work only on MyISAM tables. This means you can&#8217;t use spatial indexes and get the transaction safety benefits of InnoDB on the same table. Ugh.</p>
<p>Second, MySQL supports only a very limited set of spatial relationship functions. In particular, nearly all of MySQL&#8217;s functions work on bounding boxes (which MySQL calls <em>Minimum Bounding Rectangles</em>, or <em>MBR</em>) rather than the geometry itself. So for example, MySQL&#8217;s <code>Intersects</code> function is actually only an alias to <code>MBRIntersects</code>, which tests the bounding boxes for intersection. If you want to test actual geometric intersection, you&#8217;ll have to do some post-filtering on the result set (which you can do using <a title="RGeo" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">RGeo</a>).</p>
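<p>To show what such post-filtering looks like in principle, here is a pure-Ruby sketch of my own that uses a ray-casting point-in-polygon test. In a real application you would more likely hand this to a geometry library. The method name and sample data are hypothetical, and points lying exactly on the boundary are not treated specially.</p>

```ruby
# Ray-casting point-in-polygon test. `polygon` is a list of [x, y] vertices.
# The ring does not need to repeat its first vertex at the end.
def point_in_polygon?(px, py, polygon)
  inside = false
  polygon.each_with_index do |(x1, y1), i|
    x2, y2 = polygon[(i + 1) % polygon.size]
    # Skip edges that do not straddle the horizontal line through the point.
    next unless (y1 > py) != (y2 > py)
    # x coordinate where this edge crosses that horizontal line.
    crossing_x = x1 + (py - y1) * (x2 - x1) / (y2 - y1).to_f
    inside = !inside if px < crossing_x
  end
  inside
end

# An L-shaped region whose bounding box is the square from (0,0) to (4,4).
l_shape = [[0, 0], [4, 0], [4, 1], [1, 1], [1, 4], [0, 4]]

# Pretend these rows came back from a bounding-box-only database query.
candidates = [[0.5, 0.5], [3, 3], [0.5, 3]]

matches = candidates.select { |x, y| point_in_polygon?(x, y, l_shape) }
puts matches.inspect
```

<p>The point at (3, 3) lies inside the bounding box but outside the L shape, so the database would return it and the post-filter removes it.</p>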
<p>I generally don&#8217;t recommend using MySQL Spatial unless you&#8217;re already using MySQL. But then I typically don&#8217;t recommend using MySQL in general either&#8230;</p>
<p><a title="Spatialite" href="http://www.gaia-gis.it/fossil/libspatialite/index" target="_blank">SpatiaLite</a> is a set of spatial extensions to the popular <a title="SQLite" href="http://www.sqlite.org/" target="_blank">SQLite</a> database. I haven&#8217;t used SpatiaLite much. It does seem to have a fairly complete feature set, at least in comparison with MySQL Spatial, though it doesn&#8217;t compare in maturity with PostGIS.</p>
<p>Spatial indexes in SpatiaLite are a bit of a pain, however. They are implemented as a separate set of managed join tables tied to your main table using triggers. All this is handled fairly transparently, except for queries. When you want to write a query that takes advantage of a spatial index in SpatiaLite, you must explicitly join to the index table.</p>
<p>For the sake of space, I won&#8217;t go into the details here. Instead, I highly recommend an excellent online publication by the author of SpatiaLite, the <a title="SpatiaLite Cookbook" href="http://www.gaia-gis.it/gaia-sins/spatialite-cookbook/index.html" target="_blank">SpatiaLite Cookbook</a>, which serves as the user&#8217;s manual for SpatiaLite, and provides a number of very helpful examples.</p>
<h2>Simplifying and segmenting data</h2>
<p>But wait&#8212;there&#8217;s more!</p>
<p>Remember that the basic scaling task is to make big data smaller. If we have big data, we try to do clever things, such as applying indexes, so that we don&#8217;t have to analyze all the data at once.</p>
<p>Now, there are two ways in which spatial data can be &#8220;big&#8221;. First, there may be a lot of objects, lots of rows. In this case, we can often speed up queries by adding a spatial index, as we have seen.</p>
<p>However, individual objects can also be &#8220;big&#8221;, particularly when you&#8217;re dealing with polygons. Take our table of county polygons. Some county boundaries are simple polygons with just a few sides, but many others have complex, crinkly boundaries that follow rivers, coastlines, mountain divides, or other natural features. The number of sides in such polygons can quickly rise into the thousands or more. When you want to compute, say, an intersection with such a polygon, it can be slow.</p>
<p>There are several different strategies you can use to address this problem. I&#8217;m just going to summarize a couple of the important ones here. But first I need to emphasize one thing. There is no one-size-fits-all solution. Each approach has its pros and cons, and your choice will depend on the requirements of your particular application.</p>
<p>To this end, <em>measurement</em> is absolutely critical. Before, during, and after applying any optimization technique, run a benchmark and make sure that (1) you&#8217;re addressing the right problem, and (2) the performance is going in the right direction. This is doubly important when dealing with spatial data, because the algorithms involved are somewhat more complex, and it may surprise you what&#8217;s fast and what&#8217;s slow.</p>
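<p>As a rough sketch of what such a measurement can look like, the following uses Ruby&#8217;s standard <code>Benchmark</code> module. The two lambdas are stand-ins I made up so the example runs on its own; in a real application they would be the query before and after your optimization.</p>

```ruby
require 'benchmark'

# Stand-ins for two versions of a query; in a real app these would be
# calls into your database (the lambdas here are purely hypothetical).
slow_version = -> { (1..200_000).map { |i| Math.sqrt(i) }.sum }
fast_version = -> { (1..20_000).map { |i| Math.sqrt(i) }.sum }

# Run each version several times so one-off noise matters less.
timings = {
  'before' => Benchmark.realtime { 5.times { slow_version.call } },
  'after'  => Benchmark.realtime { 5.times { fast_version.call } }
}

timings.each do |label, seconds|
  puts format('%-6s %.4f s', label, seconds)
end
```

<p>Running against realistic data volumes matters more than the exact harness; a benchmark on a toy table can easily point you in the wrong direction.</p>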
<p>There are two general techniques for dealing with large polygons: <em>simplification</em> and <em>segmentation</em>.</p>
<p><strong>Simplification</strong> can be applied when you&#8217;re more concerned with speed than accuracy. For example, you might have a polygon with a thousand sides, but if you&#8217;re going to be displaying it in a relatively small area on a map, or you&#8217;re running some spatial queries where you don&#8217;t care too much if you&#8217;re a little off, then you can probably get away with an approximation of the polygon with fewer sides.</p>
<div id="attachment_139" class="wp-caption aligncenter" style="width: 532px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/simplification.png"><img class=" wp-image-139 " title="simplification" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/simplification.png" alt="" width="522" height="400" /></a><p class="wp-caption-text">An example of polygon simplification for part of the coast of France. (Credit: http://vis4.net/blog/posts/rendering_country_maps/)</p></div>
<p>There are a number of polygon simplification techniques out there, useful for different circumstances. I don&#8217;t have space here for a full discussion, but I may write a specialized article on simplification at a later time, because it&#8217;s an interesting (and sometimes tricky) problem.</p>
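<p>To give a flavor of one such technique, here is a minimal sketch of the Douglas-Peucker algorithm in plain Ruby. This is my choice of example, not necessarily the best algorithm for your data: it works on an open line of <code>[x, y]</code> points, keeps the endpoints, and drops any intermediate point that lies within <code>tolerance</code> of the simplified line. The function names are made up for illustration.</p>

```ruby
# Perpendicular distance from point pt to the line through a and b.
def distance_to_line(pt, a, b)
  dx = b[0] - a[0]
  dy = b[1] - a[1]
  length = Math.hypot(dx, dy)
  return Math.hypot(pt[0] - a[0], pt[1] - a[1]) if length.zero?

  (dy * (pt[0] - a[0]) - dx * (pt[1] - a[1])).abs / length
end

# Recursively keep only the points that deviate from the straight line
# between the endpoints by more than `tolerance`.
def simplify(points, tolerance)
  return points if points.size < 3

  first = points.first
  last = points.last
  distances = points[1...-1].map { |pt| distance_to_line(pt, first, last) }
  max_distance = distances.max
  return [first, last] if max_distance <= tolerance

  # Split at the farthest point and simplify each half separately.
  split = distances.index(max_distance) + 1
  left = simplify(points[0..split], tolerance)
  right = simplify(points[split..-1], tolerance)
  left[0...-1] + right
end

line = [[0, 0], [1, 0.1], [2, -0.1], [3, 5], [4, 6], [5, 7], [6, 8.1], [7, 9]]
simplified = simplify(line, 0.5)
p simplified  # => [[0, 0], [2, -0.1], [3, 5], [7, 9]]
```

<p>Note that simplifying each polygon independently can open gaps or overlaps between neighbors that share a boundary, which is one of the things that makes the problem tricky in practice.</p>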
<p><strong>Segmentation</strong> is often useful for speeding up queries against big polygons, when you care not about the shape of the polygon itself but merely whether you&#8217;re intersecting it. Segmentation, for example, might be useful in our county boundary example.</p>
<p>The idea is to break up large polygons into smaller polygons that can be stored in separate rows in your database. That is, we trade &#8220;width&#8221; of the data (i.e. how big each object is, in terms of number of vertices) for &#8220;length&#8221; (i.e. how many objects there are). Since we have spatial indexes that mitigate big &#8220;length&#8221;, this trade-off can be a win for us.</p>
<p>There are many ways to break up a large polygon. The simplest approach, usually good enough in practice, is to perform <em>recursive four-to-one subdivision</em>. Don&#8217;t be scared off by the name; it&#8217;s actually quite straightforward. The idea is to take your large polygon with many sides, and split it down the middle horizontally and vertically. This will typically result in four polygons, each covering about a quarter of the area and containing about a quarter of the number of sides:</p>
<div id="attachment_158" class="wp-caption aligncenter" style="width: 498px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/split.png"><img class="size-full wp-image-158" title="split" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2012/01/split.png" alt="" width="488" height="168" /></a><p class="wp-caption-text">One four-to-one split of a province in France.</p></div>
<p>Now, if any of the four polygons still has too many sides, you can do the same thing again, recursively, and so forth until you reach a number of sides that you&#8217;re comfortable with. Once you&#8217;re done, the <em>union</em> of all the resulting polygons will still be your original polygon. So, in our counties example, a county now <code>has_many</code> polygons, and to find the county containing a particular point, do a spatial query for the polygon containing that point and map back to the county.</p>
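<p>Here is a minimal pure-Ruby sketch of the recursive subdivision just described. It is an illustration under several assumptions, not what a spatial database would do: polygons are single rings of <code>[x, y]</code> vertices without holes, the split is made at the center of the bounding box, and the clipping uses the Sutherland-Hodgman algorithm, which is only fully reliable for convex input (concave polygons can produce degenerate connecting edges). All function names are hypothetical. In a real application you would more likely use the intersection operation provided by your spatial database or by RGeo.</p>

```ruby
# Clip a ring (array of [x, y] vertices; the last vertex implicitly joins
# the first) against one half-plane. `inside` tests a vertex, and `cross`
# returns the point where an edge meets the half-plane boundary.
def clip_half_plane(ring, inside, cross)
  output = []
  ring.each_with_index do |current, i|
    previous = ring[i - 1]
    if inside.call(current)
      output << cross.call(previous, current) unless inside.call(previous)
      output << current
    elsif inside.call(previous)
      output << cross.call(previous, current)
    end
  end
  output
end

# Sutherland-Hodgman clipping of a ring to an axis-aligned rectangle.
def clip_to_rect(ring, min_x, min_y, max_x, max_y)
  at_x = lambda do |x|
    ->(a, b) { [x, a[1] + (b[1] - a[1]) * (x - a[0]).to_f / (b[0] - a[0])] }
  end
  at_y = lambda do |y|
    ->(a, b) { [a[0] + (b[0] - a[0]) * (y - a[1]).to_f / (b[1] - a[1]), y] }
  end
  ring = clip_half_plane(ring, ->(pt) { pt[0] >= min_x }, at_x.call(min_x))
  ring = clip_half_plane(ring, ->(pt) { pt[0] <= max_x }, at_x.call(max_x))
  ring = clip_half_plane(ring, ->(pt) { pt[1] >= min_y }, at_y.call(min_y))
  clip_half_plane(ring, ->(pt) { pt[1] <= max_y }, at_y.call(max_y))
end

# Quarter the ring about the center of its bounding box, recursing until
# each piece has at most `max_vertices` vertices or `depth` runs out.
def subdivide(ring, max_vertices, depth = 8)
  return [ring] if ring.size <= max_vertices || depth.zero?

  xs = ring.map(&:first)
  ys = ring.map(&:last)
  mid_x = (xs.min + xs.max) / 2.0
  mid_y = (ys.min + ys.max) / 2.0
  quadrants = [
    [xs.min, ys.min, mid_x, mid_y], [mid_x, ys.min, xs.max, mid_y],
    [xs.min, mid_y, mid_x, ys.max], [mid_x, mid_y, xs.max, ys.max]
  ]
  quadrants.flat_map do |rect|
    piece = clip_to_rect(ring, *rect)
    piece.size < 3 ? [] : subdivide(piece, max_vertices, depth - 1)
  end
end

# Shoelace formula, used here only to check that no area was lost.
def area(ring)
  sum = 0.0
  ring.each_with_index do |(x2, y2), i|
    x1, y1 = ring[i - 1]
    sum += x1 * y2 - x2 * y1
  end
  sum.abs / 2.0
end

# A 40-sided approximation of a circle stands in for a complex boundary.
circle = (0...40).map do |k|
  angle = (k + 0.5) * 2 * Math::PI / 40
  [Math.cos(angle), Math.sin(angle)]
end
pieces = subdivide(circle, 16)

puts "pieces: #{pieces.size}, largest: #{pieces.map(&:size).max} vertices"
puts "area preserved: #{(pieces.sum { |piece| area(piece) } - area(circle)).abs < 1e-9}"
```

<p>Each resulting piece would become one row in the polygons table, carrying a foreign key back to its county. The area check at the end is a cheap sanity test that the pieces still add up to the original shape.</p>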
<p>So how deeply should you segment a polygon? What&#8217;s the &#8220;sweet spot&#8221; in the number of sides? That&#8217;s where you have to test and measure, because it will depend on many factors. In one recent project in which I did some polygon segmentation, I measured the optimal number of sides between three and five hundred. But your mileage will vary.</p>
<h2>Where to go from here</h2>
<p>It is worth diving into the manual for your spatial database for tips on the effective use of spatial indexes. The PostGIS manual is <a title="PostGIS documentation" href="http://www.postgis.org/documentation" target="_blank">online</a>.</p>
<p>EXPLAIN is a very powerful tool for studying and optimizing your database performance in general, not only when you&#8217;re working with spatial data. I highly recommend getting familiar with using it in your database. For PostgreSQL, a good place to start is the <a title="Using Explain in PostgreSQL" href="http://www.postgresql.org/docs/current/static/using-explain.html" target="_blank">Using Explain</a> page in the manual. SQLite and MySQL also have sections in their manuals. Additionally, it looks like Rails 3.2 will <a title="Explain in Rails 3.2" href="http://weblog.rubyonrails.org/2011/12/6/what-s-new-in-edge-rails-explain" target="_blank">include</a> some useful EXPLAIN-based tools out of the box.</p>
<p>This week&#8217;s article didn&#8217;t include a lot of code. This was because I had a lot of material to get through, and I decided it was more important to cover the concepts at this stage, rather than encourage code cargo-culting. In next week&#8217;s article, however, I&#8217;ll go through a fully worked example, with code, that mirrors an actual task I recently had to do for my job at <a title="Pirq" href="http://www.pirq.com/" target="_blank">Pirq</a>.</p>
<p>Until then, have fun and let&#8217;s bring Rails down to earth!</p>
<p><em>This is part 6 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-azuma.com/blog/archives/134/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>RGeo 0.3.3 Released</title>
		<link>http://www.daniel-azuma.com/blog/archives/132</link>
		<comments>http://www.daniel-azuma.com/blog/archives/132#comments</comments>
		<pubDate>Wed, 21 Dec 2011 08:05:16 +0000</pubDate>
		<dc:creator>Daniel Azuma</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[RGeo]]></category>

		<guid isPermaLink="false">http://www.daniel-azuma.com/blog/?p=132</guid>
		<description><![CDATA[I&#8217;ve released RGeo 0.3.3. This is a bug fix release with several important fixes, and upgrading is highly recommended. Changes include: The WKRep WKT parser recognizes MultiPoint WKTs in which individual points are not contained in parens. This syntax is technically &#8230; <a href="http://www.daniel-azuma.com/blog/archives/132">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve released <a title="RGeo" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">RGeo</a> 0.3.3. This is a bug fix release with several important fixes, and upgrading is highly recommended.</p>
<p>Changes include:</p>
<ul>
<li>The WKRep WKT parser recognizes MultiPoint WKTs in which individual points are not contained in parens. This syntax is technically incorrect, but we are now supporting it because there was some ambiguity due to an error in early versions of the spec, and apparently there are now examples in the wild. (Reported by J Smith.)</li>
<li>The Geos CAPI implementation sometimes returned the wrong result from <code>GeometryCollection#geometry_n</code>. Fixed.</li>
<li>Fixed a hang when validating certain projected linestrings. (Patch contributed by Toby Rahilly.)</li>
<li>Several rdoc updates (including a contribution by Andy Allan).</li>
<li>Separated declarations and code in the C extensions to avert warnings on some compilers.</li>
</ul>
<p>RGeo is a spatial data library for Ruby, providing full implementations of the standard spatial data types. It is the basis for a suite of useful gems for writing geospatial applications in Ruby and Rails. For more information, see the documentation at <a title="RGeo documentation" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">http://virtuoso.rubyforge.org/rgeo/README_rdoc.html</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-azuma.com/blog/archives/132/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Geo-Rails part 5: Spatial Data Formats</title>
		<link>http://www.daniel-azuma.com/blog/archives/125</link>
		<comments>http://www.daniel-azuma.com/blog/archives/125#comments</comments>
		<pubDate>Mon, 19 Dec 2011 09:36:32 +0000</pubDate>
		<dc:creator>Daniel Azuma</dc:creator>
				<category><![CDATA[GeoRails]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[GeoJSON]]></category>
		<category><![CDATA[PostGIS]]></category>
		<category><![CDATA[RGeo]]></category>
		<category><![CDATA[Shapefile]]></category>
		<category><![CDATA[WKB]]></category>
		<category><![CDATA[WKT]]></category>

		<guid isPermaLink="false">http://www.daniel-azuma.com/blog/?p=125</guid>
		<description><![CDATA[The location revolution is a revolution of data. Ubiquitous data, from mobile GPS and user input as well as from census and other datasets, is what makes location-aware applications possible. And so the first task of many geospatial projects is &#8230; <a href="http://www.daniel-azuma.com/blog/archives/125">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The location revolution is a revolution of data. Ubiquitous data, from mobile GPS and user input as well as from census and other datasets, is what makes location-aware applications possible. And so the first task of many geospatial projects is to determine how to find and utilize (and, in some cases, produce) external data.</p>
<p>In this article, we will survey some of the important spatial data formats, including serialization formats, file formats, and API-oriented formats. Specifically, we will look at:</p>
<ul>
<li>Basic serialization using WKT and WKB</li>
<li>Variants on WKT and WKB</li>
<li>Reading public datasets from shapefiles</li>
<li>Web service oriented formats such as GeoJSON</li>
<li>XML-based formats commonly used in web services</li>
</ul>
<p>We will also go over a few quick examples using Ruby and <a title="RGeo" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">RGeo</a>. This will be a fairly high-level overview and we won&#8217;t go into a lot of detail. We&#8217;ll take deeper looks at some of these formats in future articles.</p>
<p>This is part 5 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</p>
<h2><span id="more-125"></span>The standard OGC serialization formats</h2>
<p>If, after reading <a title="Geo-Rails part 3" href="http://www.daniel-azuma.com/blog/archives/88" target="_blank">part 3</a>, you looked through the Simple Feature interfaces (or the corresponding RGeo interfaces), you may have noticed two serialization methods provided for geometries: <code>as_text</code> and <code>as_binary</code>. These methods respectively output the &#8220;Well-Known Text&#8221; and &#8220;Well-Known Binary&#8221; representations of the geometry. These two standard serialization formats are defined by the <a title="Simple Features Spec" href="http://www.opengeospatial.org/standards/sfa" target="_blank">OGC Simple Feature Access specification</a>, and are supported by most GIS systems.</p>
<p>Well-Known Text (often abbreviated WKT) is a human-readable and parseable text-based format for all geometry objects. You can read the exact format specification in the Simple Features Spec, but a few examples are probably sufficient to get the general hang of it.</p>
<pre>Point(-122.1 47.2)
LineString(2 4, 5 4, 5 8, 2 4)
Polygon((0 0, 5 0, 5 5, 0 5, 0 0), (2 2, 2 3, 3 3, 3 2, 2 2))
MultiPoint((-122.1 47.2), (-93.5 39.4))
GeometryCollection(Point(3 5), LineString(-2 0, -3 -4))
MultiLineString EMPTY</pre>
<p>Don&#8217;t confuse the simple features WKT format with the coordinate system WKT format we covered in <a title="Geo-Rails part 4" href="http://www.daniel-azuma.com/blog/archives/106" target="_blank">part 4</a>. Unfortunately, both are commonly known as Well-Known Text (WKT), but they are distinct formats: one represents geometric objects whereas the other represents coordinate systems.</p>
<p>Well-Known Binary (often abbreviated WKB) is a binary format that uses numeric codes and IEEE floating-point representations. It is not human-readable but is much more compact than WKT.</p>
<p>Using RGeo, you can obtain the WKT and WKB representations of a geometric object by calling <code>as_text</code> and <code>as_binary</code>, respectively. Factory objects will provide methods to parse WKT and WKB format and recover the geometric object.</p>
<pre>point = factory.point(1, 2)
wkt = point.as_text   # =&gt; "Point(1 2)"
point2 = factory.parse_wkt(wkt)
point == point2       # =&gt; true</pre>
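<p>To show what the binary format looks like on the wire, here is a small sketch that builds the WKB for a 2D point by hand using only <code>Array#pack</code>. The helper names are made up for illustration, and only the little-endian Point case is handled; for real work you would use the <code>as_binary</code> and parsing methods mentioned above.</p>

```ruby
# Encode a 2D point as little-endian WKB:
#   1 byte   byte-order flag (1 = little-endian)
#   4 bytes  geometry type code (1 = Point)
#   8 bytes  X as an IEEE 754 double
#   8 bytes  Y as an IEEE 754 double
def point_to_wkb(x, y)
  [1, 1, x, y].pack('CVEE')
end

# Decode the same layout back into [x, y].
def wkb_to_point(wkb)
  byte_order, type_code, x, y = wkb.unpack('CVEE')
  unless byte_order == 1 && type_code == 1
    raise ArgumentError, 'expected a little-endian WKB Point'
  end

  [x, y]
end

wkb = point_to_wkb(-122.1, 47.2)
wkb.bytesize              # => 21
wkb.unpack1('H*')[0, 10]  # => "0101000000"
wkb_to_point(wkb)         # => [-122.1, 47.2]
```

<p>Because the coordinates are stored as raw doubles, the round trip is exact, with none of the decimal rounding that can creep in when formatting and re-parsing text.</p>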
<h2>Variants on WKT and WKB</h2>
<p>As simple and well-supported as they are, WKT and WKB have several important weaknesses that have caused headaches for spatial databases and applications. In <a title="Geo-Rails part 4" href="http://www.daniel-azuma.com/blog/archives/106" target="_blank">part 4</a>, we saw that to properly interpret a geometric object, you need to know the coordinate system, which is usually specified by a spatial reference ID (SRID). Unfortunately, neither WKT nor WKB includes a way to represent the SRID. They expect the SRID to be specified or implied elsewhere, which is not always the case.</p>
<p>Furthermore, some applications use additional coordinates in their geometric data. Applications that store altitude or other third-dimensional data may include a &#8220;Z&#8221; coordinate in their geometries. Other applications may include a measurement (such as temperature or population) stored in an &#8220;M&#8221; coordinate. Version 1.1 of the Simple Features Spec (and the corresponding WKT and WKB specifications) does not directly support these extra coordinates, although version 1.2 does address this, as we will see.</p>
<p>Finally, neither WKT nor WKB by themselves provide a way to associate metadata, such as object names or other properties, with geometric objects. This limits their usefulness as a complete format for data transmission.</p>
<p>Because of these limitations, several variants have appeared that you should be aware of. The <a title="PostGIS" href="http://www.postgis.org/" target="_blank">PostGIS</a> database supports an extension to WKT called &#8220;EWKT&#8221;, which supports SRID as well as &#8220;Z&#8221; and &#8220;M&#8221; coordinates. The SRID, if present, appears at the front of the EWKT string:</p>
<pre>SRID=4326;Point(-122.34978 47.62058)</pre>
<p>EWKT allows &#8220;Z&#8221; and &#8220;M&#8221; coordinates to be appended to each pair of coordinates as third and fourth coordinate values. When both &#8220;Z&#8221; and &#8220;M&#8221; are present (i.e. four coordinate values per point), the third coordinate is used for &#8220;Z&#8221; while the fourth is used for &#8220;M&#8221;. If only one is used (i.e. three coordinate values per point), you must specify whether it is &#8220;Z&#8221; or &#8220;M&#8221;. Here are some examples:</p>
<pre>Point(-122.34978 47.62057 20.0 -3)  # X,Y,Z,M in EWKT
PointM(-122.34978 47.62057 -3)      # X,Y,M
PointZ(-122.34978 47.62057 20.0)    # X,Y,Z</pre>
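<p>The SRID prefix is simple enough that you can pick it apart by hand. The following is a minimal sketch with a made-up helper name (<code>split_ewkt</code>); it only separates the prefix from the rest of the string and does not validate or parse the geometry itself.</p>

```ruby
# Split an EWKT string into its optional SRID and the remaining WKT body.
# Returns [srid, body], with srid set to nil when no prefix is present.
def split_ewkt(ewkt)
  match = ewkt.match(/\ASRID=(-?\d+);(.*)\z/m)
  match ? [match[1].to_i, match[2]] : [nil, ewkt]
end

split_ewkt('SRID=4326;Point(-122.34978 47.62058)')
# => [4326, "Point(-122.34978 47.62058)"]
split_ewkt('Point(1 2)')
# => [nil, "Point(1 2)"]
```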
<p>PostGIS also defines a corresponding &#8220;EWKB&#8221; format with appropriate extensions to the binary format to support SRID as well as Z and M. EWKB is (or at least appears to be) the native internal format used by PostGIS to represent geometric data.</p>
<p>More recent versions of the OGC Simple Features Spec (version 1.2 and later) also provide support for Z and M. However, beware that the OGC format is not the same as the PostGIS EWKT and EWKB. The WKT update expects a space between the geometry type and the Z/M specifier, and it also requires the modifier in the &#8220;four-dimensional&#8221; ZM case:</p>
<pre>Point ZM(-122.34978 47.62057 20.0 -3)  # X,Y,Z,M in WKT 1.2
Point M(-122.34978 47.62057 -3)        # X,Y,M
Point Z(-122.34978 47.62057 20.0)      # X,Y,Z</pre>
<p>Furthermore, the updated WKT format still does not support a SRID. The updated WKB similarly supports Z and M (but not SRID), but uses different binary codes than those used by EWKB. Hence, these two extensions are not fully compatible with each other.</p>
<p>Because of this fragmentation, neither of these extensions is, in practice, used frequently for long-term serialization. However, you will likely need to work with EWKT at some point if you use PostGIS, so it is important to be familiar with it.</p>
<p>RGeo provides support for parsing and generating both variants in the <code>RGeo::WKRep</code> module. See the <a title="RGeo documentation" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">rdocs</a> for more details. Here&#8217;s a really quick code example as a starting point:</p>
<pre>parser = RGeo::WKRep::WKTParser.new(nil, :support_ewkt =&gt; true)
point = parser.parse('SRID=4326;Point(-122.1 47.3)')
point.srid   # =&gt; 4326</pre>
<h2>Shapefiles and public datasets</h2>
<p>Location is driven by data, and a lot of the data you will need to work with will likely come in the form of <em>shapefiles</em>. The shapefile is a flat file format for geospatial data originally developed by <a title="ESRI" href="http://www.esri.com/" target="_blank">ESRI</a> for storing sets of geographic features. It supports certain vector shapes&#8211; points, lines, and polygons&#8211; along with associated attributes. Although shapefile began as a proprietary format, the format specification is readily available, and it is now a <em>de facto</em> standard for large datasets, including those provided by government agencies such as the <a title="Census" href="http://www.census.gov/" target="_blank">US Census Bureau</a>.</p>
<p>A shapefile actually consists of three (and sometimes more) related files, each with the same base filename but different extensions. The main file has the extension &#8220;<code>.shp</code>&#8221; and contains the geometric data itself in a binary format. An auxiliary &#8220;<code>.shx</code>&#8221; file provides a simple flat index allowing random access into the shapefile. A second auxiliary &#8220;<code>.dbf</code>&#8221; file provides the attribute data in <a title="dBASE" href="http://dbase.com/" target="_blank">dBASE</a> format. All shapefiles should have those three core files, although some shapefiles may include additional files containing coordinate system, spatial index, or other related information.</p>
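<p>For the curious, the main file begins with a fixed 100-byte header, and reading it by hand is a nice way to see the format up close. The sketch below follows the header layout in the ESRI specification; the function names are made up for illustration, only a handful of shape type codes are listed, and the example builds a synthetic header so it can run without a real file. With a real file, <code>File.binread(path, 100)</code> would supply the bytes.</p>

```ruby
# A few of the shape type codes defined by the ESRI specification.
SHAPE_TYPES = {
  0 => 'Null', 1 => 'Point', 3 => 'PolyLine', 5 => 'Polygon', 8 => 'MultiPoint'
}.freeze

# Parse the fixed 100-byte header at the start of a .shp file. The file
# code and file length are big-endian; everything else is little-endian.
def parse_shp_header(bytes)
  raise ArgumentError, 'header must be at least 100 bytes' if bytes.bytesize < 100

  file_code = bytes.byteslice(0, 4).unpack1('N')
  raise ArgumentError, 'not a shapefile' unless file_code == 9994

  length_words = bytes.byteslice(24, 4).unpack1('N')
  version, shape_type = bytes.byteslice(28, 8).unpack('VV')
  min_x, min_y, max_x, max_y = bytes.byteslice(36, 32).unpack('E4')
  {
    version: version,
    shape_type: SHAPE_TYPES.fetch(shape_type, "code #{shape_type}"),
    byte_length: length_words * 2, # the length is stored in 16-bit words
    bbox: [min_x, min_y, max_x, max_y]
  }
end

# The three core files share a base name and differ only in extension.
def companion_files(shp_path)
  base = shp_path.sub(/\.shp\z/i, '')
  %w[shp shx dbf].map { |ext| "#{base}.#{ext}" }
end

# Build a synthetic header so the sketch runs without a real file.
header = [9994, 0, 0, 0, 0, 0, 50].pack('N7') +
         [1000, 5].pack('VV') +
         [-124.8, 45.5, -116.9, 49.0, 0.0, 0.0, 0.0, 0.0].pack('E8')
info = parse_shp_header(header)

info[:shape_type]              # => "Polygon"
info[:bbox]                    # => [-124.8, 45.5, -116.9, 49.0]
companion_files('myfile.shp')  # => ["myfile.shp", "myfile.shx", "myfile.dbf"]
```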
<p>Most Rails applications will not read a shapefile directly, but will instead transfer the data to a spatial database such as PostGIS for rapid query and data retrieval. In Ruby, you can use the <a title="rgeo-shapefile gem" href="http://virtuoso.rubyforge.org/rgeo-shapefile/README_rdoc.html" target="_blank">rgeo-shapefile</a> gem to help with this task. This gem does the heavy lifting involved with parsing and analyzing a shapefile, and exposes the data to you as RGeo geometric objects. You should also install the <a title="DBF gem" href="https://rubygems.org/gems/dbf" target="_blank">dbf</a> gem, which lets you read the dBASE attributes in the shapefile.</p>
<pre>% gem install rgeo-shapefile
% gem install dbf</pre>
<p>Once you have the gems installed, and you&#8217;ve downloaded and unpacked a shapefile, use the <code>RGeo::Shapefile::Reader</code> class to open and read the file. The following example reads objects sequentially:</p>
<pre>factory = RGeo::Geographic.spherical_factory(:srid =&gt; 4326)
RGeo::Shapefile::Reader.open('myfile.shp', :factory =&gt; factory) do |file|
  file.each do |record|
    geom = record.geometry
    # geom is now an RGeo geometry object.
    name = record['Name']
    # You can read any other attribute similarly.
    # Now, you can do whatever you want with the data,
    # such as inserting rows into your database...
  end
end</pre>
<p>Notice that we provide a factory for the objects being read. Shapefiles generally do not provide an SRID, so we must supply that. The above example assumes the shapefile contains latitude-longitude coordinates in WGS84.</p>
<p>The <code>RGeo::Shapefile::Reader</code> class also lets you do random access reads, and get other information about the shapefile&#8217;s contents. See the <a title="rgeo-shapefile rdocs" href="http://virtuoso.rubyforge.org/rgeo-shapefile/README_rdoc.html" target="_blank">rdocs</a> for more details. The gem does not currently support writing shapefiles, but that feature is on the roadmap.</p>
<p>For more information on the shapefile format itself, you can find the original ESRI specification at <a title="Shapefile specification" href="http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf" target="_blank">http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf</a>. Another common (C-based) implementation of shapefile is Shapelib, which you can find at <a title="ShapeLib" href="http://shapelib.maptools.org/" target="_blank">http://shapelib.maptools.org/</a>.</p>
<h2>Web services and GeoJSON</h2>
<p>Another way to obtain location data is to call a web service such as <a title="Google Places API" href="http://code.google.com/apis/maps/documentation/places/" target="_blank">Google Places</a>, <a title="SimpleGeo" href="https://simplegeo.com/" target="_blank">SimpleGeo</a>, or <a title="Factual" href="http://www.factual.com/" target="_blank">Factual</a>. These services do the heavy lifting of curating, deduping, and managing location data, and generally provide an HTTP REST API letting you query for location information of interest.</p>
<p>There are a number of different types of web services, including geocoders, point of interest search, location properties, and others. I&#8217;ll write up a survey of useful location-oriented web services in a later article. For this current article, however, we are interested in data formats that would typically be returned from a point of interest search. When you make a query, what sort of data can you expect to get?</p>
<p>In many cases, the web service will define its own schema for the returned data. You must then parse the returned document yourself to extract the information you want. There are well-known gems available for this task, such as <a title="JSON gem" href="http://flori.github.com/json/" target="_blank">json</a> for parsing <a title="JSON format" href="http://www.json.org/" target="_blank">JSON</a>, and <a title="Nokogiri XML parser" href="http://nokogiri.org/" target="_blank">nokogiri</a> for parsing XML. There are also, however, a few semi-standard schemas commonly used by a number of web services. Here we will take a quick tour of some of these formats and how you can go about using them.</p>
<p><strong>GeoJSON</strong> is an important emerging standard commonly used by SimpleGeo and similar modern APIs. It provides a standard JSON representation for each geometric type, as well as support for bounding boxes, coordinate systems, and a set of properties. Following is an example of GeoJSON, lifted out of the specification:</p>
<pre>{ "type": "FeatureCollection",
  "features": [
    { "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [102.0, 0.5]},
      "properties": {"prop0": "value0"}
      },
    { "type": "Feature",
      "geometry": {
        "type": "LineString",
        "coordinates": [
          [102.0, 0.0], [103.0, 1.0], [104.0, 0.0], [105.0, 1.0]
          ]
        },
      "properties": {
        "prop0": "value0",
        "prop1": 0.0
        }
      },
    { "type": "Feature",
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0],
            [100.0, 1.0], [100.0, 0.0] ]
          ]
      },
      "properties": {
        "prop0": "value0",
        "prop1": {"this": "that"}
        }
      }
    ]
  }</pre>
<p>The core object type in GeoJSON is the <em>Feature</em>, which consists of a geometry and a set of properties. The geometry can be any of the OGC types, and its internal representation is closely modeled on WKT. Properties are simply named key-value pairs whose values can be any JSON object.</p>
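<p>To illustrate how closely the geometry representation tracks WKT, here is a minimal sketch that walks a feature collection using only Ruby&#8217;s standard <code>json</code> library and renders each geometry as WKT. The helper names are made up for illustration, and only Point, LineString, and Polygon are handled.</p>

```ruby
require 'json'

# Format a GeoJSON position ([x, y]) the way WKT writes it ("x y").
def wkt_position(position)
  position.join(' ')
end

# Convert a few GeoJSON geometry types to WKT. Only Point, LineString,
# and Polygon are handled in this sketch.
def geojson_geometry_to_wkt(geometry)
  coords = geometry['coordinates']
  case geometry['type']
  when 'Point'
    "Point(#{wkt_position(coords)})"
  when 'LineString'
    "LineString(#{coords.map { |pos| wkt_position(pos) }.join(', ')})"
  when 'Polygon'
    rings = coords.map do |ring|
      "(#{ring.map { |pos| wkt_position(pos) }.join(', ')})"
    end
    "Polygon(#{rings.join(', ')})"
  else
    raise ArgumentError, "unsupported geometry type: #{geometry['type']}"
  end
end

doc = JSON.parse(<<~GEOJSON)
  { "type": "FeatureCollection",
    "features": [
      { "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [102.0, 0.5]},
        "properties": {"prop0": "value0"} },
      { "type": "Feature",
        "geometry": {"type": "LineString",
                     "coordinates": [[102.0, 0.0], [103.0, 1.0]]},
        "properties": {"prop0": "value0", "prop1": 0.0} }
    ] }
GEOJSON

# Pair each feature's geometry (as WKT) with its properties hash.
summaries = doc['features'].map do |feature|
  [geojson_geometry_to_wkt(feature['geometry']), feature['properties']]
end

summaries.each { |wkt, properties| puts "#{wkt}  #{properties}" }
```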
<p>GeoJSON is simple and highly versatile, and is often an ideal format both for consuming and producing geospatial data. From Ruby, you can use the <a title="rgeo-geojson gem" href="http://virtuoso.rubyforge.org/rgeo-geojson/README_rdoc.html" target="_blank">rgeo-geojson</a> gem to read and write GeoJSON. Here are some quick examples to get you started:</p>
<pre>require 'rgeo/geo_json'

str1 = '{"type":"Point","coordinates":[1,2]}'
geom = RGeo::GeoJSON.decode(str1, :json_parser =&gt; :json)
geom.as_text              # =&gt; "POINT(1.0 2.0)"

str2 = '{"type":"Feature","geometry":{"type":"Point","coordinates":' +
  '[2.5,4.0]},"properties":{"color":"red"}}'
feature = RGeo::GeoJSON.decode(str2, :json_parser =&gt; :json)
feature['color']          # =&gt; 'red'
feature.geometry.as_text  # =&gt; "POINT(2.5 4.0)"

hash = RGeo::GeoJSON.encode(feature)
hash.to_json == str2      # =&gt; true</pre>
<p>For more information on GeoJSON, see <a title="GeoJSON" href="http://geojson.org/" target="_blank">http://geojson.org/</a>. The actual spec hosted on the website is quite short and very readable. You can find more information on the rgeo-geojson gem from its <a title="GeoJSON rdoc" href="http://virtuoso.rubyforge.org/rgeo-geojson/README_rdoc.html" target="_blank">rdocs</a>.</p>
<h2>XML-based formats</h2>
<p>Although JSON is often a format of choice for many modern web services because of its simplicity and its close affinity with Javascript and similar high-level languages, XML is still the established standard in many fields and applications. GIS services, in particular, have a long tradition of XML-based representation, and there are a number of XML-based geospatial formats you may encounter when writing location-aware applications. Among them:</p>
<p><strong>GeoRSS</strong> is a family of RSS extensions for embedding geospatial data into RSS or Atom feeds, often used to spatially tag feed entries. It comes in two flavors, <em>Simple GeoRSS</em> and <em>GML GeoRSS</em>. Simple GeoRSS is designed for simplicity, and supports a limited set of features. Notably, not all the OGC geometric types can be represented, and coordinate system is limited to WGS84 latitude/longitude. GML GeoRSS is a more full-featured but much more complex format, essentially a profile of <em>GML</em>, which we will cover below. Most actual implementations of GeoRSS are of the Simple flavor.</p>
<p>Below are a couple of examples of a basic GeoRSS element from an RSS feed, first in the Simple flavor and then in the GML flavor.</p>
<pre>&lt;georss:point&gt;47.604828 -122.330779&lt;/georss:point&gt;</pre>
<pre>&lt;georss:where&gt;
  &lt;gml:Point&gt;
    &lt;gml:pos&gt;47.604828 -122.330779&lt;/gml:pos&gt;
  &lt;/gml:Point&gt;
&lt;/georss:where&gt;</pre>
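<p>One detail worth watching: GeoRSS writes a point as latitude followed by longitude, the reverse of the X-then-Y order used in WKT. Here is a tiny sketch of the conversion, using a made-up helper name and assuming you have already extracted the element&#8217;s text with an XML parser such as nokogiri.</p>

```ruby
# Convert the text of a georss:point element ("lat lon") into WKT,
# which puts X (longitude) before Y (latitude).
def georss_point_to_wkt(text)
  lat, lon = text.split
  raise ArgumentError, 'expected "lat lon"' if lat.nil? || lon.nil?

  "Point(#{lon} #{lat})"
end

georss_point_to_wkt('47.604828 -122.330779')
# => "Point(-122.330779 47.604828)"
```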
<p>As of this writing, the georss.org website appears to be unmaintained and possibly hacked. The best starting point I can recommend for GeoRSS is an OGC whitepaper at <a title="GeoRSS whitepaper from OGC" href="http://portal.opengeospatial.org/files/?artifact_id=15755" target="_blank">http://portal.opengeospatial.org/files/?artifact_id=15755</a>.</p>
<p>I&#8217;m not currently aware of an RGeo-based Ruby implementation of GeoRSS. The older <a title="GeoRuby gem" href="http://georuby.rubyforge.org/" target="_blank">GeoRuby</a> gem, however, does have basic support for GeoRSS.</p>
<p><strong>Geography Markup Language</strong> (or <strong>GML</strong>) is an XML-based object model intended to describe geographic information. Its specification is maintained by the Open Geospatial Consortium. GML by itself is a highly general and flexible model that can represent not only geometric objects and coordinate systems such as we have looked at so far in this article series, but also observations, topological information, temporal information, and various other related entities.</p>
<p>You generally don&#8217;t work with GML directly, but instead use an application XML schema that references GML internally. Furthermore, most application schemas don&#8217;t utilize the entire GML specification, but a relevant subset, known as a <em>GML profile</em>. For example, GML GeoRSS is an application schema referencing a GML profile relevant to geotagging feed entries.</p>
<p>Another common GML-based XML schema is <em>CityGML</em> (<a title="CityGML website" href="http://www.citygml.org/" target="_blank">http://www.citygml.org/</a>), which is designed to model urban objects. CityGML is commonly used, for example, to model 3D visualizations of cities.</p>
<p>For more information on GML as a whole, you can review the OGC spec at <a title="OGC GML spec" href="http://www.opengeospatial.org/standards/gml" target="_blank">http://www.opengeospatial.org/standards/gml</a>.</p>
<p>I&#8217;m not currently aware of any specific Ruby support for GML or its various dialects.</p>
<p><strong>KML</strong> (or <strong>Keyhole Markup Language</strong>) is an XML schema that originated at Google for describing features in <a title="Google Earth" href="http://earth.google.com/" target="_blank">Google Earth</a>, but was later standardized by the OGC. Although it does have some overlap with GML, KML is often seen as complementary because of its particular emphasis on visualization. Its intended use is to describe how to display features within a Google Earth style application. You can, for example, open a KML file with Google Earth to display its contents.</p>
<p>For more information on KML, see the Google documentation at <a title="KML documentation from Google" href="http://code.google.com/apis/kml/documentation/" target="_blank">http://code.google.com/apis/kml/documentation/</a> or the OGC specification at <a title="OGC KML spec" href="http://www.opengeospatial.org/standards/kml" target="_blank">http://www.opengeospatial.org/standards/kml</a>.</p>
<p>I&#8217;m not currently aware of any specific Ruby support for KML.</p>
<h2>Where to go from here</h2>
<p>This article has covered just a few of the most common and/or promising major spatial data formats. There are a number of others currently in use, including many locale or application-specific forms. But as you can see, Ruby support for even the major formats is currently rather thin. We still have much work to do on our tools.</p>
<p>As the principal author of RGeo, I&#8217;m looking for help in this area. I released the rgeo-geojson and rgeo-shapefile gems based on work I&#8217;ve done to integrate my own applications with those formats. However, I haven&#8217;t yet had the need to actually use one of the XML formats, and as a result I haven&#8217;t written any tools to help with them. There is currently quite a bit of room to contribute to the community in this area.</p>
<p>Next week I&#8217;m going to take a break for the holidays, but I expect to release the next planned article on scaling spatial applications with the new year. Stay tuned, and let&#8217;s bring Rails down to earth!</p>
<div>
<p><em>This is part 5 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails article series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</em></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-azuma.com/blog/archives/125/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Geo-Rails part 4: Coordinate Systems and Projections</title>
		<link>http://www.daniel-azuma.com/blog/archives/106</link>
		<comments>http://www.daniel-azuma.com/blog/archives/106#comments</comments>
		<pubDate>Tue, 13 Dec 2011 05:31:45 +0000</pubDate>
		<dc:creator>Daniel Azuma</dc:creator>
				<category><![CDATA[GeoRails]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.daniel-azuma.com/blog/?p=106</guid>
		<description><![CDATA[When people speak of a learning curve in geospatial programming, they&#8217;re usually referring to handling coordinate systems. It&#8217;s true that many spatial applications require close attention to the coordinate system, and it&#8217;s true that there are some difficult concepts involved. &#8230; <a href="http://www.daniel-azuma.com/blog/archives/106">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>When people speak of a learning curve in geospatial programming, they&#8217;re usually referring to handling coordinate systems. It&#8217;s true that many spatial applications require close attention to the coordinate system, and it&#8217;s true that there are some difficult concepts involved. However, it&#8217;s been my experience that once the light bulb turns on, it opens up a lot of the power and potential of geodata.</p>
<p>In this article, we&#8217;ll take a first look at coordinate systems and geographic projections. We will:</p>
<ul>
<li>Examine the importance and effect of coordinate system differences</li>
<li>Survey the various coordinate systems used for geospatial data</li>
<li>Become familiar with coordinate system representations and SRIDs</li>
<li>Specify coordinate systems in RGeo factories</li>
<li>Use RGeo to convert data between coordinate systems</li>
<li>Learn how to handle coordinate systems in Rails</li>
</ul>
<p>This is part 4 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails article series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</p>
<p><span id="more-106"></span></p>
<h2>Why coordinate systems are important: a cautionary fable</h2>
<p>Once upon a time, a plane took off from San Francisco, California on a routine flight to Athens, Greece. The captain, being both valiant and diligent with his passengers&#8217; safety, sent a Twitter message to air traffic control. He wanted to ensure that his flight route would not be disrupted, for he had read on Hacker News that the diabolical EvilVolcano in Iceland was erupting, sending a deadly ash cloud high into the air lanes.</p>
<p>Meanwhile, an air traffic technician, having just finished installing a brand new Rails-based flight planning application, received the tweet. &#8220;<code>@air_traffic_control pls chk flt path SFO-ATH. Far enuf fr #EvilVolcano?</code>&#8221;</p>
<p>Excited to use his new tool, the technician got busy with his analysis. He looked up the latitude-longitude coordinates of San Francisco and Athens, and plotted a straight line between them, using Google Maps to verify that his path looked correct. Then he looked up the latitude-longitude coordinates of the diabolical EvilVolcano, and calculated the distance between it and the plane&#8217;s expected path.</p>
<p>&#8220;Lo!&#8221; he exclaimed. &#8220;Surely the flight path shall follow a straight line between two points, right along the 38-degree latitude line. But look&#8211; Iceland is nowhere near that path. The safety of our valiant pilot is assured!&#8221;</p>
<div id="attachment_111" class="wp-caption aligncenter" style="width: 522px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/wrong_path.png"><img class="size-full wp-image-111" title="wrong_path" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/wrong_path.png" alt="" width="512" height="280" /></a><p class="wp-caption-text">The flight path as seen by the air traffic technician</p></div>
<p>The technician proudly tweeted back his findings: &#8220;<code>@valiant_pilot Flt path safe dist fr #EvilVolcano. Have pleasant journey.</code>&#8221;</p>
<p>Having received this response on Twitter, the pilot proceeded to take off on his planned route. The last tweet received from the ill-fated flight, some seven hours later, read as follows: &#8220;<code>Oceanic Air flt 815 encountering ash #SmokeMonster from #EvilVolcano. #Mayday #Mayday</code>&#8221;. And the rest is history.</p>
<p>What went wrong?</p>
<p>For the most part, the air traffic technician did the right thing. He looked up the latitude-longitude coordinates of the departure and arrival cities, drew a straight line path between them, and then computed the distance between that path and the EvilVolcano in Iceland. That is, he computed:</p>
<pre>Distance(LineString(-122.4 37.8, 23.7 37.9), Point(-19.6 63.6))</pre>
<p>However, he missed one thing. The shape of a &#8220;straight&#8221; line drawn between two points, and thus the distance calculated, may be vastly different depending on what coordinate system you are using, even though the latitude and longitude coordinates are the same!</p>
<p>Google Maps uses a <em>Mercator Projection</em> to display a world map. In this flat coordinate system, a straight line between San Francisco and Athens follows the 38-degree latitude line, passing through muggy Virginia and sunny southern Spain on its way to balmy (albeit bankrupt) Athens.</p>
<p>But the earth itself is not flat. Using a spherical coordinate system, the straight line shortest path between the two cities across the surface of the globe passes further north, directly over chilly and volcano-infested Iceland. This is the actual flight path of the ill-fated valiant pilot.</p>
<div id="attachment_112" class="wp-caption aligncenter" style="width: 448px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/actual_path.png"><img class="size-full wp-image-112" title="actual_path" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/actual_path.png" alt="" width="438" height="374" /></a><p class="wp-caption-text">The actual shortest straight-line path from San Francisco to Athens</p></div>
<p>Simply by using the wrong coordinate system, the air traffic technician got a grossly inaccurate answer, resulting in dozens of innocent passengers becoming instant prime-time celebrities.</p>
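<p>The difference between the two paths can be checked with a few lines of plain Ruby. The sketch below is not RGeo code; it assumes a perfectly spherical earth with a radius of 6371 km, and the helper names (<code>to_vector</code>, <code>great_circle</code>, <code>flat_line</code>) are made up for illustration. It samples both paths and measures how close each one comes to the volcano.</p>

```ruby
EARTH_RADIUS_KM = 6371.0 # assumed mean radius of a spherical earth
RAD = Math::PI / 180.0

# Convert latitude/longitude in degrees to a unit vector [x, y, z].
def to_vector(lat, lon)
  [Math.cos(lat * RAD) * Math.cos(lon * RAD),
   Math.cos(lat * RAD) * Math.sin(lon * RAD),
   Math.sin(lat * RAD)]
end

# Distance across the surface of the sphere between two unit vectors.
def distance_km(a, b)
  dot = a.zip(b).sum { |p, q| p * q }
  EARTH_RADIUS_KM * Math.acos(dot.clamp(-1.0, 1.0))
end

# Sample the great circle from a to b using spherical interpolation.
def great_circle(a, b, steps)
  omega = Math.acos(a.zip(b).sum { |p, q| p * q })
  (0..steps).map do |i|
    f = i.to_f / steps
    wa = Math.sin((1 - f) * omega) / Math.sin(omega)
    wb = Math.sin(f * omega) / Math.sin(omega)
    a.zip(b).map { |p, q| wa * p + wb * q }
  end
end

# Sample a line that is straight in latitude-longitude space.
# Endpoints are [lat, lon] pairs in degrees.
def flat_line(from, to, steps)
  (0..steps).map do |i|
    f = i.to_f / steps
    to_vector(from[0] + f * (to[0] - from[0]),
              from[1] + f * (to[1] - from[1]))
  end
end

volcano = to_vector(63.6, -19.6)
flat_path = flat_line([37.8, -122.4], [37.9, 23.7], 1000)
curved_path = great_circle(to_vector(37.8, -122.4), to_vector(37.9, 23.7), 1000)

flat_min = flat_path.map { |p| distance_km(p, volcano) }.min
curved_min = curved_path.map { |p| distance_km(p, volcano) }.min
max_lat = curved_path.map { |p| Math.asin(p[2]) / RAD }.max

puts "Flat path: closest approach #{flat_min.round} km"
puts "Great circle: closest approach #{curved_min.round} km"
puts "Great circle: northernmost latitude #{max_lat.round(1)} degrees"
```

<p>Both paths share the same two endpoints, yet the flat path stays thousands of kilometers from the volcano while the great circle climbs far to the north and passes within a few hundred.</p>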
<h2>Coordinate systems for geo data</h2>
<p>At a basic level, a coordinate system can be thought of as providing &#8220;meaning&#8221; to a set of coordinates. When you see the location &#8220;<code>Point(-19.6 63.6)</code>&#8221;, how do you interpret those numbers? They could be the latitude and longitude of the Iceland volcano, but they could equally be measurements in feet from your front door, or light years from Alpha Centauri. The coordinate system is what differentiates these cases.</p>
<p>Location applications generally work with coordinate systems related to the earth&#8217;s surface, and these coordinate systems fall into three types.</p>
<p><strong>Geocentric</strong> coordinate systems are three-dimensional coordinate systems with the origin located at the earth&#8217;s center. You won&#8217;t generally see much data in a geocentric coordinate system, but it is sometimes a convenient coordinate system to use for computational geometry and analysis algorithms.</p>
<div id="attachment_113" class="wp-caption aligncenter" style="width: 361px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/geocentric.png"><img class="size-full wp-image-113 " title="geocentric" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/geocentric.png" alt="" width="351" height="321" /></a><p class="wp-caption-text">Geocentric coordinates measure X, Y, and Z distances from the center of the earth. (Credit: http://kartoweb.itc.nl/geometrics/Coordinate%20systems/coordsys.html)</p></div>
<p><strong>Geographic</strong> coordinate systems are the familiar latitude-longitude systems identifying points on the earth&#8217;s surface in terms of degrees. The most common geographic coordinate system, the one used by GPS and expected by most mapping applications, is known as &#8220;<em>WGS 84</em>&#8221;.</p>
<div id="attachment_114" class="wp-caption aligncenter" style="width: 399px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/geographic.png"><img class="size-full wp-image-114 " title="geographic" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/geographic.png" alt="" width="389" height="379" /></a><p class="wp-caption-text">Geographic coordinates measure our familiar latitude and longitude. (Credit: http://kartoweb.itc.nl/geometrics/Coordinate%20systems/coordsys.html)</p></div>
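<p>Geocentric and geographic coordinates describe the same locations, and converting from one to the other is a matter of trigonometry. As a rough sketch, here is the standard conversion from latitude, longitude, and height to geocentric X, Y, and Z on the WGS 84 ellipsoid. The function name <code>geocentric</code> is an illustrative choice, not part of any library.</p>

```ruby
WGS84_A = 6378137.0            # semi-major axis in meters
WGS84_INV_F = 298.257223563    # inverse flattening
WGS84_E2 = 2.0 / WGS84_INV_F - 1.0 / WGS84_INV_F**2 # eccentricity squared
RAD = Math::PI / 180.0

# Convert geographic coordinates (degrees, meters above the ellipsoid)
# to geocentric X, Y, Z in meters.
def geocentric(lat_deg, lon_deg, height_m = 0.0)
  lat = lat_deg * RAD
  lon = lon_deg * RAD
  # Radius of curvature in the prime vertical at this latitude.
  n = WGS84_A / Math.sqrt(1.0 - WGS84_E2 * Math.sin(lat)**2)
  [(n + height_m) * Math.cos(lat) * Math.cos(lon),
   (n + height_m) * Math.cos(lat) * Math.sin(lon),
   (n * (1.0 - WGS84_E2) + height_m) * Math.sin(lat)]
end

x, y, z = geocentric(63.6, -19.6) # the volcano from the fable
puts format('X=%.0f Y=%.0f Z=%.0f (meters)', x, y, z)
```

<p>A point on the equator at longitude zero lands on the X axis at a distance equal to the semi-major axis, and the north pole lands on the Z axis at the slightly shorter polar radius.</p>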
<p><strong>Projected</strong> coordinate systems involve taking a portion of the earth&#8217;s surface and &#8220;flattening&#8221; it. These coordinate systems are planar and generally Cartesian for easy display and computation; however, they introduce various kinds and amounts of distortion. Whenever you see a map displayed on a piece of paper, a computer screen, or other flat medium (basically anything other than a globe), you are looking at a projection.</p>
<p>There are hundreds of different projections in use, from common projections used for world maps, to special-purpose projections used for specific regions. When you view a Google Map, you are looking at a <em>Mercator Projection</em>. This is a projection designed to preserve shapes and straight-line directions, at the expense of distorting sizes and distances away from the Equator. A Google Map, for instance, implies that Greenland is larger than Africa, when in fact it is much, much smaller.</p>
<div id="attachment_115" class="wp-caption aligncenter" style="width: 448px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/google_world.png"><img class="size-full wp-image-115" title="google_world" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/google_world.png" alt="" width="438" height="322" /></a><p class="wp-caption-text">Google Maps uses a Mercator Projection</p></div>
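<p>The distortion in a Mercator map is easy to quantify. The following sketch assumes the simple spherical form of the projection (the method names are illustrative). It computes the vertical map coordinate for a latitude, along with the factor by which lengths and areas are stretched at that latitude.</p>

```ruby
RAD = Math::PI / 180.0

# Vertical map coordinate on a unit sphere for a given latitude.
def mercator_y(lat_deg)
  Math.log(Math.tan(Math::PI / 4 + lat_deg * RAD / 2))
end

# Factor by which the map stretches lengths at a given latitude.
def linear_scale(lat_deg)
  1.0 / Math.cos(lat_deg * RAD)
end

# Areas are stretched in both directions, so the factor is squared.
def area_scale(lat_deg)
  linear_scale(lat_deg)**2
end

[0, 38, 64, 72].each do |lat|
  puts format('lat %2d: y=%.3f, lengths x%.2f, areas x%.2f',
              lat, mercator_y(lat), linear_scale(lat), area_scale(lat))
end
```

<p>At the Equator the factor is 1, which is why sizes there look right. At the latitudes of Greenland, areas are inflated many times over.</p>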
<p>The United Nations logo includes a <em>polar projection</em> putting the north pole at the center. In this projection the pole is the least distorted region of the world, and everything else revolves around it, symbolizing the ideal of a world community privileging no particular country (except possibly the northern hemisphere, but we&#8217;ll ignore the politics for our purposes&#8230;)</p>
<div id="attachment_116" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/un_logo.png"><img class="size-full wp-image-116" title="un_logo" src="http://www.daniel-azuma.com/blog/wp-content/uploads/2011/12/un_logo.png" alt="" width="300" height="255" /></a><p class="wp-caption-text">The United Nations logo includes a north polar projection</p></div>
<p>Finally, whenever you look at a folding map&#8211; whether a street map for a city, a state map, or any other map that shows a limited area&#8211; you are usually looking at a <em>local projection</em>, one specifically tailored to that particular limited area. Such projections define a particular area in which they make sense. Objects inside that boundary can usually be displayed with minimal distortion, while objects outside that boundary often cannot be described at all.</p>
<h2>Representing and specifying coordinate systems</h2>
<p>Whenever you receive location data&#8211; whether from geocoding, user input, an external database, or any other source&#8211; the data should come with a coordinate system. Currently, there are two common ways a coordinate system can be defined:</p>
<p><strong>Proj4 Syntax</strong>. One common way to specify a coordinate system is through a syntax defined by the <a title="Proj" href="http://proj.osgeo.org/" target="_blank">Proj</a> library. We won&#8217;t go into detail on the syntax here, but it is intended to describe how to convert data to and from that coordinate system. That is, if you receive data in a projected coordinate system and you have the Proj4 syntax for that projection, the Proj library can convert the data back into latitude and longitude, and vice versa. Because of this useful property, Proj4 syntax is quite ubiquitous. Below is an example of the Proj4 syntax for &#8220;NAD83 / Washington North&#8221;, a local projection commonly used for topographic mapping in northern Washington state. For now, don&#8217;t worry if you don&#8217;t understand every field. This is just an example so that you can recognize Proj4 syntax when you see it.</p>
<pre>+proj=lcc +lat_1=48.73333333333333 +lat_2=47.5 +lat_0=47
  +lon_0=-120.8333333333333 +x_0=500000.0001016001 +y_0=0
  +ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs</pre>
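<p>Structurally, a Proj4 string is just a whitespace-separated list of <code>+key=value</code> pairs and bare <code>+flag</code> entries. The following sketch (the helper <code>parse_proj4</code> is a hypothetical name, not part of Proj or RGeo) splits one into a Ruby hash, which can be handy for inspecting a definition.</p>

```ruby
# Split a Proj4 string into a hash of parameters.
# Bare flags such as +no_defs are stored with the value true.
def parse_proj4(str)
  str.split.each_with_object({}) do |token, params|
    key, value = token.sub(/\A\+/, '').split('=', 2)
    params[key] = value.nil? ? true : value
  end
end

north_wa = parse_proj4(
  '+proj=lcc +lat_1=48.73333333333333 +lat_2=47.5 +lat_0=47 ' +
  '+lon_0=-120.8333333333333 +x_0=500000.0001016001 +y_0=0 ' +
  '+ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs'
)

puts north_wa['proj']          # the projection type
puts north_wa['lat_1'].to_f    # first standard parallel in degrees
puts north_wa['no_defs']       # a bare flag
```

<p>Note that this only tokenizes the string. Interpreting the parameters and performing the actual conversion is the job of the Proj library.</p>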
<p><strong>OGC well-known-text</strong> (or <strong>WKT</strong>). The <a title="OGC" href="http://www.opengeospatial.org/" target="_blank">Open Geospatial Consortium</a>, the standards body that developed the <a title="OGC simple features spec" href="http://www.opengeospatial.org/standards/sfa" target="_blank">Simple Feature Access</a> specification described in <a title="Geo-Rails part 3" href="http://www.daniel-azuma.com/blog/archives/88" target="_blank">part 3</a>, also developed a syntax for representing <a title="OGC coordinate transformation spec" href="http://www.opengeospatial.org/standards/ct" target="_blank">coordinate systems and transformations</a>. Below is the well-known-text for &#8220;NAD83 / Washington North&#8221;. Again, for now you don&#8217;t need to understand every field here, but you should be able to recognize WKT format when you see it.</p>
<pre>PROJCS["NAD83 / Washington North (ftUS)",
  GEOGCS["NAD83",
    DATUM["North_American_Datum_1983",
      SPHEROID["GRS 1980",6378137,298.257222101,
        AUTHORITY["EPSG","7019"]],
      AUTHORITY["EPSG","6269"]],
    PRIMEM["Greenwich",0,
      AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.01745329251994328,
      AUTHORITY["EPSG","9122"]],
    AUTHORITY["EPSG","4269"]],
  UNIT["US survey foot",0.3048006096012192,
    AUTHORITY["EPSG","9003"]],
  PROJECTION["Lambert_Conformal_Conic_2SP"],
  PARAMETER["standard_parallel_1",48.73333333333333],
  PARAMETER["standard_parallel_2",47.5],
  PARAMETER["latitude_of_origin",47],
  PARAMETER["central_meridian",-120.8333333333333],
  PARAMETER["false_easting",1640416.667],
  PARAMETER["false_northing",0],
  AUTHORITY["EPSG","2285"],
  AXIS["X",EAST],
  AXIS["Y",NORTH]]</pre>
<p>As we saw above, each piece of geometry needs a corresponding coordinate system in order to specify the meaning of its coordinates and thus how to handle its data. Instead of attaching an entire Proj4 or WKT formatted string to every latitude-longitude point in a system, most geospatial systems provide a database of coordinate systems, each identified by an ID known as the <em>Spatial Reference ID</em> (or <em>SRID</em>). Each geometric data object then includes an SRID field referencing an entry in that database.</p>
<p>Technically, a geospatial system can provide its own spatial reference database and set its own SRIDs. However, in practice, many systems use a <em>de facto</em> standard dataset known as the <em>EPSG dataset</em>. This is a database of several thousand coordinate systems managed by the <a title="OGP" href="http://www.ogp.org.uk/" target="_blank">International Association of Oil &amp; Gas Producers</a>. The EPSG dataset is ubiquitous enough that most spatial database tools include a copy of it. A spatially-enabled <a title="PostGIS" href="http://www.postgis.org/" target="_blank">PostGIS</a> database, for example, automatically includes a table called <code>spatial_ref_sys</code> that is typically prepopulated with the EPSG dataset. You can look up SRIDs and get the coordinate system name, the WKT representation, and sometimes the Proj4 representation. The &#8220;NAD83 / Washington North&#8221; coordinate system example above has SRID 2285 in the EPSG database.</p>
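<p>The idea behind SRIDs can be illustrated with a toy lookup table. This is only a sketch of the concept, with made-up names (<code>CoordSys</code>, <code>Point</code>, <code>SPATIAL_REF_SYS</code>, <code>describe</code>); a real system would consult a full database such as the <code>spatial_ref_sys</code> table.</p>

```ruby
# A coordinate system entry and a geometry that refers to it by SRID only.
CoordSys = Struct.new(:name, :proj4)
Point = Struct.new(:x, :y, :srid)

# A two-entry stand-in for the EPSG dataset.
SPATIAL_REF_SYS = {}
SPATIAL_REF_SYS[4326] = CoordSys.new(
  'WGS 84',
  '+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs'
)
SPATIAL_REF_SYS[2285] = CoordSys.new(
  'NAD83 / Washington North (ftUS)',
  '+proj=lcc +lat_1=48.73333333333333 +lat_2=47.5 +lat_0=47 ' +
  '+lon_0=-120.8333333333333 +x_0=500000.0001016001 +y_0=0 ' +
  '+ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs'
)

# Resolve the coordinate system for a geometry through its SRID.
def describe(point)
  coord_sys = SPATIAL_REF_SYS.fetch(point.srid)
  "(#{point.x}, #{point.y}) in #{coord_sys.name}"
end

puts describe(Point.new(-122.34978, 47.620578, 4326))
puts describe(Point.new(1266457.58, 230052.50, 2285))
```

<p>Each point carries only a small integer, and the full definition is stored once. An unknown SRID raises a <code>KeyError</code> here rather than being silently guessed.</p>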
<p>One important EPSG-specified SRID that you will encounter often is 4326. 4326 refers to the &#8220;WGS 84&#8221; geographic (latitude-longitude) coordinate system we mentioned earlier&#8211; the coordinate system used by the Global Positioning System (GPS). Typically, when you get a latitude-longitude coordinate from a GPS system, from a geocoder, from a Google map input, or most other common sources, it will implicitly be in the EPSG 4326 coordinate system. This is so universally true that PostGIS currently mandates that the SRID must be set to 4326 when you create a column of type &#8220;geography&#8221; (that is, a latitude-longitude column), as we saw in <a title="Geo-Rails part 2" href="http://www.daniel-azuma.com/blog/archives/69" target="_blank">part 2</a>.</p>
<p>You can browse the EPSG database at <a title="spatialreference.org" href="http://www.spatialreference.org/" target="_blank">http://www.spatialreference.org/</a>.</p>
<h2>Coordinate systems for RGeo factories</h2>
<p>In <a title="Geo-Rails part 3" href="http://www.daniel-azuma.com/blog/archives/88" target="_blank">part 3</a>, we discussed how <a title="RGeo" href="http://virtuoso.rubyforge.org/rgeo/README_rdoc.html" target="_blank">RGeo</a> manages coordinate systems through factories. Now that we understand coordinate systems in more detail, we can take a closer look at how to handle coordinate systems in Ruby.</p>
<p>RGeo supports two main types of factories: <em>Cartesian factories</em> and <em>geographic factories</em>. Cartesian factories are ideal for handling projected coordinate systems, in which the domain is flat. Geographic factories are useful for geographic coordinate systems representing latitude and longitude, in which the domain is curved like the surface of the earth.</p>
<p>When you create an RGeo factory, you may specify the exact coordinate system by passing arguments to the factory constructor. In most cases, you should provide an SRID using the <code>:srid</code> parameter. You may also provide Proj4 and/or WKT representations for the coordinate system using the <code>:proj4</code> and <code>:coord_sys</code> parameters, respectively. For example, here&#8217;s how you could create a factory designed to handle the &#8220;NAD83 / Washington North&#8221; projection we discussed earlier.</p>
<pre>north_wa_proj4 = '+proj=lcc +lat_1=48.73333333333333 +lat_2=47.5 ' +
  '+lat_0=47 +lon_0=-120.8333333333333 +x_0=500000.0001016001 ' +
  '+y_0=0 +ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 ' +
  '+no_defs'
north_wa_wkt = &lt;&lt;WKT
  PROJCS["NAD83 / Washington North (ftUS)",
    GEOGCS["NAD83",
      DATUM["North_American_Datum_1983",
        SPHEROID["GRS 1980",6378137,298.257222101,
          AUTHORITY["EPSG","7019"]],
        AUTHORITY["EPSG","6269"]],
      PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
      UNIT["degree",0.01745329251994328,
        AUTHORITY["EPSG","9122"]],
      AUTHORITY["EPSG","4269"]],
    UNIT["US survey foot",0.3048006096012192,
      AUTHORITY["EPSG","9003"]],
    PROJECTION["Lambert_Conformal_Conic_2SP"],
    PARAMETER["standard_parallel_1",48.73333333333333],
    PARAMETER["standard_parallel_2",47.5],
    PARAMETER["latitude_of_origin",47],
    PARAMETER["central_meridian",-120.8333333333333],
    PARAMETER["false_easting",1640416.667],
    PARAMETER["false_northing",0],
    AUTHORITY["EPSG","2285"],
    AXIS["X",EAST],
    AXIS["Y",NORTH]]
WKT
north_wa_factory = RGeo::Cartesian.factory(:srid =&gt; 2285,
  :proj4 =&gt; north_wa_proj4, :coord_sys =&gt; north_wa_wkt)</pre>
<p>Notice that, since the coordinate system is a projection, we&#8217;re using a Cartesian factory that will perform computations in a flat domain.</p>
<p>When you work with a projected coordinate system like this one, the coordinates themselves are expressed in the projection rather than in latitude and longitude. For example, the location of the Space Needle in Seattle, which is at latitude 47.620578, longitude -122.34978, is expressed as <code>(1266457.58, 230052.50)</code> in this coordinate system.</p>
<pre>space_needle = north_wa_factory.point(1266457.58, 230052.50)</pre>
<p>Let&#8217;s consider another example. If you want to work with latitudes and longitudes, say from a GPS system, then you should use the &#8220;WGS 84&#8221; coordinate system, which has SRID 4326. Looking up this coordinate system in <a title="Spatialreference.org" href="http://www.spatialreference.org/" target="_blank">spatialreference.org</a>, we can find its Proj4 and WKT forms:</p>
<pre>wgs84_proj4 = '+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs'
wgs84_wkt = &lt;&lt;WKT
  GEOGCS["WGS 84",
    DATUM["WGS_1984",
      SPHEROID["WGS 84",6378137,298.257223563,
        AUTHORITY["EPSG","7030"]],
      AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0,
      AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.01745329251994328,
      AUTHORITY["EPSG","9122"]],
    AUTHORITY["EPSG","4326"]]
WKT</pre>
<p>To create a factory for this coordinate system, we should use one of the geographic factories provided by RGeo. These factories perform computations over a curved earth rather than a flat earth. For example, they will correctly draw the line between San Francisco and Athens so that it passes through Iceland. You can use <code>RGeo::Geographic.spherical_factory</code> to create a factory that performs computations on a spherical earth:</p>
<pre>wgs84_factory = RGeo::Geographic.spherical_factory(:srid =&gt; 4326,
  :proj4 =&gt; wgs84_proj4, :coord_sys =&gt; wgs84_wkt)</pre>
<p>We now have a factory that will correctly manage features using latitude/longitude.</p>
<pre>space_needle = wgs84_factory.point(-122.34978, 47.620578)</pre>
<p>Technically, the earth is not a perfect sphere, but is slightly flattened. The WGS84 coordinate system actually uses a flattened ellipsoid to more closely model this shape. However, RGeo does not yet support ellipsoidal computations, so we used a spherical factory instead as an approximation. In many cases, this will be good enough, but distances computed on a sphere can differ from their ellipsoidal values by up to a few tenths of a percent, so this is something to be aware of if you need high accuracy.</p>
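<p>To get a feel for the size of the flattening, the sketch below works from the two spheroid numbers in the WGS 84 definition (semi-major axis and inverse flattening). It compares the ground length of one degree of latitude on the ellipsoid with the same degree on a sphere. The sphere radius of 6371008.8 meters is an assumption (a commonly quoted mean earth radius), not something RGeo is known to use.</p>

```ruby
SEMI_MAJOR = 6378137.0           # meters, from SPHEROID["WGS 84", ...]
INV_FLATTENING = 298.257223563   # from the same SPHEROID entry
ECC2 = 2.0 / INV_FLATTENING - 1.0 / INV_FLATTENING**2
SPHERE_RADIUS = 6371008.8        # assumed mean radius for the sphere
RAD = Math::PI / 180.0

# Polar radius implied by the flattening.
polar_radius = SEMI_MAJOR * (1.0 - 1.0 / INV_FLATTENING)

# Ground length of one degree of latitude on the ellipsoid, using the
# meridian radius of curvature at the given latitude.
def ellipsoid_degree_m(lat_deg)
  s2 = Math.sin(lat_deg * RAD)**2
  SEMI_MAJOR * (1.0 - ECC2) / (1.0 - ECC2 * s2)**1.5 * RAD
end

sphere_degree_m = SPHERE_RADIUS * RAD

puts format('Polar radius is %.1f m shorter than equatorial',
            SEMI_MAJOR - polar_radius)
[0, 45, 90].each do |lat|
  ell = ellipsoid_degree_m(lat)
  diff = 100.0 * (sphere_degree_m - ell) / ell
  puts format('lat %2d: ellipsoid %.0f m, sphere %.0f m (%+.2f%%)',
              lat, ell, sphere_degree_m, diff)
end
```

<p>The discrepancy changes sign between the Equator and the poles, so the spherical approximation is better at some latitudes than at others.</p>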
<h2>Converting data between coordinate systems</h2>
<p>The above examples may have sparked an important question. In the examples in parts 2 and 3, we created factories without the benefit of the proj4 and wkt strings, and in some cases we also omitted the SRID. Under what circumstances are proj4, wkt, and/or SRID needed when you create an RGeo factory, and under what circumstances can you leave them out?</p>
<p>SRID is generally needed when you want to store your data in a PostGIS database. This is because PostGIS generally puts an SRID constraint on spatial columns that you create, in order to make sure you don&#8217;t mismatch coordinate systems. So, if you are storing data in or pulling data from PostGIS, you should always specify the SRID in the factory.</p>
<p>The proj4 string has a different purpose: it is used by the Proj library to describe how to convert coordinates between coordinate systems. Here is how this works.</p>
<p>Suppose you are reading a set of points from a data source that uses the &#8220;NAD83 / Washington North&#8221; coordinate system, and you wanted to convert them to latitude and longitude. To perform this conversion, you need two factories, one in the source coordinate system and one in the destination coordinate system. Both factories need to have their <code>:proj4</code> string set. This lets the Proj library understand both coordinate systems so it can figure out how to convert from one to the other.</p>
<p>In the above examples when we created our <code>north_wa_factory</code> and <code>wgs84_factory</code>, we provided the needed proj4 strings. So these factories are ready to be used.</p>
<p>Now, to actually convert the data, load it using the source factory, and then use RGeo&#8217;s &#8220;cast&#8221; mechanism to cast it to the other factory. Call <code>RGeo::Feature.cast</code>, pass it the original object and the destination factory, and set the <code>:project</code> argument to true to tell it to transform coordinates.</p>
<pre>space_needle = north_wa_factory.point(1266457.58, 230052.50)
space_needle_latlon = RGeo::Feature.cast(space_needle,
  :factory =&gt; wgs84_factory, :project =&gt; true)</pre>
<p>This code tells RGeo to take the <code>space_needle</code> object (which is in the projected coordinate system) and convert it to the WGS84 factory, while transforming (projecting) its coordinates so they are correct in the new factory&#8217;s coordinate system. As a result, <code>space_needle_latlon</code> is created, containing latitude and longitude coordinates, and with its factory set to <code>wgs84_factory</code>.</p>
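<p>To demystify what happens inside such a conversion, here is a pure Ruby sketch of the opposite direction: taking the Space Needle&#8217;s latitude and longitude and projecting it into &#8220;NAD83 / Washington North&#8221;. It uses the textbook Lambert Conformal Conic formulas with the parameters from the Proj4 string above. This is not how RGeo is implemented (RGeo delegates to the Proj library), it ignores any datum shift between WGS 84 and NAD83, and the method names are illustrative.</p>

```ruby
SEMI_MAJOR = 6378137.0              # GRS80 semi-major axis, meters
INV_FLATTENING = 298.257222101      # GRS80 inverse flattening
ECC2 = 2.0 / INV_FLATTENING - 1.0 / INV_FLATTENING**2
ECC = Math.sqrt(ECC2)
RAD = Math::PI / 180.0
US_SURVEY_FOOT = 0.3048006096012192 # +to_meter

# Helper quantities from the standard Lambert Conformal Conic formulas.
def lcc_m(phi)
  Math.cos(phi) / Math.sqrt(1.0 - ECC2 * Math.sin(phi)**2)
end

def lcc_t(phi)
  es = ECC * Math.sin(phi)
  Math.tan(Math::PI / 4 - phi / 2) / ((1.0 - es) / (1.0 + es))**(ECC / 2)
end

# Project latitude/longitude in degrees to x/y in US survey feet.
def washington_north(lat_deg, lon_deg)
  phi1 = 48.73333333333333 * RAD    # +lat_1
  phi2 = 47.5 * RAD                 # +lat_2
  phi0 = 47.0 * RAD                 # +lat_0
  lam0 = -120.8333333333333 * RAD   # +lon_0
  n = (Math.log(lcc_m(phi1)) - Math.log(lcc_m(phi2))) /
      (Math.log(lcc_t(phi1)) - Math.log(lcc_t(phi2)))
  f = lcc_m(phi1) / (n * lcc_t(phi1)**n)
  rho = SEMI_MAJOR * f * lcc_t(lat_deg * RAD)**n
  rho0 = SEMI_MAJOR * f * lcc_t(phi0)**n
  theta = n * (lon_deg * RAD - lam0)
  x_m = 500000.0001016001 + rho * Math.sin(theta) # +x_0
  y_m = 0.0 + rho0 - rho * Math.cos(theta)        # +y_0
  [x_m / US_SURVEY_FOOT, y_m / US_SURVEY_FOOT]
end

x, y = washington_north(47.620578, -122.34978)
puts format('%.2f, %.2f', x, y)
```

<p>The result should land close to the projected Space Needle coordinates used earlier, with small differences expected from rounding in the input latitude and longitude.</p>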
<p>What about the WKT string? Currently, RGeo does not have any functional use for the WKT string; it is just an informational field. So you can omit it if you want. However, it turns out that in many cases you can theoretically use the WKT to transform coordinates in the same way as the Proj4 string. RGeo will likely provide this capability in the future&#8211; you will be able to pass WKT instead of Proj4 to allow factories to transform coordinates.</p>
<h2>Using Coordinate Systems in Rails</h2>
<p>In <a title="Geo-Rails part 2" href="http://www.daniel-azuma.com/blog/archives/69" target="_blank">part 2</a>, we looked at an example in which we created a PostGIS column that stored point data in geographic (latitude-longitude) coordinates. We did this by installing the activerecord-postgis-adapter, which extends ActiveRecord migrations to create spatial columns, and which exposes spatial attributes as RGeo data objects.</p>
<p>Now let&#8217;s consider another example. Suppose we obtained a set of polygons (say, zip code boundaries) from a data source that uses the &#8220;NAD83 / Washington North&#8221; projection. We could convert the polygons to latitude-longitude, but remember that doing so can actually change the shape of the polygon. So let&#8217;s suppose we decided this was unacceptable, and we need to store the polygons in the database in the projected coordinate system. Here&#8217;s the migration for this case:</p>
<pre>class CreateZipCodes &lt; ActiveRecord::Migration
  def change
    create_table :zip_codes do |t|
      t.integer :zip
      t.polygon :boundary, :srid =&gt; 2285
    end
  end
end</pre>
<p>The name of our column is &#8220;<code>boundary</code>&#8221;, and we set its type to &#8220;<code>polygon</code>&#8221;. Now, in the example in part 2, we had set the <code>:geographic</code> property of the column, indicating that it was storing latitude and longitude and that it should use the PostGIS features designed for that case. In this new example, we are storing projected coordinates, so we do not set <code>:geographic</code>. Instead, we just set the SRID to match the coordinate system that we are using. Setting an SRID on the database column actually sets up a database constraint: it allows only geometries with a matching SRID to be stored in the column. This is PostGIS&#8217;s way of helping you avoid the mistake of our unfortunate air traffic control technician: that of mismatching our coordinate systems.</p>
<p>As we did in part 2, we also need to set the factory for this column so that Rails knows what factory to use when it reads geometries from the database.</p>
<pre>class ZipCode &lt; ActiveRecord::Base
  north_wa_factory = ... # use the factory we created earlier
  set_rgeo_factory_for_column(:boundary, north_wa_factory)
end</pre>
<p>In general, it is important that the factory you set in your ActiveRecord class matches the constraints in the database column, so that both sides handle the data in the same way. In particular, the SRIDs should match, and either both should be Cartesian or both should be geographic.</p>
<p>Now we can read and write the polygon data. We just need to remember that the data is not in latitude-longitude, but in the projected coordinate system. So when we write polygons to the database, we must supply projected coordinates, and when we read them back, we will receive projected coordinates.</p>
<p>This example, of course, brings up an important question. We need to decide up front whether the database should contain projected data or latitude-longitude data. How do we choose? This can be a somewhat complicated question, and I will dive more deeply into the pros and cons of different strategies in a later article. However, for now we know enough to understand a few of the issues.</p>
<p>In many simple cases&#8211; if you are working only with points, or with small features that do not cover a large part of the globe, or if extreme accuracy is not important&#8211; you may find it easiest to think in latitude and longitude. In those cases, you can just create a <code>:geographic</code> column in the database and convert everything to a geographic coordinate system. Just be aware that there are potential issues whenever you have to convert data from one coordinate system to another. As we saw in our story, lines and polygons that span a large area can change their shape dramatically when switching coordinate systems. So if accuracy is essential, it may be desirable for your database to use the same coordinate system as your data source. You also should consider what type of spatial queries you are likely to run against your data. Remember that coordinates in a query must match the SRID and coordinate system of the data in your database.</p>
<h2>Where to go from here</h2>
<p>Congratulations on making it through this article! Understanding coordinate systems can be tricky, but it is essential for building nontrivial applications. In this discussion, I&#8217;ve deliberately left out a number of more advanced topics that I&#8217;ll probably cover in a later article. But you should now have enough information so you won&#8217;t get lost when people start talking about projections and SRIDs.</p>
<p>If you&#8217;d like to explore more about map projections and geospatial coordinate systems, the articles on Wikipedia are not too bad. I&#8217;m not aware of any good books on this material geared towards web developers, but if you know of any, please drop me a line. For reference, you&#8217;ll probably find yourself going to <a title="Spatialreference.org" href="http://www.spatialreference.org/" target="_blank">spatialreference.org</a> frequently when you need information on the coordinate system referenced by a particular SRID.</p>
<p>For the next article, I&#8217;m currently planning on covering common file and data interchange formats used for geospatial data. Stay tuned, and let&#8217;s bring Rails down to earth!</p>
<p><em>This is part 4 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit <a title="Geo-Rails article series" href="http://www.daniel-azuma.com/blog/archives/category/tech/georails" target="_blank">http://www.daniel-azuma.com/blog/archives/category/tech/georails</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-azuma.com/blog/archives/106/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

