Friday, June 4, 2010

Accessing the Data.gov catalog through an open interface

In its first year, Data.gov has grown from 47 datasets to over 270,000 datasets. These datasets aren’t actually hosted at Data.gov. The government agencies that make these datasets available host the files (or web services) and share them with the community through Data.gov. But how did these datasets become discoverable at Data.gov?

Actually, the datasets are registered with Geodata.gov, a national catalog of geospatial resources that has been around for some 7 years and that “serves as a public gateway for improving access to geospatial information and data under the Geospatial One-Stop E-Government initiative”.

Geodata.gov provides access to almost 400,000 geospatial resources from over 300 partner collections from federal, state, and local government, as well as academia and commercial providers. Rather than having to sift through that many web sites, users can go to Geodata.gov and search there. Creators of geospatial resources can register their content with Geodata.gov if they choose to do so. From its inception, Geodata.gov has aimed to be inclusive: it doesn’t matter what geospatial technology you use to create or consume geospatial data (or web services) in order to use Geodata.gov or its content.

This design principle of being open and interoperable applies not only to the content but to the site itself. Since its launch, Geodata.gov has provided a search interface following the Open Geospatial Consortium (OGC) Catalogue Service for the Web (CS-W) specification. Later, Geodata.gov added a RESTful interface that returns search results as GeoRSS, KML, HTML, and GeoJSON. These interfaces let you use the content registered with Geodata.gov without going through the website.
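GeoJSON output is plain JSON, so a custom client needs nothing more than a JSON parser to work with it. Below is a minimal Python sketch, assuming the REST interface returns a GeoJSON FeatureCollection with one Feature per search hit and a `title` key in each Feature's `properties`. Those field names, the `summarize_geojson` function, and the sample response are all hypothetical, made up for illustration.

```python
import json


def summarize_geojson(text):
    """Return (title, geometry type) pairs from a GeoJSON FeatureCollection.

    Assumption: each search hit is a Feature whose "properties" carry a
    "title" key; the actual field names returned may differ.
    """
    collection = json.loads(text)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    results = []
    for feature in collection.get("features", []):
        # GeoJSON allows null properties and geometry, so fall back to {}.
        props = feature.get("properties") or {}
        geometry = feature.get("geometry") or {}
        results.append((props.get("title", "(untitled)"), geometry.get("type")))
    return results


# Hypothetical sample response, invented for illustration only.
sample = """{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Polygon",
                  "coordinates": [[[-77.1, 38.8], [-76.9, 38.8], [-76.9, 39.0],
                                   [-77.1, 39.0], [-77.1, 38.8]]]},
     "properties": {"title": "Example orthoimagery tile"}}
  ]
}"""

for title, geom_type in summarize_geojson(sample):
    print(title, geom_type)
```

The same loop would work for any number of hits. Only the property lookup depends on the assumed response layout.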

The Carbon Project has used the RESTful interface to develop a desktop widget for discovering Geodata.gov content directly on your Windows desktop, and other developers have used it to extend tools like NASA’s World Wind. ESRI has developed clients for its ArcGIS Desktop and ArcGIS Explorer that use the CS-W interface to give their users data discovery capabilities. All of these are free tools intended to bring the content registered in Geodata.gov to users.

So what does this have to do with Data.gov? Well, when Data.gov was in search of content (pun intended), it was just common sense to reuse the effort already put into a catalog of geospatial content: Geodata.gov. Since June 2009, Data.gov has been using the CS-W interface provided by Geodata.gov.

Federal agencies can mark the content they have registered with Geodata.gov for sharing with Data.gov. It is this subset that is discoverable in the Geodata Catalog on Data.gov. You can search this subset using the interfaces mentioned before, which lets you build your own discovery clients for the content in Data.gov’s Geodata Catalog, with spatial searching, advanced filtering, and other features that are not (yet) available on Data.gov itself.
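Spatial searching is one of those features a custom CS-W client can add itself. Here is a minimal Python sketch that builds an OGC Filter 1.1.0 `BBOX` clause on the CS-W core queryable `ows:BoundingBox`. A clause like this could sit next to the other conditions inside the `ogc:And` of a GetRecords request. The function name `bbox_filter` and the example extent are hypothetical. Whether Geodata.gov expects longitude/latitude or latitude/longitude order in the envelope is an assumption to check in the API Documentation.

```python
import xml.etree.ElementTree as ET

OGC = "http://www.opengis.net/ogc"
GML = "http://www.opengis.net/gml"
ET.register_namespace("ogc", OGC)
ET.register_namespace("gml", GML)


def bbox_filter(min_x, min_y, max_x, max_y):
    """Build an OGC Filter 1.1.0 <ogc:BBOX> clause on the ows:BoundingBox queryable."""
    bbox = ET.Element(f"{{{OGC}}}BBOX")
    # The "ows:" prefix in this text must be declared on the enclosing GetRecords request.
    ET.SubElement(bbox, f"{{{OGC}}}PropertyName").text = "ows:BoundingBox"
    envelope = ET.SubElement(bbox, f"{{{GML}}}Envelope")
    # Corners are written as "x y"; the axis order the server expects is an assumption.
    ET.SubElement(envelope, f"{{{GML}}}lowerCorner").text = f"{min_x} {min_y}"
    ET.SubElement(envelope, f"{{{GML}}}upperCorner").text = f"{max_x} {max_y}"
    return bbox


# Illustrative extent, roughly around Washington, DC (longitude, latitude).
clause = bbox_filter(-77.12, 38.79, -76.91, 38.99)
print(ET.tostring(clause, encoding="unicode"))
```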

How? In the RESTful interface, simply adding the parameter isPartOf=data.gov will filter Geodata.gov for content that has been marked for sharing with Data.gov. A request for orthoimagery that is discoverable through the Geodata Catalog in Data.gov thus becomes:

http://geo.data.gov/geoportal/rest/find/document?isPartOf=data.gov&searchText=orthoimagery&f=html
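Building that URL in code makes it easy to change the search text or drop the Data.gov filter. The following is a minimal Python sketch that reuses the endpoint and parameters from the example URL above. The function name `find_url` is hypothetical, and `urlencode` handles escaping of search text such as spaces.

```python
from urllib.parse import urlencode

# REST endpoint taken from the example request above.
FIND_URL = "http://geo.data.gov/geoportal/rest/find/document"


def find_url(search_text, fmt="html", data_gov_only=True):
    """Build a REST search URL; isPartOf=data.gov limits hits to the Data.gov subset."""
    params = {}
    if data_gov_only:
        params["isPartOf"] = "data.gov"
    params["searchText"] = search_text
    params["f"] = fmt  # output format; "html" is the value used in the example
    return f"{FIND_URL}?{urlencode(params)}"


print(find_url("orthoimagery"))
```

This reproduces the orthoimagery request shown above. The `f` values for GeoRSS, KML, or GeoJSON output are not listed in this post, so take them from the API Documentation rather than guessing.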

Doing this in the CS-W interface means creating an OGC CS-W request like this:

<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:ogc="http://www.opengis.net/ogc" xmlns:ows="http://www.opengis.net/ows" version="2.0.2" service="CSW" xmlns:dc="http://purl.org/dc/elements/1.1/" resultType="results"> 
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>summary</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter xmlns:ogc="http://www.opengis.net/ogc">
        <ogc:And>

          <ogc:PropertyIsLike wildCard="%" escapeChar="" singleChar="">
            <ogc:PropertyName>AnyText</ogc:PropertyName>
            <ogc:Literal>isPartOf:data.gov</ogc:Literal>
          </ogc:PropertyIsLike>

          <ogc:PropertyIsLike wildCard="%" escapeChar="" singleChar="">
            <ogc:PropertyName>AnyText</ogc:PropertyName>
            <ogc:Literal>orthoimagery</ogc:Literal>
          </ogc:PropertyIsLike>

        </ogc:And>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>
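Writing that XML by hand gets tedious once the search terms vary. Below is a minimal Python sketch that generates an equivalent GetRecords document with `xml.etree.ElementTree` and wraps it in an HTTP POST. It uses the Filter 1.1.0 attribute name `escapeChar`. The helper names `get_records` and `post_request` and the `http://example.com/csw` endpoint are hypothetical placeholders. The actual Geodata.gov CS-W URL is not given in this post.

```python
import urllib.request
import xml.etree.ElementTree as ET

CSW = "http://www.opengis.net/cat/csw/2.0.2"
OGC = "http://www.opengis.net/ogc"
ET.register_namespace("csw", CSW)
ET.register_namespace("ogc", OGC)


def get_records(*terms):
    """Build a CS-W 2.0.2 GetRecords request whose AnyText conditions must all match."""
    if not terms:
        raise ValueError("at least one search term is required")
    root = ET.Element(f"{{{CSW}}}GetRecords",
                      {"service": "CSW", "version": "2.0.2", "resultType": "results"})
    query = ET.SubElement(root, f"{{{CSW}}}Query", {"typeNames": "csw:Record"})
    ET.SubElement(query, f"{{{CSW}}}ElementSetName").text = "summary"
    constraint = ET.SubElement(query, f"{{{CSW}}}Constraint", {"version": "1.1.0"})
    ogc_filter = ET.SubElement(constraint, f"{{{OGC}}}Filter")
    # ogc:And needs two or more operands, so a single term sits directly under Filter.
    parent = ET.SubElement(ogc_filter, f"{{{OGC}}}And") if len(terms) > 1 else ogc_filter
    for term in terms:
        like = ET.SubElement(parent, f"{{{OGC}}}PropertyIsLike",
                             {"wildCard": "%", "singleChar": "", "escapeChar": ""})
        ET.SubElement(like, f"{{{OGC}}}PropertyName").text = "AnyText"
        ET.SubElement(like, f"{{{OGC}}}Literal").text = term
    return ET.tostring(root, encoding="unicode")


def post_request(endpoint, body):
    """Wrap a GetRecords document in an HTTP POST; send it with urllib.request.urlopen."""
    return urllib.request.Request(endpoint, data=body.encode("utf-8"),
                                  headers={"Content-Type": "application/xml"},
                                  method="POST")


body = get_records("isPartOf:data.gov", "orthoimagery")
# Hypothetical endpoint: substitute the CS-W URL from the API Documentation.
request = post_request("http://example.com/csw", body)
```

Passing `"isPartOf:data.gov"` as one of the terms mirrors the request above. Leaving it out searches all of Geodata.gov.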


More details on these interfaces for using the content of Geodata.gov and Data.gov’s Geodata Catalog are available in the API Documentation.

Whether you want to use the RESTful interface or prefer the CS-W + XML approach, the content in Data.gov and Geodata.gov is yours to discover. Use that content to make a nice map or two. Please don’t use it to plan your strategy to take over the world.
