Tuesday, June 8, 2010

Can I Have One NSDI with Some Confusion on the Side Please?

In this age of publish first, then filter, and instant gratification, it is easy to lose sight of some of the real questions. The merging of Data.gov and Geodata.gov (yes, that is the plan) raises some questions that have gotten lost in the excitement of the last week.

Here are a couple of observations on the subject that could be made by anyone who has been following the two sites over the past year(s):
  1. Geodata.gov harvests most of its content from over 300 other catalogs (visit the Geodata.gov Statistics tab and view the information on Partner Collections). Data.gov does not have this capability. These catalogs represent federal, state, and local government, academia, NGO, and commercial providers of geospatial resources (visit the same tab on Geodata.gov and view the information on Publisher Affiliations). Data.gov on the other hand focuses on content from the Executive Branch of the Federal Government. Where would the remaining content of Geodata.gov go? http://www.otherdata.gov?
  2. Geodata.gov focuses on FGDC+ISO metadata with the industry looking at migrating to the new North American Profile of ISO 191xx metadata. Data.gov has developed its own metadata specification and vocabulary that is quite different from this. Just look at a details page on Data.gov to confirm this. What is the position on this subject of FGDC and other federal agencies who have created standards-based metadata for many years?
  3. Geodata.gov has focused on the GIS analysts and first responders (check the original Statement of Work, I'm sure it's online somewhere). Data.gov seems to focus on a different audience (although honestly it's not entirely clear to me whether that audience consists of developers or the general public; it’s a bit of both).
  4. Geodata.gov has supported a number of user communities in two ways:
    • by allowing them to create community pages with resources beyond structured metadata that are of interest to those communities. The content in these pages is managed by the communities themselves. How should Data.gov support these communities of interest?
    • by supporting community-oriented collections that group metadata from multiple source catalogs. Examples are RAMONA (the states’ GIS inventory), the Oceans and Coast Working Group (interested in all content in the US coastal zone), and Data.gov (actually, this is also configured as a collection in geodata.gov). These collections are exposed on the Geodata.gov Search tab and in the CS-W and REST interfaces to the catalog. Where would these collections end up after a merger of Geodata.gov and Data.gov?
    • Geodata.gov has created a Marketplace where those who are looking for data and those who have plans to acquire data can discover each other and collaborate. A dating service of a different kind. While not specifically targeted at the masses, isn't one of the key principles of NSDI to collaborate to reduce redundant investments?
  5. Geodata.gov has created a search widget that several agencies, such as the State of Delaware, have implemented. It enables searching geodata.gov directly from the agency's own website, giving access to state and other geospatial resources covering the area of the state. This widget can mean significant cost savings for agencies, as they don't have to create their own clearinghouses. Will Data.gov provide such a role as well?
  6. Through FGDC CAP grants several tools were built that work against the Geodata.gov REST or CSW interfaces. I mentioned some of these capabilities and the links to these tools in my recent blog post. Merging Geodata.gov and data.gov would ideally not break these investments.
It would be nice to see the passion that was expressed over the last week repeated, this time in discussing some of these and other questions that affect the geospatial community at large.

Sunday, June 6, 2010

Building your own ArcGIS.com client

ArcGIS.com provides a great collection of resources and, as Jack explains below, allows other people to discover the work ESRI users are doing.



ArcGIS.com includes a cool website, but as we learned when developing the Geoportal Extension, it also provides a RESTful interface. This meant we could offer users of the Geoportal Extension access to the information others are sharing through ArcGIS.com.

In the Geoportal Extension we allow distributed searches to go to ArcGIS.com. We implemented this early on in our contribution to the Group on Earth Observations.

Realizing that many organizations aren't waiting for yet another portal, we developed a simple mechanism to integrate a search widget into any web page that would allow searching Geoportals. This has resulted in an HTML widget that can be embedded with 2 simple lines of HTML. By default this widget searches the Geoportal it is part of. But hold on, there's more!

The Geoportal can search external catalogs, including ones that implement the Open Geospatial Consortium (OGC) Catalog Service for the Web (CS-W), but since 9.3.1 it can also search... ArcGIS.com! Try it at the GEO Portal by going to the search page and selecting ArcGIS.com from the 'search in' dialog. You'll notice it searches ArcGIS.com with the keywords you give. This means any Geoportal 9.3.1+ is a client to ArcGIS.com.

But back to the widget.

Directing the searches from the widget to ArcGIS.com is possible by adding a parameter that instructs the Geoportal to direct the searches to the identified remote site. And thus here is a widget that searches ArcGIS.com. All it took was minimal HTML like this:

<html>
<body>
<p>Search widget for ArcGIS.com </p>
<script type="text/javascript"  
src="http://serverapi.arcgisonline.com/jsapi/arcgis/?v=1.3" ></script>
<script type="text/javascript" 
src="http://geoss.esri.com/geoportal/widgets/searchjs.jsp?rid=ArcGISOnline" >
</script>
</body>
</html> 
 
Using these lines you could embed ArcGIS.com searches in your own web page. Using this approach, you could build your own ArcGIS.com client. Take a look at databasin.org for a more sophisticated example.
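If you need to generate this embed code for several pages, the two script tags can be produced by a small helper. The following is only a sketch in Python: the function name widget_snippet is made up for illustration, and the two URLs are simply copied from the HTML above. Leaving out rid gives the default behavior of searching the Geoportal the widget is part of.

```python
from html import escape
from urllib.parse import urlencode

# Both URLs are copied from the HTML example in this post.
JSAPI_URL = "http://serverapi.arcgisonline.com/jsapi/arcgis/?v=1.3"
WIDGET_URL = "http://geoss.esri.com/geoportal/widgets/searchjs.jsp"


def widget_snippet(rid=None):
    """Return the two script tags that embed the search widget.

    rid names the remote catalog to search (for example "ArcGISOnline").
    Without it, the widget searches the Geoportal that serves it.
    This helper is hypothetical and not part of the Geoportal Extension.
    """
    src = WIDGET_URL
    if rid:
        src += "?" + urlencode({"rid": rid})
    return (
        f'<script type="text/javascript" src="{escape(JSAPI_URL)}"></script>\n'
        f'<script type="text/javascript" src="{escape(src)}"></script>'
    )


if __name__ == "__main__":
    print(widget_snippet("ArcGISOnline"))
```

Paste the printed lines into the body of any page, as in the HTML above.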

To learn more about the options for the widget, visit the Geoportal Extension help pages.

PS: At version 10, the Geoportal will support federating searches to more than one remote catalog and also include catalogs of non-structured metadata or even non-spatial content, such as Wikipedia, Flickr, YouTube, or your document management system. Try out the weekly release of Geoportal 10 at our public sandbox. Let me know if you find any issues. We are wrapping up development, but we're open to your feedback.

Friday, June 4, 2010

Accessing the Data.gov catalog through an open interface

In its first year, Data.gov has grown from 47 datasets to over 270,000 datasets. These datasets aren’t actually hosted at Data.gov. The government agencies making these datasets available host the files (or web services) and share them with the community through data.gov. But how did these datasets become discoverable at Data.gov?

Actually, the datasets are registered with Geodata.gov, a national catalog of geospatial resources that has been around for some 7 years and that “serves as a public gateway for improving access to geospatial information and data under the Geospatial One-Stop E-Government initiative”.

Geodata.gov provides access to almost 400,000 geospatial resources from over 300 partner collections from federal, state, and local government, as well as academia and commercial providers. Rather than having to sift through as many web sites, users can go to Geodata.gov and perform searches there. Creators of the geospatial resources can register this content with Geodata.gov if they choose to do so.  From its inception Geodata.gov has aimed to be inclusive in the sense that it doesn’t matter what geospatial technology you use to create or consume geospatial data (or web services) in order to use Geodata.gov or its content.

This design principle of being open and interoperable applies not only to the content but to the site itself as well. Since its launch Geodata.gov has provided a search interface following the Open Geospatial Consortium (OGC) Catalog Service for the Web (CS-W) specification. Later geodata.gov added a RESTful interface that returns search results as GeoRSS, KML, HTML, and GeoJSON. These interfaces are intended to support using the content registered with Geodata.gov without using the website.

The RESTful interface has been used by the Carbon Project to develop a desktop widget that allows for content discovery on Geodata.gov directly on your Windows desktop, as well as by developers who have extended tools like NASA’s World Wind. ESRI has developed clients for ESRI’s ArcGIS Desktop and Explorer that use the CS-W interface to provide its users with data discovery capabilities. All these are free tools intended to help bring the content registered in Geodata.gov to the users.

So what does this have to do with Data.gov? Well, when Data.gov was in search for content (pun intended), it was just common sense to reuse the effort already put in a catalog of geospatial content: Geodata.gov. Since June 2009, Data.gov has been using the CS-W interface provided by Geodata.gov.

Federal agencies can mark the content they have registered with Geodata.gov for sharing with Data.gov. It is this subset that is discoverable in the Geodata Catalog on Data.gov. You can search this subset using the interfaces mentioned before, which lets you build your own discovery clients for the content in the Geodata Catalog of Data.gov and include spatial searching, advanced filtering, and other features that are not (yet) available at Data.gov itself.

How? In the RESTful interface, simply adding the parameter isPartOf=data.gov will filter Geodata.gov for content that has been marked for sharing with Data.gov. A request for orthoimagery that is discoverable through the Geodata Catalog in Data.gov thus becomes:

http://geo.data.gov/geoportal/rest/find/document?isPartOf=data.gov&searchText=orthoimagery&f=html
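If you are building a client, you will probably want to assemble this URL in code rather than by hand. Here is a minimal sketch in Python. The endpoint and the three parameters (isPartOf, searchText, f) come straight from the request above; the function name build_find_url and its arguments are made up for illustration.

```python
from urllib.parse import urlencode

# Endpoint copied from the example request in this post.
REST_FIND_URL = "http://geo.data.gov/geoportal/rest/find/document"


def build_find_url(search_text, fmt="html", data_gov_only=True):
    """Return a REST search URL for the given keywords.

    data_gov_only adds isPartOf=data.gov, which limits results to content
    marked for sharing with Data.gov. fmt is passed through as the f
    parameter; only "html" is shown in this post.
    """
    params = []
    if data_gov_only:
        params.append(("isPartOf", "data.gov"))
    params.append(("searchText", search_text))
    params.append(("f", fmt))
    # urlencode takes care of spaces and other special characters.
    return REST_FIND_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_find_url("orthoimagery"))
```

Called with "orthoimagery" and the defaults, this reproduces the URL shown above. Passing data_gov_only=False drops the isPartOf filter and searches all of Geodata.gov.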

Doing this in the CS-W interface means creating an OGC CS-W request like this:

<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:ogc="http://www.opengis.net/ogc" xmlns:ows="http://www.opengis.net/ows" version="2.0.2" service="csw" xmlns:dc="http://purl.org/dc/elements/1.1/" resultType="results"> 
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>summary</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter xmlns:ogc="http://www.opengis.net/ogc">
        <ogc:And>

          <ogc:PropertyIsLike wildCard="%" escape="" singleChar="">
            <ogc:PropertyName>AnyText</ogc:PropertyName>
            <ogc:Literal>isPartOf:data.gov</ogc:Literal>
          </ogc:PropertyIsLike>

          <ogc:PropertyIsLike wildCard="%" escape="" singleChar="">
            <ogc:PropertyName>AnyText</ogc:PropertyName>
            <ogc:Literal>orthoimagery</ogc:Literal>
          </ogc:PropertyIsLike>

        </ogc:And>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>
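For those who would rather generate this request than type it, here is a rough sketch in Python that builds the same GetRecords document from a list of search terms and can POST it to a catalog. Some caveats: the function names are made up for illustration, the CS-W endpoint URL is not given in this post so you have to supply your own, and the XML content type and UTF-8 response encoding are assumptions about the server.

```python
import urllib.request
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
OGC_NS = "http://www.opengis.net/ogc"


def _like(term):
    """One AnyText clause, with the same attributes as the request above."""
    return (
        '<ogc:PropertyIsLike wildCard="%" escape="" singleChar="">'
        "<ogc:PropertyName>AnyText</ogc:PropertyName>"
        f"<ogc:Literal>{escape(term)}</ogc:Literal>"
        "</ogc:PropertyIsLike>"
    )


def build_getrecords(terms):
    """Build a GetRecords request matching all of the given terms."""
    if not terms:
        raise ValueError("at least one search term is required")
    clauses = "".join(_like(t) for t in terms)
    if len(terms) > 1:
        # ogc:And needs at least two operands, so only wrap when needed.
        clauses = f"<ogc:And>{clauses}</ogc:And>"
    return (
        f'<csw:GetRecords xmlns:csw="{CSW_NS}" xmlns:ogc="{OGC_NS}" '
        'version="2.0.2" service="csw" resultType="results">'
        '<csw:Query typeNames="csw:Record">'
        "<csw:ElementSetName>summary</csw:ElementSetName>"
        '<csw:Constraint version="1.1.0">'
        f"<ogc:Filter>{clauses}</ogc:Filter>"
        "</csw:Constraint></csw:Query></csw:GetRecords>"
    )


def literal_values(request_xml):
    """Parse a request and return its search literals (a sanity check)."""
    root = ET.fromstring(request_xml)
    return [el.text for el in root.iter(f"{{{OGC_NS}}}Literal")]


def post_getrecords(endpoint, request_xml):
    """POST the request to a CS-W endpoint and return the response text.

    endpoint is whatever CS-W URL your catalog exposes; it is not
    specified in this post. Content type and encoding are assumptions.
    """
    req = urllib.request.Request(
        endpoint,
        data=request_xml.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    body = build_getrecords(["isPartOf:data.gov", "orthoimagery"])
    print(literal_values(body))
```

The request is built as a plain string so it stays close to the XML above. The isPartOf:data.gov term plays the same role here as the isPartOf parameter does in the RESTful interface.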


More details on using these interfaces to access the content of Geodata.gov and Data.gov’s Geodata Catalog are available in the API Documentation.

Whether you want to use the RESTful interface or prefer the CS-W + XML approach, the content in Data.gov and Geodata.gov is yours to discover. Use that content to make a nice map or two. Please don’t use it to plan your strategy to take over the world.