Government | Data and the Web

Archive for the ‘government’ Category

Further Sunlight on Government Data

Monday, July 20th, 2009

In a previous post, we discussed some of the interesting things the US government is doing to make its data more widely available, culminating in a dedicated website. This website is now up and running and has definitely made some progress since we last discussed it. The site is broken down into three main catalogs:

  1. Raw Data Catalog (with data files available in XML, CSV, KML, etc.)
  2. Tools Catalog (list of tools built to work with various open data sets)
  3. Geodata Catalog (links to Federal geospatial data)

They've also tried to make it easier to search for data sets. Like video search, this relies heavily on each item being tagged with good, meaningful descriptions and related metadata. It's a hard nut to crack. For example, government agencies tend to release data sets on an annual basis, so you'll have, say, 5 different data sets (and counting) for the “Public Libraries Survey” from 2004 through 2008. If your search terms aren't specific enough, these repetitious items tend to clutter up the search results. As the site continues to add more data sets, hopefully they can refine this area further.
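One way to cut down on that kind of clutter is to collapse annual releases into a single series entry. Here's a minimal Python sketch, assuming the titles differ only by a four-digit year. The sample titles and the function name group_by_series are made up for illustration and don't reflect how any government catalog actually implements search.

```python
import re
from collections import defaultdict

# Hypothetical search results; these titles are illustrative, not real catalog entries.
results = [
    "Public Libraries Survey 2004",
    "Public Libraries Survey 2005",
    "Public Libraries Survey 2008",
    "State Library Agencies Survey 2007",
]

# Matches a standalone four-digit year such as 1998 or 2008.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def group_by_series(titles):
    """Collapse titles that differ only by year into one series -> sorted years."""
    groups = defaultdict(list)
    for title in titles:
        match = YEAR.search(title)
        # Strip the year and tidy up whitespace to get the series name.
        series = " ".join(YEAR.sub("", title).split())
        if match:
            groups[series].append(int(match.group()))
        else:
            groups[series]  # register the series even when no year is present
    return {series: sorted(years) for series, years in groups.items()}


for series, years in group_by_series(results).items():
    print(series, years)
```

A search page could then show one row per series with a year picker, rather than one row per annual file.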

But, then again, maybe they won't have to. The folks at Sunlight Labs, whose mission is to build technology that makes government more transparent and accountable, have recently announced a project called The National Data Catalog. It will be a tool that aims to take the government's concept and improve upon it. From the announcement:

“We think we can add value on top of things like [the federal catalog] and the municipal data catalogs by autonomously bringing them into one system, manually curating and adding other data sources and providing features that, well, Government just can't do. There'll be community participation so that people can submit their own data sources, and we'll also catalog non-commercial data that is derivative of government data like OpenSecrets. We'll make it so that people can create their own documentation for much of the undocumented data that government puts out and link to external projects that work with the data being provided.”

This should be interesting to watch. As the Sunlight folks say in a later post, they are not out to replicate the government's site, but to stand on its shoulders (similar to how, say, a weather website relies on and improves upon the National Weather Service). Given the nature of the beast, data sets need to be described really well in order to be both searchable and useful. Hopefully the community aspect, in particular, can help give this data more utility. If any tech-savvy folks are interested in either following the project or contributing code, here's the project page.

Thursday, March 5th, 2009

We recently posted an article about Vivek Kundra, who was named United States CIO this morning by the Obama administration. He's got $71 billion in IT spending under his care. Hmm, that's a lot of data browsers.

One interesting tidbit appeared in this Saul Hansell NY Times article:

Another initiative will be to create a new site that will become a repository for all the information the government collects. He pointed to the benefits that have already come from publishing the data from the Human Genome Project by the National Institutes of Health, as well as the information from military satellites that is now used in GPS navigation devices.

"There is a lot of data the federal government has and we need to make sure that all the data that is not private, or restricted for national security reasons, can be made public," [Kundra] said.

In another bit of interesting news, Jonathan Stein at Mother Jones notes that Mike Honda (D-Calif.) added a provision to the recent appropriations bill that requires government entities to make their public data available in raw form:

If the Senate passes the bill with the provision intact, citizens seeking information about Congress' activities—such as bill names and numbers, amendments, votes, and committee reports—won't have to rely on government websites, which often filter information, are incomplete, or are difficult to use. Instead, the underlying data will be available to anyone who wants to build a superior site or tool to sift through it. “The language is groundbreaking in that it supports providing unfiltered legislative information to the public,” says Honda's online communications director, Rob Pierson. “Instead of silo-ing the information, and only allowing access through a limited web form, access to the raw data will make it easier for people to learn what their government is doing.”

Kim Zetter from Wired has more on the story here.

Maybe once the data is made more accessible, some clever folks can put an interface on things that improves the complex aftermath of the “laws and sausages” routine. I did my best to search for Honda's three-sentence provision in the latest omnibus bill, with no luck. Anyone know what the actual provision stated? [UPDATE: Rob Pierson, Online Communications Director of Congressman Honda's office, provided a link to an O'Reilly post with the full text of the provision. Give the full article a read; it's quite worthwhile.]

And, for posterity, here are some of the data repositories mentioned in the articles above:

Free E-Gov Conference (via webcast) on February 17, 2009

Wednesday, February 11th, 2009

As a follow-up to my previous post on e-government, I just wanted to let those who are interested know that there's a free conference next week that will go much more in-depth on the initiatives for changing the way government uses and disburses information. The conference will also have a particular emphasis on using semantic technologies.

Here are the details:

From E-Gov to Connected Governance: the Role of Cloud Computing, Web 2.0 and Web 3.0 Semantic Technologies

Tuesday, February 17, 2009.

Morning session: 8:30 am EST to 12:00 noon. Afternoon session: 1:00 pm EST to 4:00 pm EST.

Synopsis: “We have a new administration that values transparency, citizen participation, collaboration, information sharing, and internet technology… The purpose of this conference is to operationalize this vision, demonstrate the kinds of changes that are coming to next stage web-based systems in government, and to map the role of information and communication technologies (specifically, cloud computing, Web 2.0, and Web 3.0 semantic technologies) in the evolution of government information systems from e-gov (silos with web front ends) to connected governance (e.g. distributed social computing environments for collaborative work, information sharing, knowledge management, and participatory decision-making.)”

Webcast sign-up here (or, if you are in the Washington, DC area, you could attend in person).

Further information about the conference can be found here.

More Government Data Coming to a Browser Near You…

Friday, February 6th, 2009

It was intriguing to see how all this newfangled Web 2.0 technology was applied during the US presidential campaign this past year (organization, multimedia, etc.). It's also quite interesting to hear about some of the big ideas for how the new administration wants to change how government works. And, not to be outdone, the opposition party is also getting into the Web 2.0 game.

According to Nextgov, it appears that Vivek Kundra, current CTO of the District of Columbia, is going to be given the nod as the next e-government liaison. From the article:

Kundra also is a strong proponent of giving the public access to government data. “Why does the government keep information secret?” he rhetorically asked during an interview with Nextgov. “Why not put it all out in the government domain?” “[Since arriving in Washington], I've made all the government databases public. Every 311 call, every abandoned automobile, who has responded, etc. It provides high-level oversight of the daily tasks of government.”

A more in-depth bio of Kundra can be found at this recent Washington Post article. A couple of the more intriguing things that he promoted in the District of Columbia were the DC Data Catalog and “Apps for Democracy.”

The data catalog covers all kinds of DC data, from crime statistics to (ahem) the most recent roadkill pickups. It's also available in a wide variety of formats. “Apps for Democracy” was a kind of mashup contest to see what kind of apps could be developed to improve DC residents' access to data. It was highly successful, providing 47 different applications for a fraction of the cost of formally contracting out these projects.

Of course, changing such a huge, bureaucratic system as the Federal government will not happen overnight, but it is encouraging to see more of a focus on making data available in a timely manner (and in usable formats).

For those interested in this sort of thing, I'd also recommend checking out the Sunlight Foundation, which is focused on government transparency. Also, TechPresident and Nextgov are both news sources focused on following all things e-gov.

Got any other interesting links on this topic? Please feel free to post ‘em in the comments below.

A CSV File You Can Believe In

Monday, December 1st, 2008

This is not a blog that delves into political issues, but I happened to notice that the Obama transition team released the names of all their donors today. However, inexplicably, they don't have them in a CSV format for easy slicing and dicing in your favorite data analysis software.

A couple clicks in Kirix Strata™ took care of that pretty quickly. (*.csv, 120 KB)

Some interesting bits of information:

  • Google is the employer with the most total donations at $14,200 (from “Google” and “Google, Inc.”, 8 employees).
  • Microsoft employees only gave $500 (2 employees)
  • 74 different colleges and universities were represented for $25,900 (81 employees)
  • 4 people who defined themselves as “Not Employed” gave a total of $11,250.
  • There are 1,776 donors in the list. Mere coincidence… or more evidence that Obama is truly “that one” (alternatively, the list could have been hacked because he is “the one”)?

The data is a little bit dirty (particularly the “Employer” field), but you might have some fun poking around. Shoot us a message in the comments if you find anything interesting.
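For anyone who'd rather script the slicing and dicing, here's a rough Python sketch of how totals by employer could be computed while folding variants like “Google” and “Google, Inc.” together. The column names (Employer, Amount), the sample rows, and the suffix list are all assumptions for illustration and aren't taken from the actual donor file.

```python
import csv
import io
import re
from collections import defaultdict

# Made-up sample rows; the real file's columns and values may differ.
SAMPLE = """Name,Employer,Amount
A. Donor,Google,1000
B. Donor,"Google, Inc.",2500
C. Donor,Microsoft,250
D. Donor,,500
"""

# Trailing corporate suffixes to strip (an assumed, incomplete list).
SUFFIXES = re.compile(r"[,.]?\s*\b(?:inc|llc|corp|co)\b\.?$", re.IGNORECASE)


def normalize_employer(raw):
    """Lowercase the name, drop a trailing suffix, and collapse whitespace."""
    name = SUFFIXES.sub("", raw.strip())
    name = " ".join(name.split())
    return name.lower() or "(blank)"


def totals_by_employer(csv_text):
    """Sum donation amounts and count donors per normalized employer."""
    totals = defaultdict(lambda: {"amount": 0.0, "donors": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = normalize_employer(row["Employer"])
        totals[key]["amount"] += float(row["Amount"])
        totals[key]["donors"] += 1
    return dict(totals)


# List employers from largest to smallest total.
ranked = sorted(totals_by_employer(SAMPLE).items(),
                key=lambda item: item[1]["amount"], reverse=True)
for employer, stats in ranked:
    print(employer, stats["amount"], stats["donors"])
```

Simple suffix stripping won't catch misspellings or abbreviations, so a dirty “Employer” field would still need some manual review.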

P.S. Also, I saw this article about data overload during the campaign… looks like the Federal Election Commission could have used the Kirix Strata government discount. ;)

Update: Also, it looks like George Lucas jumped in, and we see an employee of the notorious Dewey, Cheetham & Howe.


Data and the Web is a blog by Kirix about accessing and working with data, wherever it is located. We have a particular fondness for data usability, ad hoc analysis, mashups, web APIs and, of course, playing around with our data browser.