Data and the Web

Archive for December, 2008

Amazon Gets into the Public Data Sets Game

Thursday, December 4th, 2008

Amazon announced the launch of its Public Data Sets service this evening.  Bottom line, they asked people for different public or non-proprietary data sets and they got ‘em.  Here’s a sample of the (pretty hefty) stuff they are hosting for free:

  • Annotated Human Genome Data provided by ENSEMBL
  • A 3D Version of the PubChem Library provided by Rajarshi Guha at Indiana University
  • Various US Census Databases provided by The US Census Bureau
  • Various Labor Statistics Databases provided by The Bureau of Labor Statistics

Though the individual sets are huge, there aren’t many of them at this point; it appears that Amazon will be filling out the collection over time.

How do you access them?  Well, there’s a slight hitch.  You need to fire up an EC2 instance, hook into the set and then perform your analysis.  You just pay for the cost of the EC2 service.  Given how massive these tables are, it seems like a pretty good way to go.  A step closer to the supercomputer in the cloud.

We’re devoted users of Amazon S3 here and have also done some work with EC2, which is quite impressive.  Overall, this is another example of a nice trend where large data sets are becoming more easily accessible.

If anyone has the chance to play with this service, let us know how it goes.

A CSV File You Can Believe In

Monday, December 1st, 2008

This is not a blog that delves into political issues, but I happened to notice that the Obama transition team released the names of all their donors today.  However, inexplicably, they don’t have them in a CSV format for easy slicing and dicing in your favorite data analysis software.

A couple clicks in Kirix Strata™ took care of that pretty quickly. (*.csv, 120 KB)

Some interesting bits of information:

  • Google is the employer with the most total donations at $14,200 (from “Google” and “Google, Inc.”, 8 employees).
  • Microsoft employees only gave $500 (2 employees)
  • 74 different colleges and universities were represented for $25,900 (81 employees)
  • 4 people who defined themselves as “Not Employed” gave a total of $11,250.
  • There are 1,776 donors in the list.  Mere coincidence… or more evidence that Obama is truly “that one” (alternatively, the list could have been hacked because he is “the one”)?

The data is a little bit dirty (particularly the “Employer” field), but you might have some fun poking around.  Shoot us a message in the comments if you find anything interesting.
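If you'd rather poke at the dirty “Employer” field in a script, here is a minimal Python sketch of the kind of roll-up behind the numbers above: fold spelling variants of an employer together (the “Google” vs. “Google, Inc.” problem), then total the donations per employer. Everything specific here is an assumption for illustration: the column names `Employer` and `Amount`, the helper names, and the sample rows, which are made up and are not taken from the actual donor list. Real data will need more cleanup rules than this.

```python
import csv
import io
import re
from collections import defaultdict

# Made-up rows for illustration only; NOT from the actual donor list.
# The column names "Employer" and "Amount" are assumptions.
SAMPLE_CSV = """Name,Employer,Amount
Donor A,Acme Widgets,100
Donor B,"Acme Widgets, Inc.",250
Donor C,acme widgets.,50
Donor D,Other Firm,75
"""

# One trailing corporate suffix, e.g. ", Inc" or " LLC".
_SUFFIX = re.compile(r"[,.\s]+(inc|llc|ltd)\.?$", re.IGNORECASE)


def normalize_employer(raw):
    """Lowercase, trim, and strip one trailing corporate suffix."""
    name = raw.strip().lower().rstrip(".")
    name = _SUFFIX.sub("", name)
    return name.strip(" ,.")


def totals_by_employer(csv_text):
    """Return {normalized employer: (total amount, donor count)}."""
    totals = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = normalize_employer(row["Employer"])
        totals[key][0] += int(row["Amount"])
        totals[key][1] += 1
    return {key: tuple(value) for key, value in totals.items()}


if __name__ == "__main__":
    # Print employers from largest to smallest total.
    ranked = sorted(totals_by_employer(SAMPLE_CSV).items(),
                    key=lambda item: item[1][0], reverse=True)
    for employer, (amount, donors) in ranked:
        print(f"{employer}: ${amount:,} from {donors} donor(s)")
```

The normalization is deliberately conservative (case, punctuation, and a single suffix), so it merges obvious variants without guessing at abbreviations or misspellings.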

P.S.  Also, I saw this article about data overload during the campaign… looks like the Federal Election Commission could have used the Kirix Strata government discount;)

Update:  Also, it looks like George Lucas jumped in, and we see an employee of the notorious Dewey, Cheetham & Howe.


Data and the Web is a blog by Kirix about accessing and working with data, wherever it is located. We have a particular fondness for data usability, ad hoc analysis, mashups, web APIs and, of course, playing around with our data browser.