Amazon announced the launch of its Public Data Sets service this evening. Bottom line, they asked people for different public or non-proprietary data sets and they got ‘em. Here's a sample of the (pretty hefty) stuff they are hosting for free:
- Annotated Human Genome Data provided by ENSEMBL
- A 3D Version of the PubChem Library provided by Rajarshi Guha at Indiana University
- Various US Census Databases provided by The US Census Bureau
- Various Labor Statistics Databases provided by The Bureau of Labor Statistics
Though the individual sets are huge, there aren't many of them at this point. It appears that Amazon will be adding more over time.
How do you access them? Well, there's a slight hitch. You need to fire up an EC2 instance, hook into the set and then perform your analysis. You just pay for the cost of the EC2 service. Given how massive these tables are, it seems like a pretty good way to go. A step closer to the supercomputer in the cloud.
We're devoted users of Amazon S3 here and have also done some work with EC2, which is quite impressive. Overall, this is another example of a nice trend where large data sets are becoming more easily accessible.
If anyone has the chance to play with this service, let us know how it goes.