For Large Data Sets and the People Who Love Them…

You may remember one of our earlier quests on this blog related to tagging the world's public data. We're still eagerly adding and tagging publicly available data sets when we find them.

Today, a friend of mine alerted me to another lode of mine-able data offered by Peter Skomoroch of Data Wrangling. There are a bunch of great sets here; hope you find something to your liking as well.

Besides enjoying his great blog name, I was also happy to be directed to a site called theinfo.org, which simply states it is “for people with large data sets.” It's a site built to gather data-enjoyers from across the web to collaborate in three areas:

Get: scrapers, crawlers, phone calls, buyouts
Process: conversions, queries, regressions, collaborative filtering
View: tables, graphs, maps, websites

Here's a quick peek into their mission:

Some of us have spent years scraping news sites. Others have spent them downloading government data. Others have spent them grabbing catalog records for books. And each time, in each community, we reinvent the same things over and over again: scripts for doing crawls and notifying us when things are wrong, parsers for converting the data to RDF and XML, visualizers for plotting it on graphs and charts.

It's time to start sharing our knowledge and our tools. But more than that, it's time for us to start building a bigger picture together. To write robust crawl harnesses that deal gracefully with errors and notify us when a regexp breaks. To start converting things into common formats and making links between data sets. To build visualizers that will plot numbers on graphs or points on maps, no matter what the source of the input.

We've all been helping to build a Web of data for years now. It's time we acknowledge that and start doing it together.

If you love data, this appears to be well worth checking out.

Have a great weekend!

This entry was posted on Friday, January 18th, 2008 at 5:19 pm and is filed under data repositories, data tags/search.
