For Large Data Sets and the People Who Love Them… | Data and the Web

Data and the Web

You may remember one of our earlier quests on this blog related to tagging the world's public data. We're still eagerly adding and tagging publicly available data sets when we find them.

Today, a friend of mine alerted me to another lode of mine-able data offered by Peter Skomoroch of Data Wrangling. There are a bunch of great sets here; hope you find something to your liking as well.

Besides enjoying his great blog name, I was also happy to be directed to a site that simply states it is “for people with large data sets.” It's built to gather data-enjoyers from across the web so they can collaborate in three areas:

  • Get: scrapers, crawlers, phone calls, buyouts
  • Process: conversions, queries, regressions, collaborative filtering
  • View: tables, graphs, maps, websites

Here's a quick peek into their mission:

Some of us have spent years scraping news sites. Others have spent them downloading government data. Others have spent them grabbing catalog records for books. And each time, in each community, we reinvent the same things over and over again: scripts for doing crawls and notifying us when things are wrong, parsers for converting the data to RDF and XML, visualizers for plotting it on graphs and charts.

It's time to start sharing our knowledge and our tools. But more than that, it's time for us to start building a bigger picture together. To write robust crawl harnesses that deal gracefully with errors and notify us when a regexp breaks. To start converting things into common formats and making links between data sets. To build visualizers that will plot numbers on graphs or points on maps, no matter what the source of the input.

We've all been helping to build a Web of data for years now. It's time we acknowledge that and start doing it together.
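To make the "crawl harness" idea from the mission statement a little more concrete, here is a minimal sketch in Python. It is not taken from the site itself; the function name `crawl`, the `fetch` and `notify` callbacks, and the example URLs are all made up for illustration. The page fetcher is passed in as a parameter, so the example runs without touching the network.

```python
import re
from typing import Callable, Dict, Iterable, List


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    pattern: "re.Pattern[str]",
    notify: Callable[[str], None],
) -> Dict[str, List[str]]:
    """Fetch each URL, extract matches, and report problems instead of crashing."""
    results: Dict[str, List[str]] = {}
    for url in urls:
        try:
            page = fetch(url)
        except Exception as exc:
            # One bad page should not end the whole crawl.
            notify(f"fetch failed for {url}: {exc}")
            continue
        matches = pattern.findall(page)
        if not matches:
            # No match often means the page layout changed and the
            # regexp needs attention, so raise a flag rather than skip silently.
            notify(f"pattern {pattern.pattern!r} matched nothing on {url}")
            continue
        results[url] = matches
    return results


# Hypothetical in-memory "web" so the sketch is self-contained.
pages = {
    "http://example.org/a": "<title>Alpha</title>",
    "http://example.org/b": "<h1>Layout changed</h1>",
}


def fake_fetch(url: str) -> str:
    return pages[url]  # raises KeyError for unknown URLs


alerts: List[str] = []
found = crawl(
    ["http://example.org/a", "http://example.org/b", "http://example.org/missing"],
    fake_fetch,
    re.compile(r"<title>(.*?)</title>"),
    alerts.append,
)
```

Here `found` holds the title extracted from the first page, while `alerts` collects one message for the page where the pattern no longer matches and one for the page that could not be fetched. In a real harness, `notify` might send an email or post to a chat channel instead of appending to a list.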

If you love data, the site appears to be well worth checking out.

Have a great weekend!


Data and the Web is a blog by Kirix about accessing and working with data, wherever it is located. We have a particular fondness for data usability, ad hoc analysis, mashups, web APIs and, of course, playing around with our data browser.