Data and the Web

Data Clean Up, Brought to You by Google

Thursday, November 18th, 2010

I recently saw this announcement for an open source tool and thought it might interest folks who deal with messy data sets.

Google Refine provides an interesting take on grouping and filtering data and then cleaning it up. It can also transform data by calling web APIs (see video 3, in particular).
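To give a rough feel for what "group, then clean up" can mean, here is a minimal Python sketch. It is not Google Refine's code or API. The function names (`fingerprint`, `group_variants`, `clean`) and the normalization rules are my own assumptions, chosen only to illustrate the idea of grouping spelling variants under a shared key and then replacing each variant with the group's most common spelling.

```python
from collections import defaultdict
import string

# Hypothetical helper: the normalization rules are illustrative assumptions,
# not a description of how Google Refine works internally.
def fingerprint(value):
    """Trim, lowercase, drop ASCII punctuation, then sort the unique tokens."""
    table = str.maketrans("", "", string.punctuation)
    cleaned = value.strip().lower().translate(table)
    return " ".join(sorted(set(cleaned.split())))

def group_variants(values):
    """Group raw values that share the same fingerprint."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return dict(groups)

def clean(values):
    """Replace each value with the most common spelling in its group.

    Ties go to the spelling that appears first in the input.
    """
    canonical = {
        key: max(members, key=members.count)
        for key, members in group_variants(values).items()
    }
    return [canonical[fingerprint(value)] for value in values]

if __name__ == "__main__":
    companies = ["Acme Inc.", "acme inc", "ACME, Inc.", "Widget Co", "Acme Inc."]
    for original, fixed in zip(companies, clean(companies)):
        print(f"{original!r} -> {fixed!r}")
```

In a real clean-up pass you would want to review the groups before replacing anything, since values that look alike can still be genuinely different.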

The tool focuses on data clean-up rather than analysis and reporting. You may run into trouble with larger data sets because, I believe, all of the processing is performed in memory.

However, for the data geeks out there, it’s definitely worth a look and might even be a nice complement to Kirix Strata at times.

If you have a chance to play with it, feel free to let us know what you think in the comments below.

About

Data and the Web is a blog by Kirix about accessing and working with data, wherever it is located. We have a particular fondness for data usability, ad hoc analysis, mashups, web APIs and, of course, playing around with our data browser.