Thursday, November 18th, 2010
I recently saw this announcement for an open source tool and thought it might interest folks who deal with messy data sets.
Google Refine offers a fresh take on grouping and filtering data and then cleaning it up. It can also use web APIs to transform data, which is pretty neat (see video 3, in particular).
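To give a rough feel for the "group, filter, then clean up" idea, here is a minimal Python sketch. It is not Google Refine's code or API; the function names (`fingerprint`, `group_by_fingerprint`, `filter_groups`) and the sample values are made up for illustration, and the normalization rule is just one simple assumption about how messy variants of the same value might be matched.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Reduce a messy string to a normalized key (hypothetical rule).

    Trims whitespace, lowercases, strips ASCII punctuation, then sorts
    and de-duplicates the remaining words.
    """
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def group_by_fingerprint(values):
    """Group raw values that share the same normalized key."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return dict(groups)


def filter_groups(groups, min_size=2):
    """Keep only groups with at least min_size variants to review."""
    return {key: members for key, members in groups.items()
            if len(members) >= min_size}


if __name__ == "__main__":
    # Made-up sample data, not taken from any real data set.
    raw = ["Kirix Corp.", "kirix corp", " Corp Kirix ", "Google Inc."]
    for key, members in filter_groups(group_by_fingerprint(raw)).items():
        print(key, "<-", members)
```

Once the variants are grouped like this, cleaning up is a matter of picking one spelling per group and rewriting the rest to match.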
The tool focuses on data clean-up rather than analysis and reporting. You may run into trouble with larger data sets since, I believe, all of the processing is performed in memory.
However, for the data geeks out there, it’s definitely worth a look and might even be a nice complement to Kirix Strata at times.
If you have a chance to play with it, feel free to let us know what you think in the comments below.