Data Analysis | Data and the Web - Part 2

Data and the Web

Archive for the ‘data analysis' Category

Spreadsheets, Ltd.

Wednesday, July 25th, 2007

strata gridA friend of mine uses Microsoft Excel quite a bit and recently asked me what Kirix Strata™ can do that Excel can't. This is a very reasonable question to ask.

In fact, as an avid spreadsheet user myself, Excel lets me do all kinds of great things with data like creating budgets or putting together various lists. I can use formulas to create instant calculations and change data on a whim to perform what-if scenarios. Excel even gives me a few “database” tools to use, like sorting and filtering.

However, the strength of a spreadsheet lies in its ability to handle unstructured data really well. When I create a budget, I'm happy to mingle a column heading, my data points and a sum/total in the same column — and Excel is delighted to let me do it (or, at least, so suggests Clippy). It is cell-based, so you can place data wherever you'd like without any concern.

The trouble comes when you start dealing with larger amounts of structured data. We've seen this issue a lot, particularly when working with corporate clients. Excel is the most familiar tool for ad hoc calculations, but when something comes up where a user is presented with 20,000 records (or millions), it gets a little more dicey. Often the only option is to start working with a desktop database like Access. Unfortunately, a desktop database can often be a bit too complex for someone who just wants to quickly use their data like they would with a spreadsheet.

This is where Strata can really help. At its core, it was built to solve the problem of data usability. Basically, we're trying to give people the ability to handle structured data really easily, wherever they may encounter it.

Strata will happily take the tens of thousands or tens of millions of records and let you create calculations instantly across the entire column. Or, just like Excel, you can sort or filter your data, but do so across the entire data set with a single click. Of course, there are plenty of more “database” things you can do too (relationships, queries, reports, scripting, etc.), but the key is being able to quickly and easily use the data however you wish.

A pretty classic business issue came up in a forum post today. In this situation, Greg was trying to identify duplicate inventory items in a 63,000 record file. He created a calculation to remove some “noise” from the data, then he grouped it together and found out which ones were duplicated. From there, he could take the results and remove the duplicated records from the original database to prevent future processing errors.

This process would have taken all of a couple minutes to perform. With a spreadsheet, however, this would have been much more cumbersome because of the file size (it would barely fit in most versions of Excel) and the need to copy a formula over 63,000 rows. I'm actually not sure if Excel could handle the grouping function in the same way.

Excel is a excellent tool for unstructured data, but just wasn't designed for the rigors of handling structured data. One of the many things Strata offers is an easy transition for folks needing to analyze larger amounts of structured data.

Do you have any data issues that seem to be pushing the scope of your spreadsheet? Let us know, we'd be happy to help.

About

Data and the Web is a blog by Kirix about accessing and working with data, wherever it is located. We have a particular fondness for data usability, ad hoc analysis, mashups, web APIs and, of course, playing around with our data browser.