Data and the Web

Archive for July, 2007

Situational Integration

Tuesday, July 31st, 2007

ProgrammableWeb has a nice write-up today about some of the challenges in the mashup tools market. It links to an excellent survey of mashup platforms by Dion Hinchcliffe of ZDNet. Dion writes:

Mashups could theoretically allow business users to move — when appropriate — from their current so-called "end-user development tools" such as Microsoft Excel that are highly isolated and poorly integrated to much more deeply integrated models that are more Web-based and hence more open, collaborative, reusable, shareable, and in general make better use of existing sources of content and functionality. Remember, business workers still spend a significant amount of time manually integrating together the data in their ever increasing number of business applications. Tools that could let thousands of workers solve their situational software integration problems on the spot themselves, instead of waiting (sometimes forever) for IT to provide a solution, is indeed a potent vision.

We agree.

We've seen time and again how business users need to integrate and work with data from different sources, although usually only with data internal to the company. However, as the web provides more and more useful information, people will want to include external data as well. And if ordinary users can do this on their own without much IT support, the potential for increased productivity and efficiency, not to mention new discovery, skyrockets.

We're currently exploring some of these possibilities with our recently released beta of Kirix Strata™. What makes Strata unique is its ability to work seamlessly with data wherever it's located, whether in a back-end database like Oracle, an Excel file on your desktop, or a website, with or without an API. Much of our work is still cooking in our labs, but we'll be providing some concrete examples shortly. Stay tuned!

Spreadsheets, Ltd.

Wednesday, July 25th, 2007

A friend of mine uses Microsoft Excel quite a bit and recently asked me what Kirix Strata™ can do that Excel can't. This is a very reasonable question to ask.

As an avid spreadsheet user myself, I know that Excel lets me do all kinds of great things with data, like creating budgets or putting together various lists. I can use formulas to create instant calculations and change data on a whim to run what-if scenarios. Excel even gives me a few “database” tools to use, like sorting and filtering.

The real strength of a spreadsheet, however, lies in its ability to handle unstructured data. When I create a budget, I'm happy to mingle a column heading, my data points and a sum/total in the same column, and Excel is delighted to let me do it (or, at least, so suggests Clippy). It is cell-based, so you can place data wherever you'd like without any concern.

The trouble comes when you start dealing with larger amounts of structured data. We've seen this issue a lot, particularly when working with corporate clients. Excel is the most familiar tool for ad hoc calculations, but when a user is presented with 20,000 records (or millions), things get dicier. Often the only option is to start working with a desktop database like Access. Unfortunately, a desktop database can be a bit too complex for someone who just wants to use their data quickly, as they would in a spreadsheet.

This is where Strata can really help. At its core, it was built to solve the problem of data usability. Basically, we're trying to give people the ability to handle structured data really easily, wherever they may encounter it.

Strata will happily take tens of thousands or tens of millions of records and let you create calculations instantly across an entire column. Or, just like Excel, you can sort or filter your data, but do so across the entire data set with a single click. Of course, there are plenty more “database” things you can do too (relationships, queries, reports, scripting, etc.), but the key is being able to quickly and easily use the data however you wish.

A pretty classic business issue came up in a forum post today. In this situation, Greg was trying to identify duplicate inventory items in a 63,000-record file. He created a calculation to remove some “noise” from the data, then grouped the records together and found out which ones were duplicated. From there, he could take the results and remove the duplicated records from the original database to prevent future processing errors.

In Strata, this process would have taken all of a couple of minutes. In a spreadsheet, however, it would have been much more cumbersome because of the file size (at 63,000 rows, it would barely fit under the 65,536-row limit in most versions of Excel) and the need to copy a formula down 63,000 rows. I'm actually not sure whether Excel could handle the grouping step in the same way.
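For the curious, here is a rough sketch of that "clean up, group, find the repeats" idea written in plain Python rather than in Strata. We don't know exactly what Greg's calculation looked like, so the `normalize` rule, the `item` field name and the sample inventory rows below are all assumptions made up for illustration.

```python
import re
from collections import defaultdict

def normalize(name):
    """Strip "noise" so near-identical item names compare equal."""
    # Lowercase, turn punctuation into spaces, collapse repeated whitespace.
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(cleaned.split())

def find_duplicates(records, field="item"):
    """Group records by normalized key and keep groups with more than one row."""
    groups = defaultdict(list)
    for record in records:
        groups[normalize(record[field])].append(record)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}

# Hypothetical inventory rows, invented for this example.
inventory = [
    {"id": 1, "item": "Widget, Blue (10-pack)"},
    {"id": 2, "item": "widget blue 10 pack"},
    {"id": 3, "item": "Gasket 3mm"},
    {"id": 4, "item": "WIDGET  BLUE 10-PACK"},
]

duplicates = find_duplicates(inventory)
for key, rows in duplicates.items():
    print(key, [row["id"] for row in rows])
```

The cleaned-up key is only used for grouping, so the original records stay untouched and you can decide afterwards which of the duplicated rows to remove.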

Excel is an excellent tool for unstructured data, but it just wasn't designed for the rigors of handling structured data. One of the many things Strata offers is an easy transition for folks needing to analyze larger amounts of structured data.

Do you have any data issues that seem to be pushing the limits of your spreadsheet? Let us know; we'd be happy to help.

Tagging the World's publicdata

Tuesday, July 17th, 2007

There's a surprising amount of publicly available data on the web: government statistics, economic information, sports data, etc. And lots of it is in good ol' fashioned CSV files ripe for analysis.

Jon Udell has recently begun tracking this kind of data using del.icio.us and has asked anyone who is so inclined to follow along. All you have to do to join in the fun is tag your bookmark publicdata.

With Kirix Strata™, we've been interested in identifying public data sources as well and have been jotting bookmarks down as we've come across them. We're quite pleased to finally have a useful, publicly available place to put them:

kirixstrata/publicdata

We've only added a few to start with, but you'll see more added in the coming weeks.

Got any good publicdata to share?

The Birth of a Data Browser

Tuesday, July 17th, 2007

Well, it took a lot more blood, sweat and tears than we expected, but we're really excited to announce our first public beta release of Kirix Strata™, the data browser.

And what, pray tell, is a “data browser”?

Well, Strata is a specialty browser that lets you access and manipulate data from pretty much anywhere on the web. For instance, Strata will let you grab HTML tables or RSS feeds or even open up CSV files directly from a URL (wow, that's a lot of acronyms).

Then when you've got the data in a table, you can do all sorts of ad hoc analysis. You can create calculations or sort and filter or create queries and reports — similar to the kinds of things you might do with a desktop database or a spreadsheet. In addition to web data, you can still work with data from your desktop or in a database system like Oracle or MySQL Enterprise.
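To make "ad hoc analysis" a bit more concrete, here is a small sketch of the same three steps (calculated field, filter, sort) in plain Python. It is not Strata code. The CSV text stands in for a file opened from a URL, and the column names, the numbers and the 250 cutoff are all invented for the example.

```python
import csv
import io

# Stand-in for a CSV file opened from a URL; the data is invented.
CSV_TEXT = """region,units,unit_price
East,120,2.50
West,80,4.00
North,200,1.25
South,15,9.00
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, converting the numeric fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["units"] = int(row["units"])
        row["unit_price"] = float(row["unit_price"])
        rows.append(row)
    return rows

rows = load_rows(CSV_TEXT)

# Calculated field, applied to every row in the table.
for row in rows:
    row["revenue"] = row["units"] * row["unit_price"]

# Filter to the larger rows, then sort them from highest revenue to lowest.
big = [row for row in rows if row["revenue"] >= 250]
big.sort(key=lambda row: row["revenue"], reverse=True)

for row in big:
    print(row["region"], row["revenue"])
```

The point is simply that each step works on the whole column at once, with no formula to copy down row by row.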

And for those more technically inclined, Strata also includes an implementation of ECMAScript, so anyone familiar with JavaScript should feel right at home. The nice thing about the scripting is that it also includes bindings for SQL and HTTP, which can make for a lot of fun when connecting to Web APIs, creating “desktop mashups” or building extensions. And to boot, it runs on both Windows and Linux (at the moment, only Ubuntu is officially supported).
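As a loose illustration of what a “desktop mashup” means, here is a sketch that joins a local table with data of the kind a web API might return. Please note the assumptions: this is Python with its built-in `sqlite3` and `json` modules, not Strata's ECMAScript bindings, and nothing here reflects Strata's actual API. The `holdings` and `quotes` tables, the symbols and the prices are hypothetical, and the JSON string stands in for the body of an HTTP response.

```python
import json
import sqlite3

# Stand-in for the body of an HTTP response from a web API (invented data).
API_RESPONSE = '[{"symbol": "AAA", "price": 10.0}, {"symbol": "BBB", "price": 4.5}]'

conn = sqlite3.connect(":memory:")

# "Local" data: a hypothetical table that lives on your desktop.
conn.execute("CREATE TABLE holdings (symbol TEXT, shares INTEGER)")
conn.executemany(
    "INSERT INTO holdings VALUES (?, ?)",
    [("AAA", 100), ("BBB", 40), ("CCC", 7)],
)

# "External" data: rows parsed from the API response.
conn.execute("CREATE TABLE quotes (symbol TEXT, price REAL)")
conn.executemany(
    "INSERT INTO quotes VALUES (:symbol, :price)",
    json.loads(API_RESPONSE),
)

# The mashup itself: one SQL join across both sources.
query = """
    SELECT h.symbol, h.shares * q.price AS total
    FROM holdings AS h
    JOIN quotes AS q ON q.symbol = h.symbol
    ORDER BY total DESC
"""
valued = conn.execute(query).fetchall()
for symbol, total in valued:
    print(symbol, total)
```

Once both sources sit in tables, the combination is just a join. Symbols with no matching quote (CCC here) drop out of an inner join, which is worth keeping in mind when the external source is incomplete.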

We also just want to give a quick shout-out to the excellent folks at wxWidgets (we use their GUI library) and Mozilla (Strata incorporates the Gecko engine), without whom Strata would be a mere twinkle in our eye.

So, without further ado, check out the Kirix Strata introduction video:

Play Video

(And here's an embeddable YouTube version…)

and then

Download and try out the data browser for yourself

We hope you enjoy it!

About

Data and the Web is a blog by Kirix about accessing and working with data, wherever it is located. We have a particular fondness for data usability, ad hoc analysis, mashups, web APIs and, of course, playing around with our data browser.