Data and the Web

Archive for the ‘spreadsheets’ Category

Cooking the (Quick)Books

Wednesday, January 14th, 2009

Ah, tax season… could there be a more thrilling time of the year?

So, today I was reviewing a sales & use tax form for the State of Illinois. Since our governor really isn’t helping matters in our state these days, we felt the least we could do to help was to make sure to pay our taxes on time.

So, I was looking at our sales tax report in QuickBooks and, like a good accountant, quickly checked to make sure it matched up against the total revenues on the income statement. They didn’t match.

Hmm… funny thing about accounting, things really ought to balance.

It was a small discrepancy, but after searching unsuccessfully for the difference, it was clear that the issue involved more than one transaction. And, unfortunately, there were just far too many transactions to try and come up with a solution manually.

So, since I happened to have this data browser lying around, I exported both reports as CSV files and opened them up in Kirix Strata™.

The QuickBooks CSVs were obviously meant for spreadsheet use (they included subtotals and odd record breaks), so I quickly did some cleanup and then did a few gymnastics to compare the tables. It turns out there were a few manual journal entries that weren’t mapped to the sales tax codes QuickBooks requires. And here I was hoping to blame QuickBooks… oh well.

Running through this process was a 5-minute affair, but it made me wonder about all the other small data manipulation tasks that are out there. There have got to be millions, nay, billions, of these things — 5-minute, one-off, ad hoc data tasks that just can’t be solved with the help of a spreadsheet (in this case, grouping or relationships were needed to do this quickly).
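For the curious, the core of that comparison can be sketched in a few lines of Python. The `code` and `amt` field names are hypothetical stand-ins; a real QuickBooks export would first need the cleanup described above:

```python
from collections import defaultdict

def totals_by_key(rows, key_field, amount_field):
    """Sum amounts per key (e.g., per tax code or account)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_field]] += float(row[amount_field])
    return dict(totals)

def find_discrepancies(report_a, report_b):
    """Return keys whose totals differ between the two summarized reports."""
    keys = set(report_a) | set(report_b)
    return {k: (report_a.get(k, 0.0), report_b.get(k, 0.0))
            for k in keys
            if abs(report_a.get(k, 0.0) - report_b.get(k, 0.0)) > 0.005}
```

Feeding each cleaned CSV through `csv.DictReader`, then through `totals_by_key` and `find_discrepancies`, would surface exactly the unmapped journal entries, without eyeballing thousands of rows.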

What do people normally do in these situations? I fear that they probably spend hours working the problem manually. Got a similar story and/or solution? Feel free to share in the comments section below.

The Long Tail of Enterprise Software Demand

Thursday, June 19th, 2008

I was able to attend Dion Hinchcliffe’s webinar yesterday, sponsored by Snaplogic (three more free seminars to go), called “Bringing Web 2.0 into the Enterprise with Mashups: Drivers, Requirements and Benefits.”  The session was a very nice overview of how mashups have impacted the consumer space and how they are creeping into the enterprise.  However, there was one point that struck me as particularly salient… it was something Dion termed “The Long Tail of Enterprise Software Demand.”

Image: The Long Tail of Enterprise Software Demand (source: Hinchcliffe & Company)

I always find it interesting when the concept of the long tail is applied outside of its original scope, and I think Dion hit the nail on the head with this analogy.  The synopsis is that there is a large demand curve for software in the enterprise, but only the biggest, most global projects get funded and developed.  The rest of IT’s resources go to maintaining existing systems.  However, there is an extremely long tail of other customized software needs at the business unit level, the departmental level, and even at the individual level that never get created.

The point Dion was making was that there is a lot of potential for easy-to-develop mashups to fill this gap — a self-serve model, if you will.  Mashup tools would make it easy for individuals to create the specific applications they need with a short turnaround time.  In fact, one of Dion’s wrap-up points was that mashup tools should be as easy to use as a spreadsheet.

To take a step back for a second, it may be useful to define what a mashup is.  I would venture to say that when people think of mashups, the first thing that comes to mind is something that integrates a Google Map with other web data, like housing data.  Zillow would be a classic example of this type of mashup.  In fact, Programmable Web states that a full 39% of mashups on their site are related one way or another to mapping.

Wikipedia puts it this way:

In technology, a mashup is a web application that combines data from more than one source into a single integrated tool; an example is the use of cartographic data from Google Maps to add location information to real-estate data, thereby creating a new and distinct web service that was not originally provided by either source.

I suppose it is helpful to define mashups solely as web applications in order to create a nice clean line, but I’d argue that it does the genre a disservice, particularly in the realm of Enterprise Mashups.  This is because there is a storied, if sordid, history of “mashups” that have existed in the long tail of the enterprise for many years.

At a base level, regardless of IT budget, people need solutions to their issues and are often crafty enough to figure out a way to get things done.  These “mashups” often take the form of a duct-taped Visual Basic script that turns Access into a specialized app for the receivables department.  Or maybe someone creates an, ahem, “untidy” Excel macro that goes way beyond anything Microsoft ever envisioned, but it does a perfect job of forecasting inventory for the sales folks.  It always seems like there is at least one “guru” at the departmental level who knows just enough “programming” to be dangerous.  Dion referred to these types of workers as Prosumers: folks who have just a bit more technical sophistication than a standard consumer, but are not programmers.

In any event, their circa-1997 Access apps are often cursed by IT.  Their franken-spreadsheets are the scourge of management concerned about security.  But, in the end, they get the job done.  And, they do it with $0 of IT investment.  Their important role in the business shouldn’t be taken lightly.

Now granted, these ad hoc apps don’t currently take advantage of the data in the cloud, but it is this long-tail that has been active for years, mashing up data from different internal systems.  It was the dependable (if low-mileage) four-door sedan compared to the efficient hybrid roadster that is currently on the production line.

It is in this realm that a data browser fits in very nicely as a long-tail mashup tool for the prosumer who needs to divine something from their data.  Clearly a browser is not in the cloud, but being local does carry some benefits, such as:

  • Handling as much data as you throw at it, using the power and speed of the PC for processing and manipulation.
  • Securely mashing up local data, enterprise database data and web data (APIs, CSV bookmarks, RSS feeds, etc.), never needing to push private business data to an external server.
  • Being extremely flexible and having an interface that is familiar to existing business users, similar to Access or Excel.
  • Offering extensibility, such that the long-tail prosumer folks can quickly knock out a JavaScript plug-in for an ad hoc app only needed at the departmental level.

There is a real beauty in the idea of mashups flourishing in the workplace.  There is this certain intangible, ad hoc “thing” out there that every business person runs into at one time or another, which just can’t be solved by a single over-arching IT project.  This is why people still use spreadsheets for everything.  And this is why it’ll be fascinating to see how mashup tools will be applied by these ingenious long-tail workers to boost productivity and efficiency in the coming years.

P.S.  As a quick aside, it is interesting to see the parallels between this discussion and the “last mile of business intelligence” that we talked about previously.  Maybe they’re just different sides of the same coin.  Hmmm, this may require another blog post in the future…

Moving Toward Business Intelligence 2.0

Tuesday, June 3rd, 2008

I just read a pretty interesting article by Neil Raden called “Business Intelligence 2.0: Simpler, More Accessible, Inevitable” (HT: Snaplogic) and would recommend giving it a read.

Historically, business intelligence hasn’t been all that it’s cracked up to be. Very expensive data warehousing systems are put in place. Existing reports are re-created and all kinds of new objects/reports are added. Everyone is thoroughly trained on the system. Pretty 3D graphics are added to the dashboard. The project goes over budget. Users revert to using Excel.

Some would say that BI is just a fancy way to do organizational reporting. There’s a lot of truth to this; why else do people continue to rely on their spreadsheets when they need to do some quick and dirty analysis? I think the answer is that there is a substantial ad hoc component to the “intelligence” part of business intelligence that will never be captured by a large, centralized system.

Having a few BI gurus setting up reports for everyone just isn’t an efficient use of resources. Nor does it capture the collective brain power of the organization. And there is quite a bit of this power ready to be tapped, even in the deepest corners of a company.

For example, we’ve done a lot of work with folks in the accounts payable industry. AP is not what you’d call a very sexy part of the organization — however, as keeper of the company checkbook, it sees billions of dollars flow through each year. There are efficiencies to be gained, analyses to be done and, in our experience, a whole slew of people eager to do a bang-up job. However, when an AP manager needs to get something from the legacy system or just wants to create a new type of report, they have one of two options — either go to IT and hope they can get a report created within the next couple of weeks, or go to the mattresses with Excel/Access and do what they need to do themselves.

Neil echoes this when comparing BI 1.0 to BI 2.0:

BI 1.0 Fallacy: Most users want to be spoon-fed information and will never take the initiative to create their own environment or investigate the best way to get the answers they need.

BI 2.0 Reality: The Consumer Web invalidates this idea. When given simple tools to do something that is important and/or useful to them, people find a way to “mash up” what they need.

We’ve seen people’s initiative on display time and again and are really happy that Kirix Strata is playing a part in making this type of ad hoc analysis, integration and reporting easier than ever.

So, give those articles a read and see what you think. Also, please consider joining us on Wednesday at 1 PM EDT for our free hour-long web seminar with Snaplogic called “BI 2.0: Data Visualization and Spreadsheets on Steroids.” All the pertinent details can be found here. Hope to see you then!

A Business Intelligence Browser

Wednesday, April 23rd, 2008

During our journey from the MySQL Conference last week to the Web 2.0 Expo this week, we’ve continued to see a common pattern. There are a tremendous number of back-end business systems and almost all of them seem to rely on the humble Web browser for the front-end.

Now there is a very good reason for this — the Web is a tremendous distribution platform and companies are taking advantage of the same protocols and tools on their intranet that have already propelled the growth of the Internet. And, bottom line, browsers are really, really easy to use.

Data Integration 2.0

As an example of this situation, we met some folks last week from Snaplogic. Their product makes it really simple to integrate data from back-end systems and enables you to access data from anywhere, mash it up and publish it. However, when they were demoing the software, there wasn’t much to show off, since a normal browser can’t do much with the data except let you “Save As” to your desktop.

Thankfully, this type of front-end work is perfectly suited for a data browser. So, we decided to demo each other’s products at the Web 2.0 Expo this week. Snaplogic is able to easily show the value they bring to the table by actually displaying the data integration they can do. We put together a quick dashboard with Snaplogic data and were able to show off all the cool stuff you can do once you’ve got a data feed to work with. Indeed, this was like the serendipitous combination of chocolate and peanut butter.

The Last Mile

This reminded me of a great metaphor used in a post by Juice Analytics called “The Last Mile of Business Intelligence.” The idea is that businesses have built lots of large back end systems — from data integration and warehousing to BI and reporting systems. But, in the end, it still seems that actual users are left out in the cold. The “last mile” between the heavy machinery and the actual knowledge worker has not been adequately bridged.

So unfortunately, the typical pattern involves working with a back end system via a Web browser and then, ultimately, exporting a CSV file to work with in Excel.

This is not terribly efficient. A data browser enables you to open up those same intranet pages and perform your normal browsing tasks. However, because it’s a data analysis tool at heart, you can actually open up data sets and start working with them directly. Then you can do fun things like bookmark tables, build calculations, create reports and write queries, and have it all refresh the next day with the latest data. Because of this, it plays very nicely in the worlds of exploration and reporting per Juice’s “data value framework.”

Why Incorporate the Web?

In the past we’ve shown off some of the fun things Kirix Strata™ does — like opening up HTML tables natively or showing RSS feeds in a table view — but often we’ll get a response like, “that’s cool, but what do I do with it?” There is no question that tools like these can be very useful for some tasks (e.g., integrating data from the web with local tables), but they’re really just icing on the cake.

The important thing is how tightly the web is integrated with the data analysis tool. This opens up all kinds of possibilities for direct interaction with both the Web infrastructure and the database systems that power business intelligence in the enterprise. Add some Web APIs into the mix, and now you could have access to the contact information in your ACT database, the corporate newsletter contacts in a MySQL system and all the recent sales information from your Salesforce.com account. Explore and integrate at will.

Given that the meme “Enterprise 2.0” resounds louder each day, we should only expect to see Web infrastructure reaching deeper and deeper into organizations. In the end, maybe part of the BI “last mile” solution is just giving businesses a browser with a bit more intelligence.

Mr. MacGyver, Meet Kirix Strata

Tuesday, October 16th, 2007

(NOTE: A screencast of this exercise is available below.)

A few days ago, the always datariffic folks at Juice Analytics posted an article about MacGyver-ing call volume data and pushing it into an online mapping application called Mapeteria. Basically, they were doing some ad hoc data visualization combining public web data, private phone call data and a web service that provided the visualization (which in turn used the Google Maps API).

Huh… local data, web data and web APIs? Sounds like a perfect application for a data browser (well, it would’ve been perfect if the web service accepted a POST command, but I digress). A data browser enables you to easily access web data, combine it with local data, perform any required data clean up and then push/pull data from the web — without ever leaving the tool.

It also would’ve saved Juice a bit of time, particularly with grabbing area codes and prepping that file. Let’s look at the four steps they went through and we’ll see how Kirix Strata™ might improve the experience:

1. Pull out the area codes.

The data had phone number values like “12345678901” as well as “2345678901”, so they used the following formula to pull out the area codes using Excel:

=VALUE(IF(LEFT(E7,1)="1",MID(E7,2,3),MID(E7,1,3)))

Strata would use a similar formula:

iif(left(tel,1)="1",substr(tel,2,3),substr(tel,1,3))

The main time savings here (particularly with large files) is that the calculated field populates automatically for every record in Strata, instead of needing to paste formulas. OK… not terribly exciting thus far.
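For reference, the same logic works as a plain function in a general-purpose language. Here is a small Python equivalent, assuming the phone numbers arrive as strings like the examples above:

```python
def area_code(tel: str) -> str:
    """Strip an optional leading country code '1', then take three digits.

    Mirrors the Excel/Strata formulas: if the number starts with '1',
    the area code is characters 2-4; otherwise it's the first three.
    """
    return tel[1:4] if tel[:1] == "1" else tel[:3]
```

Applied over a list of records, this populates the value for every row in one pass, which is the same time savings the calculated field provides in Strata.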

2. Convert area codes into states

This is a multi-part step:

a) Locate a table from the web that has area code data associated with a state ID (while fending off parasitic scammers).
b) Clean up the table as necessary.
c) Do a lookup from the phone call data that adds in the state where the call originated from.

Strata can really cut down the amount of time spent on this step. Because of the website used, the folks at Juice surely had to create their lookup table manually. I went to Delicious, searched for “area codes” and found this very useful website, which had all the data in a nice HTML table. With Strata, I simply right-clicked, selected “Import Data” and immediately had the table I needed for the lookup.

Finally, I created a relationship between my two tables and dragged the state codes (e.g., CA, IL, NY, etc.) into the phone call data.
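That relationship-and-drag step is essentially a lookup join. A rough Python sketch of the same idea, with made-up field names for illustration:

```python
def add_state(calls, area_to_state):
    """Annotate each call record with its originating state.

    `calls` is a list of dicts with an 'area_code' field;
    `area_to_state` maps area codes to state codes (e.g., '312' -> 'IL').
    Unknown area codes are marked '??' rather than dropped.
    """
    for call in calls:
        call["state"] = area_to_state.get(call["area_code"], "??")
    return calls
```

The imported HTML table plays the role of `area_to_state` here; Strata’s relationship does the same mapping without any code at all.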

3. Create a summary data set

This was done using a pivot table in Excel. Strata doesn’t have classic pivot tables in its feature set at this point, but it does have a nice li’l grouping utility. So, once I knew what CSV format the Mapeteria web service required, I grouped the data accordingly.
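The grouping step itself boils down to a count of calls per state. A tiny Python sketch, again with a hypothetical `state` field:

```python
from collections import Counter

def call_volume_by_state(calls):
    """Group call records by state and count them,
    like Strata's grouping utility or an Excel pivot table."""
    return Counter(call["state"] for call in calls)
```

The resulting per-state counts are exactly the summary rows the map service needs.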

4. Create a colorized map of the U.S.

This is the “almost perfect” part I referred to above.

Though Mapeteria is a very cool visualization service using Google Maps, it needs to fetch a CSV file embedded in a URL from elsewhere on the web. If the service were able to accept data via a POST command (or something like an “Upload Data” button), Strata would have been able to take the table we created and push it to the web service, no CSV transformation required (in fact, we’ve got some stuff cooking in our labs that would make this as easy as copy and paste). And, if we were able to push the data out like this, we would have gotten the map immediately without ever leaving our data browser.
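Just to illustrate what that hypothetical push could look like, here is a sketch in Python. Note the assumptions: Mapeteria does not actually accept POSTed data, and the endpoint URL below is invented purely for illustration:

```python
import csv
import io
import urllib.request

def to_csv(rows, fieldnames):
    """Serialize summary rows (list of dicts) to CSV text for upload."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def post_csv(url, csv_text):
    """POST a CSV body to a hypothetical map-service endpoint."""
    req = urllib.request.Request(
        url,
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "text/csv"},
        method="POST",
    )
    return urllib.request.urlopen(req)

# Hypothetical usage (the endpoint does not exist):
# post_csv("https://example.invalid/map", to_csv(summary, ["state", "count"]))
```

With an endpoint like that, the grouped table would go straight from the data browser to the map, no intermediate file hosting required.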

But, like Zach at Juice, I had to save the file in a CSV format and then upload it to a server before I was able to get my map. Here’s a screencast of the entire process… once I found the area code data on the web, it took less than 5 minutes to get my map.

Play Video

(And here’s an embeddable YouTube version…)

If anyone wants to try this process out for themselves, please feel free to download Strata and give it a try. This data browser is in beta and completely free to use; we’re also giving away free full licenses to anyone who provides feedback during the beta period. Oh, and here is the sample phone call volume data I used for this exercise:

Click here to download Phone Call Volume Sample Data (.csv, 10KB)

This is a pretty simple example of how Strata can be used for ad hoc data access and manipulation with data from the web (or, as one can imagine, within a corporate intranet) and make this kind of analysis very efficient. Throw in some web services, web APIs or very large files into the mix, and you’ve got the chance to do some fairly interesting things.

As always, if anyone has any questions, either post in the comments below or in our support forums… or just shoot us a support email. Thanks!

Spreadsheets, Ltd.

Wednesday, July 25th, 2007

A friend of mine uses Microsoft Excel quite a bit and recently asked me what Kirix Strata™ can do that Excel can’t. This is a very reasonable question to ask.

In fact, as an avid spreadsheet user myself, Excel lets me do all kinds of great things with data like creating budgets or putting together various lists. I can use formulas to create instant calculations and change data on a whim to perform what-if scenarios. Excel even gives me a few “database” tools to use, like sorting and filtering.

However, the strength of a spreadsheet lies in its ability to handle unstructured data really well. When I create a budget, I’m happy to mingle a column heading, my data points and a sum/total in the same column — and Excel is delighted to let me do it (or, at least, so suggests Clippy). It is cell-based, so you can place data wherever you’d like without any concern.

The trouble comes when you start dealing with larger amounts of structured data. We’ve seen this issue a lot, particularly when working with corporate clients. Excel is the most familiar tool for ad hoc calculations, but when something comes up where a user is presented with 20,000 records (or millions), it gets a little more dicey. Often the only option is to start working with a desktop database like Access. Unfortunately, a desktop database can often be a bit too complex for someone who just wants to quickly use their data like they would with a spreadsheet.

This is where Strata can really help. At its core, it was built to solve the problem of data usability. Basically, we’re trying to give people the ability to handle structured data really easily, wherever they may encounter it.

Strata will happily take the tens of thousands or tens of millions of records and let you create calculations instantly across the entire column. Or, just like Excel, you can sort or filter your data, but do so across the entire data set with a single click. Of course, there are plenty of more “database” things you can do too (relationships, queries, reports, scripting, etc.), but the key is being able to quickly and easily use the data however you wish.

A pretty classic business issue came up in a forum post today. In this situation, Greg was trying to identify duplicate inventory items in a 63,000 record file. He created a calculation to remove some “noise” from the data, then he grouped it together and found out which ones were duplicated. From there, he could take the results and remove the duplicated records from the original database to prevent future processing errors.

This process would have taken all of a couple of minutes to perform. With a spreadsheet, however, this would have been much more cumbersome because of the file size (it would barely fit in most versions of Excel) and the need to copy a formula across 63,000 rows. I’m actually not sure if Excel could handle the grouping function in the same way.
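Greg’s approach can be sketched in Python. The `normalize` step here (stripping spaces and case) is a made-up stand-in for whatever “noise” his actual calculation removed:

```python
from collections import Counter

def normalize(item_code: str) -> str:
    """Strip hypothetical 'noise' from an inventory code:
    here, whitespace and letter case."""
    return item_code.replace(" ", "").upper()

def find_duplicates(codes):
    """Group normalized codes and return those appearing more than once."""
    counts = Counter(normalize(c) for c in codes)
    return {code for code, n in counts.items() if n > 1}
```

Once the duplicated codes are identified, the matching records can be pulled from the original database and removed, which is exactly the calculate-group-remove sequence described above.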

Excel is an excellent tool for unstructured data, but it just wasn’t designed for the rigors of handling structured data. One of the many things Strata offers is an easy transition for folks needing to analyze larger amounts of structured data.

Do you have any data issues that seem to be pushing the scope of your spreadsheet? Let us know, we’d be happy to help.

About

Data and the Web is a blog by Kirix about accessing and working with data, wherever it is located. We have a particular fondness for data usability, ad hoc analysis, mashups, web APIs and, of course, playing around with our data browser.