2007 October | Data and the Web

Archive for October, 2007

Mr. MacGyver, Meet Kirix Strata

Tuesday, October 16th, 2007

(NOTE: Screencast of this exercise is available below.)

A few days ago, the always datariffic folks at Juice Analytics posted an article about MacGyver-ing call volume data and pushing it into an online mapping application called Mapeteria. Basically, they were doing some ad hoc data visualization that combined public web data, private phone call data and a web service that provided the visualization (which in turn used the Google Maps API).

Huh… local data, web data and web APIs? Sounds like a perfect application for a data browser (well, it would've been perfect if the web service accepted a POST command, but I digress). A data browser enables you to easily access web data, combine it with local data, perform any required data cleanup and then push/pull data from the web, all without ever leaving the tool.

It also would've saved Juice a bit of time, particularly with grabbing area codes and prepping that file. Let's look at the four steps they went through and we'll see how Kirix Strata™ might improve the experience:

1. Pull out the area codes.

The data had phone number values like “12345678901” as well as “2345678901”, so they used the following formula to pull out the area codes using Excel:

=VALUE(IF(LEFT(E7,1)="1",MID(E7,2,3),MID(E7,1,3)))

Strata would use a similar formula:

iif(left(tel,1)="1",substr(tel,2,3),substr(tel,1,3))

The main time savings here (particularly with large files) is that the calculated field populates automatically for every record in Strata, instead of needing to paste formulas. OK… not terribly exciting thus far.
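For readers who want to see the same logic outside of a spreadsheet, here is a minimal sketch in plain JavaScript. The function name extractAreaCode is made up for illustration and is not part of Strata; the rule is the same one the two formulas above apply (skip a leading "1", then take three digits).

```javascript
// Hypothetical helper mirroring the Excel/Strata formulas above.
// If the number starts with "1", skip that digit and take the next
// three characters; otherwise take the first three characters.
function extractAreaCode(tel) {
  const digits = String(tel).trim();
  return digits.charAt(0) === "1" ? digits.slice(1, 4) : digits.slice(0, 3);
}

// The two sample values from the text.
for (const tel of ["12345678901", "2345678901"]) {
  console.log(tel + " -> " + extractAreaCode(tel));
}
```

Like the original formulas, this does no validation; it assumes every value is a 10- or 11-digit string.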

2. Convert area codes into states

This is a multi-part step:

a) Locate a table from the web that has area code data associated with a state ID (while fending off parasitic scammers).
b) Clean up the table as necessary.
c) Do a lookup from the phone call data that adds the state where each call originated.

Strata can really cut down the time spent on this step. Because of the website used, the folks at Juice surely had to create their lookup table manually. I went to Delicious, searched for “area codes” and found this very useful website, which had all the data in a nice HTML table. With Strata, I simply right-clicked and selected “Import Data” and immediately had the table I needed for the lookup.

Finally, I created a relationship between my two tables and dragged the state codes (e.g., CA, IL, NY, etc.) into the phone call data.
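Conceptually, the relationship is a lookup keyed on area code. The sketch below shows that idea in plain JavaScript rather than in Strata itself; the sample rows, the field names (areaCode, state) and the addState helper are all assumptions made up for illustration.

```javascript
// Hypothetical rows standing in for the imported HTML table.
const areaCodeTable = [
  { areaCode: "212", state: "NY" },
  { areaCode: "312", state: "IL" },
  { areaCode: "415", state: "CA" },
];

// Phone call records with the area code already pulled out (step 1).
const calls = [
  { tel: "14155550123", areaCode: "415" },
  { tel: "2125550188", areaCode: "212" },
  { tel: "9995550100", areaCode: "999" }, // no match in the lookup table
];

// Return a copy of each call row with a "state" field added.
// Rows whose area code is not in the lookup table get state = null.
function addState(callRows, lookupRows) {
  const stateByAreaCode = new Map(lookupRows.map((r) => [r.areaCode, r.state]));
  return callRows.map((call) => ({
    ...call,
    state: stateByAreaCode.has(call.areaCode)
      ? stateByAreaCode.get(call.areaCode)
      : null,
  }));
}

const callsWithState = addState(calls, areaCodeTable);
console.log(callsWithState);
```

Keeping unmatched rows (with a null state) rather than dropping them makes it easy to spot area codes missing from the lookup table.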

3. Create a summary data set

This was done using a pivot table in Excel. Strata doesn't have classic pivot tables in its feature set at this point, but it does have a nice li'l grouping utility. So, once I knew what CSV format was required for the Mapeteria web service, I grouped the data accordingly.
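The grouping itself boils down to counting calls per state and writing the result out as CSV. Here is a rough sketch in plain JavaScript; the helper names and the "state,calls" header are assumptions for illustration only, since the exact column layout Mapeteria expected isn't spelled out here.

```javascript
// Count how many call rows fall in each state.
function countByState(rows) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row.state, (counts.get(row.state) || 0) + 1);
  }
  return counts;
}

// Serialize the counts as CSV, one state per line, sorted by state code.
// The header row is an assumed layout, not Mapeteria's documented format.
function toCsv(counts) {
  const lines = ["state,calls"];
  const sorted = [...counts].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  for (const [state, n] of sorted) {
    lines.push(state + "," + n);
  }
  return lines.join("\n");
}

// Hypothetical call rows that already carry a state code (step 2).
const sampleRows = [
  { state: "CA" },
  { state: "NY" },
  { state: "CA" },
  { state: "IL" },
];

const summaryCsv = toCsv(countByState(sampleRows));
console.log(summaryCsv);
```

State codes contain no commas or quotes, so this sketch skips CSV escaping; arbitrary text fields would need it.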

4. Create colorized map of the U.S.

This is the “almost perfect” part I referred to above.

Though Mapeteria is a very cool visualization service using Google Maps, it needs to fetch a CSV file embedded in a URL from elsewhere on the web. If the service were able to accept data via a POST command (or something like an “Upload Data” button), Strata would have been able to just take the table we created and push it to the web service, no CSV transformation required (in fact, we've got some stuff cooking in our labs that would make this as easy as copy and paste). And if we could push the data out like this, we would have gotten the map immediately without ever leaving our data browser.

But, like Zach at Juice, I had to save the file in CSV format and then upload it to a server before I was able to get my map. Here's a screencast of the entire process… once I found the area code data on the web, it took less than 5 minutes to get my map.

(And here's an embeddable YouTube version…)

If anyone wants to try this process out for themselves, please feel free to download Strata and give it a try. This data browser is in beta and completely free to use; we're also giving away free full licenses to anyone who provides feedback during the beta period. Oh, and here is the sample phone call volume data I used for this exercise:

Click here to download Phone Call Volume Sample Data (.csv, 10KB)

This is a pretty simple example of how Strata can be used for ad hoc data access and manipulation with data from the web (or, as one can imagine, within a corporate intranet) and make this kind of analysis very efficient. Throw some web services, web APIs or very large files into the mix, and you've got the chance to do some fairly interesting things.

As always, if anyone has any questions, either post in the comments below or in our support forums… or just shoot us a support email. Thanks!

Playing Nice with Yahoo Pipes

Wednesday, October 10th, 2007

Yahoo Pipes is a pretty slick tool that makes it easy to combine and mash up data sources from around the web and then output the data into formats like RSS and JSON. One of the really nice things is its interface, which lets non-programmers lurk and meddle in this otherwise fearsome domain.

Today I came across a post by tag aficionado Jon Udell, who was looking for a way to combine multiple feeds (based on a single tag) into a single feed for consumption. Within an hour and a half, a person named engtech created a Yahoo Pipe called Tagosphere to solve the problem. Pop in the tag you want, hit Run and get your results. Very cool.

To digress for a moment, one of the pet projects I've had on my (long) to-do list is to use Kirix Strata™ to create an application that alerts me when someone references “kirix” in a blog post, article, or elsewhere on the web. I currently do this by subscribing to feeds from Google News, Google Blog Search, Technorati, Bloglines, Topix, Digg, etc. This is fine, but a bit clunky due to the many duplicate entries. It also is not comprehensive.

So the other thing I want to do is bring in my website referrers from AWStats or Google Analytics (or our raw Apache web logs). Lots of times we'll see people coming to our site from blogs, forum posts or websites that never get picked up by those above-referenced feeds. So then, I would just need to combine all the data, remove duplicates, timestamp it… and now I have a pretty comprehensive idea of where the latest buzz is coming from.
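To make the "combine, remove duplicates, timestamp" idea concrete, here is a minimal sketch in plain JavaScript. Nothing here is a Strata API: the mergeMentions and normalizeUrl helpers, the url/source field names and the example.com sample data are all hypothetical, and treating two entries as duplicates when their URLs match after trimming trailing slashes is just one simple assumption.

```javascript
// Very light URL clean-up so "http://example.com/a/" and
// "http://example.com/a" count as the same mention.
function normalizeUrl(url) {
  return url.trim().replace(/\/+$/, "");
}

// Merge several lists of mentions, keep the first entry seen for each
// URL, and stamp every kept entry with the time of this run.
function mergeMentions(sources, seenAt) {
  const byUrl = new Map();
  for (const items of sources) {
    for (const item of items) {
      const key = normalizeUrl(item.url);
      if (!byUrl.has(key)) {
        byUrl.set(key, { ...item, url: key, seenAt });
      }
    }
  }
  return [...byUrl.values()];
}

// Hypothetical inputs: one list from feeds, one from referrer logs.
const fromFeeds = [
  { url: "http://example.com/a/", source: "feed" },
  { url: "http://example.com/b", source: "feed" },
];
const fromReferrers = [
  { url: "http://example.com/a", source: "referrer" },
  { url: "http://example.com/c", source: "referrer" },
];

const mentions = mergeMentions([fromFeeds, fromReferrers], new Date().toISOString());
console.log(mentions);
```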

So, the Tagosphere Pipe mentioned above is a pretty good start. I can create a feed for “kirix” and get a combined set of data with the duplicates removed. However, because I want to sort and filter this dataset, I need to get it into Strata. I could just manually go to the Tagosphere page in Strata and click on the RSS feed to get my table. However, because I'm looking at actually using this Pipe for a future application, I decided it would be nice to show how Strata can work directly with the Yahoo Pipe via a script:

1. In Strata, go to File > New > Script.

2. Copy the following text into the script tab:

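// Ask the user for the tag to search on.
var t = new TextEntryDialog;
t.setCaption("Pipes Search");
t.setMessage("Please enter search term:");

// Continue only if the dialog was confirmed (the user clicked OK).
if (t.showDialog())
{
    // Append the entered tag to the Tagosphere pipe's RSS URL.
    var s = "http://pipes.yahoo.com/pipes/pipe.run?_id=mFZPs1l33BGJYdGGn0artA&_render=rss&tag=";
    s += t.getText();

    // Open the resulting feed in Strata.
    HostApp.openWeb(s);
}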

3. Save the script, then go to Tools > Run Script/Query.

As you see, a dialog opens where you can enter your tag. Enter the tag, click OK and up pops the feed in a table format.
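One thing to watch: the script appends the tag exactly as typed, so a tag containing spaces or characters like "&" or "+" could produce a malformed URL. Below is a small sketch of building the same URL with the tag escaped, written in plain JavaScript for a standard runtime. The buildPipeUrl helper is hypothetical, and whether Strata's script engine exposes encodeURIComponent is an assumption I haven't verified.

```javascript
// Base URL of the Tagosphere pipe, copied from the script above.
const PIPE_URL =
  "http://pipes.yahoo.com/pipes/pipe.run?_id=mFZPs1l33BGJYdGGn0artA&_render=rss&tag=";

// Hypothetical helper: trim the tag and percent-encode it so that
// spaces and reserved characters survive as part of the query value.
function buildPipeUrl(tag) {
  return PIPE_URL + encodeURIComponent(tag.trim());
}

console.log(buildPipeUrl("kirix"));
console.log(buildPipeUrl("data browser"));
```

In a Strata script, the result would be passed to HostApp.openWeb() in place of the hand-built string, assuming the encoding function is available there.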

This example is obviously very simplistic. But, if I then take it to its logical conclusion and bring in my referrer data, remove duplicates and run it on a regular basis, I've got my own personal Pub Sub. Even better, I can stand on the shoulders of giants by using all the great stuff already written in Yahoo Pipes or Dapper or anything else that exports data as RSS or CSV.

We've got a ton of ideas that we plan on sharing with everyone, but we have been really focused on getting the Strata beta fully functional and stable. Stay tuned though, more fun stuff to come soon…

P.S. If anyone wants to play around with using Yahoo Pipes with Strata and needs any help at all, please just shoot us a support email or post something in the forums. Also, if you come up with a cool app, let us know, we'd be thrilled to hear about it. Thanks!

About

Data and the Web is a blog by Kirix about accessing and working with data, wherever it is located. We have a particular fondness for data usability, ad hoc analysis, mashups, web APIs and, of course, playing around with our data browser.