Data and the Web: The Official Kirix Weblog - Part 5

Data and the Web

Kirix Strata Beta 7: Quick Filter, Data Link Refresh and Report Writer

January 7th, 2008

(NOTE: See screencast video below for a quick look at some of the new features!)

Hope everyone had a lovely holiday season!

We're happy to report that our developers provided lots of shiny new toys in our Strata stocking over this past month, including further work on Data Links, the inclusion of a “Quick Filter” mechanism and the introduction of our new report writer. Please feel free to download Strata Beta 7 and let us know what you think!

Here's more information on what's new in this latest version:

Data Links

The ability to bookmark data files is coming into its own. We've got things working pretty well on CSV and RSS files at the moment, with some more work still to do on HTML tables. Here's a general synopsis:

  1. Open a CSV or RSS table from the web.
  2. Perform your own analysis, using calculated fields or marks.
  3. Save the data URL as a simple bookmark.
  4. Click the Refresh icon or open up the bookmark in the future. Your data (and your calculations) will refresh based upon the new or updated data on the server.
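
One way to picture step 4 is that a calculated field is stored as a rule rather than as a column of values, so it can be re-evaluated whenever the rows are fetched again. The following is a minimal sketch in plain JavaScript, not Strata's implementation; the `parseCsv` helper, the sample data and the `spread` field are all hypothetical.

```javascript
// Hypothetical sketch: a "calculated field" is kept as a function, so
// re-parsing refreshed CSV text recomputes it automatically.

// Naive CSV parser: first line is the header; quoted fields are not supported.
function parseCsv(text) {
  const [header, ...lines] = text.trim().split("\n");
  const names = header.split(",");
  return lines.map((line) => {
    const cells = line.split(",");
    const row = {};
    names.forEach((name, i) => { row[name] = cells[i]; });
    return row;
  });
}

// Apply every calculated field to every row and return new row objects.
function applyCalculatedFields(rows, calculated) {
  return rows.map((row) => {
    const out = { ...row };
    for (const [name, fn] of Object.entries(calculated)) {
      out[name] = fn(row);
    }
    return out;
  });
}

// Illustrative calculated field: daily high minus daily low.
const calculated = {
  spread: (row) => Number(row.high) - Number(row.low),
};

// Made-up data standing in for two fetches of the same URL.
const firstFetch = "day,high,low\nMon,10,7";
const secondFetch = "day,high,low\nMon,10,7\nTue,12,8";

const before = applyCalculatedFields(parseCsv(firstFetch), calculated);
const after = applyCalculatedFields(parseCsv(secondFetch), calculated);
console.log(before.length, after.length);
```

Because the rule lives apart from the data, the second fetch picks up the new row without the field being redefined.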

We've been finding this quite useful internally, particularly in relation to analyzing our web log data. Check out the screencast below for further info.

Report Writer

With Beta 7, we are also introducing our new report writer.

You can create your report in a design view (similar to a template) and then toggle to a layout view for a preview of what you'll see when you print. As a bonus, the layout view enables you to manipulate and format your data directly, instead of being bound to a “print preview” mode.

Another cool thing is that, besides creating reports from data in your project, you can also create reports directly from external data, such as local CSVs or MySQL tables. (First go to File > Create Connection, then you can select it as your source data in the report writer). Check out the screencast below for a quick demo of the report writer in action.

Please note that there are a few known bugs with Report Writer in Beta 7. These include:

  • When using groups, the first group does not display properly.
  • The layout view can be extremely slow when using large files. Now that we've got some big features in, optimizations will soon follow.
  • Items in the Report Header in the design view do not display properly on the top of the page.

Other Enhancements

Here are some of the other improvements that have been implemented:

  • Quick filter allows tables to be filtered really easily (see screencast below for a quick demonstration).
  • Quick import for MS Access and Kirix Package files via the File > Open command instead of File > Import.
  • Support for CSV files with Unicode character sets.
  • CSV auto-sensing determines the field delimiter so lots of different delimited files are parsed and opened automatically (e.g. comma, tab, semi-colon, colon, pipe, tilde).
  • A bunch of scripting additions, including functions to access a database table list and table structure information. We've also added functions to encrypt/decrypt strings.
  • Automatic plugin detection (Strata now doesn't need to reinstall programs like Flash plug-ins if you have already downloaded them for other browsers).
  • Streamlined extension installation and uninstallation.
  • A new “loading” icon that appears on tabs while web pages are being downloaded.
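
For the curious, here is roughly how CSV delimiter auto-sensing can work. This is a hypothetical sketch in plain JavaScript and not the actual heuristic inside Strata: it samples the first few lines and picks the candidate character that appears the same non-zero number of times on each of them. It ignores quoted fields, which a real implementation would have to handle.

```javascript
// Hypothetical delimiter sniffer; Strata's real heuristics may differ.
const CANDIDATES = [",", "\t", ";", ":", "|", "~"];

// Count occurrences of a single character in a line.
function countChar(line, ch) {
  return line.split(ch).length - 1;
}

// Pick the candidate that appears the same non-zero number of times on
// every sampled line; when several qualify, prefer the higher count.
function detectDelimiter(text, sampleSize = 10) {
  const lines = text
    .split(/\r?\n/)
    .filter((l) => l.length > 0)
    .slice(0, sampleSize);
  let best = null;
  let bestCount = 0;
  for (const ch of CANDIDATES) {
    const counts = lines.map((l) => countChar(l, ch));
    const first = counts[0];
    const consistent = first > 0 && counts.every((c) => c === first);
    if (consistent && first > bestCount) {
      best = ch;
      bestCount = first;
    }
  }
  return best; // null when no candidate is consistent
}

console.log(JSON.stringify(detectDelimiter("a;b;c\n1;2;3")));
```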

Please check out this screencast, which provides an overview of Data Links, Quick Filter and Report Writer:

(And here's an embeddable YouTube version…)

NOTE: For those interested, here is the Yahoo URL used in this screencast. Check out Gummy Stuff's extremely useful Yahoo Stock Ticker CSV API site for further information.

Thanks for downloading it and giving it a spin. Please let us know if you run into any bugs or need help with anything!

Kirix Strata Beta 6: Introducing “Data Bookmarks”

November 21st, 2007

We've got the new Strata beta 6 ready and available for download so you can play with the new features before the legendary tryptophan coma kicks in tomorrow (at least for our U.S. readers).

We've got a couple big features that we've added to this beta:

Data Bookmarks

Ever since we've seen web data transformed into nice, usable tables, we've longed for the ability to a) link to the data and b) refresh it.

Today, we've taken a big step in this direction. You can now link directly to data tables on the web. The two formats we support in this version are RSS and CSV, but more will follow in the coming weeks.

So, for a quick example, copy the following URL and paste it into Strata's URL bar:

http://finance.google.com/finance/historical?q=NASDAQ:AAPL

Google Finance provides historical stock information, in this case for Apple. This website also provides a “Download to Spreadsheet” link that, if clicked in Strata, will open up the CSV file in a table. In previous beta versions, you couldn't link to these tables directly. However, in this version, you'll see that it is just a regular ol' URL waiting to be bookmarked:

http://finance.google.com/finance/historical?q=NASDAQ:AAPL&output=csv

And, with regular ol' URLs, you can do things like open them up tomorrow and view updated data. Or, you can share the links with friends and co-workers. You can also do things like create a calculated field that will automagically recalculate tomorrow when the data changes.
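
As a small illustration of why plain URLs are handy here, the CSV link above is just the page URL with one extra query parameter. The sketch below is plain JavaScript; the `output=csv` parameter comes from the Google Finance example in this post, and other sites will use their own conventions.

```javascript
// Append output=csv to a page URL, as in the Google Finance example above.
// Only the simple cases are handled: a URL with or without an existing query.
function toCsvUrl(pageUrl) {
  const separator = pageUrl.includes("?") ? "&" : "?";
  return pageUrl + separator + "output=csv";
}

const page = "http://finance.google.com/finance/historical?q=NASDAQ:AAPL";
console.log(toCsvUrl(page));
```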

We've got these data links up and running for RSS feeds too, say:

http://www.digg.com/rss/index.xml

However, we don't have the saved calculated field functionality in for RSS just yet. Also, right now the “refresh” button in the toolbar isn't hooked up to these links, but it will be soon.

Because URLs are such a simple concept that everyone understands, we envision that this type of thing could be quite useful — particularly in a corporate environment. Instead of creating a daily CSV/TXT extract that must constantly be downloaded and loaded into Excel/Access on the desktop, an analyst could just create a data link in Strata and click the bookmark each time they wish to access the data.

Got other ideas for how you might use them? Please let us know what you think.

Automatic Update Notices

We've added a feature that will provide a notification of a new beta version without you ever needing to leave the comforts of your own Strata. Hopefully this will more efficiently keep everyone up to date with the latest and greatest. This doesn't mean you should unsubscribe from this blog though. :)

Other Features/Fixes

  • The Bookmarks Toolbar continues to get enhanced. For instance, now you can drag links from the URL bar directly to the link bar, just like you'd normally do in Firefox or Internet Explorer.
  • Downloads from the web are now hooked up to the Job Manager.
  • If a job is running, you can now stop it with the Stop button (if multiple jobs are running, the button opens the Job Manager instead).

We really appreciate all the comments and feedback. Please keep it up, it's fantastic!

Pill Bugs, Potato Bugs or Doodlebugs?

November 14th, 2007

If you haven't checked out what the fine folks at Many Eyes are doing with “community” data visualization, it is well worth a peek. I took a look at one of their recent blog posts today regarding the new map visualizations they are offering. Very nice stuff indeed.

However, the thing that really caught my [many] eye[s] was the mention of a data set denoting “the regional slang for those odd little bugs that curl into balls.” This is definitely not something that keeps me up at night, but I've always wondered about the monikers of these benign little creatures. I grew up calling them “pill bugs.” However, probably due to some deep psychological desire to be accepted in elementary school, I eventually started referring to them as roly-polies, since that is what my friends called them.

I dug up the visualization in question and was pleased to see that I probably wasn't the only kid in Chicagoland who may have struggled with these entomological naming conventions; in fact, there are a bunch of other names given to these li'l guys across the US. For posterity, Illinoisans call this thing a Pill Bug, Roly-Poly, Potato Bug, Sow Bug and Doodlebug 15%, 41%, 7%, 6% and 3% of the time, respectively.

The one downside that I've encountered with Many Eyes is that they only seem to provide the underlying data set in a .txt file (albeit in tab-delimited format). Kirix Strata™ doesn't recognize the tsv-in-txt's-clothing just yet, so you need to save the file to the desktop and then open it up in Strata (selecting a text-delimited file type with a tab delimiter). But, once you get it in there, fire up your analytical skills and manipulate away.

Also, please report any bugs while you're at it. ;)

Edit: Per Wikipedia, there are a bunch of other common names associated with these insects, including my favorite, “cheeselog.”

Kirix Strata Beta 5 Now Available

November 6th, 2007

It's been a while since the last beta download was released, but the time has been well spent cleaning up bugs and creating some fairly time-intensive features. There are two big changes, both related to the GUI…

Kirix Strata Link Bar

All Your Links Are Belong To Us

We've been working hard on the concepts of “links” in Strata. In a normal web browser, a link is merely a pointer to a URL. In a data browser, you also want to be able to point to things like HTML tables, Oracle database tables, a CSV file on a network drive and, of course, traditional URLs. The holy grail, of course, is when you can also start “refreshing” these data links so that you always have the latest data to work with.

We don't have the refreshable links just yet, but we do have the very first incarnation of our Links/Bookmarks Toolbar. It's not perfect yet, but we'll be making it even better in the next week.

A few of the bigger known issues:

  • You cannot yet drag a link from the URL bar directly to the Link Toolbar. For now, to create a link, you need to use the “Star” icon next to the URL. However, note that you can drag items from your Project Tree into the Link Toolbar to create shortcuts to tables, queries, scripts, etc.
  • You can create a Folder on the Link Toolbar by right-clicking and selecting “New Folder.” You can then drag individual links into the Folder — but you need to drag it into the white “drop” area for this to work.
  • The right-click menus have been disabled within the folders, which means you can't delete links, move links or create subfolders.

We're glad to finally have this feature in the software. We hope that, despite the rough edges, you'll find it as useful as we do.

Virtual Mr. Potato Head

So you have this thing called a data browser that does both a lot of “data” stuff and a lot of “web” stuff. Unfortunately, when you have all this functionality that traverses these different genres, you have a lot of classic interface elements that are just askin' to be squeezed in. But, alas and alack, if you just combine everything, you end up with a large, flaming ball of complexity. A lot of very worthy interface candidates end up fighting for a limited supply of screen real estate.

But design limitations aren't necessarily a bad thing. In the end, we've tried to make some choices that provided the best bang for the buck. However, we're still tweaking things… so if you have some ideas for improvement, speak now or forever hold your peace.

Here are the main areas we changed around:

Navigation Toolbar

The top toolbar is now used primarily for navigation. Your standard browser controls are on the left, you can search with the find box on the right and you can switch your views by toggling the icon on the far right.

File/Links Toolbar Combo

The second toolbar is mainly used for links and bookmarks, as discussed above. However, we also added in the most common elements for working with files and tables, like you'd encounter in a spreadsheet. This includes the new/open/save icons along with buttons for standard data operations (sorting, create calculated fields, etc).

Status Bar

We've redesigned this to provide a bit more space so we could include easy access to the various “panels” in the software. This area makes it easy to toggle on/off the project tree, the column list, marks, etc.

We're still rounding things out a bit; here are a few of the known issues:

  • Please avoid turning toolbars on/off in the View menu or you'll get a very bad display bug which requires a bit of monkeying around to undo.
  • In the status bar, we've added a “format” panel; however, this is just a placeholder at the moment. The wiring hasn't been hooked up just yet.
  • In the status bar, the toggling doesn't always work correctly.
  • Some of the icons need to be re-worked for clarity.

So, please go ahead and download the new version and let us know what you think via our forums or feedback form. It's really, really appreciated. Thanks!

Finding Data Tables on the Web

November 3rd, 2007

I'm slightly (fashionably?) late to this party, but I just came across a new website called GraphWise that sets out to be the search engine for tabular data. In a recent press release, they state, “…if you want to search for videos, you go to YouTube, and if you want music, you go to iTunes. If you're looking for tables of data we aim for users to go to GraphWise.” The comparison may not be entirely accurate since YouTube and iTunes search only their own catalogs, but the vision has some potential if they can pull it off.

Currently, when I look for a data set on the web, I start with these standard tactics:

  1. Google Search, by keyword only
  2. Google Search, by keyword with file type qualifier (e.g., filetype:csv)
  3. Delicious Search, by keyword
  4. Delicious Search, by tag (e.g., publicdata)
  5. Data “Repository” Search, such as Swivel, Data360 or ManyEyes

GraphWise provides an additional option to find data. It apparently spiders data (from HTML tables, CSV files, licensed sources and user uploads), then imports and normalizes the data and, ultimately, develops graphs based on the data (similar to Swivel or Data360). I rarely have need for auto-generated visualizations, but I really like the fact that they provide the URL to the original source table. With Kirix Strata™, it's obviously a piece of cake to just import the raw table and start using it.

I did have some trouble finding useful data sets based on my search queries (forgivable, as the service is still in beta). For instance, in my previous blog post, we needed to find area code data in tabular format. So, I searched for US Area Codes in GraphWise, but got nothing even close to what I was looking for. For a simpler example, I searched for Apple's stock price. It looks like GraphWise licenses historic stock information from a company called CSI, but it only displayed the data in bite-sized chunks. I know I can easily download the full set of Apple's historical stock data via CSV at Yahoo Finance, but that wasn't listed as a resource.

It appears GraphWise has done well with the spidering technology to identify and capture table information across the web. The next big step will be to make the search queries more relevant. Because HTML and CSV files aren't often linked to directly, it would be really difficult to apply the kind of PageRank algorithm that makes Google so valuable. I can imagine some other issues as well, like trying to separate a table name (if available) and the actual text within a given table. Hopefully they'll be able to overcome these hurdles; it would be great to have a Google-like place to identify tabular data on the web.

(via Swivel)

Mr. MacGyver, Meet Kirix Strata

October 16th, 2007

(NOTE: Screencast of this exercise is available below.)

A few days ago, the always datariffic folks at Juice Analytics posted an article about MacGyver-ing call volume data and pushing it into an online mapping application called Mapeteria. Basically, they were doing some ad hoc data visualization comprised of public web data, private phone call data and a web service that provided the visualization (which in turn used the Google Maps API).

Huh… local data, web data and web APIs? Sounds like a perfect application for a data browser (well, it would've been perfect if the web service accepted a POST command, but I digress). A data browser enables you to easily access web data, combine it with local data, perform any required data clean up and then push/pull data from the web — without ever leaving the tool.

It also would've saved Juice a bit of time, particularly with grabbing area codes and prepping that file. Let's look at the four steps they went through and we'll see how Kirix Strata™ might improve the experience:

1. Pull out the area codes.

The data had phone number values like “12345678901” as well as “2345678901”, so they used the following formula to pull out the area codes using Excel:

=VALUE(IF(LEFT(E7,1)="1",MID(E7,2,3),MID(E7,1,3)))

Strata would use a similar formula:

iif(left(tel,1)="1",substr(tel,2,3),substr(tel,1,3))

The main time savings here (particularly with large files) is that the calculated field populates automatically for every record in Strata, instead of needing to paste formulas. OK… not terribly exciting thus far.
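
For anyone who thinks in script rather than formulas, the same rule can be written as a small function. This is a plain JavaScript sketch for comparison only. Note that the MID and substr positions in the formulas above start at 1, whereas JavaScript's `slice` starts at 0.

```javascript
// Mirror of the formulas above: skip a leading "1", then take three digits.
function areaCode(tel) {
  const digits = String(tel);
  return digits.startsWith("1") ? digits.slice(1, 4) : digits.slice(0, 3);
}

console.log(areaCode("12345678901"), areaCode("2345678901"));
```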

2. Convert area codes into states

This is a multi-part step:

a) Locate a table from the web that has area code data associated with a state ID (while fending off parasitic scammers).
b) Clean up the table as necessary.
c) Do a lookup from the phone call data that adds in the state where the call originated from.

Strata can really cut down the amount of time spent on this step. Because of the website used, the folks at Juice surely had to create their lookup table manually. I went to Delicious, searched for “area codes” and found this very useful website, which had all the data in a nice HTML table. With Strata, I simply right-clicked and selected “Import Data” and immediately had the table I needed for the lookup.

Finally, I created a relationship between my two tables and dragged in the state codes (e.g., CA, IL, NY, etc.) into the phone call data.
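
Conceptually, the relationship works like a keyed lookup from area code to state. Here is a minimal sketch in plain JavaScript, assuming a tiny hand-picked lookup table and made-up call records; the `tel` field name follows the formula in step 1, and everything else is hypothetical.

```javascript
// Illustrative lookup table: area code -> state (a tiny hand-picked subset).
const areaCodeToState = new Map([
  ["212", "NY"],
  ["312", "IL"],
  ["415", "CA"],
]);

// Made-up call records.
const calls = [
  { tel: "13125550100" },
  { tel: "4155550101" },
  { tel: "9995550102" }, // no matching area code in the lookup table
];

// Same rule as step 1: skip a leading "1", then take three digits.
function areaCode(tel) {
  return tel.startsWith("1") ? tel.slice(1, 4) : tel.slice(0, 3);
}

// Left-join style lookup: rows without a match keep a null state.
const withState = calls.map((call) => ({
  ...call,
  state: areaCodeToState.get(areaCode(call.tel)) || null,
}));

console.log(withState.map((c) => c.state));
```

Keeping unmatched rows with a null state, instead of dropping them, makes it easy to spot phone numbers that the lookup table does not cover.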

3. Create a summary data set

This was done using a pivot table in Excel. Strata doesn't have classic pivot tables in its feature set at this point, but it does have a nice li'l grouping utility. So, once I knew what csv format was required for the Mapeteria web service, I grouped the data accordingly.
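
The grouping step boils down to counting rows per state and writing the result out as CSV. The sketch below is plain JavaScript with made-up records; the `state,calls` header is a hypothetical stand-in, since the exact column layout Mapeteria expects is not covered in this post.

```javascript
// Made-up call records that already carry a state code.
const calls = [
  { state: "IL" }, { state: "CA" }, { state: "IL" },
  { state: "NY" }, { state: "IL" },
];

// Count rows per distinct value of the given key.
function countBy(rows, key) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row[key], (counts.get(row[key]) || 0) + 1);
  }
  return counts;
}

// Serialize the counts as CSV, sorted by key for a stable result.
function toCsv(counts, header) {
  const lines = [...counts.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, n]) => `${key},${n}`);
  return [header, ...lines].join("\n");
}

const csv = toCsv(countBy(calls, "state"), "state,calls");
console.log(csv);
```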

4. Create colorized map of the U.S.

This is the “almost perfect” part I referred to above.

Though Mapeteria is a very cool visualization service using Google Maps, it needs to fetch a CSV file embedded in a URL from elsewhere on the web. If the service was able to accept data via a POST command (or something like an “Upload Data” button), Strata would have been able to just take the table we created and push it to the web service, no csv transformation required (in fact, we've got some stuff cooking in our labs that would make this as easy as copy and paste). And, if we were just able to push the data out like this, we would have immediately gotten the map without ever leaving our data browser.

But, like Zach at Juice, I had to save the file in a CSV format and then upload it to a server before I was able to get my map. Here's a screencast of the entire process… once I found the area code data on the web, it took less than 5 minutes to get my map.

(And here's an embeddable YouTube version…)

If anyone wants to try this process out for themselves, please feel free to download Strata and give it a try. This data browser is in beta and completely free to use; we're also giving away free full licenses to anyone who provides feedback during the beta period. Oh, and here is the sample phone call volume data I used for this exercise:

Click here to download Phone Call Volume Sample Data (.csv, 10KB)

This is a pretty simple example of how Strata can be used for ad hoc data access and manipulation with data from the web (or, as one can imagine, within a corporate intranet) and make this kind of analysis very efficient. Throw in some web services, web APIs or very large files into the mix, and you've got the chance to do some fairly interesting things.

As always, if anyone has any questions, either post in the comments below or in our support forums… or just shoot us a support email. Thanks!

Playing Nice with Yahoo Pipes

October 10th, 2007

Yahoo Pipes is a pretty slick tool that makes it easy to combine and mash up data sources from around the web and then output the data into formats like RSS and JSON. One of the really nice things is its interface, which lets non-programmers lurk and meddle in this otherwise fearsome domain.

Today I came across a post by tagaficionado Jon Udell, who was looking for a way to combine multiple feeds (based on a single tag) into a single feed for consumption. Within an hour and a half, a person named engtech created a Yahoo Pipe called Tagosphere to solve the problem. Pop in the tag you want, hit Run and get your results. Very cool.

To digress for a moment, one of the pet projects I've had on my (long) to do list is to use Kirix Strata™ to create an application that alerts me when someone references “kirix” in a blog post, article, or elsewhere on the web. I currently do this by subscribing to feeds from Google News, Google Blog Search, Technorati, Bloglines, Topix, Digg, etc. This is fine, but a bit clunky due to the many duplicate entries. It also is not comprehensive.

So the other thing I want to do is bring in my website referrers from AWstats or Google Analytics (or our raw apache web logs). Lots of times we'll see people coming to our site from blogs, forum posts or websites that never get picked up by those above-referenced feeds. So then, I would just need to combine all the data, remove duplicates, timestamp it… and now I have a pretty comprehensive idea of where the latest buzz is coming from.

So, the Tagosphere Pipe mentioned above is a pretty good start. I can create a feed for “kirix” and get a combined set of data with the duplicates removed. However, because I want to sort and filter this dataset, I need to get it into Strata. I could just manually go to the Tagosphere page in Strata and click on the RSS feed to get my table. However, because I'm looking at actually using this Pipe for a future application, I decided it would be nice to show how Strata can work directly with the Yahoo Pipe via a script:

1. In Strata, go to File > New > Script.

2. Copy the following text into the script tab:

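// Ask the user for the tag to search on.
var t = new TextEntryDialog;
t.setCaption("Pipes Search");
t.setMessage("Please enter search term:");
if (t.showDialog())
{
    // Append the tag to the Tagosphere Pipe URL (RSS output).
    var s = "http://pipes.yahoo.com/pipes/pipe.run?_id=mFZPs1l33BGJYdGGn0artA&_render=rss&tag=";
    s += t.getText();

    // Open the resulting feed in Strata.
    HostApp.openWeb(s);
}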

3. Save the Script then go to Tools > Run Script/Query

As you see, a dialog opens where you can enter your tag. Enter the tag, click OK and up pops the feed in a table format.
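
One thing to watch: the script appends the tag exactly as typed, so a tag containing a space or an ampersand would produce a malformed query string. A hedged refinement is to escape the tag first. The sketch below is plain JavaScript; `buildPipeUrl` is a hypothetical helper, and whether Strata's script engine exposes the standard `encodeURIComponent` function is an assumption worth checking before relying on it.

```javascript
// Pipe URL taken from the script above; the tag is appended at the end.
const PIPE_URL =
  "http://pipes.yahoo.com/pipes/pipe.run?_id=mFZPs1l33BGJYdGGn0artA&_render=rss&tag=";

// Hypothetical helper: trim the tag and percent-encode it before appending.
function buildPipeUrl(tag) {
  return PIPE_URL + encodeURIComponent(tag.trim());
}

console.log(buildPipeUrl("data browser"));
```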

This example is obviously very simplistic. But, if I then take it to its logical conclusion and bring in my referrer data, remove duplicates and run it on a regular basis, I've got my own personal Pub Sub. Even better, I can stand on the shoulders of giants by using all the great stuff already written in Yahoo Pipes or Dapper or anything else that exports data as RSS or CSV.
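
The "combine everything and remove duplicates" step could look something like the following. It is a minimal sketch in plain JavaScript with made-up rows standing in for feed items and referrer entries; using the link as the duplicate key is an assumption, and a real version would probably normalize URLs first.

```javascript
// Made-up items standing in for rows from several sources.
const feedItems = [
  { link: "http://example.com/a", title: "Post A", date: "2007-10-09" },
  { link: "http://example.com/b", title: "Post B", date: "2007-10-10" },
];
const referrerItems = [
  { link: "http://example.com/b", title: "Post B (via referrer log)", date: "2007-10-10" },
  { link: "http://example.org/c", title: "Forum thread C", date: "2007-10-08" },
];

// Merge all sources, keep the first row seen for each link, newest first.
function mergeUnique(...sources) {
  const seen = new Map();
  for (const item of sources.flat()) {
    if (!seen.has(item.link)) {
      seen.set(item.link, item);
    }
  }
  // ISO-style dates sort correctly as plain strings.
  return [...seen.values()].sort((x, y) =>
    x.date < y.date ? 1 : x.date > y.date ? -1 : 0
  );
}

const merged = mergeUnique(feedItems, referrerItems);
console.log(merged.map((item) => item.link));
```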

We've got a ton of ideas that we plan on sharing with everyone, but have been really focused on getting the Strata beta fully functional and stable. Stay tuned though, more fun stuff to come soon…

P.S. If anyone wants to play around with using Yahoo Pipes with Strata and needs any help at all, please just shoot us a support email or post something in the forums. Also, if you come up with a cool app, let us know, we'd be thrilled to hear about it. Thanks!

Kirix Strata Beta 4 Now Available

September 24th, 2007

It's been a few weeks since our last beta release, but the time has been spent well. We've focused on fixing a wide-ranging number of annoying bugs; the new Beta 4 is available for download now.

Here is a sampling of features and bug fixes that have made their way into this latest version:

  • ATOM Support as suggested in this forum post
  • A bunch of RSS compatibility fixes
  • SQL compatibility updates
  • Scripting now includes a DOM API subset
  • Lots of keyboard navigation work
  • The structure editor view now allows field reordering
  • Strata can now handle websites with unsigned certificates
  • mailto: links handled properly as reported in this forum post
  • Tons of other bug fixes, including the notorious appmain shutdown zombie

Keep the bug reports and suggestions a-comin'. Thanks!

Kirix Strata Beta 3 Now Available

August 30th, 2007

Hello all, we're quite happy to announce that we've got our third beta version up and available for download. Here are the major features/bug fixes that made it into this version:

  • Vastly improved Oracle connectivity.
  • A bunch of MySQL connectivity bug fixes.
  • Import wizard refinements, including a better interface for choosing what type of file you wish to import.
  • Loads of other bug fixes based on user feedback.

Also, we've done some work in the area of scripting and extensions that should be highlighted:

  • Extensions can now be enabled/disabled from the extension manager.
  • With scripting, you are now able to load and execute function calls to DLLs and shared libraries — a special shout out to kapex01 for suggesting this feature, we think it adds a lot of really cool possibilities for using Strata as a platform for combining the best of desktop with the best of the web.
  • Also now with Strata's scripting, you can directly access window handles for forms and controls.

Thanks again to everyone for posting bugs and feature suggestions in the forums and via our bug feedback form, it is a tremendous help!

Embedded phpBB Search Terms within Apache Web Logs

August 24th, 2007

This afternoon I was doing some analysis on our web logs and thought it may make for a good screencast and blog post. We currently use a combination of AWstats and Google Analytics for our web stats but are increasingly using Kirix Strata™ to dig deeper into the raw web logs for the more customized things that aren't readily available otherwise.

Also, honestly, it is kind of fun to plow through almost a million records on your own. Hmmm, maybe I should get out more.

The topic of the screencast below is the search terms people enter to find things in our phpBB3 support forums. These terms are embedded in the “request” field of the apache logs and I couldn't find a way to get them without digging into the logs themselves (NOTE: I wouldn't doubt that there is some way to do this via a mod to phpBB or a filter in Google Analytics… but since I couldn't find anything via a quick Google search, using Strata just ended up being a lot faster).

An example of a search string we're dealing with is:

GET /forums/search.php?keywords=proxy HTTP/1.1

So the trick was to parse the search keywords out of the field and then group them together to see what people were searching for… and in turn give us the chance to improve our support area by targeting some of these search terms and expanding our documentation accordingly.
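
Outside of Strata, the same parse-then-group idea can be sketched in a few lines of plain JavaScript. This is an illustration rather than the method used in the screencast: the first sample request is the one shown above, the others are made up, and lower-casing the keywords before grouping is my own assumption.

```javascript
// Pull the "keywords" value out of a request string, or null if absent.
function searchKeywords(request) {
  const match = /[?&]keywords=([^&\s]*)/.exec(request);
  if (!match) return null;
  // Form submissions encode spaces as "+"; handle those before percent-escapes.
  const raw = match[1].replace(/\+/g, " ");
  try {
    return decodeURIComponent(raw).toLowerCase();
  } catch (err) {
    return raw.toLowerCase(); // malformed escape sequence: keep the raw text
  }
}

// Group identical keywords and return [keyword, count] pairs, most frequent first.
function countKeywords(requests) {
  const counts = new Map();
  for (const request of requests) {
    const keywords = searchKeywords(request);
    if (keywords) counts.set(keywords, (counts.get(keywords) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const sample = [
  "GET /forums/search.php?keywords=proxy HTTP/1.1",
  "GET /forums/search.php?keywords=Proxy&sid=abc HTTP/1.1",
  "GET /forums/search.php?keywords=mysql+import HTTP/1.1",
  "GET /forums/index.php HTTP/1.1",
];

const grouped = countKeywords(sample);
console.log(grouped);
```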

Hope this video proves helpful:

(And here's an embeddable YouTube version…)

TECHNICAL NOTE:

I downloaded the Apache logs from the server and, due to the file size, decided to import them into Strata rather than open the file and work with it directly. To import your logs, go to Import, select text-delimited files, and then import as space delimited with quotation marks as the text qualifier. Update: You can now use a handy little log parsing extension to pull in your web log files without having to mess around with a straight text import.
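
To make the import settings concrete, here is what "space delimited with quotation marks as the text qualifier" does to a single log line. This is a plain JavaScript sketch, not Strata's importer; the sample line is made up, and escaped quotes inside a quoted field are not handled.

```javascript
// Split a line on the delimiter, except inside qualifier-quoted sections.
function splitQuoted(line, delimiter = " ", qualifier = '"') {
  const fields = [];
  let current = "";
  let inQuotes = false;
  for (const ch of line) {
    if (ch === qualifier) {
      inQuotes = !inQuotes; // the qualifier toggles quoted mode and is dropped
    } else if (ch === delimiter && !inQuotes) {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Made-up line in the common Apache access log layout.
const line =
  '127.0.0.1 - - [24/Aug/2007:14:02:11 -0500] "GET /forums/search.php?keywords=proxy HTTP/1.1" 200 5120';

const fields = splitQuoted(line);
console.log(fields);
```

Note that the bracketed timestamp is not quoted, so it ends up split across two fields, while the whole request stays together in one.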

TECHNICAL NOTE 2:

For posterity, here are the functions that were used in this screencast:

STRPART(string, section [, delimiter])
SUBSTR(string, start [, length])
CONTAINS(string, search string)
IIF(boolean test, true value, false value)

About

Data and the Web is a blog by Kirix about accessing and working with data, wherever it is located. We have a particular fondness for data usability, ad hoc analysis, mashups, web APIs and, of course, playing around with our data browser.