2007 November | Data and the Web


Archive for November, 2007

Kirix Strata Beta 6: Introducing “Data Bookmarks”

Wednesday, November 21st, 2007

We've got the new Strata beta 6 ready and available for download so you can play with the new features before the legendary tryptophan coma kicks in tomorrow (at least for our U.S. readers).

We've added a couple of big features to this beta:

Data Bookmarks

Ever since we first saw web data transformed into nice, usable tables, we've longed for the ability to a) link to the data and b) refresh it.

Today, we've taken a big step in this direction. You can now link directly to data tables on the web. The two formats we support in this version are RSS and CSV, but more will follow in the coming weeks.

So, for a quick example, copy the following URL and paste it into Strata's URL bar:

http://finance.google.com/finance/historical?q=NASDAQ:AAPL

Google Finance provides historical stock information, in this case for Apple. This website also provides a “Download to Spreadsheet” link that, if clicked in Strata, will open up the CSV file in a table. In previous beta versions, you couldn't link to these tables directly. However, in this version, you'll see that it is just a regular ol' URL waiting to be bookmarked:

http://finance.google.com/finance/historical?q=NASDAQ:AAPL&output=csv

And, with regular ol' URLs, you can do things like open them up tomorrow and view updated data. Or, you can share the links with friends and co-workers. You can also do things like create a calculated field that will automagically recalculate tomorrow when the data changes.
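For readers who'd rather script the same idea, here is a minimal Python sketch of a "data bookmark": re-read a CSV URL on demand and recompute a calculated field each time. The column names (Date, Open, Close) and the sample rows are assumptions made up for illustration, not the actual layout of the Google Finance export, and `load_bookmark` is a hypothetical helper, not something that ships with Strata.

```python
import csv
import io
import urllib.request


def parse_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def add_daily_change(rows):
    """Add a calculated field: Close minus Open (column names are assumed)."""
    for row in rows:
        row["Change"] = round(float(row["Close"]) - float(row["Open"]), 2)
    return rows


def load_bookmark(url):
    """Fetch the CSV behind a URL and recompute the calculated field."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8-sig")
    return add_daily_change(parse_csv(text))


# Made-up sample rows standing in for a downloaded file.
SAMPLE = "Date,Open,Close\n2007-11-20,10.00,10.50\n2007-11-21,10.50,10.25\n"
rows = add_daily_change(parse_csv(SAMPLE))
print([r["Change"] for r in rows])
```

Calling `load_bookmark` with the same URL tomorrow would pick up whatever the server returns at that point, which is the whole appeal of treating the data as a link rather than a saved file.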

We've got these data links up and running for RSS feeds too, say:

http://www.digg.com/rss/index.xml

However, we don't have the saved calculated functionality in for RSS just yet. Also, right now the “refresh” button in the toolbar isn't hooked up to these links, but it will be soon.
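To give a feel for what "an RSS feed as a table" means, here is a rough Python sketch that flattens the items of an RSS 2.0 feed into rows. It only uses the standard library; the choice of columns (title, link, pubDate) and the tiny sample feed are assumptions for illustration, and this says nothing about how Strata itself does the conversion.

```python
import xml.etree.ElementTree as ET


def rss_to_rows(xml_text):
    """Return one dict per <item>, with title, link and pubDate columns."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("item"):
        rows.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pubDate": item.findtext("pubDate", default=""),
        })
    return rows


# Made-up feed used only to exercise the function.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>First story</title><link>http://example.com/1</link></item>
<item><title>Second story</title><link>http://example.com/2</link></item>
</channel></rss>"""

feed_rows = rss_to_rows(SAMPLE_FEED)
for row in feed_rows:
    print(row["title"], row["link"])
```

Missing elements simply become empty cells, so every row ends up with the same columns.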

Because URLs are such a simple concept that everyone understands, we envision that this type of thing could be quite useful — particularly in a corporate environment. Instead of creating a daily CSV/TXT extract that must constantly be downloaded and loaded into Excel/Access on the desktop, an analyst could just create a data link in Strata and click the bookmark each time they wish to access the data.

Got other ideas for how you might use them? Please let us know what you think.

Automatic Update Notices

We've added a feature that will provide a notification of a new beta version without you ever needing to leave the comforts of your own Strata. Hopefully this will more efficiently keep everyone up to date with the latest and greatest. This doesn't mean you should unsubscribe from this blog though. :)

Other Features/Fixes

  • The Bookmarks Toolbar continues to get enhanced. For instance, now you can drag links from the URL bar directly to the link bar, just like you'd normally do in Firefox or Internet Explorer.
  • Downloads from the web are now hooked up to the Job Manager.
  • If a job is running, you can now stop it with the Stop button (if multiple jobs are running, the button opens the Job Manager instead).

We really appreciate all the comments and feedback. Please keep it up, it's fantastic!

Pill Bugs, Potato Bugs or Doodlebugs?

Wednesday, November 14th, 2007

If you haven't checked out what the fine folks at Many Eyes are doing with “community” data visualization, it is well worth a peek. I took a look at one of their recent blog posts today regarding the new map visualizations they are offering. Very nice stuff indeed.

However, the thing that really caught my [many] eye[s] was the mention of a data set denoting “the regional slang for those odd little bugs that curl into balls.” This is definitely not something that keeps me up at night, but I've always wondered about the monikers of these benign little creatures. I grew up calling them “pill bugs.” However, probably due to some deep psychological desire to be accepted in elementary school, I eventually started referring to them as roly-polies, since that is what my friends called them.

I dug up the visualization in question and was pleased to see that I probably wasn't the only kid in Chicagoland who struggled with these entomological naming conventions. In fact, there are a bunch of other names given to these li'l guys across the US. For posterity, Illinoisans call this thing a Pill Bug, Roly-Poly, Potato Bug, Sow Bug and Doodlebug 15%, 41%, 7%, 6% and 3% of the time, respectively.

The one downside that I've encountered with Many Eyes is that they only seem to provide the underlying data set as a .txt file (albeit in tab-delimited format). Kirix Strata™ doesn't recognize the tsv-in-txt's-clothing just yet, so you need to save the file to the desktop and then open it up in Strata (selecting a text-delimited file type with a tab delimiter). But, once you get it in there, fire up your analytical skills and manipulate away.
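If you just want to poke at a tab-delimited .txt file outside of Strata, a few lines of Python will do. This is only a sketch: the column names (Name, Percent) are assumptions rather than the real Many Eyes headers, and the sample rows reuse the Illinois percentages quoted above.

```python
import csv
import io


def read_tab_delimited(text):
    """Parse tab-delimited text (even if the file is named .txt) into dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Illinois percentages from the post; the column names are assumed.
SAMPLE = (
    "Name\tPercent\n"
    "Pill Bug\t15\n"
    "Roly-Poly\t41\n"
    "Potato Bug\t7\n"
    "Sow Bug\t6\n"
    "Doodlebug\t3\n"
)

bug_rows = read_tab_delimited(SAMPLE)
top = max(bug_rows, key=lambda r: int(r["Percent"]))
print(top["Name"])
```

The only trick is telling the parser that the delimiter is a tab; the file extension doesn't matter.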

Also, please report any bugs while you're at it. ;)

Edit: Per Wikipedia, there are a bunch of other common names associated with these creatures, including my favorite, “cheeselog.”

Kirix Strata Beta 5 Now Available

Tuesday, November 6th, 2007

It's been a while since the last beta download was released, but the time has been well spent cleaning up bugs and creating some fairly time-intensive features. There are two big changes, both related to the GUI…

Kirix Strata Link Bar

All Your Links Are Belong To Us

We've been working hard on the concepts of “links” in Strata. In a normal web browser, a link is merely a pointer to a URL. In a data browser, you also want to be able to point to things like HTML tables, Oracle database tables, a CSV file on a network drive and, of course, traditional URLs. The holy grail, of course, is when you can also start “refreshing” these data links so that you always have the latest data to work with.

We don't have the refreshable links just yet, but we do have the very first incarnation of our Links/Bookmarks Toolbar. It's not perfect yet, but we'll be making it even better in the next week.

A few of the bigger known issues:

  • You cannot yet drag a link from the URL bar directly to the Link Toolbar. For now, to create a link, you need to use the “Star” icon next to the URL. However, note that you can drag items from your Project Tree into the Link Toolbar to create shortcuts to tables, queries, scripts, etc.
  • You can create a Folder on the Link Toolbar by right-clicking and selecting “New Folder.” You can then drag individual links into the Folder — but you need to drag it into the white “drop” area for this to work.
  • The right-click menus have been disabled within the folders, which means you can't delete links, move links or create subfolders.

We're glad to finally have this feature in the software. We hope that, despite the rough edges, you'll find it as useful as we do.

Virtual Mr. Potato Head

So you have this thing called a data browser that does both a lot of “data” stuff and a lot of “web” stuff. Unfortunately, when you have all this functionality that traverses these different genres, you have a lot of classic interface elements that are just askin' to be squeezed in. But, alas and alack, if you just combine everything, you end up with a large, flaming ball of complexity. A lot of very worthy interface candidates end up fighting for a limited supply of screen real estate.

But design limitations aren't necessarily a bad thing. In the end, we've tried to make some choices that provided the best bang for the buck. However, we're still tweaking things… so if you have some ideas for improvement, speak now or forever hold your peace.

Here are the main areas we changed around:

Navigation Toolbar

The top toolbar is now used primarily for navigation. Your standard browser controls are on the left, you can search with the find box on the right and you can switch your views by toggling the icon on the far right.

File/Links Toolbar Combo

The second toolbar is mainly used for links and bookmarks, as discussed above. However, we also added in the most common elements for working with files and tables, like you'd encounter in a spreadsheet. This includes the new/open/save icons along with buttons for standard data operations (sorting, creating calculated fields, etc.).

Status Bar

We've redesigned this to provide a bit more space so we could include easy access to the various “panels” in the software. This area makes it easy to toggle on/off the project tree, the column list, marks, etc.

We're still rounding things out a bit; here are a few of the known issues:

  • Please avoid turning toolbars on/off in the View menu or you'll get a very bad display bug which requires a bit of monkeying around to undo.
  • In the status bar, we've added a “format” panel; however, this is just a placeholder at the moment. The wiring hasn't been hooked up just yet.
  • In the status bar, the toggling doesn't always work correctly.
  • Some of the icons need to be re-worked for clarity.

So, please go ahead and download the new version and let us know what you think via our forums or feedback form. It's really, really appreciated. Thanks!

Finding Data Tables on the Web

Saturday, November 3rd, 2007

I'm slightly (fashionably?) late to this party, but I just came across a new website called GraphWise that sets out to be the search engine for tabular data. In a recent press release, they state, “…if you want to search for videos, you go to YouTube, and if you want music, you go to iTunes. If you're looking for tables of data we aim for users to go to GraphWise.” The comparison may not be entirely accurate since YouTube and iTunes search only their own catalogs, but the vision has some potential if they can pull it off.

Currently, when I look for a data set on the web, I start with these standard tactics:

  1. Google Search, by keyword only
  2. Google Search, by keyword with file type qualifier (e.g., filetype:csv)
  3. Delicious Search, by keyword
  4. Delicious Search, by tag (e.g., publicdata)
  5. Data “Repository” Search, such as Swivel, Data360 or ManyEyes
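Tactic 2 is easy to script if you find yourself doing it often. Here is a small Python sketch that builds a Google search URL with an optional file type qualifier; `google_search_url` is a hypothetical helper name, and it only constructs the link (it doesn't fetch or parse any results).

```python
from urllib.parse import urlencode


def google_search_url(keywords, filetype=None):
    """Build a Google search URL, optionally restricted to one file type."""
    query = keywords
    if filetype:
        query += f" filetype:{filetype}"
    return "https://www.google.com/search?" + urlencode({"q": query})


search_url = google_search_url("us area codes", filetype="csv")
print(search_url)
```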

GraphWise provides an additional option to find data. It apparently spiders data (from HTML tables, CSV files, licensed sources and user uploads), then imports and normalizes the data and, ultimately, develops graphs based on the data (similar to Swivel or Data360). I rarely have need for auto-generated visualizations, but I really like the fact that they provide the URL to the original source table. With Kirix Strata™, it's obviously a piece of cake to just import the raw table and start using it.
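I have no idea how GraphWise's spider actually works, but the first step of any such tool (pulling the cells out of an HTML table) can be sketched with Python's standard library. The sketch below assumes reasonably well-formed markup and does not handle nested tables; `TableExtractor` and `extract_tables` are names made up for this example.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table found
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    """Return all tables in the HTML as nested lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


PAGE = (
    "<html><body><table>"
    "<tr><th>Area Code</th><th>State</th></tr>"
    "<tr><td>312</td><td>Illinois</td></tr>"
    "</table></body></html>"
)
tables = extract_tables(PAGE)
print(tables)
```

Normalizing the extracted rows (guessing column types, finding the header row, and so on) is the harder part, and this sketch doesn't attempt it.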

I did have some trouble finding useful data sets based on my search queries (forgivable, as the service is still in beta). For instance, in my previous blog post, we needed to find area code data in tabular format. So, I searched for US Area Codes in GraphWise, but got nothing even close to what I was looking for. For a simpler example, I searched for Apple's stock price. It looks like GraphWise licenses historic stock information from a company called CSI, but it only displayed the data in bite-sized chunks. I know I can easily download the full set of Apple's historical stock data via CSV at Yahoo Finance, but that wasn't listed as a resource.

It appears GraphWise has done well with the spidering technology to identify and capture table information across the web. The next big step will be to make the search results more relevant. Because HTML tables and CSV files aren't often linked to directly, it would be really difficult to apply the kind of PageRank algorithm that makes Google so valuable. I can imagine some other issues as well, like trying to separate a table name (if available) from the actual text within a given table. Hopefully they'll be able to overcome these hurdles; it would be great to have a Google-like place to identify tabular data on the web.

(via Swivel)

About

Data and the Web is a blog by Kirix about accessing and working with data, wherever it is located. We have a particular fondness for data usability, ad hoc analysis, mashups, web APIs and, of course, playing around with our data browser.