Data and the Web

Archive for the ‘videos’ Category

A Wee Bit of Housekeeping…

Friday, July 17th, 2009

We haven’t been doing much regular blogging lately, but we’re hoping this will change in the coming weeks.

In the meantime, we’ve recently done some housekeeping on our website, so if you haven’t visited recently we’d encourage you to do so. We’ve updated many pages with new content, but here are two sections in particular that we’d steer you toward:

  • Examples Section.  This is a long overdue section that puts together some quick examples of how Kirix Strata™ can be applied to common data problems.  The section is still a work in progress with more videos still to be produced.  However, we expect what we have now will prove useful to new and old Strata users alike.  Check it out.
  • Video Tutorials and Archive.  We’ve done a bunch of different videos and screencasts over the past year or so, but they’ve been posted all over our website.  This new section wrangles all of the videos together in one place for posterity.  The feature tutorials, in particular, are worth viewing as they give a more comprehensive look at how to use specific features in Strata.  Take a look.

So, in a nod to the Matrix, where one cannot be told what it is, but one must see for oneself, we’ve tried to make some high quality video documentation available.  Stay tuned for more to come.  Enjoy!

Fun (and Fraud Detection) with Benford’s Law

Tuesday, July 22nd, 2008

Benford’s law is one of those things your high school math teacher would break out on a slow, rainy day when the students’ attention span was even lower than usual.

He’d start out by asking the class to consider a list of numbers and predict how often each digit, 1 through 9, would show up as the leading digit.  The students would make some guesses and eventually come to the consensus that the nine digits would be about equally likely, roughly 11% each.

Then, the teacher would just sit back, smile, and gently shake his head at his simple-minded pupils.  He would then go on to explain Benford’s law, which would blow everyone’s mind — at least through lunchtime.

Play Benford’s Law Video

(Click the image above… or here’s an embeddable YouTube version)

Per Wikipedia:

Benford’s law, also called the first-digit law, states that in lists of numbers from many real-life sources of data, the leading digit is distributed in a specific, non-uniform way.

Specifically, in this way:

Leading Digit     Probability
      1              30.1%
      2              17.6%
      3              12.5%
      4               9.7%
      5               7.9%
      6               6.7%
      7               5.8%
      8               5.1%
      9               4.6%
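The percentages in this table come from a simple formula: the expected share of values with leading digit d is log10(1 + 1/d). Here is a minimal Python sketch that reproduces the table; the function name is just an illustrative choice, not anything from Strata or the extension.

```python
import math

def benford_probability(digit):
    """Expected share of values whose leading digit is `digit` (1-9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

# Print the table as percentages rounded to one decimal place.
for d in range(1, 10):
    print(f"{d}  {benford_probability(d) * 100:.1f}%")
```

The nine probabilities add up to 1, since the sum of log10((d + 1) / d) for d from 1 to 9 collapses to log10(10).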

Again, from Wikipedia:

This counter-intuitive result applies to a wide variety of figures, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants, and processes described by power laws (which are very common in nature).

Boiling it down, this means that for almost any naturally-occurring data set, the number 1 will appear as the leading digit about 30% of the time.  And “naturally occurring” here can mean check amounts or stock prices or website statistics.  Non-naturally occurring data would be pre-assigned numbers like postal codes or UPC numbers.

Besides being fun to play with, Benford’s is used in the accounting profession to detect fraud.  Because data like tax returns and check registers follow Benford’s, auditors can use it as a high-level check of a data set.  If there are anomalies, the data may be worth investigating more closely for potential fraud.
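To make that high-level check concrete, here is a rough Python sketch that tallies the leading digits in a list of amounts and sets the observed share next to Benford’s expected share. This is only an illustration of the idea, not how the Strata extension or any auditing tool actually works, and the function names are made up for the example.

```python
import math
from collections import Counter

def leading_digit(value):
    """Return the first non-zero digit of a number, or None if there is none."""
    for ch in repr(abs(value)):
        if ch in "123456789":
            return int(ch)
    return None

def benford_deviation(values):
    """Compare observed leading-digit shares to Benford's expected shares.

    Returns {digit: (observed_share, expected_share)} for digits 1-9.
    """
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("no non-zero values to analyze")
    counts = Counter(digits)
    total = len(digits)
    return {
        d: (counts[d] / total, math.log10(1 + 1 / d))
        for d in range(1, 10)
    }

# Powers of two are a classic sequence that follows Benford's law closely.
for digit, (observed, expected) in benford_deviation(
    [2 ** n for n in range(1, 101)]
).items():
    print(f"{digit}  observed {observed:.3f}  expected {expected:.3f}")
```

A digit whose observed share sits far from the expected one is not proof of anything; it is just a flag that the underlying records deserve a second look.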

If you’re interested in further information about fraud detection using Benford’s, definitely give these two articles by Malcolm W. Browne and Mark J. Nigrini a read.

Try It Out for Yourself

Take a look at the demonstration video above to see Benford’s law in action with data sets from the web.  If you’d like to play with it yourself, just install the Benford’s Law extension for Kirix Strata™ and have fun.

Also, please note that I used the following data sets in the video, if you’d like to give those a spin:

Wikipedia List of Lakes in Minnesota
US Census Data Sets
Social Blade - Digg Statistics

And here are a few other worthy ones that didn’t make it in the video:

NASDAQ Historical Stock Price
Wikipedia List of Countries by Population
And plenty more at Delicious here…

Enjoy!

Predict the Future with Some Ad Hoc Time-series Forecasting

Wednesday, July 16th, 2008

We’re happy to announce that we’ve teamed up with the good folks from Lokad to create a Kirix Strata™ forecasting plug-in, which you can use with your own time-series data.

Lokad is a company that has created some slick forecasting software and, thankfully, offers it as a web service via their API (you can also upload data directly to their site).  Here’s a link where you can find lots of good information on their technology.  Bottom line, they offer some great business forecasting tools at a cost-effective price.  Their API was a piece of cake to work with and so we were able to quickly put a GUI on it and create the Strata Lokad forecasting extension.

Play Video

(And here’s an embeddable YouTube version…)

Obviously, there’s quite a bit of forecasting that goes on day to day within companies.  When you veer toward the largest companies, you’ll find departments dedicated to forecasting with automated processes built into their ERP systems.  With smaller companies, forecasting is likely performed by someone without the word “forecast” in their job title.  For instance, a warehouse manager may need to forecast inventory to make solid replenishment orders.  Proper forecasting prevents the costly mistake of either overbuying (spoilage, locked-up cost of capital) or underbuying (lost sales).

However, the sweet spot for the Strata Lokad extension is ad hoc forecasting; it’s for people who have various, changing data sets and need their forecasts on-the-fly.  Business consultants who provide forecasts for their clients would fall in this category.  In addition, this extension can benefit sales analysts who don’t have adequate forecasting from their OLAP systems or financial analysts interested in different cash flow forecasts.

The great thing about forecasting algorithms is that they apply to a wide range of circumstances.  So, if you’ve got some historical data to throw at a situation, you can get back some good results.

So, if you’ve got some time-series data and want to predict the future with it, give the Lokad forecasting extension a try.  The installation itself along with all the details can be found here.  If you’ve got questions about the plug-in, send ‘em our way.  And, if you’ve got any questions about Lokad, their technology or forecasting in general, please feel free to give them a shout — they’re quite knowledgeable and helpful.

P.S.  We’re pleased to note that this is the first extension we’ve made public that takes advantage of Strata’s web scripting capabilities, which bring a web API to the privacy and comfort of your own desktop.  Got another web API you’d like to see work with Strata?  Let us know.

Beta No More: Say Hello to Kirix Strata…

Wednesday, April 16th, 2008

We’re pleased to announce that Kirix Strata™ is now officially out of beta! A lot of hard work and late nights of coding have been put into this li’l tool and we hope it makes people’s lives a little bit easier when it comes to data analysis.


Thanks to everyone who has been involved in the beta process — we’ll get those free licenses rolled out to you within the week.

Also, you’ll notice a big ol’ website redesign too. We want to give a special shout out to Jeff, Benni, Peter and David for the bang up job they did with the design and implementation of the website.  And, here’s our new overview video, check it out:

Play Video

(And here’s an embeddable YouTube version…)

Now that Strata is out, we’re looking forward to adding a lot more content to the site, in terms of more data-centric blog posts, product applications, tips & tricks as well as some open-source projects that we’ve been working on in the background. Oh yes, we also have a load of extensions that we are planning to release too. So much to do, so little time. :)

Thanks again to everyone. Please download the full version of Strata and try it out for 30 days free. If you have any questions, feel free to post a message to the forums or shoot us an email.

Kirix Strata Beta 7: Quick Filter, Data Link Refresh and Report Writer

Monday, January 7th, 2008

(NOTE: See screencast video below for a quick look at some of the new features!)

Hope everyone had a lovely holiday season!

We’re happy to report that our developers provided lots of shiny new toys in our Strata stocking over this past month, including further work on Data Links, the inclusion of a “Quick Filter” mechanism and the introduction of our new report writer. Please feel free to download Strata Beta 7 and let us know what you think!

Here’s more information on what’s new in this latest version:

Data Links

The ability to bookmark data files is coming into its own. We’ve got things working pretty well on CSV and RSS files at the moment, with some more work still to do on HTML tables. Here’s a general synopsis:

  1. Open a CSV or RSS table from the web.
  2. Perform your own analysis, using calculated fields or marks.
  3. Save the data URL as a simple bookmark.
  4. Click the Refresh icon or open up the bookmark in the future. Your data (and your calculations) will refresh based upon the new or updated data on the server.

We’ve been finding this quite useful internally, particularly in relation to analyzing our web log data. Check out the screencast below for further info.

Report Writer

With Beta 7, we are also introducing our new report writer.

You can create your report in a design view (similar to a template) and then toggle to a layout view for a preview of what you’ll see when you print. As a bonus, the layout view enables you to manipulate and format your data directly, instead of being bound to a “print preview” mode.

Another cool thing is that, besides creating reports from data in your project, you can also create reports directly from external data, such as local CSVs or MySQL tables. (First go to File > Create Connection, then you can select it as your source data in the report writer). Check out the screencast below for a quick demo of the report writer in action.

Please note that there are a few known bugs with Report Writer in Beta 7. These include:

  • When using groups, the first group does not display properly.
  • The layout view can be extremely slow when using large files. Now that we’ve got some big features in, optimizations will soon follow.
  • Items in the Report Header in the design view do not display properly on the top of the page.

Other Enhancements

Here are some of the other improvements that have been implemented:

  • Quick filter allows tables to be filtered really easily (see screencast below for a quick demonstration).
  • Quick import for MS Access and Kirix Package file via the File > Open command instead of File > Import.
  • Support for CSV files with Unicode character sets.
  • CSV auto-sensing determines the field delimiter so lots of different delimited files are parsed and opened automatically (e.g. comma, tab, semi-colon, colon, pipe, tilde).
  • A bunch of scripting additions, including functions to access a database table list and table structure information. We’ve also added functions to encrypt/decrypt strings.
  • Automatic plugin detection (Strata now doesn’t need to reinstall programs like Flash plug-ins if you have already downloaded them for other browsers).
  • Streamlined extension installation and uninstallation.
  • A new “loading” icon that appears on tabs while web pages are being downloaded.
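For the curious, the CSV auto-sensing idea mentioned in the list above can be approximated in a few lines. The Python sketch below is not Strata’s actual algorithm (which isn’t described here); it simply tries each candidate delimiter and keeps the one that splits every sampled line into the same number of fields. Python’s own csv.Sniffer class offers a more thorough heuristic if you need one.

```python
import csv

# Candidates taken from the list above: comma, tab, semi-colon, colon, pipe, tilde.
CANDIDATE_DELIMITERS = [",", "\t", ";", ":", "|", "~"]

def sense_delimiter(sample_text):
    """Guess the field delimiter from a sample of the file.

    A candidate qualifies if it splits every non-blank line into the same
    number of fields (more than one). If several qualify, the one that
    yields the most fields wins. Returns None if nothing qualifies.
    """
    lines = [line for line in sample_text.splitlines() if line.strip()]
    best, best_fields = None, 1
    for delim in CANDIDATE_DELIMITERS:
        rows = list(csv.reader(lines, delimiter=delim))
        widths = {len(row) for row in rows}
        if len(widths) == 1:
            width = widths.pop()
            if width > best_fields:
                best, best_fields = delim, width
    return best

print(repr(sense_delimiter("name|amount\nalpha|10\nbeta|20\n")))
```

Requiring a consistent field count on every line is what keeps a stray colon inside a time value from being mistaken for the delimiter.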

Please check out this screencast, which provides an overview of Data Links, Quick Filter and Report Writer:
Play Video

(And here’s an embeddable YouTube version…)

NOTE: For those interested, here is the Yahoo URL used in this screencast. Check out Gummy Stuff’s extremely useful Yahoo Stock Ticker CSV API site for further information.

Thanks for downloading it and giving it a spin. Please let us know if you run into any bugs or need help with anything!

Mr. MacGyver, Meet Kirix Strata

Tuesday, October 16th, 2007

(NOTE: Screencast of this exercise is available below.)

A few days ago, the always datariffic folks at Juice Analytics posted an article about MacGyver-ing call volume data and pushing it into an online mapping application called Mapeteria. Basically, they were doing some ad hoc data visualization comprised of public web data, private phone call data and a web service that provided the visualization (which in turn used the Google Maps API).

Huh… local data, web data and web APIs? Sounds like a perfect application for a data browser (well, it would’ve been perfect if the web service accepted a POST command, but I digress). A data browser enables you to easily access web data, combine it with local data, perform any required data clean up and then push/pull data from the web — without ever leaving the tool.

It also would’ve saved Juice a bit of time, particularly with grabbing area codes and prepping that file. Let’s look at the four steps they went through and we’ll see how Kirix Strata™ might improve the experience:

1. Pull out the area codes.

The data had phone number values like “12345678901” as well as “2345678901”, so they used the following formula to pull out the area codes using Excel:

=VALUE(IF(LEFT(E7,1)="1",MID(E7,2,3),MID(E7,1,3)))

Strata would use a similar formula:

iif(left(tel,1)="1",substr(tel,2,3),substr(tel,1,3))

The main time savings here (particularly with large files) is that the calculated field populates automatically for every record in Strata, instead of needing to paste formulas. OK… not terribly exciting thus far.
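If you’d rather see the same logic outside of a spreadsheet or Strata, here is an equivalent sketch in Python. It assumes, as the formulas above do, that the phone number is stored as a string of digits; the function name is just for illustration.

```python
def area_code(tel):
    """Skip a leading "1" country code, then take the next three digits.

    Mirrors the formulas above; note that Python slices are 0-based,
    while the spreadsheet-style functions count from 1.
    """
    return tel[1:4] if tel.startswith("1") else tel[0:3]

for number in ["12345678901", "2345678901"]:
    print(number, area_code(number))
```

The leading-“1” test is safe for this data because North American area codes never begin with 1.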

2. Convert area codes into states

This is a multi-part step:

a) Locate a table from the web that has area code data associated with a state ID (while fending off parasitic scammers).
b) Clean up the table as necessary.
c) Do a lookup from the phone call data that adds in the state where the call originated from.

Strata can really cut down the amount of time spent on this step. Because of the website used, the folks at Juice surely had to create their lookup table manually. I went to Delicious, searched for “area codes” and found this very useful website, which had all the data in a nice HTML table. With Strata, I simply right-clicked and selected “Import Data” and immediately had the table I needed for the lookup.

Finally, I created a relationship between my two tables and dragged in the state codes (e.g., CA, IL, NY, etc.) into the phone call data.
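Conceptually, that relationship is just a keyed lookup from area code to state. Here is a small Python sketch of the same idea; the three lookup rows and the field names (`area_code`, `state`) are illustrative stand-ins for the table imported from the web, not the actual data.

```python
# Illustrative lookup rows; the real table would be imported from the web.
AREA_CODE_TO_STATE = {"212": "NY", "312": "IL", "415": "CA"}

def add_state(calls, lookup):
    """Return a copy of each call record with a `state` field added.

    Area codes missing from the lookup get None, so they are easy to
    spot and clean up afterwards.
    """
    return [{**call, "state": lookup.get(call["area_code"])} for call in calls]

calls = [{"area_code": "312"}, {"area_code": "415"}, {"area_code": "999"}]
for row in add_state(calls, AREA_CODE_TO_STATE):
    print(row)
```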

3. Create a summary data set

This was done using a pivot table in Excel. Strata doesn’t have classic pivot tables in its feature set at this point, but it does have a nice li’l grouping utility. So, once I knew what csv format was required for the Mapeteria web service, I grouped the data accordingly.
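For reference, the grouping step boils down to counting calls per state and writing the result out as CSV. Here is a Python sketch of that; the header names `state` and `calls` are assumptions made for the example, since the exact CSV layout Mapeteria expects isn’t spelled out here.

```python
import csv
import io
from collections import Counter

def summarize_by_state(calls):
    """Count calls per state and return the summary as CSV text."""
    counts = Counter(call["state"] for call in calls if call.get("state"))
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["state", "calls"])  # hypothetical header names
    for state, count in sorted(counts.items()):
        writer.writerow([state, count])
    return buffer.getvalue()

sample = [{"state": "IL"}, {"state": "CA"}, {"state": "IL"}, {"state": None}]
print(summarize_by_state(sample))
```

Records without a matched state are left out of the summary rather than grouped under a blank key.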

4. Create a colorized map of the U.S.

This is the “almost perfect” part I referred to above.

Though Mapeteria is a very cool visualization service using Google Maps, it needs to fetch a CSV file embedded in a URL from elsewhere on the web. If the service was able to accept data via a POST command (or something like an “Upload Data” button), Strata would have been able to just take the table we created and push it to the web service, no csv transformation required (in fact, we’ve got some stuff cooking in our labs that would make this as easy as copy and paste). And, if we were just able to push the data out like this, we would have immediately gotten the map without ever leaving our data browser.

But, like Zach at Juice, I had to save the file in a CSV format and then upload it to a server before I was able to get my map. Here’s a screencast of the entire process… once I found the area code data on the web, it took less than 5 minutes to get my map.

Play Video

(And here’s an embeddable YouTube version…)

If anyone wants to try this process out for themselves, please feel free to download Strata and give it a try. This data browser is in beta and completely free to use; we’re also giving away free full licenses to anyone who provides feedback during the beta period. Oh, and here is the sample phone call volume data I used for this exercise:

Click here to download Phone Call Volume Sample Data (.csv, 10KB)

This is a pretty simple example of how Strata can be used for ad hoc data access and manipulation with data from the web (or, as one can imagine, within a corporate intranet) and make this kind of analysis very efficient. Throw in some web services, web APIs or very large files into the mix, and you’ve got the chance to do some fairly interesting things.

As always, if anyone has any questions, either post in the comments below or in our support forums… or just shoot us a support email. Thanks!

Embedded phpBB Search Terms within Apache Web Logs

Friday, August 24th, 2007

This afternoon I was doing some analysis on our web logs and thought it may make for a good screencast and blog post. We currently use a combination of AWstats and Google Analytics for our web stats but are increasingly using Kirix Strata™ to dig deeper into the raw web logs for the more customized things that aren’t readily available otherwise.

Also, honestly, it is kind of fun to plow through almost a million records on your own. Hmmm, maybe I should get out more.

The topic of the screencast below is the search terms people enter to find things in our phpBB3 support forums. These terms are embedded in the “request” field of the Apache logs and I couldn’t find a way to get them without digging into the logs themselves (NOTE: I wouldn’t doubt that there is some way to do this via a mod to phpBB or a filter in Google Analytics… but since I couldn’t find anything via a quick Google search, using Strata just ended up being a lot faster).

An example of a search string we’re dealing with is:

GET /forums/search.php?keywords=proxy HTTP/1.1

So the trick was to parse the search keywords out of the field and then group them together to see what people were searching for… and in turn give us the chance to improve our support area by targeting some of these search terms and expanding our documentation accordingly.
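The screencast does this with Strata’s functions, but the same parse-then-group idea can be sketched in Python. This assumes the request field looks like the example above (method, URL, protocol separated by spaces); the function names are made up for the illustration.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

def search_keywords(request):
    """Extract the `keywords` query value from an Apache request field.

    Returns None when the request is not a forum search.
    """
    parts = request.split(" ")
    if len(parts) < 2:
        return None
    url = urlsplit(parts[1])
    if not url.path.endswith("/search.php"):
        return None
    values = parse_qs(url.query).get("keywords")
    return values[0].lower() if values else None

def top_searches(requests):
    """Group the extracted keywords and list them from most to least common."""
    found = (search_keywords(r) for r in requests)
    return Counter(k for k in found if k).most_common()

requests = [
    "GET /forums/search.php?keywords=proxy HTTP/1.1",
    "GET /forums/search.php?keywords=csv+import HTTP/1.1",
    "GET /forums/search.php?keywords=Proxy HTTP/1.1",
    "GET /index.html HTTP/1.1",
]
print(top_searches(requests))
```

Lower-casing the keywords before grouping keeps “Proxy” and “proxy” from being counted as two different searches.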

Hope this video proves helpful:

Play Video

(And here’s an embeddable YouTube version…)

TECHNICAL NOTE:

I downloaded the Apache logs from the server and, due to the file size, decided to import them into Strata rather than open the file and work with it directly. To import your logs, go to Import, select text-delimited files, and then import as space delimited with quotation marks as the text qualifier.  Update:  You can now use a handy little log parsing extension to pull in your web log files without having to mess around with a straight text import.
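The same import settings (space delimited, quotation marks as the text qualifier) carry over to other tools. As a rough Python equivalent, the standard csv module can read a log this way; the sample line below is invented for illustration and follows Apache’s common log layout.

```python
import csv

def parse_log_lines(lines):
    """Split log lines on spaces, treating double-quoted text as one field."""
    return list(csv.reader(lines, delimiter=" ", quotechar='"'))

# Invented sample line in Apache's common log layout.
sample = (
    '127.0.0.1 - - [24/Aug/2007:13:55:36 -0500] '
    '"GET /forums/search.php?keywords=proxy HTTP/1.1" 200 2326'
)
row = parse_log_lines([sample])[0]
print(row[5])  # the quoted "request" field, kept intact
```

One quirk of a plain space-delimited import is that the bracketed timestamp lands in two fields, since its brackets are not a text qualifier. Log lines with backslash-escaped quotes inside a quoted field would need extra handling.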

TECHNICAL NOTE 2:

For posterity, here are the functions that were used in this screencast:

STRPART(string, section [, delimiter])
SUBSTR(string, start [, length])
CONTAINS(string, search string)
IIF(boolean test, true value, false value)

The Birth of a Data Browser

Tuesday, July 17th, 2007

Well, it took a lot more blood, sweat and tears than we expected, but we’re really excited to announce our first public beta release of Kirix Strata™, the data browser.

And what, pray tell, is a “data browser”?

Well, Strata is a specialty browser that lets you access and manipulate data from pretty much anywhere on the web. For instance, Strata will let you grab HTML tables or RSS Feeds or even open up CSV files directly from a URL (wow, that’s a lot of acronyms).

Then when you’ve got the data in a table, you can do all sorts of ad hoc analysis. You can create calculations or sort and filter or create queries and reports — similar to the kinds of things you might do with a desktop database or a spreadsheet. In addition to web data, you can still work with data from your desktop or in a database system like Oracle or MySQL Enterprise.

And for those more technically-inclined, Strata also includes an implementation of ECMAScript — so anyone familiar with Javascript should feel right at home. The nice thing about the scripting is that it also includes bindings for SQL and HTTP — which can make for a lot of fun when connecting to Web APIs, creating “desktop mashups” or building extensions. And to boot, it runs on both Windows and Linux (at this moment, only Ubuntu is supported officially).

We also just want to give a quick shout out to the excellent folks at wxWidgets (we use their GUI library) and Mozilla (Strata incorporates the Gecko engine) — without which, Strata would only be a mere twinkle in our eye.

So, without further ado, check out the Kirix Strata introduction video:

Play Video

(And here’s an embeddable YouTube version…)

and then

Download and try out the data browser for yourself

We hope you enjoy it!

About

Data and the Web is a blog by Kirix about accessing and working with data, wherever it is located. We have a particular fondness for data usability, ad hoc analysis, mashups, web APIs and, of course, playing around with our data browser.