Extensions | Kirix Strata Blog

Archive for the 'extensions' Category

Checking Date Ranges Prior to Analyzing New Data Sets

Thursday, June 26th, 2008

Seasoned data analysts know that one of the first things to do with a new, unfamiliar data set is to run some basic tests to determine what kind of animal you're working with. This is particularly important when working with larger data sets that may be amalgamated from multiple systems or appended together from archived files.

One of these tests is a date range check. For example, if a client has shipped you all the data from the first 6 months of 2007, you want to make sure you actually have a complete 6 months of data to work with. Ideally, a count of records per month would look something like this:

12/2006 - 43 records
01/2007 - 255 records
02/2007 - 249 records
03/2007 - 265 records
04/2007 - 287 records
05/2007 - 259 records
06/2007 - 263 records
07/2007 - 53 records

The partial months at either end (12/2006 and 07/2007) are actually reassuring: records spilling past both boundaries suggest that the extract wasn't cut short. However, it is surprising how often you'll actually see something like this:

01/1999 - 196 records
12/2006 - 43 records
01/2007 - 255 records
02/2007 - 249 records
03/2007 - 96 records
04/2007 - 287 records
05/2007 - 259 records
06/2007 - 263 records
07/2007 - 53 records

This second example is a dirtier data set; there is a strange, high-count outlier from 1999 and we also see that there was a significant drop in the record count during March 2007.
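For anyone who'd like to see the idea in code, here is a minimal Python sketch of a month-by-month record count. It is not the code behind our extension; the function names and the tiny sample list are made up purely for illustration, and it assumes the date field has already been parsed into `date` values.

```python
from collections import Counter
from datetime import date


def month_counts(dates):
    """Count records per (year, month), returned in chronological order."""
    counts = Counter((d.year, d.month) for d in dates)
    return sorted(counts.items())


def format_counts(counts):
    """Render the counts in the MM/YYYY layout used in the listings above."""
    return [f"{month:02d}/{year} - {n} records" for (year, month), n in counts]


# Hypothetical sample: one stray 1999 record mixed in with recent ones.
sample = [date(2006, 12, 30), date(2007, 1, 2), date(2007, 1, 15), date(1999, 1, 1)]

for line in format_counts(month_counts(sample)):
    print(line)
```

Sorting by (year, month) rather than by the formatted text keeps the months in calendar order, which is what makes an outlier like 01/1999 jump out at the top of the list.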

Before you actually start performing your analysis, you'd want to investigate the items from 1999, which could just be empty records that can be ignored or, worse, could be something wrong with the formatting of these records. The precipitous drop in March 2007 is a little more worrisome. Was it because sales dipped drastically that month or was it because there was an error when the IT department appended this data set together?
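Both kinds of trouble can be flagged automatically once you have the monthly counts. The sketch below is only one rough way to do it, under two assumptions of ours: a "gap" means at least one missing calendar month between consecutive entries, and a "drop" means a count below half the median. The function names and the 0.5 threshold are illustrative, not part of the extension.

```python
from statistics import median


def find_gaps(counts):
    """Return (before, after) month pairs with at least one missing month between them."""
    def index(key):
        year, month = key
        return year * 12 + month

    gaps = []
    for (a, _), (b, _) in zip(counts, counts[1:]):
        if index(b) - index(a) > 1:
            gaps.append((a, b))
    return gaps


def find_drops(counts, ratio=0.5):
    """Return the months whose record count is below ratio * the median count."""
    if not counts:
        return []
    typical = median(n for _, n in counts)
    return [key for key, n in counts if n < ratio * typical]


# The "dirty" listing from above, as ((year, month), record_count) pairs.
dirty = [
    ((1999, 1), 196), ((2006, 12), 43), ((2007, 1), 255),
    ((2007, 2), 249), ((2007, 3), 96), ((2007, 4), 287),
    ((2007, 5), 259), ((2007, 6), 263), ((2007, 7), 53),
]

print(find_gaps(dirty))   # the jump from 1999 to 2006
print(find_drops(dirty))  # low months, including the partial ones at each end
```

Note that the partial months at either end get flagged as drops too. That is expected with a threshold this simple, and they are easy to dismiss by eye; March 2007 is the one that deserves a phone call.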

Whatever the cause, it's better to get your data in order and make sure you have a complete set before jumping into your analysis and providing that client with incorrect or skewed results. To help you do this, we've created a simple date range analysis extension. Running this utility on a new data set from the get-go can save you a lot of time and hassle later on.

You can install the date range analysis extension and learn how to use it here. Got some other data utilities you'd like in your toolkit? Let us know.

Researching Problems in your Apache Web Log Activity

Monday, June 23rd, 2008

So I came into work the other day and the first thing one of our web admins says to me is, “Were we Slashdotted yesterday?” I had just been reviewing our web activity and didn't think that was the case. Still, I did a quick check on our Google Analytics account and, as expected, nothing was out of the ordinary.

The reason he asked the question was that our Apache log file that day was over 10 times the size of the file from the previous day. It sure looked like the server was getting hammered.

So, I decided to take a look and see what the problem was. I pulled down the Apache log and imported it into Strata. See the video below for a step-by-step look:

[Video: step-by-step walkthrough of importing the Apache log into Strata]

Now, as an aside, if you've ever tried to look at a raw Apache log in Excel or Notepad, you'll see that it is space-delimited and the date/time format is not trivial to deal with. Not only that, but the sheer size of a log file makes it almost impossible to handle in a spreadsheet. The one I was dealing with was over 100,000 records long, and that was just one day.

Strata can easily handle the data size, but the format is enough to give any software fits. So, we wrote a quick Apache log parser extension that makes it really simple to just point the software to your Apache log and import it. The resulting table is nicely formatted and everything is ready to go (including those pesky date fields). You can get the extension here.
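To give a feel for why the format is fiddly, here is a rough Python sketch of parsing a single log line. This is not the extension's code. It assumes the log uses Apache's Common Log Format, and the names `LOG_PATTERN` and `parse_line` are ours, chosen just for this example.

```python
import re
from datetime import datetime

# Assumes Common Log Format: host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # The timestamp looks like 10/Oct/2000:13:55:36 -0700
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # Apache writes "-" when no bytes were sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    # The request is "METHOD path PROTOCOL"; keep the path for later grouping.
    parts = record["request"].split()
    record["path"] = parts[1] if len(parts) >= 2 else ""
    return record


# Sample line in the style of the Apache documentation.
sample_line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(parse_line(sample_line))
```

Splitting on spaces alone doesn't work because the timestamp and the quoted request both contain spaces, which is why the sketch leans on a regular expression instead.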

So, back to the issue at hand… after I imported it, I played around with the data to identify what was causing the problem. I grouped the IP addresses together to see if I could pinpoint a few culprits. And, indeed, I found two:

  • An unknown bot
  • Our own server

After a little more research, I found out that the bot was searching for all kinds of non-existent URLs and was basically appending one path to another to get some really bizarre URLs:

/labs/wxaui/fileadmin/js/swfobject.js
/labs/wxaui/fileadmin/js/fileadmin/js/swfobject.js
/labs/wxaui/fileadmin/js/fileadmin/js/fileadmin/js/swfobject.js
/labs/wxaui/fileadmin/js/fileadmin/js/fileadmin/js/fileadmin/js/swfobject.js
/labs/wxaui/fileadmin/js/fileadmin/js/fileadmin/js/fileadmin/js/fileadmin/js/swfobject.js
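If you want to pick out requests like these from a large log, one possible approach (a sketch of ours, not something built into the extension) is to look for a run of path segments that is immediately repeated. The function name is made up for illustration.

```python
def has_repeated_run(path):
    """Return True if some run of path segments is immediately repeated."""
    segments = [s for s in path.split("/") if s]
    # Try every run length k and every starting position i.
    for k in range(1, len(segments) // 2 + 1):
        for i in range(len(segments) - 2 * k + 1):
            if segments[i:i + k] == segments[i + k:i + 2 * k]:
                return True
    return False


print(has_repeated_run("/labs/wxaui/fileadmin/js/swfobject.js"))
print(has_repeated_run("/labs/wxaui/fileadmin/js/fileadmin/js/swfobject.js"))
```

The first path is the legitimate one and is left alone; the second contains "fileadmin/js" twice in a row and gets flagged.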

I then took a look at the records from our own server and saw that for each of these non-existent URLs, we were serving up a “Not Found” page, thus doubling the trouble this bot was causing.

In the end, I had our web admin look into the problem. It turns out we were poorly formatting some of the URL paths on the site. Most bots can handle both absolute and relative paths, but some can't. The bots that can't handle relative paths end up going a little nuts as they spider the website. (I couldn't find a really nice, clean explanation of this issue via Google, but this thread is close enough for those who are interested.)
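We never pinned down exactly what the bot was doing internally, but here is one way a chain like this can arise. The sketch assumes (and this is only an assumption) that a page links to the script with a relative path and that every page the bot gets back, including the "Not Found" page, carries that same relative link. The host `example.com` and the function name are placeholders.

```python
from urllib.parse import urljoin, urlsplit


def crawl_chain(start_url, link, hops):
    """Resolve the same link against each URL it produced, hop after hop."""
    paths = []
    url = start_url
    for _ in range(hops):
        url = urljoin(url, link)
        paths.append(urlsplit(url).path)
    return paths


# A relative link keeps growing with every hop...
for path in crawl_chain("http://example.com/labs/wxaui/", "fileadmin/js/swfobject.js", 3):
    print(path)

# ...while an absolute path resolves to the same place every time.
for path in crawl_chain("http://example.com/labs/wxaui/", "/labs/wxaui/fileadmin/js/swfobject.js", 3):
    print(path)
```

Under those assumptions, the relative link reproduces the first three URLs in the list above, and switching to an absolute path stops the chain.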

Anyway, it was nice to be able to just pull out Kirix Strata and, within a few minutes, figure out what the issue was. For those of you who are interested in your web logs, give the Apache Web Log import extension a spin and let us know what you think.

Hello, World Wide Web

Tuesday, June 17th, 2008

Welcome to the first post on the new Strata blog. This is the place where we'll be sharing the latest news about Strata, as well as tips and tricks, case studies, highlights of interesting extensions and various examples.

To kick things off, we're happy to let everyone know that we've just created a new Extensions section. The Strata Extensions section contains an Extension Library, where we'll be adding our own creations as well as applications developed by the community. It also contains some help for developers, including an Extension Wizard, which creates extension packaging and sample scripts, and a Developer Resources section, which provides useful information about developing scripts/extensions in Strata.

For our first extension, we thought it would be fitting to have Strata politely introduce itself with the classic phrase, “Hello, World”. Rather than just display the text “Hello, World”, though, we've added a bit of international flavor and web connectivity: you can search Google, Yahoo, and Wikipedia for this phrase in several languages. It's called “Hello, World Wide Web.”

As a sample application, “Hello, World Wide Web” provides a basic example of some of the hybrid web/desktop options available with Strata's interface controls and highlights how you can embed a browser control in a form with just a few lines of code.

To see this, you'll just need to take a peek at the code:

  1. Download the Hello World Wide Web extension from the Extension Library.
  2. Convert it to a ZIP file by changing the file name from “hello_world_wide_web.kxt” to “hello_world_wide_web.zip”.
  3. Extract the contents of the ZIP file to a new folder.
  4. In Strata, select Create Connection from the File menu, then click the Browse button to find and select this folder. It will appear as a connected folder in the Project Panel.
  5. Expand the folder in the Project Panel and double-click the “hello_world_wide_web_form.js” file.
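If you'd rather script steps 2 and 3 than rename the file by hand, something like the following Python sketch should do. It relies only on what the steps above imply, namely that a .kxt file is an ordinary ZIP archive under a different extension; the function name and the output folder are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_extension(kxt_path, dest_dir):
    """Unpack a .kxt file (assumed to be a ZIP archive) and list its contents."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # zipfile looks at the file's contents, not its name, so no rename is needed.
    with zipfile.ZipFile(kxt_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

For example, `extract_extension("hello_world_wide_web.kxt", "hello_world_wide_web")` would unpack the download into a folder you can then connect to in step 4.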

On line 54 is the command that creates the browser control, and on line 73 is the spot where this control gets added to the form. Of course, if you're not familiar with JavaScript already, this might look a bit cryptic (they call it “code” for a reason). But overall, it's kind of nice to be able to include a web browser in a custom application with just two lines.

If you would like to explore the code for this extension a bit more, you can get variations of this extension as well as other individual script components from the Extension Wizard. This wizard gives you a quick way to grab smaller chunks of code to play around with or generate simple templates for different functions that you can modify or build upon.

We hope you have fun playing around with “Hello, World Wide Web”. If you would like to improve on it and share the results with us, please do so. We'd love to hear from you. You can submit new extensions or any improvements via our extension submission form.

As for future versions of “Hello, World Wide Web”, we certainly would welcome having more language options. Heck, we'd even take languages that don't exist in Wikipedia today — how does one say “Hello, World” in Klingon?