Kirix Strata Blog: Tips & Tricks, Case Studies, Highlights and Examples - Part 2

Using Relationships to Compare Lists of Emails

July 31st, 2008

In a previous article, we were concerned with identifying and removing duplicate emails within a single table. But let's say the email addresses are spread across multiple tables. How do we find which addresses appear in both lists?

To answer this question, we have to break out one of Strata's more powerful features: relationships.

In Strata, relationships allow you to match records in one table with records in another table based on a common value. In this example, the common value is a specific email address found in both tables. For instance, let's suppose we have two lists of email addresses stored in two tables, email_list1 and email_list2.

In email_list1, we have the following values:

email_list1
--------------------
second@email.com
third@email.com
fourth@email.com
fifth@email.com
sixth@email.com
seventh@email.com
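
To make the matching idea concrete outside of Strata, here is a minimal Python sketch. It is not how Strata implements relationships; it only shows the "common value" lookup. The contents of email_list2, the normalize helper, and the decision to ignore case and stray spaces are all assumptions made for illustration.

```python
# Values from the email_list1 table shown above.
email_list1 = [
    "second@email.com",
    "third@email.com",
    "fourth@email.com",
    "fifth@email.com",
    "sixth@email.com",
    "seventh@email.com",
]

# Hypothetical second list, invented for this example only.
email_list2 = [
    "first@email.com",
    "third@email.com",
    "Sixth@Email.com ",
    "eighth@email.com",
]


def normalize(address):
    """Assumption: addresses match regardless of case or stray spaces."""
    return address.strip().lower()


def in_both(list_a, list_b):
    """Return the addresses from list_a that also occur in list_b."""
    keys_b = {normalize(address) for address in list_b}
    return [address for address in list_a if normalize(address) in keys_b]


shared = in_both(email_list1, email_list2)
print(shared)
```

Building a set from the second list keeps each lookup cheap, which matters once the lists grow beyond a handful of rows.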

Updating and Replacing Values in Cells

July 18th, 2008

We had a really good question come up in our forums regarding Strata's ability to replace values within cells.

Bottom line, if you have a value in one field and want to auto-replace it with a different value, you can easily do this using the Update Records tool (Data > Update Records…). In a nutshell, it works like this:

  1. Choose the field you want to update.
  2. Choose the new value you want to update with.
  3. Create the logic to let Strata know when to update a cell and when not to update a cell.

Both #2 and #3 enable you to get pretty complex, since you can use formulas in both of these areas. The “update with” (#2) area can be particularly tricky, since it gives you the ability to transform values on the fly.
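
For readers who think in code, the three parts map onto a small function. This is a Python sketch of the idea only, not Strata's Update Records tool or its formula syntax; update_records, its parameters, and the sample rows are hypothetical.

```python
def update_records(records, field, new_value, condition):
    """Apply new_value(record) to record[field] wherever condition(record) holds.

    field      -> step 1: the field to update
    new_value  -> step 2: a function that computes the replacement value
    condition  -> step 3: a function that decides whether to update the row
    Returns the number of rows that were changed.
    """
    updated = 0
    for record in records:
        if condition(record):
            record[field] = new_value(record)
            updated += 1
    return updated


# Hypothetical sample rows.
records = [
    {"day": "Friday", "plan": "work"},
    {"day": "Monday", "plan": "work"},
    {"day": "friday", "plan": "work"},
]

count = update_records(
    records,
    field="day",
    new_value=lambda record: "Saturday",
    condition=lambda record: record["day"].lower() == "friday",
)
print(count, [record["day"] for record in records])
```

Because both the new value and the condition are functions of the whole record, either one can look at other fields, which is roughly where the complexity mentioned above comes from.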

It had been a while since I'd personally gone through the various options one can use to replace records, and as I did, I quickly realized our documentation wasn't comprehensive enough.

So, I added the “replace” rules to the post here.

Hope this helps. Now, if only I could figure out a way to replace Friday with Saturday… :) Hope everyone has a good weekend!

P.S. Got a question about Strata that you've been curious about? Please post a note on the support forums and we'll be happy to help!

Watching Reruns: Strata Tutorial Videos from the Archives

July 8th, 2008

It's been almost one year since we released the beta version of Kirix Strata to the public. During that beta cycle, we provided several videos and screencasts via our blog to emphasize different things the software could do.

Thankfully, even though the videos show the beta version in action, almost all of the content is extremely relevant for the final version of Strata as well. The only variance really has to do with the user interface; we ended up moving around icons and toolbars and menu items quite a bit until we got something that seemed to work best. Oh, and you may see the original Strata logo that we threw together for the beta.

So, maybe you can consider this blog post your TiVo or on-demand video page for “Season 1” of Kirix Strata. Here are the five links, with details and highlights of each one below:

Checking Date Ranges Prior to Analyzing New Data Sets

June 26th, 2008

Seasoned data analysts know that one of the first things you need to do with a new, unfamiliar data set is to run some basic tests to determine what kind of animal you're working with. This is particularly important when working with larger data sets that may be amalgamated from multiple systems or appended together from archived files.

One of these tests is a date range check. So, for example, if a client has shipped you all the data from the first 6 months of 2007, you want to make sure you actually have a full, complete 6 months of data to work with. In fact, you'd like to see something like this:

12/2006 - 43 records
01/2007 - 255 records
02/2007 - 249 records
03/2007 - 265 records
04/2007 - 287 records
05/2007 - 259 records
06/2007 - 263 records
07/2007 - 53 records

The outlying dates at either end (12/2006 and 07/2007) do provide some comfort that the data set is truly complete. However, it is surprising how often you'll actually see something like this:

01/1999 - 196 records
12/2006 - 43 records
01/2007 - 255 records
02/2007 - 249 records
03/2007 - 96 records
04/2007 - 287 records
05/2007 - 259 records
06/2007 - 263 records
07/2007 - 53 records

This second example is a dirtier data set; there is a strange, high-count outlier from 1999 and we also see that there was a significant drop in the record count during March 2007.
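
The check itself is simple enough to sketch in a few lines of Python. This is not the Strata extension; it is a minimal illustration, and the function names, the expected range of 01/2007 to 06/2007, and the "less than half the median" threshold are assumptions chosen for this example. The counts are the ones from the second listing above.

```python
from collections import Counter
from datetime import date
from statistics import median


def count_by_month(dates):
    """Return {(year, month): record count} in chronological order."""
    counts = Counter((d.year, d.month) for d in dates)
    return dict(sorted(counts.items()))


def check_range(counts, first, last, low_ratio=0.5):
    """Find months outside [first, last] and months inside it with low counts.

    A month counts as "low" when it has fewer than low_ratio times the
    median count of the months inside the expected range (an assumed rule).
    """
    inside = {m: n for m, n in counts.items() if first <= m <= last}
    outside = [m for m in counts if not first <= m <= last]
    typical = median(inside.values()) if inside else 0
    low = [m for m, n in inside.items() if n < low_ratio * typical]
    return outside, low


# Rebuild the second listing: one date per record, all placed on day 1.
listing = {
    (1999, 1): 196, (2006, 12): 43, (2007, 1): 255, (2007, 2): 249,
    (2007, 3): 96, (2007, 4): 287, (2007, 5): 259, (2007, 6): 263,
    (2007, 7): 53,
}
dates = [date(y, m, 1) for (y, m), n in listing.items() for _ in range(n)]

counts = count_by_month(dates)
outside, low = check_range(counts, first=(2007, 1), last=(2007, 6))

for (year, month), n in counts.items():
    print(f"{month:02d}/{year} - {n} records")
print("outside expected range:", outside)
print("unusually low:", low)
```

Note that this sketch only reports months that have at least one record; a month with no records at all would need a separate check against the full list of expected months.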

Before you actually start performing your analysis, you'd want to investigate the items from 1999, which could just be empty records that can be ignored or, worse, could be something wrong with the formatting of these records. The precipitous drop in March 2007 is a little more worrisome. Was it because sales dipped drastically that month or was it because there was an error when the IT department appended this data set together?

Whatever the cause, it's better to get your data in order and make sure you have a complete set before jumping into your analysis and providing that client with incorrect or skewed results. In order to help you to do this, we've created a simple date range analysis extension. Running this utility on a new data set from the get-go can save you a lot of time and hassle later on.

You can install the date range analysis extension and learn how to use it here. Got some other data utilities you'd like in your toolkit? Let us know.

Researching Problems in your Apache Web Log Activity

June 23rd, 2008

So I came into work the other day and the first thing one of our web admins says to me is, “Were we Slashdotted yesterday?” I had just been reviewing our web activity and didn't think that was the case. Still, I did a quick check on our Google Analytics account and, as expected, nothing was out of the ordinary.

The reason he asked the question was that our Apache log file that day was over 10 times the size of the file from the previous day. It sure looked like the server was getting hammered.

So, I decided to take a look and see what the problem was. I pulled down the Apache log and imported it into Strata. See the video below for a step-by-step look:

Play Video

(And here's an embeddable YouTube version…)

Now, as an aside, if you've ever tried to look at a raw Apache log in Excel or Notepad, you'll have seen that it is space-delimited and that the date/time format is not trivial to deal with. Not only that, but the sheer size of a log file makes it almost impossible to handle in a spreadsheet. The one I was dealing with was over 100,000 records long, and that was just one day.

Strata can easily handle the data size, but the format is enough to give any software fits. So, we wrote a quick Apache log parser extension that makes it really simple to just point the software to your Apache log and import it. The resulting table is nicely formatted and everything is ready to go (including those pesky date fields). You can get the extension here.

So, back to the issue at hand… after I imported it, I played around with the data to identify what was causing the problem. I grouped the IP addresses together to see if I could pinpoint a few culprits. And, indeed, I found two:

  • An unknown bot
  • Our own server
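
For anyone who wants to try the same "group by IP address" step without Strata, here is a rough Python sketch. It is not the Strata log parser extension. It assumes the log uses Apache's common or combined log format, and the sample lines, IP addresses, and function names are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout: host ident user [time] "request" status size ...
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Turn one log line into a dict, or return None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    row = match.groupdict()
    # The awkward date/time field, e.g. 22/Jun/2008:10:15:32 -0500
    row["time"] = datetime.strptime(row["time"], "%d/%b/%Y:%H:%M:%S %z")
    row["status"] = int(row["status"])
    return row


def top_clients(lines, n=2):
    """Count requests per IP address and return the n busiest clients."""
    rows = (parse_line(line) for line in lines)
    hits = Counter(row["ip"] for row in rows if row is not None)
    return hits.most_common(n)


# Hypothetical sample lines using reserved documentation IP ranges.
sample = [
    '192.0.2.10 - - [22/Jun/2008:10:15:32 -0500] "GET /a.js HTTP/1.1" 404 209',
    '192.0.2.10 - - [22/Jun/2008:10:15:33 -0500] "GET /b.js HTTP/1.1" 404 209',
    '198.51.100.7 - - [22/Jun/2008:10:15:33 -0500] "GET /404.html HTTP/1.1" 200 512',
    '192.0.2.10 - - [22/Jun/2008:10:15:34 -0500] "GET /c.js HTTP/1.1" 404 209',
    '198.51.100.7 - - [22/Jun/2008:10:15:34 -0500] "GET /404.html HTTP/1.1" 200 512',
    '203.0.113.5 - - [22/Jun/2008:10:16:00 -0500] "GET / HTTP/1.1" 200 1024',
    'not a log line',
]

print(top_clients(sample))
```

Lines that do not match the pattern are skipped rather than raising an error, so it is worth counting them separately on a real log to see how much was dropped.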

After a little more research, I found out that the bot was searching for all kinds of non-existent URLs and was basically appending one path to another to get some really bizarre URLs:

/labs/wxaui/fileadmin/js/swfobject.js
/labs/wxaui/fileadmin/js/fileadmin/js/swfobject.js
/labs/wxaui/fileadmin/js/fileadmin/js/fileadmin/js/swfobject.js
/labs/wxaui/fileadmin/js/fileadmin/js/fileadmin/js/fileadmin/js/swfobject.js
/labs/wxaui/fileadmin/js/fileadmin/js/fileadmin/js/fileadmin/js/fileadmin/js/swfobject.js

I then took a look at the records from our own server and saw that for each of these non-existent URLs, we were serving up a “Not Found” page, thus doubling the trouble this bot was causing.

In the end, I had our web admin look into the problem. It turns out we were poorly formatting some of the URL paths on the site. Most bots can handle both absolute and relative paths, but some can't. These bots that can't handle the relative paths end up going a little nuts as they spider the website. (I couldn't find a really nice, clean explanation of this issue via Google, but this thread is close enough for those who are interested.)
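
One plausible way URLs like these can pile up is sketched below in Python. This is an assumption on my part, not something confirmed for our site: if a page containing a relative link is served again at the deeper URL (for example, as an error page), then resolving that link against each new URL adds another copy of the path. The host name and starting URL are placeholders.

```python
from urllib.parse import urljoin, urlparse


def follow_link(start_url, link, hops):
    """Resolve the same link against each resulting URL, hops times."""
    url = start_url
    visited = []
    for _ in range(hops):
        url = urljoin(url, link)
        visited.append(urlparse(url).path)
    return visited


# Placeholder host; the path is taken from the log excerpt above.
start = "http://example.com/labs/wxaui/"

# A relative link grows by one "fileadmin/js/" per hop.
relative_paths = follow_link(start, "fileadmin/js/swfobject.js", hops=3)

# An absolute path resolves to the same URL no matter where it is found.
absolute_paths = follow_link(start, "/labs/wxaui/fileadmin/js/swfobject.js", hops=3)

for path in relative_paths:
    print(path)
print(set(absolute_paths))
```

Whatever the exact cause, the second half shows why absolute paths are the safer choice when a page can be served at more than one URL.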

Anyway, it was nice to be able to just pull out Kirix Strata and, within a few minutes, figure out what the issue was. For those of you who are interested in your web logs, give the Apache Web Log import extension a spin and let us know what you think.

Removing or Consolidating Duplicate Email Addresses from Website Form Data

June 19th, 2008

If you have text entry forms on your website, you probably have data with duplicate email addresses. These duplicates can come either from data entered twice into the same form or from merging data from multiple applications/forms.

So, two questions arise:

  1. How do you identify these duplicates?
  2. How do you either remove them or group them together to track the related information?

Using Kirix Strata's grouping functionality, it's actually pretty easy. You can quickly identify duplicates from your website data and then either remove the duplicates or consolidate the different records into groups of related records. Let's look at the problem more closely.

Suppose you have a web page that asks your visitors for the following feedback information:

[Screenshot: feedback form data]

Visitors will enter their information, including their email address, which allows you to respond to them. However, if the visitor stops by again in the future, you'll have multiple records from the same person and therefore duplicated email addresses:

[Screenshot: feedback form data with duplicate email addresses]
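
The two questions, identifying duplicates and then removing or consolidating them, can be sketched in Python as follows. This is not Strata's grouping tool; the field names, sample records, and the choice to treat addresses as case-insensitive are assumptions for illustration.

```python
from collections import defaultdict


def group_by_email(records):
    """Group records under a normalized email address, keeping input order."""
    groups = defaultdict(list)
    for record in records:
        groups[record["email"].strip().lower()].append(record)
    return dict(groups)


def find_duplicates(groups):
    """Question 1: which addresses occur more than once?"""
    return {email: rows for email, rows in groups.items() if len(rows) > 1}


def keep_first(groups):
    """Question 2, option A: drop duplicates, keeping each first record."""
    return [rows[0] for rows in groups.values()]


# Hypothetical feedback-form records.
records = [
    {"name": "Ann", "email": "ann@example.com", "comment": "Great tool"},
    {"name": "Bob", "email": "bob@example.com", "comment": "Feature request"},
    {"name": "Ann", "email": "Ann@Example.com", "comment": "Follow-up question"},
]

groups = group_by_email(records)
duplicates = find_duplicates(groups)
unique_records = keep_first(groups)

print(sorted(duplicates))
print([record["name"] for record in unique_records])
```

Keeping the full groups dictionary corresponds to the consolidating option: every record stays available under its address, so related feedback from the same visitor can be reviewed together.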

Concatenate Columns Into a Single Field

June 18th, 2008

We're just warming up the Strata blog right now, but we're going to be adding various tips and tricks on how to use the software more effectively. One aspect of this will be to discuss general support questions we receive that have relevance to many users.

So, along those lines, we received a question the other day about concatenating fields together into a single string. Excel has a CONCATENATE function that will let you take two non-numeric columns and place them together. Here's an example from the Excel help manual:

CONCATENATE("Total ", "Value") equals "Total Value"

Strata makes this even easier, since no function is required to concatenate or join strings together. You simply need to add them together:

"Total " + "Value"  equals "Total Value"

So, say you had a table with a first name field (“firstname”) and a last name field (“lastname”) and wanted to put these together. You would insert a new calculated field and enter the following formula:

firstname + lastname

So, if a record had “John” in the firstname field and “Smith” in the lastname field, you would get a result of:

JohnSmith

The spacing is obviously problematic here, so we just need to add in a space for formatting purposes:

firstname + " " + lastname

which would result in:

John Smith

So, to sum up, no concatenate function is necessary when using Strata… just add your strings together.
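
As a side note for script writers, the same "just add the strings" approach works in many programming languages. Here is a short Python sketch; the join_fields helper is hypothetical and adds one assumption of its own, namely that leading and trailing spaces should be trimmed and blank fields skipped.

```python
def join_fields(*fields, separator=" "):
    """Concatenate fields with a separator, trimming spaces and skipping blanks."""
    parts = [field.strip() for field in fields]
    return separator.join(part for part in parts if part)


# Plain concatenation, as in the formulas above.
print("John" + "Smith")
print("John" + " " + "Smith")

# The helper avoids doubled or dangling spaces when a field is padded or empty.
print(join_fields("John ", "Smith"))
print(join_fields("", "Smith"))
```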

Hello, World Wide Web

June 17th, 2008

Welcome to the first post on the new Strata blog. This is the place where we'll be sharing the latest news about Strata, as well as tips and tricks, case studies, highlights of interesting extensions and various examples.

To kick things off, we're happy to let everyone know that we've just created a new Extensions section. The Strata Extensions section contains an Extension Library, where we'll be adding our own creations as well as applications developed by the community. It also contains some help for developers, including an Extension Wizard, which creates extension packaging and sample scripts, and a Developer Resources section, which provides useful information about developing scripts/extensions in Strata.

For our first extension, we thought it would be fitting to have Strata politely introduce itself with the classic phrase, “Hello, World”. Rather than just display the text “Hello, World”, though, we've added a bit of international flavor and web connectivity: you can search Google, Yahoo, and Wikipedia for this phrase in several languages. It's called “Hello, World Wide Web.”

As a sample application, “Hello, World Wide Web” provides a basic example of some of the hybrid web/desktop options available with Strata's interface controls and highlights how you can embed a browser control in a form with just a few lines of code.

To see this, you'll just need to take a peek at the code:

  1. Download the Hello World Wide Web extension from the Extension Library.
  2. Convert it to a ZIP file by changing the file name from “hello_world_wide_web.kxt” to “hello_world_wide_web.zip”.
  3. Extract the contents of the ZIP file to a new folder.
  4. In Strata, select Create Connection from the File menu, then click the Browse button to find and select this folder. It will appear as a connected folder in the Project Panel.
  5. Expand the folder in the Project Panel and double-click the “hello_world_wide_web_form.js” file.
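
If you'd rather script steps 2 and 3 than do them by hand, here is a minimal Python sketch. It relies only on what the steps above imply, that a .kxt file can be treated as a ZIP archive once renamed. The unpack_extension function and the folder names are hypothetical, and the demo at the bottom builds a stand-in archive rather than using the real extension.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def unpack_extension(kxt_path, target_dir):
    """Copy a .kxt file under a .zip name and extract it into a new folder.

    Returns the sorted list of file names found in the archive.
    """
    kxt_path = Path(kxt_path)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # Step 2: same bytes, new name ending in .zip (the original is kept).
    zip_path = target_dir / (kxt_path.stem + ".zip")
    shutil.copyfile(kxt_path, zip_path)

    # Step 3: extract the contents to a new folder named after the extension.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir / kxt_path.stem)
        return sorted(archive.namelist())


# Demo with a stand-in archive; the real file comes from the Extension Library.
with tempfile.TemporaryDirectory() as tmp:
    demo_kxt = Path(tmp) / "hello_world_wide_web.kxt"
    with zipfile.ZipFile(demo_kxt, "w") as archive:
        archive.writestr("hello_world_wide_web_form.js", "// placeholder\n")

    names = unpack_extension(demo_kxt, Path(tmp) / "unpacked")
    extracted = Path(tmp) / "unpacked" / "hello_world_wide_web" / names[0]
    extracted_exists = extracted.exists()
    print(names)
```

The extracted folder is the one to select in step 4 when creating the connection in Strata.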

On line 54 is the command that creates the browser control, and on line 73 is the spot where this control gets added to the form. Of course, if you're not familiar with JavaScript already, this might look a bit cryptic (they call it “code” for a reason). But overall, it's kind of nice to be able to include a web browser in a custom application with just two lines.

If you would like to explore the code for this extension a bit more, you can get variations of this extension as well as other individual script components from the Extension Wizard. This wizard gives you a quick way to grab smaller chunks of code to play around with or generate simple templates for different functions that you can modify or build upon.

We hope you have fun playing around with “Hello, World Wide Web”. If you would like to improve on it and share the results with us, please do so. We'd love to hear from you. You can submit new extensions or any improvements via our extension submission form.

As for future versions of “Hello, World Wide Web”, we certainly would welcome having more language options. Heck, we'd even take languages that don't exist in Wikipedia today — how does one say “Hello, World” in Klingon?