Kirix Strata Blog

Checking Date Ranges Prior to Analyzing New Data Sets

Seasoned data analysts know that one of the first things to do with a new, unfamiliar data set is to run some basic tests to determine what kind of animal you’re working with.  This is particularly important when working with larger data sets that may be amalgamated from multiple systems or appended together from archived files.

One of these tests is a date range check.  For example, if a client has shipped you all the data from the first 6 months of 2007, you want to make sure you actually have a complete 6 months of data to work with.  Ideally, you’d see something like this:

12/2006 -  43 records
01/2007 - 255 records
02/2007 - 249 records
03/2007 - 265 records
04/2007 - 287 records
05/2007 - 259 records
06/2007 - 263 records
07/2007 -  53 records

The partial months at either end (12/2006 and 07/2007) provide some comfort that the data set is truly complete, since the records run past both boundaries rather than stopping short of them.  However, it is surprising how often you’ll actually see something like this:

01/1999 - 196 records
12/2006 -  43 records
01/2007 - 255 records
02/2007 - 249 records
03/2007 -  96 records
04/2007 - 287 records
05/2007 - 259 records
06/2007 - 263 records
07/2007 -  53 records

This second example is a dirtier data set; there is a strange, high-count outlier from 1999 and we also see that there was a significant drop in the record count during March 2007.
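If you want to produce a listing like the ones above by hand, the idea is simply to group the records by month and count them.  Here is a minimal Python sketch of that idea.  It is an illustration only, not the extension itself; the function names and the tiny sample list are made up for the example, and it assumes your date column has already been parsed into date values.

```python
from collections import Counter
from datetime import date


def month_counts(dates):
    """Count records per (year, month), returned in chronological order."""
    counts = Counter((d.year, d.month) for d in dates)
    return sorted(counts.items())


def format_report(counts):
    """Render the counts in the 'MM/YYYY - N records' layout."""
    return [f"{month:02d}/{year} - {n:3d} records" for (year, month), n in counts]


# Hypothetical sample: three records spread over two months.
sample = [date(2007, 1, 5), date(2007, 1, 20), date(2006, 12, 31)]

for line in format_report(month_counts(sample)):
    print(line)
```

Sorting on the (year, month) pair rather than on the formatted text keeps the months in calendar order, so a stray month from 1999 lands at the top of the list where it is easy to spot.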

Before you actually start performing your analysis, you’d want to investigate the items from 1999, which could just be empty records that can be ignored or, worse, could point to a formatting problem in these records.  The precipitous drop in March 2007 is a little more worrisome.  Was it because sales dipped drastically that month, or was it because there was an error when the IT department appended this data set together?
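Once you have the monthly counts, the eyeballing can be partly automated.  The sketch below is one possible approach under some loud assumptions: you know the range the client promised, one month of spillover on either side is considered normal, and a month counts as suspicious if it has fewer than half the records of the median month in range.  That 50% threshold and all of the names here are arbitrary choices for illustration; tune them to your own data.

```python
from statistics import median


def month_index(year, month):
    """Map a (year, month) pair to a running month number for easy comparison."""
    return year * 12 + (month - 1)


def flag_months(counts, first, last, drop_ratio=0.5):
    """Flag suspicious months in counts, a dict of {(year, month): records}.

    first and last are the expected (year, month) bounds, inclusive.
    drop_ratio is an assumed threshold, not a standard value.
    """
    lo, hi = month_index(*first), month_index(*last)
    in_range = [n for (y, m), n in counts.items() if lo <= month_index(y, m) <= hi]
    typical = median(in_range) if in_range else 0

    flags = {}
    for (y, m), n in sorted(counts.items()):
        idx = month_index(y, m)
        if idx < lo - 1 or idx > hi + 1:
            # More than one month beyond the expected range.
            flags[(y, m)] = "far outside expected range"
        elif lo <= idx <= hi and n < drop_ratio * typical:
            flags[(y, m)] = "count well below typical month"

    # Expected months with no records at all.
    present = {month_index(y, m) for (y, m) in counts}
    for idx in range(lo, hi + 1):
        if idx not in present:
            flags[(idx // 12, idx % 12 + 1)] = "no records"
    return flags


# Counts from the second listing above.
dirty = {
    (1999, 1): 196, (2006, 12): 43, (2007, 1): 255, (2007, 2): 249,
    (2007, 3): 96, (2007, 4): 287, (2007, 5): 259, (2007, 6): 263,
    (2007, 7): 53,
}

for month, reason in sorted(flag_months(dirty, (2007, 1), (2007, 6)).items()):
    print(month, reason)
```

On the second listing this flags 01/1999 and 03/2007 and leaves the two boundary months alone.  A flag only tells you where to look; whether March was a slow sales month or a botched append is still a question for the client.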

Whatever the cause, it’s better to get your data in order and make sure you have a complete set before jumping into your analysis and providing that client with incorrect or skewed results.  To help you do this, we’ve created a simple date range analysis extension.  Running this utility on a new data set from the get-go can save you a lot of time and hassle later on.

You can install the date range analysis extension and learn how to use it here.  Got some other data utilities you’d like in your toolkit?  Let us know.
