Kirix Strata Blog

Archive for the ‘web logs’ Category

Watching Reruns: Strata Tutorial Videos from the Archives

Tuesday, July 8th, 2008

It’s been almost a year since we released the beta version of Kirix Strata to the public. During that beta cycle, we posted several videos and screencasts on our blog to highlight different things the software could do.

Thankfully, even though the videos show the beta version in action, almost all of the content still applies to the final version of Strata. The only real difference is the user interface; we ended up moving icons, toolbars and menu items around quite a bit until we got something that seemed to work best. Oh, and you may see the original Strata logo that we threw together for the beta.

So, maybe you can consider this blog post your TiVo or on-demand video page for “Season 1” of Kirix Strata. Here are the five links, with details and highlights of each one below:

(more…)

Researching Problems in your Apache Web Log Activity

Monday, June 23rd, 2008

So I came into work the other day and the first thing one of our web admins says to me is, “Were we Slashdotted yesterday?” I had just been reviewing our web activity and didn’t think that was the case. Still, I did a quick check on our Google Analytics account and, as expected, nothing was out of the ordinary.

The reason he asked the question was that our Apache log file that day was over 10 times the size of the file from the previous day.  It sure looked like the server was getting hammered.

So, I decided to take a look and see what the problem was.  I pulled down the Apache log and imported it into Strata.  See the video below for a step-by-step look:

Play Video

(And here’s an embeddable YouTube version…)

Now, as an aside, if you’ve ever tried to look at a raw Apache log in Excel or Notepad, you’ll see that it is space-delimited and the date/time format is not trivial to deal with. Not only that, but the sheer size of a log file makes it almost impossible to handle in a spreadsheet. The one I was dealing with was over 100,000 records long, and that was just one day.

Strata can easily handle the data size, but the format is enough to give any software fits.  So, we wrote a quick Apache log parser extension that makes it really simple to just point the software to your Apache log and import it.  The resulting table is nicely formatted and everything is ready to go (including those pesky date fields).  You can get the extension here.
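For the curious, here is a rough idea of what parsing one of these lines involves. This is only a minimal Python sketch, not the code inside the Strata extension. It assumes the log uses Apache's "combined" format (your server may be configured differently), and the sample line, IP address and bot name are made up for illustration.

```python
import re
from datetime import datetime

# Assumption: Apache "combined" log format. The trailing referer/agent
# pair is optional so plain "common" format lines also match.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # The timestamp looks like 22/Jun/2008:13:55:36 -0500
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # Apache writes "-" when no bytes were sent
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    # The request field is "METHOD PATH PROTOCOL"; keep the path on its own
    parts = record["request"].split()
    record["path"] = parts[1] if len(parts) >= 2 else ""
    return record

# Made-up sample line for illustration only
sample = (
    '203.0.113.7 - - [22/Jun/2008:13:55:36 -0500] '
    '"GET /labs/wxaui/fileadmin/js/swfobject.js HTTP/1.1" 404 209 '
    '"-" "ExampleBot/1.0"'
)
record = parse_line(sample)
print(record["ip"], record["time"].isoformat(), record["status"], record["path"])
```

The regular expression does the job that a plain split on spaces can't: the timestamp and the quoted request both contain spaces of their own.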

So, back to the issue at hand… after I imported it, I played around with the data to identify what was causing the problem.  I grouped the IP addresses together to see if I could pinpoint a few culprits.  And, indeed, I found two:

  • An unknown bot
  • Our own server
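If you don't have Strata handy, the same kind of grouping can be approximated in a few lines of Python. This is just a sketch of the idea, not what Strata does internally. It assumes the client IP is the first space-delimited field on each line (as in the standard Apache formats), and the sample lines and addresses below are invented.

```python
from collections import Counter

def top_clients(lines, n=5):
    """Count requests per client IP and return the n busiest clients."""
    counts = Counter()
    for line in lines:
        # Assumption: the client IP is the first space-delimited field
        ip = line.split(" ", 1)[0]
        if ip:
            counts[ip] += 1
    return counts.most_common(n)

# Invented sample lines for illustration only
sample_lines = [
    '203.0.113.7 - - [22/Jun/2008:13:55:36 -0500] "GET /a HTTP/1.1" 404 209',
    '203.0.113.7 - - [22/Jun/2008:13:55:37 -0500] "GET /a/a HTTP/1.1" 404 209',
    '198.51.100.2 - - [22/Jun/2008:13:55:37 -0500] "GET /404.html HTTP/1.1" 200 512',
    '203.0.113.7 - - [22/Jun/2008:13:55:38 -0500] "GET /a/a/a HTTP/1.1" 404 209',
    '198.51.100.2 - - [22/Jun/2008:13:55:38 -0500] "GET /404.html HTTP/1.1" 200 512',
    '192.0.2.44 - - [22/Jun/2008:13:56:01 -0500] "GET / HTTP/1.1" 200 1024',
]

for ip, hits in top_clients(sample_lines, n=2):
    print(ip, hits)
```

With a real file you would pass the open file object instead of the list, since `top_clients` only needs something it can loop over line by line.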

After a little more research, I found out that the bot was searching for all kinds of non-existent URLs and was basically appending one path to another to get some really bizarre URLs:

/labs/wxaui/fileadmin/js/swfobject.js
/labs/wxaui/fileadmin/js/fileadmin/js/swfobject.js
/labs/wxaui/fileadmin/js/fileadmin/js/fileadmin/js/swfobject.js
/labs/wxaui/fileadmin/js/fileadmin/js/fileadmin/js/fileadmin/js/swfobject.js
/labs/wxaui/fileadmin/js/fileadmin/js/fileadmin/js/fileadmin/js/fileadmin/js/swfobject.js

I then took a look at the records from our own server and saw that for each of these non-existent URLs, we were serving up a “Not Found” page, thus doubling the trouble this bot was causing.

In the end, I had our web admin look into the problem. It turns out we were poorly formatting some of the URL paths on the site. Most bots can handle both absolute and relative paths, but some can’t. The bots that can’t handle relative paths end up going a little nuts as they spider the website. (I couldn’t find a really nice, clean explanation of this issue via Google, but this thread is close enough for those who are interested.)
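To see how a relative path can snowball like this, here is a small Python sketch of one plausible mechanism. It is an assumption on my part rather than a confirmed diagnosis: it supposes that every page the bot fetched, including the "Not Found" page, contained the same relative link, and that the bot resolved that link against whatever URL it had just requested. The host name and the root-relative link are hypothetical.

```python
from urllib.parse import urljoin

def crawl_chain(start_url, link, hops):
    """Resolve the same link against each URL in turn, the way a crawler
    would if every page it fetched contained that link."""
    urls = [start_url]
    for _ in range(hops):
        urls.append(urljoin(urls[-1], link))
    return urls

# Hypothetical host; the path is the first one from the list above
start = "http://example.com/labs/wxaui/fileadmin/js/swfobject.js"

# A relative link is resolved against the directory of the current URL,
# so each hop nests one level deeper
relative_chain = crawl_chain(start, "fileadmin/js/swfobject.js", 3)
for url in relative_chain:
    print(url)

# A root-relative link (hypothetical) resolves to the same URL every time
absolute_chain = crawl_chain(start, "/fileadmin/js/swfobject.js", 3)
print(absolute_chain[-1])
```

Under those assumptions the relative link reproduces the ever-longer paths from the log, while a link that starts with a slash settles on a single URL after the first hop.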

Anyway, it was nice to be able to just pull out Kirix Strata and, within a few minutes, figure out what the issue was.  For those of you who are interested in your web logs, give the Apache Web Log import extension a spin and let us know what you think.