Data and the Web

Archive for the ‘web log analysis’ Category

Embedded phpBB Search Terms within Apache Web Logs

Friday, August 24th, 2007

This afternoon I was doing some analysis on our web logs and thought it may make for a good screencast and blog post. We currently use a combination of AWstats and Google Analytics for our web stats but are increasingly using Kirix Strata™ to dig deeper into the raw web logs for the more customized things that aren’t readily available otherwise.

Also, honestly, it is kind of fun to plow through almost a million records on your own. Hmmm, maybe I should get out more.

The topic of the screencast below are the search terms people enter to find things in our phpBB3 support forums. These terms are embedded in the “request” field of the apache logs and I couldn’t find a way to get them without digging into the logs themselves (NOTE: I wouldn’t doubt that there is some way to do this via a mod to phpBB or a filter in Google Analytics… but since I couldn’t find anything via a quick Google search, using Strata just ended up being a lot faster).

An example of a search string we’re dealing with is:

GET /forums/search.php?keywords=proxy HTTP/1.1

So the trick was to parse the search keywords out of the field and then group them together to see what people were searching for… and in turn give us the chance to improve our support area by targeting some of these search terms and expanding our documentation accordingly.

Hope this video proves helpful:

Play Video

(And here’s an embeddable YouTube version…)

TECHNICAL NOTE:

I downloaded the Apache logs from the server and, due to the file size, decided to import them into Strata rather than open the file and work with it directly. To import your logs, go to Import, select text-delimited files, and then import as space delimited with quotation marks as the text qualifier.  Update:  You can now use a handy little log parsing extension to pull in your web log files without having to mess around with a straight text import.

TECHNICAL NOTE 2:

For posterity, here are the functions that were used in this screencast:

STRPART(string, section [, delimiter])
SUBSTR(string, start [, length])
CONTAINS(string, search string)
IIF(boolean test, true value, false value)

About

Data and the Web is a blog by Kirix about accessing and working with data, wherever it is located. We have a particular fondness for data usability, ad hoc analysis, mashups, web APIs and, of course, playing around with our data browser.