Data and the Web

Archive for August, 2007

Kirix Strata Beta 3 Now Available

Thursday, August 30th, 2007

Hello all, we're quite happy to announce that we've got our third beta version up and available for download. Here are the major features/bug fixes that made it into this version:

Vastly improved Oracle connectivity.
A bunch of MySQL connectivity bug fixes.
Import wizard refinements, including a better interface for choosing what type of file you wish to import.
Loads of other bug fixes based on user feedback.

Also, we've done some work in the area of scripting and extensions that should be highlighted:

Extensions can now be enabled/disabled from the extension manager.
With scripting, you can now load and call functions in DLLs and shared libraries. A special shout-out to kapex01 for suggesting this feature; we think it opens up a lot of really cool possibilities for using Strata as a platform that combines the best of the desktop with the best of the web.
Also now with Strata's scripting, you can directly access window handles for forms and controls.

Thanks again to everyone for posting bugs and feature suggestions in the forums and via our bug feedback form; it is a tremendous help!

Posted in news/announcements | 1 Comment »

Embedded phpBB Search Terms within Apache Web Logs

Friday, August 24th, 2007

This afternoon I was doing some analysis on our web logs and thought it might make for a good screencast and blog post. We currently use a combination of AWStats and Google Analytics for our web stats, but we are increasingly using Kirix Strata™ to dig deeper into the raw web logs for the more customized things that aren't readily available otherwise.

Also, honestly, it is kind of fun to plow through almost a million records on your own. Hmmm, maybe I should get out more.

The screencast below covers the search terms people enter to find things in our phpBB3 support forums. These terms are embedded in the “request” field of the Apache logs, and I couldn't find a way to get them without digging into the logs themselves (NOTE: I wouldn't doubt that there is some way to do this via a mod to phpBB or a filter in Google Analytics… but since I couldn't find anything via a quick Google search, using Strata just ended up being a lot faster).

An example of a search string we're dealing with is:

GET /forums/search.php?keywords=proxy HTTP/1.1

So the trick was to parse the search keywords out of the field and then group them together to see what people were searching for… and in turn give us the chance to improve our support area by targeting some of these search terms and expanding our documentation accordingly.
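
For anyone who wants to try the same parse-and-group idea outside of Strata, here is a minimal Python sketch. It assumes the request field looks like the example above; the sample requests and the `extract_keywords` helper are made up for illustration and are not taken from our logs or from the screencast.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs


def extract_keywords(request):
    """Return the phpBB search keywords from an Apache "request" field, or None."""
    parts = request.split(" ")
    if len(parts) != 3:
        return None  # not a "METHOD path PROTOCOL" request line
    url = urlsplit(parts[1])
    if not url.path.endswith("/search.php"):
        return None  # some other page, not a forum search
    values = parse_qs(url.query).get("keywords")
    # Lowercase so that "Proxy" and "proxy" land in the same group.
    return values[0].lower() if values else None


# Made-up sample request fields, shaped like the example above.
requests = [
    "GET /forums/search.php?keywords=proxy HTTP/1.1",
    "GET /forums/search.php?keywords=Proxy HTTP/1.1",
    "GET /forums/search.php?keywords=mysql+import HTTP/1.1",
    "GET /forums/viewtopic.php?t=12 HTTP/1.1",
]

# Group identical search terms and count them, most frequent first.
counts = Counter(k for k in map(extract_keywords, requests) if k)
for term, hits in counts.most_common():
    print(term, hits)
```

Lowercasing before counting is a design choice: it keeps differently capitalized searches for the same term in a single group.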

Hope this video proves helpful:

(And here's an embeddable YouTube version…)

TECHNICAL NOTE:

I downloaded the Apache logs from the server and, due to the file size, decided to import them into Strata rather than open the file and work with it directly. To import your logs, go to Import, select text-delimited files, and then import as space delimited with quotation marks as the text qualifier. Update: You can now use a handy little log parsing extension to pull in your web log files without having to mess around with a straight text import.
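
To show what "space delimited with quotation marks as the text qualifier" does to a log line, here is a small Python sketch using the standard `csv` module. The sample line is made up and assumes Apache's combined log format; your own log format may differ.

```python
import csv
import io

# A made-up line in Apache "combined" log format.
sample = (
    '203.0.113.7 - - [24/Aug/2007:14:02:11 -0500] '
    '"GET /forums/search.php?keywords=proxy HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0"\n'
)

# Space as the delimiter, double quote as the text qualifier.
reader = csv.reader(io.StringIO(sample), delimiter=" ", quotechar='"')
fields = next(reader)

# The timestamp is bracketed rather than quoted, so it splits into two fields
# (positions 3 and 4); the quoted request stays whole at position 5.
request = fields[5]
print(len(fields), request)
```

Because the timestamp is wrapped in brackets instead of quotes, a plain delimited import leaves it in two columns that need to be joined afterwards.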

TECHNICAL NOTE 2:

For posterity, here are the functions that were used in this screencast:

STRPART(string, section [, delimiter])
SUBSTR(string, start [, length])
CONTAINS(string, search string)
IIF(boolean test, true value, false value)
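
As a rough illustration of how these four functions might fit together, here are Python stand-ins applied to the example request string. The 1-based positions and the default space delimiter are assumptions on my part rather than documented Strata behavior, and the final expression is only one plausible combination, not necessarily the one used in the screencast.

```python
# Python stand-ins for the four functions listed above.
# ASSUMPTIONS: positions are 1-based and the default delimiter is a space.

def strpart(string, section, delimiter=" "):
    """Return the section-th piece (1-based) of string split on delimiter."""
    pieces = string.split(delimiter)
    return pieces[section - 1] if 1 <= section <= len(pieces) else ""


def substr(string, start, length=None):
    """Return the substring beginning at 1-based position start."""
    begin = max(start - 1, 0)
    return string[begin:] if length is None else string[begin:begin + length]


def contains(string, search):
    """Return True if search occurs anywhere in string."""
    return search in string


def iif(test, true_value, false_value):
    """Return true_value when test is truthy, otherwise false_value."""
    return true_value if test else false_value


request = "GET /forums/search.php?keywords=proxy HTTP/1.1"
url = strpart(request, 2)  # second space-delimited piece: the requested URL
# Keep the text after "=" only when the URL actually carries search keywords.
keyword = iif(contains(url, "keywords="), strpart(url, 2, "="), "")
print(keyword)
```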

Posted in data analysis, examples, videos, web log analysis | 1 Comment »

Kirix Strata Beta 2 Now Available

Friday, August 17th, 2007

Hi everyone, we've now got a new beta iteration up and available for download.

Proxy Configuration Screenshot

Strata Beta 2 includes the following new features and bug fixes:

Added proxy configuration support (see screenshot above) as requested here. To adjust your settings, go to Tools>Options>Internet Tab.
Fixed the overly-eager numeric auto-sensing for things such as IP addresses (12.233.132.33) and English Premier League scores (1-2), first reported here.
Fixed the web page source view issue as reported here. Note that you can now toggle between views using the “View” menu, the “Toggle View” icon on the toolbar, or the right-click menu on web pages.
Fixed the Ubuntu installation issues as reported in many places.
Fixed the MySQL DateTime issue reported here.
Also fixed the MySQL off-by-one issue as reported here.
Fixed the column break issue reported here.
Fixed a whole bunch of additional nickel-and-dime issues, such as adding new hot keys, fixing some menu issues, and cleaning up some scripting bugs.
We also have updated our documentation related to Strata's scripting. Still a lot more to go, but it's something we'll be building on in the coming weeks.
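
On the numeric auto-sensing fix mentioned in the list above: the general idea can be sketched in Python as follows. This is not Strata's actual detection logic, just a hypothetical rule (the whole value must be a well-formed number) that keeps IP addresses and match scores as text.

```python
import re

# HYPOTHETICAL rule: a value is numeric only if the WHOLE string is an optional
# sign followed by digits with at most one decimal point.
NUMERIC = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def looks_numeric(value):
    """Return True only when the entire value is a plain number."""
    return NUMERIC.fullmatch(value.strip()) is not None


def column_is_numeric(values):
    """Treat a column as numeric only if every non-empty value qualifies."""
    non_empty = [v for v in values if v.strip()]
    return bool(non_empty) and all(looks_numeric(v) for v in non_empty)


for sample in ["12.233.132.33", "1-2", "42", "-3.14"]:
    print(sample, looks_numeric(sample))
```

Matching the full string, rather than just a leading run of digits, is what keeps values like 12.233.132.33 and 1-2 from being read as numbers.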

We've also upgraded our build process, so hopefully we'll be able to turn around new beta iterations more frequently from here on out. Thanks to everyone who has contributed to this beta effort (and, as an aside, has earned a free license when Strata is released), whether via the forum, the bug report form, or support emails. Please keep ’em coming!

Posted in news/announcements | Comments Off

Horizontal Tab Groups Make Bug Entry Fun!

Friday, August 10th, 2007

Bug Entry is Fun! (screenshot)

Thanks for all the bug reports this week; we're working hard to sort them out.

We're hoping to have a new beta for everyone early next week. In addition to a lot of nickel-and-dime fixes, we'll definitely be adding a configuration page for proxy settings, our most requested feature.

Have a good weekend!

Posted in examples, news/announcements | Comments Off

Everyone Loves a Prequel

Wednesday, August 8th, 2007

Yesterday, Kirix Strata™ received a nice write-up in The Register. Unfortunately, we had some rough sailing in the morning with the ensuing web traffic. We apologize to anyone who had to suffer through worse-than-dial-up speeds while downloading the beta. The problem has been fixed, so hopefully it won't occur again.

The article hinted at the origins of Strata, and I thought it might be useful to fill this story out a little bit more. Thankfully, this prequel does not involve midichlorians.

We introduced Strata at LinuxWorld a couple years ago as a “dynamic database” — sort of a cross between a desktop database and a spreadsheet — that made it really easy to use, manipulate and analyze structured data. In addition to its ease of use, Strata also had tremendous data capacity and speed, bringing the difficult world of databases a step closer to those who would otherwise shiver at the sight of SQL. These traits actually helped it win the LinuxWorld “Best in Show” award for desktop/productivity/business applications.

Unfortunately, there were a couple issues that limited its mass appeal. The first issue was connectivity: users needed to import all their data into the project. This isn't a problem if the data is static, historical data, but it becomes a bigger problem if the data requires regular updating. The second problem was repeatability: users couldn't easily replicate logic without performing a set of manual steps. So, unfortunately, for any type of repeated analysis, use on a daily basis could become burdensome.

These two areas became the primary focus of the new and improved Strata. We wanted 1) to enable people to work with data outside of a Strata “project” and 2) to provide a way to code their logic into scripts that could be run on a regular basis. For the former, we added the ability to open up files and manipulate them directly, like a CSV file on your desktop or a MySQL table on a server. For the latter, we implemented a scripting language (ECMAScript) with both a database and interface API, enabling developers to create a repeatable process (in embedded SQL) that could easily be deployed as an extension.

But then, as we looked at data accessibility and how to connect to various data sources, we started thinking about the web as a database. Although the web contains large amounts of information in HTML, quite a bit of information can be interpreted in a structured way with just a little bit of work. And with other data available as CSVs, RSS feeds, or through APIs, we thought it would also be useful to allow users to access some of these web-based data resources more directly.

That meant we needed to embed a browser, and after investigating several options, we settled on Mozilla's Gecko layout engine. Of course, the real trick was not just to let people browse web pages, but to let them interact with the content in a more data-oriented manner — a “trick” we're still exploring, implementing and refining. So, in many ways, Strata is a bit more of a “Data Interactor” than a “Data Browser.” But, I suppose, the former doesn't roll off the tongue as nicely…

So, in the end, this beta version of Strata really builds on a history of database power and analytics. And now we've got a chance to see what happens when we apply these things to the web too.