Data and the Web

Announcing Kirix Strata 4.5

October 10th, 2012

Yikes. It's been a while. We've been pulled off on myriad other important projects, but we're happy to announce a new upgrade to Kirix Strata. Better late than never, right… right?

In addition to lots and lots of under-the-hood stuff fixed, stabilized and optimized, we've got some nice features people have been asking for. Here's a list of the highlights:

XLSX and ACCDB support. It's been a long time coming, but you can now import directly from the newer file formats from Excel and Access instead of using CSV or XLS/MDB workarounds.
Report formatting. Reports now support cell borders/lines as well as vertical alignment. Yay!

In a bit of sad news, with this upgrade we say goodbye to the Linux version of Kirix Strata. Historically, there hasn't been enough commercial traction for the version and, unfortunately, supporting it further is cost prohibitive.

In addition to the things above, we've made many improvements to various bits and bobs throughout the package, which should lead to better overall performance. Download the Windows 32-bit version here.

Enjoy!

P.S. Mea Culpa. We've experienced some unrelenting spam issues on the blog comments and forum as of late and we're trying to clean that all up; sorry 'bout that and thanks for your patience.

Posted by Ken Kaczmarek in news/announcements, strata release | Comments Off

Announcing Kirix Strata 4.4.2

January 19th, 2011

We're pleased to announce a minor upgrade to Kirix Strata today, version 4.4.2.

There are a ton of small tweaks and fixes under the hood, but here are a few of the noticeable highlights:

Features

Find In Files: For developers out there, this is a really useful one. We've added the ability to search through the directories within a project for text files containing text patterns. Matching text files are then displayed in the console panel and can be opened by double-clicking on them. For example, if you have text files in a folder on your desktop, you can connect to the folder and then do a Find in Files (Edit > Find) to search for a specific text string.
KPG (Kirix Package File) Export: We've added the ability to export the directory structure to a package file (KPG), allowing projects, or portions of projects, to be exported and re-imported with the original hierarchy.
Memory Usage Optimizations: We've updated the index memory usage function so it performs well with current hardware capabilities.
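
As a rough illustration of the Find In Files idea described above, here is a minimal Python sketch that walks a folder tree and lists the text files containing a pattern. It is not how Strata implements the feature; the function name `find_in_files`, its parameters and the list of file extensions are all assumptions made up for this example.

```python
import os
import re


def find_in_files(root, pattern, extensions=(".txt", ".csv", ".sql", ".js")):
    """Return sorted paths of text files under root whose contents match pattern.

    Hypothetical helper for illustration only; not part of Strata.
    """
    regex = re.compile(pattern)
    matches = []
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            # Only look at files that are likely to be plain text.
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(folder, name)
            try:
                with open(path, "r", encoding="utf-8", errors="replace") as handle:
                    if any(regex.search(line) for line in handle):
                        matches.append(path)
            except OSError:
                # Skip files that cannot be read (permissions, broken links, etc.).
                continue
    return sorted(matches)
```

Calling `find_in_files("/path/to/folder", r"customer_id")` would return the matching file paths, roughly what the console panel lists after a search.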

Fixes

Report Writer: We fixed various issues, including a sorting problem as well as an occurrence where multiple jobs were created when toggling between the design and layout view.
Linux: We've updated our Linux build and fixed various problems relating to the user interface in the Linux distribution.

Other

Connectivity: Connections to MySQL now utilize ODBC rather than the libmysql client library. Connecting to MySQL requires installation of the MySQL ODBC library, available here.

Click here to download the latest and greatest. If you run into any issues or need help with anything, please just let us know.

Posted by Ken Kaczmarek in news/announcements | 1 Comment »

Data Clean Up, Brought to You by Google

November 18th, 2010

I recently saw this announcement for an open source tool and thought it might be interesting to some folks that deal with messy data sets.

Google Refine provides an interesting take on grouping and filtering data and then getting it cleaned up. It also does some pretty interesting stuff using web APIs to transform data (see video 3, in particular).

The tool focuses on the data clean-up side of things, rather than analysis and reporting. You may end up running into some trouble with larger data sets, as, I believe, the processing needs to be performed entirely in memory.

However, for data geeks out there, it's definitely worth a look and might even be a nice complement for Kirix Strata at times.

If you have a chance to play with it, feel free to let us know what you think in the comments below.

Posted by Ken Kaczmarek in data analysis, dirty data | Comments Off

Announcing Kirix Strata 4.4

February 23rd, 2010

We're pleased to announce a long-awaited new upgrade to Kirix Strata today.

Here are a few highlights of note:

1. Back-end Stuff. This version includes a TON of work on back-end processing, scripting, CSVs, etc. So, a lot of the changes will be “invisible” to the average user, but will benefit everyone with speed improvements and a more robust data engine.

2. Expanded Relationship Filtering. In the previous version, we had two options for filtering related records — either “leave filter off” or “filter all child records.” In this version, we've added a third option to “mark filtered records” within the context of the entire table. So, now, when you tile your two related tables horizontally and select Tools > Related Records > Mark Related Records (or select it from the icon dropdown on the toolbar), your related records in the child set will be highlighted in yellow. In addition, we've also added a cursor marker to the parent table so you can track where you are in the child set.

3. Additional Aggregate Functions. In addition to the existing aggregate functions (e.g., SUM(), AVG(), etc.), we have included new options for Standard Deviation, STDDEV(), and Variance, VARIANCE(). As with other aggregate functions, you'll be able to use these in areas such as the query builder, relationships and grouping.

4. Expanded Table Statistics. When you select Data > Summarize, you'll now also get the minimum and maximum field length of each field.

5. Import Templates. We've added the ability to save import templates (File > Import) to your project, which will save a few steps if you have complex imports.

6. Bug Fixes. We've been able to knock out loads of small fixes throughout the software.
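
For readers who want to sanity-check the numbers from items 3 and 4 outside of Strata, the following Python sketch computes grouped standard deviation and variance, plus the minimum and maximum length of each field. The post doesn't say whether STDDEV() and VARIANCE() use the sample or the population formula, so both are shown; the function names and sample rows are invented for the example.

```python
import statistics
from collections import defaultdict


def grouped_spread(rows):
    """Group (key, value) pairs and compute variance and standard deviation.

    Each group needs at least two values for the sample formulas.
    """
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    summary = {}
    for key, values in groups.items():
        summary[key] = {
            "sample_variance": statistics.variance(values),
            "sample_stddev": statistics.stdev(values),
            "population_variance": statistics.pvariance(values),
            "population_stddev": statistics.pstdev(values),
        }
    return summary


def field_length_stats(records):
    """Return {field: (min_length, max_length)} over a list of dict records."""
    stats = {}
    for record in records:
        for field, value in record.items():
            length = len(str(value))
            low, high = stats.get(field, (length, length))
            stats[field] = (min(low, length), max(high, length))
    return stats


# Made-up sample data: (region, sale amount) pairs and a tiny table.
sales = [("east", 10.0), ("east", 14.0), ("east", 18.0), ("west", 5.0), ("west", 9.0)]
people = [{"name": "Al", "city": "Chicago"}, {"name": "Barbara", "city": "Rome"}]

spread = grouped_spread(sales)
lengths = field_length_stats(people)
```

If the two formulas give noticeably different answers on your data, the groups are probably small, which is worth knowing before comparing against any tool's output.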

You can download the latest and greatest here. If you run into any issues or need help with anything, please just let us know.

Posted by Ken Kaczmarek in news/announcements | Comments Off

Further Sunlight on Government Data

July 20th, 2009

In a previous post, we discussed some of the interesting things the US government is doing to make its data more widely available, culminating in the Data.gov website. This website is now up and running and has definitely made some progress since we last discussed it.

Data.gov is broken down into three main catalogs:

Raw Data Catalog (with data files available in XML, CSV, KML, etc.)
Tools Catalog (list of tools built to work with various open data sets)
Geodata Catalog (links to Federal geospatial data)

They've also tried to make it easier to search for data sets, which, like searching for video, relies heavily on items being tagged with good, meaningful descriptions and related metadata. It's a hard nut to crack. For example, government agencies tend to release data sets on an annual basis, so you'll have, say, 5 different data sets (and counting) for the “Public Libraries Survey” from 2004 through 2008. If your search terms aren't specific enough, these repetitious items tend to clutter up the search results. As Data.gov continues to add more data sets, hopefully they can refine this area further.

But, then again, maybe they won't have to. The folks at Sunlight Labs, whose mission is to build technology that makes government more transparent and accountable, have recently announced a project called The National Data Catalog. It will be a tool that aims to take the Data.gov concept and improve upon it. From the announcement:

“We think we can add value on top of things like Data.gov and the municipal data catalogs by autonomously bringing them into one system, manually curating and adding other data sources and providing features that, well, Government just can't do. There'll be community participation so that people can submit their own data sources, and we'll also catalog non-commercial data that is derivative of government data like OpenSecrets. We'll make it so that people can create their own documentation for much of the undocumented data that government puts out and link to external projects that work with the data being provided.”

This should be interesting to watch. As the Sunlight folks say in a later post, they are not out to replicate Data.gov, but to stand on its shoulders (similar to how, say, Weather.com relies on and improves upon the National Weather Service). Given the nature of the beast, data sets need to be described really well in order to be both searchable and useful. Hopefully the community aspect, in particular, can help give this data more utility. If any tech-savvy folks are interested in either following the project or contributing code, here's the project page.

Posted by Ken Kaczmarek in data repositories, government | 1 Comment »

A Wee Bit of Housekeeping…

July 17th, 2009

We haven't been doing much regular blogging lately, but we're hoping this will change in the coming weeks.

In the meantime, we've recently done some housekeeping on our website, so if you haven't visited recently we'd encourage you to do so. We've updated many pages with new content, but here are two sections in particular that we'd steer you toward:

Examples Section. This is a long overdue section that puts together some quick examples of how Kirix Strata™ can be applied to common data problems. The section is still a work in progress with more videos still to be produced. However, we expect what we have now will prove useful to new and old Strata users alike. Check it out.
Video Tutorials and Archive. We've done a bunch of different videos and screencasts over the past year or so, but they've been posted all over our website. This new section wrangles all of the videos together in one place for posterity. The feature tutorials, in particular, are worth viewing as they help give a more comprehensive look at how to use specific features in Strata. Take a look.

So, in a nod to the Matrix, where one cannot be told what it is, but one must see for oneself, we've tried to make some high quality video documentation available. Stay tuned for more to come. Enjoy!

Posted by Ken Kaczmarek in examples, news/announcements, videos | Comments Off

wxWebConnect: Open-source Browser Library for wxWidgets

July 8th, 2009

This is sort of out of the scope of this particular blog, but I thought I'd pass along the news that we just released another open-source library for wxWidgets users. This one is called wxWebConnect and it's a library for wxWidgets that enables developers to quickly integrate advanced web browser capabilities.

Basically, it wraps up functionality exposed by the Mozilla Foundation's Gecko engine (XULRunner) into a set of user-friendly classes to: embed browser controls, search web content, print web pages, interact with the DOM, implement custom content handling for different MIME types, issue POST calls using the current browser state, etc. Notably, with this library you can also embed all of your favorite Firefox browser plug-ins into your application. We've also gone out of our way to make sure that getting a browser control up and running in your application is as easy as possible.

More information can be found at the wxWebConnect project page. Also, feel free to view some screenshots and a short video demonstration too. If you're a wxWidgets developer, give it a whirl and let us know what you think.

Posted by Ken Kaczmarek in browsers, news/announcements | Comments Off

Announcing Kirix Strata 4.3

April 28th, 2009

We're pleased to announce that we just released a new upgrade to Kirix Strata, version 4.3! Kudos to our developers for adding a lot of nice features and bug fixes. The full list of notes to this release is below the jump, but here are a few of the bigger changes:

External Database Connectivity

We've really improved the way that Strata works with external databases by optimizing our pass-through queries for databases like Oracle, SQL Server and MySQL. In addition, queries in the query builder that reference external database tables pass the query through to the external database, significantly increasing the speed of queries on external databases. Furthermore, you can now edit individual cells in Strata and have them update in your external database table. This is very welcome news to folks that want to use Strata as a front-end to their external database tables.

UPDATE (04/30/2009): Just a quick note of clarification: on the “read” side, you can work with external databases for things like sorting, filtering, marks, calculated fields, grouping and copying. On the “write” side, we currently only have cell editing available, but will work on adding other features in the future such as append (i.e., insert record), delete, update, and some modify structure operations. If you need these additional “write” features, please send us a note to let us know how you would plan on using them to help us prioritize our development efforts. Thanks!

Improved SQL Support

We added a console panel to allow direct querying of internal and external databases with SQL commands, as well as to provide feedback for database operations and scripts. You can learn more here.

EBCDIC Conversion

Strata now handles EBCDIC. We haven't added copybook support just yet, but you can either manually set your breaks using the text-import or create scripts to convert the EBCDIC file to ASCII format. You can learn more here.
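
For the scripting route, the general shape of an EBCDIC-to-ASCII conversion looks something like the Python sketch below. It is an illustration rather than a Strata script, and it makes several assumptions: the file uses the `cp037` code page (one of several EBCDIC variants), records are fixed-length, and the fields are plain character data with no packed-decimal values. The function name, record length and break positions are hypothetical.

```python
def ebcdic_records_to_text(data, record_length, breaks, codec="cp037"):
    """Split EBCDIC bytes into fixed-length records, then into fields.

    data          -- raw bytes read from the EBCDIC file
    record_length -- number of bytes per record (assumed fixed)
    breaks        -- character offsets where a new field starts, e.g. [10]
    codec         -- EBCDIC code page; cp037 is an assumption, adjust as needed
    """
    rows = []
    edges = [0] + list(breaks) + [record_length]
    for start in range(0, len(data), record_length):
        # cp037 is a single-byte encoding, so byte and character offsets agree.
        record = data[start:start + record_length].decode(codec)
        rows.append([record[a:b].strip() for a, b in zip(edges, edges[1:])])
    return rows


# Build a small EBCDIC sample in memory instead of reading a real file.
sample = "SMITH     00042JONES     00007".encode("cp037")
rows = ebcdic_records_to_text(sample, record_length=15, breaks=[10])
```

The decoded rows are ordinary Python strings, so they can be written back out as an ASCII or CSV file and imported like any other text file.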

Fixed Length and Delimited Table Export

We've also added Fixed-length export (this also works when using File > Save As External). In addition, we've expanded the text-delimited export so that you can specify your own delimiters, such as pipe-delimited and semi-colon delimited. You can learn more about the new text-delimited functionality here.
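
To make the difference between the two export styles concrete, here is a short Python sketch that writes the same rows as custom-delimited text and as fixed-length text. It only illustrates the file formats, not Strata's export code; the function names and the column widths are assumptions for the example.

```python
import csv
import io


def to_delimited(rows, delimiter="|"):
    """Render rows as delimited text using a caller-chosen delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def to_fixed_length(rows, widths):
    """Render rows as fixed-length text, padding or truncating each column."""
    lines = []
    for row in rows:
        cells = [str(value)[:width].ljust(width) for value, width in zip(row, widths)]
        lines.append("".join(cells))
    return "\n".join(lines) + "\n"


rows = [["id", "name"], [1, "Ann"]]
pipe_text = to_delimited(rows)                 # pipe-delimited
semicolon_text = to_delimited(rows, ";")       # semicolon-delimited
fixed_text = to_fixed_length(rows, [4, 6])     # columns 4 and 6 characters wide
```

Note that the fixed-length version silently truncates values longer than the column width, so widths need to be chosen with the longest value in mind.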

Handling Tablenames & Fieldnames with Spaces

One of our most common support questions relates to spaces in a fieldname (like “my field” instead of “my_field”). We've now solved this issue by allowing spaces to be used by enclosing the name in brackets. So, for example, these are now all valid expressions:

[Field  1] * [Field  2]
Field1 * [Field  2]
[Table 1].[Field 1]*[Table 2].Field2

You can learn more here.
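
If you generate expressions like these from a script, a small helper can pull out the table and field names, bracketed or not. The Python sketch below is a simplified illustration and not Strata's expression parser: the function name is made up, and the pattern would also pick up function names such as SUM as plain identifiers.

```python
import re

# Matches either a bracketed name (spaces allowed) or a plain identifier.
NAME = re.compile(r"\[([^\]]+)\]|([A-Za-z_][A-Za-z0-9_]*)")


def referenced_names(expression):
    """Return the names referenced in an expression, in order of appearance."""
    names = []
    for match in NAME.finditer(expression):
        # Group 1 is the text inside brackets; group 2 is a bare identifier.
        names.append(match.group(1) or match.group(2))
    return names


names = referenced_names("[Table 1].[Field 1]*[Table 2].Field2")
```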

Much Much More…

There are plenty of other upgrades like project handling, new keyboard shortcuts, auto-fill group and sort dialogs, new script classes, etc. You can check out all the changes below the jump.

Please download the latest Strata (or just click “Check for Updates” in the Help menu), give it a whirl and let us know what you think!

Read the rest of this entry »

Posted by Ken Kaczmarek in news/announcements | Comments Off

Data.gov

March 5th, 2009

We recently posted an article about Vivek Kundra, who was named United States CIO this morning by the Obama administration. He's got $71 billion in IT spending under his care. Hmm, that's a lot of data browsers.

One interesting tidbit appeared in this Saul Hansell NY Times article:

Another initiative will be to create a new site, Data.gov, that will become a repository for all the information the government collects. He pointed to the benefits that have already come from publishing the data from the Human Genome Project by the National Institutes of Health, as well as the information from military satellites that is now used in GPS navigation devices.

"There is a lot of data the federal government has and we need to make sure that all the data that is not private, or restricted for national security reasons, can be made public," [Kundra] said.

In another bit of interesting news, Jonathan Stein at Mother Jones notes that Mike Honda (D-Calif) added a provision into the recent appropriations bill that requires government entities to make their public data available in raw form:

If the Senate passes the bill with the provision intact, citizens seeking information about Congress' activities—such as bill names and numbers, amendments, votes, and committee reports—won't have to rely on government websites, which often filter information, are incomplete, or are difficult to use. Instead, the underlying data will be available to anyone who wants to build a superior site or tool to sift through it. “The language is groundbreaking in that it supports providing unfiltered legislative information to the public,” says Honda's online communications director, Rob Pierson. “Instead of silo-ing the information, and only allowing access through a limited web form, access to the raw data will make it easier for people to learn what their government is doing.”

Kim Zetter from Wired has more on the story here.

Maybe once the data is made more accessible, some clever folks can put an interface on things that improves on the complex aftermath of the “laws and sausages” routine. I did my best to search for Honda's three-sentence provision in the latest omnibus bill with no luck. Anyone know what the actual provision stated? [UPDATE: Rob Pierson, Online Communications Director of Congressman Honda's office, provided a link to an O'Reilly post with the full text of the provision. Give the full article a read; it's quite worthwhile.]

And, for posterity, here are some of the data repositories mentioned in the articles above:

Posted by Ken Kaczmarek in data mining, data repositories, government | 3 Comments »

AWS Public Data Sets Continues to Expand

February 25th, 2009

Previously, we posted some information on Amazon's foray into making huge public data sets available to users of their web services. Yesterday they announced some very sizable additions:

US Bureau of Transportation Statistics
DBpedia Knowledge Base (67 GB)
Freebase Data Dump (66 GB)
Genbank Genetic Sequence Database (250 GB)

If you use AWS, the announcement provides more info on these datasets as well as how to access them. If you don't use AWS, you can still access much of this data directly from the websites linked above.

Posted by Ken Kaczmarek in data analysis, data mining, data repositories | Comments Off