Data and the Web

Archive for the ‘extensions’ Category

Fun (and Fraud Detection) with Benford’s Law

Tuesday, July 22nd, 2008

Benford’s law is one of those things your high school math teacher would break out on a slow, rainy day when the students’ attention span was even lower than usual.

He’d start out by asking the class to look at a list of numbers and predict how often each digit, 1 through 9, would show up as the leading digit.  The students would make some guesses and eventually come to the consensus that the digits would appear about equally often, roughly 11% each.

Then, the teacher would just sit back, smile, and gently shake his head at his simple-minded pupils.  He would then go on to explain Benford’s law, which would blow everyone’s mind — at least through lunchtime.

Play Benford’s Law Video

(Click the image above… or here’s an embeddable YouTube version)

Per Wikipedia:

Benford’s law, also called the first-digit law, states that in lists of numbers from many real-life sources of data, the leading digit is distributed in a specific, non-uniform way.

Specifically, in this way:

Leading Digit     Probability
      1              30.1%
      2              17.6%
      3              12.5%
      4               9.7%
      5               7.9%
      6               6.7%
      7               5.8%
      8               5.1%
      9               4.6%
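The percentages in the table come from a single formula: the probability that a number starts with digit d is log10(1 + 1/d).  Here is a minimal Python sketch that reproduces the table; the function name `benford_probability` is just our own label for illustration.

```python
import math


def benford_probability(digit: int) -> float:
    """Expected share of numbers whose leading digit is `digit` (1-9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


if __name__ == "__main__":
    print("Leading Digit     Probability")
    for d in range(1, 10):
        # Format as a percentage with one decimal place, as in the table.
        print(f"      {d}              {benford_probability(d):5.1%}")
```

Because the nine probabilities are logarithms of the ratios 2/1, 3/2, ... 10/9, they add up to log10(10), which is exactly 1.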

Again, from Wikipedia:

This counter-intuitive result applies to a wide variety of figures, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants, and processes described by power laws (which are very common in nature).

Boiling it down, this means that for almost any naturally occurring data set, the digit 1 will appear first about 30% of the time.  And “naturally occurring” can mean check amounts, stock prices or website statistics.  Non-naturally occurring data would be pre-assigned numbers like postal codes or UPC numbers.

Besides being fun to play with, Benford’s law is used in the accounting profession to detect fraud.  Because data like tax returns and check registers tend to follow Benford’s law, auditors can use it as a high-level check of a data set.  If there are anomalies, the data may be worth investigating more closely for potential fraud.
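To make the “high-level check” idea concrete, here is a rough Python sketch of one common way to compare a data set against Benford’s law: count the leading digits and compute a Pearson chi-square statistic.  This is our own illustration, not necessarily what auditors or the Strata extension do, and the names `leading_digit`, `benford_chi_square` and `CRITICAL_VALUE_5_PERCENT` are made up for the example.  The two sample data sets are synthetic.

```python
import math
from collections import Counter

# 5% critical value of the chi-square distribution with 8 degrees of freedom.
CRITICAL_VALUE_5_PERCENT = 15.507


def leading_digit(value) -> int:
    """First significant digit of a nonzero int or float."""
    # Strip the sign, then any leading zeros and the decimal point,
    # so 0.0042 -> "42" -> 4 and -1234 -> "1234" -> 1.
    return int(str(abs(value)).lstrip("0.")[0])


def benford_chi_square(values) -> float:
    """Chi-square statistic of observed first digits versus Benford's law."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    total = len(digits)
    statistic = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic


if __name__ == "__main__":
    samples = {
        "powers of two": [2 ** n for n in range(1000)],  # known to follow Benford
        "100 through 999": list(range(100, 1000)),       # every digit equally often
    }
    for name, data in samples.items():
        stat = benford_chi_square(data)
        flagged = stat > CRITICAL_VALUE_5_PERCENT
        verdict = "worth a closer look" if flagged else "consistent with Benford"
        print(f"{name}: chi-square = {stat:.1f} ({verdict})")
```

A large statistic only says the digits don’t look like Benford’s distribution.  It is a reason to dig deeper, not proof of fraud, since pre-assigned or narrowly ranged numbers will fail the check for perfectly innocent reasons.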

If you’re interested in further information about fraud detection using Benford’s, definitely give these two articles by Malcolm W. Browne and Mark J. Nigrini a read.

Try It Out for Yourself

Take a look at the demonstration video above to see Benford’s law in action with data sets from the web.  If you’d like to play with it yourself, just install the Benford’s Law extension for Kirix Strata™ and have fun.

Also, please note that I used the following data sets in the video, if you’d like to give those a spin:

Wikipedia List of Lakes in Minnesota
US Census Data Sets
Social Blade - Digg Statistics

And here are a few other worthy ones that didn’t make it in the video:

NASDAQ Historical Stock Price
Wikipedia List of Countries by Population
And plenty more at Delicious here…

Enjoy!

Predict the Future with Some Ad Hoc Time-series Forecasting

Wednesday, July 16th, 2008

We’re happy to announce that we’ve teamed up with the good folks from Lokad to create a Kirix Strata™ forecasting plug-in, which you can use with your own time-series data.

Lokad is a company that has created some slick forecasting software and, thankfully, offers it as a web service via their API (you can also upload data directly to their site).  Here’s a link where you can find lots of good information on their technology.  Bottom line, they offer some great business forecasting tools at a cost-effective price.  Their API was a piece of cake to work with and so we were able to quickly put a GUI on it and create the Strata Lokad forecasting extension.

Play Video

(And here’s an embeddable YouTube version…)

Obviously, there’s quite a bit of forecasting that goes on day to day within companies.  When you veer toward the largest companies, you’ll find departments dedicated to forecasting with automated processes built into their ERP systems.  With smaller companies, forecasting is likely performed by someone without the word “forecast” in their job title.  For instance, a warehouse manager may need to forecast inventory to make solid replenishment orders.  Proper forecasting prevents the costly mistake of either overbuying (spoilage, locked-up cost of capital) or underbuying (lost sales).

However, the sweet spot for the Strata Lokad extension is ad hoc forecasting; it’s for people who have various, changing data sets and need their forecasts on-the-fly.  Business consultants who provide forecasts for their clients would fall in this category.  In addition, this extension can benefit sales analysts who don’t have adequate forecasting from their OLAP systems or financial analysts interested in different cash flow forecasts.

The great thing about forecasting algorithms is that they apply to a wide range of circumstances.  So, if you’ve got some historical data to throw at a situation, you can get back some good results.
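For a feel of what a time-series forecast involves, here is a tiny Python sketch of simple exponential smoothing, one of the most basic textbook techniques.  To be clear, this is not Lokad’s algorithm or API (we don’t describe either here); the function name and the sample numbers are invented for illustration.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """One-step-ahead forecast using simple exponential smoothing."""
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be greater than 0 and at most 1")
    level = series[0]
    for observation in series[1:]:
        # Blend the newest observation with the running level; a higher
        # alpha makes the forecast react faster to recent values.
        level = alpha * observation + (1 - alpha) * level
    return level


if __name__ == "__main__":
    monthly_units_sold = [120, 132, 128, 141, 150, 147]  # made-up sample data
    forecast = exponential_smoothing_forecast(monthly_units_sold, alpha=0.3)
    print(f"Forecast for next month: {forecast:.1f} units")
```

Real forecasting services handle trend, seasonality and much more, which is exactly why handing the job to a dedicated service makes sense.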

So, if you’ve got some time-series data and want to predict the future with it, give the Lokad forecasting extension a try.  The installation itself along with all the details can be found here.  If you’ve got questions about the plug-in, send ‘em our way.  And, if you’ve got any questions about Lokad, their technology or forecasting in general, please feel free to give them a shout — they’re quite knowledgeable and helpful.

P.S.  We’re pleased to note that this is the first extension we’ve made public that takes advantage of Strata’s web scripting capabilities, which bring a web API to the privacy and comfort of your own desktop.  Got another web API you’d like to see work with Strata?  Let us know.

The Long Tail of Enterprise Software Demand

Thursday, June 19th, 2008

I was able to attend Dion Hinchcliffe’s webinar yesterday (sponsored by Snaplogic; three more free seminars to go) called “Bringing Web 2.0 into the Enterprise with Mashups: Drivers, Requirements and Benefits.”  The session was a very nice overview of how mashups have impacted the consumer space and how they are creeping into the enterprise.  However, there was one point that struck me as particularly salient… it was something Dion termed “The Long Tail of Enterprise Software Demand.”

Image - Long Tail of Software Demand (source: Hinchcliffe & Company)

I always find it interesting when the concept of the long tail is applied outside of its original scope, and I think Dion hit the nail on the head with this analogy.  The synopsis is that there is a large demand curve for software in the enterprise, but only the biggest, most global projects get funded and developed.  The rest of IT’s resources go to maintaining existing systems.  However, there is an extremely long tail of other customized software needs at the business unit level, the departmental level, and even at the individual level that never get addressed.

The point Dion was making was that there is a lot of potential for easy-to-develop mashups to fill this gap — a self-serve model, if you will.  Mashup tools would make it easy for individuals to create the specific applications they need with a short turnaround time.  In fact, one of Dion’s wrap-up points was that mashup tools should be as easy to use as a spreadsheet.

To take a step back for a second, it may be useful to define what a mashup is.  I would venture to say that when people think of mashups, the first thing that comes to mind is something that integrates a Google Map with other web data, like housing data.  Zillow would be a classic example of this type of mashup.  In fact, Programmable Web states that a full 39% of mashups on their site are related one way or another to mapping.

Wikipedia puts it this way:

In technology, a mashup is a web application that combines data from more than one source into a single integrated tool; an example is the use of cartographic data from Google Maps to add location information to real-estate data, thereby creating a new and distinct web service that was not originally provided by either source.

I suppose it is helpful to define mashups solely as web applications in order to create a nice clean line, but I’d argue that it does the genre a disservice, particularly in the realm of Enterprise Mashups.  This is because there is a storied, if sordid, history of “mashups” that have existed in the long tail of the enterprise for many years.

At a base level, regardless of IT budget, people need solutions to their issues and are often crafty enough to figure out a way to get things done.  These “mashups” often take the form of a duct-taped Visual Basic script that turns Access into a specialized app for the receivables department.  Or maybe someone creates an, ahem, “untidy” Excel macro that goes way beyond anything Microsoft ever envisioned, but it does a perfect job of forecasting inventory for the sales folks.  It always seems like there is at least one “guru” at the departmental level who knows just enough “programming” to be dangerous.  Dion referred to these types of workers as Prosumers, or folks who have just a bit more technical sophistication than a standard consumer, but are not programmers.

In any event, their circa-1997 Access apps are often cursed by IT.  Their franken-spreadsheets are the scourge of management concerned about security.  But, in the end, they get the job done.  And, they do it with $0 of IT investment.  Their important role in the business shouldn’t be taken lightly.

Now granted, these ad hoc apps don’t currently take advantage of the data in the cloud, but it is this long tail that has been active for years, mashing up data from different internal systems.  It was the dependable (if low-mileage) four-door sedan compared to the efficient hybrid roadster that is currently on the production line.

It is in this realm that a data browser fits in very nicely as a long-tail mashup tool for the prosumer who needs to divine something from their data.  Clearly a browser is not in the cloud, but being local does carry some benefits, such as:

  • Handling as much data as you throw at it, using the power and speed of the PC for processing and manipulation.
  • Securely mashing up local data, enterprise database data and web data (APIs, CSV bookmarks, RSS feeds, etc.) and never needing to push the private business data to an external server.
  • Being extremely flexible and having an interface that is familiar to existing business users, similar to Access or Excel.
  • Offering extensibility, such that the long-tail prosumer folks can quickly knock out a JavaScript plug-in for an ad hoc app only needed at the departmental level.
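To show what a local mashup boils down to, here is a small Python sketch that joins a “local” table with a “web” table on a shared key, all on the desktop.  It is a generic illustration rather than Strata’s scripting API; the data, the column names and the `mash_up` helper are all hypothetical, and inline strings stand in for the local file and the web feed so the example runs without a network connection.

```python
import csv
import io

# Stand-in for a local file (hypothetical receivables data).
LOCAL_CSV = """customer_id,amount_due
C001,1200.50
C002,310.00
C003,89.99
"""

# Stand-in for a CSV pulled from a web source (hypothetical).
WEB_CSV = """customer_id,region
C001,Midwest
C002,Northeast
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def mash_up(local_rows, web_rows, key):
    """Left-join local rows with web rows on a shared key column."""
    lookup = {row[key]: row for row in web_rows}
    merged = []
    for row in local_rows:
        extra = lookup.get(row[key], {})
        merged.append({**extra, **row})  # local values win on conflicts
    return merged


if __name__ == "__main__":
    combined = mash_up(read_rows(LOCAL_CSV), read_rows(WEB_CSV), "customer_id")
    for row in combined:
        print(row)
```

The private receivables data never leaves the machine; only the public lookup data would be fetched from outside.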

There is a real beauty in the idea of mashups flourishing in the workplace.  There is this certain intangible, ad hoc “thing” out there that every business person runs into at one time or another, which just can’t be solved by a single over-arching IT project.  This is why people still use spreadsheets for everything.  And this is why it’ll be fascinating to see how mashup tools will be applied by these ingenious long-tail workers to boost productivity and efficiency in the coming years.

P.S.  As a quick aside, it is interesting to see the parallels between this discussion and the “last mile of business intelligence” that we talked about previously.  Maybe they’re just different sides of the same coin.  Hmmm, this may require another blog post in the future…

Everything You Wanted to Know About Kirix Strata and More…

Tuesday, June 17th, 2008

In addition to today’s announcement about the Extensions section, we’ve also released an equally important new part of our website: the Kirix Strata™ Blog.

If you use Kirix Strata, we would recommend you subscribe to the Strata Blog feed (or get a subscription via email).  This will be the place where we post examples, tips & tricks, case studies and interesting links.  There are lots of things that Strata can do to make your data tasks more efficient or let you discover new things within your data; you just need to know how to use the tools in the toolkit.

If you have data questions or would like us to demonstrate a particular concept, please let us know and we may be able to create a Strata Blog post for you and let everyone join in on the education.

Also, while we’re talking about feed subscriptions, the other feed that may interest you is our Extensions feed (again, this can be subscribed to via email).  This feed will alert you whenever a new extension has been posted to the Extension Library so you can try out the various things that interest you most.

As with any new blog, we are obviously starting with a humble first post, but we plan to expand rapidly from here.  And, if you have any feedback on these two new sections, please let us know how we can best serve you.  Enjoy!

Extend and Conquer: Introducing Kirix Strata Extensions and Developer Resources

Tuesday, June 17th, 2008

We’re really excited to announce our Kirix Strata™ Extensions section along with a load of resources for developers.

As with other browsers (like Firefox, which is making some news of its own today), Kirix Strata is extensible and supports a plug-in architecture.  Scripts are written in JavaScript syntax, so any developers who are familiar with JavaScript should find creating Strata scripts pretty easy.

Unlike other browsers though, Strata packs a full database engine for the journey. Combine this power with the ability to access stuff on the web (web content, APIs, DOM manipulation) as well as local files, and you’ve got yourself a highly customizable rich internet application for data-centric tasks.

We’ve been scripting a lot for client projects and it has pleasantly surprised us how much one can do with Strata’s engine.  We’ve also been creating a bunch of extensions in-house that we’ll be rolling out in the coming weeks for everyone to use.

Here’s a full list of the stuff we’ve added to the website today:

Extension Library

The library is the place where we’ll be listing all new extensions.  We’ll be rolling these out as we create them, but we’d also be really happy to publish any extensions developed by the community that may be useful to a wide range of people.  Got an extension to share? Please submit it and we’ll post it.

Extension Wizard

The Extension Wizard makes some of the mundane tasks of creating scripts and extensions a little less painful.  There are three things it offers:

  1. Extension Packaging:  Create the appropriate “packaging” for an extension.  Just write your script and let the wizard package it up for you to distribute.
  2. Script Templates and Components: This area provides a number of pre-packaged scripts that you can use in your own development.  It has scripts for such things as form controls, form layouts, database/SQL examples and API examples (e.g., FTP requests and an RSS feed parser).
  3. Sample Applications:  You can also create variations of a full application which is helpful when you want to take an already-built extension, open it up and see what makes it tick.

Build Your Own Extension

The build-your-own page provides a high-level view of creating an extension for Strata.

Developer Resources

With the Developer Resources section, we’ve finally put some meat around the skeleton API documentation that we previously made available on our website.  This section provides an overview of working with scripts in Strata as well as information about the syntax and API.

Submit an Extension

Got an extension that you’d like to share with the world?  Submit it here and we’ll post it to the library.

Kirix Strata does a lot of great stuff out of the box for working with and reporting on data.  But scripting and extensions offer power users an opportunity to develop customized applications for themselves and their co-workers.  If you are a developer, we hope you dig into the documentation and find it valuable.  If you aren’t a developer, we just hope that the extensions library will prove useful to you over time.

Lastly, please let us know if you have any questions about scripting or extensions and we’ll be happy to help!

About

Data and the Web is a blog by Kirix about accessing and working with data, wherever it is located. We have a particular fondness for data usability, ad hoc analysis, mashups, web APIs and, of course, playing around with our data browser.