Data and the Web

Archive for the ‘mashups’ Category

More Government Data Coming to a Browser Near You…

Friday, February 6th, 2009

It was intriguing to see how all this newfangled Web 2.0 technology was applied during the US presidential campaign this past year (organization, multimedia, etc.). It's also quite interesting to hear about some of the big ideas for how the new administration wants to change how government works. And, not to be outdone, the opposition party is also getting into the Web 2.0 game.

According to Nextgov, it appears that Vivek Kundra, current CTO of the District of Columbia, is going to be given the nod as the next e-government liaison. From the article:

Kundra also is a strong proponent of giving the public access to government data. “Why does the government keep information secret?” he rhetorically asked during an interview with Nextgov. “Why not put it all out in the government domain?” “[Since arriving in Washington], I've made all the government databases public. Every 311 call, every abandoned automobile, who has responded, etc. It provides high-level oversight of the daily tasks of government.”

A more in-depth bio of Kundra can be found at this recent Washington Post article. A couple of the more intriguing things that he promoted in the District of Columbia were the DC Data Catalog and “Apps for Democracy.”

The data catalog covers all kinds of DC data, from crime statistics to (ahem) the most recent roadkill pickups. It's also available in a wide variety of formats. “Apps for Democracy” was a mashup contest to see what kinds of apps could be developed to improve DC residents' access to data. It was highly successful, providing 47 different applications for a fraction of the cost of formally contracting out these projects.

Of course, changing such a huge, bureaucratic system as the Federal government will not happen overnight, but it is encouraging to see more of a focus on making data available in a timely manner (and in usable formats).

For those interested in this sort of thing, I'd also recommend checking out the Sunlight Foundation, which is focused on government transparency. Also, TechPresident and Nextgov are both news sources focused on following all things e-gov.

Got any other interesting links on this topic? Please feel free to post ’em in the comments below.

Predict the Future with Some Ad Hoc Time-series Forecasting

Wednesday, July 16th, 2008

We're happy to announce that we've teamed up with the good folks from Lokad to create a Kirix Strata™ forecasting plug-in, which you can use with your own time-series data.

Lokad is a company that has created some slick forecasting software and, thankfully, offers it as a web service via their API (you can also upload data directly to their site). Here's a link where you can find lots of good information on their technology. Bottom line, they offer some great business forecasting tools at a cost-effective price. Their API was a piece of cake to work with and so we were able to quickly put a GUI on it and create the Strata Lokad forecasting extension.

(And here's an embeddable YouTube version…)

Obviously, there's quite a bit of forecasting that goes on day to day within companies. When you veer toward the largest companies, you'll find departments dedicated to forecasting with automated processes built into their ERP systems. With smaller companies, forecasting is likely performed by someone without the word “forecast” in their job title. For instance, a warehouse manager may need to forecast inventory to make solid replenishment orders. Proper forecasting prevents the costly mistake of either overbuying (spoilage, locked-up cost of capital) or underbuying (lost sales).

However, the sweet spot for the Strata Lokad extension is ad hoc forecasting; it's for people who have various, changing data sets and need their forecasts on-the-fly. Business consultants who provide forecasts for their clients would fall in this category. In addition, this extension can benefit sales analysts who don't have adequate forecasting from their OLAP systems or financial analysts interested in different cash flow forecasts.

The great thing about forecasting algorithms is that they apply to a wide range of circumstances. So, if you've got some historical data to throw at a situation, you can get back some good results.
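To make "throw some historical data at it" a bit more concrete, here's a minimal sketch of one of the simplest forecasting techniques around, a moving average. To be clear, this is not Lokad's method and doesn't touch their API; the function name and the sales numbers are made up purely for illustration.

```javascript
// Forecast the next value of a series as the mean of its last
// `windowSize` points. Hypothetical helper for illustration only;
// it is not part of Lokad or Strata.
function movingAverageForecast(series, windowSize) {
    if (windowSize < 1 || series.length < windowSize) {
        throw new Error("need at least `windowSize` data points");
    }
    var recent = series.slice(-windowSize);
    var sum = 0;
    for (var i = 0; i < recent.length; i++) {
        sum += recent[i];
    }
    return sum / windowSize;
}

// Made-up weekly unit sales for a single item.
var weeklySales = [120, 135, 128, 140, 150, 145];
var nextWeek = movingAverageForecast(weeklySales, 3);
console.log("Forecast for next week: " + nextWeek);
```

A moving average ignores trend and seasonality, so treat it as a baseline to compare against rather than something to place replenishment orders with.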

So, if you've got some time-series data and want to predict the future with it, give the Lokad forecasting extension a try. The installation itself along with all the details can be found here. If you've got questions about the plug-in, send ’em our way. And, if you've got any questions about Lokad, their technology or forecasting in general, please feel free to give them a shout; they're quite knowledgeable and helpful.

P.S. We're pleased to note that this is the first extension we've made public that takes advantage of Strata's web scripting capabilities, which bring a web API to the privacy and comfort of your own desktop. Got another web API you'd like to see work with Strata? Let us know.

The Long Tail of Enterprise Software Demand

Thursday, June 19th, 2008

I was able to attend Dion Hinchcliffe's webinar yesterday (sponsored by SnapLogic; three more free seminars to go) called “Bringing Web 2.0 into the Enterprise with Mashups: Drivers, Requirements and Benefits.” The session was a very nice overview of how mashups have impacted the consumer space and how they are creeping into the enterprise. However, there was one point that struck me as particularly salient… it was something Dion termed “The Long Tail of Enterprise Software Demand.”

Image - Long Tail of Software Demand (source: Hinchcliffe & Company)

I always find it interesting when the concept of the long tail is applied outside of its original scope, and I think Dion hit the nail on the head with this analogy. The synopsis is that there is a large demand curve for software in the enterprise, but only the biggest, most global projects get funded and developed. The rest of IT's resources go to maintaining existing systems. However, there is an extremely long tail of other customized software needs at the business unit level, the departmental level, and even at the individual level that never get met.

The point Dion was making was that there is a lot of potential for easy-to-develop mashups to fill this gap — a self-serve model, if you will. Mashup tools would make it easy for individuals to create the specific applications they need with a short turnaround time. In fact, one of Dion's wrap-up points was that mashup tools should be as easy to use as a spreadsheet.

To take a step back for a second, it may be useful to define what a mashup is. I would venture to say that when people think of mashups, the first thing that comes to mind is something that integrates a Google Map with other web data, like housing data. Zillow would be a classic example of this type of mashup. In fact, Programmable Web states that a full 39% of mashups on their site are related one way or another to mapping.

Wikipedia puts it this way:

In technology, a mashup is a web application that combines data from more than one source into a single integrated tool; an example is the use of cartographic data from Google Maps to add location information to real-estate data, thereby creating a new and distinct web service that was not originally provided by either source.

I suppose it is helpful to define mashups solely as web applications in order to create a nice clean line, but I'd argue that it does the genre a disservice, particularly in the realm of Enterprise Mashups. This is because there is a storied, if sordid, history of “mashups” that have existed in the long tail of the enterprise for many years.

At a base level, regardless of IT budget, people need solutions to their issues and are often crafty enough to figure out a way to get things done. These “mashups” often take the form of a duct-taped Visual Basic script that turns Access into some specialized app for the receivables department. Or maybe someone creates an, ahem, “untidy” Excel macro that goes way beyond anything Microsoft ever envisioned, but it does a perfect job of forecasting inventory for the sales folks. It always seems like there is at least one “guru” at the departmental level who knows just enough “programming” to be dangerous. Dion referred to these types of workers as Prosumers, or folks who have just a bit more technical sophistication than a standard consumer, but are not programmers.

In any event, their circa-1997 Access apps are often cursed by IT. Their franken-spreadsheets are the scourge of management concerned about security. But, in the end, they get the job done. And, they do it with $0 of IT investment. Their important role in the business shouldn't be taken lightly.

Now granted, these ad hoc apps don't currently take advantage of the data in the cloud, but it is this long-tail that has been active for years, mashing up data from different internal systems. It was the dependable (if low-mileage) four-door sedan compared to the efficient hybrid roadster that is currently on the production line.

It is in this realm that a data browser fits in very nicely as a long-tail mashup tool for the prosumer who needs to divine something from their data. Clearly a browser is not in the cloud, but being local does carry some benefits, such as:

  • Handling as much data as you throw at it, using the power and speed of the PC for processing and manipulation.
  • Securely mashing up local data, enterprise database data and web data (APIs, CSV bookmarks, RSS feeds, etc.) and never needing to push the private business data to an external server.
  • Being extremely flexible and having an interface that is familiar to existing business users, similar to Access or Excel.
  • Offering extensibility, such that the long-tail prosumer folks can quickly knock out a JavaScript plug-in for an ad hoc app only needed at the departmental level.
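As a rough sketch of what that kind of departmental plug-in boils down to, here's a plain JavaScript example that joins private local rows with rows pulled from the web on a shared key. Everything in it is invented for illustration: the mashup function, the field names and the sample data are assumptions, and it doesn't use any Strata-specific API. In a real extension, the two arrays would come from a local table and a web feed.

```javascript
// Join two record sets on a shared key, keeping every local row.
// All names and data here are invented for illustration.
function mashup(localRows, webRows, key) {
    // Index the web rows by key so each local row is matched in one lookup.
    var lookup = Object.create(null);
    webRows.forEach(function (row) {
        lookup[row[key]] = row;
    });
    return localRows.map(function (row) {
        var match = lookup[row[key]] || {};
        var merged = {};
        Object.keys(row).forEach(function (k) { merged[k] = row[k]; });
        Object.keys(match).forEach(function (k) { merged[k] = match[k]; });
        return merged;
    });
}

// Stand-in for private business data that stays on the desktop.
var localCustomers = [
    { zip: "60601", customer: "Acme", balance: 1200 },
    { zip: "94103", customer: "Globex", balance: 300 }
];
// Stand-in for rows fetched from a public web feed or API.
var webData = [
    { zip: "60601", city: "Chicago" }
];

var combined = mashup(localCustomers, webData, "zip");
console.log(combined);
```

Because the join runs locally, only the public lookup data crosses the wire; the customer balances never leave the machine, which is the point of the second bullet above.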

There is a real beauty in the idea of mashups flourishing in the workplace. There is this certain intangible, ad hoc “thing” out there that every business person runs into at one time or another, which just can't be solved by a single over-arching IT project. This is why people still use spreadsheets for everything. And this is why it'll be fascinating to see how mashup tools will be applied by these ingenious long-tail workers to boost productivity and efficiency in the coming years.

P.S. As a quick aside, it is interesting to see the parallels between this discussion and the “last mile of business intelligence” that we talked about previously. Maybe they're just different sides of the same coin. Hmmm, this may require another blog post in the future…

Playing Nice with Yahoo Pipes

Wednesday, October 10th, 2007

Yahoo Pipes is a pretty slick tool that makes it easy to combine and mash up data sources from around the web and then output the data into formats like RSS and JSON. One of the really nice things is its interface, which lets non-programmers lurk and meddle in this otherwise fearsome domain.

Today I came across a post by tag aficionado Jon Udell, who was looking for a way to combine multiple feeds (based on a single tag) into a single feed for consumption. Within an hour and a half, a person named engtech created a Yahoo Pipe called Tagosphere to solve the problem. Pop in the tag you want, hit Run and get your results. Very cool.

To digress for a moment, one of the pet projects I've had on my (long) to do list is to use Kirix Strata™ to create an application that alerts me when someone references “kirix” in a blog post, article, or elsewhere on the web. I currently do this by subscribing to feeds from Google News, Google Blog Search, Technorati, Bloglines, Topix, Digg, etc. This is fine, but a bit clunky due to the many duplicate entries. It also is not comprehensive.

So the other thing I want to do is bring in my website referrers from AWStats or Google Analytics (or our raw Apache web logs). Lots of times we'll see people coming to our site from blogs, forum posts or websites that never get picked up by those above-referenced feeds. So then, I would just need to combine all the data, remove duplicates, timestamp it… and now I have a pretty comprehensive idea of where the latest buzz is coming from.
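Just to sketch the "combine, remove duplicates, timestamp" part, here's roughly what that step could look like in plain JavaScript. This is only an illustration under a few assumptions: the combineMentions function, the field names (url, title, seenAt) and the sample entries are all made up, and the duplicate check is a crude one that treats URLs differing only by case or a trailing slash as the same page.

```javascript
// Merge several lists of mentions, drop duplicates by URL, and stamp
// each surviving entry with the time it was collected.
// Field names (url, title, seenAt) are assumptions for this sketch.
function combineMentions(sources, now) {
    var seen = Object.create(null);
    var result = [];
    sources.forEach(function (entries) {
        entries.forEach(function (entry) {
            // Crude normalization: ignore case and trailing slashes.
            var id = entry.url.toLowerCase().replace(/\/+$/, "");
            if (seen[id]) {
                return;
            }
            seen[id] = true;
            result.push({ url: entry.url, title: entry.title, seenAt: now });
        });
    });
    return result;
}

// Made-up entries standing in for the news/blog search feeds.
var feedEntries = [
    { url: "http://example.com/post-1", title: "Post one" },
    { url: "http://example.org/review", title: "A review" }
];
// Made-up entries standing in for referrers pulled from web logs.
var referrerEntries = [
    { url: "http://example.com/post-1/", title: "Post one" },
    { url: "http://example.net/forum/42", title: "Forum thread" }
];

var mentions = combineMentions(
    [feedEntries, referrerEntries],
    new Date().toISOString()
);
console.log(mentions.length + " unique mentions");
```

The first source listed wins when two entries collide, so putting the richer feed first keeps its titles in the final list.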

So, the Tagosphere Pipe mentioned above is a pretty good start. I can create a feed for “kirix” and get a combined set of data with the duplicates removed. However, because I want to sort and filter this dataset, I need to get it into Strata. I could just manually go to the Tagosphere page in Strata and click on the RSS feed to get my table. However, because I'm looking at actually using this Pipe for a future application, I decided it would be nice to show how Strata can work directly with the Yahoo Pipe via a script:

1. In Strata, go to File > New > Script.

2. Copy the following text into the script tab:

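// Ask the user for the tag to search on.
var t = new TextEntryDialog;
t.setCaption("Pipes Search");
t.setMessage("Please enter search term:");
if (t.showDialog())
{
    // Only build the request string if the dialog was accepted.
    var s = "";
    s += t.getText();
}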

3. Save the script, then go to Tools > Run Script/Query.

As you see, a dialog opens where you can enter your tag. Enter the tag, click OK and up pops the feed in a table format.

This example is obviously very simplistic. But, if I then take it to its logical conclusion and bring in my referrer data, remove duplicates and run it on a regular basis, I've got my own personal Pub Sub. Even better, I can stand on the shoulders of giants by using all the great stuff already written in Yahoo Pipes or Dapper or anything else that exports data as RSS or CSV.

We've got a ton of ideas that we plan on sharing with everyone, but have been really focused on getting the Strata beta fully functional and stable. Stay tuned though, more fun stuff to come soon…

P.S. If anyone wants to play around with using Yahoo Pipes with Strata and needs any help at all, please just shoot us a support email or post something in the forums. Also, if you come up with a cool app, let us know, we'd be thrilled to hear about it. Thanks!

Situational Integration

Tuesday, July 31st, 2007

ProgrammableWeb has a nice write-up today about some of the challenges in the mashup tools market. It included a link to an excellent write-up of mashup platforms by Dion Hinchcliffe of ZDNet. Dion writes:

Mashups could theoretically allow business users to move — when appropriate — from their current so-called "end-user development tools" such as Microsoft Excel that are highly isolated and poorly integrated to much more deeply integrated models that are more Web-based and hence more open, collaborative, reusable, shareable, and in general make better use of existing sources of content and functionality. Remember, business workers still spend a significant amount of time manually integrating together the data in their ever increasing number of business applications. Tools that could let thousands of workers solve their situational software integration problems on the spot themselves, instead of waiting (sometimes forever) for IT to provide a solution, is indeed a potent vision.

We agree.

We've seen time and again how business users need to integrate and work with data from different sources, although usually only with data internal to the company. However, as the web provides more and more useful information, people will want to include external data as well. And, if normal people can do this on their own without much IT support, the potential for increased productivity and efficiency, not to mention new discovery, skyrockets.

We're currently exploring some of these possibilities with our recently-released beta of Kirix Strata™. What makes Strata unique is its ability to work seamlessly with data wherever it's located — whether a back-end database like Oracle or an Excel file on your desktop or a website, with or without an API. Much of our work is still cooking in our labs, but we'll be providing some concrete examples shortly. Stay tuned!


Data and the Web is a blog by Kirix about accessing and working with data, wherever it is located. We have a particular fondness for data usability, ad hoc analysis, mashups, web APIs and, of course, playing around with our data browser.