Playing Nice with Yahoo Pipes | Data and the Web

Data and the Web

Playing Nice with Yahoo Pipes

Yahoo Pipes LogoYahoo Pipes is a pretty slick tool that makes it easy to combine and mash up data sources from around the web and then output the data into formats like RSS and JSON. One of the really nice things is its interface, which lets non-programmers lurk and meddle in this otherwise fearsome domain.

Today I came across a post by tagaficionado Jon Udell who was looking for a way to combine multiple feeds (based on a single tag) into a single feed for consumption. Within an hour an a half, a person named engtech created a Yahoo Pipe called Tagosphere to solve the problem. Pop in the tag you want, hit Run and get your results. Very cool.

To digress for a moment, one of the pet projects I've had on my (long) to do list is to use Kirix Strata™ to create an application that alerts me when someone references “kirix” in a blog post, article, or elsewhere on the web. I currently do this by subscribing to feeds from Google News, Google Blog Search, Technorati, Bloglines, Topix, Digg, etc. This is fine, but a bit clunky due to the many duplicate entries. It also is not comprehensive.

So the other thing I want to do is bring in my website referrers from AWstats or Google Analytics (or our raw apache web logs). Lots of times we'll see people coming to our site from blogs, forum posts or websites that never get picked up by those above-referenced feeds. So then, I would just need to combine all the data, remove duplicates, timestamp it… and now I have a pretty comprehensive idea of where the latest buzz is coming from.

So, the Tagosphere Pipe mentioned above is a pretty good start. I can create a feed for “kirix” and get a combined set of data with the duplicates removed. However, because I want to sort and filter this dataset, I need to get it into Strata. I could just manually go to the Tagosphere page in Strata and click on the RSS feed to get my table. However, because I'm looking at actually using this Pipe for a future application, I decided it would be nice to show a how Strata can work directly with the Yahoo Pipe via a script:

1. In Strata, go to File > New > Script.

2. Copy the following text into the script tab:

var t = new TextEntryDialog;
t.setCaption("Pipes Search");
t.setMessage("Please enter search term:");
if (t.showDialog())
var s = "";
s += t.getText();

3. Save the Script then go to Tools > Run Script/Query

As you see, a dialog opens where you can enter your tag. Enter the tag, click OK and up pops the feed in a table format.

This example is obviously very simplistic. But, if I then take it to its logical conclusion and bring in my referrer data, remove duplicates and run it on a regular basis, I've got my own personal Pub Sub. Even better, I can stand on the shoulders of giants by using all the great stuff already written in Yahoo Pipes or Dapper or anything else that exports data as RSS or CSV.

We've got a ton of ideas that we plan on sharing with everyone, but have really been really focused on getting the Strata beta fully functional and stable. Stay tuned though, more fun stuff to come soon…

P.S. If anyone wants to play around with using Yahoo Pipes with Strata and needs any help at all, please just shoot us a support email or post something in the forums. Also, if you come up with a cool app, let us know, we'd be thrilled to hear about it. Thanks!

4 Responses to “Playing Nice with Yahoo Pipes”

  1. engtech says:

    You can do sorting with Yahoo Pipes itself. They have a sort module so you can sort by any item of data.

    A typical thing to do would to be extract key pieces of data from certain fields in the RSS, build a new field with the data to sort by, and then sort it.

  2. Ken Kaczmarek says:

    Unfortunately, I haven't played with Pipes very much at this point — thanks for the tip on sorting. It's quite an interesting tool for gathering data. As far as filtering/sorting, I'll be able to use Strata to do more of the ad hoc manipulation/analysis once I've got the data in hand. Thanks for putting the Tagosphere together, I was fun playing with it today.

  3. engtech says:

    Glad you liked it Ken. It was fun seeing my name pop-up in a blog I was already subscribed to.

  4. Ken Kaczmarek says:

    Cool! Hopefully we'll get you a lot more interesting stuff to read in the near future. We've just been focused on trying to hammer this software out, so we haven't gotten everyone here to talk about (and demonstrate) their crazy ideas… yet.


Data and the Web is a blog by Kirix about accessing and working with data, wherever it is located. We have a particular fondness for data usability, ad hoc analysis, mashups, web APIs and, of course, playing around with our data browser.