
Last year on the Jazz project, I helped design and implement a simple REST protocol to implement long-running operations, or long-ops. I’ve explained the idea enough times in random conversations that I thought it would make sense to write it down.

I’ll first write about the concrete problem we solved and then talk about the more abstract class of problems that the solution supports.

Example: Jazz Lifecycle Project Creation

Rational sells three particular team products that deal with requirements management, development, and test management, respectively. These products must work individually but also together if more than one is present in a customer environment. Each product has a notion of “project”. In the case where a customer has more than one product installed in their environment, we wanted to be able to let a customer press a button and create a “lifecycle project” that is basically a lightweight aggregation of the concrete projects (e.g. the requirements project, the development project, and the test project).

So we created a rather simple web application called “Lifecycle Project Administration” that logically and physically sits outside the products and gives a customer the ability to press a button and create a lifecycle project, create the underlying projects, and link everything together.

This presented a couple of problems, but I want to focus on the UI problem that pushed us towards the RESTy long-op protocol. Creating a project area can take between 30 seconds and a minute, depending on the complexity of the initialization routine. Since the lifecycle project creation operation aggregated several project creation operations plus some other stuff, it could take several minutes. A crude way to implement this UI would be to just show a “Creating lifecycle project area, please wait” message and perhaps a fakey progress monitor for several minutes until all of the tasks complete. In a desktop UI operating on local resources, you would instead use a rather fine-grained progress monitor that provides feedback on the set of tasks that need to run, the currently running tasks, and the percent complete of the total task.

We brainstormed on a way that we could come up with something like a progress monitor that could show fine-grained progress while running the set of remote operations required to create a lifecycle project and its subtasks. The solution was the RESTy long-op protocol. First I’ll talk about how one would typically do “normal, simple RESTful creation”.

Simple RESTy Creation

A common creation pattern in RESTful web services is to POST to a collection. It goes something like this:

Request

POST /people HTTP/1.1
Host: example.com

{
    "name": "Bill Higgins",
    "userId": "billh"
}

Response

HTTP/1.1 201 Created
Location: http://example.com/people/billh

The 201 status code of course indicates that the operation resulted in the creation of a resource and the Location header provides the URI for the new resource.

From a UI point of view, this works fine for a creation operation that takes a few seconds, but not so well for a creation operation that takes several minutes, like the lifecycle project administration case. So let’s look at the RESTy long-op protocol.

The RESTy Long-op Protocol

In this example, I’ll use a simplified form of lifecycle project creation:

Creation Request

POST /lifecycle-projects HTTP/1.1
Host: example.com

{
    "name": "Bill's Lifecycle Project",
    "template": "com.ibm.team.alm.req-dev-test"
}

Just to explain the request body, the name is simply the display name and the template is the ID of a template that defines the set of concrete projects that should be created and how they should be linked together.

Here’s what the response looks like:

Response

HTTP/1.1 202 Accepted
Location: http://example.com/jobs/5933

Rather than responding with a URL for a resource that was created, the server responds with a 202 Accepted status and the location of a “job” resource that basically reports on the status of the long-running task of creating (or updating) the resource.

Now the client polls the location of the “job”; the job is a hierarchical resource representing the state and resolution of the top-level job and its sub-jobs (called “steps” below). It also includes a top-level property called resource that will eventually point to the URI of the resource that you are trying to create or update (in this case the lifecycle project).

Job Polling Request

GET /jobs/5933 HTTP/1.1
Host: example.com

Job Polling Response

HTTP/1.1 200 OK

{
    "title": "Creating lifecycle project 'Bill's Lifecycle Project'",
    "state": "IN_PROGRESS",
    "resolution": null,
    "resource": null,
    "steps": [
        {
            "title": "Creating requirements project",
            "state": "COMPLETE",
            "resolution": "SUCCESS"
        },
        {
            "title": "Creating development project",
            "state": "IN_PROGRESS",
            "resolution": null
        },
        {
            "title": "Creating project linkages",
            "state": "NOT_STARTED",
            "resolution": null
        },
        {
            "title": "Creating lifecycle project",
            "state": "NOT_STARTED",
            "resolution": null
        }
    ]
}

Eventually the top-level job has a non-null resolution and a non-null resource, at which point the client can GET the resource value, which is the complete URI of the original thing you tried to create/update (in this case the lifecycle project).

GET /lifecycle-projects/bills-lifecycle-project HTTP/1.1
Host: example.com

(I’ll omit the structure of the lifecycle project, as it’s not relevant to this discussion.)
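
To tie the protocol together from the client’s perspective, here is a minimal polling sketch in Java. It assumes a JSON library in the Jackson style and the example URLs above; the class and method names are hypothetical, and a real client would back off between polls and handle errors and timeouts.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class LongOpClient {
    private static final ObjectMapper JSON = new ObjectMapper();

    // Polls the job URI from the 202 response's Location header until the
    // top-level job has a resolution, then returns the URI of the resource
    // that was created or updated (the lifecycle project in this example).
    public static String awaitResource(String jobUri) throws Exception {
        while (true) {
            JsonNode job = JSON.readTree(get(jobUri));
            JsonNode resolution = job.get("resolution");
            JsonNode resource = job.get("resource");
            if (resolution != null && !resolution.isNull()
                    && resource != null && !resource.isNull()) {
                return resource.asText();
            }
            Thread.sleep(2000); // fixed poll interval; a real client would back off
        }
    }

    private static InputStream get(String uri) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(uri).openConnection();
        conn.setRequestProperty("Accept", "application/json");
        return conn.getInputStream();
    }
}

Once awaitResource returns, the client issues the final GET shown above to fetch the lifecycle project itself.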

Demo

Here’s a demo I recorded last year of an early version of Lifecycle Project Administration that shows this protocol in action:

Uses

This protocol supports a set of related patterns:

  • Long-running operations
  • Asynchronous operations
  • Composite tasks

You can use this protocol to support one or a combination of these patterns. E.g. you could have a single task (i.e. not a composite) that takes a long time and therefore still warrants an asynchronous user experience.

Critique

Here are a few good things about this protocol:

  • Facilitates better feedback to people who invoke long-running (and perhaps composite) operations through your UI.
  • Decouples the monitoring of a long-running composite operation from its execution and implementation; for all you know the composite task could be running in parallel across a server farm or it could be running on a single node.
  • Supports a flexible user experience; you could implement a number of different progress monitor UIs based on the information above.

Here are a few not-so-nice things about this protocol:

  • Not based on a standard.
  • Requires some expectation that the original create/update request might result in a long-running operation, and the only way to know that the response points at a job resource (vs. the actual created or updated resource) is the 202 Accepted response code (which could be ambiguous) and/or content sniffing.
  • Doesn’t help much with recovering from complete or partial failure, retrying, cancelation, etc. though I’m sure you can see ways of achieving these things with a few additions to the protocol. We just didn’t need/want the additional complexity.

Implementation Notes

I would like to write a bit about some of the implementation patterns, but I think this entry is long enough, so I’ll just jot down some important points quickly.

  • Your primary client for polling the jobs should be a simple headless client library that allows higher-level code to register to be notified of updates. In most cases you’ll have more than one observer (e.g. the progress widget itself that redraws with any step update and the page that updates when the ultimate resource becomes available).
  • Your backend should persist the job entries as it creates and updates them (a minimal sketch follows this list). This allows you to decouple where the tasks in the composite execute from where the front-end can fetch the current status. This also allows you to run analytics over your job data over time to understand better what’s happening.
  • The persistent form of the job should store additional data (e.g. the durations for each task to complete) for additional analytics and perhaps better feedback to the user (e.g. time estimate for the overall job and steps based on historical data).
  • Of course you’ll want to cache all over the place on the job resources since you poll them and in most cases the status won’t have changed.
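
Here is a minimal sketch of that backend shape in Java. It uses the plain Servlet API, an in-memory map as a stand-in for the persistent job store, and an executor for the background work; all class, path, and property names are hypothetical, and per-step updates, error handling, and caching are omitted.

import java.io.IOException;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class LifecycleProjectServlet extends HttpServlet {
    // Stand-in for a persistent job store; the notes above recommend real persistence.
    private static final Map<String, Map<String, Object>> JOBS = new ConcurrentHashMap<>();
    private static final ExecutorService EXECUTOR = Executors.newFixedThreadPool(4);

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String jobId = UUID.randomUUID().toString();
        Map<String, Object> job = new ConcurrentHashMap<>();
        job.put("state", "IN_PROGRESS");
        JOBS.put(jobId, job);

        // Run the composite operation in the background, updating the job as steps finish.
        EXECUTOR.submit(() -> {
            // ... create the underlying projects, updating per-step state along the way ...
            job.put("state", "COMPLETE");
            job.put("resolution", "SUCCESS");
            job.put("resource", "/lifecycle-projects/bills-lifecycle-project");
        });

        // Answer immediately with 202 Accepted and the job's location.
        resp.setStatus(HttpServletResponse.SC_ACCEPTED);
        resp.setHeader("Location", "/jobs/" + jobId);
    }
}

A companion servlet for GET /jobs/{id} would simply render the stored job as JSON in the shape shown earlier, with the cache headers mentioned above.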

In Closing

I don’t think this protocol is perfect, and I’m sure I’m not the first one to come up with such a protocol, but we’ve found it useful and you might too. I’d be interested if anyone has suggestions for improvement and/or pointers to similar protocols. I remember I first learned about some of these basic patterns from a JavaRanch article my IBM colleague Kyle Brown wrote way back in 2004. 🙂

Updates

Pretty much as soon as I published this, several folks on Twitter cited similar protocols:

Thanks very much William, Sam, and Dims.

I have never been a big fan of making public predictions about what might happen with the industry, a company, or a technology. I strongly agree with Alan Kay‘s famous quote: “The best way to predict the future is to invent it.” Of course, inventing the future is hard, especially if you’re spending precious time writing articles stating in unequivocal terms what will happen in the future (e.g. “Why and how the mobile web will win”).

Of course, none of us know precisely what will happen in the future (tired clichés aside), especially for things as complex and volatile as the economy or the technology industry. I frankly am baffled why people continue to make such confident and sometimes smug predictions on top of shaky or non-existent logical foundations. Luckily the web makes it easy to record these predictions and compare them to what really happened in the fullness of time, so there is some measure of accountability.

Of course, this doesn’t mean that it isn’t worth reasoning about potential futures – as long as you follow some simple guidelines:

  • State your evidence – What historical examples, current data, trends, and published plans lead you to your conclusions?
  • State your assumptions – What things have to happen for the potential future to become reality? What makes you think these things will happen?
  • State your conflicts of interest – Do you have something to gain if your predicted future becomes reality?
  • State your confidence level – Where are you on the continuum from wild-ass guess to high-probability outcome?

Another question to ask yourself is “Should you prognosticate publicly or privately?” I believe it’s very helpful to prognosticate privately (e.g. within a company) to help drive product strategy and semi-publicly to help customers chart their course (though in this case stating conflicts of interests is very important, for the obvious ethical reason and for the pragmatic goal of building customer trust). What I personally despise is predicting some future that aligns with your financial and/or philosophical interests and not stating the conflict of interest. It’s fine to advocate for some preferred future, but if you do so please be honest about your motivations – don’t dress up advocacy as prognostication.

Finally, if you have made prognostications, you should periodically perform an honest assessment of what you got right, what you got wrong, and why. Your retrospective should be at least as public as your prediction and you should be brutally honest – for one thing it’s unethical not to be brutally honest and for another thing people will quickly detect if you’re being honest or hedging which will obviously cause them to trust you more or less, respectively.

I originally planned to link to some of the prognosticating articles that put me in this obviously grumpy mood, but I’ve decided not to because A) Why promote trash? and B) I assume other people can think of plenty of examples of this sort of thing. Instead I will point to someone who I think does a great job of doing the reasoned, data-driven prognostication that I find incredibly valuable, Horace Dediu and his web site covering mobile information technology.

I don’t get angry very much. I’m usually pretty upbeat and when I hit something weird, scary, or uncool, I get stoic rather than upset. Yesterday a set of emails within IBM led me to get upset and I let the person know I was upset.

The unfortunate and somewhat funny thing (in hindsight) is that I had actually misinterpreted the person’s statements and intent so I actually got upset about something they didn’t really say.

The good news is that we worked it out, laughed about it, and I apologized for losing my cool. We reflected that the root cause was simply that email remains a crappy communications mechanism for anything but simple conversations, especially when you don’t know the people with whom you’re emailing very well.

This led me to make a personal resolution to myself: I won’t ever again react in anger to something I read in email (or a bug report or any other form of quickly composed written media). If I read something that makes me upset I will give the other person the benefit of the doubt and get on the phone or if possible walk over to their desk.

Then maybe I will find something to really get angry about 🙂

But then again, maybe I’ll find it was just a misunderstanding.

Hi. I normally don’t write directly about something I’m going to be working on because I hate vaporware, but in this case it’s necessary.

I am going to be running an IBM Extreme Blue project this summer where the intent is to build some technology to learn about how we can use Big Data techniques to analyze DevOps data. We will be using the Hadoop family of technologies for the data crunching and REST / Linked Data to help gather the data to be crunched.

It’s a twelve week project, I’ll be mentoring but the students really drive it, and we’ll also be getting a little help from my friends – my Tivoli colleagues Don Cronin and Andrew Trossman (who created some schweet new cloud technology we’ll be using), the Jazz team, Rod Smith and David Sink’s JStart team, and possibly some folks from the Yorktown Research lab.

Speaking of being crunched, we have a compressed schedule to find candidates and we already have some great candidates interviewing, so if you’re interested and you fit the profile below, please feel free to apply.

Here are the details:

  • When: May 23rd to Aug 12 2011
  • Where: Raleigh, NC
  • Requirements: Formal requirements here; my requirement: must be super passionate about technology and building great software
  • Pay etc.: Competitive pay, furnished apartment, travel/relocation, trip to IBM HQ in Armonk for a Big Demo, IBM swag (because you know you want it)

Basically I plan on making it a very intense and fun project. We will all learn a lot and have a lot of fun.

Again, if this sounds good to you and you fit the requirements, please feel free to apply … soon.

PS – I’ve asked the Extreme Blue folks to order Macs for the interns’ computers (if that’s important to you).

I’ve been working on a little Java web framework [1] for an exploratory work project. I am building the framework and a sample app as a set of OSGi bundles to drastically reduce API surface area between components [2]. This also makes it easy to run my sample app directly within base Eclipse, using Eclipse’s built-in support for OSGi framework launches and a bundle-ized version of Jetty.

This configuration raises an interesting problem though: how do you inject the application code into the framework? The framework obviously can’t statically depend on the application code, and OSGi will only let Java classes “see each other” [3] if one bundle makes a package (or packages) visible via an Export-Package manifest declaration (e.g. Export-Package: my.api.package) and another bundle declares an explicit dependency on that package via an Import-Package declaration (e.g. Import-Package: my.api.package). In other words, how do you avoid hitting java.lang.NoClassDefFoundError when trying to load the application code via reflection?

I sure didn’t know. Luckily I have a good buddy here at IBM in Research Triangle Park named Simon Archer who is an OSGi and Equinox expert [4], so I ran the problem by him. He told me about an OSGi manifest declaration I had never heard of called DynamicImport-Package. My assumption that you can only get at code via explicit Import-Package declarations was actually wrong.

Simon explained that the way DynamicImport-Package works is that it basically allows a bundle to say “I want to be able to access any class that is part of an exported package in the runtime environment.” So let’s say I have two bundles: bill.framework and bill.sampleapp. I want the code in bill.sampleapp to run inside the web framework implemented in bill.framework, but I obviously don’t want the bill.framework code to have a static (class-level) dependency on the bill.sampleapp code since the whole reason I’ve designed it as a framework is to allow build-time composition of arbitrary applications built on the framework [5]. So I put the following in bill.framework‘s MANIFEST.MF file:

DynamicImport-Package: *

Then in my Sample App bundle’s MANIFEST.MF file, I put my application class in a package [6] that I export to the OSGi environment:

Export-Package: bill.sampleapp.app

Now the framework is able to dynamically load the sample app via reflection:

// MyFramework.java
String appClassName = System.getProperty("bill.app.classname");
IApplication app = (IApplication)Class.forName(appClassName).newInstance();

Voilà!
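
To make the wiring concrete, here is a minimal sketch of the two pieces of Java involved, assuming the IApplication interface from the snippet above lives in an exported package of bill.framework; the SampleApp class and its start method are hypothetical.

// In the bill.framework bundle, in a package exported via Export-Package
// so that application bundles can implement it:
package bill.framework;

public interface IApplication {
    void start();
}

// In the bill.sampleapp bundle, in the package exported via
// Export-Package: bill.sampleapp.app so DynamicImport-Package: * can resolve it:
package bill.sampleapp.app;

import bill.framework.IApplication;

public class SampleApp implements IApplication {
    public void start() {
        System.out.println("Sample app running inside the framework");
    }
}

Launching the framework with -Dbill.app.classname=bill.sampleapp.app.SampleApp then lets the Class.forName call above find and instantiate the application.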

Footnotes:

[1] I know, because what the world needs now is another Java web framework. But as I observed in a journal entry, every framework is evil, except mine.

[2] Note that the framework itself doesn’t use or depend on OSGi. I build the bundles into a set of simple JARs that can run as part of a JEE web app or as a standalone Java application, again using an embedded Jetty web server.

[3] For a great primer on building modular Java applications with OSGi, see the recent book “OSGi and Equinox: Creating Highly Modular Java Systems” by McAffer, VanderLei, and Archer.

[4] E.g. Simon co-wrote the book mentioned in [3]. He is “Archer” 🙂

[5] Yes, I know. Most people call this pattern “dependency injection”. For the full treatise, see Fowler.

[6] The fact that you have to export the package for the code that you want to dynamically load wasn’t immediately obvious, and Simon and I spent approximately twenty minutes staring at the screen wondering why we were getting java.lang.NoClassDefFoundError even though we were using DynamicImport-Package: *. After some unfruitful Googling, we decided to check out some bundle details using the OSGi console in Eclipse. As we were looking at the details for the sample app, I got the (at the time unintuitive) idea to try exporting the Sample App package. Sure enough this fixed it. Simon and I had a bit of a debate about whether or not it made sense to have to export application code since this effectively declares the application code to be API, which seems wrong – i.e. typically an application sits on top of a stack of code and depends on lots of stuff, but nothing depends on it.

But eventually we came to a reason that makes perfect sense for exporting the application code: If you didn’t have to explicitly export the code, theoretically any OSGi bundle’s code could get access to any other bundle’s code simply by declaring DynamicImport-Package: * and loading random classes via reflection, defeating the whole purpose of the OSGi modularity system. So by requiring that the to-be-dynamically-loaded class be available to the environment via an explicit Export-Package declaration you are still playing by the “normal rules” and just using reflection rather than static instantiation to poof up objects.

Of course this means that you should minimize your API surface area for the application class, so I put mine in its own package and its only public methods are those from the framework interface that it implements.

Good fences FTW!

I’ve started a new blog, which I am simply calling “Bill’s Journal”. It will contain daily ramblings and half-formed thoughts but hopefully will also feature much more frequent updates vs. this blog.

I’ve written a first entry just now: “my commonplace book”.

It depends.

I have a lot of friends who read this blog who are quite smart technically. When they read this, they are bound to say “Whoa Bill, you really did some really stupid stuff!”. To this I will preemptively respond “You are correct”.

Anyway, here is my tale of data woe and stupidity.

Recently, work was kind enough to buy me a new SSD drive, so I took out the slower SATA drive and replaced it with the much faster SSD drive. Because the SSD drive is somewhat smaller, I decided to move my iTunes media library from my laptop to a 2TB external hard disk attached to my home iMac [1]. Because I am OCD about my system being as cruft-free as possible, I installed a fresh version of Snow Leopard on my new SSD rather than doing some sort of system restore (e.g. from Time Machine) of my prior system.

Because I was reconfiguring everything and because I was pretty sure that all my data was present either on my laptop or my iMac, I blew away the Time Machine backups that were also on the same external hard drive. This was mistake #1.

A couple of days later I realized that I had forgotten to copy over a couple of movies that I had bought from iTunes recently from my old laptop hard drive, which was now collecting dust on an office shelf since I had replaced it with the new SSD. One evening – a few days into Christmas vacation – at approximately 12:30 in the morning, right before I was about to go to sleep, it for some reason became very important that I get the movies off of the old hard drive. So I put the laptop to sleep, took out the SSD, put in the old hard drive again, and woke up the laptop.

Did you spot mistake #2? This was a doozy. I’ll repeat part of what I said above:

So I put the laptop to sleep…

That’s right, I hadn’t shut down the computer when I swapped in the old hard drive. Can you guess what happened? I don’t really recall because I was sleepy and I had had a glass of wine (or two) that night. I said it was vacation, so nightly wine consumption is inferred. I’ll call this sort of work in this sort of mental state mistake #3. All I recall was that OS X was quite confused, and so was I for a minute, until I realized that I had effectively performed a brain transplant on my Mac without shutting it down. I said “Oops” and shut down the computer and tried to start it up.

It wouldn’t start. I can’t recall the error but I’m pretty sure I saw a terminal instead of a GUI, which any semi-technical Mac user will understand implies that you’ve managed to fuck things up real good.

At this point I remember suddenly feeling significantly more awake and sober.

I shut down the computer again and put the new SSD back in. I got lucky and the only fallout was that it had to do some data integrity checking but was otherwise fine. At this point I decided to cut my losses and put the old hard disk back on the shelf and went to sleep.

Interestingly at this point, I hadn’t realized that I had just irreparably harmed my old hard drive. For reasons I don’t fully understand, I came to the half-assed conclusion that I had probably just not connected it 100% right and this is why the system wouldn’t boot with it, even after the proper shutdown/restart cycle.

A few days later my son asked me “Daddy, where is <episode such and such> of ‘The Clone Wars‘”? I looked and, lo and behold, it was gone. So were 10 other Clone Wars episodes. So was just about every movie and music album I had bought from iTunes in the previous several months [2]. Normally at this point I would look in either 1. the Time Machine backup, or 2. my laptop hard drive, but I had blown away 1. and 2. was sitting back on my shelf in an unknown but bad state.

So I did some research and ordered a SATA-to-USB enclosure [3] for the old hard drive so that I could take another crack at its data without having to either 1. use it to drive my laptop, or 2. to have to crack open my MacBook Pro again [4].

When the SATA-to-USB enclosure arrived, I took the old hard drive off the shelf and stuck it into the enclosure and plugged it into my iMac. It didn’t auto-mount as you would expect from a healthy external drive, so I opened OS X’s Disk Utility. Basically Disk Utility could tell it was a hard drive formatted as HFS+, but that’s about all it knew. It bombed out on any disk operation. I momentarily anthropomorphized Disk Utility and imagined that it was looking down on me and my sorry hard disk with a mixture of scorn and pity. “I’ll do what I can, but for God’s sake Jim, I’m a doctor, not a …”

I digress.

I did some Googling on using Disk Utility to fix messed up disks, and discovered the “Repair” button. I immediately clicked this button. It performed some serious grindage, but ultimately said something like (paraphrased) “Your disk is b0rked. You should try to salvage anything you can and then format it.” Mistake #4 (which continues in the next paragraph) was that I started operating on the b0rked disk without understanding what I was doing, or why.

It was at this point that it finally dawned on me that when I had swapped in the old hard drive after only putting the laptop to sleep (vs. shutting down), it had received some pretty severe brain damage. My immediate (and current) theory was that when OS X woke up, it got terribly confused by finding a different hard drive that had a very similar on-disk OS configuration and did its best to fix things, but ended up corrupting things, because it was running in a state its developers would never imagine someone would be stupid enough to enter.

Anyhow, at this point I started to consider my options. I did a quick calculation and guessed I was missing about $200 worth of re-purchasable iTunes purchases, which is frankly pretty minor in terms of data loss – I could have lost something truly valuable like pictures of my kids being born. Also, I was pretty sure that the hard drive was actually fine from a hardware perspective so the loss was pretty minimal. At this point the data recovery operation became more of a personal challenge than a necessity.

At this point I also started to realize how careless I had been and how lucky that I hadn’t lost actually valuable data nor damaged my beloved new $700 SSD. So I tweeted a two-parter observing my silliness [5] and asked the Twitterverse for advice on advanced data recovery utilities. My Jazz buddy Jason Wagner immediately called out DiskWarrior, which, upon examination, had a strong testimonial from Mac Übermensch John Gruber, which was enough to get me to fork over $99 for the software.

You might wonder if I performed some sort of cost benefit analysis before choosing to buy DiskWarrior. The answer is “no”; my rationale at this point was simply that I had become fixated on defeating the b0rkage and salvaging some data. Dammit. One sort of weird thing I noticed while purchasing DiskWarrior that would make sense about ten minutes later was its very strong terms and conditions along the lines of “I ACKNOWLEDGE THAT ALL SALES OF DISKWARRIOR ARE FINAL AND IN NO CIRCUMSTANCES WILL I BE GRANTED A REFUND”.

So I bought and installed DiskWarrior (painless), glanced at the manual, and ran it. Just like Disk Utility, it told me my disk was FUBAR and suggested contacting Alsoft technical support for more options [6]. This is obviously why all DiskWarrior sales terms and conditions are so strict on finality of sale – you only buy the thing if you’re desperate with a data loss situation, you probably only ever use it once and it’s pretty deeply unsatisfying if it doesn’t work! But I didn’t actually get upset at all – it was more a feeling of resignation. I knew I had screwed things up real good and I had heard from enough smart people at this point that DiskWarrior was good software to figure that the data on this disk just wasn’t going to be salvageable without probably pulling in data recovery experts, which wasn’t worth it for $200 worth of re-purchasable iTunes content.

I threw one last Hail Mary by sending a note to Alsoft technical support, but during the two days of high-latency email exchanges [7] that followed, I discovered a path to recovering all of my missing iTunes content.

Over the Christmas break I had a quite long support email chain with a very helpful iTunes support person named Marlee on a peculiar iBooks problem. Marlee had been so helpful and friendly on the iBooks issue that I thought I would piggy-back my data loss issue on the iBooks email thread. I was under the impression that you have to repurchase any iTunes music/movies that you lose, so I thought Marlee would be doing me a favor if she did anything, on the basis that I was very helpful and friendly on my side of the iBooks issue.

ANYWAY, I replied to the iBooks email thread with a very friendly and self-deprecating admission of data management stupidity, and an embarrassed request for a special favor to help me get back at least the Clone Wars episodes that my son was missing greatly [9]. For reasons I don’t understand, Marlee didn’t respond; instead a very friendly Apple employee named Raghavendra responded, telling me that if I would just tell him my order number for the purchase, he would post it once again to my account [10]. I was a bit embarrassed given that I had sent a somewhat personal note to one iTunes support person and received a response from an entirely different iTunes support person, but this embarrassment was trumped by my glee and surprise that I was apparently completely wrong in my assumption that I had to repurchase all of the lost iTunes content.

Short end of an overly-long story: I formatted the old laptop hard drive (no problems), sent additional emails to iTunes support to get my other lost content (no problems), and ordered an additional 2 TB external backup drive for a to-be-formulated-but-surely-more-rigorous data backup strategy.

I guess these are the takeaways for me:

  • Even if you’re fairly technical, err on the side of extreme caution whenever data is involved, especially if you’re messing with your data backups.
  • If you realize you have a data situation, do nothing until you think it through, despite the strong urge to “do something”.

Footnotes:

  1. Yes, my iTunes media library theoretically should have always been on a home computer, but that’s another story.
  2. I still don’t understand why only the last several months of iTunes content was missing. It should have been with the rest of the media on my external hard disk, but I guess that’s the sort of thing that causes us to ensure we have multiple copies of things as part of our backup strategy. It probably has something to do with the overly complicated move of my media library from one location to another. iTunes is definitely not optimized for this operation.
  3. I ordered this SATA-to-USB adapter/enclosure. It worked well.
  4. A MacBook Pro is only slightly easier to open than a bank vault.
  5. I try hard not to do silly things but am quite willing to laugh at myself when I do.
  6. To be completely fair to Alsoft’s DiskWarrior product, there is a chance that had I not mindlessly run the Disk Utility “Repair” function on the b0rked disk, DiskWarrior might have been able to fix it. Based on later reading of DiskWarrior manuals, it became clear that the more you try to fix your b0rked disk before you let DiskWarrior take a shot at it, the more likely you’ll do additional damage. Bottom line: When you’re trying to recover data, think very carefully about your options before acting as your odds of successfully recovering your data might drop with each successive flail.
  7. The latency in the email exchange was a function of them being six time zones ahead of me (I think) and also my being very busy with an OSLC presentation [8].
  8. More on the topic of presentations in a future entry.
  9. Not because he actually really wanted to watch them badly, more just because he didn’t like the thought that they were missing.
  10. For reasons I don’t understand, iTunes support’s only failing is lack of a “Search” function through customer order histories, instead relying on the customer being able to track down order numbers themselves. I can only assume that Apple doesn’t think such a function is important in the grand scheme of things, otherwise they could have implemented it like eight years ago.

After the recent fuss over Yahoo potentially shutting down Delicious, I thought it might be wise to automatically back up my Delicious bookmarks on a regular basis. Here’s what I did:

1. Created a shell script to download delicious bookmark data to my Dropbox folder

cat $HOME/Development/Scripts/backup-delicious.sh
curl -s https://BillHiggins:MyPassword@api.del.icio.us/v1/posts/all > $HOME/Dropbox/delicious-bookmarks.xml

This sucks down an XML file of current bookmark data and drops it in my Dropbox folder that is automagically backed up to the Dropbox cloud.

2. Created a cron job to automatically run said script each week

crontab -e
# backup bookmarks every Friday at 2pm
0 14 * * 5 $HOME/Development/Scripts/backup-delicious.sh

7 April 2011 Update: My friend and former colleague Richard Backhouse has written an excellent companion to this blog entry talking about how he actually implemented many of these patterns in Jazz and Zazl.

Warning: The following blog entry will definitely be unintelligible to readers who are not software developers.

It may possibly be unintelligible to readers who are software developers. 🙂

Motivation

Between mid-2005 and the end of 2009, I worked in IBM Rational on the software infrastructure for the Rational Jazz browser-based user interfaces, hence “web UIs”. For various reasons I decided that we should provide an “extreme Ajax” architecture [1], which required us to load a large amount of JavaScript and CSS. Since people don’t like UIs that load slowly, we [2] spent a lot of time exploring patterns and techniques that would cause the JavaScript and CSS code to load as quickly as possible.

Recently an IBM Software Group Architecture Board workgroup asked me to document some of these techniques, and based on the positive response my internal write-up received, I thought I would tweak the write-up and publish it externally.

Preamble: Modular Software Development

In the “Motivation” section I mentioned that the design decision to build “Extreme Ajax” UIs led to a technical problem of needing to load a large amount of JavaScript and CSS code quickly. Users of course don’t care how you load code, they just care that the UI loads quickly. Theoretically you could solve the code loading problem by coding a single large JavaScript file and a single large CSS file, but of course developing in this way would eliminate all of the benefits of modular software development, which have been discussed in depth elsewhere [3].

The only reason I mention this point is that the practice of developing modular Ajax software complicates the task of making the code load quickly, as the following sections will show.

Overview of Optimization Techniques

Fundamentally, there are only a small number of things you can do to make Ajax code load faster:

  • Deliver Less Code
  • Concatenate a Large Number of Files into a Smaller Number of Files
  • Compress Code Content
  • Cache Code Content

The following sections address each of these techniques in some detail.

Deliver Less Code

Though it may sound trite, the simplest way to improve code loading performance is to deliver less code. This can be accomplished in two basic ways:

  1. Design simpler web applications that require less code
  2. Do not load code until it is needed by the UI

It’s outside the scope of this topic to discuss the pros and cons of simpler vs. more complicated web UIs, so I won’t say much about 1. other than to re-observe that speed is a feature and that little bits (or large bits) of functionality tend to add up and make a web page slower to load.

The second topic is a bit more mechanical and thus fits better into this blog entry. In a nutshell, the code required to power a particular UI is often much smaller than the total product code base, especially for feature-rich products like Rational Team Concert [4]. A common optimization technique in any system that has this characteristic [5] is to defer loading code until you know you need it. This technique is usually referred to as “lazy loading”.

One implements lazy loading by first understanding the relationship between a UI part (e.g. a web page, a dashboard widget, an Eclipse view, etc.) and the code required for that part to run. This requires that your programming language or framework have some notion of modules and module dependencies. Although the JavaScript language proper has no such construct, in IBM we use a JavaScript toolkit called Dojo that provides a module construct. Basically each JavaScript file can declare a module name (like “com.ibm.team.MyUIPart”) and also can declare dependencies on a number of other modules (like “com.ibm.team.MyUIPartController”). The set of all modules and their dependencies allows you to build an internal representation of the modules’ dependency graph [6]. Once you have the dependency graph, you only need a mechanism for defining UI parts and their top-level dependencies, and then you can quickly and easily walk the dependency graph to calculate the complete set of modules required to execute the UI part.

For a fuller explanation of computing JavaScript and CSS dependency graphs, see Appendix B below.
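
To make that walk concrete, here is a small Java sketch that computes the transitive set of modules for a UI part, assuming the dependency graph has already been built as a map from module name to the modules it directly requires. The class and names are hypothetical; the actual Jazz implementation differs.

import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ModuleSetCalculator {
    // moduleName -> modules it directly requires (e.g. parsed from dojo.require statements)
    private final Map<String, List<String>> graph;

    public ModuleSetCalculator(Map<String, List<String>> graph) {
        this.graph = graph;
    }

    // Returns every module transitively required by the UI part's top-level modules.
    public Set<String> modulesFor(Collection<String> topLevelModules) {
        Set<String> required = new LinkedHashSet<>();
        Deque<String> toVisit = new ArrayDeque<>(topLevelModules);
        while (!toVisit.isEmpty()) {
            String module = toVisit.pop();
            if (required.add(module)) {
                toVisit.addAll(graph.getOrDefault(module, Collections.emptyList()));
            }
        }
        return required;
    }
}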

There is some complexity involved with the dependency calculations to implement lazy loading, so it’s only worth pursuing if you can defer loading a large amount of unnecessary code. This was true in Rational Team Concert where I estimate a given UI (like the bug submission form) probably contains less than 5% of the total code base.

The simplest lazy loading approach is to consider the web page itself the “UI part” and thus load all of the JavaScript code that the page needs as a simple <script> tag when the page loads. However sometimes it’s necessary to load additional code later in a page’s lifecycle. This leads to some more advanced lazy loading techniques.

Consider a “dashboard UI” that aggregates a bunch of little UI widgets in a single web page. Using the simple lazy loading approach described above, we could calculate the total code needed by the dashboard by considering the dashboard framework and the set of all dashboard widgets to be the UI parts, and unioning the JavaScript modules transitively required by each of these.

However a common feature of a dashboard is to allow a user to add new dashboard widgets to the page. If we have a large number of possible widgets we probably don’t want to load these whenever we load the dashboard, especially when you consider that probably 95% of the time the user will not modify the dashboard. So how do we lazily load the code for the new dashboard widget? Well, as you’ve probably guessed, it’s just a matter of performing set subtraction between the currently loaded set of modules and the set required by the new dashboard widget, then loading the difference (i.e. the modules that are needed but not yet loaded).
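
In code, the set subtraction itself is trivial; here is a minimal sketch, assuming you already know the modules the new widget needs and the modules the page has loaded (the tricky parts, fetching and evaluating the difference in the right order, are discussed below).

import java.util.LinkedHashSet;
import java.util.Set;

public class DeferredLoader {
    // Computes the modules a new dashboard widget needs that the page has not loaded yet.
    public static Set<String> modulesToLoad(Set<String> requiredByWidget, Set<String> alreadyLoaded) {
        Set<String> needed = new LinkedHashSet<>(requiredByWidget);
        needed.removeAll(alreadyLoaded);
        return needed; // fetch and evaluate only these, in reverse dependency order
    }
}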

Although this deferred lazy loading technique is easy to describe, it’s a bit tricky to implement, so I recommend you stick with the single file loaded as part of the top-level page. But this of course raises the question “How do you go from a set of fine-grained modules to a single monolithic file?” This is the topic of the next section.

Concatenate a Large Number of Files into a Smaller Number of Files

Each JavaScript file or CSS file that is loaded requires its own HTTP request and load processing by the browser. Therefore each load of a JavaScript or CSS file introduces a non-trivial amount of latency, bandwidth, and local CPU overhead that delays the user from actually using the web application. A common technique for reducing this sort of overhead is to batch – that is, turn a large set of small resources into a small set of large resources. As with the “lazy loading” technique, the most common approach to concatenating files is to first determine the subset of all files needed, and then to append them one after another into a single file. Though concatenation is conceptually simple, there are several design considerations.

The first consideration is “What should be the granularity of the concatenated file?” An extreme answer to this question is “A single file containing only the modules required by the user’s immediate UI”. This is the approach that the Jazz Web UI framework takes. Another approach would be “Several logical layers of file sets, in the hopes that the layers can be reused by multiple UIs and therefore benefit from cache hits”. This is the approach taken by the Dojo build system. It is hard to judge the pros and cons since the efficacy of each depends on other factors like the nature of the UI, the nature of the layers, and the usage of layers across different UIs.

A second consideration is concatenation order. This is important because one module might require the presence of another module to function correctly. This can be easily solved by concatenating (and therefore loading) modules in reverse dependency order. I.e. the module that depends on nothing but is depended upon by everything loads first while the module that depends on everything but is depended upon by nothing loads last. This is also important in a web UI that uses Dojo since the Dojo module manager will make expensive synchronous XHR requests if it determines that a module requires another module that is not yet loaded. If you load your modules in reverse dependency order, each dojo.require statement becomes a noop.
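
Here is a rough Java sketch of concatenating in reverse dependency order, assuming the same dependency map as before plus a function that can produce each module’s source; the names are hypothetical. A depth-first walk that appends a module only after its dependencies naturally yields the right order.

import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class Concatenator {
    // Appends each required module exactly once, dependencies before dependents.
    public static String concatenate(Collection<String> topLevelModules,
                                     Map<String, List<String>> graph,
                                     Function<String, String> sourceForModule) {
        StringBuilder out = new StringBuilder();
        Set<String> emitted = new HashSet<>();
        for (String module : topLevelModules) {
            emit(module, graph, sourceForModule, emitted, out);
        }
        return out.toString();
    }

    private static void emit(String module, Map<String, List<String>> graph,
                             Function<String, String> sourceForModule,
                             Set<String> emitted, StringBuilder out) {
        if (!emitted.add(module)) {
            return; // already written earlier in the file
        }
        for (String dependency : graph.getOrDefault(module, Collections.emptyList())) {
            emit(dependency, graph, sourceForModule, emitted, out); // dependencies first
        }
        out.append(sourceForModule.apply(module)).append('\n');
    }
}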

While reducing the number of module HTTP requests via concatenation reduces the overhead incurred by many HTTP requests, it does not help with the bandwidth required to load a very large file. However, it is possible to significantly reduce the raw bytes of required code by compressing the code content. I cover this in the next section.

Compress Code Content

Imagine that you’ve taken 50 JavaScript files and 40 CSS files and concatenated them down to a single JavaScript file and a single CSS file. Each of these files may still be huge because of raw code size plus the size of whitespace and comments. There are two ways to make these files smaller:

  1. Gzipping
  2. Minification (JavaScript only)

Enabling gzip when serving JavaScript and CSS probably provides the best return on investment of any of these techniques. In Jazz we often see files get 90% smaller simply by running them through Gzip. When you’re delivering hundreds of kilobytes of code, this can make a large difference. There is some overhead required to zip on the server-side and unzip on the client-side, but this is relatively cheap vs. the bandwidth benefits of the Gzip compression. Finally, Gzipping has the nice characteristic that it is a simple transformation that has no impact on code content, as observed by the ultimate recipient (the browser runtime).
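
Gzip is normally switched on in the web server or via a servlet filter rather than hand-coded, but as a sketch of what that amounts to when serving a concatenated file (the servlet and helper names are hypothetical):

import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CodeServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        byte[] code = loadConcatenatedCode().getBytes(StandardCharsets.UTF_8);
        resp.setContentType("text/javascript");

        String acceptEncoding = req.getHeader("Accept-Encoding");
        OutputStream out = resp.getOutputStream();
        if (acceptEncoding != null && acceptEncoding.contains("gzip")) {
            // The browser advertised gzip support, so compress the response body.
            resp.setHeader("Content-Encoding", "gzip");
            out = new GZIPOutputStream(out);
        }
        out.write(code);
        out.close(); // closing a GZIPOutputStream finishes the compressed stream
    }

    private String loadConcatenatedCode() {
        return "// ... concatenated JavaScript ..."; // placeholder for the real loader
    }
}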

Minification is the process of stripping unnecessary code content (e.g. whitespace and comments) and renaming variables to shorter names in a self-consistent way. Unlike Gzipping, minification is a unidirectional transformation (you never “unminify”). The following code demonstrates minification.

// A simple adder function
function add(firstNumber, secondNumber) {
    var sum = firstNumber + secondNumber;
    return sum;
}

… might be transformed into …

function add(_a, _b) { var _c = _a + _b; return _c; }

Note that you can only minify tokens that meet the following criteria:

  • Their semantics don’t change when you rename them. By this rule you cannot rename ‘function’ since it is a JavaScript programming language keyword.
  • You can consistently rename all instances of the token across the entire set of loaded code. Because of this it’s dangerous to rename API names like ‘dojo’ or CSS class names since it is hard to find all references to these names. Therefore usually only local variables are renamed, but this can still yield a non-trivial savings.

Although it should be obvious, it is possible to use both minification and gzipping together, though you must always minify before gzipping.

Cache Code Content

A final high-yield technique to improve code loading performance is to cache responses to reduce either bandwidth used or latency. There are two basic caching techniques [7]:

  1. Validation-based caching – Where you check each time to see if you have the most recent version of some document, and if you already have the most recent version you load it from your cache rather than fetching the document again.
  2. Expiration-based caching – Where the document server tells you that a certain document is good to use for some specified period of time. If you need to load the document and its expiration date is later than the current time, you may load the locally cached version of the document without even asking if you have the most recent version.

Obviously expiration-based caching is going to be faster than validation-based caching [8] (because you don’t even have to ask) but it is trickier to implement because you have to know when code is going to change. Consider the following scenario:

BigCo updates its web site every six weeks and therefore sets expiration dates on all of its web code to six weeks from the time of the last deployment, which means that each user only has to load new code once after each deployment. However, two weeks after a certain deployment, BigCo discovers a nasty security bug in its code that forces an unexpected patch deployment. Now any customer that accessed BigCo’s web site since the previous planned deployment does not receive the patch, because their browsers have been told not to load the new code for another four weeks.

The solution we found to this problem for Jazz was to use validation-based caching on web pages, and expiration-based caching with versioned URLs on JavaScript and CSS files referenced within the web page. Here’s an example:

<html>
<head>
    <script type="text/javascript" src="../code.js/en-us/I20101005-1700"></script>
    <link rel="stylesheet" type="text/css" href="../code.css/en-us/I20101005-1700" />
</head>
</html>

In this example, the HTML page includes a script and CSS reference to files with versioned URLs, where the version ID corresponds to the build in which the code last changed. Each code request responds with an expiration header of plus one year. If the code is updated, the URLs change, which prompts the browser to fetch the new files, which again carry an expires header of plus one year. By driving the state off of the URL rather than a last-modified header, we are able to use expiration-based caching safely, and since in our Jazz applications the size of the JavaScript and CSS is much larger than the size of the HTML pages, the vast majority of our code will be loaded from the user’s disk on subsequent visits to a web UI.
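
Here is a minimal sketch of the response side of that scheme in Java; the header values are illustrative and the real Jazz implementation differs.

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class VersionedCodeServlet extends HttpServlet {
    private static final long ONE_YEAR_MILLIS = 365L * 24 * 60 * 60 * 1000;

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // The request path ends with a build ID (e.g. /code.js/en-us/I20101005-1700), so a
        // new build produces a new URL and stale cache entries are simply never requested again.
        resp.setContentType("text/javascript");
        resp.setHeader("Cache-Control", "public, max-age=31536000");
        resp.setDateHeader("Expires", System.currentTimeMillis() + ONE_YEAR_MILLIS);
        // ... write the concatenated, minified, gzipped code as in the earlier sketches ...
    }
}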

You can also take advantage of caching with CSS references to images. Basically you can rewrite any ‘background’ image URL to use a similar versioned URL and a long expires header. This can be an especially big win since some images get loaded many times.
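
As an illustration, the rewrite can be a simple regular-expression pass over the CSS at build or serve time. The sketch below is naive (it ignores quoting, absolute URLs, and data URIs), and the versioned URL shape is just an example.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CssImageVersioner {
    private static final Pattern URL_REF = Pattern.compile("url\\(\\s*([^)\\s]+)\\s*\\)");

    // Rewrites url(images/icon.png) into a versioned reference such as url(images/icon.png/I20101005-1700).
    public static String rewrite(String css, String buildId) {
        Matcher matcher = URL_REF.matcher(css);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String versioned = "url(" + matcher.group(1) + "/" + buildId + ")";
            matcher.appendReplacement(out, Matcher.quoteReplacement(versioned));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}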

Appendix A: Debug-ability

A related issue to optimization is the ability to debug the web application. Several techniques above (concatenation, minification) change the code content vs. how it appears in a developer’s workspace. In fact, raw optimized code is effectively impossible to debug because of its inscrutability. The basic solution to this problem is to enable a “debug mode” where the code is loaded in a less optimized form so that the code matches what the developer sees in his or her development workspace. In the Jazz Web UIs, we’ve made debug-ability a first class concern since day one, and you can enable it simply by appending ?debug=true to any web page. The page may load significantly slower, but this is tolerable since you only experience this when you explicitly ask for it (i.e. a user would never see it) and since you would otherwise not be able to debug the code.

Appendix B: Determining Dependency Graphs in JavaScript and CSS

Several of the techniques above (lazy loading and concatenation) depend on understanding the dependency relationships across all sets of code. This section describes the basic mechanics of building the dependency graph for JavaScript and CSS.

The abstract solution for any dependency graph is straightforward: determine what each module depends on via inline dependency statements or external dependency definitions and then compute the union of each module’s dependencies to build the dependency graph.

It is straightforward to compute a JavaScript dependency graph with Dojo. Simply treat each dojo.require statement as defining a unidirectional dependency relationship between the module containing the dojo.require statement, and the module referenced by the dojo.require statement. Each of these “A depends on B” dependency statements is an arc in the overall dependency graph, so once you’ve analyzed all of the Dojo-based modules, you have your JavaScript dependency graph. Obviously a similar technique could be used with non-Dojo modules either via comparable inline statements (foo.dependsOn(“bar”)) or via external dependency definitions (foo.js depends on bar.js).
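
Here is a rough sketch of that analysis in Java, assuming each module’s source is available as a string; the regular expression only handles the simple literal form of dojo.require, and the class names are hypothetical.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DependencyScanner {
    private static final Pattern REQUIRE = Pattern.compile("dojo\\.require\\(\\s*[\"']([\\w.]+)[\"']\\s*\\)");

    // Collects the "A depends on B" arcs for one module from its dojo.require statements.
    public static List<String> directDependencies(String moduleSource) {
        List<String> dependencies = new ArrayList<>();
        Matcher matcher = REQUIRE.matcher(moduleSource);
        while (matcher.find()) {
            dependencies.add(matcher.group(1));
        }
        return dependencies;
    }

    // Scans every module to produce the full dependency graph used by the earlier sketches.
    public static Map<String, List<String>> buildGraph(Map<String, String> sourceByModuleName) {
        Map<String, List<String>> graph = new HashMap<>();
        for (Map.Entry<String, String> entry : sourceByModuleName.entrySet()) {
            graph.put(entry.getKey(), directDependencies(entry.getValue()));
        }
        return graph;
    }
}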

Once you have the JavaScript dependency graph, you simply need to know the root nodes needed by any particular UI. Consider the following simple dependency graph (the arrow direction indicates the “depends on” relationship).

A -> B

B -> C

B -> D

C -> E

If a UI knows that it has a root dependency on module C (“Jazz Work Item Editor”), then it needs only load C and E. However, if a UI depends on module A (“Team Concert Web UI”), then it needs to load all of the modules. This introduces a secondary dependency relationship: the dependency relationship between some logical notion of “a UI” and “the set of top-level JavaScript modules required to run the UI”. In Jazz we express this relationship via server-side Eclipse extension points (e.g. “the page at /work-items depends on the module com.ibm.team.WorkItem.js and all of the modules depended upon by com.ibm.team.WorkItem.js”), however the relationship could be specified in any number of ways.

There is no common solution to build a CSS dependency graph, even with Dojo. Theoretically you could just externally define a bunch of dependencies between CSS files.

A.css -> B.css

B.css -> C.css

… and then use a technique similar to the one described above where each logical UI describes its dependency on top-level CSS module (or modules) and then you simply walk the dependency graph for the top level module (or modules) until you have all of the CSS.

In Jazz we took a slightly different approach. Rather than declaring CSS to CSS dependencies, we declare JavaScript to CSS dependencies and then derive the CSS dependency graph from the JavaScript dependency graph:

A.js -> B.js

A.js -> A.css

B.js -> B.css

Using this example we know that whenever a UI requires A.js, then it transitively requires the JavaScript A.js and B.js and the CSS A.css and B.css. The reason we chose this approach is that it allowed us to only surface JavaScript as a top-level API; CSS dependencies can remain an internal implementation detail and therefore change without breaking anyone. For instance, in the example above we can imagine B.js is a shared library delivered by team B and A.js is an application delivered by team A. When A gets loaded, all of the JavaScript and CSS provided by both teams A and B will be loaded, but team A need not care about the CSS delivered by team B.

A final issue of course is circular dependencies. Basically you have to decide how tolerant your system will be of circular dependencies and how it will try to recover from them. In Jazz I believe we try to tolerate them but loudly complain about them via WARN-level log messages, so people usually eliminate them pretty quickly. In my view, a circular dependency is either a symptom of poor design, an implementation bug, or both.

Footnotes

[1] In hindsight I believe we went a bit too far with the “extreme Ajax” approach, but that’s for another blog entry.

[2] When I say “we” in this article I basically mean myself and Richard Backhouse, who collaborated on the design of the Jazz Web UI code loading infrastructure between 2005 and 2009. Though I was quite involved with the design, Richard implemented everything. Randy Hudson took over this code in 2010 and has added quite a few of his own ideas and improvements.

[3] My favorite writing on modular software development is Clemens Szyperski’s “Component Software, 2nd Ed.”

[4] The very first Ajax code we wrote starting in early 2006 evolved into the Rational Team Concert web UI. Later we factored out a subset of this code to be the Jazz Foundation web UI frameworks and common components which were then used by a number of other Rational products. You can actually see Rational’s self-hosting instance of Rational Team Concert at Jazz.net, though this requires you to first register with Jazz.net (frowny face).

[5] E.g. the Eclipse Platform, from which I learned about lazy loading, and many other design patterns.

[6] It is obviously necessary to avoid circular dependencies between modules.

[7] I’ve written up a longer article on validation-based caching vs. expiration-based caching for anyone interested.

[8] It’s a bit trickier than it should be to make expiration-based caching work consistently across all browsers. The short version of the solution is to use every possible directive you can to tell the browser to use expiration-based caching; e.g. expires, cache-control, etc. Mark Nottingham has a blog entry with some more detail on this topic.