I have a three-year-old son named Alexander who’s one hundred percent boy. The other night I was getting him ready for his bath. As soon as I took off his last piece of clothing, he ran away, waving his arms madly and yelling “Look everybody, I’m a naked boy!”

He ran to where his mom was sitting, put his hands on his hips, stuck his little tummy out as far as he could, and shouted “Look Mama, I have fat belly!”

I laughed but quickly realized the joke was on me when he looked my way and with all seriousness finished his thought “…like Daddy”.

There’s a theory called ‘The Uncanny Valley’ regarding humans’ emotional response to human-like robots. From the Wikipedia entry:

The Uncanny Valley is a hypothesis about robotics concerning the emotional response of humans to robots and other non-human entities. It was introduced by Japanese roboticist Masahiro Mori in 1970 […]

Mori’s hypothesis states that as a robot is made more humanlike in its appearance and motion, the emotional response from a human being to the robot will become increasingly positive and empathic, until a point is reached beyond which the response quickly becomes strongly repulsive. However, as the appearance and motion continue to become less distinguishable from a human being’s, the emotional response becomes positive once more and approaches human-human empathy levels.

This area of repulsive response aroused by a robot with appearance and motion between a “barely-human” and “fully human” entity is called the Uncanny Valley. The name captures the idea that a robot which is “almost human” will seem overly “strange” to a human being and thus will fail to evoke the requisite empathetic response required for productive human-robot interaction.

While most of us don’t interact with human-like robots frequently enough to accept or reject this theory, many of us have seen a movie like The Polar Express or Final Fantasy: The Spirits Within, which use realistic – as opposed to cartoonish – computer-generated human characters. Although the filmmakers take great care to make the characters’ expressions and movements replicate those of real human actors, many viewers find these almost-but-not-quite-human characters to be unsettling or even creepy.

The problem is that our minds have a model of how humans should behave, and pseudo-humans, whether robots or computer-generated images, don’t quite fit this model. The result is a sense of unease: we know that something’s not right, even if we can’t precisely articulate what’s wrong.

Why don’t we feel a similar sense of unease when we watch a cartoon like The Simpsons, where the characters are even further away from our concept of humanness? Because in the cartoon environment, we accept that the characters are not really human at all – they’re cartoon characters and are self-consistent within their animated environment. Conversely, it would be jarring if a real human entered the frame and interacted with the Simpsons, because eighteen years of Simpsons cartoons and eighty years of cartoons in general have conditioned us not to expect this [Footnote 1].

There’s a lesson here for software designers, and one that I’ve talked about recently – we must ensure that we design our applications to remain consistent with the environment in which our software runs. In more concrete terms: a Windows application should look and feel like a Windows application, a Mac application should look and feel like a Mac application, and a web application should look and feel like a web application.

Obvious, you say? I’d agree that software designers and developers generally observe this rule except in the midst of a technological paradigm shift. During periods of rapid innovation and exploration, it’s tempting and more acceptable to violate the expectations of a particular environment. I know this is a sweeping and abstract claim, so let me back it up with a few examples.

Does anyone remember Active Desktop? When Bill Gates realized that the web was a big deal, he directed Microsoft to web-enable all of its software products. Active Desktop was a feature that made the Windows desktop look like a web page and allowed users to initiate the default action on a file or folder with a hyperlink-like single click rather than the traditional double-click. One of the problems with Active Desktop was that it broke all of users’ expectations about interacting with files and folders. Changing from the double-click to the single-click model subtly changed other interactions, like drag and drop, selection, and renaming. The only reason I remember this feature is that so many non-technical friends at Penn State asked me to help them turn it off.

Another game-changing technology of the 1990s was the Java platform. Java’s attraction was that the language’s syntax looked and felt a lot like C and C++ (which many programmers knew) but it was (in theory) ‘write once, run anywhere’ – in other words, multiplatform. Although Java took hold on the server side, it never took off on the desktop as many predicted it would. Why not? My own experience with Java GUI apps of the late 1990s was that they were slow and that they looked and behaved weirdly compared to standard Windows (or Mac or Linux) applications. That’s because they weren’t true Windows/Mac/Linux apps; they were Java Swing apps that emulated Windows/Mac/Linux apps. Despite the herculean efforts of the Swing designers and implementers, they couldn’t escape the Uncanny Valley of emulated user interfaces.

Eclipse and SWT took a different approach to Java-based desktop apps [Footnote 2]. Rather than emulating native desktop widgets, SWT favors direct delegation to native desktop widgets [Footnote 3], resulting in applications that look like Windows/Mac/Linux applications rather than Java Swing applications. The downside of this design decision is that SWT widget developers must manually port each new widget to every supported desktop environment. That the Eclipse/SWT designers accepted this development-time and maintenance burden only emphasizes how important they judged native look and feel to be.

Just as Windows/Mac/Linux apps have a native look and feel, so too do browser-based applications. The native widgets of the web are the standard HTML elements – hyperlinks, tables, buttons, text inputs, select boxes, and colored spans and divs. We’ve had the tools to create richer web applications ever since pre-standards DOMs and JavaScript 1.0, but only the combination of DOM (semi-)standardization, de facto XHR standardization, emerging libraries, and exemplary next-gen apps like Google Suggest and Gmail has led a non-trivial segment of the software community to attempt richer web UIs, which I believe we’re now lumping under the banner of ‘Ajax’ (or is it ‘RIA’?). As with the web and Java before it, the availability of Ajax technology is causing some developers to diverge from the native look and feel of the web in favor of a user interface style I call “desktop app in a web browser”. For an example of this style of Ajax app, take a few minutes and view this Flash demo of the Zimbra collaboration suite.

To me, Zimbra doesn’t in any way resemble my mental model of a web application; it resembles Microsoft Outlook [Footnote 4]. On the other hand, Gmail, which is also an Ajax-based email application, almost exactly matches my mental model of how a web application should look and feel (screenshots). Do I prefer the Gmail look and feel over the Zimbra look and feel? Yes. Why? Because over the past twelve years, my mind has developed a very specific model of how a web application should look and feel, and because Gmail aligns with this model, I can use it immediately and it feels natural to me. Gmail uses Ajax to accelerate common operations (e.g. email address auto-complete) and to transfer data without a jarring page refresh (e.g. refreshing the Inbox contents), but its core look and feel remains very similar to that of a traditional web page. In my view, this is not a shortcoming; it’s a smart design decision.

So if you’re considering or actively building Ajax/RIA applications, I’d recommend that you keep the Uncanny Valley of user interface design in mind and recognize that when you build a “desktop app in a web browser”-style application, you’re violating users’ unwritten expectations of how a web application should look and behave. This choice may significantly hurt learnability, pleasantness of use, and adoption. The fact that you can create web applications that resemble desktop applications does not imply that you should; it only means that you have one more option, and a corresponding set of trade-offs, to consider when making design decisions.

[Footnote 1] Who Framed Roger Rabbit is a notable exception.

[Footnote 2] I work for the IBM group (Eclipse/Jazz) that created SWT, so I may be biased.

[Footnote 3] Though SWT favors delegation to native platform widgets, it sometimes uses emulated widgets if the particular platform doesn’t provide an acceptable native widget. This helps it get around the ‘least-common denominator’ problem of AWT.

[Footnote 4] I’m being a bit unfair to Zimbra here because there’s a scenario where its Outlook-like L&F really shines. If I were a CIO looking to migrate off of Exchange/Outlook to a cheaper multiplatform alternative, Zimbra would be very attractive: since Zimbra is functionally consistent with Outlook, I’d expect Outlook users to transition to Zimbra fairly quickly.

Note to readers: For a while now, I’ve been looking for guidance on designing useful messages and message-based systems, but without much luck. To help others and also because I learn by writing, I’m going to use my blog to document some of the messaging lessons I’ve learned over the past couple of years. I hope this blog entry and future ones like it don’t seem overly pedantic; my only goal is to help clarify my own thoughts and perhaps help others looking for similar information on a topic with which I’ve personally struggled.

In this blog entry, I talk about the fundamentals of caching resource representations in HTTP-based distributed systems using the language of basic concepts, avoiding HTTP terminology that might sidetrack novice readers. This entry does assume some knowledge of HTTP (e.g. requests, responses, URIs), so if these concepts are unfamiliar, I’d suggest you read the first couple of chapters of a book like HTTP: The Definitive Guide to familiarize yourself.

If you’re already familiar with HTTP caching (e.g. most likely anyone reading this via Planet Intertwingly), you may wish to skip this entry altogether, unless you’re curious about my take on the topic or are interested in looking for mistakes or misrepresentations. If you do find a problem, please add a comment and I’ll attempt to correct and/or clarify.

Intro

One of the benefits of developing distributed applications using the REST architectural style with the HTTP protocol is first-class support for caching documents (or ‘entity-bodies’ in HTTP terminology). If you’re simply serving files using a world-class web server like Apache HTTP Server, you get some degree of caching for free. But in dynamic web applications, you’re often generating documents on the fly (e.g. an XML document containing data from a row in a relational database) rather than simply serving files, where the resource and the representation are equivalent.

Unless you’re using an application framework that automatically generates caching information for HTTP responses based on the framework’s meta-data model, you’ll likely have to roll your own caching logic. This presents both a challenge and an opportunity. The challenge is that you must learn about the various HTTP caching options so that you can intelligently apply them to your particular data model; the opportunity is that you can often take advantage of your data model’s semantics to implement smarter caching logic than out-of-the-box file system caching.

In this entry I describe the basic rationale for caching and then discuss the basic caching options available in HTTP. Note that I describe these options at a very high level, without getting into many implementation details; at this level the ‘HTTP caching options’ are really general caching patterns. Nevertheless, I describe them in the context and language of HTTP, since it’s both a ubiquitously deployed protocol and the protocol with which I’m most familiar.

Why Cache?

Caching may be one of the most boring topics in software, but if you’re working with distributed systems (like the web), smart cache design is absolutely vital to both system scalability and responsiveness, among other things. In brief, a cache is simply a local copy of data that resides elsewhere. A computing component (whether hardware or software) uses a data cache to avoid performing an expensive operation like fetching data over a network or executing a computationally expensive algorithm. The trade-off is that your copy of the data may become out of sync with the original data source, or stale, in caching terminology. Whether or not staleness matters depends on the nature of the data and the needs of your application.
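To make that definition concrete, here is a minimal Python sketch of a local cache wrapped around an expensive fetch function. The class name LocalCache, the fetch callback, and the injectable clock are hypothetical, chosen only for illustration. The point is that the copy is fetched once, reused afterward, and can drift out of sync with its source.

```python
import time
from typing import Callable, Dict, Tuple


class LocalCache:
    """Keeps local copies of values produced by an expensive fetch function."""

    def __init__(self, fetch: Callable[[str], str],
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch   # the expensive operation we want to avoid repeating
        self._clock = clock   # injectable clock, which makes testing easier
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, copied_at)

    def get(self, key: str) -> str:
        if key not in self._entries:
            # Cache miss: pay the cost once and keep a local copy.
            self._entries[key] = (self._fetch(key), self._clock())
        # Return the local copy, which may by now be stale.
        return self._entries[key][0]

    def age(self, key: str) -> float:
        """Seconds since the local copy of `key` was made."""
        return self._clock() - self._entries[key][1]


# Usage: the source changes, but the cache keeps returning its old copy.
source = {"philly_avg_temp": "59 degrees F"}
cache = LocalCache(lambda key: source[key])
print(cache.get("philly_avg_temp"))   # 59 degrees F (fetched)
source["philly_avg_temp"] = "60 degrees F"
print(cache.get("philly_avg_temp"))   # 59 degrees F (stale local copy)
```

Because get never re-fetches, every read after the source changes returns stale data. The age method is one crude way to judge how stale a copy might be.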

For example, if your web site displays the average daily temperature for Philadelphia over the past hundred years, you probably display a simple stored data element (e.g. “59 degrees F”) rather than performing this very expensive computation in real time. Because it would take a long period of unusual weather to noticeably affect the result, it doesn’t really matter if your cached copy doesn’t consider very recent temperatures. At the other extreme, an automated teller machine (ATM) definitely should not use a cached copy of your checking account balance when determining whether you have enough money to make a withdrawal, since this might allow a malicious customer to withdraw his entire balance from multiple ATMs simultaneously.

Generally speaking, the cacheability of a particular piece of data varies along two axes:

  • the volatility of the data
  • the potential negative impact of using stale data

HTTP Caching Options

Caching is a first-class concern of the REST architectural style and the HTTP protocol. Indeed, one of the main goals of HTTP/1.1 was to enhance the basic caching capabilities provided by HTTP/1.0 (see chapter 7 of Krishnamurthy and Rexford’s Web Protocols and Practice for an excellent discussion of the design goals of HTTP/1.1). At the risk of oversimplifying, for a given RESTful HTTP URI, you have three basic caching options:

  1. don’t use caching
  2. use validation-based caching
  3. use expiration-based caching
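One way to connect these three options to the two cacheability axes above is a simple rule of thumb. The sketch below is a hypothetical simplification (real decisions are rarely this binary), and all of its names are invented for illustration.

```python
from enum import Enum


class CachingOption(Enum):
    NO_CACHING = "don't use caching"
    VALIDATION = "use validation-based caching"
    EXPIRATION = "use expiration-based caching"


def choose_caching_option(volatile: bool, stale_data_harmful: bool) -> CachingOption:
    """Hypothetical rule of thumb mapping the two cacheability axes to an option."""
    if not stale_data_harmful:
        # Stale copies are tolerable, so let clients skip the round trip entirely.
        return CachingOption.EXPIRATION
    if volatile:
        # Data changes constantly and staleness is costly: always ask the origin.
        return CachingOption.NO_CACHING
    # Staleness is costly but changes are occasional: revalidate on every
    # request, but avoid resending documents that haven't changed.
    return CachingOption.VALIDATION


print(choose_caching_option(volatile=True, stale_data_harmful=True).value)
```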

These options demonstrate the trade-offs between the need to avoid stale data and the performance benefits of using cached data. The no caching option means that a client will always fetch the most recent data available from an origin server. This is useful in cases where the data is extremely volatile and using stale data may have dire consequences. For example, any time you view a list of current auctions on eBay (e.g. for 19th Century Unused US Stamps), you’ll notice many anti-caching directives in the HTTP response, included to ensure that you always see the most recent state of the various auctions. The downside of no caching is that every request is guaranteed to incur some cost in terms of client-perceived latency, server resources (e.g. CPU, memory), and network bandwidth.
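As a rough illustration, the following Python sketch returns the kind of response headers a server might send to discourage caching. This is an assumed, commonly used combination, not the exact set of headers eBay sends, and the function name is hypothetical.

```python
def no_cache_headers() -> dict:
    """Response headers that discourage browsers and intermediaries from caching."""
    return {
        # HTTP/1.1: don't store the response, and revalidate anything already stored.
        "Cache-Control": "no-store, no-cache, must-revalidate",
        # Often added for old HTTP/1.0 caches, although HTTP never formally
        # defined Pragma's meaning in responses.
        "Pragma": "no-cache",
        # An expiration date in the past marks the response as already stale.
        "Expires": "Thu, 01 Jan 1970 00:00:00 GMT",
    }


for name, value in no_cache_headers().items():
    print(f"{name}: {value}")
```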

Validation-based caching allows an HTTP response to include a logical ‘state identifier’ (such as an HTTP ETag or Last-Modified timestamp) which a client can then resend on subsequent requests for the same URI, potentially resulting in a short ‘not modified’ message from the server. Validation-based caching provides a useful trade-off between the need for fresh data and the goal of reducing consumption of network bandwidth and, to a lesser extent, server resources and client-perceived latency.
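Here is a minimal server-side sketch of validation-based caching with ETags. It assumes a hypothetical handler that receives the document bytes and the value of the request’s If-None-Match header. Deriving the ETag from a hash of the body is only one possible scheme, and the comparison ignores weak (W/) validators for simplicity.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """A strong validator derived from the document bytes (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(body: bytes,
               if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET, using ETag validation."""
    etag = make_etag(body)
    # 'no-cache' lets caches keep a copy but requires revalidation before each reuse.
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match is not None:
        # If-None-Match may carry a comma-separated list of ETags, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates or "*" in candidates:
            # 'Not modified': same validator headers, but no document.
            return 304, headers, b""
    return 200, headers, body


status, headers, _ = handle_get(b"<html>v1</html>", None)
print(status)                                              # 200
print(handle_get(b"<html>v1</html>", headers["ETag"])[0])  # 304
```

In a real application you would often derive the ETag more cheaply from your data model (a row’s version number, say) so that the server can answer ‘not modified’ without generating the document at all.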

For example, imagine a web page that changes frequently but not on a regular schedule. This web page could use validation-based caching so that each time a client attempts to view the page, the request goes all the way back to the origin server and results in either a full response (if the client has an old version of the page or no cached version at all) or a terse ‘not modified’ response (if the client has the most recent version of the page). All other things being equal, in the ‘not modified’ case the response will be smaller (since the server sends no document), the server will do less work (since it doesn’t have to stream the page bytes from disk or memory), and the client may observe a faster load time since the message is smaller and the user agent (e.g. the browser) may even have a cached rendering of the page. These are certainly superior non-functional characteristics to the ‘no caching’ case, and we don’t have to worry about seeing stale data (assuming the client does the right thing). However, the server still does some work to determine that the client has the most recent version, the client still experiences some latency waiting for the ‘not modified’ message, and we still use some network bandwidth to send the request and receive the (albeit short) response.
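The client side of this exchange might look like the following sketch. It assumes a hypothetical origin callable that takes a URI plus an optional ETag and returns a status code, the current ETag, and the body. In practice this would be an HTTP GET carrying an If-None-Match header.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical origin: (uri, etag_or_None) -> (status, current_etag, body)
Origin = Callable[[str, Optional[str]], Tuple[int, str, bytes]]


class ValidatingClient:
    """Client-side cache that revalidates every request with its stored ETag."""

    def __init__(self, origin: Origin) -> None:
        self._origin = origin
        self._cache: Dict[str, Tuple[str, bytes]] = {}  # uri -> (etag, body)

    def get(self, uri: str) -> bytes:
        cached = self._cache.get(uri)
        etag = cached[0] if cached else None
        # Every request still travels all the way to the origin server...
        status, current_etag, body = self._origin(uri, etag)
        if status == 304 and cached is not None:
            # ...but a terse 'not modified' reply lets us reuse our copy.
            return cached[1]
        self._cache[uri] = (current_etag, body)
        return body


# Usage with a stub origin that records each status code it sends.
pages = {"/news": (b"version 1", '"v1"')}
sent = []


def stub_origin(uri: str, etag: Optional[str]) -> Tuple[int, str, bytes]:
    body, current = pages[uri]
    status = 304 if etag == current else 200
    sent.append(status)
    return status, current, (b"" if status == 304 else body)


client = ValidatingClient(stub_origin)
client.get("/news")
client.get("/news")
pages["/news"] = (b"version 2", '"v2"')
print(client.get("/news"), sent)   # b'version 2' [200, 304, 200]
```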

Expiration-based caching allows an origin server to associate an expiration timestamp with a particular document so that clients can simply assume that their cached copy is safe to use if it has not passed its expiration date. In other words, an origin server asserts that the document is ‘good’ or ‘good enough’ for a certain period of time. This sort of caching has fantastic performance characteristics but requires the designer to ensure either that:

  • the data won’t become stale before the expiration period ends, or
  • the impact of a client using stale data is negligible
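On the client (or intermediary) side, the core of expiration-based caching is a freshness check. The sketch below reads only the max-age directive of a Cache-Control header and compares it with the age of the cached copy. It deliberately ignores the Age and Expires headers and heuristic freshness, all of which a real HTTP/1.1 cache must handle. The function names are hypothetical.

```python
import re
from typing import Optional


def max_age(cache_control: str) -> Optional[int]:
    """Extract the max-age directive (seconds) from a Cache-Control header value."""
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def is_fresh(cache_control: str, received_at: float, now: float) -> bool:
    """True if a cached copy may be used without contacting the origin server.

    Simplified: ignores the Age and Expires headers and heuristic freshness.
    """
    lifetime = max_age(cache_control)
    if lifetime is None:
        return False  # no explicit lifetime in this sketch: treat as stale
    current_age = now - received_at
    return current_age < lifetime


print(is_fresh("public, max-age=3600", received_at=0.0, now=1800.0))  # True
print(is_fresh("public, max-age=3600", received_at=0.0, now=7200.0))  # False
```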

An example of a resource that is well-suited for expiration-based caching is an image of a book cover on Amazon.com (e.g. the image of the cover of Steve Krug’s Don’t Make Me Think). While it’s possible that the book cover could change, it’s extremely unlikely, and since image files are relatively large, it would be wise for Amazon to set an expiration date so that clients load the image from their cache without even asking Amazon whether they have the most recent version. If the cover of the book somehow does change between when you cache your copy and when your cached copy expires, it’s not a big deal unless you base your purchasing decisions on book cover aesthetics.
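A server that wants long-lived caching for a resource like that cover image could send headers along the lines of this sketch. The one-year default lifetime is an assumption (HTTP/1.1 advises servers not to send Expires dates more than a year in the future), the function name is hypothetical, and I’m not claiming these are the headers Amazon actually sends.

```python
from email.utils import formatdate

ONE_YEAR = 365 * 24 * 60 * 60  # seconds


def long_lived_headers(now: float, lifetime: int = ONE_YEAR) -> dict:
    """Headers asserting that a document stays 'good enough' for `lifetime` seconds."""
    return {
        # HTTP/1.1 caches use max-age; 'public' lets shared caches store it too.
        "Cache-Control": f"public, max-age={lifetime}",
        # HTTP/1.0 caches understand only an absolute Expires date, in GMT.
        "Expires": formatdate(now + lifetime, usegmt=True),
    }


print(long_lived_headers(now=0.0, lifetime=86400))
```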

Another performance benefit of expiration-based caching is that even when a client doesn’t have a valid cached copy of a document, a network intermediary (e.g. a proxy server) might. In this case a client requests a particular URI and, before the request reaches the origin server, an intermediary determines that it has a still-valid cached copy of the document and returns its copy immediately rather than forwarding the request to the next intermediary or the origin server. It should be clear from these examples that expiration-based caching results in significantly less user-perceived latency and consumes significantly less network bandwidth and fewer server resources. The trick is that you have to either guarantee no staleness or feel confident that the performance benefits justify the risks of a client processing stale data. Note that it’s generally not possible to take advantage of intermediary caching over an HTTPS connection.
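The intermediary scenario can be sketched as a shared cache keyed by URI. In this hypothetical Python model, forward stands in for passing the request upstream and returns the document along with its lifetime in seconds. A real proxy would read that lifetime from the response headers instead, and the URI in the usage example is made up.

```python
from typing import Callable, Dict, Tuple

# Hypothetical upstream call: uri -> (body, lifetime_in_seconds)
Forward = Callable[[str], Tuple[bytes, float]]


class SharedCache:
    """Sketch of an intermediary (e.g. proxy) cache shared by many clients."""

    def __init__(self, forward: Forward, clock: Callable[[], float]) -> None:
        self._forward = forward
        self._clock = clock
        self._store: Dict[str, Tuple[bytes, float]] = {}  # uri -> (body, expires_at)

    def handle(self, uri: str) -> bytes:
        now = self._clock()
        entry = self._store.get(uri)
        if entry is not None and now < entry[1]:
            # Still fresh: answer immediately without forwarding the request.
            return entry[0]
        # Miss or expired: pass the request on toward the origin server.
        body, lifetime = self._forward(uri)
        self._store[uri] = (body, now + lifetime)
        return body


upstream_calls = []
clock = [0.0]


def forward(uri: str) -> Tuple[bytes, float]:
    upstream_calls.append(uri)
    return b"<image bytes>", 3600.0


proxy = SharedCache(forward, clock=lambda: clock[0])
proxy.handle("/covers/dont-make-me-think.jpg")   # client A: forwarded upstream
proxy.handle("/covers/dont-make-me-think.jpg")   # client B: served by the proxy
print(len(upstream_calls))                       # 1
```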

Summary

In this entry I’ve explained the basic rationale for caching in distributed systems and given an overview of the three basic caching options in REST/HTTP-based systems. This information represents a bare-bones set of fundamental caching concepts, but you must understand these concepts thoroughly before you can make informed caching design choices vis-à-vis your data model.

In future entries, I’ll build upon these foundational concepts to discuss caching design strategies for various scenarios.

After some prodding from Pat Mueller and James Governor, I signed up for a Twitter account about a week ago. It’s surprisingly fun.

If for some reason you wish to follow my daily activities, you can do so via the following two links:

Don’t know what Twitter is? Ask Wikipedia.

Andrew Shebanow of Adobe recently wrote an interesting blog entry with the unfortunate title of “The Death of UI Consistency”. A few excerpts:

What I’m really talking about here is how the goal of complete UI consistency is a quest for the grail, a quest for a goal that can never be reached.

The reason I think [that RIAs bringing web conventions to the desktop is a good thing] is that it lets us move the conversation away from the discussion of conformance with a mythical ideal and towards a discussion of what a usable UI should be.

I’ve been thinking about UI consistency quite a bit recently. Although Andrew’s on the right track, I think he clouds the issue by arguing that “the goal of complete UI consistency is a quest for the grail”. I personally don’t know anyone who has argued for complete UI consistency; indeed, my recent experience, especially with Ajax-style web applications, has been that many designers don’t consider UI consistency enough. But before going further, I think it’s important to consider what it means to provide UI consistency.

First, it’s important to remember that consistency is relative. While we can measure certain UI characteristics, like background color or width, in absolute terms (e.g. ‘white’ or ‘1400 pixels’), we can only measure consistency relative to established visual and behavioral conventions. These conventions vary by platform: for example, in a Windows application you expect to see a close ‘x’ in the upper right-hand corner of each application window; on a web site you expect clicking a hyperlink to take you to a new page. Because there are no universal UI conventions, there’s no such thing as absolute consistency; there is only consistency vis-à-vis platform conventions.

I believe Andrew is observing that as rich client and web technologies converge, so too do their UI conventions, and sometimes these conventions conflict with one another. John Gruber complained that the Adobe CS3 close box didn’t follow the Mac convention; Andrew posits that this is because CS3 follows neither Mac nor Windows conventions; it follows the conventions of the Adobe platform.

It’s all well and good to say that you’re creating a new platform and that your new platform introduces new UI conventions, but the fact is that users do have certain expectations about how UIs should look and behave, and when you violate these expectations by not following conventions, you’d better be confident that the benefits outweigh the potential pain you’ll cause users.

So how should we decide whether to follow established UI conventions or to attempt something new and different? To answer this question, it’s important to first understand the value of following conventions as well as the costs and benefits of violating conventions.

Observing established UI conventions has two main benefits:

  • You reduce your application’s learning curve because the user can (subconsciously) leverage previous experience within your application. For example, when you see blue underlined text on a web page, no one needs to explain that you can click it.
  • Your app is more pleasant to use or, more accurately, your app is less unpleasant to use; observe Gruber’s comment “God, that just looks so wrong” – have you ever felt that way when you used a Swing application that was trying to emulate a native Windows or Macintosh look and feel but not quite succeeding?

To quote my former colleague Don Ferguson, “different is hard”. Different can also feel awkward. As you interact with a class of apps over time, your mind builds up subconscious expectations about how apps of that class should look and behave. When an app violates its platform conventions, it often becomes harder to use and sometimes just plain annoying. For instance, have you ever used a web site that decided its hyperlinks shouldn’t be underlined and shouldn’t be blue? Not pleasant. Given all this, it might seem that we should always observe UI conventions, but that isn’t the case either.

UI conventions are not the laws of physics. They represent previous human design decisions that became the norm either because they were very useful (the hyperlink) or simply because they became entrenched (the ‘File’ menu). Either way, it is possible for a smart or lucky designer to invent a new mechanism that violates existing conventions yet overcomes the barriers to entry because of its usefulness. But it’s a high bar. A new UI mechanism must not simply be better than an existing convention; it must be so much better that its value outweighs the added learning curve and strangeness. A good example of a UI innovation that succeeded is the ‘web text box auto-complete dropdown’ pattern that we see in web applications like Google Suggest, del.icio.us, and Google Maps. Many smart people considered this behavior strange and novel when they first encountered it; these days we hardly notice it, though we certainly appreciate its usefulness. In other words, it’s on its way to becoming a new convention.

So I believe that designers should observe established UI conventions except when they decide that violating those conventions provides enough value to outweigh the costs. In practice, many designers don’t really think about observing or breaking conventions; they just do what feels right. And you know what? Sometimes they succeed and their arbitrary design choices become new conventions. But a design that violates conventions without understanding the trade-offs runs the risk of feeling just plain arbitrary.