Wednesday, November 22, 2006

More on Web Optimization: Automated Deployment

I’ve learned a bit more about Web optimization systems since yesterday’s post. Both Memetrics and Offermatica have clarified that they do in fact support some version of automated deployment of winning test components. It’s quite possible that other multi-variate testing systems do this as well: as I hope I made clear yesterday, I haven’t researched each product in depth.

While we’re on the topic, let’s take a closer look at automated deployment. It’s one of the key issues related to optimization systems.

The first point to consider is that automated anything is a double-edged sword: it saves work for users, often letting them react more quickly and manage in greater detail than they could if manual action were required. But automation also means a system can make mistakes which may or may not be noticed and corrected by human supervisors. This is not an insurmountable problem: there are plenty of techniques to monitor automated systems and to prevent them from making big mistakes. But those techniques don’t appear by themselves, so it’s up to users to recognize they are needed and demand they be deployed.

With multi-variate Web testing in particular, automated deployment forces you to face a fundamental issue in how you define a winner. Automated systems aim to maximize a single metric, such as conversion rate or revenue per visit. Some products may be able to target several metrics simultaneously, although I haven’t seen any details. (The simplest approach is to combine several different metrics into one composite. But this may not capture the types of constraints that are important to you, such as capacity limits or volume targets. Incorporating these more sophisticated relationships is the essence of true optimization.) Still, even vendors whose algorithms target just one metric can usually track and report on several metrics. If you want to consider multiple metrics when picking a test winner, automated deployment will work only if your system can automatically include those multiple metrics in its winner selection process.
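
To make that concrete, here is a rough sketch in Python of what composite-metric winner selection with a simple capacity constraint might look like. The metric names, weights, and constraint are all invented for illustration, not any vendor’s actual API, and the weighting itself is a business judgment the software can’t make for you.

```python
from dataclasses import dataclass

@dataclass
class VariantResult:
    name: str
    conversion_rate: float    # e.g. 0.031 means 3.1%
    revenue_per_visit: float  # e.g. dollars per visit
    daily_orders: float       # load the offer would put on fulfillment if rolled out

def composite_score(r: VariantResult, baseline: VariantResult,
                    conversion_weight: float = 0.5,
                    revenue_weight: float = 0.5) -> float:
    """Blend two metrics into one number by indexing each against the current
    default, so they sit on comparable scales. The weights are a business call."""
    return (conversion_weight * (r.conversion_rate / baseline.conversion_rate)
            + revenue_weight * (r.revenue_per_visit / baseline.revenue_per_visit))

def pick_winner(candidates: list[VariantResult], baseline: VariantResult,
                max_daily_orders: float) -> VariantResult:
    """Highest composite score among variants that respect the capacity limit."""
    feasible = [r for r in candidates if r.daily_orders <= max_daily_orders]
    return max(feasible, key=lambda r: composite_score(r, baseline), default=baseline)
```

Note how the constraint simply excludes variants that would exceed the capacity limit before any scoring happens; that is the kind of relationship a single composite number can’t capture on its own.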

A second consideration is automatic cancellation of poorly performing options within an on-going test. Making a bad offer is a wasted opportunity: it drags down total results and precludes testing something else which could be more useful. Of course, some below-average performance is inevitable. Finding what does and doesn’t work is why we test in the first place. But once an option has proven itself ineffective, we’d like to stop testing it as soon as possible.
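
For what it’s worth, the statistics behind “stop showing proven losers” don’t have to be exotic. Here is a minimal sketch using a standard two-proportion z-test to flag options that are significantly worse than the current leader; real products presumably apply their own rules, so treat the names and the cutoff as illustrative only.

```python
import math

def z_versus_leader(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """One-sided z statistic for option A converting worse than option B (the leader)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (p_b - p_a) / se

def options_to_drop(results: dict[str, tuple[int, int]],
                    z_cutoff: float = 1.645) -> list[str]:
    """results maps option name -> (conversions, visitors).
    A cutoff of 1.645 is roughly 95% one-sided confidence."""
    leader = max(results, key=lambda k: results[k][0] / results[k][1])
    conv_l, n_l = results[leader]
    return [name for name, (c, n) in results.items()
            if name != leader and z_versus_leader(c, n, conv_l, n_l) > z_cutoff]
```

An option that has clearly fallen behind gets flagged for removal while everything else keeps running, which is exactly the “stop wasting impressions” behavior you’d want.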

Ideally the system would automatically drop the worst combinations from its test plan and replace them with the most promising alternatives. The whole point of multi-variate testing is that it tests only some combinations and estimates the results of the rest. This means it can identify untested combinations that work better than anything that's actually been tried. But you never know if the system’s estimates are correct: there may be random errors or relationships among variables (“interactions”) that have gone undetected. It’s just common sense, and one of the ways to avoid automated mistakes, to test such combinations before declaring one the winner. If a system cannot add those combinations to the test plan automatically, benefits are delayed as the user waits for the end of the initial test, reads the results, and sets up another test with the new combination included.
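
To show what “estimating the results of the rest” means in practice, here is a toy version of the idea: fit simple additive main effects from the tested cells of a fractional design and use them to rank every untested combination. The factor names and numbers are invented, and the model ignores interactions entirely, which is exactly why the predicted winner deserves a confirmation run before it becomes the default.

```python
from itertools import product
from statistics import mean

# Hypothetical tested cells of a half-fraction design:
# (headline, image, button) -> observed conversion rate
tested = {
    ("H1", "I1", "B1"): 0.030, ("H1", "I2", "B2"): 0.034,
    ("H2", "I1", "B2"): 0.041, ("H2", "I2", "B1"): 0.037,
}
factors = [("H1", "H2"), ("I1", "I2"), ("B1", "B2")]
grand_mean = mean(tested.values())

def level_effect(level: str) -> float:
    """Average lift of one level over the grand mean, across the cells that used it."""
    rates = [rate for combo, rate in tested.items() if level in combo]
    return mean(rates) - grand_mean

# Predict every possible combination from the additive main effects.
predicted = {combo: grand_mean + sum(level_effect(lvl) for lvl in combo)
             for combo in product(*factors)}
best_untested = max((c for c in predicted if c not in tested), key=predicted.get)
# best_untested is the combination to add to the next round of testing,
# not something to declare the winner outright.
```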

So far we’ve been discussing automation within the testing process itself. Automated deployment is something else: applying the test winner to the production system—that is, to treatment of all site visitors. This is technically not so hard for Web testing systems, since they already control portions of the production Web site seen by all visitors. So deployment simply means replacing the current default contents with the test winner. The only things to look for are (a) whether the system actually lets you specify default contents that go to non-test visitors and (b) whether it can automatically change those default contents based on test results.

Of course, there will be details about what triggers such a replacement: a specified time period, number of tests, confidence level, expected improvement, etc. Plus, you will want some controls to ensure the new content is acceptable: marketers often test offers they are not ready to roll out. At a minimum, you’ll probably want notification when a test has been converted to the new default. You may even choose to forgo fully automated deployment and have the system request your approval before it makes a change.
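
Put together, these controls amount to a small policy check before the default content gets swapped. Here’s a hedged sketch of what that might look like; every name is hypothetical rather than a feature of any product I’ve reviewed.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class DeploymentPolicy:
    min_confidence: float = 0.95   # confidence that the winner really beats the default
    min_lift: float = 0.05         # at least 5% expected improvement over the default
    require_approval: bool = True  # hold the change until a marketer signs off

def maybe_deploy(winner_lift: float, confidence: float, approved: bool,
                 policy: DeploymentPolicy,
                 set_default: Callable[[], None],
                 notify: Callable[[str], None]) -> bool:
    """set_default and notify are callbacks into whatever serves the site's default content."""
    if confidence < policy.min_confidence or winner_lift < policy.min_lift:
        return False  # not enough evidence or not enough upside to bother
    if policy.require_approval and not approved:
        notify("Winner found; awaiting approval before it becomes the default.")
        return False
    set_default()  # replace the default content served to non-test visitors
    notify(f"New default deployed: +{winner_lift:.0%} expected lift at {confidence:.0%} confidence.")
    return True
```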

One final consideration. In some environments, tests are running continuously. This adds its own challenges. For example, how do you prevent one test from interfering with another? (Changes from a test on the landing page might impact another test on the checkout page.) Automated deployment increases the chances of unintentional interference along these lines. Continuous tests also raise the issue of how heavily to weight older vs. newer results. Customer tastes do change over time, so you want to react to trends. But you don’t want to overreact to random variations or temporary situations. Of course, one solution is to avoid continuous tests altogether, and periodically start fresh with new tests instead. But if you’re trying to automate as much as possible, this defeats your purpose. The alternative is to look into what options the system provides to deal with these situations and assess whether they meet your needs.
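
On the question of weighting older versus newer results, one common approach is exponential decay, where each day’s data counts a bit less than the day after it. I’m only assuming vendors offer something along these lines rather than describing any specific product, but a short sketch shows the idea:

```python
def decayed_conversion_rate(daily_results: list[tuple[int, int]],
                            half_life_days: float = 14.0) -> float:
    """daily_results is ordered oldest-to-newest as (conversions, visitors) per day.
    A day's data loses half its weight every half_life_days."""
    decay = 0.5 ** (1.0 / half_life_days)
    weighted_conv = weighted_vis = 0.0
    for age, (conversions, visitors) in enumerate(reversed(daily_results)):  # age 0 = today
        weight = decay ** age
        weighted_conv += weight * conversions
        weighted_vis += weight * visitors
    return weighted_conv / weighted_vis if weighted_vis else 0.0
```

A shorter half-life reacts faster to real trends but also to noise; a longer one is steadier but slower. That is precisely the overreact-versus-underreact trade-off described above, and it’s worth asking how much control a given system gives you over it.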

This is a much longer post than usual, but it’s kind of a relaxed day at the office and this topic has (obviously) been on my mind recently. Happy Thanksgiving.
