UsabilityNews.com - for all the latest in usability and human-computer interaction
CHI 2003 Feature: Testing... 1 2 3 4 5 ... Testing...


Source: UN, 1 May 2003
Submitted by Larry Constantine

Usability can sometimes be more about belief than about evidence or engineering, with usability testing heading the list as a central tenet of the dogma of modern practice. One disgruntled participant in a recent conference even commented: 'It is unbelievable that an instructor at CHI would question the importance of user research and usability testing'. Yet, precisely because of its leading role, it is important for the profession to question the dogma of usability testing and for professionals to keep abreast of new developments and changing perspectives.

The received view in our field holds that testing is the yellow-brick road to usability, that testing is always a good idea, and that a small number of tests is enough to catch most of the problems. How many is enough and how much is most? Since Jakob Nielsen and Thomas Landauer published their now classic work in 1993 (see resources at end), the widely accepted answer has been that 5 subjects are probably enough to uncover roughly 80% of the usability defects. Beyond that point the return on investment from added testing falls off steadily. Many organizations stake their reputations and the success of their products on some variant of this formula.
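The arithmetic behind the "magic number" can be sketched in a few lines. The Nielsen and Landauer model treats each test user as independently finding any given problem with the same probability, so the expected share found after n users is 1 - (1 - p)^n. The function name below is hypothetical, and the default p = 0.31 is the average value Nielsen has quoted in his Alertbox column; it is an assumption here, not a property of any particular product.

```python
def proportion_found(n_users: int, detect_prob: float = 0.31) -> float:
    """Expected share of usability problems found after n_users.

    Assumes every user independently finds any given problem with the
    same probability detect_prob (0.31 is an assumed average, not a
    measured value for any specific product).
    """
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    if not 0.0 <= detect_prob <= 1.0:
        raise ValueError("detect_prob must be between 0 and 1")
    return 1.0 - (1.0 - detect_prob) ** n_users


if __name__ == "__main__":
    # Print the expected share for a few sample sizes.
    for n in (1, 3, 5, 10, 15):
        print(f"{n:2d} users: {proportion_found(n):.0%}")
```

Under these assumptions five users land at about 84%, in line with the roughly 80% figure, and the gain from each further user shrinks quickly. Everything depends on the assumed detection probability being both uniform and reasonably high.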

This particular piece of the received wisdom of usability was revisited in a panel at the recent CHI 2003 conference in Fort Lauderdale, Florida. The panel, called, appropriately enough, "The Magic Number 5," had the acknowledged aim of being the last of its kind and laying the issue to rest once and for all.

Moderated by Nigel Bevan of Serco Usability Systems in the UK, the panel brought together Carol Barnum of Southern Polytechnic State University, Gilbert Cockton from University of Sunderland in the UK, Rolf Molich of DialogDesign in Denmark, Jared Spool of User Interface Engineering, and Dennis Wixon from Microsoft. Jakob Nielsen of the Nielsen Norman Group, who was listed on the program but failed to show for the panel, was a persistent presence nonetheless.

True to their aim, the panel reviewed and summarised relevant work already reported elsewhere and previously discussed. As an erstwhile firm believer in the small numbers approach, Jared Spool started the panel by recounting how his views had changed after an experience with one client who insisted on testing with at least 18 users. Expecting to uncover fewer and fewer problems as testing progressed, Spool and company were surprised that new problems were still showing up at about the same pace after 16 users. Their experience was supported by Rolf Molich, whose well-known CUE (Comparative Usability Evaluation) studies have the same software evaluated by a number of different usability testing labs. In the second study in that series, for example, the 7 teams returned almost completely different, mostly non-overlapping results. Of the total 310 usability problems uncovered, 75% were identified by but a single testing team and missed by the others, and only one problem showed up in the findings of every team.

Far from being a scientific and reproducible procedure, as it is touted by many professionals and regarded by many managers, usability testing now appears to be a highly variable art in which the results depend on who is testing what by which protocol with which particular subjects. It is quite possible that for some systems being evaluated by some procedures, no matter how many subjects you test, you will continue to uncover new and significant problems. The problems you find will be different from those you would find with other users and different from those that another tester would uncover with the same number of users.

The Nielsen and Landauer work is often summarised as a set of curves plotting the discovery of defects versus the number of test subjects and the relative return-on-investment with each additional subject. The latter curve, which peaks around 5 users, was repeatedly referred to by Spool and other panelists as the 'parabola of optimism'. Not only do recent experiences and research call the numbers into question, but even the underlying statistical assumptions were challenged by Gilbert Cockton. Indeed, if there is a curve of diminishing returns that gradually levels off with more and more test subjects – and that itself is no longer beyond question – the shape of the curve may be unique to every product and even every test protocol.
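One way to see why the shape of the curve could differ from product to product is to drop the assumption that every problem is equally easy to find. The sketch below is an illustration only, not a reconstruction of any panelist's argument: the function name and both sets of per-problem detection probabilities are hypothetical, chosen so that the two products share the same average probability (0.31) but differ in spread.

```python
from statistics import mean


def expected_found(n_users, detect_probs):
    """Expected share of problems found after n_users.

    detect_probs holds one detection probability per problem; each user
    is assumed to find each problem independently with that probability.
    """
    return mean(1.0 - (1.0 - p) ** n_users for p in detect_probs)


# Hypothetical product A: 20 problems, all equally easy to find.
uniform = [0.31] * 20
# Hypothetical product B: 6 obvious problems and 14 obscure ones,
# with the same average detection probability of 0.31.
mixed = [0.8] * 6 + [0.1] * 14

if __name__ == "__main__":
    for n in (5, 10, 20):
        print(
            f"{n:2d} users: uniform {expected_found(n, uniform):.0%}, "
            f"mixed {expected_found(n, mixed):.0%}"
        )
```

With these made-up numbers, five users are expected to find about 84% of the problems in the uniform case but only about 59% in the mixed case, even though the average detection probability is identical. A single averaged curve can therefore overstate what a small test will uncover.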

Not every panelist was on the attack. As Carol Barnum highlighted in her defence of Nielsen-style discount usability, effective testing with small numbers requires clear, well-defined test scenarios based on specific testing objectives and using carefully chosen test subjects. All too often in practice, test scenarios are overly vague and open-ended and test subjects are really just the people who happen to show up at the laboratory door.

The most impassioned advocacy came from Dennis Wixon, who kicked off a lively exchange by arguing that size is not what matters and that identifying the total set of problems is irrelevant. The true goal of testing is not finding defects but fixing them. Wixon championed the RITE approach (Rapid Iterative Testing and Evaluation) currently in favor at Microsoft. The technique uses a short find-fix cycle that echoes the daily-build philosophy pioneered by David Cutler on the Windows NT project. Every day the latest version of the software is tested with another user. Problems are fixed immediately and then the system is re-evaluated with another test the next day. This approach, Wixon claimed, continually answers the question 'Does the system as modified actually work for users?'.

Interestingly, neither the panel nor the audience challenged the concept of repeatedly testing a changing system with changing test subjects. If, as the panel's own evidence clearly suggests, each test and subject is in some respects unique and the testing process may not converge even over substantial numbers of subjects using a stable system, the repeated testing of an ever-changing system is more about the illusion of rapid progress than the reality.

Nevertheless, rapid iteration struck a responsive chord with the audience. Speaking from the floor, Robin Jeffries, Distinguished Engineer at Sun Microsystems, drew applause when she advocated that professionals 'test early, test often, and test iteratively'. Repeated with variation – and invariably to enthusiastic audience response – this theme became a veritable mantra for the session.

One major conclusion from the panel and the conference might be that usability testing is so entrenched in the canon of usability practice that no amount of counter-evidence will shake the faith of its true believers. An unbiased reading of the research results would suggest that no amount of testing is enough, a conclusion already well-established a quarter century ago within the software quality movement. The focus of software quality improvement is now on reducing the so-called injection rate, that is, avoiding problems in the first place. There is a limit to how many defects can be uncovered, cataloged, and analysed in a given number of sessions no matter what protocols one follows or how rapidly one iterates. The more problems lurking in the system to be tested, the more hopeless the fate of those who put their faith in testing.

Among the panelists, only Rolf Molich took that next logical step to question the very role of usability testing. He shared a hopeful vision of the future in which robust and disciplined design processes avoided most usability problems from the outset. In his vision, usability testing facilities would gradually fall into disuse and ultimately be abandoned to gather dust.

Amen, brother.

As I listened in on the buzz among attendees leaving the session, it became even clearer that Molich was addressing a small sect that believes in usability by design while Wixon was preaching to the choir. The widespread belief in the power of testing remains safely unshaken by the facts.

Larry Constantine,
Constantine & Lockwood Ltd


RESOURCES
Constantine, L. L. "Testing, Testing, One, Two." forUSE: The Electronic Newsletter of Usage-Centered Design, #11. [http://www.foruse.com/newsletter/foruse11.htm]

Medlock, M. C., Wixon, D., Terrano, M., Romero, R., Fulton, B. (2002). "Using the RITE Method to improve products: a definition and a case study." Usability Professionals Association (UPA2002), Orlando, FL, July 2002. [http://www.microsoft.com/usability/Playtest/Publications/Using%20the%20RITE%20Method%20to%20improve%20products.doc;%20a%20definition%20and%20a%20case%20study.doc]

Molich, Rolf, Nigel Bevan, Ian Curson, Scott Butler, Erika Kindlund, Dana Miller & Jurek Kirakowski. (1998). Comparative Evaluation of Usability Tests. Proceedings of the Usability Professionals Association (UPA98) Conference, Washington, DC. [http://www.dialogdesign.dk/tekster/cue1/cue1paper.doc]

Molich, Rolf, Ann Thomsen, Barbara Karyukina, Lars Schmidt, Meghan Ede, Wilma van Oel, & Meeta Arcuri. (1999). Comparative Evaluation of Usability Tests. CHI99 Extended Abstracts 83-84. [http://www.dialogdesign.dk/tekster/cue2/abstract.doc]

Nielsen, J., and Landauer, T. K. 1993. A mathematical model of the finding of usability problems. Proceedings ACM/IFIP INTERCHI'93 Conference (Amsterdam, The Netherlands, April 24-29), 206-213. [See http://www.useit.com/alertbox/20000319.html and http://www.useit.com/papers/heuristic/heuristic_evaluation.html]

Woolrych, A. and Cockton, G., "Why and When Five Test Users Aren't Enough," in Proceedings of IHM-HCI 2001 Conference: Volume 2, eds. J. Vanderdonckt, A. Blandford, and A. Derycke, Cépaduès-Éditions: Toulouse, 105-108, 2001. [http://osiris.sunderland.ac.uk/~cs0gco/fiveusers.doc]

 



UsabilityNews.com (version 1.41), along with its associated web site and content,
are all strictly © Copyright of the BCS Interaction 2001-2010. All rights reserved.

Joanna Bawa (editor), Dave Clarke (founder, designer and developer). Ian Parry (graphics).