UsabilityNews.com - for all the latest in usability and human-computer interaction
BCS Interaction
 
 

Caroline's Corner: Designing Comparative Evaluations


Source: UN, 31 August 2004
Submitted by Caroline Jarrett


It was one of those calls that are simultaneously good news and bad news: ‘We’d like you to do an evaluation for us. We have two designs here and we want to know which one is better.’

The good news: well, I’m a consultant so phone calls offering work are always good, right?

The bad news: comparative evaluations. Ugh. So I thought I’d at least make use of the pain by writing a few notes on them here.

IS A BETTER THAN B?
The first challenge of a comparative evaluation is that the client wants a nice clear answer: A is better than B. Or perhaps: B is better than A. The problem is that the actual answer is usually more complicated. Parts of A are better than parts of B. Parts of B are better than parts of A. Some bits of A are execrable. Some bits of B, usually but not always different ones, are also execrable. There’s probably an approach C that is better than A or B, and the final answer is probably D: a bit of C, plus some of the good points from A and from B. It’s not exactly a nice clean story, is it?

‘BETWEEN SUBJECTS’ or ‘WITHIN SUBJECTS’
I don’t like to use the term ‘subject’ for the participant in a test, because my view is that the system is the subject, not the person. But here we need to turn to the design of psychological experiments, where the subject of the experiment is the person. If you have two designs to test, are you going to get the same participants to test both designs (‘within subjects’), or are you going to do two rounds of testing, where one group of participants gets A and another gets B (‘between subjects’)?

The problem with ‘within subjects’ design is that nearly all systems have some learning effects. If you ask the participants to try the same or similar tasks with both systems, they learn about the task with the first system and can’t unlearn that knowledge before they try the second. If they try different tasks with each system, are they really comparing like with like? I’ve known participants who had such a hard time with the task on A that they were adamant they preferred B, even though doing the task with B was downright horrible. We also end up with much larger sample sizes, because we have to vary the order of presentation so that one group of participants gets A then B and an equal-sized group gets B then A.
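Counterbalancing the order of presentation is easy to get wrong by hand. Here is a minimal Python sketch of one way to do it. It assumes a simple two-design study where participants are identified by strings. The function name `counterbalance` and the participant IDs are hypothetical and only serve as an illustration.

```python
import random


def counterbalance(participants, seed=None):
    """Assign participants alternately to the A-then-B and B-then-A orders.

    The list is shuffled first so that recruitment order does not line up
    with presentation order; alternating assignment then keeps the two
    order groups equal in size (or within one when the count is odd).
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    orders = (("A", "B"), ("B", "A"))
    # Even positions in the shuffled list get A then B; odd positions get B then A.
    return {p: orders[i % 2] for i, p in enumerate(shuffled)}


if __name__ == "__main__":
    # Hypothetical participant IDs for an eight-person within-subjects study.
    schedule = counterbalance([f"P{i}" for i in range(1, 9)], seed=1)
    for participant, order in sorted(schedule.items()):
        print(participant, " then ".join(order))
```

Shuffling before alternating means the order a participant receives does not depend on when they were recruited. With an odd number of participants, one order necessarily gets one extra person.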

The problem with ‘between subjects’ design is that you can’t ask the participants which they preferred. And surely establishing preference is one of the main reasons we’re doing an evaluation anyway? So we end up in the murky world of inferential statistics: trying to work out what the population as a whole might prefer on the basis of the two samples from that population who each tried one of the interfaces. And now we’re into the issues of random sampling and statistical tests that require much larger sample sizes than we normally use in usability testing.
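Because between-subjects participants can’t state a preference, the comparison usually falls back on a measure such as task time. A permutation test is one way to ask whether the difference between the two groups could plausibly be chance. It is a swapped-in illustration rather than a method the column prescribes. The sketch below assumes task times in seconds, one list per design. The function name `permutation_test` is hypothetical.

```python
import random
from statistics import mean


def permutation_test(times_a, times_b, n_resamples=10_000, seed=None):
    """Two-sided permutation test for a difference in mean task time.

    Returns the observed difference (mean of A minus mean of B) and an
    estimated p-value: the share of random relabellings of the pooled times
    whose absolute difference in means is at least as large as observed.
    """
    rng = random.Random(seed)
    observed = mean(times_a) - mean(times_b)
    pooled = list(times_a) + list(times_b)
    n_a = len(times_a)
    extreme = 0
    for _ in range(n_resamples):
        # Randomly reassign the pooled times to two groups of the original sizes.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Adding 1 to both counts includes the observed labelling and avoids p = 0.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical task times (seconds) from two separate groups of five.
    design_a = [48, 55, 62, 70, 51]
    design_b = [66, 72, 59, 80, 75]
    diff, p = permutation_test(design_a, design_b, seed=42)
    print(f"difference in means: {diff:.1f} s, p ~ {p:.3f}")
```

This shows the small-sample limit directly. With five participants per group, there are only 252 ways to split ten task times into two groups of five. So even the most lopsided result cannot give a two-sided p-value below 2/252, roughly 0.008.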

MINIMAL OR RADICAL DIFFERENCES?
My third recurring problem with comparative evaluation is the ‘identical twins’ problem. The client knows these babies and sees all the subtle and, to them, important differences that they want to explore. The participants see identical twins: both products look pretty much the same. For example, we were looking at three different versions of a form that is much hated by the general public. The client could see all sorts of really, really major differences between them. The participants just saw the form they loathed.

SOME TIPS
If you do have to undertake a comparative evaluation, maybe these tips will help:

1. Prepare your client for a complicated answer that picks elements from the different approaches.

2. Be prepared to undertake far more tests. You’ll probably need at least three times the number of participants you usually work with rather than just twice the number.

3. Dust off your statistics books. You really do need to think about what assertions are supported by your sample size.

4. Try to make sure that the differences you are exploring really do seem like differences to your participants.
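Tip 3 can be made concrete with a rough sample-size check for a between-subjects comparison of task success rates. The sketch below uses the standard normal-approximation formula for comparing two proportions, with no continuity correction. This formula is a swapped-in illustration rather than a method the column prescribes:

n per group = (z(1−α/2)·√(2p̄(1−p̄)) + z(1−β)·√(pA(1−pA) + pB(1−pB)))² / (pA − pB)², where p̄ = (pA + pB)/2.

The success rates passed in are hypothetical planning guesses. The function name `participants_per_group` is made up.

```python
import math
from statistics import NormalDist


def participants_per_group(p_a, p_b, alpha=0.05, power=0.8):
    """Approximate participants needed in EACH group of a between-subjects
    test to detect a difference between success rates p_a and p_b
    (two-sided test, normal approximation, no continuity correction)."""
    if p_a == p_b:
        raise ValueError("p_a and p_b must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p_a + p_b) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_a * (1 - p_a) + p_b * (1 - p_b))) ** 2
    # Round up: you cannot recruit a fraction of a participant.
    return math.ceil(numerator / (p_a - p_b) ** 2)


if __name__ == "__main__":
    print(participants_per_group(0.6, 0.8))
```

For example, `participants_per_group(0.6, 0.8)` returns 82. Detecting a rise in task success from 60% to 80%, at the conventional 5% significance level and 80% power, needs about 82 participants per design. That is far more than a typical usability round.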

If you have any comments or suggestions about this article then please contact Caroline at:

Caroline.Jarrett@Effortmark.co.uk

Caroline Jarrett
www.effortmark.co.uk

 


Associated link:
Effortmark



UsabilityNews.com (version 1.41), along with its associated web site and content,
are all strictly © Copyright of the BCS Interaction 2001-2009. All rights reserved.

Joanna Bawa (editor), Dave Clarke (founder, designer and developer). Ian Parry (graphics).