UsabilityNews.com - for all the latest in usability and human-computer interaction
BCS Interaction
 
 

Caroline's Corner: Statistically significant Usability Testing


Source: Effortmark, 4 June 2009
Submitted by Caroline Jarrett

It was an intriguing question: "How do I find out about statistically significant usability testing?". I'm sure it's one that you've encountered, and maybe your reaction was the same as mine: "That's the wrong question". Then I realised that if someone asks a question, it probably means that they want to get an answer to it. So here we go.

STATISTICAL SIGNIFICANCE
I'm assuming that we all know what we mean by 'usability testing', so let's start with the idea of 'statistical significance'. Suppose your product is a web site, and your target is that a typical user can complete a typical task - let's make it finding the telephone number for Customer Services - within 30 seconds. You run a usability test, and find that the mean task time is 35 seconds. Then you make some changes, run another usability test, and find that the mean task time is now 29 seconds. Result: much rejoicing. Or maybe not: did the improvement come purely from the natural variability in the samples, or from the changes that you made?

That's where 'statistical significance' comes in. If you take two different random samples, measure something about them, compare the measurements, and find a difference, then statistical tests of significance will help you to work out whether that difference could have come from the natural variation you get between any two random samples, or whether the two samples were genuinely different for some underlying reason.

For example, suppose I wanted to know the typical number of years of experience of a usability professional. I could go to the next UPA meeting, ask the first ten people to arrive "how long have you been a usability professional", and calculate the mean of their answers. Then, being meticulous, I might decide to go to an extra meeting and do the same thing. I'd probably get a difference, and I could use a significance check to find out whether that difference arose from chance.
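
One such check can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from any of the sources mentioned in this article: it uses an exact permutation test, and the two lists of task times are invented so that their means match the 35-second and 29-second figures above. The test pools all the times, tries every possible way of splitting them into two groups of the original sizes, and reports how often a split gives a difference in means at least as large as the one actually observed.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(sample_a, sample_b):
    """Two-sided exact permutation test on the difference of two means."""
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(sample_a) - mean(sample_b))

    extreme = 0
    splits = 0
    # Try every way of choosing which pooled values form the first group.
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance so floating-point rounding does not drop ties.
        if diff >= observed - 1e-9:
            extreme += 1
        splits += 1
    return extreme / splits


# Invented task times in seconds (assumption: six participants per test).
before = [28, 41, 33, 37, 30, 41]  # mean 35
after = [25, 33, 27, 31, 24, 34]   # mean 29

p_value = permutation_p_value(before, after)
print(f"Difference in means: {mean(before) - mean(after)} seconds")
print(f"Share of splits at least this extreme: {p_value:.3f}")
```

A small result suggests that chance alone rarely produces a difference this big; a large one means the 'improvement' could easily be noise. Trying every split is only practical for small samples like these, and the check assumes that the participants in each test were a random sample in the first place.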

OTHER STATISTICAL ERRORS
Maybe you're thinking: "hmm, I wouldn't try to find out typical years of experience in that way"? I agree: my strategy is replete with other statistical errors. First of all, I've got a sampling error: the sample I chose (first ten people) isn't random and it's quite likely that the first ten people are somewhat different from the rest of the people in some way. Secondly, I've got a non-response error: it's quite possible that usability professionals who go to UPA meetings have a different experience profile to those who do not. What about people with young children? Very experienced professionals who are under the illusion that they know it all and don't go to meetings any more? Students who happened to all be having exams that week and couldn't turn up? It's possible that all these factors might average out, but unlikely. Thirdly, I've almost certainly got a measurement error: some people at the meeting might not be usability professionals, others may have had breaks in their experience. What exactly did I want to find out, how will I use the data that I obtained, and so did I ask the right question?

STATISTICAL ERRORS IN USABILITY TESTING
I think that's why I reacted a bit negatively to the original question. I'm sure all of us start planning our usability tests by thinking first of all about what exactly we want to find out, how we'll use the findings, and making sure that we "ask the right questions" by careful design of our tasks and choice of what parts of the product to test. Statistically speaking, that's all about avoiding measurement error and has nothing to do with statistical significance.

Do I hear you cry "But what about recruiting?". Definitely. We all know that we have to be thoughtful about who we recruit. That's where sampling error and non-response error come in. Whether you go for the cheap-and-cheerful 'hey you' method, throw a lot of money at a complex, stratified sample, or do something in between, and no matter what you spend, it's unlikely that everyone in the target audience has an equal chance of being selected (that's sampling error). And it's very likely that there are some systematic differences between the people who did have a chance of being selected and those who did not: for example, we often have to recruit for specific geographical locations. That's non-response error. So whatever your recruitment strategy, it is absolutely crucial to think about how the users you get differ from the actual target population. They always will.

(Aside: thinking about how the respondents you get differ from the actual target population is a completely standard step in large-scale surveys done by any reputable market research organisation. They call it 'rebalancing the sample'.)
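
One common way to make that kind of adjustment is to weight each group of respondents by its share of the target population. The sketch below is an assumption-laden illustration rather than a description of what any particular research organisation does: the group names, the population shares, and the task times are all invented, and it assumes you already know the true shares from some other source.

```python
from statistics import mean

# Invented results: (group, task time in seconds). Experts are over-represented.
results = [
    ("novice", 40), ("novice", 44),
    ("expert", 25), ("expert", 27), ("expert", 29), ("expert", 23),
]

# Assumed shares of each group in the real target population (must sum to 1).
population_share = {"novice": 0.6, "expert": 0.4}


def rebalanced_mean(results, population_share):
    """Mean of the per-group means, weighted by each group's population share."""
    by_group = {}
    for group, value in results:
        by_group.setdefault(group, []).append(value)
    if set(by_group) != set(population_share):
        raise ValueError("every population group needs at least one participant")
    return sum(population_share[g] * mean(values) for g, values in by_group.items())


raw_mean = mean(value for _, value in results)
weighted_mean = rebalanced_mean(results, population_share)
print(f"Raw mean: {raw_mean:.1f} s, rebalanced mean: {weighted_mean:.1f} s")
```

Here the raw mean flatters the product because most of the participants were experts; weighting by the assumed population shares pulls the estimate back towards the slower novice times.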

We run the test and we find some problems. Did those problems arise from chance, in that our random selection of participants happened to bring us people who made lots of mistakes? Or did the problems arise because the product we tested was riddled with obvious usability defects? Or some mixture? For many of us, it's a no-brainer: our participants are indeed entirely representative of the target audience, and sadly the product is ghastly. We now have to exert our diplomatic skills on tactfully breaking the bad news. There's no question of statistics here, it's all about plain facts - and that's why I reacted badly to the initial question. If that's your experience too, you can skip the 'statistical significance' question and stop reading here.

STATISTICAL SIGNIFICANCE CAN BE USEFUL
But as your product improves, you'll start to get more complex results. Some people have problems, others don't, and then when you look at solving the problem it's actually quite subtle and involves a trade-off that might make the experience better for some but worse for others. Now you're into a place where statistics can really be helpful, allowing you to punch a few numbers into an appropriate calculator or spreadsheet and bingo, get some more numbers to use in the decision-making process.
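
As a sketch of what 'punching a few numbers in' might look like when some participants succeed and others don't, the Python below compares task completion counts for two designs using Fisher's exact test. The choice of test and the counts are my own assumptions for illustration; they are not results from a real study.

```python
from math import comb


def fisher_exact_two_sided(success_a, n_a, success_b, n_b):
    """Two-sided Fisher's exact test for two completion counts."""
    total_success = success_a + success_b
    n = n_a + n_b
    denominator = comb(n, total_success)

    def probability(k):
        # Chance that design A gets exactly k of the successes,
        # if both designs were really equally good.
        return comb(n_a, k) * comb(n_b, total_success - k) / denominator

    lowest = max(0, total_success - n_b)
    highest = min(n_a, total_success)
    observed = probability(success_a)
    # Add up every outcome that is no more likely than the one we saw.
    return sum(
        probability(k)
        for k in range(lowest, highest + 1)
        if probability(k) <= observed * (1 + 1e-9)
    )


# Invented counts: 9 of 12 participants completed the task with design A,
# 4 of 12 with design B.
p_value = fisher_exact_two_sided(9, 12, 4, 12)
print(f"p-value: {p_value:.3f}")
```

The result is one more number to feed into the decision, not the decision itself: it says how surprising the gap between the two designs would be if there were no real difference, and nothing about whether the right people were recruited.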

OK, HOW DO I DO THAT?
And now we're back to the original question: "How do I find out about statistically significant usability testing?" This used to be quite hard to answer, but no longer. Tom Tullis and Bill Albert have obliged with their book "Measuring the user experience: Collecting, analyzing and presenting usability metrics". I admit the title doesn't sound all that enticing and I also admit that it's got lots of rather nerve-wracking entries in the table of contents such as 'Measures of Central Tendency'. Please don't let that put you off. Tackle it gently, a chapter at a time. You'll find that they ease you through with lots of practical advice, even to the point of explaining exactly what to do in Excel.

They have a companion web site, 'Measuring UX', which has presentations and spreadsheets. I admit I was tempted to avoid paying for the book and instead just try to work directly from the web site, but that's a mistake: the book demystifies the spreadsheets and puts them into a context which is definitely worth the price.

Finally, there's Jeff Sauro's site 'Measuring Usability'. This dives straight into the tough stuff, such as calculating a Z-Score.
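
To give a flavour of what a Z-Score involves, here is a minimal Python sketch. It is not taken from that site: the task times are invented, the 30-second target is the one from the example earlier in this article, and treating task times as roughly normally distributed is an assumption that real (often skewed) data may not satisfy.

```python
from statistics import NormalDist, mean, stdev

# Invented task times in seconds from a small test (mean 29).
task_times = [25, 33, 27, 31, 24, 34]
target = 30

# How many standard deviations the target lies above the sample mean.
z_score = (target - mean(task_times)) / stdev(task_times)

# Under the normal assumption, the share of users expected to finish in time.
share_within_target = NormalDist().cdf(z_score)

print(f"Z-score of the target: {z_score:.2f}")  # about 0.24
print(f"Estimated share within target: {share_within_target:.0%}")
```

A Z-Score near zero, as here, means the target sits close to the average time, so only a little over half of users would be expected to meet it even though the mean is under 30 seconds.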

If you have comments or questions about this article, please contact me:

Caroline.Jarrett@Effortmark.co.uk
Caroline Jarrett is a usability consultant specialising in forms and improving web content.
© 2009 Caroline Jarrett, all rights reserved.

 


UsabilityNews.com (version 1.41), along with its associated web site and content,
are all strictly © Copyright of the BCS Interaction 2001-2010. All rights reserved.

Joanna Bawa (editor), Dave Clarke (founder, designer and developer). Ian Parry (graphics).