Enhancing UX Insights: Combining Usability Questionnaires with Hands-On Testing

November 17, 2024

Integrating standardised usability questionnaires with hands-on usability testing offers a powerful method for evaluating and improving user experience. This approach blends qualitative and quantitative data, providing both a comprehensive overview of user satisfaction and a deeper understanding of specific pain points. In this article, we explore a practical example of how this combined method can yield actionable insights, streamline usability evaluations, and support data-driven UX decisions for enhanced product performance.

What I did: Measuring UX in a B2B context requires focusing on metrics that reflect the usability of the product, its impact on productivity, user satisfaction and ease of integration with other systems.  

Methods: Usability tests combined with standardised questionnaires.  

Why: To compare previous and new versions of the product and to identify industry evaluation standards.  

Role: User research lead

Result: I would recommend this combination of quantitative and qualitative methods.

Why are they useful?  

Why use standardised usability questionnaires, and how? Below is an overview of three usability questionnaires and some interesting results. All three are internationally recognised, offer good comparability and have high Cronbach's alpha values. Used regularly, they make the development of a product's usability measurable over time.

SUS

The SUS - System Usability Scale (10 items) was developed by John Brooke (1986) and is the most widely used questionnaire for measuring perceived usability. Its main advantages are its high reliability (alpha = 0.91) and that this reliability is largely independent of sample size: it can be used with as few as 12 participants. Each item uses a 5-point scale from 1 'strongly disagree' to 5 'strongly agree', and the SUS score is a single number between 0 and 100.

 

A SUS score above 68 is considered above average; anything below 68 is below average. The SUS focuses on overall usability and is technology-independent: it has been tested on hardware, consumer software, websites and mobile phones.

Items:  

  1. I think that I would like to use this system frequently.
  2. I found the system unnecessarily complex.
  3. I thought the system was easy to use.
  4. I think that I would need the support of a technical person to be able to use this system.
  5. I found the various functions in this system were well integrated.
  6. I thought there was too much inconsistency in this system.
  7. I would imagine that most people would learn to use this system very quickly.
  8. I found the system very cumbersome to use.
  9. I felt very confident using the system.
  10. I needed to learn a lot of things before I could get going with this system.
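The standard SUS scoring rule (odd items contribute the response minus 1, even items contribute 5 minus the response, and the sum is scaled by 2.5) can be sketched in Python. The example responses below are made up for illustration:

```python
def sus_score(responses):
    """Compute the SUS score from ten responses on the 1-5 scale, in item order.

    Odd-numbered items are positively worded (score = response - 1);
    even-numbered items are negatively worded (score = 5 - response).
    The summed contributions are multiplied by 2.5 to give a 0-100 score.
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    total = 0
    for item, r in enumerate(responses, start=1):
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5

# A participant who strongly agrees with every positive item and strongly
# disagrees with every negative item gets the maximum score:
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
```

Note that a neutral answer (3) on every item yields 50, below the 68 average mentioned above, which is why raw SUS numbers should not be read as percentages.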

PSSUQ

The PSSUQ - Post Study System Usability Questionnaire (19 items) is the second most widely used and cited questionnaire. It covers system usefulness, information quality and interface quality and was developed by Jim Lewis (1995).

Similar to the SUS, it provides an overall score, here by averaging all 19 items, but it also breaks down into three sub-factors. The PSSUQ is effective with small sample sizes (around 12 participants) and uses a 7-point Likert scale plus a 'no answer' option, with lower scores being better. Its reliability is also high, similar to the SUS, with alpha = 0.96. It correlates significantly with task-based measures and is specifically designed for scenario-based usability studies.

 

How to calculate it:  

  • OVERALL – Overall user satisfaction with the system – the average of items 1-19
  • SYSUSE – System usefulness – the average of items 1-8
  • INFOQUAL – Information quality – the average of items 9-15
  • INTERQUAL – Interface quality – the average of items 16-18

The published norm for the overall score is 2.82; since lower is better, results between 1 and 2.82 can be considered good.
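The subscale averages above can be sketched as a small Python helper. This is a sketch under the assumption that responses are collected as a dict keyed by item number (1-19); items answered with the 'no answer' option are simply omitted from the dict and skipped in the averages:

```python
def pssuq_scores(responses):
    """Compute PSSUQ (19-item version) subscale averages; lower is better.

    responses: dict mapping item number (1-19) to a rating on the 1-7 scale.
    Unanswered items ('no answer') are omitted and excluded from the averages.
    """
    def avg(lo, hi):
        vals = [responses[i] for i in range(lo, hi + 1) if i in responses]
        return sum(vals) / len(vals) if vals else None

    return {
        "OVERALL": avg(1, 19),   # overall satisfaction, all items
        "SYSUSE": avg(1, 8),     # system usefulness
        "INFOQUAL": avg(9, 15),  # information quality
        "INTERQUAL": avg(16, 18) # interface quality
    }

# Illustrative responses only: a participant rating every item 2
scores = pssuq_scores({i: 2 for i in range(1, 20)})
print(scores["OVERALL"])  # 2.0 - below the 2.82 norm, i.e. a good result
```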

The PSSUQ is technology-agnostic, meaning it can be applied to any interface - hardware, software, mobile apps or websites. It also works across sample sizes, handling both large samples (more than 100 participants) and small ones (fewer than 15).

ISO 9241-110

ISO 9241-110 (21 items) by Jochen Prümper (1993) is built around the seven interaction principles. It can be used with small sample sizes (around 13 participants) and uses a 7-point Likert scale with text anchors at both ends. It is internationally accepted, with high reliability (Cronbach's alpha and retest reliability) and tested validity.

7 interaction principles:

  • Suitability for the task / Aufgabenangemessenheit
  • Self-descriptiveness / Selbstbeschreibungsfähigkeit
  • Controllability / Steuerbarkeit (new in 2020: User engagement / Benutzerbindung)
  • Conformity with user expectations / Erwartungskonformität
  • Error tolerance / Fehlertoleranz (new in 2020: Robustness against use errors / Robustheit gegen Benutzungsfehler)
  • Suitability for individualization / Individualisierbarkeit
  • Suitability for learning / Lernförderlichkeit (new in 2020: Learnability / Erlernbarkeit)

Standardised questionnaires combined with usability testing

One of the biggest benefits is the ability to get feedback on how usability is developing over time. If you do usability testing on a regular basis, this helps you to see if the product is moving in the right direction.  

Using the three questionnaires alongside a usability test gave me quantitative data to complement the qualitative interview data, which helped a lot. I can only recommend trying it yourself. As for which of the three questionnaires to use - I cannot answer that yet. I would start with two or three of them, compare the results over time, get a feel for them and then consider adjustments.

My participants were very open to filling in the questionnaires. I scheduled a 90-minute slot: about 60 minutes for the qualitative tasks, with the remaining 30 minutes as buffer time and time to fill in the questionnaires. This worked quite well, so as a UX researcher I am eager to try it again.

 

Tip: Pay attention to how you have coded the scales and how you have to interpret the results.

Interpretation

I got some quite positive feedback, but it was not easy to interpret on its own, so it helps to find comparative data online.

The following graph helped me get a feel for where my result stands and what the next steps should be. In general, you are never done with UX research after one round: you have to adapt, run interviews again and derive suggestions for improvement.

Source: https://measuringu.com/interpret-sus-score/
