Evaluation

Dr Charles Martin

Announcements

Plan for the class

Evaluation

Sharp et al. 2019 Textbook: Chapters 14-16

What is evaluation?

  • Evaluation: collecting and analysing data from user experiences with an artefact

  • Goal: to improve the artefact’s design.

  • Addresses:

    • functionality
    • usability
    • user experience
  • Appropriate for all different kinds of artefacts and prototypes

  • Methods vary according to goals.

Evaluating iPad apps in 2013.

Why is evaluation important?

  • Understanding people
    • Users may not have the same experiences or perspectives as you do
    • Different users use software differently
  • Understanding designs
    • Proof that ideas work
    • Understand limitations, affordances, applications
  • Business
    • Invest in the right ideas
    • Find problems to solve (before production, before next iteration, etc.)
  • Research
    • Evidence for new interactive systems
    • Empirical proof of hypotheses
    • New knowledge to answer research questions

What should you evaluate/measure?

Does the design do what the users need and want?

Examples:

  • Game App Developers: Whether young adults find their game fun and engaging compared to other games
  • Government authority: Whether their online service is accessible to users with a disability
  • Children’s talking toy designers: Whether six-year-olds enjoy the voice and feel of the soft toy, and can use it safely
Preece in Raffaele et al. (2016)

Usability and Usability Goals

Six usability goals:

  • Effective to use (effectiveness)
  • Efficient to use (efficiency)
  • Safe to use (safety)
  • Having good utility (utility)
  • Easy to learn (learnability)
  • Easy to remember how to use (memorability)
Image: dtravisphd on Unsplash

Where should you evaluate your design?

Depends on your evaluation goal!

  • Lab studies (controlled settings)
  • In-the-wild studies (natural settings)
  • Remote studies (online behaviour)
Image: Unsplash, UX Indonesia

Activity: Evaluating an interactive toy

You’re all HCI researchers and we need to evaluate this interactive toy.

We need to choose:

  • how will we evaluate the toy?
  • in what environment?
  • what information do we need and why?
  • what research questions are being asked?

Talk for 2-3 minutes and then we will hear some answers.

Where and why will we evaluate this toy? (Photo by COSMOH LOVE on Unsplash)

When should you evaluate?

Evaluation serves different purposes at different stages of the design process

  • Formative evaluation:
    • Assessing whether a product continues to meet users’ needs during a design process
    • Early or late stages
  • Summative evaluation:
    • Assessing whether a finished product is successful
    • Feeds into an iterative design process
Formative vs Summative Evaluation https://www.youtube.com/watch?v=730UiP7dZeo

Types of Evaluation

Controlled settings (e.g., Usability testing)

Image Source: Usability Testing (interactiondesign.org)

Usability Testing

  • Measures: Can involve numbers and time (e.g., number of tasks completed, number of errors made, time taken to complete a task)
  • Methods: Can involve a mixture of methods, e.g., think-aloud, observation, interviews, questionnaires, data logging and analytics (a sketch of summarising logged measures follows this list)
  • Data: Can collect a variety of data depending on the methods used (e.g., video, audio, facial expressions, key presses, verbal feedback)
  • Settings: Usability lab + observation room vs mobile usability kit
  • Number of participants: 5-12 as a baseline, but more is better
  • Read the textbook for other kinds of experimental design
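
To make the quantitative measures above concrete, here is a minimal Python sketch (not from the textbook or lecture) that summarises hypothetical usability-test logs. The file name and column names (participant, task, completed, errors, seconds) are assumptions for illustration only.

```python
# Minimal sketch: summarising hypothetical usability-test logs.
# Assumes a CSV with one row per task attempt and columns:
# participant, task, completed (0/1), errors, seconds.
import csv
from statistics import mean

def summarise(path: str) -> dict:
    """Compute common usability-test measures from logged task attempts."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    completed = [int(r["completed"]) for r in rows]
    errors = [int(r["errors"]) for r in rows]
    seconds = [float(r["seconds"]) for r in rows]
    return {
        "attempts": len(rows),
        "completion_rate": mean(completed),   # effectiveness
        "mean_errors": mean(errors),          # errors per attempt
        "mean_time_s": mean(seconds),         # efficiency
    }

if __name__ == "__main__":
    print(summarise("usability_log.csv"))
```

In practice these numbers sit alongside the qualitative data (think-aloud comments, observations, interview responses) rather than replacing it.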

Usability Testing Example

Schaadhardt et al. (2021) Understanding Blind Screen-Reader Users’ Experiences of Digital Artboards

Natural settings (e.g., Field studies)

Goals of field studies:

  • Help identify opportunities for new technology
  • Establish the requirements for a new design
  • Facilitate the introduction of technology or inform deployment of existing technology in new contexts
Source: Ambe et al. (2017)

Field Studies

  • Goals:
    • Understanding how people interact with technologies in “messy worlds”, and how technologies will be integrated into those contexts
    • Studying use of existing technologies and impacts of introducing new ones
  • Methods: Emphasis on qualitative methods rather than statistical measures e.g., Observations, interviews, diaries, interaction logging
  • Duration: No fixed length; can be seconds, months, or years
  • Paying attention to: Use situations, problems/errors, distractions, patterns of behaviours
  • How does your presence and involvement shape engagement? Observation vs participant observation
  • Findings: Used to produce thematic analyses, vignettes, narratives, critical incident analyses, etc.

Field Studies Example

Co-Designing with Orangutans: Enhancing the Design of Enrichment for Animals (Sarah Webber, Marcus Carter, Wally Smith, and Frank Vetere) Proc. DIS ’20 (Webber et al., 2020)
Design objective 1: Develop a digital installation to provide enhanced, varied enrichment for orangutans at Melbourne Zoo

Evaluation by Inspection

Skip the “users”! Just evaluate against established principles (heuristics) and standards.

Expert Evaluation

  • Conducted by designers and design “experts” rather than with end users
  • Inspection methods – expert role plays user
  • Heuristic evaluation: Researchers evaluate whether aspects of the design adhere to established usability principles (see over)
  • Cognitive walkthroughs: Simulating user reasoning and problem solving at each step in an interaction sequence (evidence, availability, accessibility of correct action)
  • Analytics: Understanding user demographics and tracing activities (e.g., number of clicks, duration of sessions etc.)
  • A/B Testing: A large number of users are assigned Design A or Design B, and their use is compared to test a “variable of interest” (e.g., number of clicks on advertising during the test period); a sketch of a simple A/B comparison follows this list
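
As a loose illustration of the A/B testing idea above, here is a minimal Python sketch (not part of the lecture materials) that compares click rates between two designs with a two-proportion z-test. The counts and the “click” measure are hypothetical; a real test would use logged session data and a pre-registered analysis plan.

```python
# Minimal sketch: comparing Design A and Design B on one variable of interest,
# e.g. whether a session included a click on advertising. Counts are made up.
from math import sqrt, erf

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Two-proportion z-test for a difference in click rates between A and B."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)          # pooled proportion
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided
    return p_a, p_b, z, p_value

if __name__ == "__main__":
    p_a, p_b, z, p = two_proportion_z(clicks_a=120, n_a=5000, clicks_b=165, n_b=5000)
    print(f"A: {p_a:.3%}  B: {p_b:.3%}  z = {z:.2f}  p = {p:.4f}")
```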

Heuristic Evaluations of User Interfaces (video)

Using established principles (heuristics) to evaluate (video)

Nielsen’s 10 Usability Heuristics

  1. Visibility of system status: keep the user informed
  2. Match between system and real world: system uses language and communication familiar to the user, information is natural and logical
  3. User control and freedom: users make mistakes, there should be “emergency exits” to cancel and return quickly
  4. Consistency and standards: users should not wonder whether words, situations or actions mean the same thing, follow conventions
  5. Error prevention: eliminate error-prone conditions, or check with user before they occur
  6. Recognition rather than recall: make elements, actions, and options visible
  7. Flexibility and efficiency of use: shortcuts to speed up for experts, allow tailored experiences
  8. Aesthetic and minimalist design: less is more, no unnecessary information
  9. Help users recognise, diagnose and recover from errors: error messages need plain language, and suggest solutions
  10. Help and documentation: best if explanation is not needed; if it is, make it good

Web Design Heuristics

Budd (2007) introduces further heuristics focussed on the web; here are some from the list:

  • Clarity: Make the system as clear, concise and meaningful as possible for the intended audience.
  • Minimise unnecessary complexity and cognitive load: Make the system as simple as possible for people to accomplish their tasks.
  • Provide context: Interfaces should provide people with a sense of context in time and space.
  • Promote a pleasurable and positive experience: People should be treated with respect, and the design should be aesthetically pleasing and promote a pleasurable and rewarding experience.
Evaluating a website. Image: nngroup (link)

Shneiderman’s Eight Golden Rules of Design

  1. Strive for consistency
  2. Seek universal usability
  3. Offer informative feedback
  4. Design dialogs to yield closure
  5. Prevent errors
  6. Permit easy reversal of actions
  7. Keep users in control
  8. Reduce short-term memory load

Analytics: What can you learn?

Evaluation after deployment: adoption, use, and non-use

(Ambe et al. 2017)
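
As one illustration of what post-deployment analytics can show about adoption, use, and non-use, here is a minimal Python sketch (not from Ambe et al. 2017). The recency thresholds and the (user_id, last_active) records are hypothetical; real studies would combine such counts with qualitative accounts of why people stop using a technology.

```python
# Minimal sketch: binning users by recency of activity as a rough proxy
# for adoption, continued use, and non-use. All data below is invented.
from datetime import date
from collections import Counter

def classify(last_active: date, today: date) -> str:
    days = (today - last_active).days
    if days <= 7:
        return "active"      # recent use
    if days <= 90:
        return "lapsing"     # use is tailing off
    return "non-use"         # effectively abandoned

if __name__ == "__main__":
    today = date(2024, 5, 1)
    records = [("u1", date(2024, 4, 29)), ("u2", date(2024, 2, 1)), ("u3", date(2023, 8, 10))]
    print(Counter(classify(d, today) for _, d in records))
```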

Planning Evaluations

Issues during evaluation

  • Ethical dimensions and consent
  • Evaluation design and conduct:
    • Reliability: “how well it produces the same results on separate occasions under the same circumstances”
    • Validity: “whether the evaluation method measures what it intended to measure”
    • Ecological validity: “how the environment in which an evaluation is conducted influences or distorts results”
    • Bias: “occurs when the results are distorted”
    • Scope: “how much of the findings can be generalised”

Developing an evaluation plan

  • Evaluation Goal/Aims
  • Participants
  • Setting
  • Data to collect
  • Methods
  • Ethical Considerations/Consent Process
  • Data capture/recording/storage
  • Analysis method
  • Output(s) of evaluation process

Questions

Who has a question?

  • I can take catchbox questions up until 2:55
  • For after-class questions: meet me outside the classroom at the bar (for 30 minutes)
  • Feel free to ask about any aspect of the course
  • Also feel free to ask about any aspect of computing at ANU! I may not be able to help, but I can listen.
Meet you at the bar for questions. 🍸🥤🫖☕️ Unfortunately no drinks served! 🙃

References

Ambe, A. H., Brereton, M., Soro, A., & Roe, P. (2017). Technology individuation: The foibles of augmented everyday objects. Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 6632–6644. https://doi.org/10.1145/3025453.3025770
Budd, A. (2007). Heuristics for modern web application development. https://andybudd.com/archives/2007/01/heuristics_for_modern_web_application_de
Davis, F. D. (1989). Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly, 13(3), 319–340. https://doi.org/10.2307/249008
Ehn, P. (2008). Participation in design things. Proceedings of the Tenth Anniversary Conference on Participatory Design 2008, 92–101.
Raffaele, R., Carvalho, B., Lins, A., Marques, L., & Soares, M. M. (2016). Digital game for teaching and learning: An analysis of usability and experience of educational games. In A. Marcus (Ed.), Design, user experience, and usability: Novel user experiences (pp. 303–310). Springer International Publishing.
Satchell, C., & Dourish, P. (2009). Beyond the user: Use and non-use in HCI. Proceedings of the 21st Annual Conference of the Australian Computer-Human Interaction Special Interest Group: Design: Open 24/7, 9–16. https://doi.org/10.1145/1738826.1738829
Schaadhardt, A., Hiniker, A., & Wobbrock, J. O. (2021). Understanding blind screen-reader users’ experiences of digital artboards. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3411764.3445242
Soro, A., Brereton, M., & Roe, P. (2016). Towards an analysis framework of technology habituation by older users. Proceedings of the 2016 ACM Conference on Designing Interactive Systems, 1021–1033. https://doi.org/10.1145/2901790.2901806
Webber, S., Carter, M., Smith, W., & Vetere, F. (2020). Co-designing with orangutans: Enhancing the design of enrichment for animals. Proceedings of the 2020 ACM Designing Interactive Systems Conference, 1713–1725. https://doi.org/10.1145/3357236.3395559