Evaluation

Dr Charles Martin

Announcements

Plan for the class

Evaluation

Sharp et al. 2019 Textbook: Chapters 14-16

What is evaluation?

  • Evaluation: collecting and analysing data from user experiences with an artefact

  • Goal: to improve the artefact’s design.

  • Addresses:

    • functionality
    • usability
    • user experience
  • Appropriate for all different kinds of artefacts and prototypes

  • Methods vary according to goals.

Evaluating iPad apps in 2013.

Why is evaluation important?

  • Understanding people
    • Users may not have the same experiences or perspectives as you do
    • Different users use software differently
  • Understanding designs
    • Proof that ideas work
    • Understand limitations, affordances, applications
  • Business
    • Invest in the right ideas
    • Find problems to solve (before production, before next iteration, etc.)
  • Research
    • Evidence for new interactive systems
    • Empirical proof of hypotheses
    • New knowledge to answer research questions

What should you evaluate/measure?

Does the design do what the users need and want?

Examples:

  • Game App Developers: Whether young adults find their game fun and engaging compared to other games
  • Government authority: Whether their online service is accessible to users with a disability
  • Children’s talking toy designers: Whether six-year-olds enjoy the voice and feel of the soft toy, and can use it safely
Preece in Raffaele et al. (2016)

Usability and Usability Goals

Six usability goals:

  • Effective to use (effectiveness)
  • Efficient to use (efficiency)
  • Safe to use (safety)
  • Having good utility (utility)
  • Easy to learn (learnability)
  • Easy to remember how to use (memorability)
Image: dtravisphd on Unsplash

Where should you evaluate your design?

Depends on your evaluation goal!

  • Lab studies (controlled settings)
  • In-the-wild studies (natural settings)
  • Remote studies (online behaviour)
Image: Unsplash, UX Indonesia

Activity: Evaluating an interactive toy

You’re all HCI researchers and we need to evaluate this interactive toy.

We need to choose:

  • how will we evaluate the toy?
  • in what environment?
  • what information do we need and why?
  • what research questions are being asked?

Talk for 2-3 minutes and then we will hear some answers.

Where and why will we evaluate this toy? (Photo by COSMOH LOVE on Unsplash)

When should you evaluate?

Evaluation serves different purposes at different stages of the design process

  • Formative evaluation:
    • Assessing whether a product continues to meet users’ needs during a design process
    • Early or late stages
  • Summative evaluation:
    • Assessing whether a finished product is successful
    • Feeds into an iterative design process
Formative vs Summative Evaluation https://www.youtube.com/watch?v=730UiP7dZeo

Types of Evaluation

Controlled settings (e.g., Usability testing)

Image Source: Usability Testing (interactiondesign.org)

Usability Testing

  • Measures: Can involve numbers and time (e.g., number of tasks completed, number of errors made, time taken to complete a task)
  • Methods: Can involve a mixture of methods, e.g., think-aloud, observation, interviews, questionnaires, data logging and analytics (a sketch of summarising logged measures follows this list)
  • Data: Can collect a variety of data depending on the methods used (e.g., video, audio, facial expressions, key presses, verbal feedback)
  • Settings: Usability lab + observation room vs mobile usability kit
  • Number of participants: 5-12 as a baseline, but more is better
  • Read the textbook for other kinds of experimental design
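
To make the quantitative measures above concrete, here is a minimal Python sketch (not from the textbook or lecture) that summarises hypothetical usability-test logs. The file name and column names (participant, task, completed, errors, seconds) are assumptions for illustration only.

```python
# Minimal sketch: summarising hypothetical usability-test logs.
# Assumes a CSV with one row per task attempt and columns:
# participant, task, completed (0/1), errors, seconds.
import csv
from statistics import mean

def summarise(path: str) -> dict:
    """Compute common usability-test measures from logged task attempts."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    completed = [int(r["completed"]) for r in rows]
    errors = [int(r["errors"]) for r in rows]
    seconds = [float(r["seconds"]) for r in rows]
    return {
        "attempts": len(rows),
        "completion_rate": mean(completed),   # effectiveness
        "mean_errors": mean(errors),          # errors per attempt
        "mean_time_s": mean(seconds),         # efficiency
    }

if __name__ == "__main__":
    print(summarise("usability_log.csv"))
```

In practice these numbers sit alongside the qualitative data (think-aloud comments, observations, interview responses) rather than replacing it.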

Usability Testing Example

Schaadhardt et al. (2021) Understanding Blind Screen-Reader Users’ Experiences of Digital Artboards

Natural settings (e.g., Field studies)

Goals of field studies:

  • Help identify opportunities for new technology
  • Establish the requirements for a new design
  • Facilitate the introduction of technology or inform deployment of existing technology in new contexts
Source: Ambe et al. (2017)

Field Studies

  • Goals:
    • Understanding how people interact with technologies in “messy worlds”, and how technologies will be integrated into those contexts
    • Studying use of existing technologies and impacts of introducing new ones
  • Methods: Emphasis on qualitative methods rather than statistical measures e.g., Observations, interviews, diaries, interaction logging
  • Duration: No fixed length; can be seconds, months, or years
  • Paying attention to: Use situations, problems/errors, distractions, patterns of behaviours
  • How does your presence and involvement shape engagement? Observation vs participant observation
  • Findings: Used to produce thematic analyses, vignettes, narratives, critical incident analyses, etc.

Field Studies Example

Co-Designing with Orangutans: Enhancing the Design of Enrichment for Animals (Sarah Webber, Marcus Carter, Wally Smith, and Frank Vetere) Proc. DIS ’20 (Webber et al., 2020)
Design objective 1: Develop a digital installation to provide enhanced, varied enrichment for orangutans at Melbourne Zoo

Evaluation by Inspection

Skip the “users”! Just evaluate against established principles (heuristics) and standards.

Expert Evaluation

  • Conducted by designers and design “experts” rather than with end users
  • Inspection methods – expert role plays user
  • Heuristic evaluation: Researchers evaluate whether aspects of the design adhere to established usability principles (see over)
  • Cognitive walkthroughs: Simulating user reasoning and problem solving at each step in an interaction sequence (evidence, availability, accessibility of correct action)
  • Analytics: Understanding user demographics and tracing activities (e.g., number of clicks, duration of sessions etc.)
  • A/B Testing: A large number of users are assigned Design A or Design B, and their use is compared to test a “variable of interest” (e.g., number of clicks on advertising during the test period); a sketch of a simple A/B comparison follows this list
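
As a loose illustration of the A/B testing idea above, here is a minimal Python sketch (not part of the lecture materials) that compares click rates between two designs with a two-proportion z-test. The counts and the “click” measure are hypothetical; a real test would use logged session data and a pre-registered analysis plan.

```python
# Minimal sketch: comparing Design A and Design B on one variable of interest,
# e.g. whether a session included a click on advertising. Counts are made up.
from math import sqrt, erf

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Two-proportion z-test for a difference in click rates between A and B."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)          # pooled proportion
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided
    return p_a, p_b, z, p_value

if __name__ == "__main__":
    p_a, p_b, z, p = two_proportion_z(clicks_a=120, n_a=5000, clicks_b=165, n_b=5000)
    print(f"A: {p_a:.3%}  B: {p_b:.3%}  z = {z:.2f}  p = {p:.4f}")
```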

Heuristic Evaluations of User Interfaces (video)

Using established principles (heuristics) to evaluate (video)

Nielsen’s 10 Usability Heuristics

  1. Visibility of system status: keep the user informed
  2. Match between system and real world: system uses language and communication familiar to the user, information is natural and logical
  3. User control and freedom: users make mistakes, there should be “emergency exits” to cancel and return quickly
  4. Consistency and standards: users should not wonder whether words, situations or actions mean the same thing, follow conventions
  5. Error prevention: eliminate error-prone conditions, or check with user before they occur
  6. Recognition rather than recall: make elements, actions, and options visible
  7. Flexibility and efficiency of use: shortcuts to speed up for experts, allow tailored experiences
  8. Aesthetic and minimalist design: less is more, no unnecessary information
  9. Help users recognise, diagnose and recover from errors: error messages need plain language, and suggest solutions
  10. Help and documentation: best if explanation is not needed; if it is, make it good

Web Design Heuristics

Budd (2007) introduces further heuristics focussed on the web; here are some from the list:

  • Clarity: Make the system as clear, concise and meaningful as possible for the intended audience.
  • Minimise unnecessary complexity and cognitive load: Make the system as simple as possible for people to accomplish their tasks.
  • Provide context: Interfaces should provide people with a sense of context in time and space.
  • Promote a pleasurable and positive experience: People should be treated with respect, and the design should be aesthetically pleasing and promote a pleasurable and rewarding experience.
Evaluating a website. Image: nngroup (link)

Shneiderman’s Eight Golden Rules of Design

  1. Strive for consistency
  2. Seek universal usability
  3. Offer informative feedback
  4. Design dialogs to yield closure
  5. Prevent errors
  6. Permit easy reversal of actions
  7. Keep users in control
  8. Reduce short-term memory load

Analytics: What can you learn?

Evaluation after deployment: adoption, use, and non-use

(Ambe et al. 2017)
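
As one illustration of what post-deployment analytics can show about adoption, use, and non-use, here is a minimal Python sketch (not from Ambe et al. 2017). The recency thresholds and the (user_id, last_active) records are hypothetical; real studies would combine such counts with qualitative accounts of why people stop using a technology.

```python
# Minimal sketch: binning users by recency of activity as a rough proxy
# for adoption, continued use, and non-use. All data below is invented.
from datetime import date
from collections import Counter

def classify(last_active: date, today: date) -> str:
    days = (today - last_active).days
    if days <= 7:
        return "active"      # recent use
    if days <= 90:
        return "lapsing"     # use is tailing off
    return "non-use"         # effectively abandoned

if __name__ == "__main__":
    today = date(2024, 5, 1)
    records = [("u1", date(2024, 4, 29)), ("u2", date(2024, 2, 1)), ("u3", date(2023, 8, 10))]
    print(Counter(classify(d, today) for _, d in records))
```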

Planning Evaluations

Issues during evaluation

  • Ethical dimensions and consent
  • Evaluation design and conduct:
    • Reliability: “how well it produces the same results on separate occasions under the same circumstances”
    • Validity: “whether the evaluation method measures what it intended to measure”
    • Ecological validity: “how the environment in which an evaluation is conducted influences or distorts results”
    • Bias: “occurs when the results are distorted”
    • Scope: “how much of the findings can be generalised”

Developing an evaluation plan

  • Evaluation Goal/Aims
  • Participants
  • Setting
  • Data to collect
  • Methods
  • Ethical Considerations/Consent Process
  • Data capture/recording/storage
  • Analysis method
  • Output(s) of evaluation process

Questions

Who has a question?

  • I can take catchbox questions up until 2:55
  • For after-class questions: meet me outside the classroom at the bar (for 30 minutes)
  • Feel free to ask about any aspect of the course
  • Also feel free to ask about any aspect of computing at ANU! I may not be able to help, but I can listen.
Meet you at the bar for questions. 🍸🥤🫖☕️ Unfortunately no drinks served! 🙃

References

Ambe, A. H., Brereton, M., Soro, A., & Roe, P. (2017). Technology individuation: The foibles of augmented everyday objects. Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 6632–6644. https://doi.org/10.1145/3025453.3025770
Budd, A. (2007). Heuristics for modern web application development. https://andybudd.com/archives/2007/01/heuristics_for_modern_web_application_de
Davis, F. D. (1989). Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly, 13(3), 319–340. https://doi.org/10.2307/249008
Ehn, P. (2008). Participation in design things. Proceedings of the Tenth Anniversary Conference on Participatory Design 2008, 92–101.
Raffaele, R., Carvalho, B., Lins, A., Marques, L., & Soares, M. M. (2016). Digital game for teaching and learning: An analysis of usability and experience of educational games. In A. Marcus (Ed.), Design, user experience, and usability: Novel user experiences (pp. 303–310). Springer International Publishing.
Satchell, C., & Dourish, P. (2009). Beyond the user: Use and non-use in HCI. Proceedings of the 21st Annual Conference of the Australian Computer-Human Interaction Special Interest Group: Design: Open 24/7, 9–16. https://doi.org/10.1145/1738826.1738829
Schaadhardt, A., Hiniker, A., & Wobbrock, J. O. (2021). Understanding blind screen-reader users’ experiences of digital artboards. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3411764.3445242
Soro, A., Brereton, M., & Roe, P. (2016). Towards an analysis framework of technology habituation by older users. Proceedings of the 2016 ACM Conference on Designing Interactive Systems, 1021–1033. https://doi.org/10.1145/2901790.2901806
Webber, S., Carter, M., Smith, W., & Vetere, F. (2020). Co-designing with orangutans: Enhancing the design of enrichment for animals. Proceedings of the 2020 ACM Designing Interactive Systems Conference, 1713–1725. https://doi.org/10.1145/3357236.3395559