Guest Column | July 21, 2021

Selecting The Best Human Factors Method For Your Medical Or Drug Delivery Device Evaluation

By Natalie Abts, Genentech

Over the past 10 years, the medical device industry has advanced substantially in its thinking about human factors. Although human factors work was often initially seen as a “check box” activity and an additional burden on developers, many companies have learned the value not only of meeting regulatory human factors requirements but also of pursuing activities that go beyond the minimum needs. A robust human factors process has become critical to product optimization and market competitiveness.

However, not all companies have the resources, funding, or time to conduct iterative, robust user studies with large numbers of participants. Fortunately, there is a wide array of possibilities when it comes to human factors testing. Not only are there many options for smaller-scale activities that require less time and fewer resources, but these activities are often better choices than studies that follow the standard methods required for human factors validation. Companies can customize their human factors program based on the needs of the product, including the stage of development, prototype fidelity, and the type of product or specific use issues under investigation.

Though there are many human factors tools and methods, there are three major categories of evaluation that can be considered: subject matter expert (SME) evaluation, remote user testing, and in-person simulated use testing. In this context, “user testing” generally refers to a series of study sessions in which a participant performs tasks and answers questions while a moderator observes. Each method has its benefits, depending on the specific product needs.

1. Subject Matter Expert Evaluation

Human factors SME evaluations typically take the form of a heuristic evaluation. This technique is used to assess compliance of a product with regard to human factors design principles and a predetermined set of heuristics (Nielsen’s 10 heuristics [1] and the Nielsen-Shneiderman heuristics for medical devices [2] are commonly used sets). Typically, three or more human factors SMEs apply the heuristics to a design, identify heuristic violations, and assess the severity of violations to make recommendations for improvement. Though many heuristic sets are created with software evaluation in mind, they can typically be adapted to evaluate non-software products, accessories, written materials, or other elements of the device user interface.
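
To make the workflow concrete, here is a minimal sketch of how the findings from several SMEs might be pooled and ranked. Everything in it is an assumption for illustration: the `Finding` record, the `summarize` function, the 0-to-4 severity scale, the evaluator labels, and the example issues are hypothetical, and ranking by mean severity is only one of several reasonable ways to prioritize.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Finding:
    """One heuristic violation reported by one evaluator (hypothetical record)."""
    evaluator: str
    heuristic: str
    issue: str
    severity: int  # assumed scale: 0 (not a problem) to 4 (most severe)


def summarize(findings):
    """Group findings by (heuristic, issue) and rank them by mean severity."""
    grouped = defaultdict(list)
    for f in findings:
        grouped[(f.heuristic, f.issue)].append(f)

    summary = []
    for (heuristic, issue), group in grouped.items():
        summary.append({
            "heuristic": heuristic,
            "issue": issue,
            # Number of distinct SMEs who reported this issue.
            "evaluators": len({f.evaluator for f in group}),
            "mean_severity": mean(f.severity for f in group),
        })

    # Most severe first; ties broken by how many SMEs reported the issue.
    summary.sort(key=lambda r: (-r["mean_severity"], -r["evaluators"], r["issue"]))
    return summary


# Hypothetical ratings from three SMEs reviewing an injector prototype.
findings = [
    Finding("SME-A", "Error prevention", "Dose dial turns past maximum", 4),
    Finding("SME-B", "Error prevention", "Dose dial turns past maximum", 3),
    Finding("SME-C", "Error prevention", "Dose dial turns past maximum", 2),
    Finding("SME-A", "Consistency", "Two labels used for the same button", 1),
    Finding("SME-C", "Consistency", "Two labels used for the same button", 2),
]

summary = summarize(findings)
for row in summary:
    print(row)
```

A ranked list like this is a starting point for the SMEs' discussion of recommendations, not a substitute for it; a team might reasonably weight a single high-severity rating more heavily than the average suggests.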

Heuristic evaluations have many benefits, especially when used early in the design process. In fact, executing heuristic evaluations early is a great way to identify and mitigate potential usability problems before going through the time and expense of setting up formal user studies. Seasoned professionals who understand human behavior and cognition can often quickly identify design flaws likely to facilitate use errors. Heuristic evaluations can also be executed on low-fidelity prototypes such as photos, animations depicting device operation, or static screenshots from software interfaces, so issues can be identified before designs are locked down and become too costly to change.

One thing to keep in mind when deciding to execute a heuristic evaluation is its predictive nature: the findings are expert predictions of use problems rather than observations of actual users. For complex devices and use issues, the solutions are not always straightforward, and additional testing is often needed to confirm the viability of design changes. Additionally, some issues may require specific end user input to identify (e.g., whether the language used in labeling matches a clinician’s mental model). Therefore, if you only have time to execute one formative stage activity, a user test is more appropriate.

2. Remote User Testing

Although remote user testing has received increased attention in the past year due to the COVID-19 pandemic and the need to reduce in-person activities, this has long been a technique utilized in the human factors industry. In these situations, users can either be shipped materials to use during study sessions or view a product presented by the study moderator or in some other format on the computer screen. Costs can be drastically reduced by alleviating the need to book research facilities, provide extra participant compensation to cover travel costs, and incur other expenses for research personnel. Flexibility in scheduling is also optimized, and the ease of participation and rescheduling for users can lead to lower dropout rates.

However, many researchers have experienced the downsides of remote testing, such as the higher potential for user distraction, technical difficulties, and limited camera angles leading to reduced data fidelity. Remote interaction also decreases the possibility that the study moderator can stop an unsafe act from occurring (e.g., a participant is about to experience a needle stick). Thus, barring recent pandemic safety-related concerns, remote testing should be used thoughtfully in situations in which the fidelity of the desired data is similar to what can be achieved in person, or when supplemental evaluations are planned.

Product components such as software and instructional materials are particularly conducive to remote testing since there are minimal differences in how users interact with the product. Physical devices can also be evaluated remotely but are often best suited to early-stage research with low-fidelity prototypes. In these instances, there is not typically an expectation that a user could operate the product in a manner consistent with real-life use, so the lack of realism in the use environment and the need to perform tasks within camera view have less impact on the end result. Products that are close to their final design and are fully operational can also be evaluated effectively with the right logistical considerations, though this is not recommended for validation stage studies.

The most difficult prototypes to evaluate remotely are those that lie somewhere in the middle. Partially functional prototypes require the user to combine performing tasks, talking through tasks that cannot be fully executed, and receiving instructions from the moderator. In a remote setting, this can divide attention and confuse participants, and it can make it more difficult for researchers to capture all desired data points. Partially developed prototypes are easier to assess in person, where the study moderator has greater ability to control the activities of the session.

3. In-Person Simulated Use Testing

In-person user testing is the best-known form of human factors evaluation, and likely what comes to mind when most people picture a usability study. Participants complete tasks in a research facility or other simulated use environment while a moderator observes, and other personnel often observe remotely or from behind a one-way mirror. This is the most robust method of evaluation and is consistent with how the majority of human factors validation studies are executed. Activities can be more closely controlled and monitored than with remote testing, confounding variables are minimized, and the simulated use environment can be customized to meet a variety of needs.

In-person simulated use testing also has advantages in the versatility of what can be evaluated. Whether assessing wireframes, written materials, products of varying degrees of fidelity, or a combination of product elements, in-person evaluations can be easily adapted to accommodate needs. Though it is possible to do this remotely, in-person testing is the more suitable method as sessions become more complex. Testing in a research or laboratory setting also provides options for setting up stations in multiple areas or rooms where different types of activities can be conducted. If decay periods (i.e., breaks between activities) are incorporated, there are more options for controlling what the participants are exposed to during that time, and comfortable break areas away from the data collection space can be utilized.

Though in-person simulated use studies most closely approximate realistic use, they are also the highest-cost and most complex activities to plan and execute. Recruiting can be time-consuming, and researchers should plan on engaging extra participants in anticipation of higher dropout rates. Thus, this may not be the best method when evaluating a limited set of device characteristics or components that require only short data collection sessions. In these instances, remote sessions may be more practical and efficient.
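
As a rough illustration of planning for dropouts, the sketch below estimates how many participants to recruit in order to finish with a target number of completed sessions. The function name `participants_to_recruit`, the target of 15 sessions, and the 10% and 20% dropout figures are hypothetical placeholders rather than recommendations; actual rates depend on the population and study logistics.

```python
def participants_to_recruit(target_completed: int, dropout_percent: int) -> int:
    """Return the number of recruits needed so that, at the expected dropout
    rate, about `target_completed` participants finish their sessions.

    Uses integer arithmetic (ceiling division) to avoid floating-point rounding.
    """
    if target_completed < 0:
        raise ValueError("target_completed must be non-negative")
    if not 0 <= dropout_percent < 100:
        raise ValueError("dropout_percent must be between 0 and 99")
    retained_percent = 100 - dropout_percent
    # Ceiling of target_completed * 100 / retained_percent.
    return -(-target_completed * 100 // retained_percent)


# Hypothetical comparison: the same target with two assumed dropout rates.
remote_recruits = participants_to_recruit(15, 10)      # assumed remote rate
in_person_recruits = participants_to_recruit(15, 20)   # assumed in-person rate
print(remote_recruits, in_person_recruits)
```

The estimate only sizes the recruiting effort; it says nothing about how many completed sessions a given study actually needs.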

User Selection

Considering the makeup of your study participant cohort is an additional way to customize studies based on immediate needs. Though participants in medical device studies are typically the target device users, it is not always necessary to seek out fully representative participants, especially if they have a very specific combination of characteristics that can make recruiting difficult.

In some instances, participants can come from inside your company. Though internal participants may have some biases (which are less problematic in large companies developing multiple products), there are advantages when looking to gather feedback quickly and at low cost. This is especially applicable for products used by a patient population in which any lay person can represent a naïve user who may be prescribed use of a device following a diagnosis. It is important, however, that internal participants are not directly involved with product design and development, as they are likely to be too familiar with the product to act as naïve participants. Additionally, if there are issues under investigation that are safety-critical or require a specific perspective to understand (e.g., that of a specialty healthcare provider), this type of investigation should only be used to gather preliminary feedback rather than as a final form of testing.

The standard approach of recruiting external participants also has some flexibility regarding use of the specific target population versus surrogate users. The most common use of surrogates in the medical device space is to substitute healthy volunteers for users with specific medical conditions when evaluating patient-facing devices. Much like internal company-employed participants, external healthy volunteers can also fill the role of a general naïve user. This tactic is especially applicable if the product is novel (i.e., no user would have previous experience with similar devices) or if specialized knowledge, such as understanding terminology related to a specific medical condition, would not be expected to impact device use.

When using surrogates, certain types of use issues may be more difficult to detect during testing. For example, patients who have previous experience using other medical devices to treat their condition can experience negative transfer, in which a user applies their current knowledge to a new situation for which it may not be applicable (e.g., a user attempts to perform an injection without removing the cap on a syringe because they assume the needle will project through the cap like an autoinjector). Recruiting surrogates with previous device experience is an option, but the target end users are the most appropriate participants when seeking to understand the unique experiences of the specific user population.

Though recruiting the target users for a human factors study is always a reliable choice, having multiple options for your recruits can alleviate some common constraints and, for very specialized users, help developers to avoid “using up” participants for a validation study, as those who have participated in formative tests generally cannot take part in validation due to their previous familiarization.


In-person simulated use testing with the target device end users is the gold standard for medical device human factors evaluations, and companies generally cannot go wrong with doing these types of studies. However, developers can thoughtfully consider different human factors methods to meet their needs. Customizing activities can help companies work within their constraints and design studies that yield the most useful data at the right time.


  1. Nielsen, J. 10 usability heuristics for user interface design.
  2. Zhang, J., Johnson, T.R., Patel, V.L., Paige, D.L., and Kubose, T. (2003). Using usability heuristics to evaluate patient safety of medical devices. Journal of Biomedical Informatics, 36, 23-30.

About The Author:

Natalie Abts is the head of human factors engineering at Genentech, where she manages a team of engineers conducting human factors assessments for drug delivery devices. Before joining Genentech, she worked as a consultant providing advice on human factors considerations for medical products. Abts has specialized experience in planning and executing both formative stage usability evaluations and validation studies for medical devices and combination products on the FDA approval pathway. She holds a master’s degree in industrial engineering, with a focus on human factors and ergonomics, from the University of Wisconsin, where she was mentored by Dr. Ben-Tzion Karsh.