1 Introduction

Many gameful design methods have recently emerged as part of the user experience (UX) design toolkit. They aim to augment and improve the UX of interactive systems with gamification—defined as using game design elements in non-game contexts [1]. Even though these tools have been increasingly adopted during the design phase of software projects, designers still lack standard evaluation methods. There are no guidelines for experts (i.e., people with background knowledge in UX) to evaluate a gameful implementation early on in a project.

For usability evaluation, two standard approaches exist. First, the gold standard is a usability test: UX researchers can run either a formative test (where they typically sit close to the participant and observe their behaviour) or a summative one (where they are present locally or virtually while the participant works through an assigned task or scenario and outcome measures are recorded). Second, a heuristic evaluation (also called a usability inspection) is cheaper and easier to set up, and can be conducted before planning an expensive usability test. Heuristic evaluations allow experts to assess a design against a set of principles or guidelines (i.e., heuristics). They are fast and inexpensive methods for identifying and addressing design issues.

These expert guidelines date back to the early days of software design (e.g., Smith and Mosier [2]) and have over the past decades improved how we develop software and interactive applications. In the established areas of UX, heuristic evaluation or inspection methods [3, 4] are commonly used as evaluation tools during the project design and implementation phases. These are not meant to replace user testing, but rather complement the set of evaluation tools. While it has become more common to conduct user tests with gamified applications (just as games user researchers have done in the video game industry), the domain is still lacking robust methodologies for evaluating gameful designs.

The benefit of using a gamification inspection method is that it allows rapid and early evaluation of a gameful design. While several studies have investigated the effectiveness of gameful applications by studying their users [5], such user tests can only be conducted after a prototype has been implemented. Although concerns have been voiced that heuristic evaluation can be influenced by subjective interpretations [6], it remains a valuable tool for practitioners, who operate under tighter time constraints than researchers. Heuristic evaluation also lets researchers focus subsequent user tests more precisely, since the most basic issues will already have been discovered by that point.

While UX tests focus on identifying issues related to usability, ergonomics, cognitive load, and affective experiences, gamification is concerned with understanding and fostering the user’s motivation to use a product, system, or service. Thus, gamification methods rely on motivational psychology research, such as self-determination theory (SDT) [7,8,9,10], to understand human motivation. Our heuristics were informed by this theoretical framework.

Several gameful design frameworks and methods have been suggested [11, 12] with prescriptive guidelines for augmenting an application with motivational affordances (note that we refer to gamification and gameful design interchangeably because both frame the same set of phenomena from different points of view [1]). Motivational affordances are properties added to an object, which allow its users to experience the satisfaction of their psychological needs [13, 14]. In gameful design, motivational affordances are used to facilitate intrinsic and extrinsic motivation. Thus, motivational affordances supporting a user’s feelings of competence, autonomy, and relatedness can facilitate intrinsic motivation, whereas external incentives or rewards facilitate extrinsic motivation.

Our work contributes to the human-computer interaction (HCI) and gamification communities by presenting a new set of guidelines for heuristic evaluation of gameful design in interactive systems. We began our research by reviewing several gameful design frameworks and methods to identify which dimensions of motivational affordances were common among them. Next, we created a set of heuristics focused on each of the identified dimensions. The resulting set of heuristics provides a new way of evaluating gameful user experiences. It is the first inspection tool focused specifically on evaluating gameful design through the lens of intrinsic and extrinsic motivational affordances. The aim of our inspection tool is to enable any UX expert to conduct a heuristic evaluation of a gameful application more easily, even if they have no background expertise in gameful design or motivational psychology.

To evaluate the proposed heuristics, we conducted a study in which five UX or HCI professionals evaluated two online gameful applications. Three participants used our gameful design heuristics, while the remaining two used a two-page description of gamification and motivational affordances. Results showed that using our heuristics led to the identification of more motivational issues in the evaluated applications, as well as a broader range of issues spanning a larger number of dimensions.

2 Related Work and Model Development

2.1 Heuristic Evaluation for Games

In usability engineering, heuristics are broad usability guidelines that have been used to design and evaluate interactive systems [15]. Heuristic evaluation is the use of these principles by experts in a usability inspection process to identify usability problems in an existing design as part of an iterative design process [3, 4]. These inspections are usually done early in the design process to identify application errors before scheduling user tests.

Several authors have suggested heuristic evaluation models for games. These models vary both in their goals and in the dimensions they address: some are general, aimed at evaluating any game genre or type, while others focus on specific contexts, such as networked or mobile games. Some of the most relevant heuristic evaluation models for game design are shown in Table 1.

Table 1. Existing heuristic evaluation models for games.

Some heuristics for evaluating games or playability may also be applied to gameful applications. Dimensions addressed by most game design heuristics, such as goals, challenge, feedback, and social interaction, are also relevant to gameful design. However, heuristics for games include several dimensions that are not applicable to most gameful applications, such as control and concentration.

Additionally, some of the game heuristics cover issues that can be addressed in gameful applications using general UX principles, such as screen layout or navigation. These heuristics might be necessary when evaluating games because game design often uses its own user interface principles, which can be different from traditional application interfaces. However, most gameful applications follow current design standards for user interfaces; thus, general UX evaluation methods can be easily applied to gameful applications to address issues such as usability or ergonomics.

Game design heuristics do not cover the full range of common motivational affordances used in gamification. For example, meaning, rewards, and scarcity are dimensions of motivational affordances often used in gameful design that are not covered by existing game heuristics. This makes it difficult to use game design heuristics to evaluate gameful applications. In order to do so, an evaluator would have to decide first which dimensions from the game heuristics should be used and which should not; next, they would also have to be concerned with motivational issues that are not currently covered by game heuristics. Consequently, we conclude that we need an inspection method better suited to assess gameful applications.

Before creating our set of gameful design heuristics, we reviewed the abovementioned game heuristics and considered the possibility of extending the existing models rather than proposing a new one. However, we encountered the same issues mentioned above: we would have to separate which heuristics from the existing models are applicable to gameful design and which are not. The resulting model would be confusing and difficult to apply. Therefore, we decided to create a new set of gameful design heuristics by analyzing existing gameful design methods rather than analyzing and extending existing game design heuristics.

2.2 Heuristic Evaluation for Playful Design

The Playful Experiences (PLEX) Framework [16, 17] provides an understanding of pleasurable user experience, which can be applied to both games and gameful applications. It classifies playful experiences according to 22 categories (see Table 2).

Table 2. The 22 categories of the PLEX framework.

The PLEX framework can be used as a tool for heuristic evaluation of gameful interactive systems, similar to the gameful design heuristics we are presenting. Nevertheless, PLEX is focused on classifying the types of experiences that the system can afford, rather than the motivational potential of these experiences. Therefore, the PLEX framework and the gameful design heuristics are two complementary tools, which can each provide insights into different characteristics of interactive systems that work together to afford an enjoyable user experience.

2.3 Review of Gameful Design Methods

To the best of our knowledge, no extant set of heuristics is available for evaluating motivation in gameful design. Some existing gameful design methods, namely Octalysis [18], HEXAD [19], and the Lens of Intrinsic Skill Atoms [11], suggest procedures for evaluating an existing system. Nevertheless, these procedures only provide a starting point for the design process. They are less suited for use as an evaluation tool by a quality control team because they lack a concise set of heuristics with brief descriptors that a UX practitioner could quickly check. Moreover, the lack of a succinct rubric means that an evaluator would need to study the methods intensively before being able to conduct an evaluation. Therefore, at present, there is no evaluation method for gameful applications that can be easily learned by UX professionals unfamiliar with gameful design. Our research fills this gap.

Several gameful design frameworks and methods are currently available (see [11, 12, 20] for comprehensive reviews). We therefore reviewed these existing methods to extract the different dimensions of motivational affordances that need to be considered in gameful design. Since the reviewed methods synthesize the current best practices in gameful design, we considered them an adequate starting point for identifying the motivational dimensions of concern. However, only a few of the reviewed methods classify motivational affordances into distinct dimensions, which is what we needed as a theoretical background to devise our heuristics; methods without such a classification could not inform them. Therefore, we broadened our analysis to any method that presented some form of classification of motivational affordances. Table 3 lists the frameworks and methods we considered, together with the rationale for their inclusion in, or exclusion from, our analysis.

Table 3. A summary of the gameful design frameworks & methods considered in our research.

After reviewing the frameworks and methods and selecting six of them for further analysis (see Table 3), we conducted a comparison of the motivational dimensions in each model to map the similarities between them, using the following procedure:

  1. The first framework was added as the first column of a table, with each of its suggested motivational dimensions as a separate row. We chose the Octalysis framework first because it comprised the highest number of dimensions (eight), which facilitated the subsequent steps; however, we could have chosen any of the frameworks as a starting point.

  2. Next, we added each of the remaining models as an additional column of the table. For each added model, we compared each of its suggested dimensions with the rows already in the table. When a new dimension corresponded to one already in the table, we added it to the relevant existing row; otherwise, we added a new row, creating a new dimension. In some cases, the addition of a new dimension also prompted the subdivision of an existing row. For example, the competence dimension was split into challenge/competence and completeness/mastery.

  3. After adding all the models to the table, we examined the dimensions gathered in each row and created a unique label for each row, capturing the meaning of all the dimensions it encompassed.

The resulting model consists of twelve common dimensions of motivational affordances (see Table 4). The similarity analysis between dimensions of different models was conceptual: we studied the description of each dimension as presented by its original authors and decided whether it represented the same core construct as any of the dimensions already in the table. Similarly, we derived the labels for the twelve resulting dimensions (first column of Table 4) by identifying the core concepts of each dimension. In the resulting classification, we noted that these dimensions were strongly grounded in: (1) the theories of intrinsic and extrinsic motivation (SDT; [7,8,9]), (2) behavioural economics [21], and (3) the practical experience of the authors of the analyzed frameworks. The initial analysis was conducted by one of the researchers; three other researchers (co-authors) then analyzed the resulting table. We iterated through a loop of feedback and editing until none of the researchers had further suggestions to improve the final model.

Table 4. Dimensions of motivational affordances from the reviewed gameful design methods.
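The mapping procedure described above can be sketched in code. The following Python sketch is illustrative only: the dimension lists (beyond Octalysis's well-known eight core drives) and the equivalence map are invented for the example, and in the actual analysis, deciding whether two dimensions corresponded was a conceptual judgment made by the researchers, not a lookup table.

```python
# Sketch of the three-step dimension-mapping procedure.
# Dimension lists and the equivalence map are hypothetical simplifications.

frameworks = {
    "Octalysis": ["meaning", "accomplishment", "empowerment", "ownership",
                  "social influence", "scarcity", "unpredictability",
                  "loss avoidance"],
    "Hypothetical model B": ["mastery", "autonomy", "relatedness", "rewards"],
}

# Conceptual equivalences (assumed for illustration only): maps a dimension
# name to the row label it was judged to share a core construct with.
equivalent = {
    "mastery": "accomplishment",
    "relatedness": "social influence",
    "rewards": "ownership",
}

rows: dict[str, dict[str, str]] = {}

# Steps 1-2: add each model as a new column; match dimensions to existing
# rows where a conceptual equivalence exists, otherwise create a new row.
for model, dims in frameworks.items():
    for dim in dims:
        row_label = equivalent.get(dim, dim)
        rows.setdefault(row_label, {})[model] = dim

# Step 3: each row label now stands for one common dimension.
for label, columns in rows.items():
    print(label, "->", columns)
```

With these invented inputs, "mastery", "relatedness", and "rewards" merge into existing rows, while "autonomy" creates a new one, mirroring how the twelve common dimensions of Table 4 emerged.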

3 Gameful Design Heuristics

Our set of heuristics enables experts to identify gaps in a gameful system’s design. This is achieved by identifying missing affordances from each of the dimensions.

Prior to creating the heuristics, we reviewed the research on motivation [7, 8] to help categorize the twelve dimensions into intrinsic, extrinsic, and context-dependent motivational categories. This is a common practice in gameful design and many of the reviewed methods also employ a similar classification. Although it is a simplification of the underlying theory, this simple categorization helps designers and evaluators better understand the guidelines and focus their attention on specific motivational techniques. We chose SDT as the theoretical background for this classification because it is the motivational theory most frequently employed in gameful design methodologies [11, 12].

We used the following criteria to split our heuristics into categories:

  • Intrinsic motivation includes affordances related to the three intrinsic needs introduced by SDT [7, 8] (competence, autonomy, and relatedness), as well as ‘purpose’ and ‘meaning’ as facilitators of internalization [22,23,24] and ‘immersion’, as suggested by Ryan and Rigby [9, 25] and Malone [26].

  • Extrinsic motivation includes affordances that provide an outcome or value separated from the activity itself as suggested by SDT [8] and Chou [18]: ownership and rewards, scarcity, and loss avoidance.

  • Context-dependent motivation includes the feedback, unpredictability, and disruption affordances, which can afford either intrinsic or extrinsic motivation depending on contextual factors. For example, the application can provide feedback to the user regarding either intrinsically or extrinsically motivated tasks; therefore, feedback might afford intrinsic or extrinsic motivation according to the type of task with which it is associated.
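The three-way categorization above can be written out as data. In the Python sketch below, the category structure follows the criteria just listed, but the short dimension labels are our paraphrases of Table 4 and may not match the paper's exact wording.

```python
# The twelve dimensions grouped by the three motivational categories.
# Labels are paraphrased from the criteria in the text (illustrative only).

MOTIVATIONAL_CATEGORIES = {
    "intrinsic": [
        "challenge/competence", "completeness/mastery", "autonomy",
        "relatedness", "purpose/meaning", "immersion",
    ],
    "extrinsic": [
        "ownership/rewards", "scarcity", "loss avoidance",
    ],
    "context-dependent": [
        "feedback", "unpredictability", "disruption",
    ],
}

# Sanity check: the three categories together cover the twelve dimensions.
assert sum(len(dims) for dims in MOTIVATIONAL_CATEGORIES.values()) == 12
```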

We constructed the heuristics based on an examination of the literature cited in Table 4, by writing adequate guidelines for each of the twelve identified dimensions. Following the literature review, we created these guidelines by studying the descriptions of each dimension in the original models, identifying the main aspects of each dimension, and writing concise descriptions of each aspect to assist expert evaluation. We employed the following procedure:

  1. For each of the twelve motivational dimensions, we first studied the underlying concepts and wrote a short description of the dimension itself, aimed at guiding expert evaluators' understanding of it.

  2. Next, for each dimension, we identified the main aspects of concern, i.e., the aspects that the reviewed frameworks or methods suggest designers should consider when envisioning a gameful system. We argue that these aspects of concern in designing a system should also be the main points of evaluation.

  3. For each aspect of concern, we then wrote a concise description aimed at guiding experts in evaluating whether the aspect under scrutiny was considered in the evaluated system's design.

Tables 5, 6 and 7 present the final set of 28 heuristics, organized within the 12 dimensions, which resulted from the initial analysis, framing, and iterative feedback described above and which we previously presented as a work-in-progress [27].

Table 5. Intrinsic motivation heuristics.
Table 6. Extrinsic motivation heuristics.
Table 7. Context-dependent heuristics.

Additionally, we have extended the gameful design heuristics with a set of questions for each heuristic. These questions ask about common ways of implementing each guideline, helping evaluators assess whether the guideline is implemented in the system at all. We do not include the complete set of questions here because of space constraints, but we provide them on our website (Footnote 1).
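To illustrate how a heuristic, its guiding questions, and the evaluator's notes fit together, the following Python data model may help. The field names and sample content are our own illustration, not taken from the paper's actual evaluation form.

```python
# A minimal data model for one heuristic as used in an evaluation form:
# a guideline plus guiding questions and a free-text notes column.
# Illustrative only; the real form's structure and wording may differ.

from dataclasses import dataclass


@dataclass
class Heuristic:
    dimension: str        # one of the 12 motivational dimensions
    guideline: str        # concise description shown to the evaluator
    questions: list[str]  # common ways of implementing the guideline
    notes: str = ""       # evaluator's observations (the fillable column)


# Hypothetical example entry for the feedback dimension.
example = Heuristic(
    dimension="feedback",
    guideline="The system gives the user clear feedback on their progress.",
    questions=[
        "Does the system show progress toward goals?",
        "Is feedback given promptly after user actions?",
    ],
)
example.notes = "Progress bar present, but updates only after page reload."
```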

3.1 Using the Gameful Design Heuristics

As with previous heuristic UX evaluation methods, the gamification heuristics should be used by experts to identify gaps in a gameful system's design. Experts should consider each guideline and evaluate whether it is adequately implemented in the design. Prior studies have shown that evaluations conducted by several evaluators are more effective at finding issues than those conducted by a single evaluator [3, 28, 29]. Thus, we recommend that the evaluation be conducted by two or more examiners.

When applying the heuristics, the evaluators should first familiarize themselves with the application to be analyzed and its main features. Then, for each heuristic, they should read the general guideline and observe the application, identifying and noting what the application does to implement this guideline. Next, they should read the questions associated with the heuristic and answer them to identify possible gaps in the application's design. The evaluation focuses on observing the presence or absence of the motivational affordances and, if the evaluator has enough expertise, on evaluating their quality. However, it does not aim to observe the actual user experience, which depends heavily on the users themselves in addition to the system. Therefore, this method cannot evaluate the user experience itself; its goal is to evaluate the system's potential to afford a gameful, engaging experience. As stated before, the heuristic evaluation should subsequently be validated by user studies to establish whether the observed potential translates into actual gameful experiences.

It is important to note that the questions associated with each heuristic act as guidelines to facilitate the evaluation process. They are not intended to represent every aspect related to the heuristic. Therefore, it is important that the evaluator also thinks beyond the suggested questions and considers other issues that might be present in the application regarding each heuristic.

After evaluating all the dimensions, counting the number of issues identified in each dimension can help determine which motivational dimensions require the most attention to improve the system's potential to engage users.
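The tallying step described above is straightforward; a short Python sketch follows, with invented issue data for illustration.

```python
# Counting identified issues per dimension to see where the design
# needs the most attention. Issue data is invented for illustration.

from collections import Counter

# Each identified issue is tagged with the dimension of its heuristic.
issues = [
    ("feedback", "No progress indicator on the main screen"),
    ("feedback", "Rewards are granted without explanation"),
    ("scarcity", "All content is available from the start"),
    ("relatedness", "No way to interact with other users"),
    ("feedback", "Level-up notification is easy to miss"),
]

issues_per_dimension = Counter(dimension for dimension, _ in issues)

# Dimensions with the most issues are the first candidates for redesign.
for dimension, count in issues_per_dimension.most_common():
    print(f"{dimension}: {count} issue(s)")
```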

3.2 Turning the Evaluation Results into Actionable Design

Since the gameful design heuristics are an evaluation method, they do not by themselves turn the identified issues into actionable design ideas for improving the application. Although the heuristics identify which dimensions of motivational affordances are implemented in, or absent from, the system, they do not indicate whether the missing dimensions need to be implemented. Depending on the goals of the gameful software being developed, including motivational affordances for all dimensions might be either necessary or unimportant. Therefore, we suggest that the identified design gaps be considered within an iterative gameful design method, which can then provide the tools to assess the need for including new motivational affordances to address the gaps. The methods used to inform the development of the heuristics (see Table 4) are well suited to this goal because the dimensions where gaps are identified map easily onto the design element categories suggested by these methods.

4 Evaluation

We conducted a summative study with five UX or HCI experts to evaluate the gameful design heuristics. We asked participants to evaluate two online gameful applications: Habitica (Footnote 2) and Termling (Footnote 3). Data were collected between August and December 2016.

Two participants (P1, P2) conducted the evaluation without the heuristics, while the remaining three (P3, P4, P5) used them, enabling us to compare how many motivational design issues were found by experts with and without the heuristics. Furthermore, three participants (P1, P3, P4) had expertise in gamification or games, whereas two (P2, P5) were knowledgeable in UX or HCI but did not have a specific background in gamification. This enabled us to assess whether prior gamification expertise would influence the evaluators' ability to identify motivational design issues.

4.1 Participants

We initially invited 18 experts in UX, HCI, or gamification to participate in the study. Potential participants were selected from the authors’ acquaintances and from previous project collaborators. The criterion was that potential participants should have an expertise either in gamification or games (including design practice or research experience) or in using other UX or HCI methods to evaluate interactive digital applications. Potential participants were contacted by email or in person. No compensation was provided for participation.

From the 18 invited experts, 10 initially agreed to participate and were sent the instructions; of these, only five completed the procedures (likely because of scheduling difficulties and the lack of compensation). Of these five, two completed the evaluation of Habitica only; we nevertheless included their feedback in the study. This meant that we collected five evaluations for Habitica, but only three for Termling. Table 8 summarizes the participants' demographics.

Table 8. Participant demographics.

4.2 Procedure

Initially, participants read and signed a consent form and filled out a short demographic information form (see Table 8). Next, the instructions to evaluate the two applications were sent out. Since both applications were free and available online, participants were instructed to create a free account to test them. We instructed participants P1 and P2 to carry out the evaluation without the gameful design heuristics and participants P3, P4, and P5 to use the heuristics. Assignment to experimental conditions was not random because we needed to ensure that we had participants with and without gamification expertise in both conditions (with or without the heuristics).

The instructions for P1 and P2 contained a one-page summarized introduction to gamification and motivation, followed by instructions asking them to reflect on the applications' design and motivational affordances, try to understand how they afford intrinsic and extrinsic motivation, and then list any issues they identified related to the motivational affordances (or the lack thereof).

Participants P3, P4, and P5 received material containing the same introduction to gamification and motivation, followed by an introduction to the gameful design heuristics and instructions asking them to reflect on the applications and identify motivational issues using the heuristics. These participants were given a complete copy of the gameful design heuristics to guide them during the evaluation, including the full list of heuristics with all the accompanying questions (see Sect. 3). The heuristics were formatted as a fillable form with an additional column where participants could note the issues they observed in the applications. After receiving the instructions, participants conducted their evaluations at their own pace and discretion, unsupervised by the researchers. After completing the evaluation, participants emailed the forms back to the researchers.

4.3 Results

Table 9 shows the number of issues found in the two evaluated applications by the participants. Overall, participants who used the gameful design heuristics identified more issues than those who did not use any heuristics.

Table 9. Number of issues found by participants.

The number of issues identified by the participant who had no prior gamification expertise and used the heuristics (P5) was only slightly higher than that of the participants who did not use the heuristics (P1 and P2), whether they had gamification expertise or not. However, it is noteworthy that the heuristics helped P5 identify issues across more dimensions than P1 and P2: while P5 identified issues in 10 different dimensions for Habitica, P1's and P2's issues were concentrated in only six dimensions.

Moreover, in line with our intentions, the heuristics helped evaluators focus their analyses on the motivational affordances instead of other usability issues or bugs: P1 and P2 both reported some issues unrelated to motivational affordances (e.g., usability issues or bugs), whereas P3, P4, and P5 reported only motivational issues.

Furthermore, a qualitative comparison of participants' responses shows that when they used the heuristics, their comments were generally more focused on the motivational aspects, whereas the comments from participants who did not use the heuristics were more general. For example, regarding Habitica's onboarding, P1, P2, and P3 mostly recognized that some information or tutorial material was missing or hidden. However, they did not comment on how this would affect the user's motivation. In contrast, P4 and P5 pointed out that, although a set of instructions existed, it did not motivate the user because it was not challenging or fun. Thus, the heuristics seem useful in focusing the evaluator's attention on the motivational issues of the application.

Additionally, the participants who had prior gamification expertise and used the heuristics identified approximately twice as many motivational issues as the participants who did not use the heuristics. For example, P3 found 12 and P4 found 16 motivational issues in Habitica, whereas P2 found only eight; in Termling, P4 found 24 motivational issues while P2 found only 12.

We can also observe that the motivational dimensions under which participants classified the issues sometimes differ. However, this is not specific to our tool; it is well known in heuristic evaluation generally that a single evaluator usually does not notice all existing issues. This is why it is recommended that a heuristic evaluation be conducted by several experts rather than one [3, 28, 29]: by combining the issues identified by the different experts, good coverage of the issues existing in the system can be achieved.

In summary, the results provided the following evidence:

  • A participant who had no prior gamification expertise, but used the gameful design heuristics, found about as many motivational issues as participants who did not use the heuristics (with or without prior gamification expertise), and across a broader range of motivational dimensions;

  • Participants who had prior gamification expertise and used the gameful design heuristics found twice as many motivational issues as participants who did not use the heuristics or did not have prior expertise;

  • Using the gameful design heuristics helped participants focus their analyses on the motivational issues, avoiding distraction by other types of problems.

5 Discussion

We have created a set of 28 gameful design heuristics for the evaluation and identification of design gaps in gameful software. Due to the lack of direct applicability of existing heuristics from game design, we deliberately created a new set of heuristics specific to gameful design, based on motivational theories and gameful design methods, rather than extending the existing heuristics for game design. By deriving our heuristics from the common dimensions of motivational affordances employed by different gameful design methods, we present a novel and comprehensive approach that encompasses a broad range of motivational affordances. Furthermore, to enable expert evaluation, the heuristics are written in a concise form, accompanied by supportive questions for reflection.

Our study with five UX and HCI experts provided empirical evidence that:

  • gameful design heuristics can help UX evaluators who are not familiar with gamification to evaluate a gameful system at least as well as a gamification expert who does not use the heuristics; and

  • gameful design heuristics can greatly improve the ability of gamification experts to perform a heuristic evaluation, leading them to find twice as many issues as they would find without the heuristics.

The implications of our findings are twofold. First, we provide evidence that evaluating gameful applications without a supporting tool is subjective; therefore, even gamification experts might miss important issues. A probable reason is the complexity of gameful design and the number of motivational dimensions involved. Second, we demonstrate that using the gameful design heuristics can significantly improve the results of heuristic evaluations conducted by gamification experts and non-experts alike. Considering that gameful design still suffers from difficulties in reproducing successful results, with several studies reporting mixed outcomes [5, 30], our work sheds light on one probable cause. Consequently, the gameful design heuristics are an important instrument for improving the chances of building effective gameful applications.

Nevertheless, the study was limited by the small sample size. Thus, although these initial results seem promising, future studies will be needed to support them. Additionally, even though the proposed method was meant to be generic enough to work in any heuristic evaluation of gameful applications, future studies will need to consider diverse usage scenarios to investigate if adaptations are needed for specific purposes.

6 Conclusion

Evaluation using heuristics is a way of identifying issues during various stages of software development, from ideation, design, and prototyping to implementation and testing. While many heuristics exist in fields such as usability and game design, guidelines specific to gameful design were still lacking, owing to the distinct types of solutions that emerge from this domain. Our work addresses this gap and contributes to gameful design research and practice by identifying key motivational dimensions and presenting a novel evaluation tool specific to gameful systems. These gameful design heuristics provide a method for evaluating interactive systems at various stages of their development. The suggested method fulfills a need for UX evaluation tools specific to gameful design, helping evaluators assess the potential UX of a gameful application in the early phases of a software project. The expert evaluation of the gameful design heuristics showed that the heuristics enabled experts to identify the presence of motivational affordances from several dimensions, as well as the absence of specific affordances from other dimensions. This is valuable information, which can help software developers and systems designers incorporate the missing elements. We expect the gameful design heuristics to be of use to both researchers and practitioners who design and evaluate gameful software, whether in research studies or industry applications.