Open Access, Original Article

Three Attempts to Replicate the Moral Licensing Effect

Published Online: https://doi.org/10.1027/1864-9335/a000189

Abstract

The present work includes three attempts to replicate the moral licensing effect by Sachdeva, Iliev, and Medin (2009). The original authors found that writing about positive traits led to lower donations to charity and decreased cooperative behavior. The first two replication attempts (student samples, 95% power based on the initial findings, N = 105 in Study 1 and N = 150 in Study 2) did not confirm the original results. The third replication attempt (MTurk sample, 95% power based on a meta-analysis on self-licensing, N = 940) also did not confirm the moral licensing effect. We conclude that (1) there is as yet no strong support for the moral self-regulation framework proposed in Sachdeva et al. (2009), (2) the manipulation used is unlikely to induce moral licensing, and (3) studies on moral licensing should use a neutral control condition.

People like to present themselves as good people, both to themselves and to others, to maintain a positive self-image and to feel like a moral person (Aronson, Cohen, & Nail, 1999; Schlenker, 1980; Steele, 1988). Furthermore, central theories of human behavior highlight humans’ desire for cognitive consistency in their thoughts, feelings, and behavior (Festinger, 1957; Heider, 1946). Intriguing research on moral licensing qualifies this desire for consistency by suggesting that individuals who behave in a morally laudable way later feel more justified in performing a morally questionable action (Merritt, Effron, & Monin, 2010; Miller & Effron, 2010). Moral licensing has been found to lead to a broad spectrum of undesirable behaviors. For example, after (reminders of) prior moral or socially desirable behavior, people displayed more prejudiced attitudes (Effron, Cameron, & Monin, 2009; Monin & Miller, 2001), cheated more (Jordan, Mullen, & Murnighan, 2011; Mazar & Zhong, 2010), displayed a preference for hedonic over utilitarian products (Khan & Dhar, 2006), and indulged more in highly palatable foods (Mukhopadhyay, Sengupta, & Ramanathan, 2008).

An important contribution to the literature on moral licensing examines how writing about one’s own positive or negative traits can influence donations to charity and cooperative behavior in a commons dilemma (Sachdeva, Iliev, & Medin, 2009). In just 4 years since publication, this paper has been cited 129 times (Google Scholar, November 27, 2013). Based on their findings, the authors argued that this moral licensing effect can best be interpreted as part of a larger moral self-regulation framework where internal balancing of moral self-worth and the costs associated with prosocial behavior determine whether one will display (im)moral behavior. When the moral image of oneself is established, an immoral action is allowed without the fear of losing that moral image (moral licensing). However, when one appears immoral to others, positive actions are needed to restore the moral image (moral cleansing). The studies of Sachdeva et al. (2009) comparing licensing with neutral control conditions show medium-sized effect sizes (d = 0.62 ([CL95] −0.11 to 1.35) for Study 1 and d = 0.59 ([CL95] −0.12 to 1.30) for Study 3; see Footnote 1). However, note that because of the small sample sizes (N = 14 to 17 per condition), the obtained effects have large variances, implying that the true effect sizes could range from very small to very large.
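To illustrate how such small cells translate into wide intervals, the following minimal sketch computes an approximate 95% confidence interval for Cohen's d from the common large-sample variance formula. The function name and the cell size of 15 per group are illustrative assumptions (the text reports only N = 14 to 17 per condition), and the interval method actually used for the reported values may differ.

```python
import math
from statistics import NormalDist


def cohens_d_ci(d, n1, n2, level=0.95):
    """Approximate CI for Cohen's d via the large-sample variance formula."""
    # Var(d) ~ (n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2))
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    half_width = z * math.sqrt(var_d)
    return d - half_width, d + half_width


# Assumed cell sizes: 15 per group (hypothetical, within the reported 14-17 range).
low, high = cohens_d_ci(0.62, 15, 15)
print(f"d = 0.62, 95% CI [{low:.2f}, {high:.2f}]")
```

Under these assumed cell sizes the interval runs from roughly −0.11 to 1.35, in line with the interval reported for Study 1. Most of the width comes from the first variance term, which shrinks only as the cells grow.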

There are no published direct replication attempts of the methodologies of Sachdeva et al.’s (2009) studies. Conway and Peetz (2012) conducted a study that was similar to Sachdeva et al.’s Study 1. However, this was not a direct replication because they adapted the procedure and added extra manipulations. We sought to replicate the studies by Sachdeva et al. to obtain additional insight into the complete moral self-regulation framework by testing for both moral licensing and moral cleansing effects contrasted to a neutral control condition.

We conducted high-powered replications of Sachdeva et al.’s (2009) Study 1 and Study 3 in Dutch student samples with 95% statistical power based on the effect size of the original studies. We did a third study with a US sample via Amazon’s MTurk with 95% power based on the effect size that we obtained in our meta-analysis on self-licensing (d = 0.26; Blanken, Van de Ven, & Zeelenberg, 2014). This study examined both dependent variables of original Studies 1 and 3 in a counterbalanced order. For all studies, we report how we determined our sample sizes, all data exclusions, all manipulations, and all measures.

Study 1 – Replication of Sachdeva et al.’s (2009) Study 1

Participants

Using G*Power (Faul, Erdfelder, Buchner, & Lang, 2009) we calculated that at least 63 participants were needed to achieve 95% power for the effect size of Sachdeva et al.’s Study 1 (2009; N = 46). We planned to collect data for one full week, and our sample consisted of 106 undergraduate students who participated for course credit. One was removed because this participant indicated a willingness to donate €100 to charity, more than 32 standard deviations from the mean donation response. The remaining 105 (25 males, 78 females, 2 unknown, Mage = 19.58) participants included native Dutch students (83.3%), non-native Dutch students (9.5%), and foreign students (4.8%). Participants were randomly assigned to either the positive trait (N = 35), negative trait (N = 34), or neutral control condition (N = 36).

Materials and Procedure

Participants completed the study as the first of a series of experiments behind separate desks in the laboratory. The experimenter in the laboratory was blind to condition. Prior to the experiment, participants provided their informed consent. The experimenter guided participants to their desks and instructed them to complete the paper-and-pencil questionnaire.

We obtained the original paper-and-pencil questionnaire from Sachdeva et al. (2009) and translated these materials into Dutch (for all materials see supplements section). The cover story indicated that the study was about handwriting styles. Depending on the assigned condition, participants were exposed to either nine positive trait words, nine negative trait words, or nine neutral words and were asked to copy each word four times and think about each word for 5–10 s. Next, participants were asked to write a short story about themselves including the words they just copied.

After this manipulation, participants responded to some filler items. Subsequently, the main dependent variable was presented. Participants read that the laboratory, in an effort to increase social responsibility, asked all participants whether they would like to contribute to a worthy cause. If they would like to do so, they could pledge to make a small donation to any good cause of their choice. They were told that they would be reminded of their choice at a later time via a confirmation e-mail from the experimenter. Participants could select to which cause(s) they would like to donate (cancer research, animal rights, ending world hunger, environmental preservation, human rights, veteran’s affairs, or other) and how much they would be willing to donate (from €0 up to €10 or another specified amount). Finally, participants completed seven self-presentation items from the Self-Monitoring scale (Lennox & Wolfe, 1984) and a set of demographic measures.

Known Differences From Original Study

The only known difference between our replication and the original Studies 1 and 2 of Sachdeva et al. (2009) was that we ran this study in a laboratory at a Dutch university, while the original study was conducted in a laboratory at a US university. When participants were asked to write about the positive trait, neutral, or negative trait words, we used the exact instruction of the original Study 2, which explicitly stated that participants should use the nine given words to write a story about themselves. This was not done in Sachdeva et al.’s Study 1, although it was intended that participants would do so. As such, for this replication, we combined the best of Sachdeva et al.’s Study 1 (including a control condition) and Study 2 (the manipulation with the clearest instruction).

Results

Following our confirmatory analysis plan, we conducted Sachdeva et al.’s (2009) analysis to test the effect of writing about one’s own positive traits, negative traits, or neutral words on donation amount. Table 1 contains the mean responses per condition and statistical tests. There were no significant differences between the moral identity conditions on donation amount (see Footnote 2). The results of an additional regression model including gender, age, and ethnicity indicated that none of these factors significantly predicted donation amount (all ps ≥ .321). A reviewer suggested that self-monitoring might moderate the observed effects. It did not, p = .086. Analysis details for all studies are available in the supplements.

Table 1. Means, standard deviations, sample sizes, and test statistics for dependent variables in all studies

Exploratory Analysis

When reading the recalled stories, we noticed that 55.7% of the participants violated the instructions by not writing about themselves or by using the words in a negating way (for instance, “Alyssa is a generally friendly person with a caring and compassionate disposition” or “I am neither a very caring nor compassionate individual”). When we only used a post hoc selection of those who wrote about their own positive traits (N = 28) and compared it to the neutral control condition, there was still no difference on donation amount (p = .756).

Study 2 – Replication of Sachdeva et al.’s (2009) Study 3

Participants

Using G*Power (Faul et al., 2009) we calculated that we should include at least 96 participants in our study to achieve 95% power for the effect size that Sachdeva et al. (2009) obtained in their Study 3 (the original used N = 46). We planned to collect data for one full week, and our sample consisted of 150 undergraduate students who participated for course credit (27 males, 122 females, 1 unknown, Mage = 20.34) and included native Dutch students (87.3%), non-native Dutch students (7.3%), and foreign students (4.7%). All participants were randomly assigned to either the positive trait condition (N = 49), the negative trait condition (N = 52), or the neutral control condition (N = 49).

Materials and Procedure

Participants first provided informed consent, and then completed the study as the first of a series of experiments. The laboratory experimenter was blind to condition. The experimenter led participants to a separate cubicle and instructed them to complete the paper-and-pencil questionnaire.

The materials were the same as those in Study 1 except that the dependent variable was a hypothetical commons dilemma. In this commons dilemma, participants imagined a scenario in which they were the manager of a midsized industrial manufacturing plant. They read that all manufacturers reached an agreement to install filters to eliminate toxic gasses and to run these filters 60% of the time. Running a filter was costly for the manufacturing plant, but would be beneficial to society. To measure cooperative behavior, participants were asked to indicate what percentage of time they would operate the filters, indicated on an 11-point scale from 0 (labeled 0%) to 10 (labeled 100%).

After the main dependent variable, participants explained their decision and completed three secondary measures; they estimated (1) the percentage of other managers who would not cooperate, on the same 11-point scale; (2) the amount of environmental damage expected when the filters would be run less than the agreed 60% on an 11-point scale from 0 (none) to 10 (a great amount); and (3) the likelihood of getting caught when operating the filters less than 60% of the time on an 11-point scale from 0 (= impossible) to 10 (= certain). Finally, participants completed the seven self-presentation items from the Self-Monitoring scale (Lennox & Wolfe, 1984) and a set of demographic measures.

Known Differences From Original Study

The only known difference compared to the original study is that we ran this study in a laboratory at a Dutch university instead of a US university.

Results

Following our confirmatory analysis plan, we conducted Sachdeva et al.’s (2009) analysis to test the effect of writing about one’s own (im)moral traits on cooperation (the amount of time participants were willing to run the filters). There were no significant differences between the conditions on cooperative behavior (Table 1; see Footnote 3). Furthermore, there were no effects on the secondary variables (Table 2). The results of an additional regression model including gender, age, and ethnicity indicated that none of these demographic variables predicted cooperative behavior (all ps ≥ .257). Self-monitoring did not moderate the observed effects (p = .787).

Table 2. Means, standard deviations, and test statistics for secondary measures in Studies 2 and 3

Exploratory Analysis

We noticed that 48.5% of the participants violated the recall instructions and did not write about their own traits or used the words in a negating way. When we only used a post hoc selection of those who actually wrote about their own positive traits (N = 42), there was still no difference on cooperative behavior between the positive trait stories about oneself and the neutral control condition (p = .197).

Study 3 – Replication of Sachdeva et al.’s (2009) Study 1 and Study 3 With a General US Population Sample on MTurk

Participants

Whereas in Study 1 and Study 2 we based our sample size on a power analysis using the original studies, for Study 3 we did so based on the effect size of self-licensing that we obtained in the preliminary data of our meta-analysis (d = 0.26; Blanken et al., 2014). We calculated with G*Power (Faul et al., 2009) that we would need at least 918 participants in our study to achieve 95% power to find a self-licensing effect. The sample was recruited on MTurk. We included an instructional manipulation check to prevent inattentive participants from starting the study (see Oppenheimer, Meyvis, & Davidenko, 2009). Participants were asked to provide an answer to three neutral questions about stories and were explicitly instructed to answer “five” on the first question, and “seven” on the second and third question. Participants who did not follow these instructions (N = 160) could not participate in our study. Our final sample consisted of 940 participants (449 males and 491 females, Mage = 33.41) who participated in exchange for $1.80 (see Footnote 4). All participants were randomly assigned to the positive trait condition (N = 306), the negative trait condition (N = 308), or the neutral control condition (N = 326).
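The sample size calculation above can be roughly retraced with a short sketch. It is not G*Power's procedure: it assumes the three-group omnibus test is an ANOVA with effect size f = d / 2 (a conversion that strictly holds for two groups), and it replaces the noncentral F distribution with a large-sample noncentral chi-square approximation on 2 degrees of freedom. All function names are hypothetical.

```python
import math


def noncentral_chi2_sf_df2(x, lam, terms=200):
    """P(X > x) for a noncentral chi-square with 2 df and noncentrality lam.

    Uses the Poisson-mixture form: a sum over j of Poisson(lam/2) weights times
    the central chi-square survival function with 2 + 2j df, which for even df
    equals a Poisson(x/2) cumulative probability.
    """
    half_x, half_lam = x / 2.0, lam / 2.0
    weight = math.exp(-half_lam)   # Poisson(half_lam) mass at j = 0
    term = math.exp(-half_x)       # Poisson(half_x) mass at i = 0
    cdf = term                     # P(Y <= j) for Y ~ Poisson(half_x)
    total = 0.0
    for j in range(terms):
        total += weight * cdf
        weight *= half_lam / (j + 1)
        term *= half_x / (j + 1)
        cdf += term
    return total


def approx_power(n_total, f, alpha=0.05):
    """Large-sample power of a three-group omnibus test (2 numerator df)."""
    critical = -2.0 * math.log(alpha)   # chi-square(2) critical value
    return noncentral_chi2_sf_df2(critical, f * f * n_total)


def required_n(f, target=0.95, alpha=0.05):
    """Smallest total N whose approximate power reaches the target."""
    n = 3
    while approx_power(n, f, alpha) < target:
        n += 1
    return n


f_assumed = 0.26 / 2   # assumption: f = d / 2
print(required_n(f_assumed))
```

Under these assumptions the search returns a total N in the 910s, close to the 918 reported from G*Power. A small shortfall is expected because the approximation ignores the finite error degrees of freedom.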

Materials and Procedure

Participants completed the study materials via the Qualtrics survey program. Participants could subscribe to participate in our study entitled “writing style and several questions” if they had an MTurk approval rate that was higher than 95% and if they lived in the US.

After finishing writing the stories with the positive traits, negative traits, or neutral words, participants answered the filler questions and both dependent measures from Sachdeva et al.’s (2009) Study 1 (donation amount) and Study 3 (cooperative behavior) in a counterbalanced order. Subsequently, participants completed the self-presentation items from the Self-Monitoring scale and a set of demographic measures.

Known Differences From Original Study

The study was conducted online. We made two slight changes to the original materials to increase the credibility of the online study. First, for the cover story, we instructed participants that the study was about general writing styles instead of handwriting, as the latter would not be believable in an online study. Second, we changed the donation measure. We told participants that 10 of them would be randomly selected to win an additional $10 MTurk worker bonus. They were then asked whether, if they were one of the winners, they would be willing to donate a portion of this bonus to a cause of their choice from a list (cancer research, animal rights, ending world hunger, environmental preservation, human rights, veteran’s affairs, or other). Participants selected a cause and indicated the amount they would donate ranging from $0 to $10 (or more).

Results

Donations

Following our confirmatory analysis plan, we conducted Sachdeva et al.’s (2009) analyses to test the effect of writing about (im)moral traits on how much participants would want to donate to a good cause. We controlled for order effects by including the order in which the two dependent variables were presented as a separate independent variable in the model. Order did not affect the donation amount, F(1, 934) = 0.78, p = .378, ηp2 = .001, nor was there an interaction effect of order with the manipulation of what words participants wrote about, F(2, 934) = 0.42, p = .656, ηp2 = .001.
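For readers who want to check the reported effect sizes, partial eta squared can be recovered from an F statistic and its degrees of freedom as ηp2 = (F × df_effect) / (F × df_effect + df_error). The helper below is a minimal sketch of that identity; the function name is an illustrative choice, not part of the original analysis scripts.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    ss_ratio = f_value * df_effect   # equals SS_effect / MS_error
    return ss_ratio / (ss_ratio + df_error)


# F statistics for the donation analysis as reported in the text (df_error = 934).
for label, f_value, df_effect in [("order", 0.78, 1), ("order x condition", 0.42, 2)]:
    print(label, round(partial_eta_squared(f_value, df_effect, 934), 3))
```

Both values round to .001, matching the reported effect sizes.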

As Table 1 shows, there was a main effect of moral identity condition on donation amount (see Footnote 5). Post hoc Tukey tests indicated that participants in the negative trait condition donated more money than participants in the positive trait condition (p = .044) and participants in the neutral control condition (p = .020). There was no difference in donation amount between participants in the positive trait condition and participants in the neutral control condition (p = .729). Thus, we did not find a moral licensing effect, but we did observe a moral cleansing effect: the recall of negative traits increased subsequent moral behavior. Self-monitoring did not moderate the observed effects.

Of the demographic variables gender, age, education level, family income, and ethnicity, only age significantly influenced donation amount (β = .11, t(930) = 3.36, p < .001). When we included age as a covariate to the effect of the manipulation on donation amount, the effect of the manipulation remained significant, F(2, 932) = 3.15, p = .043, ηp2 = .007.

Cooperative Behavior

Next, we conducted Sachdeva et al.’s (2009) analyses to test the effect of moral identity condition on cooperation in a hypothetical commons dilemma. The order in which the dependent variables were presented did affect cooperative behavior, F(1, 934) = 11.20, p = .001, ηp2 = .012, with participants who first completed the donation dependent variable displaying slightly more cooperative behavior (M = 6.47, SD = 1.77) than participants who first completed this cooperative behavior dependent variable (M = 6.11, SD = 1.52). The interaction between moral identity and order was not significant, F(2, 934) = 0.83, p = .438, ηp2 = .002. We do not know why this order effect exists, but for the current study it is mainly important that we control for this possible influence by adding it as a factor in the analyses. As Table 1 shows, there was no main effect of moral identity on cooperative behavior (see Footnote 6), nor on the secondary variables (see Table 2). Again, self-monitoring did not moderate the observed effects.

Of the demographic variables, only one of the ethnicity dummy variables significantly influenced cooperation (with African Americans cooperating less than others, β = −.19, t(930) = −3.11, p = .002). When including ethnicity as a covariate, there was still no effect of moral identity condition on cooperative behavior, F(2, 933) = 0.81, p = .447, ηp2 = .002.

Exploratory Analyses

We noticed that 43.6% of the participants violated the recall instructions and did not write about their own traits or used the words in a negating way. Using solely the coded stories about oneself in our analyses, there was a main effect of moral identity condition on donation amount (p = .020) with participants in the negative trait condition donating more money than participants in the positive trait condition (p = .017) and in the neutral control condition (p = .009). There was no main effect of moral identity condition on cooperative behavior (p = .495).

General Discussion

We made three attempts to replicate the findings of Sachdeva et al. (2009) on moral licensing, with samples based on pre-calculated power and preplanned analyses. In the first two replication attempts using student samples, the data did not confirm the original results. In our third replication attempt using a general population sample the data did not confirm the moral licensing effect. We did, however, find support for the moral cleansing effect on one of the two dependent variables in Study 3, but not in Studies 1 and 2.

Current Status of the Moral Licensing Effect

We conducted a meta-analysis of this moral licensing effect by including both the original Studies 1 and 3 by Sachdeva et al. and the three current replication attempts, using the metafor package of Viechtbauer (2010). For our Study 3, we used the average effect size of the two dependent variables. The random effects meta-analysis including all five studies produced a mean effect size of moral licensing of d = 0.07 ([CL95] −0.20 to 0.35). There was thus no significant moral licensing effect across studies (z = 0.52, p = .603). Figure 1 contains an overview of all moral licensing effect sizes (when compared to the neutral control conditions).
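As a rough illustration of what a random-effects pooling step does, the sketch below implements the DerSimonian-Laird estimator in plain Python. This is a swapped-in technique for illustration: the analysis above was run with the metafor package in R, the estimator used there is not stated in this section, and the input values below are arbitrary placeholders rather than the effect sizes of the five studies.

```python
import math
from statistics import NormalDist


def random_effects_dl(effects, variances):
    """DerSimonian-Laird random-effects pooling of effect sizes."""
    k = len(effects)
    w = [1.0 / v for v in variances]                     # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    z = pooled / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    half = NormalDist().inv_cdf(0.975) * se
    return {"d": pooled, "ci": (pooled - half, pooled + half),
            "tau2": tau2, "z": z, "p": p}


# Arbitrary placeholder inputs for illustration only (not data from this article).
placeholder_d = [0.5, 0.1, -0.2]
placeholder_var = [0.10, 0.05, 0.02]
print(random_effects_dl(placeholder_d, placeholder_var))
```

Because each weight is the inverse of a study's sampling variance plus the between-study variance, large studies dominate the pooled estimate, which is why two small original studies with medium-sized effects can be outweighed by three larger replications.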

Figure 1. Forest plot including all comparisons between the moral licensing and neutral control conditions of the original studies by Sachdeva et al. (2009) and our replication attempts.

Current Status of the Moral Cleansing Effect

We conducted a meta-analysis of this moral cleansing effect by including both the original Studies 1 and 3 by Sachdeva et al. and the three current replication attempts. The random effects meta-analysis including all five studies produced a mean effect size of moral cleansing of d = 0.04 ([CL95] −0.11 to 0.20). There was thus no significant moral cleansing effect across studies (z = 0.53, p = .593). Figure 2 contains an overview of all moral cleansing effect sizes (when compared to the neutral control conditions). However, note that only a small number of participants in the moral cleansing condition of the replication studies actually wrote about themselves.

Figure 2. Forest plot including all comparisons between the moral cleansing and neutral control conditions of the original studies by Sachdeva et al. (2009) and our replication attempts.

Possible Limitations of Our Replication Attempts

Although we did our best to design direct replications of the original studies, differences are inevitable, and some of those may be consequential for moderating the results. First, our Studies 1 and 2 used Dutch students, not US students. There is no theoretical reason to expect different licensing effects for Dutch compared to US citizens, but our pilot test (see supplements) suggested that words in the positive moral trait condition were seen to be slightly more positive in the US than in the Netherlands. Even so, the words were evaluated very positively in both national samples. Study 3 used a US-based sample, but this study differed from the original study in two respects. It was conducted online instead of in the laboratory, and the manipulation involved donating a part of potential winnings instead of money out-of-pocket. We cannot rule out that these procedural differences were consequential, but there presently exists no theoretical reason or identification of these as boundary conditions on moral licensing.

Conclusion

Although Sachdeva et al. (2009) theorized that moral licensing and moral cleansing should be considered jointly as being part of a moral self-regulation process, our three high-powered studies did not replicate the key moral licensing effect. Further, the meta-analytic result suggests that the present state of evidence with this paradigm is not different from a null effect. Sachdeva et al. (2009, p. 524) suggested that their findings showed that “moral-licensing and moral-cleansing effects can act convergently as part of a moral self-regulation process.” Based on the present findings, we do not argue that the theory is incorrect, only that it lacks sufficient empirical support when using the Sachdeva et al. (2009) paradigm.

We suggest three concrete steps to clarify the effects of moral licensing on social judgment. First, the method used by Sachdeva et al. (2009) seems unlikely to elicit moral licensing, especially since many participants violated the recall instructions and did not write about their own traits or used the words in a negating way. This is a procedure-specific issue; it does not invalidate moral licensing more generally. Second, the meta-analysis of all licensing research suggests that the effect is relatively small (Blanken et al., 2014). Therefore, small sample studies are highly inadvisable as they would need to leverage chance to detect a result using null hypothesis significance testing. Third, because moral licensing and moral cleansing are theoretically distinct, it is important to use a neutral control condition to clarify the role of each in social judgment.

1. Note that the overall differences between the three conditions (moral licensing, moral cleansing, and the neutral control condition) of Sachdeva et al.’s Study 1 and Study 3 were significant. For Study 1, no statistics on post hoc comparisons were reported. When calculating the Cohen’s d effect sizes comparing the moral licensing with the neutral control conditions, we found that for both studies, the confidence intervals included zero, indicating marginally significant moral licensing effects.

2. A nonparametric independent-samples Kruskal-Wallis test (which controls for the skewness of the data) also found a nonsignificant effect, H(2) = 0.36, p = .837.

3. A nonparametric independent-samples Kruskal-Wallis test (which controls for the not normally distributed data) also showed no effect, H(2) = 2.87, p = .238.

4. We set the target higher than 918 to ensure a minimum of 918 valid participants after data exclusion.

5. A nonparametric independent-samples Kruskal-Wallis test (which controls for the skewness of the data) found a similar effect, H(2) = 5.85, p = .054.

6. A nonparametric independent-samples Kruskal-Wallis test (which controls for the not normally distributed data) also found a nonsignificant effect, H(2) = 2.68, p = .713.

This work was supported by a grant from the Center for Open Science. The authors gratefully thank Sonya Sachdeva and her co-authors for sharing all materials and information about the methodology and for her helpful comments. The authors declare no conflict of interest with the content of this article. Designed research: I.B., N.V., M.Z., M.H.C.M.; Performed research: I.B.; Analyzed data: I.B.; Wrote paper: I.B., N.V., M.Z., M.H.C.M. All materials, data, and the preregistered design are available at: https://osf.io/3cmz4/.

Irene Blanken, Department of Social Psychology, Tilburg University, PO Box 90153, 5000 LE Tilburg, The Netherlands,