Introduction

Tobacco, alcohol, and substance use disorders (SUD), which will be referred to as addiction in the present review, are all driven by a transition toward compulsive drug use characterized by a loss of control over drug intake, persistent drug use despite adverse consequences, and frequent episodes of relapse. Among recreational users, only a subset ultimately lose control over drug use and develop an addiction. To explain this transition, several, often overlapping, theories have been proposed [1]. Among them, the influential but controversial habit theory of addiction posits that the transition to addiction emerges from the progressive development and dominance of drug habits over goal-directed control [2, 3]. Although drug habits appear omnipresent in any form of addiction, whether the formation or expression of drug habits contributes to the transition to addiction remains a matter of debate.

The involvement of automatic processes in addiction was suggested 30 years ago in the seminal work of Tiffany [4]. Several diagnostic criteria for SUD are consistent with the concept of drug habit; notably, the persistence of drug use when it is no longer pleasurable and despite negative consequences, the high reactivity to drug-associated cues and context, and the fact that addictive behaviors appear out of voluntary control [1, 5, 6]. Habits are defined as automatic responses elicited by antecedent stimuli without deliberation or representation of the consequences of one’s action. Because habits do not depend on the response–outcome association underlying goal-directed behavior, they are generally operationalized as an absence of goal-directed behavior; that is, actions not affected by a reduction of the outcome value and/or by a degradation of the response–outcome contingency are under habitual control (Box 1) [7, 8]. Although these tests typically answer a yes-or-no question, habit and goal-directed systems likely control behavior along a continuum, and the balance between these two systems would be shifted toward habit in SUD.
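The contrast between a yes-or-no test and a continuum can be illustrated with a graded index of sensitivity to devaluation. The sketch below is only an illustration: the function name `devaluation_index`, the normalized-difference formula, and the lever-press counts are assumptions introduced here, not a measure taken from the studies reviewed.

```python
def devaluation_index(nondevalued: float, devalued: float) -> float:
    """Graded sensitivity to devaluation from response counts in extinction.

    Values near 1 mean responding collapsed after devaluation (goal-directed);
    values near 0 mean responding was unchanged (habit-like).
    """
    total = nondevalued + devalued
    if total <= 0:
        raise ValueError("no responding recorded in either condition")
    return (nondevalued - devalued) / total


# Hypothetical lever-press counts, for illustration only.
print(devaluation_index(40, 10))  # responding strongly reduced by devaluation
print(devaluation_index(25, 25))  # responding unaffected by devaluation
```

Reporting such an index per subject, rather than a group-level significance test alone, is one way to place individuals along the habit to goal-directed continuum.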

However, the relation between drug use and habit remains controversial in humans, with mixed results and significant discrepancies [9, 10]. Furthermore, although the literature in rodents converges to show that drug exposure promotes habit, how drug habits favor further drug use and, ultimately, the transition to addiction remains unclear. In this review, we try to address this question by reviewing behavioral evidence supporting the habit theory of addiction in rodents and discussing important limitations, notably the absence of habit in choice settings. We then present new evidence of habitual behavior in a drug choice setting and propose several clues to explain our unexpected results in the light of the habit theory of addiction. We propose new perspectives on this theory that embrace the complexity of the decision-making environment of drug addicts and of interactions between decision-making processes.

Drugs promote habit

A large number of studies in rodents show that drugs of abuse promote habit. Following drug self-administration training, drugs can be devalued using either sensory-specific satiety or conditioned taste aversion (CTA) before responding for the drug is tested under extinction (Box 1). Using this procedure, it was shown that responding for ethanol [11,12,13,14,15,16,17], cocaine [18, 19], and nicotine [20, 21] becomes habitual after various lengths of training. In some studies, the transition to habit was faster for the drug than for a nondrug reward, suggesting stronger facilitation of habit formation for drug seeking [11, 13, 15, 18, 21]. Interestingly, studies in which rats are trained to self-administer cocaine or heroin in a seeking-taking schedule (e.g., heterogeneous chains; seeking RI30, taking FR1 on separate levers) reveal that rats correctly encode the contingency between the seeking response, the taking response and the outcome, indicating that their behavior is under goal-directed control [22, 23]. However, it was also shown that the cocaine-seeking response becomes insensitive to extinction of the cocaine-taking response following extended self-administration training, suggesting a shift to habitual control [24].

Numerous studies show that passive drug exposure is sufficient to promote habitual responding for nondrug rewards. For instance, while lever pressing for a solution of 20% sucrose remains under goal-directed control after 8 weeks of training, home-cage access to ethanol during instrumental training renders the behavior habitual [11]. Ethanol-induced facilitation of habitual responding for food was also found following chronic intermittent exposure to ethanol vapor [25]. Passive cocaine [26, 27] or amphetamine [28,29,30] exposure also rendered responding for a nondrug reward insensitive to devaluation by specific satiety or CTA. Interestingly, even limited post-training exposure to cocaine was sufficient to observe habitual responding for food rewards [31], a result not replicated with amphetamine [32]. Drug-induced facilitation of habit was also demonstrated in studies showing insensitivity to degradation of instrumental contingency (Box 1) following ethanol exposure [16] or repeated injections of cocaine [33]. However, two studies have found that exposure to cocaine increased rather than decreased sensitivity to contingency degradation [34, 35]. Overall, apart from a few exceptions [32, 35, 36], the literature in rodents converges to show that various drugs of abuse shift the balance toward habit.

Limitations to the habit theory of addiction

Although drugs of abuse generally promote habit, a very specific set of conditions is typically required to observe habit in rodents. First, the schedule of reinforcement (i.e., random interval) can bias action control toward habit by reducing the contingency and contiguity between response and reinforcement [37,38,39]. Second, extended operant training can also be required to induce an observable shift toward habit [40,41,42]. For instance, drug seeking is goal-directed after limited training in the seeking-taking schedule [22,23,24] but becomes habitual after extended training [24]. Long training is also required to observe the development of alcohol and nicotine habits [11, 20]. Third, lack of choice seems to be a prerequisite for observing habits during testing. When animals have concurrent access to at least two rewarded responses, their behavior remains sensitive to outcome devaluation, even after extended training [42,43,44] or cocaine exposure [34]. Fourth, the degree of reward predictability seems to play a significant role in habit expression [45,46,47]. Introducing uncertainty about task contingencies before testing can be sufficient to render habitual behavior goal-directed again [45, 46]. Finally, expression of habit is typically observed under conditions of extinction. Indeed, when the devalued reinforcer is delivered during reacquisition tests, instrumental responding for drug or nondrug rewards generally becomes sensitive to outcome devaluation [15, 18, 21, 28, 30, 40, 41].

If we consider that behavior remains goal-directed when there is a simple choice between two options, the hypothesis that drug habits contribute to compulsive drug use and ultimately addiction is difficult to reconcile with real-world scenarios, in which drug addicts typically face a multitude of drug and nondrug alternatives [10]. The apparent incompatibility between choice and habit raises another paradox that extends beyond the question of addiction: if this incompatibility were genuine, how could habitual behaviors be so ubiquitous in everyday life, with its rich array of choices and options? In real-world scenarios, habits must somehow be compatible with choice, if only to minimize the costs associated with computationally demanding goal-directed decision-making processes [48, 49]. Another factor limiting the ecological relevance of animal research on habits is that habits have only been observed under extinction conditions, mainly to avoid incentive learning and reengagement of goal-directed control [15, 18, 21, 40, 41]. However, extinction conditions rarely occur in real-world drug use scenarios, in which drug seeking is typically reinforced [10]. Although current animal models appear to fail to demonstrate habit in conditions of higher face validity, the difficulty of observing habit in drug users could also indicate that habit is not an underlying process driving addiction. One way to address this issue is to improve the validity of the habit construct, mainly impeded by the apparent impossibility of observing habit under conditions of choice and reinforcement. However, two recent studies provide new evidence of habit in a drug choice setting and under conditions of reinforcement.

New evidence of habitual responding for nondrug reinforcers in a drug choice setting

We have recently found that in rats given a choice between a noncaloric solution of saccharin and an intravenous dose of cocaine, responding for saccharin is habitual [50]. Indeed, preference for saccharin was maintained following saccharin devaluation by sensory-specific satiety, in a test conducted under extinction (Fig. 1A, B). In fact, we observed an effect of reward directly reflecting rats’ preference for saccharin, but no effect of devaluation on saccharin- and cocaine-seeking behavior (Fig. 1A, B). This insensitivity of saccharin preference to devaluation was replicated using CTA (Fig. 1D, E). Importantly, devaluation of saccharin was verified by showing a reduction of saccharin consumption in the devalued group compared to the non-devalued group for both devaluation methods (Fig. 1C, F).

Fig. 1: Habitual preference for saccharin in a drug choice setting.

A–C Responding for saccharin is not reduced following saccharin devaluation by specific satiety. A Rats’ performance on the cocaine and saccharin levers did not differ between the devalued group (D; white) and the non-devalued group (ND; blue) across 1 min time bins in the extinction test. *p < 0.05 Coc vs. Sacch. B The total number of lever presses was higher on the saccharin lever compared to the cocaine lever but was not affected by devaluation. *p < 0.05 Coc vs. Sacch. C Saccharin was correctly devalued as measured by a reduction in posttest consumption of saccharin in the D group compared to the ND group. D–F Preference for saccharin is also insensitive to saccharin devaluation by CTA. D, E Rats responded more on the saccharin lever compared to the cocaine lever but did not differ as a function of devaluation. *p < 0.05 Coc vs. Sacch. F Devaluation of saccharin was confirmed during the test of consumption immediately after the extinction session. Adapted from [50].

Another study from our laboratory tested the sensitivity of the rats’ preference to changes in the current value of the nondrug option, in conditions of choice and reinforcement [51]. Specifically, water-restricted rats were trained to choose between water and cocaine. Preference was assessed across repeated cycles of water restriction and satiation (Fig. 2A). Presession access to water for 1 h or 2 h (1h-Ø and 2h-Ø sessions) had no effect on preference and only moderately suppressed water consumption during water trials (Fig. 2A, B). Thus, water was also made available during every intertrial interval (ITI) of the session (Free-Water condition, FW sessions). This resulted in a drastic suppression of water consumption during water trials, indicating successful devaluation (Fig. 2B). However, rats kept preferentially selecting the water option, even though they consumed little of it. Importantly, experiencing the devalued outcome during ITI and water trials did not reverse preference toward the still valued drug option by reengaging goal-directed control, indicating that preference for water was habitual and inflexible.

Fig. 2: Inflexible preference for the alternative nondrug reward in a drug choice setting is under habitual, model-free control.

Water-restricted rats offered a choice between water and cocaine expressed a robust preference for water (black; baseline preference under water deprivation). Water was then partially devalued with 1 h (1h-Ø, pink) and 2 h free-water access (2h-Ø, purple) before the choice session. Water preference was not affected (A), but there was moderate suppression of water consumption (B). Thus, free-water access was also introduced during each intertrial interval (ITI) of choice sessions in addition to the hour of water presession access (white; 1 h + ITI, Free-Water FW). Although this condition drastically suppressed water consumption from the first FW session (B), nine sessions were needed to observe a complete reversal of preference (A). Following this devaluation training, 1 h water access was sufficient to raise cocaine preference to 50% in a second 1h-Ø choice session (pink). Finally, devaluation of water by taste adulteration with quinine (blue) only moderately affected preference (A) despite a strong suppression of water consumption (B). Adapted from [51].

A progressive reversal of preference toward the drug was observed across nine cycles of water restriction and satiation, indicating that preference can only change after repeated training with the novel water value. These results could be well explained in the context of model-based (MB) and model-free (MF) control, used as proxies for goal-directed and habitual control, respectively (Box 2) [48, 52,53,54]. The slow reversal of preference observed in our study is what would be expected under MF control, which depends on iterative and retrospective learning of an action’s value in a given “state”. Thus, rats may have learned to compute action values from the start of the session, based on their motivational state. In other words, rats learn to select water when thirsty, and cocaine when sated, without relying on the expected current value of these two rewards. To test this hypothesis, rats were retested with 1 h water access before the session but not during ITI (1h-Ø; Fig. 2A). Although this condition moderately decreased consumption during water trials, the preference for cocaine increased to 50% and was significantly higher than cocaine preference before devaluation training under the same conditions. These results suggest that during devaluation training, rats learn to use their motivational state as a discriminative cue to predict the most valuable option, under MF control. Alternatively, since rats became sensitive to the altered outcome value in the presence of an altered interoceptive state (water satiation), it could be argued that rats progressively learned to reengage MB goal-directed control. Yet, rats maintained their preference for water following quinine-induced devaluation, despite a significant suppression of water consumption (Fig. 2A, B), indicating that rats cannot flexibly adjust their preference in response to outcome devaluation using another modality (e.g., taste instead of motivational state). A more parsimonious hypothesis is that rats learned instead to select options according to their motivational state under MF control (i.e., select water when thirsty), without relying on the outcome value per se.
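The slow, state-dependent reversal described above can be illustrated with a toy model-free learner whose cached values are indexed by motivational state. This is a minimal sketch under stated assumptions, not the model used in [51]: the learning rate, the reward magnitudes, the greedy choice rule, the one-sample-per-option session, and the generalization of thirsty values to the sated state at the start of devaluation training are all hypothetical.

```python
# Toy model-free learner: cached values are indexed by motivational state
# and updated only from experienced outcomes. All numbers are illustrative
# assumptions, not parameters estimated in [51].

ALPHA = 0.2  # learning rate (assumed)

# Experienced reward for each (state, action) pair (assumed magnitudes).
REWARD = {
    ("thirsty", "water"): 1.0,
    ("thirsty", "cocaine"): 0.5,
    ("sated", "water"): 0.0,  # water is worthless when the rat is sated
    ("sated", "cocaine"): 0.5,
}


def initial_values():
    """Cached values after training under thirst only.

    Assumption: values learned while thirsty generalize to the sated state.
    """
    return {
        ("thirsty", "water"): 1.0,
        ("thirsty", "cocaine"): 0.5,
        ("sated", "water"): 1.0,
        ("sated", "cocaine"): 0.5,
    }


def update(q, state, action):
    """One model-free update: move the cached value toward the outcome."""
    q[(state, action)] += ALPHA * (REWARD[(state, action)] - q[(state, action)])


def preferred(q, state):
    """Greedy choice from the cached values of the current state."""
    return max(("water", "cocaine"), key=lambda a: q[(state, a)])


def sessions_to_reversal(q, state, max_sessions=50):
    """Sessions (one sample of each option) until cocaine is preferred."""
    for session in range(1, max_sessions + 1):
        for action in ("water", "cocaine"):
            update(q, state, action)
        if preferred(q, state) == "cocaine":
            return session
    return None


q = initial_values()
print(preferred(q, "sated"))             # choice on the first sated session
print(sessions_to_reversal(q, "sated"))  # reversal requires repeated sessions
print(preferred(q, "thirsty"))           # thirsty values were never updated
```

In this sketch, preference in the sated state reverses only after several sessions, while preference in the thirsty state is untouched, which is the qualitative pattern attributed to MF control in the text. Once the sated values have been learned, the state alone is enough to select cocaine, without any evaluation of the current outcome.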

Possible explanations

The results described above are surprising since responding for the nondrug reward was habitual despite choice and reinforcement. In the following subsection, we will discuss possible explanations for these unexpected results.

Both experiments included prior training in the discrete-trial choice schedule to assess preference under baseline conditions. In this procedure, the lever insertion and retraction at each trial constitute salient cues predicting reward availability and delivery, respectively. By reducing uncertainty about reward delivery and alleviating the need for attentional monitoring, these cues can promote the rapid development of habit [47, 55, 56]. Indeed, arbitration between MF and MB control has been suggested to rely on the relative uncertainty of predictions from each system [52, 57]. In procedures involving discrete trials, the low uncertainty about MF predictions derived from the lever cues through reinforcement learning is hypothesized to favor habit. This could explain why habitual responding for sucrose is observed after only five sessions whereas 8 weeks of training are not sufficient to observe habit when these cues are not available [11, 55]. Therefore, habitual preference in the two studies described above may be promoted by the structure of the discrete-trial choice procedure. It is noteworthy that studies showing goal-directed choice between two nondrug rewards use self-paced random-ratio or random-interval schedules, in the absence of reward-predictive cues and thus under conditions of higher reward uncertainty [34, 42, 44, 58].

The strong initial preference for the alternative nondrug reward in our studies indicates a large difference in outcome values [50, 51]. In contrast, studies showing goal-directed choice between two response–outcome associations typically use equally valuable rewards [42,43,44, 59,60,61,62]. In this condition, the brain chooses advantageously by assigning a value to each option, comparing these values, and selecting the response associated with the highest value [63,64,65,66]. Consequently, decision-making remains under goal-directed control, driven by a representation of the options’ value, when choice outcomes are difficult to distinguish [67]. However, when there is a clear difference in outcome values, choice may not require effortful outcome representation but could instead rely on an MF stimulus–response policy, slowly updated based on prior reward history [48]. This is indeed what we observed when assessing rats’ preference across repeated cycles of water restriction and satiation [51]. The facilitation of MF control in our experimental choice setting is also in accordance with the arbitration model of Daw et al. based on the relative uncertainty of MB vs. MF predictions [52, 57]. While an increase in task complexity is predicted to favor MB control, the strong difference between the values of drug and nondrug rewards, combined with the high predictability of reward delivery provided by lever cues, should favor MF control.
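The arbitration principle invoked above, in which control goes to whichever system makes the less uncertain prediction, can be sketched in a few lines. This is a loose illustration inspired by the uncertainty-based account [52, 57], not the Bayesian implementation of Daw et al.: the `Prediction` class, the `arbitrate` rule, the variance-of-a-mean form of MF uncertainty, and every number are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    value: float        # estimated value of the action
    uncertainty: float  # e.g. variance of that estimate


def arbitrate(mb: Prediction, mf: Prediction) -> str:
    """Return which controller drives choice: the less uncertain one."""
    return "MF" if mf.uncertainty < mb.uncertainty else "MB"


def mf_uncertainty(outcome_variance: float, n_experiences: int) -> float:
    """Assumed form: the variance of a running mean shrinks with experience."""
    return outcome_variance / max(n_experiences, 1)


# Hypothetical model-based prediction with a fixed planning uncertainty.
mb = Prediction(value=1.0, uncertainty=0.05)

# Cued discrete trials: reward delivery is highly predictable (low variance).
cued = Prediction(value=1.0, uncertainty=mf_uncertainty(0.1, 50))

# Self-paced, uncued schedule: reward delivery is variable (high variance).
uncued = Prediction(value=1.0, uncertainty=mf_uncertainty(4.0, 50))

print(arbitrate(mb, cued))
print(arbitrate(mb, uncued))
```

With the same amount of training, the cued condition hands control to the MF system while the uncued condition leaves it with the MB system, which mirrors the argument that lever cues and predictable reward delivery favor habit.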

Reframing the habit theory of addiction

In the two studies described above, habitual responding did not promote drug choice but instead favored abstinence. How can we reconcile these results with the habit theory of addiction? In the following section, we will discuss new avenues to reframe the habit theory of addiction by embracing the complexity of (1) drug addicts’ decision-making environment and (2) interactions between decision-making processes.

Facing the complexity of drug addicts’ environment

The discrete-trial choice procedure developed in our laboratory has been used as a rodent model of addiction to isolate a minority of vulnerable rats that prefer the drug, when the large majority prefers the alternative nondrug reward [68,69,70,71]. It is perhaps not surprising that population-wide behavior in rats does not reflect the behavior of the subgroup of individuals losing control over drug use and developing SUD. Future research will assess possible development of habitual cocaine preference in the subset of cocaine-preferring rats.

Although our research departs from the mainstream in showing habitual preference for a nondrug reward in a drug choice setting, there are commonalities with the literature on the role of reward-predictive cues in biasing behavior toward habit. In rodents, it was shown that providing reward-predictive cues (the insertion and retraction of the lever) reduces uncertainty about reward delivery and favors habit [55, 56]. In this context, the lever cue could act as a noncontingent discriminative stimulus signaling the contingency between the response and the reward [72]. Discriminative cues predictive of drug availability have been shown to produce drug seeking in animal models of relapse [72,73,74,75,76]. Interestingly, when smokers are required to choose between cigarette and food rewards, the presentation of discriminative cigarette cues (cigarette pictures) biased preference toward cigarettes, an effect that was not reduced by tobacco devaluation using health warnings or satiety [77, 78]. This result suggests that habitual behavior is more strongly bound to discriminative environmental stimuli and less controlled by the primary drug reinforcement itself.

Noncontingent Pavlovian cues can also directly interact with instrumental reward-seeking behavior, a phenomenon known as “Pavlovian to instrumental transfer” (PIT). Pavlovian cues can elicit a representation of the outcome identity and enhance instrumental responding for that same outcome specifically, independently of the current outcome value (specific-PIT) [42, 79, 80]. Specific-PIT can therefore counteract goal-directed responding by enhancing responding for an outcome predicted by a cue, despite devaluation of this outcome by satiety [81, 82]. However, the role of PIT in addiction remains unclear [83] and this process is presumably rare in human drug-seeking behavior, which is generally reinforced by contingent drug exposure. Instead, Pavlovian cues are more likely to influence drug-seeking behavior when they are contingent with drug delivery and come to function as conditioned reinforcers (CR), by acquiring motivational salience through repeated pairing with the drug [72, 84]. Although numerous studies demonstrate the fundamental role of CR in producing and maintaining drug-seeking behaviors [72, 75, 85], how resistant habitual behaviors are to changes in CR remains relatively unexplored. More generally, the fundamental role of Pavlovian cues in the control of reward-seeking behaviors remains largely overlooked in tasks employing self-paced free-operant schedules in the absence of conditioned and discriminative stimuli.

Because of the multiple interactions between cues, actions and outcomes, task structure plays a fundamental role in the orchestration of associative control during choice behavior. Moving forward, it is fundamental to face the associative complexity underlying drug choice in addiction to understand how interactions between stimuli, actions, and outcomes shape individuals’ choices between drug and nondrug rewards.

Facing the complexity of interactions between decision-making processes

The habit theory of addiction is limited by the difficulty of observing habits in real-world settings and evidence that drug-seeking behaviors are primarily goal-directed [5, 10]. It could be argued that behavioral persistence toward a devalued goal results from an excessively strong motivation for the goal rather than from an action executed “out of habit”. Indeed, it was recently suggested that excessive goal-directed control would drive the transition to addiction [10]. Interestingly, evidence suggests that rats showing compulsive-like methamphetamine self-administration (i.e., resistance to footshock punishment) exhibited hyperactivity in the orbitofrontal cortex (OFC) to dorsomedial striatum (DMS) pathways, and lower engagement of the medial prefrontal cortex (mPFC)—ventrolateral striatum circuitry [86]. Furthermore, in a model of optogenetic dopamine neurons self-stimulation [87], it was shown that potentiation of the OFC to dorsal striatum synaptic pathway drives compulsive-like reinforcement [88]. Given the established role of OFC in encoding of value during goal-directed behavior, these results suggest that compulsive-like drug use may be driven by an overestimation of drug value relative to punishment [89]. Furthermore, impairment of executive functioning resulting from drug-induced dysfunctions in PFC activity can disrupt inhibitory control, resulting in an inability to suppress strong motivation after a change in contingencies [89,90,91]. Together, these studies suggest that compulsive-like drug use is driven by excessive goal-directed motivation for the drug.

Evidence of a shift from ventromedial to dorsolateral striatum in striato-nigro-striatal dopaminergic pathways, which is proposed to underlie the transition from goal-directed to habitual control over drug seeking, remains limited. Indeed, studies demonstrating this shift during cocaine self-administration under a second-order schedule of reinforcement did not assess whether behavior was habitual [92]. Although a shift from ventromedial to dorsolateral striatal (DLS) dopamine release has been observed during cocaine self-administration, this shift was suggested to promote refinement of instrumental learning rather than escalated and compulsive-like cocaine seeking [93]. Numerous studies suggest that DMS and DLS are sequentially involved during early and late instrumental training, when behavior is goal-directed or habitual, respectively [94,95,96,97]. This dissociation between DMS and DLS has also been reported following ethanol and cocaine self-administration [11, 24]. Furthermore, dopamine transmission in the DMS and DLS is required for early and late performance of cue-mediated cocaine seeking, respectively [98]. However, the hypothesis of sequential involvement of DMS and DLS across habitual learning has been recently challenged [56], and whether this serial recruitment in dorsostriatal activity is accelerated by drug exposure remains unknown. Clearly, more research is needed to demonstrate a shift in meso-nigro-striatal dopaminergic signaling and dorsostriatal activity in the context of habitual drug-seeking behavior.

Although some neurobiological evidence suggests that addiction is associated with excessive goal-directed drug seeking while other studies seem to indicate a shift toward DLS-dependent drug-seeking habits, drug-related behaviors may not be exclusively habitual or goal directed. There are instances of both goal-directed and habitual behavior in drug addiction. Some strategies developed by drug addicts to acquire money, procure the drug and consume it are undoubtedly goal-directed in that they are highly flexible, driven by expectation of drug effects, and involve careful assessment of risks and benefits [5, 99]. On the other hand, some drug-related behaviors can also be conceived as habitual, for instance, the first cigarette smoked in the morning. Therefore, instead of asking whether drug-seeking behavior is goal-directed or habitual, it may be more relevant to consider exercise of goal-directed control as a gradient and to determine how tilted the balance on that gradient is. However, tasks assessing individual sensitivity to outcome devaluation typically answer a yes-or-no question [100]. In humans, the 2-step task (Box 2) was developed to estimate individual reliance on MB and MF control [48, 52,53,54] and is more suitable to measure the relative strength of both systems (but see [101]). Using this procedure, several studies have shown correlation between drug use and the strength of MB control [102, 103]. Recent adaptation of this task in rodents [104,105,106] will provide further information about the relative contribution of MB and MF systems in animal models of addiction [107].

Studies using the 2-step task converge to suggest that goal-directed and habitual control are engaged in parallel and that subjects rely on both systems to make decisions [53, 108]. Several neurocomputational models suggest that habitual and goal-directed processes are intermingled under a hierarchical decision-making structure. Keramati et al. proposed an integrative “plan-until-habit” model in which MF cached values are directly integrated into MB prospective planning [49]. Along the same line, Dezfouli and colleagues proposed that goal-directed choices can be executed under habitual control [109,110,111,112]. Alternatively, another model suggests that habitual control can be exerted over goal selection. Selected goals are then reached with deliberation and planning [113]. Although these models propose opposite relationships between goal-directed and habitual systems, all share the assumption that humans constantly and flexibly engage habitual and goal-directed control under hierarchical levels in the decision-making structure. Further blurring the frontier between goal-directed and habitual behaviors, several researchers suggest that habits are by essence goal driven [114, 115].

One key problem of goal-directed, MB strategy is the high computational demand for implementation. In theory, to make decisions under MB control, agents build a decision tree of all possible states and actions and navigate in this “cognitive map” to estimate the long-run worth of each available outcome [48]. In the forest of decision-tree possibilities in real-world settings, considering all the available options is not possible; relevant paths must be somehow preselected [116]. For instance, possible outcomes in a choice situation may be irrelevant and not considered in the first place. We have recently shown in rats that options can be available but not considered in the associative structure of the task, despite the engagement of goal-directed control [117]. In this task, we allowed rats to exert goal-directed control over the occurrence of choice trials by requiring them to nosepoke in a hole for the presentation of cocaine and saccharin levers (Fig. 3A). As expected, we found that rats preferred saccharin over cocaine but intriguingly, this preference was exclusive in the majority of rats (Fig. 3B). When the interest for saccharin was temporarily lost due to repeated choice (i.e., specific satiety), rats preferred to pause for long periods before reinitiating a choice trial for saccharin, instead of switching to cocaine (Fig. 3C). To explain this suboptimal behavior, we suggested that rats are preferentially associating the initiation of behavioral sequences with saccharin, thereby ignoring the drug reward. These results show that in some situations, choice outcomes can be available but ignored, even when responding is under goal-directed control [117].
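The computational demand of exhaustive MB planning mentioned above, and the saving obtained when some available options are simply not considered, can be made concrete with a small calculation. The sketch assumes a deterministic decision tree with the same number of actions at every step; the function name `n_paths` and the numbers are hypothetical.

```python
def n_paths(n_actions: int, depth: int) -> int:
    """Number of action sequences an exhaustive planner must evaluate.

    Assumes a deterministic tree with `n_actions` choices at each of
    `depth` successive decision steps.
    """
    return n_actions ** depth


# Ten available actions, planning five steps ahead.
print(n_paths(10, 5))

# Same horizon when only two preselected options are considered per step.
print(n_paths(2, 5))
```

Restricting the set of considered options shrinks the tree by several orders of magnitude in this example, which is why some form of preselection is needed before goal-directed evaluation can be tractable.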

Fig. 3: Rats are oblivious to the cocaine option during self-initiated choice.

A Rats are required to nosepoke in a hole under a fixed ratio 10 to trigger the presentation of two levers. Two consecutive presses on the left or right lever result in the delivery of saccharin or an intravenous infusion of cocaine, respectively. B In this procedure, rats expressed a strong preference for saccharin. Interestingly, this preference was exclusive for a majority of rats (right panel). C Analysis of choice patterns reveals that rats choosing saccharin exclusively did so in bouts of varying lengths separated by pauses, during which they did not self-initiate any trial for cocaine, despite transient saccharin devaluation by sensory-specific satiety. This behavior represents an opportunity cost because the duration of pauses is sufficient to earn several cocaine injections (right panel). Adapted from [117].

These results raise an intriguing question: is it possible to select an option among several choice outcomes without actually choosing between them? Instead of comparing and choosing between options, subjects may only consider the relevant options successively and decide whether to accept or reject them. This is the principle of sequential choice models, which assume that in nature, simultaneous encounters are rare and that mechanisms of choice may be evolutionarily adapted to sequential encounters [118,119,120,121,122,123,124]. Applying this model to the discrete-trial choice procedure, choice between drug and nondrug rewards may not involve simultaneous choice with comparison of option values. Instead, only the relevant preferred option would be considered. Since choices are exclusive in this procedure, habitual selection of the nondrug reward with a short latency automatically forgoes the opportunity to select cocaine. Likewise, drug addicts are unlikely to simultaneously choose between drug and nondrug rewards by comparing option values; they may instead decide whether to carry out their drug-seeking sequence. Therefore, experimental settings involving simultaneous choice between options comparable in value in both human and rodent studies may preclude the observation of habit by requiring assessment and comparison of option values, thereby reengaging goal-directed control. Yet, this “artificial” choice setting may not represent the true decision-making structure faced by drug users in real-world environments. Although more research is needed to assess the validity of these sequential choice models, this new framework could resolve the challenge of the exponential computational cost of MB strategies in real-world environments and the expression of habit despite choice in our experiments [50, 51], and in the broader context of drug seeking in addiction.
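The difference between simultaneous comparison and sequential accept-or-reject decisions can be sketched as follows. This is a deliberately simplified illustration: published sequential choice models [118,119,120,121,122,123,124] may be formulated differently, and the threshold acceptance rule, the function names, and the option values used here are assumptions.

```python
def simultaneous_choice(values):
    """Compare all available options and pick the highest-valued one."""
    return max(values, key=values.get)


def sequential_choice(considered, values, threshold):
    """Evaluate considered options one at a time.

    Accept the first option whose value meets the threshold; return None
    (a pause) if every considered option is rejected.
    """
    for option in considered:
        if values[option] >= threshold:
            return option
    return None


THRESHOLD = 0.5  # hypothetical acceptance threshold

# Hypothetical values at baseline and after saccharin-specific satiety.
baseline = {"saccharin": 1.0, "cocaine": 0.6}
after_satiety = {"saccharin": 0.2, "cocaine": 0.6}

# Only the habitually preferred option enters the sequential decision.
considered = ["saccharin"]

print(simultaneous_choice(baseline), sequential_choice(considered, baseline, THRESHOLD))
print(simultaneous_choice(after_satiety), sequential_choice(considered, after_satiety, THRESHOLD))
```

Under simultaneous comparison, satiety shifts choice to cocaine. Under the sequential rule with saccharin as the only considered option, satiety produces a pause rather than a switch, which resembles the pausing pattern reported in [117].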

Conclusion

We hope it is clear from this review that habits alone cannot account for the development of compulsive drug use and that drug habits are neither necessary [125] nor sufficient [89] to explain the transition to addiction. However, this does not preclude a role for habits in addiction. To what extent, then, are drug habits actually involved? Answering this question is hampered by several limitations. The structure of our procedures generally favors reengagement of goal-directed control, precluding correct assessment of habit. Experiments in animals suffer from a paucity of reward-predictive cues, which does not reflect the sensorial and associative richness of drug addicts’ environment and does not facilitate the development of habit by reducing reinforcement uncertainty. Finally, investigations are limited by the overly narrow view that drug-seeking behavior should be either habitual or goal-directed. Moving forward, we propose to better design instrumental tasks, in the presence of choice and reward-predictive cues, and under conditions of high reinforcement predictability to favor implementation of simple stimulus–response MF policies. Alternative task structures involving sequential rather than simultaneous choice should also be considered. On a theoretical level, we may need to consider a more complex framework taking into account (1) the continuous arbitration between goal-directed and habitual systems, (2) the hierarchical decision-making architectures combining these two systems and (3) alternative sequential decision-making models suggesting that individuals may consider one option at a time when making decisions. Although much remains to be done, our hope is that this review opens up new perspectives to determine the role of habit and choice in addiction.

Funding and disclosure

This work was supported by the French Research Council (CNRS), the Université de Bordeaux, the French National Agency (ANR-2010-BLAN-1404-01), the Ministère de l’Enseignement Supérieur et de la Recherche (MESR), the Fondation pour la Recherche Médicale (FRM DPA20140629788), and the Peter und Traudl Engelhorn foundation. The authors declare no competing interests.