Neural Networks

Volume 143, November 2021, Pages 751-758

2021 Special Issue on AI and Brain Science: Brain-inspired AI

Active sensing with artificial neural networks

https://doi.org/10.1016/j.neunet.2021.08.007

Abstract

The fitness of behaving agents depends on their knowledge of the environment, which demands efficient exploration strategies. Active sensing formalizes exploration as reduction of uncertainty about the current state of the environment. Despite strong theoretical justifications, active sensing has had limited applicability due to the difficulty of estimating information gain. Here we address this issue by proposing a linear approximation to information gain and by implementing efficient gradient-based action selection within an artificial neural network setting. We compare our information gain estimation with the state of the art, and validate our model on an active sensing task based on the MNIST dataset. We also propose an approximation that exploits the amortized inference network, and performs equally well in certain contexts.

Introduction

Decision making may be seen as a tradeoff between exploitation (maximizing future reward based on past experience) and exploration (getting more information about the environment) (Sutton & Barto, 2018). Here we focus on active sensing as a particular form of exploration (Yang, Wolpert, & Lengyel, 2016). Suppose that the environment emits observations x with probability p*(x). If an agent intends to plan and evaluate the informativeness of its actions, it needs to form a probabilistic model of the environment, p(x). A canonical functional to measure, and consequently reduce, the mismatch of beliefs, in this case between the agent's model and the real world, is the Kullback–Leibler divergence D_KL[p*(x) || p(x)] = E_{p*(x)} log p*(x) − E_{p*(x)} log p(x) (Cover & Thomas, 2005). The first term does not depend on the agent's model, and thus can be treated as a constant, leaving only the negative log likelihood (NLL), −E_{p*(x)} log p(x), to minimize. Practically, the model may represent a certain structure, or, more often, hyperparameters that define a particular model within a family with a chosen structure. Furthermore, a given model often contains latent (unknown) variables that can be broadly classified into those that change on shorter timescales (e.g. an observation's hidden causes, denoted by a random variable z) and those that change on longer timescales (e.g. model parameters θ): p(x) = E_{p(z,θ)} p(x|z,θ). Therefore, the problem of directed exploration can be formulated as gaining information about the latent variables of the model (Fig. 1-a). While active learning (MacKay, 1992; Settles, 2012) is focused on resolving uncertainty about the parameters θ, reflecting the statistical structure of the environment, we focus on active sensing: gaining information about the hidden causes z, latent variables that change on a trial-by-trial timescale (Yang, Wolpert et al., 2016). Importantly, the latent variable represents both the global context (e.g. the layout of a maze or the location of a reward), which the agent wants to figure out, and the local context (e.g. the compressed current observation, or the position within the maze), the prior belief over which depends on action (Huys, Guitart-Masip, Dolan, & Dayan, 2015) (Fig. 1-b). Active sensing is an important problem both in pattern recognition (i.e. deciding which features to collect; Yu, Krishnapuram, Rosales, & Rao, 2009) and in neuroscience, as the pattern of human eye movements during visual exploration has been shown to optimize resolution of uncertainty about the underlying context (Friston et al., 2012; Hoppe & Rothkopf, 2019; Yang, Lengyel et al., 2016; Yarbus, 1967).
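The equivalence between minimizing the KL divergence and minimizing the NLL can be checked numerically. The sketch below is a toy illustration of our own (the unit-variance Gaussians and the mean mu are invented choices, not part of the paper's model): it compares a Monte Carlo estimate of D_KL[p*(x) || p(x)] against its analytic value for a misspecified Gaussian model.

```python
import numpy as np

rng = np.random.default_rng(0)

mu = 0.5                      # model mean; the true environment mean is 0
n = 100_000
x = rng.standard_normal(n)    # samples from p*(x) = N(0, 1)

# Monte Carlo estimate of E_{p*}[log p*(x) - log p(x)], i.e. the KL divergence.
log_ratio = (-x**2 / 2) - (-(x - mu)**2 / 2)
kl_mc = log_ratio.mean()

kl_analytic = mu**2 / 2       # closed form for KL between N(0,1) and N(mu,1)
print(kl_mc, kl_analytic)     # the two agree up to Monte Carlo error
```

Since E_{p*} log p*(x) does not depend on mu, the mu that minimizes the Monte Carlo NLL is exactly the one that minimizes the KL estimate, which is the point made above.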

Previous work on active sensing has focused on tractable but limited scenarios, using kernel methods (Yu et al., 2009), Gaussian mixture models (Yang, Wolpert et al., 2016) or entirely discrete domains (Friston et al., 2015). Here, we set out to investigate the case in which both the observations x and the learned latent representations z are continuous. The popular linear Gaussian model is not fit for active sensing, since the amount of uncertainty reduction is constant regardless of the observation (Bishop, 2006). In contrast, implementing active sensing with an arbitrary nonlinear relationship could be difficult in part because of statistical limitations of information gain estimation. In particular, it has been shown that in the frequent scenario of an intractable p(x), unbiased estimates of mutual information computed from N samples cannot be larger than O(log N) (McAllester & Stratos, 2020; Poole et al., 2019).
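The claim that a linear Gaussian model yields a constant amount of uncertainty reduction can be verified directly: the posterior covariance of z given x depends only on the likelihood matrix and the noise level, never on the observed value of x. A minimal sketch (the matrix A and noise scale sigma2 are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
A = rng.standard_normal((d, d))   # likelihood: p(x|z) = N(A z, sigma2 * I)
sigma2 = 0.5

# Posterior covariance for prior p(z) = N(0, I):
# Sigma_post = (I + A^T A / sigma2)^(-1).  The observation x never appears.
precision = np.eye(d) + A.T @ A / sigma2
Sigma_post = np.linalg.inv(precision)

# Entropy reduction (information gain) = 0.5 * log det(Sigma_prior) / det(Sigma_post)
info_gain = 0.5 * np.linalg.slogdet(precision)[1]
print(info_gain)                  # identical for every possible observation
```

Because x cancels out of the posterior covariance, every action promising a linear Gaussian observation carries the same information gain, which is why this model family is unsuitable for ranking actions.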

The main idea of this paper relies on the insight that neural networks with rectifying activations implement piecewise linear functions over the input space (Hanin & Rolnick, 2019; Park et al., 2019). Thus, we can both learn flexible representations (LeCun, Bengio, & Hinton, 2015) and compute a sensible (sampling-free) measure of information gain. First, we illustrate that the structure of the relation between z and x, i.e. p(x|z), has a key role in the potential information gain. Then, we describe how the Laplace approximation can be effectively used to quantify information gain in piecewise linear networks, complete the model by specifying its dynamics, and apply the approach to an active sensing (saccade simulation) task based on the MNIST dataset.
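The piecewise linearity that this idea exploits is easy to demonstrate: within a fixed activation pattern, a ReLU network is exactly affine, so its local behavior is captured by an effective weight matrix and bias. A sketch with a random two-layer network (the weights are arbitrary; this illustrates the mathematical fact, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(2)
W1, b1 = rng.standard_normal((8, 4)), rng.standard_normal(8)
W2, b2 = rng.standard_normal((3, 8)), rng.standard_normal(3)

def forward(x):
    # Two-layer network with ReLU activations
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

x = rng.standard_normal(4)
mask = (W1 @ x + b1 > 0).astype(float)   # activation pattern at x

# Effective affine map, valid throughout the linear region containing x:
A_eff = W2 @ (mask[:, None] * W1)
b_eff = W2 @ (mask * b1) + b2

assert np.allclose(forward(x), A_eff @ x + b_eff)
```

Within each such region the network behaves as a linear Gaussian model, which is what makes a sampling-free, locally linear treatment of information gain possible.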

Section snippets

Materials and methods

Suppose that the agent's beliefs about the next observation x, given a hypothetical action a, have the following structure: p(x|a) = E_{p(z|a)} p(x|z) (we omit the time index and non-essential variables in the conditioning sets for clarity). In active sensing, information gain is quantified as the mutual information I(z;x|a) = H[p(z|a)] − E_{p(x|a)} H[p(z|x,a)] (leaving the conditioning on model parameters θ implicit). A detailed intuition on using I in the context of active sensing as well as comparison with
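As a concrete, deliberately simple illustration of the quantity I(z;x|a) = H[p(z|a)] − E_{p(x|a)} H[p(z|x,a)], consider a binary latent z and two hypothetical actions: one whose observation is informative about z and one whose observation is pure noise. All the probabilities below are invented for illustration:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log(p)).sum()          # in nats

def info_gain(prior, lik):
    """lik[z, x] = p(x | z, a); returns I(z; x | a)."""
    prior = np.asarray(prior, dtype=float)
    joint = prior[:, None] * lik           # p(z, x | a)
    px = joint.sum(axis=0)                 # p(x | a)
    post = joint / px                      # p(z | x, a), one column per x
    h_post = sum(px[i] * entropy(post[:, i]) for i in range(len(px)))
    return entropy(prior) - h_post

prior = [0.5, 0.5]
informative = np.array([[0.9, 0.1],
                        [0.1, 0.9]])       # observation tracks z
noisy = np.array([[0.5, 0.5],
                  [0.5, 0.5]])             # observation ignores z

print(info_gain(prior, informative))       # ~0.368 nats
print(info_gain(prior, noisy))             # 0.0 (nothing to learn)
```

An active sensing agent facing these two actions would select the informative one, since it maximizes the expected reduction in posterior entropy over z.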

Unbiased information gain estimates

We first validated our approach by comparing it with other popular information gain measures on a benchmark proposed by Belghazi et al. (2018) and Poole et al. (2019): two 20-dimensional random variables z, x, correlated across corresponding dimensions: p(z) = N(0, I_d); p(x) = N(0, I_d); p(x|z) = N(ρ I_d z, (1 − ρ²) I_d). Additionally, we implemented a nonlinear problem by following the same setup but applying the transformation x → x³ at the end (Song & Ermon, 2020). In both cases, the true mutual information is known: I(
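The Gaussian part of this benchmark can be reproduced in a few lines; for this setup the closed-form value is I(z;x) = −(d/2) log(1 − ρ²). A sketch (ρ = 0.5 here is an illustrative choice, not necessarily the value swept in the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(3)
d, rho, n = 20, 0.5, 50_000

z = rng.standard_normal((n, d))            # p(z) = N(0, I_d)
# p(x|z) = N(rho * z, (1 - rho**2) * I_d), so marginally x ~ N(0, I_d)
x = rho * z + np.sqrt(1 - rho**2) * rng.standard_normal((n, d))

true_mi = -0.5 * d * np.log(1 - rho**2)    # ground-truth mutual information in nats
emp_corr = np.mean([np.corrcoef(z[:, i], x[:, i])[0, 1] for i in range(d)])
print(true_mi, emp_corr)                   # emp_corr should be close to rho
```

Having the ground-truth value makes this setup a standard yardstick for comparing mutual information estimators, including the sample-size limitations discussed above.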

Discussion

Active sensing has recently been studied within the 'planning as inference' framework (Botvinick & Toussaint, 2012; Levine, 2018) in the context of discrete domains (Friston et al., 2015; Schwartenbeck et al., 2019). In contrast, we focused on continuous states and observations, leveraging recent advances in probabilistic dynamical models (Chung et al., 2015; Gemici et al., 2017) that have been successfully used for building model-based reinforcement learning agents. For example, Hafner et al.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Funding sources

This research was funded by Fonds de la Recherche Scientifique (FNRS–FDP) Belgium, IDEX Bordeaux, France and ANR JCJC, France (ANR-18-CE37-0009-01). The funders had no involvement in study design; collection, analysis and interpretation of data; writing of the report; and in the decision to submit the article for publication.

References (54)

  • Alemi, A. A., et al. (2017). Deep variational information bottleneck. In 5th International Conference on Learning Representations (ICLR 2017), Toulon, France, April 24-26, 2017, Conference Track Proceedings.
  • Belghazi, M. I., Baratin, A., Rajeswar, S., Ozair, S., Bengio, Y., & Courville, A., et al. (2018). Mutual information...
  • Bishop, C. M. (2006). Pattern recognition and machine learning (information science and statistics).
  • Bishop, C. M., et al. Regression with input-dependent noise: A Bayesian treatment.
  • Botvinick, M., et al. (2012). Planning as inference. Trends in Cognitive Sciences.
  • Buesing, L., et al. (2018). Learning and querying fast generative models for reinforcement learning.
  • Burda, Y., et al. (2019). Large-scale study of curiosity-driven learning. In 7th International Conference on Learning Representations (ICLR 2019), New Orleans, LA, USA, May 6-9, 2019.
  • Burda, Y., et al. (2019). Exploration by random network distillation. In 7th International Conference on Learning Representations (ICLR 2019), New Orleans, LA, USA, May 6-9, 2019.
  • Cho, K., et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation.
  • Chung, J., et al. (2015). A recurrent latent variable model for sequential data.
  • Cover, T. M., et al. (2005). Elements of information theory.
  • De Boer, P. T., et al. (2005). A tutorial on the cross-entropy method. Annals of Operations Research.
  • Friston, K., et al. (2015). Active inference and epistemic value. Cognitive Neuroscience.
  • Friston, K., et al. (2012). Free-energy minimization and the dark-room problem. Frontiers in Psychology.
  • Gemici, M., et al. (2017). Generative temporal models with memory. CoRR.
  • Glorot, X., et al. (2011). Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics.
  • Guez, A., et al. (2018). Learning to search with MCTSnets. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018.
  • Hafner, D., et al. (2019). Learning latent dynamics for planning from pixels. In Proceedings of the 36th International Conference on Machine Learning.
  • Hanin, B., et al. Complexity of linear regions in deep networks.
  • He, J., Spokoyny, D., Neubig, G., & Berg-Kirkpatrick, T. (2019). Lagging inference networks and posterior collapse in...
  • Henaff, M., et al. (2017). Model-based planning with discrete and continuous actions.
  • Hoppe, D., et al. (2019). Multi-step planning of eye movements in visual search. Scientific Reports.
  • Houthooft, R., et al. VIME: Variational information maximizing exploration.
  • Huys, Q. J., et al. (2015). Decision-theoretic psychiatry. Clinical Psychological Science.
  • Kingma, D. P., et al. Adam: A method for stochastic optimization.
  • Kingma, D. P., & Welling, M. (2014). Auto-Encoding Variational Bayes. In 2nd International Conference on Learning...
  • Klyubin, A., Polani, D., & Nehaniv, C. (2005). Empowerment: a universal agent-centric measure of control. 1, In 2005...