Sunday, December 8, 2013

Explorative learning of inverse models: a theoretical perspective

Optimization and exploratory learning have a non-negative relation
Learning inverse models is an old story in the motor learning literature. It is well known to work with optimization approaches such as "feedback-error" learning or learning with a "distal teacher", but these approaches require prior knowledge. The alternative is to learn from self-explored data by fitting an inverse model to it. In some cases this works, in other cases it is known to fail, and some cases have only become tractable with goal babbling. Despite the very wide applicability (and actual application!) of inverse models, these aspects are heavily undertheorized. When does it work, and why? Which solutions are selected? What is the relation between exploratory learning and optimization? And how does all of that change when the data is not a fixed set, but comes from dynamic exploration such as goal babbling?
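To fix ideas, here is the setting in compact form. The notation is mine and only sketches the standard linear, redundant case discussed below, not the paper's exact formalism:

```latex
% Actions x in R^n produce outcomes y in R^m, with m < n (redundancy).
% Forward model: y = f(x); linear case: y = A x, A of size m x n with full row rank.
% An inverse model g is valid if it reaches every goal y*:
%   f(g(y^*)) = y^*          (linear case: A G = I_m).
% With redundancy there are infinitely many valid G; the least-squares
% (minimum-norm) choice is the Moore-Penrose pseudo-inverse:
\[
  f\bigl(g(y^*)\bigr) = y^*, \qquad A\,G = I_m, \qquad
  A^{+} = A^{\top}\,(A A^{\top})^{-1}.
\]
```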
Convergence of learning towards the Moore-Penrose pseudo-inverse
Our recent paper (see below) gives some answers to these questions. It only considers linear domains, but covers arbitrarily high dimensions and redundancy. This obviously leaves questions open for the non-linear case, but already yields some very non-trivial findings. Some highlights:
  • We prove that the gradients of optimization and exploratory learning satisfy a non-negative relation.
  • We prove that any fixpoint of exploratory learning for any data must be an actual inverse model.
  • We prove that exploratory learning with goal babbling not only works, but converges to the optimal least-squares solution, i.e. the Moore-Penrose pseudo-inverse (a toy numerical sketch of this is given after the list).
  • We show that the basic learning dynamics of goal babbling resemble those of explosive combustion processes. This gives a neat view on the previous finding that goal babbling constitutes a positive feedback loop – and explains the S-shaped learning curves also observed in human learning.
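As a quick illustration of the convergence result, here is a toy numerical sketch: a redundant linear system explored by a simple online goal-babbling loop (pick a goal, act with the current inverse estimate plus exploration noise, regress the executed action onto the observed outcome). This is my own minimal variant for illustration, not the exact update rule analyzed in the paper; the forward matrix A, the noise level, and the learning rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Redundant linear forward model: 5-D actions x produce 2-D outcomes y = A @ x.
m, n = 2, 5
A = rng.normal(size=(m, n))
A_pinv = np.linalg.pinv(A)          # Moore-Penrose pseudo-inverse (reference solution)

G = np.zeros((n, m))                # inverse-model estimate, initially "knows nothing"
eta, sigma = 0.02, 0.3              # learning rate and exploration-noise level
errors = []

for t in range(20000):
    y_goal = rng.normal(size=m)                      # goal babbling: pick a goal
    x = G @ y_goal + sigma * rng.normal(size=n)      # act with current inverse + exploration noise
    y = A @ x                                        # observe the actual outcome
    G += eta * np.outer(x - G @ y, y)                # regress executed action onto observed outcome
    errors.append(np.linalg.norm(A @ G - np.eye(m))) # distance from being a valid inverse (A G = I)

print("||A G - I|| after learning:   ", errors[-1])
print("||G - A^+|| (pseudo-inverse): ", np.linalg.norm(G - A_pinv))
# Plotting `errors` over time gives the learning curve; with a constant learning
# rate it settles at a small noise floor rather than exactly zero.
```

Whether the resulting curve shows the S-shape discussed above depends on the parameters and the exact scheme; the point of the sketch is only that the estimate G approaches the pseudo-inverse rather than some other valid inverse.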
Rolf, M., and J.J. Steil, "Explorative Learning of Inverse Models: a Theoretical Perspective", Neurocomputing, in press (available online).
Abstract — We investigate the role of redundancy for exploratory learning of inverse functions, where an agent learns to achieve goals by performing actions and observing outcomes. We present an analysis of linear redundancy and investigate goal-directed exploration approaches, which are empirically successful but hardly theorized except for negative results in special cases, and prove convergence to the optimal solution. We show that the learning curves of such processes are intrinsically low-dimensional and S-shaped, which explains previous empirical findings, and finally compare our results to non-linear domains.