Inferring the trial-by-trial structure of pitch reinforcement learning in songbirds

Open access
Date
2019
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
One of the most effective ways to train animals and humans to selectively change motor
behaviors is by reinforcing desired behaviors using reward or punishment. Reinforcement
learning theory provides a promising framework for modeling behavioral conditioning. It
assumes that a reinforced behavioral trial biases future trials via the correlation between
exploration and reward. While we know that exploration is necessary for reinforcement
learning, which part of motor variability constitutes exploration, and how reinforcement acts
on a trial-by-trial basis to improve future behavior, remain unknown. This thesis aims to
differentiate between motor exploration (used for learning via its correlation with reward) and
noise (inaccessible for learning) and suggests a simple behavioral model implementing this
basic reinforcement learning strategy.
Songbirds such as zebra finches provide a tractable model system to study neural
mechanisms underlying trial-and-error reinforcement learning. Both the learning process and
the neuronal mechanisms of song learning closely resemble those of human speech learning.
Furthermore, both spectral and temporal aspects of birdsong can be modified
independently by delivering real-time auditory feedback (short bursts of noise) as aversive
reinforcement contingent on the trials’ pitch (fundamental frequency) or duration.
First, we test whether birds require auditory feedback of their exploratory motor behavior to
learn to adaptively shift their pitch in a targeted direction. We modify the widely used
reinforcement learning paradigm: by using visual feedback (brief light-off events) instead of
auditory feedback, we teach deaf (and hearing) zebra finches to selectively modify their
syllable's pitch. This shows that reinforcement learning is possible without evaluation of vocal
performance, in contrast to song learning in juveniles and song maintenance in adult birds,
both of which critically depend on auditory feedback during performance. Hence, birds do not
require auditory feedback of their trial-by-trial variation in pitch, which supports
reinforcement learning theories in which centrally generated variability (movement planning)
drives learning.
Second, we identify different types of pitch variability based on their statistical structure and
investigate whether each has a central origin, suggesting an exploratory contribution, or a
peripheral origin (movement execution), suggesting a noise contribution that cannot be used
for learning. In songbirds, the main known source of centrally generated variability is the
lateral magnocellular nucleus of the anterior nidopallium (LMAN), the output nucleus of the
anterior forebrain pathway. We lesion LMAN and show that the type of variability that
decreases in proportion to the lesion extent is independent for each trial, suggesting that the
variability injected by LMAN is independent from one trial to the next.
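The statistical signature exploited here — whether variability is independent across trials — can be illustrated with a lag-1 autocorrelation check. The following is a minimal sketch on synthetic data (the thesis's actual latent-state analysis is more involved): a trial-independent source shows near-zero lag-1 autocorrelation, while a slowly drifting source stays correlated.

```python
import random

def lag1_autocorrelation(series):
    """Lag-1 autocorrelation of a trial-by-trial pitch series."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series) / n
    cov = sum((series[i] - mean) * (series[i + 1] - mean)
              for i in range(n - 1)) / n
    return cov / var

# Synthetic illustration (assumed parameters, not fitted values):
rng = random.Random(0)
# Trial-independent (white) variability, e.g. injected anew on every trial:
white = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# Slowly varying AR(1) source, e.g. a drifting peripheral or motivational state:
drift, x = [], 0.0
for _ in range(5000):
    x = 0.95 * x + rng.gauss(0.0, 1.0)
    drift.append(x)
```

Here `lag1_autocorrelation(white)` is close to zero while `lag1_autocorrelation(drift)` is close to the AR coefficient, which is the kind of contrast that lets statistically distinct variability types be separated before relating them to LMAN.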
Third, we formulate a simple stochastic behavioral model for reinforcement learning of pitch
where learning is based on the correlation between LMAN exploration and reward. In the
model, we combine several sources of variability into a single mechanistic framework where
each source of variability is modeled as a latent state. Simulations of this model correctly
reproduce our experimental data: When low-pitched renditions of a target syllable are
aversively reinforced, birds modify their pitch with a tendency to avoid the aversive
reinforcement (increased pitch).

Furthermore, we fit trial-by-trial song data to our model, which can account for a broad range of
experimental findings. Our model produces excellent fits to pitch conditioning data even when
learning trends are non-monotonic due to diurnal rhythms of pitch and allows estimating the
behavioral variance used for exploration from behavioral data alone. The main benefit of our
model is to provide a rich characterization of motor learning. We anticipate that our
reductionist behavioral model will be of importance for studies aiming to dissect the neural
underpinnings of operant conditioning in songbirds.
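The core learning rule the model builds on — biasing future trials via the correlation between exploration and reinforcement — can be sketched in a few lines. This is a hypothetical minimal simulation, not the thesis's fitted latent-state model; the learning rate, exploration width, and punishment threshold are all assumed values.

```python
import random

def simulate_pitch_learning(n_trials=2000, threshold=0.0, seed=1):
    """Minimal reward-modulated trial-by-trial pitch learning sketch.

    Each trial's pitch is a learned bias plus centrally generated
    exploration; aversive reinforcement on low-pitched trials shifts the
    bias away from the punished exploration.
    """
    rng = random.Random(seed)
    bias = 0.0      # learned pitch shift relative to baseline (arbitrary units)
    eta = 0.05      # learning rate (assumed value)
    sigma = 0.5     # exploratory standard deviation (assumed value)
    for _ in range(n_trials):
        exploration = rng.gauss(0.0, sigma)   # centrally generated variability
        pitch = bias + exploration
        aversive = 1.0 if pitch < threshold else 0.0  # punish low-pitched trials
        # Correlation-based update: exploration that co-occurred with
        # punishment is discouraged on future trials.
        bias -= eta * aversive * exploration
    return bias
```

Running `simulate_pitch_learning()` yields a positive final bias: when low-pitched renditions are punished, the simulated bird drifts toward higher pitch, qualitatively matching the escape behavior described above.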
Permanent link
https://doi.org/10.3929/ethz-b-000382074
Publication status
published
External links
Search print copy at ETH Library
Publisher
ETH Zurich
Subject
songbird; neuroscience; Kalman filter; variability; deafness; sensory feedback; reinforcement learning
Organisational unit
03774 - Hahnloser, Richard H.R. / Hahnloser, Richard H.R.