Inferring the trial-by-trial structure of pitch reinforcement learning in songbirds
- Doctoral Thesis
Rights / license: In Copyright - Non-Commercial Use Permitted
One of the most effective ways to train animals and humans to selectively change motor behaviors is to reinforce desired behaviors with reward or punishment. Reinforcement learning theory provides a promising framework for modeling behavioral conditioning. It assumes that a reinforced behavioral trial biases future trials via the correlation between exploration and reward. While we know that exploration is necessary for reinforcement learning, which part of motor variability constitutes exploration, and how reinforcement acts on a trial-by-trial basis to improve future behavior, remain unknown. This thesis aims to differentiate between motor exploration (usable for learning via its correlation with reward) and noise (inaccessible for learning), and suggests a simple behavioral model implementing this basic reinforcement learning strategy. Songbirds such as zebra finches provide a tractable model system for studying the neural mechanisms underlying trial-and-error reinforcement learning. The behavioral and neuronal mechanisms of song learning closely resemble those of human speech learning. Furthermore, both spectral and temporal aspects of birdsong can be modified independently by delivering real-time auditory feedback (short bursts of noise) as aversive reinforcement contingent on a trial's pitch (fundamental frequency) or duration. First, we test whether birds require auditory feedback of their exploratory motor behavior in order to learn to adaptively shift their pitch in a targeted direction. We modify the widely used reinforcement learning paradigm: by using visual feedback (brief light-off events) instead of auditory feedback, we can teach deaf (and hearing) zebra finches to selectively modify the pitch of a target syllable.
This shows that reinforcement learning is possible without evaluation of vocal performance, in contrast to song learning in juveniles and song maintenance in adult birds, both of which critically depend on auditory feedback during performance. Hence, birds do not require auditory feedback of their trial-by-trial variation in pitch, which supports reinforcement learning theories assuming that centrally generated variability (movement planning) is used for learning. Second, we identify different types of pitch variability based on their statistical structure and investigate whether they have a central origin, suggesting an exploratory contribution, or a peripheral origin (movement execution), suggesting a noise contribution that cannot be used for learning. In songbirds, the main known source of centrally originating variability is generally assumed to be the lateral magnocellular nucleus of the anterior nidopallium (LMAN), the output nucleus of the anterior forebrain pathway. We perform LMAN lesions and show that the variability component that decreases in proportion to lesion extent is independent across trials, suggesting that the variability injected by LMAN is independent from one trial to the next. Third, we formulate a simple stochastic behavioral model for reinforcement learning of pitch in which learning is based on the correlation between LMAN exploration and reward. In the model, we combine several sources of variability into a single mechanistic framework, modeling each source as a latent state. Simulations of this model correctly reproduce our experimental data: when low-pitched renditions of a target syllable are aversively reinforced, birds shift their pitch so as to avoid the aversive reinforcement (increased pitch). Furthermore, we fit our model to trial-by-trial song data and find that it accounts for a broad range of experimental findings.
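The core learning rule described above, in which a trial's reward outcome biases future trials via its correlation with the exploratory component of variability, can be illustrated with a minimal simulation. This is a sketch only: the parameter values (learning rate, variances, reinforcement threshold) are hypothetical, and the update rule is a generic reward-modulated scheme rather than the fitted model from the thesis.

```python
import random

random.seed(0)

# Hypothetical parameters, chosen for illustration only
BASELINE = 100.0      # baseline pitch (arbitrary units)
SIGMA_EXPLORE = 1.0   # std of trial-independent (LMAN-like) exploration
SIGMA_NOISE = 0.5     # std of execution noise, inaccessible for learning
ETA = 0.05            # learning rate
THRESHOLD = 100.0     # renditions below threshold are aversively reinforced

bias = 0.0            # learned shift of mean pitch
for trial in range(2000):
    explore = random.gauss(0.0, SIGMA_EXPLORE)   # exploratory component
    noise = random.gauss(0.0, SIGMA_NOISE)       # execution noise
    pitch = BASELINE + bias + explore + noise
    reward = -1.0 if pitch < THRESHOLD else 0.0  # aversive contingency
    # Credit assignment: correlate reward with the exploratory deviation
    # only; the noise term never enters the update.
    bias += ETA * reward * explore

print(round(bias, 2))  # bias drifts positive: mean pitch shifts upward
```

Because punished (low-pitch) trials tend to carry negative exploratory deviations, the product of reward and exploration systematically pushes the bias upward, away from the reinforced zone, without the learner ever needing access to the noise term.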
Our model produces excellent fits to pitch conditioning data even when learning trends are non-monotonic due to diurnal rhythms of pitch, and it allows us to estimate the behavioral variance used for exploration from behavioral data alone. The main benefit of our model is that it provides a rich characterization of motor learning. We anticipate that our reductionist behavioral model will be important for studies aiming to dissect the neural underpinnings of operant conditioning in songbirds.
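To illustrate how variance components might be separated from behavioral data alone, the sketch below simulates pitch as a slowly drifting latent state (standing in for diurnal or day-to-day drift) plus a trial-independent term, and recovers the two variances with a method-of-moments shortcut on first differences. The thesis's actual latent-state (Kalman-filter) fitting procedure is more elaborate; all numbers here are assumed for illustration.

```python
import random
import statistics

random.seed(1)

# Assumed generative variances (illustrative, not values from the thesis)
Q = 0.1    # variance of the random-walk increments (slow drift)
R = 1.0    # variance of the trial-independent component

state, pitches = 0.0, []
for _ in range(20000):
    state += random.gauss(0.0, Q ** 0.5)              # latent drift
    pitches.append(state + random.gauss(0.0, R ** 0.5))  # observed pitch

# Method of moments on first differences d_t = y_t - y_{t-1}:
#   Var(d_t) = Q + 2R   and   Cov(d_t, d_{t+1}) = -R
d = [b - a for a, b in zip(pitches, pitches[1:])]
mean_d = statistics.fmean(d)
var_d = statistics.variance(d)
cov1 = statistics.fmean(
    (x - mean_d) * (y - mean_d) for x, y in zip(d, d[1:])
)
R_hat = -cov1
Q_hat = var_d - 2 * R_hat
print(round(Q_hat, 3), round(R_hat, 3))  # estimates close to Q and R
```

The slow drift induces positive correlations across trials while the trial-independent term contributes a characteristic negative lag-1 covariance to the differenced series, which is what makes the two components identifiable from pitch sequences without any neural measurement.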
Subjects: songbird; neuroscience; Kalman filter; variability; deafness; sensory feedback; reinforcement learning
Organisational unit: 03774 - Hahnloser, Richard H.R.