Hardware systems for low-latency audio processing: Event-based and multichannel synchronous sampling approaches


Date

2021

Publication Type

Doctoral Thesis

ETH Bibliography

yes

Abstract

Neuromorphic technology is slowly maturing, with a variety of usable event-driven spiking sensors and hardware implementations of spiking neural networks. Sensory processing algorithms are still under investigation, and their usefulness in natural environments is still relatively unexplored compared to algorithms using conventional sensors and digital hardware. We developed hardware test beds that allow us to explore both event-based sensory processing algorithms and conventional regular-sampling algorithms in real-world conditions. The goal of my thesis is three-fold: 1) to develop a hardware test bed for implementing spiking networks together with spiking sensors, to study the possibility of using multiple sensors of different modalities to improve classification performance in real-world conditions; 2) to implement a local automatic gain control mechanism to increase the input dynamic range of a spiking cochlea operating in natural environments, where the sound dynamic range can be greater than 60 dB; 3) to implement a multi-microphone hardware platform that can be used for real-time beamforming as part of a wireless acoustic sensor network. The first part of the thesis describes the development of a real-time hardware system that fuses information from neuromorphic spiking sensors of different modalities. The core of the system is a general-purpose accelerator for spiking Deep Neural Networks (DNNs) implemented on a Field-Programmable Gate Array (FPGA). We demonstrate the performance of the system on an audio-visual sensor fusion task, using a Dynamic Vision Sensor (DVS) and a Dynamic Audio Sensor (DAS) to classify digits from the Modified National Institute of Standards and Technology (MNIST) dataset augmented with a specific audio tone for each digit. We demonstrate that reliable classification is possible with just a fraction of the spikes produced by the sensors.
On the other hand, processing the full stream of spikes increases the computational demand of the system in proportion to the spike rate. In addition, the spike rate of the audio sensor depends on the input signal amplitude, which makes it difficult to train classifiers that are invariant to input signals with a wide dynamic range. However, it is known that biological auditory and visual processing systems can accommodate input signals that differ by orders of magnitude while maintaining a moderate neuron spike rate. The second part of the thesis addresses the problem of increasing spike rates in response to high-amplitude signals in the spiking silicon cochlea by developing a local spike-based gain control algorithm that continuously monitors the spike rate at the output of each channel and adapts the corresponding channel gain so that the spike rate does not exceed a predefined threshold. We implemented this algorithm in hardware for the Dynamic Audio Sensor Low Power (DASLP) silicon cochlea and studied its performance on synthetic tests and a real audio classification problem. The third part of the thesis work was carried out within a multi-partner European project, COCOHA (COgnitive COntrol of a Hearing Aid, www.cocoha.org), which aimed to develop a system for decoding attention from electroencephalogram (EEG) signals in order to direct the speech of the attended talker to the user of a hearing aid. The goal of this work was to construct a synchronized, distributed multi-microphone platform that can be used for general auditory scene analysis. The developed platform is composed of multi-microphone modules that perform synchronized audio sampling in different parts of the room and transmit the audio streams with low latency to a central processing unit, where the samples from different microphones can be aligned with sub-microsecond precision.
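The channel-local gain control loop described in the second part can be sketched in a few lines. This is a minimal illustrative model, not the DASLP circuit implementation; the parameter names and values (window length, step size, gain limits) are hypothetical, and the real cochlea adapts analog channel gains rather than a dB number in software:

```python
def agc_step(gain_db, spike_count, window_s, max_rate_hz,
             step_db=6.0, min_gain_db=0.0, max_gain_db=48.0):
    """One gain-control update for a single cochlea channel.

    Monitors the channel's output spike rate over a time window and
    lowers the gain when the rate exceeds a threshold, raising it
    again when the rate falls well below it. All parameters are
    illustrative, not the values used on the DASLP chip.
    """
    rate_hz = spike_count / window_s
    if rate_hz > max_rate_hz:
        gain_db -= step_db            # attenuate a loud input
    elif rate_hz < 0.5 * max_rate_hz:
        gain_db += step_db            # recover gain for a quiet input
    # clip to the channel's available gain range
    return max(min_gain_db, min(max_gain_db, gain_db))
```

Running one update per window per channel keeps the control purely local: each channel needs only its own spike counter and gain register, which is what makes the mechanism cheap to implement next to every cochlea channel.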
Synchronized sampling across the ad-hoc distributed microphone array enables a variety of algorithms to be used for further processing, for tasks such as beamforming, source separation, or speech enhancement. The platform was used for testing a set of beamforming algorithms in the wild. All three parts serve the common goal of enabling the application of novel auditory sensing technology in practically relevant settings by coping with the challenges of real-world deployment.
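To illustrate why sample-aligned channels matter, a classic delay-and-sum beamformer over a synchronized array can be sketched as below. This is a generic textbook formulation, not the specific algorithms evaluated in the thesis; it assumes known microphone and source positions and uses integer-sample delays for simplicity (a real implementation would use fractional delays):

```python
import numpy as np

def delay_and_sum(signals, mic_positions, source_pos, fs, c=343.0):
    """Delay-and-sum beamformer over a synchronized microphone array.

    signals: (n_mics, n_samples) array of sample-aligned audio.
    Advances each channel so the wavefront from source_pos lines up
    across all microphones, then averages the aligned channels.
    """
    dists = np.linalg.norm(mic_positions - source_pos, axis=1)
    delays_s = (dists - dists.min()) / c        # arrival delay vs. closest mic
    shifts = np.round(delays_s * fs).astype(int)  # whole samples to drop per channel
    n = signals.shape[1] - shifts.max()
    aligned = np.stack([s[k:k + n] for s, k in zip(signals, shifts)])
    return aligned.mean(axis=0)
```

The point of the sub-microsecond alignment is visible in `shifts`: if the channels were not sampled synchronously, the computed delays would not correspond to sample offsets in the recorded streams, and the coherent summation toward the source would fail.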

Publication status

published

Contributors

Examiner : Liu, Shih-Chii
Examiner : Hahnloser, Richard H.R.
Examiner : Conradt, Jörg

Publisher

ETH Zurich

Subject

sensor fusion; spiking deep neural networks; event-driven sensors; automatic gain control; wireless acoustic sensor networks; wireless synchronization; audio source separation; beamforming

Organisational unit

03774 - Hahnloser, Richard H.R. / Hahnloser, Richard H.R.
08836 - Delbrück, Tobias (Tit.-Prof.)
