Chang Gao
Search Results
Publications 1 - 10 of 25
- Hardware Neural Control of CartPole and F1TENTH Race Car
  Item type: Working Paper
  Venue: arXiv
  Authors: Paluch, Marcin; Bolli, Florian; Deng, Xiang; et al. (2024)
  Abstract: Nonlinear model predictive control (NMPC) has proven to be an effective control method, but it is expensive to compute. This work demonstrates the use of hardware FPGA neural network controllers trained to imitate NMPC with supervised learning. We use these Neural Controllers (NCs), implemented on inexpensive embedded FPGA hardware, for high-frequency control of a physical cartpole and an F1TENTH race car. Our results show that the NCs match the control performance of the NMPCs in simulation and outperform them in reality, owing to the faster control rate afforded by the quick FPGA NC inference. We demonstrate kHz control rates for the physical cartpole and offloading of control to the FPGA hardware on the F1TENTH car. Code and hardware implementation for this paper are available at https://github.com/SensorsINI/Neural-Control-Tools.
- DeltaKWS: A 65nm 36nJ/Decision Bio-Inspired Temporal-Sparsity-Aware Digital Keyword Spotting IC with 0.6V Near-Threshold SRAM
  Item type: Journal Article
  Venue: IEEE Transactions on Circuits and Systems for Artificial Intelligence
  Authors: Chen, Qinyu; Kim, Kwantae; Gao, Chang; et al. (2025)
  Abstract: This paper introduces DeltaKWS, to the best of our knowledge the first ΔRNN-enabled, fine-grained temporal-sparsity-aware keyword spotting (KWS) integrated circuit (IC) for voice-controlled devices. The 65 nm prototype chip features several techniques to enhance performance, area, and power efficiency, specifically: 1) a bio-inspired delta-gated recurrent neural network (ΔRNN) classifier that leverages temporal similarities between neighboring feature vectors extracted from input frames and network hidden states, eliminating unnecessary operations and memory accesses; 2) an infinite impulse response (IIR) bandpass filter (BPF)-based feature extractor (FEx) that leverages mixed-precision quantization, a low-cost computing structure, and channel selection; 3) a 24 kB 0.6 V near-VTH weight static random-access memory (SRAM) that achieves 6.6× lower read power than the foundry-provided SRAM. Chip measurement results show that DeltaKWS achieves 11/12-class Google Speech Command Dataset (GSCD) accuracies of 90.5%/89.5%, respectively, and an energy consumption of 36 nJ/decision in a 65 nm CMOS process. At 87% temporal sparsity, computing latency and energy/inference are reduced by 2.4×/3.4×, respectively. The IIR BPF-based FEx, ΔRNN accelerator, and 24 kB near-VTH SRAM blocks occupy 0.084 mm², 0.319 mm², and 0.381 mm², respectively (0.78 mm² in total).
- An Area-Efficient Ultra-Low-Power Time-Domain Feature Extractor for Edge Keyword Spotting
  Item type: Conference Paper
  Venue: IEEE ISCAS 2023 Symposium Proceedings
  Authors: Chen, Qinyu; Chang, Yaoxing; Kim, Kwantae; et al. (2023)
  Abstract: Keyword spotting (KWS) is an important task on low-power edge audio devices. A typical edge KWS system consists of a front-end feature extractor that outputs mel-scale frequency cepstral coefficient (MFCC) features, followed by a back-end neural network classifier. Edge KWS designs aim for the best power-performance-area metrics. This work proposes an area-efficient, ultra-low-power, time-domain infinite impulse response (IIR) filter-based feature extractor for a KWS system. It uses a serial architecture, further optimized with a low-cost computing structure and mixed-precision bit selection of the IIR coefficients while maintaining good KWS accuracy. In a 65 nm process technology with a back-end neural network classifier, the simulated feature extractor occupies 0.02 mm², consumes 3.3 μW at 1.2 V, and achieves 92.5% accuracy on a 10-keyword, 12-class KWS task using the GSCD dataset.
- EdgeDRNN: Enabling Low-latency Recurrent Neural Network Edge Inference
  Item type: Conference Paper
  Venue: 2020 2nd IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)
  Authors: Gao, Chang; Rios-Navarro, Antonio; Chen, Xi; et al. (2020)
  Abstract: This paper presents a Gated Recurrent Unit (GRU)-based recurrent neural network (RNN) accelerator called EdgeDRNN, designed for portable edge computing. EdgeDRNN adopts the spiking-neural-network-inspired delta network algorithm to exploit temporal sparsity in RNNs, reducing off-chip memory access by a factor of up to 10× with tolerable accuracy loss. Experimental results on a 10-million-parameter 2-layer GRU RNN, with weights stored in DRAM, show that EdgeDRNN computes it in under 0.5 ms. With 2.42 W wall-plug power on an entry-level USB-powered FPGA board, it achieves latency comparable to a 92 W Nvidia 1080 GPU and outperforms the NVIDIA Jetson Nano, Jetson TX2, and Intel Neural Compute Stick 2 in latency by 6×. For a batch size of 1, EdgeDRNN achieves a mean effective throughput of 20.2 GOp/s and a wall-plug power efficiency over 4× higher than all other platforms.
- EILE: Efficient Incremental Learning on the Edge
  Item type: Conference Paper
  Venue: 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)
  Authors: Chen, Xi; Gao, Chang; Delbrück, Tobias; et al. (2021)
  Abstract: This paper proposes a fully-connected network training architecture called EILE, targeting incremental learning on the edge. Using a novel reconfigurable processing element (PE) architecture, EILE avoids the explicit transposition of weight matrices required for backpropagation, preserving the same efficient memory access pattern for both the forward propagation (FP) and backward propagation (BP) phases. Experimental results on a Zynq XC7Z100 FPGA with 64 PEs show that EILE achieves 19.2 GOp/s peak throughput and maintains nearly 100% PE utilization efficiency for both FP and BP with batch sizes from 1 to 32. EILE's small on-chip memory footprint and scalability to match any available off-chip memory bandwidth make it an attractive ASIC architecture for energy-constrained training.
- FrameFire: Enabling Efficient Spiking Neural Network Inference for Video Segmentation
  Item type: Conference Paper
  Venue: 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)
  Authors: Chen, Qinyu; Sun, Congyi; Gao, Chang; et al. (2023)
  Abstract: Fast video recognition is essential for real-time scenarios, e.g., autonomous driving. However, applying existing deep neural networks (DNNs) to individual high-resolution images is expensive due to large model sizes. Spiking neural networks (SNNs) have been developed as a promising alternative to DNNs due to their more realistic brain-inspired computing models. SNNs exhibit sparse neuron firing over time, i.e., spatio-temporal sparsity, which can be exploited for energy-efficient computation. However, exploiting the spatio-temporal sparsity of SNNs in hardware leads to unpredictable and unbalanced workloads, degrading energy efficiency. This work therefore proposes an SNN accelerator called FrameFire for efficient video processing. We introduce a Keyframe-dominated Workload Balance Schedule (KWBS) method: it accelerates the image recognition network with sparse keyframes, then records and analyzes the current workload distribution on hardware to facilitate workload scheduling in subsequent frames. FrameFire is implemented on a Xilinx XC7Z035 FPGA and verified on video segmentation tasks. The results show that throughput is improved by 1.7× with the KWBS method; FrameFire achieves 1.04 kFPS throughput and 1.15 mJ/frame recognition energy.
- Real-time Speech Recognition for IoT Purpose using a Delta Recurrent Neural Network Accelerator
  Item type: Conference Paper
  Venue: Proceedings of the 2019 IEEE International Symposium on Circuits and Systems (ISCAS)
  Authors: Gao, Chang; Braun, Stefan; Kiselev, Ilya; et al. (2019)
- A 23μW Solar-Powered Keyword-Spotting ASIC with Ring-Oscillator-Based Time-Domain Feature Extraction
  Item type: Conference Paper
  Venue: Digest of Technical Papers, 2022 IEEE International Solid-State Circuits Conference (ISSCC)
  Authors: Kim, Kwantae; Gao, Chang; Zenhas Graça, Rui Pedro; et al. (2022)
  Abstract: Voice-controlled interfaces on acoustic Internet-of-Things (IoT) sensor nodes and mobile devices require integrated low-power always-on wake-up functions, such as Voice Activity Detection (VAD) and Keyword Spotting (KWS), to ensure longer battery life. Most VAD and KWS ICs have focused on reducing the power of the feature extractor (FEx), as it is the most power-hungry building block. A serial Fast Fourier Transform (FFT)-based KWS chip [1] achieved 510 nW; however, it suffered from a high 64 ms latency and was limited to detecting only 1-to-4 keywords (2-to-5 classes). Although the analog FEx designs [2]-[3] for VAD/KWS reported 0.2 μW-to-1 μW power and 10 ms-to-100 ms latency, neither demonstrated >5 classes in keyword detection. In addition, their voltage-domain implementations cannot benefit from process scaling, because the low supply voltage reduces signal swing, and the degradation of intrinsic gain forces transistors to have larger lengths and poor linearity.
- To Spike or Not to Spike: A Digital Hardware Perspective on Deep Learning Acceleration
  Item type: Journal Article
  Venue: IEEE Journal on Emerging and Selected Topics in Circuits and Systems
  Authors: Ottati, Fabrizio; Gao, Chang; Chen, Qinyu; et al. (2023)
  Abstract: As deep learning models scale, they become increasingly competitive across domains spanning from computer vision to natural language processing; however, this comes at the expense of efficiency, since they require increasingly more memory and computing power. The power efficiency of the biological brain outperforms any large-scale deep learning (DL) model; thus, neuromorphic computing tries to mimic brain operations, such as spike-based information processing, to improve the efficiency of DL models. Despite the benefits of the brain, such as efficient information transmission, dense neuronal interconnects, and the co-location of computation and memory, the available biological substrate has severely constrained the evolution of biological brains. Electronic hardware does not have the same constraints; therefore, while modeling spiking neural networks (SNNs) might uncover one piece of the puzzle, the design of efficient hardware backends for SNNs needs further investigation, potentially taking inspiration from the work done on the artificial neural network (ANN) side. As such, when is it wise to look at the brain while designing new hardware, and when should it be ignored? To answer this question, we quantitatively compare the digital hardware acceleration techniques and platforms of ANNs and SNNs. We provide the following insights: (i) ANNs currently process static data more efficiently; (ii) applications targeting data produced by neuromorphic sensors, such as event-based cameras and silicon cochleas, need more investigation, since the behavior of these sensors might naturally fit the SNN paradigm; and (iii) hybrid approaches combining SNNs and ANNs might lead to the best solutions and should be investigated further at the hardware level, accounting for both efficiency and loss optimization.
- 3ET: Efficient Event-based Eye Tracking using a Change-Based ConvLSTM Network
  Item type: Conference Paper
  Venue: 2023 IEEE Biomedical Circuits and Systems Conference (BioCAS)
  Authors: Chen, Qinyu; Wang, Zuowen; Liu, Shih-Chii; et al. (2023)
  Abstract: This paper presents a sparse Change-Based Convolutional Long Short-Term Memory (CB-ConvLSTM) model for event-based eye tracking, a key technology for next-generation wearable healthcare devices such as AR/VR headsets. We leverage the benefits of retina-inspired event cameras, namely their low-latency response and sparse output event stream, over traditional frame-based cameras. Our CB-ConvLSTM architecture efficiently extracts spatio-temporal features for pupil tracking from the event stream, outperforming conventional CNN structures. Utilizing a delta-encoded recurrent path to enhance activation sparsity, CB-ConvLSTM reduces arithmetic operations by approximately 4.7× without losing accuracy when tested on a v2e-generated event dataset of labeled pupils. This increase in efficiency makes it ideal for real-time eye tracking on resource-constrained devices. The project code and dataset are openly available at https://github.com/qinche106/cb-convlstm-eyetracking.
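Several of the publications above (DeltaKWS, EdgeDRNN, 3ET) build on the same delta-network idea: an input or activation element is propagated only when it has changed by more than a threshold since its last update, so slowly varying signals generate mostly zero deltas and the corresponding compute and memory traffic can be skipped. A minimal NumPy sketch of one thresholded delta update follows; the function name, shapes, and threshold value are illustrative and not taken from any of the papers:

```python
import numpy as np

def delta_step(x, x_ref, threshold=0.1):
    """One delta-encoding step: emit only changes above a threshold.

    Returns the sparse delta vector (zero where the change is small) and
    the updated reference state used for the next comparison.
    """
    delta = x - x_ref
    mask = np.abs(delta) >= threshold        # only "significant" changes fire
    delta = np.where(mask, delta, 0.0)       # suppress sub-threshold changes
    x_ref = np.where(mask, x, x_ref)         # reference advances only where fired
    return delta, x_ref

# A slowly varying input: most elements change too little between steps
# to cross the threshold, so most delta entries are exactly zero.
rng = np.random.default_rng(0)
x_ref = rng.standard_normal(8)
x = x_ref + 0.01 * rng.standard_normal(8)    # tiny drift, below threshold
delta, x_ref = delta_step(x, x_ref, threshold=0.1)
print(np.count_nonzero(delta))               # prints 0: no element fired
```

In accelerators such as EdgeDRNN, only the weight-matrix columns matching nonzero delta entries are fetched and multiplied, which is where the reported memory-access and latency savings come from; the threshold trades accuracy against sparsity.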