Journal: ACM Journal on Emerging Technologies in Computing Systems


Abbreviation

ACM J. Emerg. Technol. Comput. Syst.

Publisher

Association for Computing Machinery

ISSN

1550-4832 (print)
1550-4840 (electronic)

Search Results

Publications 1 - 5 of 5
  • Moser, Clemens; Chen, Jian-Jia; Thiele, Lothar (2010)
    ACM Journal on Emerging Technologies in Computing Systems
  • Fellermann, Harold; Hadorn, Maik; Fuechslin, Rudolf M.; et al. (2014)
    ACM Journal on Emerging Technologies in Computing Systems
  • Azghadi, Mostafa R.; Moradi, Saber; Fasnacht, Daniel B.; et al. (2015)
    ACM Journal on Emerging Technologies in Computing Systems
  • Schmuck, Manuel; Benini, Luca; Rahimi, Abbas (2019)
    ACM Journal on Emerging Technologies in Computing Systems
    Brain-inspired hyperdimensional (HD) computing models neural activity patterns of the very large circuits of the brain with points of a hyperdimensional space, that is, with hypervectors. Hypervectors are D-dimensional (pseudo)random vectors with independent and identically distributed (i.i.d.) components, constituting ultra-wide holographic words: D = 10,000 bits, for instance. At its core, HD computing manipulates a set of seed hypervectors to build composite hypervectors representing objects of interest. An efficient hardware realization demands memory optimizations built from simple operations. In this paper, we propose hardware optimization techniques for HD computing, packaged as a synthesizable open-source VHDL library, that enable co-located implementation of both learning and classification tasks on only a small portion of Xilinx UltraScale FPGAs: (1) We propose simple logical operations that rematerialize hypervectors on the fly rather than loading them from memory. These operations massively reduce the memory footprint by computing composite hypervectors directly, so their individual seed hypervectors need not be stored in memory. (2) Bundling a series of hypervectors over time normally requires a multibit counter for every hypervector component. We instead propose a binarized back-to-back bundling that requires no counters. This enables on-chip learning with minimal resources, as every hypervector component remains binary over the course of training, avoiding otherwise multibit components. (3) For every classification event, an associative memory finds the closest match between a set of learned hypervectors and a query hypervector using a distance metric. The latency of this search is proportional to the hypervector dimension D, and hence may take O(D) cycles per classification event. Accordingly, we significantly improve classification throughput by proposing associative memories that steadily reduce classification latency, down to the extreme of a single cycle. (4) We perform a design space exploration incorporating the proposed techniques on FPGAs for a wearable biosignal processing application as a case study. Our techniques achieve up to 2.39X area saving or 2337X throughput improvement. The Pareto-optimal HD architecture is mapped onto only 18340 configurable logic blocks (CLBs) to learn and classify five hand gestures using four electromyography sensors.
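    The core HD operations the abstract describes can be sketched in software. This is a minimal illustration, not the paper's VHDL library: binding is component-wise XOR, bundling is a majority vote (shown here with the plain per-component counters that the paper's binarized back-to-back bundling is designed to avoid), and the associative-memory lookup is a nearest-neighbor search under Hamming distance. All function names are illustrative.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    D = 10_000  # hypervector dimensionality, e.g. 10,000 bits

    def seed_hv():
        """A seed hypervector: i.i.d. pseudorandom binary components."""
        return rng.integers(0, 2, size=D, dtype=np.uint8)

    def bind(a, b):
        """Binding via component-wise XOR; in hardware, such cheap logic
        rematerializes composite hypervectors on the fly instead of
        storing every seed hypervector in memory."""
        return np.bitwise_xor(a, b)

    def bundle(hvs):
        """Bundle a series of hypervectors by component-wise majority.
        This sketch sums components (a counter per component); the
        paper's contribution is a binarized scheme without counters."""
        return (np.sum(hvs, axis=0) * 2 > len(hvs)).astype(np.uint8)

    def classify(query, learned):
        """Associative-memory lookup: index of the learned hypervector
        closest to the query under Hamming distance (O(D) work)."""
        dists = [int(np.count_nonzero(query ^ hv)) for hv in learned]
        return int(np.argmin(dists))
    ```

    Because D is large, a query that differs from a stored class hypervector in even 10% of its components is still far closer to that class than to any other (pseudo)random hypervector, whose expected distance is D/2; this robustness is what makes the single-cycle associative-memory designs in the paper feasible.
    
    
    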
  • Rueckauer, Bodo; Bybee, Connor; Goettsche, Ralf; et al. (2022)
    ACM Journal on Emerging Technologies in Computing Systems
    Spiking Neural Networks (SNNs) are a promising paradigm for efficient event-driven processing of spatio-temporally sparse data streams. SNNs have inspired the design of, and can take advantage of, the emerging class of neuromorphic processors like Intel Loihi. These novel hardware architectures expose a variety of constraints that affect firmware, compiler, and algorithm development alike. To enable rapid and flexible development of SNN algorithms on Loihi, we developed NxTF: a programming interface derived from Keras and a compiler optimized for mapping deep convolutional SNNs to the multi-core Intel Loihi architecture. We evaluate NxTF on Deep Neural Networks (DNNs) trained directly on spikes as well as models converted from traditional DNNs, processing both sparse event-based and dense frame-based datasets. Further, we assess the effectiveness of the compiler at distributing models across a large number of cores and at compressing models by exploiting Loihi's weight-sharing features. Finally, we evaluate model accuracy, energy, and time-to-solution compared to other architectures. The compiler achieves near-optimal resource utilization of 80% across 16 Loihi chips for a 28-layer, 4M-parameter MobileNet model with input size 128×128. In addition, we report the lowest error rate of 8.52% for the CIFAR-10 dataset on neuromorphic hardware, using an off-the-shelf MobileNet.
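    The event-driven dynamics that make SNNs attractive on hardware like Loihi can be sketched with a discrete-time leaky integrate-and-fire (LIF) layer. This is a generic illustration of the SNN paradigm, not the NxTF API or Loihi's neuron model; the function name and parameters are assumptions for the sketch.

    ```python
    import numpy as np

    def lif_forward(spikes_in, weights, v_th=1.0, decay=0.9):
        """Discrete-time leaky integrate-and-fire layer (illustrative).

        spikes_in: (T, n_in) binary spike trains over T time steps.
        weights:   (n_in, n_out) synaptic weight matrix.
        Returns (T, n_out) binary output spike trains.
        """
        T, _ = spikes_in.shape
        n_out = weights.shape[1]
        v = np.zeros(n_out)                         # membrane potentials
        spikes_out = np.zeros((T, n_out), dtype=np.uint8)
        for t in range(T):
            # Leak the membrane, then integrate weighted input spikes.
            v = decay * v + spikes_in[t] @ weights
            fired = v >= v_th                       # threshold crossing
            spikes_out[t] = fired
            v[fired] = 0.0                          # reset after a spike
        return spikes_out
    ```

    Because neurons only communicate when they cross threshold, computation is driven by spike events rather than dense activations; on sparse inputs most time steps do little work, which is the efficiency property the paper exploits on Loihi.
    
    
    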