
Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture

Gómez Luna, Juan; El Hajj, Izzat; Fernandez, Ivan; et al. (2021)

arXiv

Many modern workloads, such as neural networks, databases, and graph processing, are fundamentally memory-bound. For such workloads, the data movement between main memory and CPU cores imposes a significant overhead in terms of both latency and energy. A major reason is that this communication happens through a narrow bus with high latency and limited bandwidth, and the low data reuse in memory-bound workloads is insufficient to amortize ...

Working Paper

ALP: Alleviating CPU-Memory Data Movement Overheads in Memory-Centric Systems

Mansouri Ghiasi, Nika; Vijaykumar, Nandita; Oliveira, Geraldo F.; et al. (2022)

arXiv

Partitioning applications between NDP and host CPU cores causes inter-segment data movement overhead, which is caused by moving data generated from one segment (e.g., instructions, functions) and used in consecutive segments. Prior works take two approaches to this problem. The first class of works maps segments to NDP or host cores based on the properties of each segment, neglecting the inter-segment data movement overhead. The second ...

Working Paper

An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System

Gómez Luna, Juan; Guo, Yuxin; Brocard, Sylvan; et al. (2022)

arXiv

Training machine learning (ML) algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets. As a result, processor-centric systems (e.g., CPU, GPU) suffer from costly data movement between memory units and processing units, which consumes large amounts of energy and execution cycles. Memory-centric computing systems, i.e., with processing-in-memory (PIM) capabilities, ...

Working Paper

Accelerating Time Series Analysis via Processing using Non-Volatile Memories

Fernandez, Ivan; Manglik, Aditya; Giannoula, Christina; et al. (2022)

arXiv

Time Series Analysis (TSA) is a critical workload for consumer-facing devices. Accelerating TSA is vital for many domains as it enables the extraction of valuable information and the prediction of future events. The state-of-the-art algorithm in TSA is the subsequence Dynamic Time Warping (sDTW) algorithm. However, sDTW's computational complexity increases quadratically with the time series' length, resulting in two performance implications. First, ...

Working Paper

RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory

Mansouri Ghiasi, Nika; Sadrosadati, Mohammad; Oliveira, Geraldo F.; et al. (2022)

arXiv

Recent nano-technological advances enable the Monolithic 3D (M3D) integration of multiple memory and logic layers in a single chip with fine-grained connections. M3D technology leads to significantly higher main memory bandwidth and shorter latency than existing 3D-stacked systems. We show for a variety of workloads on a state-of-the-art M3D system that the performance and energy bottlenecks shift from the main memory to the core and cache ...

Working Paper

EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators

Orosa, Lois; Koppula, Skanda; Umuroglu, Yaman; et al. (2022)

arXiv

Dilated and transposed convolutions are widely used in modern convolutional neural networks (CNNs). These kernels are used extensively during CNN training and inference of applications such as image segmentation and high-resolution image generation. Although these kernels have grown in popularity, they stress current compute systems due to their high memory intensity, exascale compute demands, and large energy consumption. We find that ...

Working Paper

A Framework for High-throughput Sequence Alignment using Real Processing-in-Memory Systems

Diab, Safaa; Nassereldine, Amir; Alser, Mohammed; et al. (2022)

arXiv

Sequence alignment is a fundamentally memory-bound computation whose performance in modern systems is limited by the memory bandwidth bottleneck. Processing-in-memory architectures alleviate this bottleneck by providing the memory with computing capabilities. We propose Alignment-in-Memory (AIM), a framework for high-throughput sequence alignment using processing-in-memory, and evaluate it on UPMEM, the first publicly-available general-purpose ...

Working Paper

ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis

Firtina, Can; Pillai, Kamlesh; Kalsi, Gurpreet S.; et al. (2022)

arXiv

Profile hidden Markov models (pHMMs) are widely used in many bioinformatics applications to accurately identify similarities between biological sequences (e.g., DNA or protein sequences). pHMMs use a commonly-adopted and highly-accurate method, called the Baum-Welch algorithm, to calculate these similarities. However, the Baum-Welch algorithm is computationally expensive, and existing works provide either software- or hardware-only solutions ...

Working Paper