Now showing items 1-10 of 23

ALP: Alleviating CPU-Memory Data Movement Overheads in Memory-Centric Systems

Mansouri Ghiasi, Nika; Vijaykumar, Nandita; Oliveira, Geraldo F.; et al. (2022)

arXiv

Partitioning applications between near-data processing (NDP) and host CPU cores incurs inter-segment data movement overhead: data generated by one segment (e.g., instructions, functions) must be moved to the consecutive segments that use it. Prior works take two approaches to this problem. The first class of works maps segments to NDP or host cores based on the properties of each segment, neglecting the inter-segment data movement overhead. The second ...

Working Paper

TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering

Cavlak, Meryem Banu; Singh, Gagandeep; Alser, Mohammed; et al. (2022)

arXiv

Basecalling is an essential step in nanopore sequencing analysis where the raw signals of nanopore sequencers are converted into nucleotide sequences, i.e., reads. State-of-the-art basecallers employ complex deep learning models to achieve high basecalling accuracy. This makes basecalling computationally inefficient and memory-hungry, bottlenecking the entire genome analysis pipeline. However, for many applications, the majority of reads ...

Working Paper

Utopia: Efficient Address Translation using Hybrid Virtual-to-Physical Address Mapping

Kanellopoulos, Constantinos; Bera, Rahul; Stojiljkovic, Kosta; et al. (2022)

arXiv

The conventional virtual-to-physical address mapping scheme enables a virtual address to flexibly map to any physical address. This flexibility necessitates large data structures to store virtual-to-physical mappings, which incur high address translation latency and translation-induced interference in the memory hierarchy, especially in data-intensive workloads. Restricting the address mapping so that a virtual address can ...

Working Paper

TuRaN: True Random Number Generation Using Supply Voltage Underscaling in SRAMs

Yüksel, İsmail Emir; Olgun, Ataberk; Salami, Behzad; et al. (2022)

arXiv

Prior works propose SRAM-based true random number generators (TRNGs) that extract entropy from SRAM arrays. SRAM arrays are widely used to store data on-chip in the majority of specialized and general-purpose chips that perform computation. Thus, SRAM-based TRNGs present a low-cost alternative to dedicated hardware TRNGs. However, existing SRAM-based TRNGs suffer from 1) low TRNG throughput, 2) high energy consumption, 3) high TRNG latency, and 4) the inability ...

Working Paper

Fundamentally Understanding and Solving RowHammer

Mutlu, Onur; Olgun, Ataberk; Yağlıkçı, A. Giray (2022)

arXiv

We provide an overview of recent developments and future directions in the RowHammer vulnerability that plagues modern DRAM (Dynamic Random Access Memory) chips, which are used in almost all computing systems as main memory. RowHammer is the phenomenon in which repeatedly accessing a row in a real DRAM chip causes bitflips (i.e., data corruption) in physically nearby rows. This phenomenon leads to a serious and widespread system security ...

Working Paper

DRAM Bender: An Extensible and Versatile FPGA-based Infrastructure to Easily Test State-of-the-art DRAM Chips

Olgun, Ataberk; Hassan, Hasan; Yağlıkçı, A. Giray; et al. (2022)

arXiv

To understand and improve DRAM performance, reliability, security, and energy efficiency, prior works study characteristics of commodity DRAM chips. Unfortunately, state-of-the-art open source infrastructures capable of conducting such studies are obsolete, poorly supported, or difficult to use, or their inflexibility limits the types of studies they can conduct. We propose DRAM Bender, a new FPGA-based infrastructure that enables experimental ...

Working Paper

NEON: Enabling Efficient Support for Nonlinear Operations in Resistive RAM-based Neural Network Accelerators

Manglik, Aditya; Patel, Minesh; Mao, Haiyu; et al. (2022)

arXiv

Resistive Random-Access Memory (RRAM) is well-suited to accelerate neural network (NN) workloads as RRAM-based Processing-in-Memory (PIM) architectures natively support highly-parallel multiply-accumulate (MAC) operations that form the backbone of most NN workloads. Unfortunately, NN workloads such as transformers require support for non-MAC operations (e.g., softmax) that RRAM cannot provide natively. Consequently, state-of-the-art works ...

Working Paper
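
The NEON abstract above contrasts multiply-accumulate (MAC) operations with non-MAC operations such as softmax. As a rough illustration of why softmax does not reduce to MACs, the following minimal Python sketch computes a numerically stable softmax; it needs a maximum, exponentials, and a division. The function name `softmax` and the list-based interface are assumptions made for this example, and the sketch does not represent the method proposed in the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats.

    Illustrative only: shows the non-MAC steps (max, exp, divide).
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    peak = max(scores)                           # comparison, not a MAC
    exps = [math.exp(s - peak) for s in scores]  # exponentiation, not a MAC
    total = sum(exps)                            # accumulation only
    return [e / total for e in exps]             # division, not a MAC
```

Subtracting the maximum before exponentiating leaves the result unchanged mathematically but avoids overflow for large inputs.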

Accelerating Time Series Analysis via Processing using Non-Volatile Memories

Fernandez, Ivan; Manglik, Aditya; Giannoula, Christina; et al. (2022)

arXiv

Time Series Analysis (TSA) is a critical workload for consumer-facing devices. Accelerating TSA is vital for many domains as it enables the extraction of valuable information and the prediction of future events. The state-of-the-art algorithm in TSA is the subsequence Dynamic Time Warping (sDTW) algorithm. However, sDTW's computational complexity increases quadratically with the time series' length, resulting in two performance implications. First, ...

Working Paper
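
The quadratic cost mentioned in the abstract above can be seen in a textbook formulation of subsequence DTW, sketched below in Python. This is a minimal software baseline under stated assumptions (absolute difference as the point distance, a hypothetical function name `sdtw_cost`), not the accelerator design from the paper. The two nested loops visit every (query point, reference point) pair, which is where the quadratic growth comes from.

```python
import math


def sdtw_cost(query, reference):
    """Minimum DTW cost of aligning `query` to any contiguous
    subsequence of `reference` (textbook subsequence DTW).

    Assumption for this sketch: point distance is |a - b|.
    """
    if not query or not reference:
        raise ValueError("both series must be non-empty")
    m = len(reference)
    # Row 0 is all zeros: the match may start anywhere in the reference.
    prev = [0.0] * (m + 1)
    for q in query:                      # len(query) iterations
        curr = [math.inf] * (m + 1)      # column 0 stays infinite
        for j in range(1, m + 1):        # len(reference) iterations
            dist = abs(q - reference[j - 1])
            curr[j] = dist + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    # The match may also end anywhere in the reference.
    return min(prev[1:])
```

For example, `sdtw_cost([1, 2, 3], [0, 0, 1, 2, 3, 0, 0])` returns `0.0` because the query occurs exactly inside the reference. Only two rows of the cost matrix are kept at a time, so memory is linear in the reference length even though the running time is proportional to the product of the two lengths.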

A Case for Self-Managing DRAM Chips: Improving Performance, Efficiency, Reliability, and Security via Autonomous in-DRAM Maintenance Operations

Hassan, Hasan; Olgun, Ataberk; Yağlıkçı, A. Giray; et al. (2022)

arXiv

The memory controller is in charge of managing DRAM maintenance operations (e.g., refresh, RowHammer protection, memory scrubbing) in current DRAM chips. Implementing new maintenance operations often necessitates modifications in the DRAM interface, memory controller, and potentially other system components. Such modifications are only possible with a new DRAM standard, which takes a long time to develop, leading to slow progress in DRAM ...

Working Paper

RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory

Mansouri Ghiasi, Nika; Sadrosadati, Mohammad; Oliveira, Geraldo F.; et al. (2022)

arXiv

Recent nano-technological advances enable the Monolithic 3D (M3D) integration of multiple memory and logic layers in a single chip with fine-grained connections. M3D technology leads to significantly higher main memory bandwidth and shorter latency than existing 3D-stacked systems. We show for a variety of workloads on a state-of-the-art M3D system that the performance and energy bottlenecks shift from the main memory to the core and cache ...

Working Paper
