Search
Results
-
ALP: Alleviating CPU-Memory Data Movement Overheads in Memory-Centric Systems
(2022)arXivPartitioning applications between NDP and host CPU cores causes inter-segment data movement overhead, which is caused by moving data generated from one segment (e.g., instructions, functions) and used in consecutive segments. Prior works take two approaches to this problem. The first class of works maps segments to NDP or host cores based on the properties of each segment, neglecting the inter-segment data movement overhead. The second ...Working Paper -
An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System
(2022)arXivTraining machine learning (ML) algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets. As a result, processor-centric systems (e.g., CPU, GPU) suffer from costly data movement between memory units and processing units, which consumes large amounts of energy and execution cycles. Memory-centric computing systems, i.e., with processing-in-memory (PIM) capabilities, ...Working Paper -
Sectored DRAM: An Energy-Efficient High-Throughput and Practical Fine-Grained DRAM Architecture
(2022)arXivThere are two major sources of inefficiency in computing systems that use modern DRAM devices as main memory. First, due to coarse-grained data transfers (size of a cache block, usually 64B between the DRAM and the memory controller, systems waste energy on transferring data that is not used. Second, due to coarse-grained DRAM row activation, systems waste energy by activating DRAM cells that are unused in many workloads where spatial ...Working Paper -
RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory
(2022)arXivRecent nano-technological advances enable the Monolithic 3D (M3D) integration of multiple memory and logic layers in a single chip with fine-grained connections. M3D technology leads to significantly higher main memory bandwidth and shorter latency than existing 3D-stacked systems. We show for a variety of workloads on a state-of-the-art M3D system that the performance and energy bottlenecks shift from the main memory to the core and cache ...Working Paper -
Enabling High-Performance and Energy-Efficient Hybrid Transactional/Analytical Databases with Hardware/Software Cooperation
(2022)arXivA growth in data volume, combined with increasing demand for real-time analysis (using the most recent data), has resulted in the emergence of database systems that concurrently support transactions and data analytics. These hybrid transactional and analytical processing (HTAP) database systems can support real-time data analysis without the high costs of synchronizing across separate single-purpose databases. Unfortunately, for many ...Working Paper