REIS: A High-Performance and Energy-Efficient Retrieval System with In-Storage Processing
OPEN ACCESS
Loading...
Author / Producer
Date
2025-06
Publication Type
Conference Paper
ETH Bibliography
yes
Citations
Altmetric
OPEN ACCESS
Data
Rights / License
Abstract
Large Language Models (LLMs) face an inherent challenge: their knowledge is confined to the data that they have been trained on. This limitation, combined with the significant cost of retraining renders them incapable of providing up-to-date responses. To overcome these issues, Retrieval-Augmented Generation (RAG) complements the static training-derived knowledge of LLMs with an external knowledge repository. RAG consists of three stages: (i) indexing, which creates a database that facilitates similarity search on text embeddings, (ii) retrieval, which, given a user query, searches and retrieves relevant data from the database and (iii) generation, which uses the user query and the retrieved data to generate a response.
The retrieval stage of RAG in particular becomes a significant performance bottleneck in inference pipelines. In this stage, (i) a given user query is mapped to an embedding vector and (ii) an Approximate Nearest Neighbor Search (ANNS) algorithm searches for the most semantically similar embedding vectors in the database to identify relevant items. Due to the large database sizes, ANNS incurs significant data movement overheads between the host and the storage system. To alleviate these overheads, prior works propose In-Storage Processing (ISP) techniques that accelerate ANNS workloads by performing computations inside the storage system. However, existing works that leverage ISP for ANNS (i) employ algorithms that are not tailored to ISP systems, (ii) do not accelerate data retrieval operations for data selected by ANNS, and (iii) introduce significant hardware modifications to the storage system, limiting performance and hindering their adoption.
We propose REIS, the first Retrieval system tailored for RAG with In-Storage processing that addresses the limitations of existing implementations with three key mechanisms. First, REIS employs a database layout that links database embedding vectors to their associated documents, enabling efficient retrieval. Second, it enables efficient ANNS by introducing an ISP-tailored algorithm and data placement technique that: (i) distributes embeddings across all planes of the storage system to exploit parallelism, and (ii) employs a lightweight Flash Translation Layer (FTL) to improve performance. Third, REIS leverages an ANNS engine that uses the existing computational resources inside the storage system, without requiring hardware modifications. The three key mechanisms form a cohesive framework that largely enhances both the performance and energy efficiency of RAG pipelines. Compared to a high-end server-grade system, REIS improves the performance (energy efficiency) of the retrieval stage by an average of 13× (55×). REIS offers improved performance against existing ISP-based ANNS accelerators, without introducing any hardware modifications, enabling easier adoption for RAG pipelines.
Permanent link
Publication status
published
External links
Editor
Book title
Proceedings of the 52nd Annual International Symposium on Computer Architecture
Journal / series
Volume
Pages / Article No.
1171 - 1192
Publisher
Association for Computing Machinery
Event
52nd International Symposium on Computer Architecture (ISCA 2025)
Edition / version
Methods
Software
Geographic location
Date collected
Date created
Subject
Retrieval-Augmented Generation; In-Storage Processing; SSD; LLM
Organisational unit
09483 - Mutlu, Onur / Mutlu, Onur