Journal: Future Generation Computer Systems
Abbreviation: Future Gener. Comput. Syst.
Publisher: Elsevier
Search Results
Publications 1 - 10 of 27
- Long integer NTT execution on UPMEM-PIM for 128-bit secure fully homomorphic encryption. Item type: Journal Article
Future Generation Computer Systems. Barik, Tathagata; Mehta, Priyam; Pindado, Zaira; et al. (2026). Fully Homomorphic Encryption (FHE) enables secure computation on encrypted data, making it an appealing technology for privacy-preserving data processing. A core kernel in many cryptographic and FHE workloads is the Number Theoretic Transform (NTT). The NTT involves frequent non-contiguous data accesses, which limit overall performance; processing-in-memory (PIM) has the potential to address this limitation. By performing computations close to the data, PIM reduces the need for extensive data transfers between memory and compute units. However, the performance of current PIM solutions is limited by inherent factors related to the integration of processing capabilities within memory modules. In this article, we analyze the performance trade-offs of NTT kernel designs together with optimized modular multiplication algorithms on PIM systems based on UPMEM hardware. Our results include performance improvements of up to 4.3x over baseline approaches on UPMEM-PIM while preserving, for the first time in the literature, 128-bit security at high precision.
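For readers unfamiliar with the kernel, the sketch below shows a minimal iterative radix-2 NTT in plain Python. It only illustrates the transform's butterfly structure and its modular-multiplication hot spot, not the authors' UPMEM-PIM implementation; the toy parameters (q = 17, omega = 9, n = 8) are assumptions chosen for readability rather than the high-precision, 128-bit-secure settings the paper targets.

```python
def ntt(a, q, omega):
    """Iterative radix-2 cyclic NTT of a length-n list modulo prime q.
    omega must be a primitive n-th root of unity mod q; n a power of two."""
    n = len(a)
    a = a[:]
    # Bit-reversal permutation (decimation in time).
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly stages; the `* w % q` line is the modular-multiplication
    # hot spot that optimized reduction algorithms target.
    length = 2
    while length <= n:
        w_len = pow(omega, n // length, q)  # primitive length-th root
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + length // 2):
                u = a[k]
                v = a[k + length // 2] * w % q
                a[k] = (u + v) % q
                a[k + length // 2] = (u - v) % q
                w = w * w_len % q
        length <<= 1
    return a

# Toy example: n = 8, q = 17, omega = 9 (9 has order 8 mod 17).
print(ntt([1, 2, 3, 4, 5, 6, 7, 8], 17, 9))
```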
- Special section. Item type: Other Journal Item
Future Generation Computer Systems. Arbenz, P.; Burkhart, H.; Maehle, E.; et al. (2005)
- Validating the performance of GPU ports using differential performance models. Item type: Journal Article
Future Generation Computer Systems. Geiss, Alexander; Hovi, Téodora; Calotoiu, Alexandru; et al. (2026). Offloading computation to the GPU is crucial for leveraging many of today's supercomputers. We expect the GPU port of an application to outperform the pure CPU implementation, but is this always true? Simple benchmarking allows us to take only a limited number of samples from a vast space of execution configurations and can therefore deliver only a fragmented answer. To answer the question systematically, even for individual application kernels, we propose a semi-automatic toolchain based on differential performance modeling and intuitive visualizations. We combine empirical performance models based on unified CPU-GPU profiles with hardware characteristics to derive differential performance models that can be easily compared across device types. In four case studies, we demonstrate how our toolchain pinpoints scaling issues in GPU ports, guides performance improvements, and identifies execution configurations with superior performance.
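As a rough illustration of the differential-modeling idea, the sketch below fits simple power-law models to CPU and GPU kernel runtimes and compares them as a predicted speedup. The measurements are invented, and the single-term power law is a stand-in for the richer model forms an empirical performance-modeling toolchain would fit.

```python
import numpy as np

# Hypothetical kernel runtimes (seconds) at several problem sizes.
sizes = np.array([1e4, 1e5, 1e6, 1e7])
t_cpu = np.array([0.002, 0.021, 0.23, 2.5])   # roughly linear in n
t_gpu = np.array([0.010, 0.012, 0.05, 0.40])  # launch overhead, then scales

def fit_power_law(n, t):
    """Least-squares fit of t(n) ~ c * n**a in log-log space."""
    a, log_c = np.polyfit(np.log(n), np.log(t), 1)
    return np.exp(log_c), a

c_cpu, a_cpu = fit_power_law(sizes, t_cpu)
c_gpu, a_gpu = fit_power_law(sizes, t_gpu)

# Differential model: predicted speedup of the GPU port as a function of n,
# comparable across device types because both sides use the same model form.
speedup = lambda n: (c_cpu * n**a_cpu) / (c_gpu * n**a_gpu)
for n in [1e4, 1e6, 1e8]:
    print(f"n = {n:.0e}: predicted GPU speedup {speedup(n):.1f}x")
```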
- Supporting food choices in the Internet of People: Automatic detection of diet-related activities and display of real-time interventions via mixed reality headsets. Item type: Journal Article
Future Generation Computer Systems. Fuchs, Klaus Ludwig; Haldimann, Mirella; Grundmann, Tobias; et al. (2020). With the emergence of the Internet of People (IoP) and its user-centric applications, novel solutions to the many issues facing today's societies can be expected. These problems include unhealthy diets, with obesity and diet-related diseases reaching epidemic proportions. We argue that the proliferation of mixed reality (MR) headsets as next-generation primary interfaces provides promising alternatives to contemporary digital solutions for diet tracking and interventions. Concretely, we propose the use of MR headset-mounted cameras for computer vision (CV) based detection of diet-related activities and the consequent display of visual real-time interventions to support healthy food choices. We provide an integrative framework and results from a technical feasibility study as well as an impact study conducted in a vending machine (VM) setting. We conclude that current neural networks already enable accurate food item detection in real-world environments. Moreover, our user study suggests that real-time interventions significantly improve beverage choices (reducing sugar and energy intake) as well as food choices (reducing saturated fat). We discuss the results, learnings, and limitations and provide an overview of further technology- and intervention-related avenues of research required for developing an MR-based user support system for healthy food choices.
- Reduced precision floating-point optimization for Deep Neural Network On-Device Learning on microcontrollers. Item type: Journal Article
Future Generation Computer Systems. Nadalini, Davide; Rusci, Manuele; Benini, Luca; et al. (2023). Enabling On-Device Learning (ODL) for Ultra-Low-Power Micro-Controller Units (MCUs) is a key step for post-deployment adaptation and fine-tuning of Deep Neural Network (DNN) models in future TinyML applications. This paper tackles this challenge by introducing a novel reduced-precision optimization technique for ODL primitives on MCU-class devices, leveraging state-of-the-art advancements in RISC-V RV32 architectures with support for vectorized 16-bit floating-point (FP16) Single-Instruction Multiple-Data (SIMD) operations. Our approach for the Forward and Backward steps of the Back-Propagation training algorithm is composed of specialized shape-transform operators and Matrix Multiplication (MM) kernels, accelerated with parallelization and loop unrolling. When evaluated on a single training step of a 2D Convolution layer, the SIMD-optimized FP16 primitives are up to 1.72× faster than the FP32 baseline on a RISC-V-based 8+1-core MCU. An average computing efficiency of 3.11 Multiply-and-Accumulate operations per clock cycle (MAC/clk) and 0.81 MAC/clk is measured for the end-to-end training tasks of a ResNet8 and a DS-CNN for Image Classification and Keyword Spotting, respectively, requiring 17.1 ms and 6.4 ms on the target platform to compute a training step on a single sample. Overall, our approach is more than two orders of magnitude faster than existing ODL software frameworks for single-core MCUs and outperforms previous FP32 parallel implementations by 1.6× in a Continual Learning setup.
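To make the shape-transform-plus-MM decomposition concrete, here is a minimal NumPy sketch of a single-channel convolution forward step expressed as an im2col transform followed by one FP16 matrix multiply. It only illustrates the data layout; NumPy float16 is an assumption standing in for the paper's hand-optimized RISC-V SIMD kernels with parallelization and loop unrolling.

```python
import numpy as np

def im2col(x, k):
    """Shape-transform operator: unroll k x k patches of x (H, W) into columns."""
    H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((k * k, out_h * out_w), dtype=x.dtype)
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, idx] = x[i:i + k, j:j + k].ravel()
            idx += 1
    return cols

# Forward step of a single-channel 2D convolution as one FP16 matrix multiply.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8)).astype(np.float16)   # input feature map
w = rng.standard_normal((3, 3)).astype(np.float16)   # conv kernel
cols = im2col(x, 3)                                  # shape transform: (9, 36)
y = (w.reshape(1, -1) @ cols).reshape(6, 6)          # MM kernel, all in FP16
print(y.dtype, y.shape)
```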
- Accelerating agent-based demand-responsive transport simulations with GPUs. Item type: Journal Article
Future Generation Computer Systems. Saprykin, Aleksandr; Chokani, Ndaona; Abhari, Reza S. (2022). A novel GPU-accelerated simulation model of large-scale fleet deployment is presented, which can run country-wide, multi-modal scenarios with millions of agents and fleets of tens of thousands of vehicles within a couple of minutes. Multiple scenarios of the deployment of fleets of automated vehicles in Switzerland's largest city, Zurich, are assessed. The simulations include the whole population of Switzerland (3.5 million car owners and 1.7 million public transit users) with their detailed travel demand, the road network (1.1 million links and 0.5 million intersections), and public transit (30 000 stops and 20 000 routes). It is demonstrated that in Zurich one automated vehicle could replace 7–8 private cars, with an average increase in road travel time of 44% and wait times in the range of 10–15 min, provided travel demand remains constant. Furthermore, for the same fleet size, this novel accelerated simulation model runs up to 9 times faster than existing state-of-the-art tools.
- SPEEDUP workshop on distributed computing and high-speed networks. Item type: Other Journal Item
Future Generation Computer Systems. Arbenz, Peter; Braun, Torsten (2003)
- Optimization of privacy-utility trade-offs under informational self-determination. Item type: Journal Article
Future Generation Computer Systems. Asikis, Thomas; Pournaras, Evangelos (2020). The pervasiveness of the Internet of Things results in vast volumes of personal data generated by users' smart devices (data producers), such as smartphones, wearables, and other embedded sensors. It is a common requirement, especially for Big Data analytics systems, to transfer these large-scale, distributed data to centralized computational systems for analysis. Nevertheless, the third parties that run and manage these systems (data consumers) do not always guarantee users' privacy. Their primary interest is to improve utility, usually a metric related to performance, costs, and quality of service. Several techniques mask user-generated data to ensure privacy, e.g., differential privacy. Setting up a process for masking data, referred to in this paper as a 'privacy setting', decreases the utility of data analytics on the one hand while increasing privacy on the other. This paper studies parameterizations of privacy settings that regulate the trade-off between maximum utility with minimum privacy and minimum utility with maximum privacy, where utility refers to the accuracy of the estimations of aggregation functions. Privacy settings can be universally applied as system-wide parameterizations and policies (homogeneous data sharing). Nonetheless, they can also be applied autonomously by each user or decided under the influence of (monetary) incentives (heterogeneous data sharing). This latter diversity in data sharing by informational self-determination plays a key role in the privacy-utility trajectories, as shown in this paper both theoretically and empirically. A generic and novel computational framework is introduced for measuring privacy-utility trade-offs and their Pareto optimization. The framework computes a broad spectrum of such trade-offs that form privacy-utility trajectories under homogeneous and heterogeneous data sharing. The practical use of the framework is experimentally evaluated using real-world data from a Smart Grid pilot project in which energy consumers protect their privacy by regulating the quality of the shared power demand data, while utility companies make accurate estimations of the aggregate load in the network to manage the power grid. Over 20,000 differential privacy settings are applied to shape the computational trajectories that, in turn, provide a vast potential for data consumers and producers to participate in viable participatory data sharing systems.
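A tiny sketch of how such a privacy-utility trade-off can be traced: Laplace noise (a standard differential-privacy mechanism) is added to hypothetical per-user power-demand readings at several privacy settings, and utility is measured as the accuracy of the aggregate estimate. The demand values, sensitivity bound, and epsilon grid are all invented for illustration, and the setting is homogeneous, unlike the heterogeneous, incentive-driven settings and Pareto analysis the paper develops.

```python
import numpy as np

rng = np.random.default_rng(42)
demand = rng.uniform(0.0, 5.0, size=1000)  # hypothetical per-user demand (kW)
sensitivity = 5.0                          # max contribution of a single user

# Sweep the privacy setting: smaller epsilon = stronger privacy, noisier data,
# and hence lower utility of the aggregate estimate.
for eps in [0.1, 0.5, 1.0, 5.0]:
    noisy = demand + rng.laplace(0.0, sensitivity / eps, size=demand.size)
    rel_err = abs(noisy.sum() - demand.sum()) / demand.sum()
    print(f"epsilon = {eps:4.1f}: aggregate load error {rel_err:.2%}")
```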
- Enabling dynamic and intelligent workflows for HPC, data analytics, and AI convergence. Item type: Journal Article
Future Generation Computer Systems. Ejarque, Jorge; Badia, Rosa M.; Albertin, Loïc; et al. (2022). The evolution of High-Performance Computing (HPC) platforms enables the design and execution of progressively larger and more complex workflow applications in these systems. The complexity comes not only from the number of elements that compose the workflows but also from the type of computations they perform. While traditional HPC workflows target simulations and modelling of physical phenomena, current needs also require data analytics (DA) and artificial intelligence (AI) tasks. However, the development of these workflows is hampered by the lack of proper programming models and environments that support the integration of HPC, DA, and AI, as well as the lack of tools to easily deploy and execute the workflows in HPC systems. To progress in this direction, this paper presents use cases where complex workflows are required and investigates the main issues to be addressed for HPC/DA/AI convergence. Based on this study, the paper identifies the challenges a new workflow platform must meet to manage complex workflows. Finally, it proposes a development approach for such a workflow platform that addresses these challenges in two directions: first, by defining a software stack that provides the functionalities to manage these complex workflows; and second, by proposing the HPC Workflow as a Service (HPCWaaS) paradigm, which leverages the software stack to facilitate the reusability of complex workflows in federated HPC infrastructures. The proposals presented in this work are subject to study and development as part of the EuroHPC eFlows4HPC project.
- HazardNet: A thermal hazard prediction framework for datacenters. Item type: Journal Article
Future Generation Computer Systems. Ardebili, Mohsen Seyedkazemi; Acquaviva, Andrea; Benini, Luca; et al. (2024). Modern scientific discoveries rely on an insatiable demand for computational resources. To meet this ever-growing computing demand, datacenters have been established: complex controlled environments that host thousands of computing nodes, storage, high-performance communication networks, cooling systems, and more. A datacenter consumes a large amount of electrical power (in the range of megawatts), which is completely transformed into heat, creating complex spatial and temporal thermal dissipation problems. Therefore, although a datacenter contains sophisticated cooling systems, minor thermal issues or anomalies can trigger a chain of events that leads to an imbalance between the heat generated by computing nodes and the heat removed by the cooling system, resulting in thermal hazards. Thermal hazards are detrimental to datacenter operations, as they can damage IT and facility equipment and cause datacenter outages, with severe societal and business losses. Predicting thermal hazards and anomalies is therefore critical to preventing future disasters. Collecting and analyzing large-scale monitoring signals, and devising a methodology for anomaly detection and prediction, are challenging tasks. In this manuscript, after providing a methodology for defining thermal anomalies, we propose HazardNet, a thermal hazard prediction framework that consists of a complete pipeline of deep learning models. We evaluated the proposed framework in two different scenarios. In the first scenario, we evaluated the model's performance over the entire study period, resulting in an F1-score of 0.98. In the second scenario, we enforced causality in the collected data by training and testing the model on two disjoint, consecutive periods, resulting in an F1-score of 0.87. Thanks to these promising results, HazardNet can capture the complex spatial and temporal dependencies between datacenter operational parameters and thermal hazards and predict hazards in advance.
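For reference, the evaluation metric and the causality-preserving split the abstract mentions can be sketched in a few lines of Python; the labels and the 70/30 split point below are invented for illustration and are not the paper's data or pipeline.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall over binary hazard labels."""
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Causality-preserving evaluation: train on an early period and test on the
# disjoint, consecutive period that follows, instead of shuffling across time.
n_samples = 10_000                       # monitoring samples in time order
split = int(0.7 * n_samples)             # hypothetical chronological split
train_slice, test_slice = slice(0, split), slice(split, n_samples)

# Toy labels/predictions for a test period, just to exercise the metric:
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_pred = np.array([0, 0, 1, 0, 0, 1, 0, 1])
print(f"F1 = {f1_score(y_true, y_pred):.2f}")  # 0.86 on this toy example
```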