Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abbreviation: IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.
Publisher: IEEE
35 results
Publications 1 - 10 of 35
- Graceful Performance Modulation for Power-Neutral Transient Computing Systems
  Item type: Journal Article
  IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
  Balsamo, Domenico; Das, Anup; Weddell, Alex S.; et al. (2016)
- YodaNN: An Architecture for Ultra-Low Power Binary-Weight CNN Acceleration
  Item type: Journal Article
  IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
  Andri, Renzo; Cavigelli, Lukas; Rossi, Davide; et al. (2018)
- Polyhedral Compilation for Racetrack Memories
  Item type: Journal Article
  IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
  Khan, Asif A.; Mewes, Hauke; Grosser, Tobias; et al. (2020)
  Traditional memory hierarchy designs, primarily based on SRAM and DRAM, are becoming increasingly unsuitable for meeting the performance, energy, bandwidth, and area requirements of modern embedded and high-performance computer systems. Racetrack memory (RTM), an emerging nonvolatile memory technology, promises to meet these conflicting demands by simultaneously offering high speed, higher density, and nonvolatility. RTM achieves these efficiency gains not by providing immediate access to all storage locations but by storing data sequentially in the equivalent of nanoscale tapes called tracks. Before any data can be accessed, explicit shift operations must be issued, which cost energy and increase access latency. The result is a fundamental change in memory performance behavior: the address distance between subsequent memory accesses now has a linear effect on memory performance. While initial techniques exist to optimize programs for linear-latency memories such as RTM, existing automatic solutions treat only scalar memory accesses. This work presents the first automatic compilation framework that optimizes static loop programs over arrays for linear-latency memories. We extend the polyhedral compilation framework Polly to generate code that maximizes accesses to the same or consecutive locations, thereby minimizing the number of shifts. Our experimental results show that the optimized code incurs up to 85% fewer shifts (41% on average), improving performance and energy consumption by an average of 17.9% and 39.8%, respectively. Our results show that automatic techniques make it possible to effectively program linear-latency memory architectures such as RTM.
- Automated Design Space Exploration for Optimized Deployment of DNN on Arm Cortex-A CPUs
  Item type: Journal Article
  IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
  de Prado, Miguel; Mundy, Andrew; Saeed, Rabia; et al. (2021)
  The spread of deep learning on embedded devices has prompted the development of numerous methods to optimize the deployment of deep neural networks (DNNs). Works have mainly focused on: 1) efficient DNN architectures; 2) network optimization techniques, such as pruning and quantization; 3) optimized algorithms to speed up the execution of the most computationally intensive layers; and 4) dedicated hardware to accelerate the data flow and computation. However, there is a lack of research on cross-level optimization, because the space of approaches becomes too large to test exhaustively and obtain a globally optimized solution. This leads to suboptimal deployment in terms of latency, accuracy, and memory. In this work, we first detail and analyze the methods to improve the deployment of DNNs across the different levels of software optimization. Building on this knowledge, we present an automated exploration framework to ease the deployment of DNNs. The framework relies on a reinforcement learning search that, combined with a deep learning inference framework, automatically explores the design space and learns an optimized solution that improves performance and reduces memory usage on embedded CPU platforms. We present results for state-of-the-art DNNs on a range of Arm Cortex-A CPU platforms, achieving up to 4x improvement in performance and over 2x reduction in memory with negligible loss in accuracy with respect to the BLAS floating-point implementation.
- Maestro: Autonomous QoS Management for Mobile Applications Under Thermal Constraints
  Item type: Journal Article
  IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
  Sahin, Onur; Thiele, Lothar; Coskun, Ayse K. (2019)
- Frequency Scaling as a Security Threat on Multicore Systems
  Item type: Conference Paper
  IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
  Miedl, Philipp; He, Xiaoxi; Meyer, Matthias; et al. (2018)
  Most modern processors use Dynamic Voltage and Frequency Scaling (DVFS) for power management. DVFS makes it possible to optimize power consumption by scaling voltage and frequency according to performance demand. Previous research has indicated that frequency scaling might pose a security threat in the form of a covert channel, which could leak sensitive information. However, an analysis able to determine whether DVFS is a serious security issue is still missing. In this paper, we conduct a detailed analysis of the threat potential of a DVFS-based covert channel. We investigate two multicore platforms representative of modern laptops and hand-held devices. Furthermore, we develop a channel model to determine an upper bound on the channel capacity, which is on the order of 1 bit per channel use. Finally, we perform an experimental analysis using a novel transceiver implementation. The neural-network-based receiver yields packet error rates between 1% and 8% at average throughputs of up to 1.83 and 1.20 bits per second for platforms representative of laptops and hand-held devices, respectively. Considering the well-known small message criterion, our results show that a relevant covert channel can be established by exploiting the behaviour of computing systems with DVFS.
- FlexFloat: A Software Library for Transprecision Computing
  Item type: Journal Article
  IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
  Tagliavini, Giuseppe; Marongiu, Andrea; Benini, Luca (2020)
- An Energy-Efficient Integrated Programmable Array Accelerator and Compilation flow for Near-Sensor Ultra-low Power Processing
  Item type: Journal Article
  IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
  Das, Satyajit; Martin, Kevin J.M.; Rossi, Davide; et al. (2019)
- SPICE Modeling in Verilog-A for Photo-Response in UTC-Photodiodes Targeting Beyond-5G Circuit Design
  Item type: Journal Article
  IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
  Mukherjee, Chhandak; Guendouz, Djeber; Deng, Marina; et al. (2023)
  This article reports the first accurate, physics-based Verilog-A implementation of the fully analytic form of the photocurrent in SPICE compact models for uni-traveling carrier (UTC) photodiodes (PDs). To overcome the limitations of single-pole network implementations in modeling the frequency dependence of the photo-response, especially at frequencies beyond 100 GHz, we explored different solutions for the complete analytic equation of the dynamic photocurrent. A new implementation is proposed that requires three additional nodes in the UTC-PD electrical equivalent circuit and offers the best tradeoff between accuracy and computational efficiency. Model validation has been performed against on-wafer measurements from two UTC-PD technologies, showing very good accuracy over the entire frequency range.
- Optimizing the NoC Slack Through Voltage and Frequency Scaling in Hard Real-Time Embedded Systems
  Item type: Journal Article
  IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
  Zhan, Jia; Stoimenov, Nikolay; Ouyang, Jin; et al. (2014)
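The Racetrack memory abstract in this listing states that the address distance between subsequent accesses has a linear effect on performance. The following is a minimal Python sketch of that cost model. It assumes a single track with one access port, where moving from location a to location b costs |a - b| shifts. This is an illustrative simplification, not the model used by Khan et al. The helper names count_shifts, row_major_order, and column_major_order are hypothetical.

```python
def count_shifts(addresses, start=0):
    """Total shifts needed to serve `addresses` in order on one track.

    Simplifying assumption (hypothetical): one access port, and moving
    the port from location a to location b costs |a - b| shifts.
    """
    position = start
    total = 0
    for addr in addresses:
        total += abs(addr - position)  # linear cost in address distance
        position = addr
    return total


def row_major_order(rows, cols):
    # Addresses visited when a row-major array is traversed row by row
    # (consecutive locations).
    return [r * cols + c for r in range(rows) for c in range(cols)]


def column_major_order(rows, cols):
    # Same row-major array traversed column by column (strided accesses).
    return [r * cols + c for c in range(cols) for r in range(rows)]


if __name__ == "__main__":
    rows, cols = 4, 8
    print(count_shifts(row_major_order(rows, cols)))     # prints 31
    print(count_shifts(column_major_order(rows, cols)))  # prints 353
```

Consider a 4 × 8 array stored row by row on one track. Under this simplified model, the row-wise traversal needs 31 shifts and the column-wise traversal needs 353. This illustrates why generating code that favors accesses to the same or consecutive locations reduces the shift count.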