Open access
Author
Date
2023-05
Type
Master Thesis
ETH Bibliography
yes
Abstract
Profiling an application at runtime with hardware support plays an irreplaceable role in monitoring, debugging, runtime verification, and optimizing application performance. Most modern processors are designed with dedicated hardware resources that produce on-the-fly profiling data while incurring negligible overhead. However, the volume of the resulting profiling data can be enormous: a single core can produce upwards of 100 MB/s of data, leading to high storage and offline processing costs. Current approaches focus on reducing profiling data bandwidth with sampling-based techniques, at the cost of execution details.
In this work, we introduce a hardware-accelerated profile decoder and analyzer for the ARM Coresight debug and trace architecture that can process execution traces produced by Coresight components in real-time. Profiles collected from a CPU are forwarded to a decoder implementation on a Field Programmable Gate Array (FPGA) that is part of a hybrid FPGA/CPU platform, meaning profiling data is processed on-chip. This technique can produce richer profiling data, potentially enabling more powerful optimizations, while simultaneously eliminating the associated storage and post-processing costs and avoiding the need for throttling the bandwidth of profile collection.
Our novel trace decoding process can handle the compressed instruction trace data stream that follows the Embedded Trace Macrocell (ETM) specification at 1 GB/s per core, increasing the throughput over prior work by 8×. We also greatly increase the reliability of the decoding process by removing the possibility of dropping data and the requirement for buffers, while keeping the area costs comparable.
Permanent link
https://doi.org/10.3929/ethz-b-000612599
Publication status
published
Publisher
ETH Zurich
Organisational unit
03757 - Roscoe, Timothy / Roscoe, Timothy