Nora Hossle
2 results
Search Results
Publications 1 - 2 of 2
- Memory, Address Spaces, Intra-Process-Isolation and Noodles
  Item type: Doctoral Thesis
  Hossle, Nora (2025)
  Traditionally, both operating systems and hardware only offered support for inter-process-isolation. However, today there exist a number of application types that are also in need of (lightweight) intra-process-isolation: examples are web servers, applications written in a memory-safe language that use native libraries, applications using third-party libraries, and applications handling sensitive data (e.g. cryptographic keys). In addition to these applications, new and upcoming lightweight multi-tenant serverless platforms, e.g. GraalOS [53] and Cloudflare Workers [9], also need intra-process-isolation mechanisms. But even though hardware support for intra-process-isolation is now emerging (e.g. Intel’s MPK [27] and Arm’s PIE/POE [6]), it is not yet available on all machines and is usually very limited in the number of protection domains supported (e.g. 16 for MPK and Arm’s PIE/POE). Processes, on the other hand, are often too coarse-grained an abstraction to be an alternative, imposing high performance penalties on each protection domain switch. Threads, though sufficiently lightweight, are not meaningfully isolated. An additional challenge is posed by the non-trivial complexity of real-life systems, making it hard to retrofit them with a new intra-process-isolation primitive (such as MPK). In this thesis I investigate this in the context of Oracle’s GraalVM [55] and present Mistletoe, a lean C-runtime enabling a developer to tap into GraalVM’s execution flow with little knowledge of JVM internals. I also analyze GraalVM’s isolates [102] and their applicability to intra-process-isolation and present a prototype application built atop Mistletoe for per-isolate cycle-accurate billing. Even though all JVMs (including GraalVM) deliberately abstract the hardware, it is possible to directly access and experiment with new hardware features, e.g. make use of MPK, by using Mistletoe. This thesis then presents Noodles, a novel intra-process-isolation mechanism that is primarily intended for lightweight multi-tenant serverless platforms. Noodles allows a thread to switch to its own virtual address space (while still remaining part of the same process context), creating a so-called fat thread. Noodles’s flexible API allows the caller to easily share relevant data with the newly created fat thread while removing access to all unrelated data: not merely by disallowing access to the memory in question, but by not having the physical pages mapped into the virtual address space at all. I implemented Noodles as a patch to the recent 5.15 Linux LTS kernel. Benchmark results show that, in relevant scenarios, creating a noodle is only 10%-11% as expensive as fork(2), the only other option for creating a new virtual address space. To demonstrate Noodles’s applicability to real-world applications, I present an example HTTPS server application using Noodles’s fine-grained memory isolation capabilities to prevent the exploitation of a bug inspired by Heartbleed [23].
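  As a point of reference for the fork(2) baseline mentioned in the abstract, the following minimal sketch shows what "a new virtual address space" means in practice: a forked child works on its own copy-on-write copy of memory, so its writes never reach the parent. This is not the Noodles API, which the listing does not describe; the function name `run_in_forked_address_space` and the sample secret are hypothetical and used only for illustration. It assumes a POSIX system where `os.fork` is available.

```python
import os


def run_in_forked_address_space(secret: bytearray) -> tuple[bytes, bytes]:
    """Fork a child, let it overwrite its copy of `secret`, and return
    (the child's view after writing, the parent's unchanged view).

    Hypothetical helper: it illustrates the fork(2) baseline only,
    not the Noodles mechanism itself.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: runs in its own copy-on-write virtual address space.
        status = 1
        try:
            os.close(read_fd)
            secret[:] = b"x" * len(secret)     # modifies the child's copy only
            os.write(write_fd, bytes(secret))  # report what the child now sees
            status = 0
        finally:
            os._exit(status)                   # never return into parent code

    # Parent: collect the child's report, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        child_view = pipe.read()
    os.waitpid(pid, 0)
    return child_view, bytes(secret)
```

  Calling the helper with `bytearray(b"key-material")` returns the overwritten bytes as the child's view, while the parent's view still holds the original value. The abstract's point is that this isolation comes at the cost of a full fork(2), which Noodles is reported to undercut substantially.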
- High Throughput Hardware Accelerated CoreSight Trace Decoding
  Item type: Conference Paper
  2024 Design, Automation & Test in Europe Conference & Exhibition (DATE)
  Weingarten, Matthew Edwin; Hossle, Nora; Roscoe, Timothy (2024)
  A single tracing component embedded into a high-frequency processor may produce up to 1 GB/s of trace data or more. These data are vital in debugging, monitoring, verification, and performance analysis in System-on-chip and heterogeneous system development. Hardware trace decoders and analyzers have emerged to support online processing of trace data for real-time applications. However, the existing hardware trace decoders designed for the Embedded Trace Macrocell version 4 (ETMv4), a standard feature in most modern ARM processors, can only process trace data at a maximum rate of 250 MB/s. This paper proposes an optimized and parallelized trace decoder for the ETMv4 specification implemented on a Xilinx UltraScale+ processing up to 1 GB/s of trace data from a single ETM.