On Fault-Tolerance and Tolerated Failures in Communication Networks
OPEN ACCESS
Date
2025
Publication Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
The Internet has become an essential core technology in our modern society. Because it is a fundamentally heterogeneous collection of independently designed and managed systems, operating its networks reliably is challenging, even as users increasingly rely, or even depend, on their availability.
In response to these rising demands, this dissertation focuses on two aspects of increasing the reliability of today’s communication networks:
On the application layer, we propose to increase the resilience of fault-tolerant protocols when they face faulty behavior. To that end, we present two novel Byzantine fault-tolerant (BFT) protocols. PermitBFT reduces the commit latency achievable while tolerating faulty behavior. It relaxes the traditional consensus properties to order only committed transactions, potentially leaving conflicting transactions uncommitted. PermitBFT is the first totally-ordering BFT protocol that achieves a commit latency of only 2 message delays while tolerating a third of the nodes acting maliciously. In the same setting, FnF-BFT is the first BFT protocol with provable performance even under attack, guaranteeing a constant fraction of its best-case throughput as long as the network remains stable. To achieve this, FnF-BFT allows all nodes to act as leaders in parallel, ensuring that at least the correct nodes make steady progress.
On the network layer, we present a framework for analyzing network instabilities caused by routing events, both in lab environments and in live networks. To that end, we (i) develop a measurement framework for studying the effects of transient forwarding anomalies in a lab environment, (ii) show how to infer transient forwarding anomalies in live networks from either control-plane messages or router logs with our system Trix, and (iii) propose a design for exploring networks’ various convergence behaviors through simulation, facilitating the systematic analysis and potential prevention of network instabilities in the future.
Publication status
published
Contributors
Examiner : Vanbever, Laurent
Examiner : Wattenhofer, Roger
Examiner : Schmid, Stefan
Publisher
ETH Zurich
Organisational unit
09477 - Vanbever, Laurent / Vanbever, Laurent