On Fault-Tolerance and Tolerated Failures in Communication Networks


Author / Producer

Date

2025

Publication Type

Doctoral Thesis

ETH Bibliography

yes

Abstract

The Internet has become an essential core technology of modern society. Because it is a fundamentally heterogeneous collection of independently designed and managed systems, operating networks reliably is challenging, even as users increasingly rely, or even depend, on their availability. In response to these rising demands, this dissertation addresses two aspects of increasing the reliability of today’s communication networks.

On the application layer, we propose to increase the resilience of fault-tolerant protocols when facing faulty behavior. To that end, we present two novel Byzantine fault-tolerant (BFT) protocols. PermitBFT lowers the achievable commit latency while still tolerating faulty behavior. It relaxes the traditional consensus properties to ordering only committed transactions, potentially leaving conflicting transactions uncommitted. PermitBFT is the first totally-ordering BFT protocol that achieves a commit latency of only 2 message delays while tolerating a third of the nodes acting maliciously. In the same setting, FnF-BFT is the first BFT protocol with provable performance even under attack, guaranteeing a constant fraction of its best-case throughput as long as the network remains stable. To achieve that, FnF-BFT allows all nodes to act as leaders in parallel, ensuring that at least the correct nodes make steady progress.

On the network layer, we present a framework for facilitating the analysis of network instabilities caused by routing events, both in lab environments and in live networks. To that end, we (i) develop a measurement framework for studying the effects of transient forwarding anomalies in a lab environment, (ii) show how to infer transient forwarding anomalies in live networks from either control-plane messages or router logs with our system Trix, and (iii) propose a design to explore networks’ various convergence behaviors with simulation, facilitating the systematic analysis and potential prevention of network instabilities in the future.

Publication status

published

Contributors

Examiner : Vanbever, Laurent
Examiner : Wattenhofer, Roger
Examiner : Schmid, Stefan

Publisher

ETH Zurich

Organisational unit

09477 - Vanbever, Laurent / Vanbever, Laurent
