Fault-Tolerant Distributed Directories


Loading...

Date

2024

Publication Type

Conference Paper

ETH Bibliography

yes

Citations

Altmetric

Data

Abstract

Many fundamental distributed computing problems require coordinated access to a shared resource. A distributed directory is an overlay data structure on an asynchronous graph G that helps to access a shared token t. The directory supports three basic operations: publish, to initialize the directory, lookup, to read the contents of the token, and move, to get exclusive update access to the token. There are known directory schemes that achieve message complexity within polylog factors of the optimal cost with respect to the number of nodes n and the diameter D of G. Motivated by fault-tolerant distributed computing implementations, we consider the impact of edge failures on distributed directories. We give a distributed directory overlay data structure that can tolerate edge failures without disrupting the directory operations. The directory can be repaired concurrently while it processes directory operations. We analyze the impact of the faults on the amortized cost of the three directory operations compared to the optimal cost. We show that f edges failures increase the amortized competitive ratio of the operations by at most factor f. We also analyze the message complexity to repair the overlay structure, in terms of the number of messages that are sent and the maximum distance a message traverses. For an edge failure, the repair mechanism uses messages of size O(log n) that traverse distance at most D′, the graph diameter after the fault. To our knowledge, this is the first asymptotic analysis of a fault-tolerant distributed directory.

Publication status

published

Book title

3rd Symposium on Algorithmic Foundations of Dynamic Networks (SAND 2024)

Volume

292

Pages / Article No.

5

Publisher

Schloss Dagstuhl – Leibniz-Zentrum für Informatik

Event

3rd Symposium on Algorithmic Foundations of Dynamic Networks (SAND 2024)

Edition / version

Methods

Software

Geographic location

Date collected

Date created

Subject

distributed directory; sparse partition; fault tolerance; message complexity; path dilation

Organisational unit

03604 - Wattenhofer, Roger / Wattenhofer, Roger check_circle

Notes

Funding

Related publications and datasets