Continual Benchmarking of LLM-Based Systems on Networking Operations


Date

2025-09-10

Publication Type

Conference Poster

ETH Bibliography

yes

Abstract

The inherent complexity of operating modern network infrastructures has led to growing interest in using Large Language Models (LLMs) to support network operators, particularly in the area of Incident Management (IM). Yet the absence of standardized benchmarks for evaluating such systems makes it hard to track progress, compare approaches, and uncover their limitations. As LLM-based tools become widespread, there is a clear need for a comprehensive benchmarking suite that reflects the diversity and complexity of operational tasks encountered in real-world networks. This poster outlines our vision for designing such a modular benchmarking suite. We describe an approach for generating operational tasks of varying complexity and discuss how to evaluate LLMs on these tasks and assess system-level performance. As a preliminary evaluation, we benchmark three LLMs (GPT-4.1, Gemini 2.5-Pro, and Claude 3.7 Sonnet) on more than 100 test cases and two pipeline variants.

Publication status

accepted

Event

SIGCOMM '25

Organisational unit

09477 - Vanbever, Laurent / Vanbever, Laurent
