Evaluation of NTP/PTP fine-grain synchronization performance in HPC clusters
OPEN ACCESS
Date
2018-11-04
Publication Type
Conference Paper
ETH Bibliography
yes
Abstract
Fine-grain time synchronization is important for addressing several challenges in today's and future High Performance Computing (HPC) centers. Three examples that require synchronization within a few microseconds are (i) co-scheduling techniques for parallel applications with sensitive bulk-synchronous workloads, (ii) performance analysis tools, and (iii) autotuning strategies that exploit State-of-the-Art (SoA) high-resolution monitoring systems. Previous works report custom solutions that reach this performance without incurring the extra cost of dedicated hardware. On the other hand, the benefits of using robust standards that are widely supported by the community, such as the Network Time Protocol (NTP) and the Precision Time Protocol (PTP), are evident. With today's software and hardware improvements to these two protocols and their off-the-shelf integration in SoA HPC servers, expensive extra hardware is no longer required, but an evaluation of their performance in supercomputing clusters is needed. Our results show that NTP can reach an accuracy of 2.6 µs and a precision below 2.7 µs on computing nodes, with negligible overhead. These values can be bounded to below a microsecond with PTP and low-cost switches (no GPS antenna is needed). Both protocols are also suitable for data time-stamping in SoA HPC monitoring infrastructures. We validate their performance with two real use cases and quantify scalability and CPU overhead. Finally, we report the software settings and low-cost network configuration needed to reach these high-precision synchronization results.
Publication status
published
Book title
ANDARE '18 Proceedings of the 2nd Workshop on AutotuniNg and aDaptivity AppRoaches for Energy efficient HPC Systems
Pages / Article No.
3
Publisher
Association for Computing Machinery
Event
2nd Workshop on AutotuniNg and aDaptivity AppRoaches for Energy efficient HPC Systems (ANDARE 2018)
Subject
NTP; PTP; HPC Clusters; MPI; Fine Grain Synchronization; Power and Performance Monitoring
Organisational unit
03996 - Benini, Luca / Benini, Luca