On the parallel I/O optimality of linear algebra kernels: near-optimal LU factorization
Author
Date
2021-02
Type
- Other Conference Item
ETH Bibliography
yes
Abstract
Dense linear algebra kernels are fundamental components of many scientific computing applications. In this work we present a novel method of deriving parallel I/O lower bounds for this broad family of programs. Based on the X-Partitioning abstraction, our method explicitly captures inter-statement dependencies. Applying our analysis to LU factorization, we derive COnfLUX, an LU algorithm with a parallel I/O cost of N³/(P√M) communicated elements per processor (for P processors, each with a local memory of M elements), only 1/3× over our established lower bound. We evaluate COnfLUX on various problem sizes, demonstrating empirical results that match our theoretical analysis and communicating less than Cray ScaLAPACK, SLATE, and the asymptotically optimal CANDMC library. Running on 1,024 nodes of Piz Daint, COnfLUX communicates 1.6× less than the second-best implementation and is expected to communicate 2.1× less on a full-scale run on Summit.
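The leading-order cost reported for COnfLUX, N³/(P√M) communicated elements per processor, can be evaluated directly to get a feel for the volumes involved. The sketch below is a back-of-the-envelope calculator only, not the authors' code: the function name `conflux_words_per_processor` and the parameter values are hypothetical, lower-order terms are ignored, and any memory-regime conditions under which the formula holds in the paper are not modeled.

```python
import math


def conflux_words_per_processor(n: int, p: int, m: float) -> float:
    """Hypothetical helper: leading-order per-processor communication volume
    N^3 / (P * sqrt(M)), in matrix elements. Lower-order terms are ignored."""
    if n <= 0 or p <= 0 or m <= 0:
        raise ValueError("n, p and m must be positive")
    return n**3 / (p * math.sqrt(m))


# Hypothetical parameters, chosen only for illustration.
N = 16_384   # matrix dimension (N x N matrix)
P = 1_024    # number of processors
M = 2**20    # local memory per processor, in elements

volume = conflux_words_per_processor(N, P, M)
print(f"{volume:.3e} elements per processor")  # prints "4.194e+06 elements per processor"
```

Because the estimate scales as 1/√M, quadrupling the local memory per processor halves the predicted volume, while doubling P at fixed N and M also halves it.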
Publication status
published
Book title
PPoPP '21: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Publisher
ACM
Notes
Due to the Coronavirus (COVID-19) pandemic, the conference was conducted virtually.