Journal: Journal of Parallel and Distributed Computing

Abbreviation

J. parallel distrib. comput.

Publisher

Elsevier

ISSN

0743-7315 (print)
1096-0848 (electronic)

Search Results

Publications 1 - 10 of 13
  • Gonzalez, Jorge; Palma, Mauricio G.; Hattink, Maarten; et al. (2022)
    Journal of Parallel and Distributed Computing
    Recent advances in integrated photonics enable the implementation of reconfigurable, high-bandwidth, and low energy-per-bit interconnects in next-generation data centers. We propose and evaluate an Optically Connected Memory (OCM) architecture that disaggregates the main memory from the computation nodes in data centers. OCM is based on micro-ring resonators (MRRs), and it does not require any modification to the DRAM memory modules. We calculate energy consumption from real photonic devices and integrate them into a system simulator to evaluate performance. Our results show that (1) OCM is capable of interconnecting four DDR4 memory channels to a computing node using two fibers with 1.02 pJ energy-per-bit consumption and (2) OCM performs up to 5.5× faster than a disaggregated memory with 40G PCIe NIC connectors to computing nodes.
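    The quoted 1.02 pJ/bit figure can be put in perspective with a back-of-the-envelope power estimate. The short Python sketch below assumes a DDR4-2666 per-channel bandwidth of roughly 21.3 GB/s, which is not stated in the abstract; it is only meant to illustrate the scale implied by the reported energy-per-bit number.

      # Back-of-the-envelope check of the interconnect power implied by the quoted
      # 1.02 pJ/bit figure. The per-channel DDR4 bandwidth below is an assumed
      # illustrative value (DDR4-2666, ~21.3 GB/s), not a number from the paper.
      ENERGY_PER_BIT_J = 1.02e-12   # 1.02 pJ/bit, as reported in the abstract
      CHANNEL_BW_BYTES = 21.3e9     # assumed DDR4-2666 channel bandwidth in bytes/s
      NUM_CHANNELS = 4              # four DDR4 channels, as in the abstract

      total_bits_per_s = NUM_CHANNELS * CHANNEL_BW_BYTES * 8
      power_w = total_bits_per_s * ENERGY_PER_BIT_J
      print(f"aggregate bandwidth: {total_bits_per_s / 1e9:.0f} Gb/s")   # ~682 Gb/s
      print(f"implied link power at full load: {power_w * 1e3:.0f} mW")  # ~695 mW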
  • Arbenz, Peter; Říha, Lubomír (2020)
    Journal of Parallel and Distributed Computing
  • Wattenhofer, Roger; Widmayer, Peter (1998)
    Journal of Parallel and Distributed Computing
    A distributed counter allows each processor in an asynchronous message passing network to access the counter value and increment it. We study the problem of implementing a distributed counter so that no processor is a communication bottleneck. We prove a lower bound of Ω(log n / log log n) on the number of messages that some processor must exchange in a sequence of n counting operations spread over n processors. We propose a counter that achieves this bound when each processor increments the counter exactly once. Hence, the lower bound is tight. Because most algorithms and data structures count in some way, the lower bound holds for many distributed computations. We feel that the proposed concept of a communication bottleneck is a relevant measure of efficiency for a distributed algorithm and data structure, because it indicates the achievable degree of distribution.
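    To make the bottleneck measure concrete, the sketch below compares the per-processor message load of a naive centralized counter (one holder processor answers every increment) with the order of the proven Ω(log n / log log n) lower bound. The centralized baseline and the use of natural logarithms with constants dropped are assumptions for illustration, not part of the paper.

      import math

      # Per-processor message load ("communication bottleneck") when n processors
      # each perform one increment. The centralized scheme is a naive baseline for
      # comparison, not the counter proposed in the paper.
      def centralized_bottleneck(n):
          # One processor holds the counter; every other processor sends a request
          # and receives a reply, so the holder exchanges 2 * (n - 1) messages.
          return 2 * (n - 1)

      def lower_bound_order(n):
          # Order of the proven lower bound, Omega(log n / log log n); constants omitted.
          return math.log(n) / math.log(math.log(n))

      for n in (16, 256, 4096, 65536):
          print(f"n={n:6d}  centralized={centralized_bottleneck(n):7d}  "
                f"log n / log log n ~ {lower_bound_order(n):5.1f}")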
  • Khanchandani, Pankaj; Wattenhofer, Roger (2020)
    Journal of Parallel and Distributed Computing
    Herlihy showed that multiprocessors must support advanced atomic objects, such as compare-and-swap, to be able to solve any arbitrary synchronization task among any number of processes (Herlihy, 1991). Elementary objects such as read-write registers and fetch-and-add are fundamentally limited to at most two processes with respect to solving an arbitrary synchronization task. Later, it was also shown that simulating an advanced atomic object using elementary objects is impossible. However, Ellen et al. observed that this impossibility assumes computation by synchronization objects rather than synchronization instructions applied to memory locations, which is how actual multiprocessors compute (Ellen et al., 2016). Building on that observation, we show that two elementary instructions, max-write and half-max, can be much better than the advanced compare-and-swap instruction. Concretely, we show the following: (1) half-max and max-write instructions are elementary, i.e., have consensus number one; (2) half-max and max-write instructions can simulate the compare-and-swap instruction in O(1) steps; (3) for a pipelined butterfly interconnect, the concurrent throughput of half-max and max-write instructions exceeds that of compare-and-swap by a factor of n, the number of processes; (4) the family of instructions max-write-or-⊙ is also elementary, where ⊙ is a commutative and associative operation; (5) it takes Ω(log n) steps to simulate max-write-or-add using compare-and-swap, but O(1) steps to simulate compare-and-swap using max-write-or-add and half-max.
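    The abstract names the instructions but not their exact semantics, so the sketch below fixes one plausible reading purely for illustration: a memory word split into a tag half and a payload half, with half-max raising the tag to a maximum and max-write storing the payload only when its tag is maximal. These semantics, and the lock-based atomicity, are assumptions; they are not the paper's definitions, nor its O(1) compare-and-swap simulation.

      import threading

      # A toy model of one shared word with assumed half-max / max-write semantics.
      # The lock only simulates the atomicity a real instruction would provide.
      class Word:
          def __init__(self):
              self.tag = 0        # "upper half": a monotonically growing tag
              self.value = 0      # "lower half": the payload
              self._lock = threading.Lock()

          def half_max(self, new_tag):
              # Atomically raise the tag half to max(tag, new_tag); return the old pair.
              with self._lock:
                  old = (self.tag, self.value)
                  self.tag = max(self.tag, new_tag)
                  return old

          def max_write(self, tag, value):
              # Atomically write the payload only if the supplied tag is at least the
              # stored tag, i.e. a write "wins" only when its tag is maximal.
              with self._lock:
                  old = (self.tag, self.value)
                  if tag >= self.tag:
                      self.tag, self.value = tag, value
                  return old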
  • Flocchini, Paola; Pagli, L.; Prencipe, Giuseppe; et al. (2008)
    Journal of Parallel and Distributed Computing
  • Mastoras, Aristeidis; Manis, George (2015)
    Journal of Parallel and Distributed Computing
  • Arbenz, Peter; Flaig, Cyril; Kellenberger, Daniel (2014)
    Journal of Parallel and Distributed Computing
  • Pauli, Stefan; Arbenz, Peter; Schwab, Christoph (2015)
    Journal of Parallel and Distributed Computing
  • Cheng, Daning; Li, Shigang; Zhang, Yunquan (2020)
    Journal of Parallel and Distributed Computing
    Stochastic gradient descent (SGD) is a popular stochastic optimization method in machine learning. Traditional parallel SGD algorithms, e.g., SimuParallel SGD (Zinkevich, 2010), often require all nodes to have the same performance or to consume equal quantities of data. However, these requirements are difficult to satisfy when the parallel SGD algorithms run in a heterogeneous computing environment; low-performance nodes will exert a negative influence on the final result. In this paper, we propose an algorithm called weighted parallel SGD (WP-SGD). WP-SGD combines weighted model parameters from different nodes in the system to produce the final output. WP-SGD makes use of the reduction in standard deviation to compensate for the loss from the inconsistency in performance of nodes in the cluster, which means that WP-SGD does not require all nodes to consume equal quantities of data. We also propose methods for running two other parallel SGD algorithms combined with WP-SGD in a heterogeneous environment. The experimental results show that WP-SGD significantly outperforms the traditional parallel SGD algorithms on distributed training systems with an unbalanced workload.
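    A minimal sketch of the weighted-combination idea described above: each node contributes the parameters it reached on however much data it consumed, and the final model is a weighted average. The proportional-to-data weighting and the toy numbers are assumptions for illustration; WP-SGD's actual weights are derived in the paper.

      import numpy as np

      # Combine per-node model parameters with weights; here the weights are simply
      # proportional to the amount of data each node consumed (an assumption).
      def combine(params_per_node, samples_per_node):
          weights = np.asarray(samples_per_node, dtype=float)
          weights /= weights.sum()
          stacked = np.stack(params_per_node)            # shape: (nodes, dims)
          return np.tensordot(weights, stacked, axes=1)  # weighted average of models

      # Example: three heterogeneous nodes, where the slow node saw far less data.
      params = [np.array([1.0, 2.0]), np.array([1.2, 1.8]), np.array([0.5, 3.0])]
      samples = [10_000, 9_000, 1_000]
      print(combine(params, samples))                    # -> [1.065 1.96]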
  • Cecilia, José M.; Llanes, Antonio; Abellán, José L.; et al. (2018)
    Journal of Parallel and Distributed Computing