Zijian Zhang

Last Name

Zhang

First Name

Zijian

Search Results

Publications 1 - 1 of 1
  • Zhang, Zijian (2024)
    Large Language Models (LLMs) have become a promising new technology. However, their power can only be unleashed with substantial computation. In this work, we first investigate the computational characteristics of LLMs. We study the behavior of the underlying neural network architecture during both training and inference. We perform both theoretical analysis and comprehensive benchmarks to identify the importance of the self-attention mechanism and introduce potential hardware-efficient algorithmic optimizations. Since the Graphics Processing Unit (GPU) has become the major powerhouse for neural networks, we then introduce the major architectural advances in recent GPUs from both Nvidia and AMD. We emphasize the invention of the Tensor Core (or Matrix Core on AMD GPUs) and the novel, vital design of their memory systems. We also compare the similarities and differences between Nvidia and AMD GPUs. To demonstrate the benefits of these hardware architectural innovations, we progressively build a series of high-performance kernels that approach state-of-the-art performance. Since the majority of the computation in the self-attention mechanism is spent in matrix multiplication, we build matrix-multiplication kernels for both Nvidia and AMD GPUs. We carefully implement, evaluate, and analyze these kernels. We then introduce optimizations that are specific to the self-attention mechanism. For all the implemented kernels, we also examine compatibility across hardware architectures and software stacks. Finally, we introduce future directions that help to create high-performance kernels for modern GPU architectures.
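    The abstract's claim that self-attention is dominated by matrix multiplication can be illustrated with a minimal NumPy sketch (illustrative only; the thesis's actual GPU kernels are not reproduced here). The two `@` operations below are the matrix multiplications that dominate the cost as sequence length grows:

    ```python
    import numpy as np

    def attention(Q, K, V):
        """Single-head scaled dot-product attention for one sequence."""
        d = Q.shape[-1]
        # First matmul: (n x d) @ (d x n) -> (n x n) attention scores
        scores = Q @ K.T / np.sqrt(d)
        # Row-wise softmax, numerically stabilized
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        # Second matmul: (n x n) @ (n x d) weighted sum of values
        return w @ V

    rng = np.random.default_rng(0)
    n, d = 8, 4  # toy sequence length and head dimension
    Q, K, V = rng.normal(size=(3, n, d))
    out = attention(Q, K, V)
    ```

    Both matmuls cost O(n^2 d), which is why fast matrix-multiplication kernels (and Tensor/Matrix Cores) are the natural starting point before attention-specific optimizations.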