Zhongnan Qu


Loading...

Last Name

Qu

First Name

Zhongnan

Organisational unit

Search Results

Publications 1 - 10 of 12
  • Dong, Xin; De Salvo, Barbara; Li, Meng; et al. (2022)
    2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
    We design deep neural networks (DNNs) and corresponding networks' splittings to distribute DNNs' workload to camera sensors and a centralized aggregator on head mounted devices to meet system performance targets in inference accuracy and latency under the given hardware resource constraints. To achieve an optimal balance among computation, communication, and performance, a split-aware neural architecture search framework, SplitNets, is introduced to conduct model designing, splitting, and communication reduction simultaneously. We further extend the framework to multi-view systems for learning to fuse inputs from multiple camera sensors with optimal performance and systemic efficiency. We validate SplitNets for single-view system on ImageNet as well as multi-view system on 3D classification, and show that the SplitNets framework achieves state-of-the-art (SOTA) performance and system latency compared with existing approaches.
  • Chen, Guang; Cao, Hu; Ye, Canbo; et al. (2019)
    Frontiers in Neurorobotics
    Neuromorphic vision sensors are bio-inspired cameras that naturally capture the dynamics of a scene with ultra-low latency, filtering out redundant information with low power consumption. Few works are addressing the object detection with this sensor. In this work, we propose to develop pedestrian detectors that unlock the potential of the event data by leveraging multi-cue information and different fusion strategies. To make the best out of the event data, we introduce three different event-stream encoding methods based on Frequency, Surface of Active Event (SAE) and Leaky Integrate-and-Fire (LIF). We further integrate them into the state-of-the-art neural network architectures with two fusion approaches: the channel-level fusion of the raw feature space and decision-level fusion with the probability assignments. We present a qualitative and quantitative explanation why different encoding methods are chosen to evaluate the pedestrian detection and which method performs the best. We demonstrate the advantages of the decision-level fusion via leveraging multi-cue event information and show that our approach performs well on a self-annotated event-based pedestrian dataset with 8,736 event frames. This work paves the way of more fascinating perception applications with neuromorphic vision sensors.
  • Deep Partial Updating
    Item type: Working Paper
    Qu, Zhongnan; Liu, Cong; Guo, Junfeng; et al. (2020)
    arXiv
    Emerging edge intelligence applications require the server to continuously retrain and update deep neural networks deployed on remote edge nodes in order to leverage newly collected data samples. Unfortunately, it may be impossible in practice to continuously send fully updated weights to these edge nodes due to the highly constrained communication resource. In this paper, we propose the weight-wise deep partial updating paradigm, which smartly selects only a subset of weights to update at each server-to-edge communication round, while achieving a similar performance compared to full updating. Our method is established through analytically upper-bounding the loss difference between partial updating and full updating, and only updates the weights which make the largest contributions to the upper bound. Extensive experimental results demonstrate the efficacy of our partial updating methodology which achieves a high inference accuracy while updating a rather small number of weights.
  • Cao, Hu; Qu, Zhongnan; Chen, Guang; et al. (2024)
    IEEE Transactions on Artificial Intelligence
    Vision Transformers (ViTs) have recently achieved promising results in various computer vision tasks. However, ViTs have high computation costs and a large number of parameters due to the stacked multihead self-attention (MHSA) and expanded feed-forward network (FFN) modules. Since the complexity of Transformer-based models is quadratic with the length of the input tokens, most current efforts focus on reducing the number of tokens in ViTs to improve the model efficiency. Unlike previous studies, we argue that diverse redundant features help ViTs understand the data comprehensively. In this article, we propose ghost vision Transformer (GhostViT), which achieves both computation and storage efficiency. The key concept of GhostViT is to generate more diverse features using cheap operations in the MHSA and FFN modules. We experimentally demonstrate that our GhostViT can significantly reduce both the parameters and floating point operations (FLOPs) of ViTs while achieving similar or better accuracy. For example, about 14% of parameters and 17% of FLOPs of the DeiT-tiny model are reduced without any accuracy loss on the ImageNet-1 K dataset.
  • Heim, Lennart; Biri, Andreas; Qu, Zhongnan; et al. (2021)
    arXiv
    With the surge of inexpensive computational and memory resources, neural networks (NNs) have experienced an unprecedented growth in architectural and computational complexity. Introducing NNs to resource-constrained devices enables cost-efficient deployments, widespread availability, and the preservation of sensitive data. This work addresses the challenges of bringing MachineLearning to microcontroller units (MCUs), where we focus on the ubiquitous ARM Cortex-M architecture. The detailed effects and trade-offs that optimization methods, software frameworks, and MCU hardware architecture have on key performance metrics such as inference latency and energy consumption have not been previously studied in depth for state-of-the-art frameworks such as TensorFlow Lite Micro. We find that empirical investigations which measure the perceptible metrics –performance as experienced by the user– are indispensable, as the impact of specialized instructions and layer types can be subtle. To this end, we propose an implementation-aware design as a cost-effective method for verification and benchmarking. Employing our developed toolchain, we demonstrate how existing NN deployments on resource-constrained devices can be improved by systematically optimizing NNs to their targeted application scenario.
  • Qu, Zhongnan; Zhou, Zimu; Cheng, Yun; et al. (2020)
    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
    We investigate the compression of deep neural networks by quantizing their weights and activations into multiple binary bases, known as multi-bit networks (MBNs), which accelerate the inference and reduce the storage for the deployment on low-resource mobile and embedded platforms. We propose Adaptive Loss-aware Quantization (ALQ), a new MBN quantization pipeline that is able to achieve an average bitwidth below one-bit without notable loss in inference accuracy. Unlike previous MBN quantization solutions that train a quantizer by minimizing the error to reconstruct full precision weights, ALQ directly minimizes the quantization-induced error on the loss function involving neither gradient approximation nor full precision maintenance. ALQ also exploits strategies including adaptive bitwidth, smooth bitwidth reduction, and iterative trained quantization to allow a smaller network size without loss in accuracy. Experiment results on popular image datasets show that ALQ outperforms state-of-the-art compressed networks in terms of both storage and accuracy.
  • Li, Bin; Cao, Hu; Qu, Zhongnan; et al. (2020)
    Frontiers in Neurorobotics
    Robotic grasping plays an important role in the field of robotics. The current state-of-the-art robotic grasping detection systems are usually built on the conventional vision, such as the RGB-D camera. Compared to traditional frame-based computer vision, neuromorphic vision is a small and young community of research. Currently, there are limited event-based datasets due to the troublesome annotation of the asynchronous event stream. Annotating large scale vision datasets often takes lots of computation resources, especially when it comes to troublesome data for video-level annotation. In this work, we consider the problem of detecting robotic grasps in a moving camera view of a scene containing objects. To obtain more agile robotic perception, a neuromorphic vision sensor (Dynamic and Active-pixel Vision Sensor, DAVIS) attaching to the robot gripper is introduced to explore the potential usage in grasping detection. We construct a robotic grasping dataset named Event-Grasping dataset with 91 objects. A spatial-temporal mixed particle filter (SMP Filter) is proposed to track the LED-based grasp rectangles, which enables video-level annotation of a single grasp rectangle per object. As LEDs blink at high frequency, the Event-Grasping dataset is annotated at a high frequency of 1 kHz. Based on the Event-Grasping dataset, we develop a deep neural network for grasping detection that considers the angle learning problem as classification instead of regression. The method performs high detection accuracy on our Event-Grasping dataset with 93% precision at an object-wise level split. This work provides a large-scale and well-annotated dataset and promotes the neuromorphic vision applications in agile robot.
  • Chen, Guang; Liu, Shu; Ren, Kejia; et al. (2019)
    IET Intelligent Transport Systems
  • Qu, Zhongnan; Liu, Cong; Thiele, Lothar (2022)
    Lecture Notes in Computer Science ~ Computer Vision – ECCV 2022
    Emerging edge intelligence applications require the server to retrain and update deep neural networks deployed on remote edge nodes to leverage newly collected data samples. Unfortunately, it may be impossible in practice to continuously send fully updated weights to these edge nodes due to the highly constrained communication resource. In this paper, we propose the weight-wise deep partial updating paradigm, which smartly selects a small subset of weights to update in each server-to-edge communication round, while achieving a similar performance compared to full updating. Our method is established through analytically upper bounding the loss difference between partial updating and full updating, and only updates the weights which make the largest contributions to the upper bound. Extensive experimental results demonstrate the efficacy of our partial updating methodology which achieves a high inference accuracy while updating a rather small number of weights.
  • Qu, Zhongnan; Zhou, Zimu; Tong, Yongxin; et al. (2022)
    KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publications 1 - 10 of 12