BiVM: Accurate Binarized Neural Network for Efficient Video Matting

Date

2025-10

Publication Type

Journal Article

ETH Bibliography

yes

Abstract

Deep neural networks for real-time video matting face significant computational limitations on edge devices, hindering their adoption in widespread applications such as online conferences and short-form video production. Binarization has emerged as one of the most common compression approaches, offering compact 1-bit parameters and efficient bitwise operations. However, a binarized video matting network is limited in both accuracy and efficiency because of its degenerated encoder and redundant decoder. A theoretical analysis based on the information bottleneck principle indicates that these limitations are mainly caused by the degradation of prediction-relevant information in the intermediate features and by redundant computation in prediction-irrelevant areas. We present BiVM, an accurate and resource-efficient Binarized neural network for Video Matting. First, we present a series of binarized computation structures with elastic shortcuts and evolvable topologies, enabling the constructed encoder backbone to extract high-quality representations from input videos for accurate prediction. Second, we sparsify the intermediate features of the binarized decoder by masking homogeneous parts, allowing the decoder to focus on representations with diverse details while reducing the computational burden for efficient inference. Furthermore, we construct a localized binarization-aware mimicking framework with an information-guided strategy, so that matting-related representations in the full-precision counterparts are accurately and fully utilized. Comprehensive experiments show that the proposed BiVM surpasses alternative binarized video matting networks, including state-of-the-art (SOTA) binarization methods, by a substantial margin. Moreover, BiVM achieves significant savings of 14.3x and 21.6x in computation and storage costs, respectively. We also evaluate BiVM on ARM CPU hardware.
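
The "compact 1-bit parameters and efficient bitwise operations" mentioned in the abstract can be illustrated with a minimal sketch of generic sign binarization and an XNOR-popcount dot product. This is a textbook illustration under stated assumptions, not BiVM's actual implementation: the function names `binarize`, `pack_bits`, and `xnor_popcount_dot` are hypothetical, and the convention that zero maps to +1 is an assumption.

```python
# Generic binarization sketch (assumption: not the BiVM code itself).

def binarize(values):
    """Map each real value to +1 or -1 with the sign function (0 maps to +1 by assumption)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a +1/-1 vector into one integer, one bit per element (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_word, b_word, n):
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR marks positions where the signs agree; each agreement contributes +1
    and each disagreement -1, so the dot product is 2 * matches - n.
    """
    mask = (1 << n) - 1                # keep only the n valid bit positions
    agree = ~(a_word ^ b_word) & mask  # XNOR restricted to those positions
    matches = bin(agree).count("1")    # popcount
    return 2 * matches - n


weights = [0.7, -1.2, 0.1, -0.4]
activations = [0.5, 0.3, -0.9, -2.0]

w_b = binarize(weights)
a_b = binarize(activations)

# Reference result using ordinary multiply-accumulate on the +1/-1 values.
reference = sum(w * a for w, a in zip(w_b, a_b))

# Same result using only bitwise operations on the packed 1-bit representation.
fast = xnor_popcount_dot(pack_bits(w_b), pack_bits(a_b), len(w_b))
```

In this sketch, `fast` and `reference` agree for any pair of equal-length inputs, which is why a binarized layer can replace floating-point multiply-accumulate with XNOR and popcount while storing each parameter in a single bit.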

Publication status

published

Volume

47 (10)

Pages / Article No.

9250 - 9265

Publisher

IEEE

Subject

Video matting; binarization; model compression; low-bit quantization

Organisational unit

01225 - D-ITET Zentr. f. projektbasiertes Lernen / D-ITET Center for Project-Based Learning

Funding

219943 - Neuromorphic Attention Models for Event Data (NAMED) (SNF)