Search Results
-
Injecting Descriptive Meta-Information into Pre-Trained Language Models with Hypernetworks
(2021) Proceedings of Interspeech 2021
There is a growing trend to deploy deep neural networks at the edge for high-accuracy, real-time data mining and user interaction. Applications such as speech recognition and language understanding often apply a deep neural network to encode an input sequence and then use a decoder to generate the output sequence. A promising technique to accelerate these applications on resource-constrained devices is network pruning, which compresses ...
Conference Paper
-
Pruning-Aware Merging for Efficient Multitask Inference
(2021) Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD '21)
Many mobile applications demand selective execution of multiple correlated deep learning inference tasks on resource-constrained platforms. Given a set of deep neural networks, each pre-trained for a single task, it is desired that executing arbitrary combinations of tasks yields minimal computation cost. Pruning each network separately yields suboptimal computation cost due to task relatedness. A promising remedy is to merge the networks ...
Conference Paper
-
MapTransfer: Urban air quality map generation for downscaled sensor deployments
(2020) 2020 IEEE/ACM Fifth International Conference on Internet-of-Things Design and Implementation (IoTDI)
Dense deployments of commodity air quality sensors have proven effective to provide spatially-resolved information on urban air pollution in real-time. However, long-term operation of a dense sensor deployment incurs enormous maintenance expenses and efforts. A cost-effective alternative is to first collect measurements with an initial dense deployment and then rely on a small subset of sensors for air quality map generation. To avoid ...
Conference Paper
-
Rethinking Pruning for Accelerating Deep Inference At the Edge
(2020) KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
There is a growing trend to deploy deep neural networks at the edge for high-accuracy, real-time data mining and user interaction. Applications such as speech recognition and language understanding often apply a deep neural network to encode an input sequence and then use a decoder to generate the output sequence. A promising technique to accelerate these applications on resource-constrained devices is network pruning, which compresses ...
Conference Paper