Single-shot pruning and quantization for hardware-friendly neural network acceleration

Published in Engineering Applications of Artificial Intelligence, 2023

Recommended citation: Bofeng Jiang*, Jun Chen* and Yong Liu. "Single-shot pruning and quantization for hardware-friendly neural network acceleration." Engineering Applications of Artificial Intelligence. 2023.

Abstract

Deploying convolutional neural networks (CNNs) on embedded systems is challenging because of model size limitations. Pruning and quantization can help, but applying them separately is time-consuming. Our Single-Shot Pruning and Quantization strategy addresses these issues by pruning and quantizing in a single process. We evaluated our method on the CIFAR-10 and CIFAR-100 image classification datasets. The resulting model is 69.4% smaller with little accuracy loss and runs 6-8 times faster on NVIDIA Xavier NX hardware.

Download paper here