Pruning vs Quantization: Which is Better?

Kuzmin, Andrey; Nagel, Markus; van Baalen, Mart; Behboodi, Arash; Blankevoort, Tijmen

Computer Science > Machine Learning

arXiv:2307.02973 (cs)

[Submitted on 6 Jul 2023 (v1), last revised 16 Feb 2024 (this version, v2)]

Abstract: Neural network pruning and quantization techniques are almost as old as neural networks themselves. However, to date only ad-hoc comparisons between the two have been published. In this paper, we set out to answer the question of which is better: neural network quantization or pruning? By answering this question, we hope to inform design decisions made on neural network hardware going forward. We provide an extensive comparison between the two techniques for compressing deep neural networks. First, we give an analytical comparison of expected quantization and pruning error for general data distributions. Then, we provide lower bounds for the per-layer pruning and quantization error in trained networks, and compare these to the empirical error after optimization. Finally, we provide an extensive experimental comparison for training 8 large-scale models on 3 tasks. Our results show that in most cases quantization outperforms pruning. Only in some scenarios with a very high compression ratio might pruning be beneficial from an accuracy standpoint.
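
To make the kind of error comparison described in the abstract concrete, the following is a minimal sketch and not the authors' code or method. It assumes standard-normal synthetic weights, magnitude pruning, symmetric uniform quantization with the range set by the largest magnitude, and signal-to-noise ratio as the error measure. The pairing of 8 bits with 50% sparsity and 4 bits with 75% sparsity assumes a 16-bit baseline and ignores the storage cost of a sparsity mask. All function names are hypothetical.

```python
import math
import random


def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest magnitude."""
    n_zero = int(round(sparsity * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_zero]:
        pruned[i] = 0.0
    return pruned


def uniform_quantize(weights, bits):
    """Round to a symmetric uniform grid whose range is the max magnitude."""
    max_abs = max(abs(w) for w in weights)
    levels = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def snr_db(original, compressed):
    """Signal-to-noise ratio in decibels; higher means less error."""
    signal = sum(w * w for w in original)
    noise = sum((w - c) ** 2 for w, c in zip(original, compressed))
    return 10.0 * math.log10(signal / noise)


def compare(num_weights=20000, seed=0):
    """Return {(bits, sparsity): (quantization SNR, pruning SNR)}."""
    rng = random.Random(seed)
    # Assumption: synthetic standard-normal weights, not a trained network.
    weights = [rng.gauss(0.0, 1.0) for _ in range(num_weights)]
    results = {}
    # Assumed pairing against a 16-bit baseline: 8 bits ~ 50%, 4 bits ~ 75%.
    for bits, sparsity in [(8, 0.5), (4, 0.75)]:
        quant_snr = snr_db(weights, uniform_quantize(weights, bits))
        prune_snr = snr_db(weights, magnitude_prune(weights, sparsity))
        results[(bits, sparsity)] = (quant_snr, prune_snr)
    return results


if __name__ == "__main__":
    for (bits, sparsity), (q, p) in compare().items():
        print(f"INT{bits}: {q:.1f} dB   {sparsity:.0%} sparsity: {p:.1f} dB")
```

On this toy Gaussian input the quantized weights should show the higher signal-to-noise ratio at both settings. The sketch says nothing about heavy-tailed distributions, fine-tuning after compression, or the very high compression ratios at which the abstract notes pruning might be beneficial.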

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2307.02973 [cs.LG]
	(or arXiv:2307.02973v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2307.02973

Submission history

From: Andrey Kuzmin
[v1] Thu, 6 Jul 2023 13:18:44 UTC (1,654 KB)
[v2] Fri, 16 Feb 2024 09:52:58 UTC (2,015 KB)
