2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.00225

Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-Based Approach

Cited by 48 publications (28 citation statements)
References 19 publications

“…An early work [9] considers pruning and quantization in separate steps for network compression. Some works [19,40] jointly optimize unstructured pruning and quantization. However, unstructured pruning is hard to implement efficiently on typical hardware.…”
Section: Related Work (mentioning)
confidence: 99%
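To make the contrast concrete, the following is a minimal sketch of the "joint" idea the statement attributes to [19,40]: a single projection that imposes unstructured sparsity and uniform quantization on a weight tensor at the same time, rather than running the two compressions as separate stages. The sparsity ratio, bit-width, and threshold rule are illustrative assumptions, not the cited papers' algorithms.

```python
# Minimal sketch (not the cited papers' exact method): one projection that
# applies unstructured pruning and uniform quantization jointly.
import numpy as np

def joint_sparse_quant_projection(w, sparsity=0.5, num_bits=4):
    """Project a weight tensor onto {sparse} ∩ {quantized} in one shot."""
    # Unstructured pruning: zero out the smallest-magnitude entries.
    k = int(sparsity * w.size)
    threshold = np.partition(np.abs(w).ravel(), k)[k] if k > 0 else 0.0
    mask = np.abs(w) >= threshold

    # Symmetric uniform quantization of the surviving weights.
    levels = 2 ** (num_bits - 1) - 1
    scale = np.abs(w[mask]).max() / levels if mask.any() else 1.0
    w_q = np.round(w / scale).clip(-levels, levels) * scale

    return w_q * mask  # sparsity and quantization applied together

w = np.random.randn(64, 64).astype(np.float32)
w_c = joint_sparse_quant_projection(w, sparsity=0.7, num_bits=4)
print((w_c == 0).mean(), np.unique(w_c).size)  # achieved sparsity, #distinct levels
```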
“…How to choose a suitable combination of different compression criteria for each CNN layer is challenging and has not been extensively studied. Some recent works like [19,40] leverage unstructured pruning and quantization in a joint framework. However, unstructured pruning leads to irregular weight patterns, which are not favorable for computation acceleration in real applications on hardware.…”
Citation type: mentioning
confidence: 99%
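The hardware point above is easiest to see by comparing the two pruning styles directly. Below is an illustrative sketch (shapes and ratios are assumptions): unstructured pruning leaves a dense-shaped matrix with scattered zeros that typical hardware cannot easily exploit, whereas structured (channel) pruning yields a physically smaller matrix that speeds up the matmul directly.

```python
# Illustrative contrast between unstructured and structured pruning
# (not taken from any of the cited papers).
import numpy as np

def unstructured_prune(w, sparsity=0.5):
    # w: (out_channels, in_channels); zero the smallest-magnitude weights.
    k = int(sparsity * w.size)
    thr = np.partition(np.abs(w).ravel(), k)[k]
    return w * (np.abs(w) >= thr)                 # same shape, irregular zeros

def structured_prune(w, channel_sparsity=0.5):
    # Remove whole output channels with the smallest L1 norm.
    keep = int(round((1 - channel_sparsity) * w.shape[0]))
    order = np.argsort(-np.abs(w).sum(axis=1))    # strongest channels first
    return w[np.sort(order[:keep])]               # physically smaller matrix

w = np.random.randn(128, 256)
print(unstructured_prune(w).shape)   # (128, 256): dense shape, scattered zeros
print(structured_prune(w).shape)     # (64, 256): directly faster dense matmul
```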
“…One of the most used combinations is to apply compressions sequentially, most notably to first prune weights and then quantize the remaining ones [11,13,13,39,47], which may be further compressed via lossless coding algorithms (e.g., Huffman coding). Additive combinations of quantizations [3,45,53], where weights are the sum of quantized values, as well as the low-rank + sparse combination [2,52], have also been used to compress neural networks.…”
Section: Usage of Combinations (mentioning)
confidence: 99%
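A minimal sketch of the sequential recipe described above: prune weights, quantize the survivors to integer codes, then losslessly encode the quantized symbols with Huffman coding. The sparsity ratio, bit-width, and the toy Huffman-length estimator are assumptions for illustration, not the cited papers' implementations.

```python
# Sketch of the prune -> quantize -> Huffman-code pipeline (assumed parameters).
import heapq
from collections import Counter
import numpy as np

def prune_then_quantize(w, sparsity=0.8, num_bits=4):
    k = int(sparsity * w.size)
    thr = np.partition(np.abs(w).ravel(), k)[k]
    w = w * (np.abs(w) >= thr)                        # step 1: prune
    levels = 2 ** (num_bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.round(w / scale).astype(np.int8), scale  # step 2: quantize

def huffman_code_lengths(symbols):
    # Step 3: average Huffman code length (bits per weight) over the symbols.
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    uid = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**t1, **t2}.items()}  # deepen both subtrees
        heapq.heappush(heap, (c1 + c2, uid, merged)); uid += 1
    lengths, counts = heap[0][2], Counter(symbols)
    total = sum(counts.values())
    return sum(counts[s] * lengths[s] for s in counts) / total

codes, scale = prune_then_quantize(np.random.randn(256, 256))
print(huffman_code_lengths(codes.ravel().tolist()))  # far below 32 bits per weight
```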
“…Moreover, Yu et al. [56] further presented a barrier penalty to ensure that the searched models stay within the complexity constraint. Yang et al. [55] decoupled the constrained optimization via the Alternating Direction Method of Multipliers (ADMM), and Wang et al. [53] utilized the variational information bottleneck to search for the proper bitwidth and pruning ratio. Habi et al. [13] and Van et al. [48] directly optimized the quantization intervals for bitwidth selection in mixed-precision networks.…”
Section: Mixed-Precision Quantization (mentioning)
confidence: 99%
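For readers unfamiliar with the ADMM-style decoupling mentioned above, here is a hedged sketch in its spirit (not the code of [55]): the hard constraint "weights must be quantized" is split off into an auxiliary variable Z, and the primal weights W, the projection of Z, and the scaled dual U are updated alternately. The toy quadratic loss and all hyperparameters are assumptions.

```python
# ADMM-style splitting for a quantization constraint (illustrative only).
import numpy as np

def project_quantized(x, num_bits=2):
    # Projection onto a symmetric uniform grid (the constraint set).
    levels = 2 ** (num_bits - 1) - 1
    scale = np.abs(x).max() / levels if np.abs(x).max() > 0 else 1.0
    return np.round(x / scale).clip(-levels, levels) * scale

def admm_quantize(w_init, grad_loss, rho=0.1, lr=0.05, steps=200):
    w = w_init.copy()
    z = project_quantized(w)           # auxiliary (quantized) copy
    u = np.zeros_like(w)               # scaled dual variable
    for _ in range(steps):
        # W-update: gradient step on the loss plus the augmented-Lagrangian term.
        w -= lr * (grad_loss(w) + rho * (w - z + u))
        # Z-update: project W + U back onto the quantized set.
        z = project_quantized(w + u)
        # Dual update: accumulate the remaining constraint violation.
        u += w - z
    return z                            # feasible (quantized) solution

# Toy problem: pull weights toward a random target under the quantization constraint.
target = np.random.randn(32)
w_q = admm_quantize(np.random.randn(32), grad_loss=lambda w: w - target)
print(np.unique(w_q).size)  # only a few discrete levels remain
```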
“…Our GMPQ can be leveraged as a plug-and-play module for both non-differentiable and differentiable search methods. Since differentiable methods achieve a competitive accuracy-complexity trade-off compared with non-differentiable approaches, we employ the differentiable search framework [3,56,55] to select the optimal mixed-precision quantization policy. We design a hypernet with N_a^k and N_w^k parallel branches for the convolution filters and feature maps in the k-th layer.…”
Section: Generalizable Mixed-Precision Quantization via Attribution R... (mentioning)
confidence: 99%
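The parallel-branch pattern described in this statement can be sketched as a DARTS-style differentiable bit-width search: each layer holds one branch per candidate weight bit-width, and a softmax over learnable architecture logits mixes their outputs so the precision choice stays differentiable. This is a generic sketch under assumed candidate bit-widths and layer sizes, not GMPQ's actual hypernet.

```python
# Differentiable bit-width selection via softmax-weighted parallel branches
# (generic sketch, not the GMPQ implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quant(x, num_bits):
    # Uniform quantization with a straight-through gradient estimator.
    levels = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / levels
    q = torch.round(x / scale).clamp(-levels, levels) * scale
    return x + (q - x).detach()        # identity gradient (STE)

class MixedPrecisionConv(nn.Module):
    def __init__(self, in_ch, out_ch, bit_choices=(2, 4, 8)):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
        self.bit_choices = bit_choices
        # One learnable architecture logit per candidate bit-width ("branch").
        self.alpha = nn.Parameter(torch.zeros(len(bit_choices)))

    def forward(self, x):
        probs = F.softmax(self.alpha, dim=0)
        out = 0.0
        for p, bits in zip(probs, self.bit_choices):
            w_q = fake_quant(self.conv.weight, bits)
            out = out + p * F.conv2d(x, w_q, padding=1)
        return out

layer = MixedPrecisionConv(16, 32)
y = layer(torch.randn(1, 16, 8, 8))
y.sum().backward()                      # gradients reach both weights and alpha
print(layer.alpha.grad)
```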