“…One of the most common combinations is to apply compression techniques sequentially, most notably pruning weights first and then quantizing the remaining ones [11,13,39,47]; the result may be further compressed with a lossless coding algorithm (e.g., Huffman coding). Additive combinations of quantizations [3,45,53], in which each weight is a sum of quantized values, as well as low-rank plus sparse combinations [2,52], have also been used to compress neural networks.…”
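The sequential prune-then-quantize pipeline can be sketched in a few lines. The following is a minimal illustration, not the method of any cited paper: it applies magnitude pruning (zeroing the smallest-magnitude fraction of weights) and then uniformly quantizes the surviving weights to `2**n_bits` levels; the function name, `sparsity`, and `n_bits` parameters are illustrative choices.

```python
import numpy as np

def prune_then_quantize(w, sparsity=0.5, n_bits=4):
    """Sketch of sequential compression: magnitude pruning, then
    uniform quantization of the remaining (nonzero) weights."""
    flat = w.astype(np.float64).ravel().copy()
    # Step 1: prune -- zero out the smallest-magnitude fraction of weights.
    k = int(sparsity * flat.size)
    if k > 0:
        flat[np.argsort(np.abs(flat))[:k]] = 0.0
    # Step 2: quantize survivors uniformly onto 2**n_bits levels
    # spanning their min/max range.
    nz = flat != 0.0
    if nz.any():
        lo, hi = flat[nz].min(), flat[nz].max()
        if hi > lo:
            step = (hi - lo) / (2 ** n_bits - 1)
            flat[nz] = lo + np.round((flat[nz] - lo) / step) * step
    return flat.reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
wc = prune_then_quantize(w, sparsity=0.5, n_bits=3)
```

After this step the weight tensor is both sparse and low-precision, which is what makes a subsequent lossless coder (e.g., Huffman coding over the small set of quantization levels) effective.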