ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model

Yang, Haichuan; Zhu, Yuhao; Liu, Ji

doi:10.1109/cvpr.2019.01146

Cited by 37 publications (24 citation statements)

References 36 publications (70 reference statements)


“…Pruning from later layers that process smaller input resolution might not achieve as much speedup as pruning from early layers. Constraint-aware optimization using the Alternating Direction Method of Multipliers (ADMM) [2], such as that proposed in [44], can be further integrated with our method to optimize over latency instead of FLOPs.…”

Section: Theoretical vs. Practical Speedup (mentioning, confidence: 99%)
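A FLOP count makes the quoted remark about early versus late layers concrete. The sketch below assumes a dense stride-1 convolution that costs H_out·W_out·C_in·C_out·k² multiply-accumulates. The two layer shapes are hypothetical. FLOPs are only a proxy: measured latency, which is the snippet's real concern, can diverge further.

```python
def conv_macs(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    """Multiply-accumulates of a dense k x k convolution producing an h_out x w_out map."""
    return h_out * w_out * c_in * c_out * k * k


def macs_saved_per_filter(h_out: int, w_out: int, c_in: int, k: int) -> int:
    """MACs removed from this layer alone when one output filter is pruned.

    The next layer also loses one input channel; that saving is ignored here.
    """
    return conv_macs(h_out, w_out, c_in, 1, k)


# Hypothetical shapes: an early 112x112 layer and a late 7x7 layer, both with 3x3 kernels.
early = macs_saved_per_filter(112, 112, c_in=64, k=3)
late = macs_saved_per_filter(7, 7, c_in=512, k=3)
print(early, late, early / late)  # 7225344 225792 32.0
```

With these assumed shapes, one filter in the early layer costs 32× as much as one in the late layer. Pruning late layers therefore saves less, even before hardware effects are considered.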

Fire Together Wire Together: A Dynamic Pruning Approach with Self-Supervised Mask Prediction

Elkerdawy, Elhoushi, Zhang, et al. 2021 (Preprint)


Dynamic model pruning is a recent direction that allows a different sub-network to be inferred for each input sample during deployment. However, current dynamic methods rely on learning a continuous channel gating through regularization by inducing a sparsity loss. This formulation introduces complexity in balancing different losses (e.g., task loss and regularization loss). In addition, regularization-based methods lack a transparent tradeoff hyperparameter selection for realizing a computational budget. Our contribution is twofold: 1) decoupled task and pruning training; 2) simple hyperparameter selection that enables FLOPs reduction estimation before training. Inspired by the Hebbian theory in neuroscience, "neurons that fire together wire together", we propose to predict a mask to process k filters in a layer based on the activation of its previous layer. We pose the problem as a self-supervised binary classification problem. Each mask predictor module is trained to predict whether the log-likelihood for each filter in the current layer belongs to the top-k activated filters. The value k is dynamically estimated for each input based on a novel criterion using the mass of heatmaps. We show experiments on several neural architectures, such as VGG, ResNet and MobileNet, on the CIFAR and ImageNet datasets. On CIFAR, we reach accuracy similar to SOTA methods with 15% and 24% higher FLOPs reduction. Similarly, on ImageNet, we achieve a lower drop in accuracy with up to 13% improvement in FLOPs reduction.
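The self-supervised targets described above can be illustrated in a few lines. This is a minimal sketch that rests on two assumptions:

- Each filter's activation is summarized by one non-negative score, for example the mean of its heatmap.
- k is the smallest number of filters whose cumulative score reaches a fraction `mass_ratio` of the total.

Both the scoring and the hypothetical `mass_ratio` parameter are illustrative stand-ins, not the paper's exact criterion. The mask predictor itself, a binary classifier trained on these labels, is not shown.

```python
from typing import List


def topk_by_mass(scores: List[float], mass_ratio: float = 0.9) -> int:
    """Smallest k such that the k largest scores hold `mass_ratio` of the total mass."""
    total = sum(scores)
    if total <= 0:
        return len(scores)  # degenerate case: keep every filter
    running = 0.0
    for k, s in enumerate(sorted(scores, reverse=True), start=1):
        running += s
        if running >= mass_ratio * total:
            return k
    return len(scores)


def mask_targets(scores: List[float], mass_ratio: float = 0.9) -> List[int]:
    """Binary labels: 1 if a filter is among the top-k activated filters, else 0."""
    k = topk_by_mass(scores, mass_ratio)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [1 if i in keep else 0 for i in range(len(scores))]


# Hypothetical per-filter activation scores for one input sample.
scores = [0.1, 2.0, 0.05, 1.5, 0.3, 0.05]
print(topk_by_mass(scores), mask_targets(scores))  # 3 [0, 1, 0, 1, 1, 0]
```

Because k is recomputed per input, an input with concentrated activations yields a small k and a cheaper sub-network.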



“…In contrast, pruning-based methods construct smaller networks from a pretrained over-parameterized neural network by gradually removing the least important neurons. Various pruning strategies have been developed based on different heuristics (e.g., Han et al., 2016; Luo et al., 2017; He et al., 2017b; Peng et al., 2019), including energy-aware pruning methods that use energy-consumption-related metrics to guide the pruning process (e.g., Gordon et al., 2018; He et al., 2018; Yang et al., 2019). However, a common issue with these methods is that they alter the standard training objective with sparsity-inducing regularization, which necessitates sensitive hyperparameter tuning.…”

Section: Related Work (mentioning, confidence: 99%)

Energy-Aware Neural Architecture Optimization with Fast Splitting Steepest Descent

Wang, Li, et al. 2019 (Preprint)


Designing energy-efficient networks is of critical importance for enabling state-of-the-art deep learning in mobile and edge settings, where computation and energy budgets are highly limited. Recently, Liu et al. (2019b) framed the search for efficient neural architectures as a continuous splitting process: it iteratively splits existing neurons into multiple offspring to achieve progressive loss minimization, thus finding novel architectures by gradually growing the neural network. However, this method was not specifically tailored for designing energy-efficient networks, and it is computationally expensive on large-scale benchmarks. In this work, we substantially improve Liu et al. (2019b) in two significant ways. 1) We incorporate the energy cost of splitting different neurons to better guide the splitting process, thereby discovering more energy-efficient network architectures. 2) We substantially speed up the splitting process of Liu et al. (2019b), which requires expensive eigen-decomposition, by proposing a highly scalable Rayleigh-quotient stochastic gradient algorithm. Our fast algorithm allows us to reduce the computational cost of splitting to the same level as typical back-propagation updates and enables efficient implementation on GPU. Extensive empirical results show that our method can train highly accurate and energy-efficient networks on challenging datasets such as ImageNet, improving on a variety of baselines, including pruning-based methods and expert-designed architectures.
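The Rayleigh-quotient idea in the abstract replaces a full eigen-decomposition with gradient steps toward the minimum eigenvector. The sketch below is a minimal, full-batch version on a hypothetical 2×2 symmetric matrix. The authors' algorithm is stochastic and operates on per-neuron splitting matrices, which are not constructed here. The sketch minimizes R(v) = vᵀSv / vᵀv over unit vectors, using the gradient 2(Sv - R(v)v) and renormalizing after each step.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(a: Matrix, v: Vector) -> Vector:
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def normalize(v: Vector) -> Vector:
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def min_eigpair_rayleigh(s: Matrix, v0: Vector, lr: float = 0.1,
                         steps: int = 500) -> Tuple[float, Vector]:
    """Approximate the smallest eigenpair of a symmetric matrix by gradient
    descent on the Rayleigh quotient, projected back onto the unit sphere."""
    v = normalize(v0)
    for _ in range(steps):
        sv = matvec(s, v)
        r = dot(v, sv)  # Rayleigh quotient, since v has unit norm
        grad = [2.0 * (a - r * b) for a, b in zip(sv, v)]
        v = normalize([b - lr * g for b, g in zip(v, grad)])
    return dot(v, matvec(s, v)), v


# Toy symmetric "splitting matrix" (hypothetical values) with eigenvalues 3 and -1.
S = [[1.0, 2.0], [2.0, 1.0]]
lam, vec = min_eigpair_rayleigh(S, [1.0, 0.0])
print(round(lam, 6), [round(x, 4) for x in vec])  # -1.0 [0.7071, -0.7071]
```

Each step costs one matrix-vector product, which is why this approach scales better than decomposing S. In the splitting framework, a negative minimum eigenvalue, as in this toy case, signals that splitting the neuron along ±v can lower the loss. A step size that is large relative to the spread of eigenvalues can stall or oscillate; 0.1 suits this toy matrix.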


“…Liu et al. [29] used a hyper-network in the ES algorithm to find the layer-wise sparsity for channel pruning. Instead of treating the layer-wise sparsity as hyper-parameters, recently proposed energy-constrained compression methods [43,44] used optimization-based approaches to prune DNNs under a given energy budget. Besides the above, there are some methods for searching efficient neural architectures [2,36], while our work mainly concentrates on compressing a given architecture.…”

Section: Automated Model Compression (mentioning, confidence: 99%)
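A toy energy model with a budget check illustrates the energy-constrained setting described in the snippet and in ECC's title. This is a sketch under explicit assumptions:

- The per-layer energy a_l·c_{l-1}·c_l + b_l·c_l is hypothetical, and so are its coefficients.
- The greedy channel-removal loop is a simple stand-in and ignores accuracy entirely.

ECC's fitted bilinear regression model and its optimization procedure are not reproduced here.

```python
from typing import List


def bilinear_energy(widths: List[int], a: List[float], b: List[float]) -> float:
    """Hypothetical bilinear energy model over a chain of layers.

    widths[0] is the fixed input channel count; widths[l] (l >= 1) is the
    output channel count of layer l, which costs a[l-1]*c_{l-1}*c_l + b[l-1]*c_l.
    """
    return sum(
        a[l - 1] * widths[l - 1] * widths[l] + b[l - 1] * widths[l]
        for l in range(1, len(widths))
    )


def greedy_shrink(widths: List[int], a: List[float], b: List[float],
                  budget: float, step: int = 1) -> List[int]:
    """Remove channels from whichever layer saves the most modeled energy
    until the budget is met. A stand-in only, not ECC's optimizer."""
    w = list(widths)
    while bilinear_energy(w, a, b) > budget:
        current = bilinear_energy(w, a, b)
        best_layer, best_saving = None, 0.0
        for l in range(1, len(w)):  # layer 0 is the fixed input
            if w[l] - step < 1:
                continue  # keep at least one channel per layer
            trial = w[:l] + [w[l] - step] + w[l + 1:]
            saving = current - bilinear_energy(trial, a, b)
            if saving > best_saving:
                best_layer, best_saving = l, saving
        if best_layer is None:
            raise ValueError("budget cannot be met with at least one channel per layer")
        w[best_layer] -= step
    return w


widths = [3, 8, 8]  # input channels, then two layers of 8 channels
a, b = [1.0, 1.0], [0.5, 0.5]
print(bilinear_energy(widths, a, b))             # 96.0
print(greedy_shrink(widths, a, b, budget=60.0))  # [3, 4, 8]
```

The bilinear term couples neighbouring layers. Removing a channel from layer 1 lowers both its own cost and layer 2's input-side cost. Under these toy coefficients, the greedy loop therefore shrinks only layer 1.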

Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-based Approach

Yang, Gui, Zhu, et al. 2019 (Preprint, self-cite)


Deep Neural Networks (DNNs) are applied in a wide range of use cases. There is an increasing demand for deploying DNNs on devices that lack abundant resources such as memory and computation units. Recently, network compression through a variety of techniques, such as pruning and quantization, has been proposed to reduce resource requirements. A key parameter to which all existing compression techniques are sensitive is the compression ratio (e.g., pruning sparsity, quantization bitwidth) of each layer. Traditional solutions treat the compression ratio of each layer as a hyper-parameter and tune it using human heuristics. Recently, researchers have started using black-box hyper-parameter optimization, but it introduces new hyper-parameters and has efficiency issues. In this paper, we propose a framework to jointly prune and quantize DNNs automatically according to a target model size, without using any hyper-parameters to manually set the compression ratio of each layer. In the experiments, we show that our framework can compress the weight data of ResNet-50 to be 836× smaller without accuracy loss on CIFAR-10, and compress AlexNet to be 205× smaller without accuracy loss on ImageNet classification.
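Compression ratios like those in the abstract come from combining sparsity with low bitwidth. Below is a minimal single-layer sketch of that arithmetic. It assumes magnitude-based pruning to a fixed number of surviving weights and a symmetric uniform quantizer; the helper names and values are hypothetical. The paper's framework instead chooses per-layer ratios automatically under a model-size constraint, which this sketch does not attempt.

```python
from typing import List


def prune_and_quantize(weights: List[float], keep: int, bits: int) -> List[float]:
    """Hypothetical projection: keep the `keep` largest-magnitude weights, snap them
    to a symmetric uniform grid with 2**(bits-1) - 1 positive levels, zero the rest."""
    if bits < 2:
        raise ValueError("need at least 2 bits for a symmetric grid")
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(order[:keep])
    max_abs = max((abs(weights[i]) for i in kept), default=0.0)
    levels = 2 ** (bits - 1) - 1
    scale = max_abs / levels if max_abs > 0 else 1.0
    return [round(w / scale) * scale if i in kept else 0.0
            for i, w in enumerate(weights)]


def compression_ratio(n_weights: int, nnz: int, bits: int, dense_bits: int = 32) -> float:
    """Dense 32-bit storage divided by sparse low-bit storage (index overhead ignored)."""
    return (n_weights * dense_bits) / (nnz * bits)


w = [0.75, -0.05, 0.3, -0.6, 0.1, 0.02]
print(prune_and_quantize(w, keep=3, bits=3))    # [0.75, 0.0, 0.25, -0.5, 0.0, 0.0]
print(compression_ratio(1000, nnz=50, bits=4))  # 160.0
```

In this accounting, keeping 5% of the weights at 4 bits already gives 160× over dense 32-bit storage. Real sparse formats also pay for storing indices, which the sketch ignores.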

