“…One of the most common combinations is to apply compression techniques sequentially, most notably pruning weights first and then quantizing the remaining ones [11,13,39,47]; the result may be further compressed with a lossless coding algorithm (e.g., Huffman coding). Additive combinations of quantizations [3,45,53], in which each weight is a sum of quantized values, as well as low-rank plus sparse combinations [2,52], have also been used to compress neural networks.…”
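The sequential prune-then-quantize pipeline can be sketched in a few lines. The following is a minimal illustration, not the method of any cited paper: it applies magnitude pruning (zeroing the smallest-magnitude fraction of weights) and then uniformly quantizes the surviving weights to `2**n_bits` levels; the function name, `sparsity`, and `n_bits` parameters are illustrative choices.

```python
import numpy as np

def prune_then_quantize(w, sparsity=0.5, n_bits=4):
    """Sketch of sequential compression: magnitude pruning, then
    uniform quantization of the remaining (nonzero) weights."""
    flat = w.astype(np.float64).ravel().copy()
    # Step 1: prune -- zero out the smallest-magnitude fraction of weights.
    k = int(sparsity * flat.size)
    if k > 0:
        flat[np.argsort(np.abs(flat))[:k]] = 0.0
    # Step 2: quantize survivors uniformly onto 2**n_bits levels
    # spanning their min/max range.
    nz = flat != 0.0
    if nz.any():
        lo, hi = flat[nz].min(), flat[nz].max()
        if hi > lo:
            step = (hi - lo) / (2 ** n_bits - 1)
            flat[nz] = lo + np.round((flat[nz] - lo) / step) * step
    return flat.reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
wc = prune_then_quantize(w, sparsity=0.5, n_bits=3)
```

After this step the weight tensor is both sparse and low-precision, which is what makes a subsequent lossless coder (e.g., Huffman coding over the small set of quantization levels) effective.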