July 7, 2022

Deep Learning Paper Recap - Language Models

This week’s Deep Learning Paper Recap is Prune Once For All: Sparse Pre-Trained Language Models

By Taufiquzzaman Peyash, Deep Learning Engineer

What’s Exciting About This Paper?

Model pruning is one of the key ways to compress a Deep Learning model, but pruning techniques typically differ based on the model architecture. This paper introduces an architecture-agnostic method for training sparse pre-trained language models.

This method lets us prune only once, during the pre-training phase, with no need to prune again during fine-tuning. The researchers also propose a fine-tuning mechanism that leverages knowledge distillation to achieve the best compression-to-accuracy ratio.
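
To get a feel for why this can be architecture-agnostic, note that magnitude pruning only needs a weight tensor, not any knowledge of the surrounding network. A minimal sketch of one-shot magnitude pruning on a single layer (our own simplification in PyTorch, not the authors' code) might look like this:

```python
import torch
import torch.nn as nn

def magnitude_prune_(module: nn.Linear, sparsity: float = 0.85) -> torch.Tensor:
    """Zero out the smallest-magnitude weights in place and return the binary mask."""
    weight = module.weight.data
    k = int(weight.numel() * sparsity)                      # number of weights to drop
    threshold = weight.abs().flatten().kthvalue(k).values   # magnitude cutoff
    mask = (weight.abs() > threshold).float()               # 1 = keep, 0 = prune
    weight.mul_(mask)                                       # apply the mask in place
    return mask

# The same call works for any module exposing a weight matrix, which is what
# makes magnitude-based pruning agnostic to the model architecture.
layer = nn.Linear(768, 768)
mask = magnitude_prune_(layer, sparsity=0.85)
print(f"actual sparsity: {1 - mask.mean().item():.2%}")
```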

Key Findings

Fine-tuning pruned (sparse) models usually leads to either poor results or a low sparsity ratio. That’s why modern pruning approaches like Gradual Magnitude Pruning (GMP) apply pruning during the fine-tuning phase.

The problem with this approach is that every time we fine-tune, we have to account for both the task and the model architecture when choosing a pruning technique.
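
For reference, GMP-style methods ramp the sparsity target up gradually over training steps rather than pruning everything at once. A small sketch of the commonly used cubic ramp (a generic illustration of GMP, not taken from this paper) could look like:

```python
def gmp_sparsity(step: int, total_steps: int,
                 initial_sparsity: float = 0.0,
                 final_sparsity: float = 0.90) -> float:
    """Cubic sparsity schedule often used in gradual magnitude pruning."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3

# Sparsity rises quickly early in training and flattens out near the end.
for step in (0, 2_500, 5_000, 7_500, 10_000):
    print(step, round(gmp_sparsity(step, total_steps=10_000), 3))
```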

With the proposed pre-training and fine-tuning mechanism, we can save time by pruning only once. Here is what the whole pipeline looks like:

[Figure: the Prune Once for All pre-training and fine-tuning pipeline (source: paper)]
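
Because the fine-tuning stage leans on knowledge distillation, here is a minimal sketch of a generic distillation loss that blends the hard-label task loss with the teacher's softened predictions. The temperature and weighting below are illustrative assumptions, not the paper's settings:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Mix the usual task loss with a KL term pulling the student toward the teacher."""
    hard_loss = F.cross_entropy(student_logits, labels)
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * hard_loss + (1.0 - alpha) * soft_loss
```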

This technique achieves the best compression-to-accuracy ratio for BERT-Base, BERT-Large, and DistilBERT. The best scores were achieved at 85% and 90% weight sparsity.

[Figure: accuracy results for the sparse pre-trained models (source: paper)]

They also tried Quantization-Aware Training (QAT) with 85% pruning, which produced a model that is both smaller and more accurate than the 90%-pruned model.
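
To give a rough sense of what QAT involves: the weights are passed through a simulated ("fake") quantizer during training so the model learns to tolerate int8 rounding before it is actually converted. The sketch below is a generic straight-through fake quantizer, not the exact setup used in the paper:

```python
import torch

def fake_quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Symmetric per-tensor fake quantization with a straight-through estimator."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # The forward pass sees quantized values; gradients flow as if this were identity.
    return w + (w_q - w).detach()

# During QAT a layer's weights are fake-quantized on every forward pass, e.g.:
weight = torch.randn(768, 768, requires_grad=True)
out = torch.nn.functional.linear(torch.randn(4, 768), fake_quantize(weight))
out.sum().backward()  # gradients still reach the full-precision weights
```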

Our Takeaways

These pre-trained pruned models can be used to obtain fine-tuned pruned models without the burden of task-specific pruning.
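
In practice, starting from a sparse pre-trained checkpoint looks just like loading any other pre-trained model with Hugging Face Transformers. The checkpoint name below is a placeholder for whichever sparse pre-trained model you use:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical checkpoint path -- substitute an actual sparse pre-trained model.
checkpoint = "path/to/sparse-pretrained-bert"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Fine-tune as usual; to preserve the sparsity, the pruning mask is typically
# re-applied (or the zeroed weights are kept frozen) during these updates.
```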

This approach saves us the time and effort of pruning the model ourselves, much like using any pre-trained Deep Learning model spares us from training from scratch. In this case, we simply start fine-tuning from the sparse pre-trained model.
