This week’s Deep Learning Paper Reviews cover Decision Transformer: Reinforcement Learning via Sequence Modeling and SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training.
Decision Transformer: Reinforcement Learning via Sequence Modeling
What’s Exciting About this Paper
Traditionally, Reinforcement Learning (RL) is solved using Temporal Difference (TD) learning. Decision Transformer instead casts RL as a sequence modeling task: the goal is to model the next action given the current and past states, actions, and rewards, along with the expected future reward (return-to-go) from the current time step. This trains the agent to identify the best action to take given its entire history and the goal (future cumulative reward) it is asked to achieve.
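To make the sequence-modeling framing concrete, here is a minimal sketch of how a trajectory can be turned into the interleaved (return-to-go, state, action) sequence that a causal transformer is trained on. The function names, token format, and toy data are illustrative assumptions, not the authors' code.

```python
# Minimal sketch: turn a trajectory into the (return-to-go, state, action) token
# sequence that Decision Transformer models autoregressively.
import numpy as np

def returns_to_go(rewards):
    """Cumulative future reward R_t = r_t + r_{t+1} + ... for every time step."""
    rtg = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg

def build_sequence(states, actions, rewards):
    """Interleave tokens as (R_1, s_1, a_1, R_2, s_2, a_2, ...).
    A causal transformer is then trained to predict a_t from everything before it."""
    rtg = returns_to_go(rewards)
    tokens = []
    for R, s, a in zip(rtg, states, actions):
        tokens.extend([("rtg", R), ("state", s), ("action", a)])
    return tokens

# Toy trajectory: 3 steps with rewards 1, 0, 2 -> returns-to-go are 3, 2, 2.
seq = build_sequence(states=[0, 1, 2], actions=[1, 0, 1], rewards=[1.0, 0.0, 2.0])
print(seq[:3])  # first (return-to-go, state, action) triple
```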
Key Findings
The authors evaluated the Decision Transformer (DT) on Atari and OpenAI Gym benchmarks by training on trajectories collected from expert and mediocre policies. They found that DT outperforms behavior cloning and is competitive with state-of-the-art Temporal Difference learning techniques. In addition, the authors found that DT is particularly capable in tasks involving long-term credit assignment. For example, on the Key-to-Door task, DT significantly outperformed both baselines despite being trained only on random rollouts.
Our Takeaways
Decision Transformer (DT) is one of the first successful attempts at casting RL as a sequence modeling problem, and it achieves state-of-the-art performance.
The policy learned by DT is often superior to the policy used to generate the offline data because it is able to “stitch together” different parts of the suboptimal trajectories to find optimal actions.
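The return conditioning also shows up at evaluation time: the agent is prompted with a target return, and that prompt is decremented as reward comes in. Below is a hedged sketch of such a rollout loop; `model.predict_action`, the `env` interface, and the update rule are stand-ins for a trained DT and a Gym-style environment, not the authors' exact API.

```python
# Hedged sketch of Decision Transformer evaluation: condition on a desired (high)
# target return and decrement it by each observed reward, so the model is always
# told how much reward it is still expected to achieve.
def rollout(model, env, target_return, max_steps=1000):
    state = env.reset()
    history = []                      # growing (return-to-go, state, action) context
    total_reward = 0.0
    for _ in range(max_steps):
        action = model.predict_action(history, target_return, state)
        next_state, reward, done, _ = env.step(action)
        history.append((target_return, state, action))
        target_return -= reward       # remaining reward the model is asked to achieve
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward
```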
SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training
What’s Exciting About this Paper
SPIRAL draws inspiration from BYOL and offers an alternative speech pre-training method for ASR that claims to reduce training costs by 80% compared to wav2vec2 for the base model and 65% for the large model.
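For intuition, here is a minimal, hedged sketch of the BYOL-style teacher-student setup that SPIRAL builds on: the student encodes a perturbed utterance, the teacher encodes the clean one, and the teacher's weights are an exponential moving average (EMA) of the student's. The tiny convolutional encoder, the additive-noise perturbation, the MSE objective, and the momentum value are placeholder assumptions, not the paper's architecture or loss.

```python
# Sketch of a perturbation-invariant teacher-student step (BYOL-style), assuming a
# toy convolutional encoder rather than SPIRAL's actual model.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Sequential(nn.Conv1d(1, 64, 10, stride=5), nn.ReLU(),
                        nn.Conv1d(64, 64, 3, stride=2))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)            # teacher gets no gradients, only EMA updates

optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
wave = torch.randn(8, 1, 16000)        # batch of 1-second raw waveforms (dummy data)

noisy = wave + 0.1 * torch.randn_like(wave)    # stand-in for SPIRAL's noise perturbation
student_out = student(noisy)
with torch.no_grad():
    teacher_out = teacher(wave)                # target representation from the clean input

loss = F.mse_loss(student_out, teacher_out)    # simplified regression objective
loss.backward()
optimizer.step()

# EMA update of the teacher toward the student (momentum value is illustrative)
with torch.no_grad():
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(0.999).add_(ps, alpha=0.001)
```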
Key Findings
The researchers use a gradual down-sampling strategy in the model to reduce the amount of computation required during training. They speculate that this removes redundancy in a way that is similar to quantization. The LL-60k SPIRAL model needed 500k training steps and 232 GPU days, while a standard wav2vec2 model needs 1,000k training steps and 665.6 GPU days. They also showed that SPIRAL handles noisy inputs better by evaluating both models on the CHiME noisy speech dataset.
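The down-sampling idea can be illustrated with a toy encoder front end: strided layers placed early in the stack shrink the sequence length step by step, so the expensive layers that follow operate on far fewer frames. The kernel sizes and strides below are made up for illustration and are not SPIRAL's configuration.

```python
# Toy illustration of gradual down-sampling: each strided conv reduces the number
# of frames the downstream (costly) layers have to process.
import torch
import torch.nn as nn

layers = nn.ModuleList([
    nn.Conv1d(1, 32, kernel_size=10, stride=5),   # raw waveform -> coarse frames
    nn.Conv1d(32, 32, kernel_size=3, stride=2),   # halve the frame rate
    nn.Conv1d(32, 32, kernel_size=3, stride=2),   # halve it again before the heavy layers
])

x = torch.randn(1, 1, 16000)   # one second of 16 kHz audio (dummy data)
for layer in layers:
    x = layer(x)
    print(x.shape[-1])          # sequence length after each down-sampling stage
# 16,000 samples shrink to roughly 800 frames, cutting downstream compute accordingly.
```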
Our Takeaways
SPIRAL is an alternative Self-Supervised Learning (SSL) pre-training method that, through aggressive downsampling, allows for up to 80% training cost savings while still matching the performance of wav2vec2. Additionally, the noise perturbation mechanism they propose might be helpful for other SSL pipelines.