Weight Initialization, even though a minor concern, has serious effects on the deep feedforward neural networks we train.
Thanks to Xavier Glorot and Yoshua Bengio, we are aware that using a normal distribution for initializing weights with mean of 0 and variance of 1 contributes to the unstable gradients problem. That's why new techniques have been proposed to overcome these issues.
In this video, we learn what these techniques are, how they are different from each other, and what their perfect activation function matches are.