2.7 Write the weight gradient and input gradient calculations


After I published this course I found a bug in the computation of the weight gradients. The details of the bug are less important than the implications. Learning by building is a creative endeavor and by its nature involves a lot of trial and error. Due to their complexity, machine learning algorithms make excellent hiding places for bugs. So even though the code ran and appeared to perform well, it wasn’t as healthy as I believed. I’m leaving the original walkthrough video in place with the link to the updated code here, in an effort to model transparency and public error correction.

If you are curious, the bug was in how I implemented the cross-correlation between the output gradient and the inputs in the convolution layer to get the weight gradient (around 2:55 in the video above). Unlike convolution, cross-correlation is sensitive to the order of its arguments. I passed them in the wrong order, and as a result the calculated weight gradient came out reversed. I also needed to add some padding, as in the calculation for the input gradient, to get everything to work out right. This bug is fairly low level and doesn’t obscure the concepts involved, so I am less worried about it from a teaching point of view.
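To make the argument-order point concrete, here is a minimal sketch in plain Python. It is not the course's code: the helper names (`cross_correlate`, `cross_correlate_full`, `convolve_full`) and the toy values are made up for illustration, and it assumes a one-dimensional layer whose forward pass is y[i] = sum over k of w[k] * x[i + k]. Under that assumption, the weight gradient is the inputs cross-correlated with the output gradient, in that order.

```python
def cross_correlate(a, b):
    """Valid-mode cross-correlation: slide b along a without flipping it."""
    n_out = len(a) - len(b) + 1
    return [sum(a[i + j] * b[j] for j in range(len(b))) for i in range(n_out)]


def cross_correlate_full(a, b):
    """Full-mode cross-correlation: zero-pad a so every overlap is included."""
    pad = [0] * (len(b) - 1)
    return cross_correlate(pad + list(a) + pad, b)


def convolve_full(a, b):
    """Full-mode convolution: cross-correlation with b flipped end to end."""
    return cross_correlate_full(a, list(reversed(b)))


# Hypothetical toy values: five inputs and the gradient of the loss
# with respect to the three outputs of a width-3 kernel.
x = [1, 2, 3, 4, 5]
g = [1, 10, 100]

# Assumed forward pass: y[i] = sum_k w[k] * x[i + k],
# so dL/dw[k] = sum_i g[i] * x[i + k].
weight_grad = cross_correlate(x, g)
print(weight_grad)  # [321, 432, 543]

# Swapping the arguments only works with padding, since g is shorter than x,
# and the central entries come out in reverse order.
swapped = cross_correlate_full(g, x)
print(swapped[2:5])  # [543, 432, 321]

# Convolution, by contrast, gives the same answer either way around.
print(convolve_full(x, g) == convolve_full(g, x))  # True
```

The swapped call returns the full cross-correlation of `x` and `g` read back to front, which is why a gradient computed that way looks plausible at a glance but assigns each value to the wrong weight.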

Still, it’s pretty embarrassing. My apologies if it tripped you up in any way.
