The back-propagation algorithm for training neural networks can be hard to understand, partly because the terminology is confusing and varies between sources, and partly because it is usually described purely in terms of the mathematics. Here I present a diagrammatic explanation of back-propagation for the visually inclined. I also summarize the non-linear stages that are commonly used, and provide some philosophical insight.

The forward pass through a neural net consists of alternating stages: a linear multiplication by a weight matrix, followed by a non-linear activation function that transforms the output of each linear unit independently. We can write the transformation in vector form as $z = Wx$ and $y = f(z)$, where $x$ is the input, $W$ is the weight matrix, $z$ is the output of the linear stage, $y$ is the output of the non-linear stage, and $f$ is the activation function, which acts on each element of $z$ independently. For subsequent stages, the input $x$ is the output $y$ of the previous stage.
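
The alternating structure of the forward pass can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function names (`linear`, `activate`, `forward`), the variable names `W`, `x` and `z`, the choice of `tanh` as the activation, and the example weights are all hypothetical and not taken from the original post.

```python
import math

def linear(W, x):
    """Linear stage: multiply input vector x by weight matrix W (a list of rows)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def activate(z, f=math.tanh):
    """Non-linear stage: apply f to each element of z independently."""
    return [f(z_i) for z_i in z]

def forward(weights, x, f=math.tanh):
    """Alternate linear and non-linear stages; each output feeds the next stage."""
    for W in weights:
        z = linear(W, x)    # output of the linear stage
        x = activate(z, f)  # output of the non-linear stage, input to the next
    return x

# Hypothetical weights for a network with 2 inputs, 3 hidden units, 1 output.
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
W2 = [[1.0, -1.0, 0.5]]
y = forward([W1, W2], [1.0, 2.0])
```

Each weight matrix is stored as a list of rows, so the number of rows sets the number of units in that stage and the row length must match the size of the incoming vector.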
