Understanding the back-propagation algorithm for training neural networks can be challenging, because the terminology is often confusing and varies between sources, and the algorithm is commonly described purely in terms of the mathematics. Here I present a diagrammatic explanation of back-propagation for the visually inclined. I also summarize the non-linear stages that are commonly used, and provide some philosophical insight.

The forward pass through a neural net consists of alternating stages: a linear multiplication by a weight matrix, followed by a non-linear activation function which transforms each output of the linear stage independently. We can write the transformation in vector form as

$y = W x$, and $z = f(y)$,

where $x$ is the input, $W$ is the weight matrix, $y$ is the output of the linear stage, $z$ is the output of the non-linear stage, and $f$ is the activation function which acts on each element of $y$ independently. For subsequent stages, the input $x$ is the output $z$ of the previous stage.
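The alternation of linear and non-linear stages can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation. The names are assumptions made for the example: `x` is the input vector, `W` a weight matrix stored as a list of rows, `y` the output of the linear stage, `z` the output of the non-linear stage, and `f` the activation function. The choice of `tanh` and the example weights are likewise hypothetical, and bias terms are left out because the description above mentions only multiplication by a weight matrix.

```python
import math

def linear(W, x):
    """Linear stage: multiply the weight matrix W (a list of rows) by the vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def activate(y, f=math.tanh):
    """Non-linear stage: apply f to each element of y independently."""
    return [f(y_i) for y_i in y]

def forward(weights, x, f=math.tanh):
    """Run the stages in order; the output z of one stage is the input x of the next."""
    for W in weights:
        y = linear(W, x)    # output of the linear stage
        z = activate(y, f)  # output of the non-linear stage
        x = z               # feed this stage's output into the next stage
    return x

# Hypothetical two-stage network: 3 inputs -> 2 hidden units -> 1 output.
weights = [
    [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]],
    [[1.0, -1.0]],
]
output = forward(weights, [1.0, 2.0, 3.0])
```

Because `activate` treats each element separately, the only place where the elements of a vector interact is the matrix multiplication in `linear`.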
