Understanding back-propagation

Understanding the back-propagation algorithm for training neural networks can be challenging: the terminology is confusing and varies between sources, and the algorithm is usually described purely in terms of the mathematics. Here I present a diagrammatic explanation of back-propagation for the visually inclined. I also summarize the non-linear stages that are commonly used, and provide some philosophical insight.

The forward pass through a neural net consists of alternating stages: a linear multiplication by a weight matrix, followed by a non-linear activation function that transforms the output of each linear unit independently. We can write the transformation in vector form as ${\bf z}={\bf Wx}$ and ${\bf y}=g({\bf z})$, where ${\bf x}$ is the input, ${\bf z}$ is the output of the linear stage, ${\bf y}$ is the output of the non-linear stage, and $g({\bf z})$ is the activation function, which acts on each element of ${\bf z}$ independently. For subsequent stages, the input ${\bf x}$ is the output ${\bf y}$ of the previous stage.
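
To make the alternation of stages concrete, here is a minimal sketch of the forward pass in plain Python. The sigmoid is only one possible choice for g and is an assumption here, as are the function names (linear, sigmoid, forward) and the small example weights, none of which come from the text above.

```python
import math

def linear(W, x):
    """Linear stage: z = W x, with W given as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def sigmoid(z):
    """Assumed activation g: applied to each element of z independently."""
    return [1.0 / (1.0 + math.exp(-z_i)) for z_i in z]

def forward(weights, x, g=sigmoid):
    """Alternate linear and non-linear stages.

    The output y of each stage becomes the input x of the next one.
    """
    for W in weights:
        z = linear(W, x)  # linear stage
        y = g(z)          # non-linear stage
        x = y             # feed the next stage
    return x

# Hypothetical two-stage network: 3 inputs -> 2 units -> 1 unit.
weights = [
    [[0.5, -0.5, 0.0], [0.0, 1.0, -1.0]],
    [[1.0, -1.0]],
]
output = forward(weights, [1.0, 1.0, 1.0])
print(output)
```

With these particular weights every linear stage produces zeros, so each sigmoid returns 0.5 and the final output is [0.5]. Swapping in a different element-wise function for g leaves the loop unchanged, which is the sense in which the activation acts independently of the linear stage.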