# Understanding back-propagation

Understanding the back-propagation algorithm for training neural networks can be challenging: the terminology is often confusing and varies between sources, and the algorithm is commonly described purely in terms of the mathematics. Here I present a diagrammatic explanation of back-propagation for the visually inclined. I also summarize the non-linear stages that are commonly used and offer some philosophical insight.

The forward pass through a neural net consists of alternating stages: linear multiplication by a weight matrix, followed by a non-linear activation function that transforms the output of each linear unit independently. We can write the transformation in vector form as ${\bf z}={\bf Wx}$ and ${\bf y}=g({\bf z})$, where ${\bf x}$ is the input, ${\bf z}$ is the output of the linear stage, ${\bf y}$ is the output of the non-linear stage, and $g$ is the activation function, which acts on each element of ${\bf z}$ independently. For each subsequent stage, the input ${\bf x}$ is the output ${\bf y}$ of the previous stage.
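The alternating linear and non-linear stages can be sketched in a few lines of NumPy. This is a minimal illustration, not a full implementation: ReLU is used here only as a stand-in activation (common choices are discussed later), and the layer sizes are arbitrary.

```python
import numpy as np

def relu(z):
    # Example activation g(z): acts on each element of z independently
    return np.maximum(0.0, z)

def forward(weights, x, g=relu):
    """One forward pass: alternate the linear stage z = W @ x
    with the elementwise non-linear stage y = g(z).
    The output y of each stage becomes the input x of the next."""
    y = x
    for W in weights:
        z = W @ y   # linear stage
        y = g(z)    # non-linear stage
    return y

# Usage: a two-layer net mapping 3 inputs -> 4 hidden units -> 2 outputs
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
x = rng.standard_normal(3)
print(forward(weights, x).shape)  # (2,)
```

Each weight matrix has shape (outputs, inputs), so successive matrices must chain: the row count of one layer matches the column count of the next.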