At the moment I’m writing an integer-based library to bring neural networks to micro-controllers. This is intended to support the ARM and AVR devices. The idea here is that even though we might think of neural networks as the domain of super computers, for small scale robots we can do a lot of interesting things with smaller neural networks. For example a four layer convolutional neural network with about 18,000 parameters can process a 32×32 video frame at 8 frames per second on the ATmega328, according to code that I implemented last year.
For small networks, there can be some on-line learning, which might be useful to learn control systems with a few inputs and outputs, connecting for example IMU axes or simple sensors to servos or motors, trained with deep reinforcement learning. This is the scenario that I’m experimenting with and trying to enable for small, low power, and cheap interactive robots and toys.
For more complex processing where insufficient RAM is available to store weights, a fixed network can be stored in ROM built from weights that have been trained off line using python code.
Having trained a two layer neural network to recognize handwritten digits with reasonable accuracy, as described in my previous blog post, I wanted to see what would happen if neurons were forced to pool the outputs of pairs of rectified units according to a fixed weight schedule.
I created a network which is almost a three layer network where the output of pairs of the first layer rectified units are combined additively before being passed to the second fully connected layer. This means that the first layer has a 28×28 input and a 50 unit output (hidden layer) with rectified linear units, and then pairs of these units are averaged to reduce the neuron count to 25, and then the second fully connected layer reduces this down to 10. Finally the softmax classifier is applied.
Recently I have been experimenting with a C++ deep learning library that I have written by testing it out on the MNIST handwritten digits data set. In this dataset there are 60,000 training images and 10,000 test images which are of size 28×28 pixels. I have been trying to reproduce some of the error rates that Yann LeCun reports on the MNIST site. The digits written in many different styles and some of them are quite hard to classify, and so it makes a good test for neural net learning.
I read with interest the recent paper out of Baidu about scaling up image recognition. In it they talk about creating a supercomputer to carry out the learning phase of training a deep convolutional network. Training such things is terribly slow, with their typical example taking 212 hours on a single GPU machine because of the enormous number of weight computations that need to be evaluated and the slow stochastic gradient process over large training sets.
Baidu has built a dedicated machine with 36 servers connected by an InfiniBand switch, each server with four GPUs. In the paper they describe different ways of partitioning the problem to run on this machine. They end up being able to train the model using 32 GPUs in 8.6 hours.
In recent years the concept of deep learning has been gaining widespread attention. The media frequently reports on talent acquisitions in this field, such as those by Google and Facebook, and startups which claim to employ deep learning are met with enthusiasm. Gratuitous comparisons with the human brain are frequent. But is this just a trendy buzz word? What exactly is deep learning and how is it relevant to developments in machine intelligence?
For many researchers, deep learning is simply a continuation of the multi-decade advancement in our ability to make use of large scale neural networks. Let’s first take a quick tour of the problems that neural networks and related technologies are trying to solve, and later we will examine the deep learning architectures in greater detail.
Machine learning generally breaks down into two application areas which are closely related: classification and regression. Continue reading