When training neural networks it is a good idea to have a training set which has examples that are randomly ordered. We want to ensure that any sequence of training set examples, long or short, has statistics that are representative of the whole. During training we will be adjusting weights, often by using stochastic gradient descent, and so we ideally would like the source statistics to remain stationary.
During on-line training, such as with a robot, or when people learn, adjacent training examples are highly correlated. Visual scenes have temporal coherence and people spend a long time at specific tasks, such as playing a card game, where their visual input, over perhaps hours, is not representative of the general statistics of natural scenes. During on-line training we would expect that a neural net weights would become artificially biased by having highly correlated consecutive training examples so that the network would not be as effective at tasks requiring balanced knowledge of the whole training set.
In my last blog post I talked about trying out my code for training neural nets on a simple one-layer network which consists of a single weight layer and a softmax output. In this post I share results for training a fully connected two-layer network.
In this network, the input goes from 28×28 image pixels down to 50 hidden units. Then there is a rectified linear activation function. The second layer goes from the 50 hidden units down to 10 units, and finally there is the softmax output stage for classification.
When I train this network on the MNIST handwriting dataset I get a test error rate of 2.89% which is pretty good and actually lower than other results quoted on the MNIST web site. It is interesting to inspect the patterns of the weights for the first layer below (here I organized the weights for the 50 hidden units as a 10×5 matrix):
Recently I have been experimenting with a C++ deep learning library that I have written by testing it out on the MNIST handwritten digits data set. In this dataset there are 60,000 training images and 10,000 test images which are of size 28×28 pixels. I have been trying to reproduce some of the error rates that Yann LeCun reports on the MNIST site. The digits written in many different styles and some of them are quite hard to classify, and so it makes a good test for neural net learning.
Recent strides in artificial intelligence from big name players such as Google, Facebook, and Baidu, as well as increasingly successful heterogenous systems like IBM’s Watson have provoked fear and excitement amongst the intelligentsia in equal measures. Public figures, such as Steven Hawking, are concerned, and not surprisingly the popular press are excited to cover it. Recently, Elon Musk has become worried that AI might eventually spell doom for the human race. He donated $10 million to fund the Future of Life organization whose stated goal is to ensure AI remains beneficial and does not threaten our wellbeing. An open letter by this organization, titled “Research Priorities for Robust and Beneficial Artificial Intelligence,” was signed by hundreds of research leaders. Influential futurist, Ray Kurzweil, has popularized the idea of the technological singularity where intelligent systems surpass human capabilities and leave us marginalized at best.
I read with interest the recent paper out of Baidu about scaling up image recognition. In it they talk about creating a supercomputer to carry out the learning phase of training a deep convolutional network. Training such things is terribly slow, with their typical example taking 212 hours on a single GPU machine because of the enormous number of weight computations that need to be evaluated and the slow stochastic gradient process over large training sets.
Baidu has built a dedicated machine with 36 servers connected by an InfiniBand switch, each server with four GPUs. In the paper they describe different ways of partitioning the problem to run on this machine. They end up being able to train the model using 32 GPUs in 8.6 hours.
In recent years the concept of deep learning has been gaining widespread attention. The media frequently reports on talent acquisitions in this field, such as those by Google and Facebook, and startups which claim to employ deep learning are met with enthusiasm. Gratuitous comparisons with the human brain are frequent. But is this just a trendy buzz word? What exactly is deep learning and how is it relevant to developments in machine intelligence?
For many researchers, deep learning is simply a continuation of the multi-decade advancement in our ability to make use of large scale neural networks. Let’s first take a quick tour of the problems that neural networks and related technologies are trying to solve, and later we will examine the deep learning architectures in greater detail.
Machine learning generally breaks down into two application areas which are closely related: classification and regression. Continue reading
Recently NASA has been in the news with the successful launch and recovery of the Orion space craft. This was a four hour two orbit test of the new capsule that is intended to support future manned missions beyond the Earth. In addition, there has been a huge growth of the space industry in the last decade including commercial ventures such as Space X and Blue Origin, as well as proposals to mine the asteroids. There is always a tremendous interest in sending people to space, and in fact it seems to be an imperative for the human race to escape potential future disaster scenarios on the Earth by seeking solace among the stars.
I’m doing some simple exploration of image statistics on a large database of natural images. The first thing that I tried was computing the histogram of neighboring image pixel intensity differences. Here is the graph for that using a log y axis, for a few pixel separations.
It is clear that large differences occur much more rarely and that the most probable pixel to pixel spatial change in intensity is zero. However the tails are heavy, so it is nothing like a Gaussian distribution. The adjacent pixel intensity difference log probabilities were fairly well fitted by a function that goes like , and the pixels further apart require a smaller exponent.
Can we scan, store and simulate the human brain, and with it bring to life all our memories, experiences, emotions, and personality? This is a big philosophical question, but here I assume for the moment that this is a problem with only technical challenges, and not ones that relate to mystical aspects of being.
The human brain contains around 75 billion neurons. If each one has at most 100,000 inputs, then we could store information about the sources of all Continue reading