At the moment I’m writing an integer-based library to bring neural networks to microcontrollers, intended to support ARM and AVR devices. The idea is that even though we might think of neural networks as the domain of supercomputers, for small-scale robots we can do a lot of interesting things with smaller networks. For example, a four-layer convolutional neural network with about 18,000 parameters can process a 32×32 video frame at 8 frames per second on the ATmega328, based on code I implemented last year.
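To give a flavor of the integer arithmetic involved, here is a minimal sketch of a fixed-point dense layer of the kind such a library might use. The function name, int8 format, and shift-based rescaling are my own illustration, not the library's actual API:

```python
import numpy as np

def int8_dense(x_q, w_q, b_q, shift):
    """Integer-only dense layer: int8 inputs and weights, int32
    accumulation, then an arithmetic right shift back to int8 range."""
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32) + b_q  # wide accumulator
    y = acc >> shift                                          # rescale by 2^-shift
    return np.clip(y, -128, 127).astype(np.int8)              # saturate to int8

x = np.array([10, -20, 30], dtype=np.int8)
W = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.int8)
b = np.zeros(2, dtype=np.int32)
y = int8_dense(x, W, b, shift=4)
```

On a real AVR target the same pattern would be written with fixed-size C integer types, but the accumulate-then-shift structure is the essential trick for avoiding floating point.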
For small networks, there is room for some online learning, which might be useful for learning control systems with a few inputs and outputs, connecting, for example, IMU axes or simple sensors to servos or motors, trained with deep reinforcement learning. This is the scenario I’m experimenting with and trying to enable for small, low-power, and cheap interactive robots and toys.
For more complex processing, where there is insufficient RAM to store weights, a fixed network can be stored in ROM, built from weights that have been trained offline using Python code.
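As a sketch of that offline workflow, a Python script along these lines could quantize trained float weights and emit them as a C array suitable for placing in flash/ROM (for example with PROGMEM on AVR). The function name and the simple symmetric quantization scheme are my own illustration, not the actual tooling:

```python
import numpy as np

def weights_to_c_array(w, name):
    """Quantize float weights to int8 with a symmetric scale and emit
    a C array declaration suitable for storing in flash/ROM."""
    scale = 127.0 / np.max(np.abs(w))            # map largest weight to +/-127
    q = np.round(w * scale).astype(np.int8).ravel()
    body = ", ".join(str(int(v)) for v in q)
    return f"const int8_t {name}[{q.size}] = {{{body}}};"

line = weights_to_c_array(np.array([[0.5, -1.0], [0.25, 0.0]]), "layer1_w")
```

The emitted scale factor would also need to be recorded so the runtime can undo the quantization, but that bookkeeping is omitted here.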
Anyway, watch this space, because I’m currently working on this library and intend to make it open source through my company Impressive Machines.
Microsoft recently presented the paper “High-Quality Streamable Free-Viewpoint Video” at SIGGRAPH. In this work they capture live 3D views of actors on a stage using multiple cameras and use computer vision to construct detailed texture-mapped mesh models, which are then compressed for live viewing. In the viewer you have the freedom to move around the model in 3D.
I contributed to this project for a year or so when I was employed at Microsoft, working on 3D reconstruction from multiple infra-red camera views, so it was nice to get an acknowledgment. Some of this work was inspired by our earlier work at Microsoft Research which I co-presented at SIGGRAPH in 2004.
It’s very nice to see how far they have progressed with this project and to see the possible links it can have with the HoloLens augmented reality system.
Having trained a two-layer neural network to recognize handwritten digits with reasonable accuracy, as described in my previous blog post, I wanted to see what would happen if neurons were forced to pool the outputs of pairs of rectified units according to a fixed weight schedule.
I created a network that is almost a three-layer network, where the outputs of pairs of first-layer rectified units are combined additively before being passed to the second fully connected layer. The first layer maps the 28×28 input to a 50-unit hidden layer with rectified linear units; pairs of these units are then averaged to reduce the neuron count to 25, and the second fully connected layer reduces this down to 10. Finally the softmax classifier is applied.
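The pairing step itself is simple to express. Here is a minimal numpy sketch of the fixed pooling scheme described above (my own illustration, not the library code):

```python
import numpy as np

def pair_average(h):
    """Average fixed pairs of hidden units: (h0+h1)/2, (h2+h3)/2, ..."""
    return h.reshape(-1, 2).mean(axis=1)

# ReLU outputs from a (toy) 6-unit hidden layer, pooled down to 3 units
h = np.maximum(0.0, np.array([1.0, 3.0, -2.0, 4.0, 0.5, 0.5]))
p = pair_average(h)
```

In the real network the same operation takes the 50 rectified hidden units down to 25 before the second weight layer.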
In my last blog post I talked about trying out my code for training neural nets on a simple one-layer network which consists of a single weight layer and a softmax output. In this post I share results for training a fully connected two-layer network.
In this network, the input goes from 28×28 image pixels down to 50 hidden units. Then there is a rectified linear activation function. The second layer goes from the 50 hidden units down to 10 units, and finally there is the softmax output stage for classification.
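As a sketch, the forward pass of this architecture looks like the following in numpy. The random weights are just to show the shapes; this is an illustration of the architecture, not my C++ library code:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Forward pass: 784 inputs -> 50 hidden (ReLU) -> 10 -> softmax."""
    h = np.maximum(0.0, x @ W1 + b1)   # hidden layer with ReLU activation
    z = h @ W2 + b2                    # output logits
    e = np.exp(z - z.max())            # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.random(784)                               # a flattened 28x28 image
W1, b1 = 0.01 * rng.standard_normal((784, 50)), np.zeros(50)
W2, b2 = 0.01 * rng.standard_normal((50, 10)), np.zeros(10)
p = forward(x, W1, b1, W2, b2)                    # 10 class probabilities
```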
When I train this network on the MNIST handwritten digits dataset I get a test error rate of 2.89%, which is pretty good and actually lower than some comparable results quoted on the MNIST web site. It is interesting to inspect the patterns of the first-layer weights below (here I organized the weights for the 50 hidden units as a 10×5 matrix):
Recently I have been experimenting with a C++ deep learning library that I have written, testing it out on the MNIST handwritten digits dataset. This dataset has 60,000 training images and 10,000 test images, each 28×28 pixels. I have been trying to reproduce some of the error rates that Yann LeCun reports on the MNIST site. The digits are written in many different styles, and some of them are quite hard to classify, so it makes a good test for neural net learning.
When deriving sensory data from IMU chips, it is always an issue that the gain and offset of the readings are not known and vary from chip to chip. I have written a short Python script that uses a least squares fit to calibrate these devices. All you need to do is capture a set of XYZ readings while moving the device through different orientations and put the readings in a text file. You can get this script from my GitHub.
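For illustration, the core of such a least squares calibration can be sketched as follows. This is my own reconstruction of the standard approach, fitting an axis-aligned ellipsoid to the readings, and is not necessarily identical to the script on GitHub:

```python
import numpy as np

def calibrate(readings):
    """Least-squares fit of per-axis gain and offset from raw XYZ samples.
    Fits a*x^2 + b*y^2 + c*z^2 + d*x + e*y + f*z = 1 (an axis-aligned
    ellipsoid), then recovers the offset and gain of each axis."""
    x, y, z = readings.T
    A = np.column_stack([x*x, y*y, z*z, x, y, z])
    coef, *_ = np.linalg.lstsq(A, np.ones(len(readings)), rcond=None)
    a, b, c, d, e, f = coef
    offset = np.array([-d/(2*a), -e/(2*b), -f/(2*c)])
    # completing the squares gives the constant k, then per-axis radii
    k = 1 + d*d/(4*a) + e*e/(4*b) + f*f/(4*c)
    gain = np.sqrt(k / np.array([a, b, c]))
    return offset, gain

# synthetic check: points on an ellipsoid with known offset and gains
rng = np.random.default_rng(1)
u = rng.standard_normal((200, 3))
u /= np.linalg.norm(u, axis=1, keepdims=True)
true_offset, true_gain = np.array([1.0, -2.0, 0.5]), np.array([9.8, 10.1, 9.6])
samples = true_offset + u * true_gain
offset, gain = calibrate(samples)
```

This works because in any static orientation an accelerometer should read 1 g; the different orientations trace out an ellipsoid whose center is the offset and whose radii are the gains.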
I am passionate about machine learning, intelligence, and robotics. I have a number of robot projects on the go. I wanted to build a platform that would allow me to do a lot of complex experiments on sensor fusion and creating intelligent emergent behaviors. I needed to make a robot that has quite a number of sensor inputs, but not so many that it would overload the processing capability to do anything useful. I decided to make a simple two-wheeled robotic platform that has a lot of flexibility and load it up with appropriate sensors.
One aspect of my robotics philosophy is that information from simple sensors can be highly informative, and that current robot designs jump too quickly to complex high-bandwidth data sources, then do a marginal job of interpreting the information from those sources in software. I am inspired by insects and other small creatures that have small numbers of sensors, for example eyes with only a few photoreceptors, yet still exhibit very complex adaptive behaviors, often leagues beyond what we can do with today’s machines. Part of this is due to the efficiency with which they extract every little bit of useful information from their sensory data, including correlations we would never think of. I am interested in applying experience gained from machine learning to extract information from sensors that could not easily be obtained with hand-coded algorithms.
My rolling robot has two wheels, which have encoders to give feedback of position or wheel rotation speed. It also has an infrared range finder that can indicate the…
I’m struggling with a health issue at the moment, so I’m doing some small projects to stay sane…
I’ve been helping a friend fix old pinball games, which typically use 8-bit micros like the 6800 or 6502. Often we want to know what’s on these old ROM chips, which even some modern device readers can’t easily scan. I built a shield for the Arduino that can read them by dumping their contents over the serial link. The only components other than the Arduino Uno and a prototyping shield were a couple of 74HCT573s and a 24-pin socket.
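On the host side, the captured serial listing can be turned back into a binary ROM image with a few lines of Python. The hex-dump format shown here is my own assumption about what the sketch streams, not the actual output format:

```python
def hex_dump_to_bytes(lines):
    """Convert captured serial output (hex bytes per line, optionally
    prefixed with an 'addr:' field) back into a binary ROM image."""
    data = bytearray()
    for line in lines:
        payload = line.split(":", 1)[-1]          # drop any address prefix
        data.extend(int(tok, 16) for tok in payload.split())
    return bytes(data)

# example: two captured lines of a 6502 ROM dump
rom = hex_dump_to_bytes(["0000: A9 01 8D 00 02", "0005: 4C 00 C0"])
```

The resulting bytes can then be written to a file and fed to a disassembler to inspect the old game code.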
Recent strides in artificial intelligence from big-name players such as Google, Facebook, and Baidu, as well as increasingly successful heterogeneous systems like IBM’s Watson, have provoked fear and excitement amongst the intelligentsia in equal measure. Public figures such as Stephen Hawking are concerned, and not surprisingly the popular press is excited to cover it. Recently, Elon Musk has become worried that AI might eventually spell doom for the human race. He donated $10 million to fund the Future of Life Institute, whose stated goal is to ensure AI remains beneficial and does not threaten our wellbeing. An open letter by this organization, titled “Research Priorities for Robust and Beneficial Artificial Intelligence,” was signed by hundreds of research leaders. The influential futurist Ray Kurzweil has popularized the idea of the technological singularity, where intelligent systems surpass human capabilities and leave us marginalized at best.
I read with interest the recent paper out of Baidu about scaling up image recognition. In it they describe creating a supercomputer to carry out the learning phase of training a deep convolutional network. Training such models is terribly slow, with their typical example taking 212 hours on a single GPU machine, because of the enormous number of weight computations that must be evaluated and the slow stochastic gradient descent process over large training sets.
Baidu has built a dedicated machine with 36 servers connected by an InfiniBand switch, each server with four GPUs. In the paper they describe different ways of partitioning the problem to run on this machine. They end up being able to train the model using 32 GPUs in 8.6 hours.
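A quick back-of-envelope check on those reported numbers gives a sense of the parallel scaling efficiency:

```python
single_gpu_hours = 212.0   # reported single-GPU training time
cluster_hours = 8.6        # reported time on the cluster
gpus = 32                  # GPUs used for the final training run

speedup = single_gpu_hours / cluster_hours   # about 24.7x
efficiency = speedup / gpus                  # about 77% parallel efficiency
```

Getting roughly 77% efficiency at 32 GPUs is quite respectable for synchronous stochastic gradient training, where communication of weight updates usually erodes the gains well before that point.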