How to Make a Random Orthonormal Matrix

To initialize neural networks it’s often desirable to generate a random set of vectors which span the space. In the case of a square weight matrix this means that we want a random orthonormal basis.

The code below generates such a random basis by composing random Householder transformations.


import numpy
import random
import math

def make_orthonormal_matrix(n):
	"""
	Makes a square matrix which is orthonormal by composing
	random Householder transformations
	"""
	A = numpy.identity(n)
	d = numpy.zeros(n)
	d[n-1] = random.choice([-1.0, 1.0])
	for k in range(n-2, -1, -1):
		# generate random Householder transformation
		x = numpy.random.randn(n-k)
		s = math.sqrt((x**2).sum()) # norm(x)
		sign = math.copysign(1.0, x[0])
		s *= sign
		d[k] = -sign
		x[0] += s
		beta = s * x[0]
		# apply the transformation
		y = numpy.dot(x,A[k:n,:]) / beta
		A[k:n,:] -= numpy.outer(x,y)
	# change sign of rows
	A *= d.reshape(n,1)
	return A

n = 100
A = make_orthonormal_matrix(n)

# test matrix
maxdot = 0
maxlen = 0.0
for i in range(n):  # check the length of every row
	maxlen = max(math.fabs(math.sqrt((A[i,:]**2).sum())-1.0), maxlen)
	for j in range(i+1,n):
		maxdot = max(math.fabs(numpy.dot(A[i,:],A[j,:])), maxdot)
print("max dot product = %g" % maxdot)
print("max vector length error = %g" % maxlen)

Another way to do this is to take the QR decomposition of a random Gaussian matrix; however, the code above avoids computing the R matrix.

Postscript:

I did some timing tests, and it seems that the QR method is about 3 times faster in Python 3:

import numpy
from scipy.linalg import qr

n = 4
H = numpy.random.randn(n, n)
Q, R = qr(H)
print(Q)
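
One caveat worth flagging with the QR shortcut: the Q returned by a plain QR decomposition is orthonormal, but its distribution is only uniform (Haar) if the column signs are fixed up using the diagonal of R, which is the same role the d sign vector plays in the Householder code above. A minimal sketch of that correction (the helper name is mine):

import numpy
from scipy.linalg import qr

def qr_orthonormal(n):
	# QR of a Gaussian matrix; multiplying each column of Q by the sign of
	# the corresponding diagonal entry of R removes the arbitrary sign
	# choices made by the factorization.
	H = numpy.random.randn(n, n)
	Q, R = qr(H)
	return Q * numpy.sign(numpy.diag(R))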

Mr Average Does Not Exist

Let’s say that we make measurements of a large group of people. Such measurements might include height, weight, IQ, blood pressure, credit score, hair length, preferences, personality traits, etc. You can imagine obtaining a mass of data about people like this, where each measurement is taken to lie on a continuous scale. Typically the distribution of the population along each one of these measurements will be a bell curve: most people have average height, for example. The interesting fact is that the more measurements you take, the less likely it is that you will find anyone who is simultaneously average along all the dimensions that you consider. All of us are abnormal if you consider enough personal attributes.
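
As a quick numerical illustration (my own sketch, not part of the original argument), treat each trait as an independent standard normal and call someone average on a trait if they fall within half a standard deviation of the mean. The fraction of people who are average on every trait collapses as the number of traits grows:

import numpy

rng = numpy.random.default_rng(0)
people, band = 100000, 0.5	# sample size, and the "average" band in standard deviations
for D in (1, 5, 10, 20, 50):
	traits = rng.standard_normal((people, D))
	frac = (numpy.abs(traits) < band).all(axis=1).mean()
	print("D = %2d: fraction average on every trait = %g" % (D, frac))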

This brings us to the shell property of high dimensional spaces.

Let’s consider a normal (Gaussian) distribution in D dimensions. In 1D it is obvious that all the probability bulk is in the middle, near zero. In 2D the peak is also in the middle. One might imagine that this would continue to hold for any number of dimensions, but this is false. The shell property of high dimensional spaces shows that the probability mass of a D-dimensional Gaussian distribution with D >> 3 is almost entirely concentrated in a thin shell at a distance of sqrt(D) from the origin, and the larger the value of D, the thinner that shell becomes. This is because the volume available at radius r grows like r^(D-1), so the enormous volume far from the origin overwhelms the higher probability density at the center; with large D there is essentially zero probability that a point will end up near the center: Mr Average does not exist.
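
A short simulation (again my own sketch) makes the shell property concrete: sample points from a unit-variance Gaussian in D dimensions and look at their distances from the origin. The mean radius tracks sqrt(D) and the relative spread of the radii shrinks as D grows:

import numpy

rng = numpy.random.default_rng(0)
for D in (1, 2, 10, 100, 1000):
	# distances from the origin for 10,000 samples of a D-dimensional Gaussian
	r = numpy.linalg.norm(rng.standard_normal((10000, D)), axis=1)
	print("D = %4d: mean radius = %7.2f, sqrt(D) = %7.2f, spread/mean = %.3f" % (D, r.mean(), numpy.sqrt(D), r.std() / r.mean()))

Continue reading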

Deep Neural Nets for Micro-controllers

At the moment I’m writing an integer-based library to bring neural networks to micro-controllers, intended to support ARM and AVR devices. The idea is that even though we might think of neural networks as the domain of supercomputers, for small-scale robots we can do a lot of interesting things with smaller networks. For example, a four-layer convolutional neural network with about 18,000 parameters can process a 32×32 video frame at 8 frames per second on the ATmega328, according to code that I implemented last year.

For small networks, some on-line learning is possible, which might be useful for learning control systems with a few inputs and outputs, connecting for example IMU axes or simple sensors to servos or motors and training them with deep reinforcement learning. This is the scenario that I’m experimenting with and trying to enable for small, low-power, and cheap interactive robots and toys.

For more complex processing, where there is insufficient RAM to store the weights, a fixed network can be stored in ROM, built from weights that have been trained off-line using Python code.
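
I won’t pin down the library’s actual storage format here, but as a rough sketch of the off-line Python side, a trained weight matrix could be scaled into signed 8-bit fixed point and emitted as a C array that lives in flash. The function name and layout below are hypothetical, just to illustrate the idea:

import numpy

def export_weights_int8(name, W):
	# Hypothetical exporter: scale float weights into signed 8-bit values
	# and emit a const C array (plus its scale factor) suitable for ROM.
	scale = numpy.abs(W).max() / 127.0
	q = numpy.clip(numpy.round(W / scale), -127, 127).astype(numpy.int8)
	body = ", ".join(str(int(v)) for v in q.flatten())
	return "// scale = %g\nconst int8_t %s[%d] = { %s };" % (scale, name, q.size, body)

W = numpy.random.randn(8, 4) * 0.5	# stand-in for trained weights
print(export_weights_int8("layer0_weights", W))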

Anyway watch this space because I’m currently working on this library and intend to make it open source through my company Impressive Machines.

Nixie display module is now available

Nixie Module

I have worked hard to bring a production version of my Nixie display controller to market. You can now order these units from my Etsy store here.

Features

  • High quality gold plated surface mount PCB
  • Four digit Nixie display; tubes included
  • RGB LED back-lighting on each tube independently programmable to generate multiple colors
  • The colon indicator can also be turned on and off
  • Modules can be stacked next to each other for more digits
  • Runs from 9-12V, with on-board 180V power supply
  • Easily controlled over a serial line from an Arduino, any other micro-controller, or a laptop to display arbitrary digits
  • The board can also function as a stand-alone voltmeter
  • Based on the familiar ATMega328
  • Comes pre-programmed with open source display software
  • Easily customized via the ISP port using standard tools
  • Most spare micro-controller pins are accessible at the connector
  • Based on plug-in IN-4 Nixies which are easily replaced
  • Schematics and code are available for easy hacking

Nixie display module

I recently finished my design for a Nixie display module. It has four digits that can be controlled from a serial link, or alternatively it can act as a voltmeter. The colon and backlighting can also be controlled to give different effects and colors. It’s useful for a variety of maker projects and I am about to manufacture a quantity of them, so let me know if you’d like to be an early adopter.

Sign up here to keep up to date.

High quality streamable free-viewpoint video

Microsoft recently presented the paper “High quality streamable free-viewpoint video” at SIGGRAPH. In this presentation, they capture live 3D views of actors on a stage using multiple cameras and use computer vision to construct detailed texture-mapped mesh models, which are then compressed for live viewing. In the viewer you have the freedom to move around the model in 3D.

I contributed to this project for a year or so when I was employed at Microsoft, working on 3D reconstruction from multiple infra-red camera views, so it was nice to get an acknowledgment. Some of this work was inspired by our earlier work at Microsoft Research which I co-presented at SIGGRAPH in 2004.

It’s very nice to see how far they have progressed with this project and to see the possible links it can have with the HoloLens augmented reality system.


A role for sleep and dreaming in neural networks

When training neural networks it is a good idea to have a training set whose examples are randomly ordered. We want to ensure that any sequence of training examples, long or short, has statistics that are representative of the whole. During training we will be adjusting weights, often using stochastic gradient descent, so ideally we would like the source statistics to remain stationary.
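
A minimal sketch of the usual remedy (my own illustration, not code from this post): reshuffle the training set every epoch so that any run of minibatches has roughly the statistics of the whole set.

import numpy

rng = numpy.random.default_rng(0)
X = rng.standard_normal((10000, 784))	# stand-in training inputs
y = rng.integers(0, 10, size=10000)	# stand-in training labels
batch = 128
for epoch in range(3):
	order = rng.permutation(len(X))	# a fresh random order every epoch
	for start in range(0, len(X), batch):
		idx = order[start:start + batch]
		xb, yb = X[idx], y[idx]
		# ... one stochastic gradient descent step on (xb, yb) goes here ...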

During on-line training, such as with a robot, or when people learn, adjacent training examples are highly correlated. Visual scenes have temporal coherence, and people spend a long time at specific tasks, such as playing a card game, where their visual input, over perhaps hours, is not representative of the general statistics of natural scenes. During on-line training we would therefore expect a neural net’s weights to become artificially biased by highly correlated consecutive training examples, so that the network would not be as effective at tasks requiring balanced knowledge of the whole training set.
Continue reading

Natural image patch database

If you are training neural networks or experimenting with natural image statistics, or even just making art, then you may want a database of natural images.

I generated an image patch database that contains 500,000 monochrome patches, at 28×28 and 64×64 sizes, randomly sampled from 5000 representative natural images, including a mix of landscape, city, and indoor photos. I am offering them here for download from Dropbox. There are two files:

image_patches_28x28_500k_nofaces.dat (334MB compressed)
image_patches_64x64_500k_nofaces.dat (1.66GB compressed)

The first file contains 28×28 pixel patches and the second contains 64×64 patches. The patches were sampled from a corpus of personal photographs taken at many different locations, and uniformly in log scale. A concerted effort was made to avoid images with faces, so that these could be used as a non-face class for face detector training. A few faces have nevertheless slipped through, but their frequency is less than one in a thousand.
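
The on-disk layout isn’t spelled out above, so the loader below is only a guess: it assumes that, after decompressing, each .dat file is a raw dump of unsigned bytes with the patches packed back to back and nothing else in the file. If that assumption holds, the 28×28 file should reshape cleanly into 500,000 patches:

import numpy

def load_patches(path, size=28):
	# Assumes raw uint8 pixels packed patch after patch; adjust if the
	# actual file has a header or a different pixel type.
	data = numpy.fromfile(path, dtype=numpy.uint8)
	return data.reshape(-1, size, size)

# patches = load_patches("image_patches_28x28_500k_nofaces.dat")
# print(patches.shape)	# expect (500000, 28, 28) if the assumption is right

Continue reading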

Understanding back-propagation

Understanding the back-propagation algorithm for training neural networks can be challenging, because there is a lot of confusing terminology which varies between sources, and the algorithm is commonly described purely in terms of the mathematics. Here I present a diagrammatic explanation of back-propagation for the visually inclined. I also summarize the non-linear stages that are commonly used and provide some philosophical insight.

The forward pass through a neural net consists of alternating stages of linear multiplication by a weight matrix and non-linear activation functions which transform the output of each linear unit independently. We can write the transformation in vector form as z = Wx and y = g(z), where x is the input, z is the output of the linear stage, y is the output of the non-linear stage, and g(z) is the activation function which acts on each element of z independently. For subsequent stages, the input x is the output y of the previous stage.
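
As a minimal sketch of one stage in the notation above (my own code, not the diagrammatic derivation the full post builds up), here are the forward and backward passes for a single linear plus ReLU layer:

import numpy

def forward(W, x):
	z = numpy.dot(W, x)	# linear stage: z = Wx
	y = numpy.maximum(z, 0.0)	# non-linear stage: y = g(z), here a ReLU
	return z, y

def backward(W, x, z, dL_dy):
	dL_dz = dL_dy * (z > 0)	# back through g: multiply by g'(z)
	dL_dW = numpy.outer(dL_dz, x)	# gradient with respect to the weights
	dL_dx = numpy.dot(W.T, dL_dz)	# gradient passed back to the previous stage
	return dL_dW, dL_dx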
Continue reading

Energy pooling in neural networks for digit recognition

Having trained a two-layer neural network to recognize handwritten digits with reasonable accuracy, as described in my previous blog post, I wanted to see what would happen if neurons were forced to pool the outputs of pairs of rectified units according to a fixed weight schedule.

I created a network which is almost a three-layer network, in which the outputs of pairs of first-layer rectified units are combined additively before being passed to the second, fully connected layer. The first layer takes the 28×28 input and produces a 50-unit hidden layer of rectified linear units; pairs of these units are then averaged to reduce the neuron count to 25, and the second fully connected layer reduces this down to 10. Finally the softmax classifier is applied.
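
To make the architecture concrete, here is a minimal forward-pass sketch with random stand-in weights, pairing adjacent hidden units (my own illustration, not the trained network from the post):

import numpy

rng = numpy.random.default_rng(0)
W1, b1 = 0.01 * rng.standard_normal((50, 784)), numpy.zeros(50)
W2, b2 = 0.01 * rng.standard_normal((10, 25)), numpy.zeros(10)

def forward(x):	# x is a flattened 28x28 image
	h = numpy.maximum(numpy.dot(W1, x) + b1, 0.0)	# 50 rectified linear units
	pooled = h.reshape(25, 2).mean(axis=1)	# average each fixed pair -> 25 units
	scores = numpy.dot(W2, pooled) + b2	# fully connected layer -> 10 outputs
	e = numpy.exp(scores - scores.max())
	return e / e.sum()	# softmax class probabilities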
Continue reading