How to Make a Random Orthonormal Matrix

To initialize neural networks it’s often desirable to generate a set of vectors which span the space. In the case of a square weights matrix this means that we want a random orthonormal basis.

The code below generates such a random basis by concatenating random Householder transforms.

import numpy
import random
import math

def make_orthonormal_matrix(n):
	Makes a square matrix which is orthonormal by concatenating
	random Householder transformations
	A = numpy.identity(n)
	d = numpy.zeros(n)
	d[n-1] = random.choice([-1.0, 1.0])
	for k in range(n-2, -1, -1):
		# generate random Householder transformation
		x = numpy.random.randn(n-k)
		s = math.sqrt((x**2).sum()) # norm(x)
		sign = math.copysign(1.0, x[0])
		s *= sign
		d[k] = -sign
		x[0] += s
		beta = s * x[0]
		# apply the transformation
		y =,A[k:n,:]) / beta
		A[k:n,:] -= numpy.outer(x,y)
	# change sign of rows
	A *= d.reshape(n,1)
	return A

n = 100
A = make_orthonormal_matrix(n)

# test matrix
maxdot = 0
maxlen = 0.0
for i in range(n-1):
	maxlen = max(math.fabs(math.sqrt((A[i,:]**2).sum())-1.0), maxlen)
	for j in range(i+1,n):
		maxdot = max(math.fabs([i,:],A[j,:])), maxdot)
print("max dot product = %g" % maxdot)
print("max vector length error = %g" % maxlen)

Another way to do this is to do a QR decomposition of a random Gaussian matrix. However the code above avoids calculating the R matrix.


I did some timing tests and it seems like the QR method is 3 times faster in python3:

import numpy
from scipy.linalg import qr

n = 4
H = numpy.random.randn(n, n)
Q, R = qr(H)

Mr Average Does Not Exist

Let’s say that we make measurements of a large group of people. Such measurements might include height, weight, IQ, blood pressure, credit score, hair length, preference, personality traits, etc. You can imagine obtaining a mass of data about people like this where each measurement is taken to lie on a continuous scale. Typically the distribution of the population along each one of these measurements will be a bell curve. Most people have average height for example. The interesting fact is that the more measurements you take, the less likely it is that you will find anyone who is simultaneously average along all the dimensions that you consider. All of us are abnormal if you consider enough personal attributes.

This brings us to the shell property of high dimensional spaces.

Let’s consider a normal (Gaussian) distribution in D-dimensions. In 1D it is obvious that all the probability bulk is in the middle, near zero. In 2D the peak is also in the middle. One might imagine that for any number of dimensions this would continue to hold, but this is false. The shell property of high dimensional spaces shows that the probability mass of a D-dimensional Gaussian distribution where D>>3 is all concentrated in a thin shell at a distance of sqrt(D) away from the origin, and the larger the value of D, the thinner that shell becomes. This is because the volume of the shell grows exponentially with D compared with the volume around the origin, and so with large D there is essentially zero probability that a point will end up near the center: Mr Average does not exist. Continue reading

Natural Image Statistics

I’m doing some simple exploration of image statistics on a large database of natural images. The first thing that I tried was computing the histogram of neighboring image pixel intensity differences. Here is the graph for that using a log y axis, for a few pixel separations.

Pixel differences histogramIt is clear that large differences occur much more rarely and that the most probable pixel to pixel spatial change in intensity is zero. However the tails are heavy, so it is nothing like a Gaussian distribution. The adjacent pixel intensity difference log probabilities were fairly well fitted by a function that goes like -|kx|^{0.5} , and the pixels further apart require a smaller exponent.
Continue reading

Analyzing the Market – Part 3

BitcoinThis is part 3 of my series of posts on the statistics of financial markets. Part 1 is here.

In previous posts, I have found that working in log prices makes sense and that the double exponential distribution is a good fit to price change data. In this post, I will look at correlations over time in price changes.

Let’s ask a simple question: Does yesterday’s price change predict today’s price change? Continue reading

Analyzing the Market – Part 2

BitcoinThis is part 2 of my series of posts on the statistics of financial markets. Part 1 is here.

I have established that a double exponential distribution fits price movements when they are converted to log prices, at least for bitcoin, Apple, and Dell. (Actually I have checked it on a few other NASDAQ stocks too.)

Once we have a statistical model, we can generate some data to see if it produces results that look like the actual price graph. Below you can see the real 2 month bitcoin price graph, together with two graphs that were obtained by using a model based on the Continue reading

Analyzing the Market – Part 1

BitcoinThis series of blog posts is intended to document some mathematical analysis that I have been doing on the bitcoin price graph and on price histories of securities in the stock market. The purpose is to understand something about the statistics of these price movements, and to learn about the behavior of the stock market in general.

One thing that is useful about bitcoin is that trading is never stopped. Because everything runs 24 hours 7 days per week, there are no artifacts to do with starting and stopping trading on specific exchanges and transitioning between financial Continue reading

Regression and Conspiracy Theories

RegressionThis post is about fitting a curve through a set of points. This is called regression: It is also the classic problem of coming up with a generalization from a discrete training data set in machine learning. We have a set of points that are observations at specific places and we want to make a system that predicts what the likely observations should be at all places within the domain of interest. We use the given observations (a training set) to train a model and then when we get more observations (a test set) we can evaluate how much error there is in our Continue reading

Developments in Fractal Art

Fractal Art

Recently, I was looking through the art web site of Sven Geier. He has a lot of fractal images that he has been creating since 2001. Many of the images are quite beautiful, particularly the recent ones if you scroll down to 2011.

What is interesting though is to see the progression as the power of home computers has increased over the last decade. Fractal art around 2001 was mostly 2D with bold colors, lower resolutions, and fairly raw in that the images come with little post processing. They tend to make use of complex number sequence sets such as Mandelbrot with familiar fractal spirals. Then around 2003 the fractal flame algorithm really became popular Continue reading

Formal Indecision


A formal system in mathematics is a system which contains a set of axioms (unquestionable statements of truth), and then adds to these a set of production rules by which new true statements can be generated. The process of production can go on forever allowing a never-ending list of truths to be derived.

A famous result called Gödel’s incompleteness theorem shows that in such a system, there are some true statements that are nevertheless un-provable (undecidable) using the rules of the system. It shows that a self-consistent Continue reading