To initialize neural networks it’s often desirable to generate a set of vectors which span the space. In the case of a square weight matrix this means that we want a random orthonormal basis.
The code below generates such a random basis by composing (concatenating) random Householder transformations.
import math
import random

import numpy


def make_orthonormal_matrix(n):
    """
    Makes a square matrix which is orthonormal by concatenating
    random Householder transformations
    """
    A = numpy.identity(n)
    d = numpy.zeros(n)
    d[n-1] = random.choice([-1.0, 1.0])
    for k in range(n-2, -1, -1):
        # generate random Householder transformation
        x = numpy.random.randn(n-k)
        s = math.sqrt((x**2).sum())  # norm(x)
        # use the sign of the first component (copysign needs a scalar)
        sign = math.copysign(1.0, x[0])
        s *= sign
        d[k] = -sign
        x[0] += s
        beta = s * x[0]
        # apply the transformation to rows k..n-1
        y = numpy.dot(x, A[k:n,:]) / beta
        A[k:n,:] -= numpy.outer(x, y)
    # change sign of rows
    A *= d.reshape(n,1)
    return A
n = 100
A = make_orthonormal_matrix(n)

# test matrix: rows should have unit length and be mutually orthogonal
maxdot = 0.0
maxlen = 0.0
for i in range(n):  # range(n) so the last row's length is checked too
    maxlen = max(math.fabs(math.sqrt((A[i,:]**2).sum())-1.0), maxlen)
    for j in range(i+1,n):
        maxdot = max(math.fabs(numpy.dot(A[i,:],A[j,:])), maxdot)
print("max dot product = %g" % maxdot)
print("max vector length error = %g" % maxlen)
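The same test can be written more compactly by comparing A Aᵀ with the identity matrix. This is only a minimal sketch: `orthonormality_error` is a hypothetical helper name that does not appear in the original code, and it is demonstrated on a small rotation matrix so that the snippet runs on its own.

```python
import numpy


def orthonormality_error(A):
    """Largest absolute entry of A A^T - I (hypothetical helper)."""
    n = A.shape[0]
    return numpy.abs(A @ A.T - numpy.identity(n)).max()


# A 2-D rotation matrix is orthonormal, so the error is at rounding level.
theta = 0.3
R = numpy.array([[numpy.cos(theta), -numpy.sin(theta)],
                 [numpy.sin(theta),  numpy.cos(theta)]])
rotation_error = orthonormality_error(R)
print("rotation error = %g" % rotation_error)

# A matrix that is orthogonal but not normalized gives a clearly nonzero error.
scaled_error = orthonormality_error(2.0 * numpy.identity(3))
print("scaled identity error = %g" % scaled_error)
```

Passing the matrix returned by the Householder routine to this helper should give an error close to machine precision, covering both the row lengths and the pairwise dot products in one number.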
Another way to do this is to take the QR decomposition of a random Gaussian matrix and keep the Q factor. However, the code above avoids calculating the R matrix.
I did some timing tests, and it seems that the QR method is 3 times faster in Python 3:
import numpy
from scipy.linalg import qr

n = 4
H = numpy.random.randn(n, n)
Q, R = qr(H)  # Q is the orthonormal matrix; R is not needed
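The QR approach can be wrapped in a function and checked in the same way. This is a sketch under a few assumptions: it uses `numpy.linalg.qr` instead of `scipy.linalg.qr` to keep the dependencies small, `random_orthonormal_qr` is a hypothetical name, and the sign correction is an optional extra step that is commonly recommended so the result does not depend on the sign convention of the QR routine.

```python
import numpy


def random_orthonormal_qr(n, rng=None):
    """Random n x n orthonormal matrix via QR (hypothetical helper name)."""
    rng = numpy.random.default_rng() if rng is None else rng
    H = rng.standard_normal((n, n))
    Q, R = numpy.linalg.qr(H)
    # Optional sign fix: flip each column of Q by the sign of the
    # matching diagonal entry of R.
    signs = numpy.sign(numpy.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs  # broadcasting scales column j by signs[j]


Q = random_orthonormal_qr(4, numpy.random.default_rng(0))
qr_error = numpy.abs(Q.T @ Q - numpy.identity(4)).max()
print("max deviation from identity = %g" % qr_error)
```

Flipping the sign of whole columns keeps the matrix orthonormal, so the check still passes after the correction.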
Let’s say that we make measurements of a large group of people. Such measurements might include height, weight, IQ, blood pressure, credit score, hair length, preferences, personality traits, etc. You can imagine obtaining a mass of data about people like this, where each measurement is taken to lie on a continuous scale. Typically the distribution of the population along each one of these measurements will be a bell curve. Most people are close to average height, for example. The interesting fact is that the more measurements you take, the less likely it is that you will find anyone who is simultaneously average along all the dimensions that you consider. All of us are abnormal if you consider enough personal attributes.
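A little arithmetic makes this concrete. The sketch below assumes that "average" means within one standard deviation of the mean and that the measurements are independent and normally distributed. Both are simplifying assumptions made for illustration, and `fraction_average_on_all` is a hypothetical helper name.

```python
import math

# Probability of lying within one standard deviation of the mean
# on a single normally distributed measurement (about 0.6827).
p_one = math.erf(1.0 / math.sqrt(2.0))


def fraction_average_on_all(d, p=p_one):
    """Fraction of people within one sigma on all d measurements,
    assuming the measurements are independent (an assumption of this sketch)."""
    return p ** d


for d in (1, 2, 10, 50, 100):
    print("D = %3d: fraction = %.3g" % (d, fraction_average_on_all(d)))
```

Under these assumptions the fraction shrinks geometrically with the number of measurements, so with enough attributes practically nobody is average on all of them. Real measurements are correlated, which slows the decay but does not stop it.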
This brings us to the shell property of high dimensional spaces.
Let’s consider a normal (Gaussian) distribution in D dimensions. In 1D it is obvious that the bulk of the probability is in the middle, near zero. In 2D the peak of the density is also in the middle. One might imagine that for any number of dimensions the probability mass would stay near the middle, but this is false. The shell property of high dimensional spaces shows that the probability mass of a D-dimensional Gaussian distribution (with unit variance in each dimension) where D>>3 is almost all concentrated in a thin shell at a distance of about sqrt(D) from the origin, and the larger the value of D, the thinner that shell becomes relative to its radius. The density is still highest at the origin, but the volume of the shell grows exponentially with D compared with the volume around the origin, and so with large D there is essentially zero probability that a point will end up near the center: Mr Average does not exist.
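The shell property is easy to observe numerically. The following sketch draws points from a standard Gaussian (zero mean, unit variance per dimension, which is an assumption of this example) and compares their distance from the origin with sqrt(D). The sample size and the list of dimensions are arbitrary choices.

```python
import numpy

rng = numpy.random.default_rng(0)
samples = 2000

shell_stats = {}
for D in (1, 2, 10, 100, 1000):
    points = rng.standard_normal((samples, D))
    # Euclidean distance of each sample from the origin
    radii = numpy.sqrt((points**2).sum(axis=1))
    shell_stats[D] = (radii.mean(), radii.std())
    print("D = %4d: sqrt(D) = %6.2f, mean radius = %6.2f, std = %.2f"
          % (D, numpy.sqrt(D), radii.mean(), radii.std()))
```

As D grows, the mean radius tracks sqrt(D) while the spread of the radii stays of order one, so the shell becomes thin relative to its radius and samples near the origin effectively never occur.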