To initialize neural networks it’s often desirable to generate a set of vectors which span the space. In the case of a square weights matrix this means that we want a random orthonormal basis.
The code below generates such a random basis by concatenating random Householder transforms.
Makes a square matrix which is orthonormal by concatenating
random Householder transformations
A = numpy.identity(n)
d = numpy.zeros(n)
d[n-1] = random.choice([-1.0, 1.0])
for k in range(n-2, -1, -1):
# generate random Householder transformation
x = numpy.random.randn(n-k)
s = math.sqrt((x**2).sum()) # norm(x)
sign = math.copysign(1.0, x)
s *= sign
d[k] = -sign
x += s
beta = s * x
# apply the transformation
y = numpy.dot(x,A[k:n,:]) / beta
A[k:n,:] -= numpy.outer(x,y)
# change sign of rows
A *= d.reshape(n,1)
n = 100
A = make_orthonormal_matrix(n)
# test matrix
maxdot = 0
maxlen = 0.0
for i in range(n-1):
maxlen = max(math.fabs(math.sqrt((A[i,:]**2).sum())-1.0), maxlen)
for j in range(i+1,n):
maxdot = max(math.fabs(numpy.dot(A[i,:],A[j,:])), maxdot)
print("max dot product = %g" % maxdot)
print("max vector length error = %g" % maxlen)
Another way to do this is to do a QR decomposition of a random Gaussian matrix. However the code above avoids calculating the R matrix.
I did some timing tests and it seems like the QR method is 3 times faster in python3:
from scipy.linalg import qr
n = 4
H = numpy.random.randn(n, n)
Q, R = qr(H)
Let’s say that we make measurements of a large group of people. Such measurements might include height, weight, IQ, blood pressure, credit score, hair length, preference, personality traits, etc. You can imagine obtaining a mass of data about people like this where each measurement is taken to lie on a continuous scale. Typically the distribution of the population along each one of these measurements will be a bell curve. Most people have average height for example. The interesting fact is that the more measurements you take, the less likely it is that you will find anyone who is simultaneously average along all the dimensions that you consider. All of us are abnormal if you consider enough personal attributes.
This brings us to the shell property of high dimensional spaces.
Let’s consider a normal (Gaussian) distribution in D-dimensions. In 1D it is obvious that all the probability bulk is in the middle, near zero. In 2D the peak is also in the middle. One might imagine that for any number of dimensions this would continue to hold, but this is false. The shell property of high dimensional spaces shows that the probability mass of a D-dimensional Gaussian distribution where D>>3 is all concentrated in a thin shell at a distance of sqrt(D) away from the origin, and the larger the value of D, the thinner that shell becomes. This is because the volume of the shell grows exponentially with D compared with the volume around the origin, and so with large D there is essentially zero probability that a point will end up near the center: Mr Average does not exist. Continue reading
I’m doing some simple exploration of image statistics on a large database of natural images. The first thing that I tried was computing the histogram of neighboring image pixel intensity differences. Here is the graph for that using a log y axis, for a few pixel separations.
It is clear that large differences occur much more rarely and that the most probable pixel to pixel spatial change in intensity is zero. However the tails are heavy, so it is nothing like a Gaussian distribution. The adjacent pixel intensity difference log probabilities were fairly well fitted by a function that goes like , and the pixels further apart require a smaller exponent.
Make magazine published an article about my relay calculating machine project. Click on the picture for a high-res version. You can watch a video on the project page here.
This is part 3 of my series of posts on the statistics of financial markets. Part 1 is here.
In previous posts, I have found that working in log prices makes sense and that the double exponential distribution is a good fit to price change data. In this post, I will look at correlations over time in price changes.
Let’s ask a simple question: Does yesterday’s price change predict today’s price change? Continue reading
This is part 2 of my series of posts on the statistics of financial markets. Part 1 is here.
I have established that a double exponential distribution fits price movements when they are converted to log prices, at least for bitcoin, Apple, and Dell. (Actually I have checked it on a few other NASDAQ stocks too.)
Once we have a statistical model, we can generate some data to see if it produces results that look like the actual price graph. Below you can see the real 2 month bitcoin price graph, together with two graphs that were obtained by using a model based on the Continue reading
This series of blog posts is intended to document some mathematical analysis that I have been doing on the bitcoin price graph and on price histories of securities in the stock market. The purpose is to understand something about the statistics of these price movements, and to learn about the behavior of the stock market in general.
One thing that is useful about bitcoin is that trading is never stopped. Because everything runs 24 hours 7 days per week, there are no artifacts to do with starting and stopping trading on specific exchanges and transitioning between financial Continue reading
This post is about fitting a curve through a set of points. This is called regression: It is also the classic problem of coming up with a generalization from a discrete training data set in machine learning. We have a set of points that are observations at specific places and we want to make a system that predicts what the likely observations should be at all places within the domain of interest. We use the given observations (a training set) to train a model and then when we get more observations (a test set) we can evaluate how much error there is in our Continue reading
Recently, I was looking through the art web site of Sven Geier. He has a lot of fractal images that he has been creating since 2001. Many of the images are quite beautiful, particularly the recent ones if you scroll down to 2011.
What is interesting though is to see the progression as the power of home computers has increased over the last decade. Fractal art around 2001 was mostly 2D with bold colors, lower resolutions, and fairly raw in that the images come with little post processing. They tend to make use of complex number sequence sets such as Mandelbrot with familiar fractal spirals. Then around 2003 the fractal flame algorithm really became popular Continue reading
A formal system in mathematics is a system which contains a set of axioms (unquestionable statements of truth), and then adds to these a set of production rules by which new true statements can be generated. The process of production can go on forever allowing a never-ending list of truths to be derived.
A famous result called Gödel’s incompleteness theorem shows that in such a system, there are some true statements that are nevertheless un-provable (undecidable) using the rules of the system. It shows that a self-consistent Continue reading