Mathematics struggles to describe physics

TL;DR: mathematical representations of physical things are not the things themselves – they are insufficiently abstract, and they may introduce nonsense which then has to be "discovered" and fixed later.
 
It occurs to me (and this is often done in various ways) that one should be able to write every physical law as an abstract function defining the "physical system" in question which, when passed through a (possibly nonlinear) abstract functional (a function of a function, or "operator"), gives a result equal to zero.
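The Schrödinger equation is a standard example that already has this shape: take the system to be the wave function Ψ and define the functional

	F[Ψ] = iħ ∂Ψ/∂t − ĤΨ

where Ĥ is the Hamiltonian operator; the physical law is then just the statement F[Ψ] = 0.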
 
Then later we can argue about which basis is best for representing the function and functional in a particular application, knowing that choosing bases and origins may introduce fake degrees of freedom – so that we end up saying the answer is such and such, times some generator, modulo a bunch of group-invariance nonsense.
 
For example, in quantum physics we might begin by saying a single particle has the wave function Ψ. As soon as we say it's Ψ(x,y,z,t) we are already on a loser, because we have fixed four bases and four origins which have to be "unfixed" later. For a start, that means the description is not Lorentz invariant.
 
But even before that, we are still assuming non-physical things just by saying Ψ is a complex number. In fact it should be normalized over the region of interest in order to give correct probability values – so it really lives in a projective space. There should also be no absolute phase; and if gauge invariance applies, we shouldn't be fixing the local phase either by assuming the 12 o'clock phase position at one location is the same as at every other – especially in the context of spacetime curvature.
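To make the global-phase point concrete, here is a minimal numpy sketch (an illustration, not part of the original argument): multiplying Ψ by any unit complex number leaves every probability unchanged, so that phase is a purely representational degree of freedom.

import numpy

# a toy discretized wave function on 100 grid points, normalized
psi = numpy.random.randn(100) + 1j * numpy.random.randn(100)
psi /= numpy.sqrt((numpy.abs(psi)**2).sum())

# rotate by an arbitrary global phase
theta = 1.234
psi_rotated = numpy.exp(1j * theta) * psi

# the probability densities |psi|^2 are identical
print(numpy.allclose(numpy.abs(psi)**2, numpy.abs(psi_rotated)**2))  # True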
 
The above discussion shows that the usual assumptions about Ψ introduce at least two, and possibly an infinite number of, spurious mathematical degrees of freedom into the representation of reality.
 
General relativity, while a wonder of beauty, is also terrible in this respect: it only fixes the second derivative of the metric, and the Ricci tensor is a reduction of the full Riemann curvature tensor, so any solution that represents a particular spacetime is just one of an infinite family of equivalent solutions which satisfy the same equations and describe the same physics, even if you stick to one coordinate system.
 
If the math hadn’t introduced non-physical degrees of freedom then Higgs wouldn’t have had to discover/introduce the Higgs field and boson because it would have already been present in the solutions.
 
I could jokingly claim that the history of physics is a history of people not realizing they were assuming extra degrees of freedom in their equations, and then making great discoveries about physics later which are in fact discoveries about math.

How to Make a Random Orthonormal Matrix

To initialize neural networks it's often desirable to generate a set of vectors which span the space. In the case of a square weight matrix, this means that we want a random orthonormal basis.

The code below generates such a random basis by concatenating random Householder transforms.


import numpy
import random
import math

def make_orthonormal_matrix(n):
	"""
	Makes a square matrix which is orthonormal by concatenating
	random Householder transformations applied to the identity.
	The sign bookkeeping in d avoids the sign bias that plain
	Householder reflections would otherwise introduce.
	"""
	A = numpy.identity(n)
	d = numpy.zeros(n)
	d[n-1] = random.choice([-1.0, 1.0])
	for k in range(n-2, -1, -1):
		# generate random Householder transformation
		x = numpy.random.randn(n-k)
		s = math.sqrt((x**2).sum()) # norm(x)
		sign = math.copysign(1.0, x[0])
		s *= sign
		d[k] = -sign
		x[0] += s
		beta = s * x[0]
		# apply the transformation to the trailing rows of A
		y = numpy.dot(x,A[k:n,:]) / beta
		A[k:n,:] -= numpy.outer(x,y)
	# change sign of rows using the stored signs
	A *= d.reshape(n,1)
	return A

n = 100
A = make_orthonormal_matrix(n)

# test the matrix: rows should be unit length and mutually orthogonal
maxdot = 0.0
maxlen = 0.0
for i in range(n):
	maxlen = max(math.fabs(math.sqrt((A[i,:]**2).sum())-1.0), maxlen)
	for j in range(i+1,n):
		maxdot = max(math.fabs(numpy.dot(A[i,:],A[j,:])), maxdot)
print("max dot product = %g" % maxdot)
print("max vector length error = %g" % maxlen)

Another way to do this is to perform a QR decomposition of a random Gaussian matrix; the code above, however, avoids calculating the R matrix. (Note that for the QR result to be uniformly distributed, the signs of R's diagonal should be folded into Q, as in the postscript below.)

Postscript:

I did some timing tests and it seems the QR method is about 3 times faster in Python 3:

import numpy
from scipy.linalg import qr

n = 4
H = numpy.random.randn(n, n)
Q, R = qr(H)
# fold the signs of R's diagonal into Q; without this correction the
# distribution of Q over orthogonal matrices is not uniform
Q = Q * numpy.sign(numpy.diag(R))
print(Q)

Mr Average Does Not Exist

Let’s say that we make measurements of a large group of people. Such measurements might include height, weight, IQ, blood pressure, credit score, hair length, preference, personality traits, etc. You can imagine obtaining a mass of data about people like this where each measurement is taken to lie on a continuous scale. Typically the distribution of the population along each one of these measurements will be a bell curve. Most people have average height for example. The interesting fact is that the more measurements you take, the less likely it is that you will find anyone who is simultaneously average along all the dimensions that you consider. All of us are abnormal if you consider enough personal attributes.

This brings us to the shell property of high dimensional spaces.

Let’s consider a normal (Gaussian) distribution in D dimensions. In 1D it is obvious that all the probability bulk is in the middle, near zero. In 2D the peak is also in the middle. One might imagine that this would continue to hold for any number of dimensions, but it is false. The shell property of high dimensional spaces says that the probability mass of a D-dimensional standard Gaussian with D >> 3 is concentrated in a thin shell at a distance of sqrt(D) from the origin, and the larger the value of D, the thinner that shell becomes relative to its radius. This is because the volume available at radius r grows like r^(D-1), which for large D overwhelms the higher probability density near the origin, so there is essentially zero probability that a point will end up near the center: Mr Average does not exist. Continue reading
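Here is a quick numerical check of the claim (an illustrative sketch, not from the original post): sample points from a high-dimensional standard Gaussian and look at their distances from the origin.

import numpy

D = 1000
N = 10000
# N points drawn from a standard Gaussian in D dimensions
points = numpy.random.randn(N, D)
r = numpy.sqrt((points**2).sum(axis=1))
print("sqrt(D)       = %g" % numpy.sqrt(D))  # about 31.6
print("mean radius   = %g" % r.mean())       # very close to sqrt(D)
print("radius stddev = %g" % r.std())        # about 0.7, tiny compared to the mean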

Natural Image Statistics

I’m doing some simple exploration of image statistics on a large database of natural images. The first thing that I tried was computing the histogram of neighboring image pixel intensity differences. Here is the graph for that using a log y axis, for a few pixel separations.

[Figure: histogram of neighboring pixel intensity differences, log y axis]

It is clear that large differences occur much more rarely, and that the most probable pixel-to-pixel spatial change in intensity is zero. However the tails are heavy, so the distribution is nothing like a Gaussian. The adjacent-pixel intensity difference log probabilities were fairly well fitted by a function that goes like -|kx|^{0.5}, and pixels further apart require a smaller exponent.
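The computation itself is simple; here is a minimal sketch of it (illustrative code, with a random stand-in for a real image loaded as a 2-D grayscale numpy array img):

import numpy

img = numpy.random.rand(480, 640)  # stand-in; replace with a real grayscale image

sep = 1  # pixel separation; try 1, 2, 4, ...
diffs = (img[:, sep:] - img[:, :-sep]).ravel()

# histogram of neighboring pixel intensity differences
counts, edges = numpy.histogram(diffs, bins=101)
centers = 0.5 * (edges[:-1] + edges[1:])
print("most probable difference = %g" % centers[counts.argmax()])  # near zero

For a real natural image, plotting log(counts) against the bin centers is what produces the heavy-tailed curves described above.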
Continue reading

Analyzing the Market – Part 3

This is part 3 of my series of posts on the statistics of financial markets. Part 1 is here.

In previous posts, I have found that working in log prices makes sense and that the double exponential distribution is a good fit to price change data. In this post, I will look at correlations over time in price changes.

Let’s ask a simple question: Does yesterday’s price change predict today’s price change? Continue reading
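One hedged sketch of how such a check might look (illustrative code, not the post's own analysis, using a memoryless random walk as a stand-in for real price data):

import numpy

# stand-in: a random walk in log price with Laplace-distributed steps;
# replace log_price with the log of a real daily closing price series
log_price = numpy.cumsum(numpy.random.laplace(0.0, 0.01, 500))
changes = numpy.diff(log_price)

# lag-1 autocorrelation: does yesterday's change predict today's?
corr = numpy.corrcoef(changes[:-1], changes[1:])[0, 1]
print("lag-1 autocorrelation = %g" % corr)  # near zero for this memoryless stand-in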

Analyzing the Market – Part 2

This is part 2 of my series of posts on the statistics of financial markets. Part 1 is here.

I have established that a double exponential distribution fits price movements when they are converted to log prices, at least for bitcoin, Apple, and Dell. (Actually I have checked it on a few other NASDAQ stocks too.)

Once we have a statistical model, we can generate some data to see if it produces results that look like the actual price graph. Below you can see the real 2 month bitcoin price graph, together with two graphs that were obtained by using a model based on the Continue reading
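Generating such synthetic data is straightforward; here is a minimal sketch under the stated model (an illustration with made-up parameter values, not the post's own code), using Laplace-distributed, i.e. double exponential, log price changes:

import numpy

days = 60            # roughly two months of daily steps
scale = 0.03         # made-up spread of daily log price changes
start_price = 600.0  # made-up starting price

# double exponential (Laplace) increments in log price
steps = numpy.random.laplace(0.0, scale, days)
prices = start_price * numpy.exp(numpy.cumsum(steps))
print(prices[:5])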

Analyzing the Market – Part 1

This series of blog posts is intended to document some mathematical analysis that I have been doing on the bitcoin price graph and on price histories of securities in the stock market. The purpose is to understand something about the statistics of these price movements, and to learn about the behavior of the stock market in general.

One thing that is useful about bitcoin is that trading never stops. Because everything runs 24 hours a day, 7 days a week, there are no artifacts to do with starting and stopping trading on specific exchanges and transitioning between financial Continue reading

Regression and Conspiracy Theories

This post is about fitting a curve through a set of points. This is called regression; it is also the classic machine learning problem of generalizing from a discrete training data set. We have a set of points that are observations at specific places, and we want to make a system that predicts the likely observations at all places within the domain of interest. We use the given observations (a training set) to train a model, and then when we get more observations (a test set) we can evaluate how much error there is in our Continue reading
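As a tiny illustrative sketch of that setup (made-up data and a polynomial model chosen for brevity, not the post's own method): fit on a training set, then measure error on a held-out test set.

import numpy

# made-up 1-D data: noisy samples of an underlying curve
x = numpy.linspace(0.0, 1.0, 40)
y = numpy.sin(2.0 * numpy.pi * x) + 0.1 * numpy.random.randn(40)

# split alternate points into a training set and a test set
train = numpy.arange(40) % 2 == 0
test = ~train

# train a model (here a cubic polynomial) on the training points
coeffs = numpy.polyfit(x[train], y[train], 3)

# evaluate prediction error on the held-out test points
pred = numpy.polyval(coeffs, x[test])
print("test RMSE = %g" % numpy.sqrt(numpy.mean((pred - y[test])**2)))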

Developments in Fractal Art

Recently, I was looking through the art web site of Sven Geier. He has a lot of fractal images that he has been creating since 2001. Many of the images are quite beautiful, particularly the recent ones if you scroll down to 2011.

What is interesting though is to see the progression as the power of home computers has increased over the last decade. Fractal art around 2001 was mostly 2D with bold colors, at lower resolutions, and fairly raw in that the images come with little post-processing. They tend to make use of complex-number iteration sets such as the Mandelbrot set, with the familiar fractal spirals. Then around 2003 the fractal flame algorithm really became popular Continue reading