The new European GDPR personal privacy data laws allow users to ask any company to delete all their personal data and to provide a copy on demand. Non-compliance leads to harsh penalties.
Those laws don’t make any sense (in that it is impossible to comply) for companies that are developing any kind of machine learning / neural networks / artificial intelligence that learn global models of any kind from attributes gathered from multiple users. This is why:
Lawyers expect that personal data is localized and understandable. But increasingly we are aggregating personal data into all kinds of computer models about users where that data becomes diffuse and incomprehensible.
Just think of it as someone asking you to forget they ever existed and to roll yourself back to whatever you would have been like if you had never had any contact with them, and also they want an exhaustive list of the personal neural mental data you are currently holding on them in a form that they can understand.
It’s important for users to know that, as technology is progressing, their data is being utilized in ways that cannot be undone, and that a request for the stored data is becoming impossible to fulfill. However lawyers and regulators should also understand that aggregating personal data in machine learning algorithms can be an effective form of anonymization.
To initialize neural networks it’s often desirable to generate a set of vectors which span the space. In the case of a square weights matrix this means that we want a random orthonormal basis.
The code below generates such a random basis by concatenating random Householder transforms.
Makes a square matrix which is orthonormal by concatenating
random Householder transformations
A = numpy.identity(n)
d = numpy.zeros(n)
d[n-1] = random.choice([-1.0, 1.0])
for k in range(n-2, -1, -1):
# generate random Householder transformation
x = numpy.random.randn(n-k)
s = math.sqrt((x**2).sum()) # norm(x)
sign = math.copysign(1.0, x)
s *= sign
d[k] = -sign
x += s
beta = s * x
# apply the transformation
y = numpy.dot(x,A[k:n,:]) / beta
A[k:n,:] -= numpy.outer(x,y)
# change sign of rows
A *= d.reshape(n,1)
n = 100
A = make_orthonormal_matrix(n)
# test matrix
maxdot = 0
maxlen = 0.0
for i in range(n-1):
maxlen = max(math.fabs(math.sqrt((A[i,:]**2).sum())-1.0), maxlen)
for j in range(i+1,n):
maxdot = max(math.fabs(numpy.dot(A[i,:],A[j,:])), maxdot)
print("max dot product = %g" % maxdot)
print("max vector length error = %g" % maxlen)
Another way to do this is to do a QR decomposition of a random Gaussian matrix. However the code above avoids calculating the R matrix.
I did some timing tests and it seems like the QR method is 3 times faster in python3:
from scipy.linalg import qr
n = 4
H = numpy.random.randn(n, n)
Q, R = qr(H)
At the moment I’m writing an integer-based library to bring neural networks to micro-controllers. This is intended to support the ARM and AVR devices. The idea here is that even though we might think of neural networks as the domain of super computers, for small scale robots we can do a lot of interesting things with smaller neural networks. For example a four layer convolutional neural network with about 18,000 parameters can process a 32×32 video frame at 8 frames per second on the ATmega328, according to code that I implemented last year.
For small networks, there can be some on-line learning, which might be useful to learn control systems with a few inputs and outputs, connecting for example IMU axes or simple sensors to servos or motors, trained with deep reinforcement learning. This is the scenario that I’m experimenting with and trying to enable for small, low power, and cheap interactive robots and toys.
For more complex processing where insufficient RAM is available to store weights, a fixed network can be stored in ROM built from weights that have been trained off line using python code.
When training neural networks it is a good idea to have a training set which has examples that are randomly ordered. We want to ensure that any sequence of training set examples, long or short, has statistics that are representative of the whole. During training we will be adjusting weights, often by using stochastic gradient descent, and so we ideally would like the source statistics to remain stationary.
During on-line training, such as with a robot, or when people learn, adjacent training examples are highly correlated. Visual scenes have temporal coherence and people spend a long time at specific tasks, such as playing a card game, where their visual input, over perhaps hours, is not representative of the general statistics of natural scenes. During on-line training we would expect that a neural net weights would become artificially biased by having highly correlated consecutive training examples so that the network would not be as effective at tasks requiring balanced knowledge of the whole training set.
If you are training neural networks or experimenting with natural image statistics, or even just making art, then you may want a database of natural images.
I generated an image patch database that contains 500,000 28×28 or 64×64 sized monochrome patches that were randomly sampled from 5000 representative natural images, including a mix of landscape, city, and indoor photos. I am offering them here for download from Dropbox. There are two files:
image_patches_28x28_500k_nofaces.dat (334MB compressed)
image_patches_64x64_500k_nofaces.dat (1.66GB compressed)
The first file contains 28×28 pixel patches and the second one contains 64×64 patches. The patches were sampled from a corpus of personal photographs at many different locations and uniformly in log scale. A concerted effort was made to avoid images with faces, so that these could be used as a non-face class for face detector training. However there are occasional faces that have slipped through but the frequency is less than one in one thousand. Continue reading
Having trained a two layer neural network to recognize handwritten digits with reasonable accuracy, as described in my previous blog post, I wanted to see what would happen if neurons were forced to pool the outputs of pairs of rectified units according to a fixed weight schedule.
I created a network which is almost a three layer network where the output of pairs of the first layer rectified units are combined additively before being passed to the second fully connected layer. This means that the first layer has a 28×28 input and a 50 unit output (hidden layer) with rectified linear units, and then pairs of these units are averaged to reduce the neuron count to 25, and then the second fully connected layer reduces this down to 10. Finally the softmax classifier is applied.
Recently I have been experimenting with a C++ deep learning library that I have written by testing it out on the MNIST handwritten digits data set. In this dataset there are 60,000 training images and 10,000 test images which are of size 28×28 pixels. I have been trying to reproduce some of the error rates that Yann LeCun reports on the MNIST site. The digits written in many different styles and some of them are quite hard to classify, and so it makes a good test for neural net learning.