This series of blog posts is intended to document some mathematical analysis that I have been doing on the bitcoin price graph and on price histories of securities in the stock market. The purpose is to understand something about the statistics of these price movements, and to learn about the behavior of the stock market in general.

One useful property of bitcoin is that trading never stops. Because everything runs 24 hours a day, 7 days a week, there are no artifacts from starting and stopping trading on specific exchanges or from transitioning between financial markets. There is just a smooth time sweep of people in different countries around the globe trading when they are awake. This gives the data a certain purity that may make it easier to identify features of interest. With stock market histories, for example, gaps in trading lead to artificially large price changes across each gap.

I have mostly been using data from the last year of bitcoin trading, starting January 2013. It would be useful to have detailed trading data, e.g., at 15-minute intervals, for the whole period, but I have not found this yet. So for the full year I have prices at 6-hour intervals, and for the period starting November 2013 I have them at 30-minute intervals and will keep collecting them at fine intervals. Mostly I use Python and Gnuplot.

This analysis was inspired in part by observing a strong fractal component to the bitcoin price, especially during the last price surge and fall. You can see in the following annotated images that the price graph exhibits a lot of self-similarity over translation and at different scales. This is the kind of fractal behavior that is described in Mandelbrot’s book, “The Misbehavior of Markets: A Fractal View of Financial Turbulence.” In addition, much inspiration is to be derived from the site at Yale.

These graphs show a tendency for the price to go through very similar patterns of movements, and if you zoom into some regions, smaller copies of similar price fluctuations appear. However, it's not at all clear whether this can be used to actually predict future changes, because when one looks to the future, it is not obvious which of the historical patterns is being repeated, if any.

The first thing that interested me was to establish the basic statistics of the price changes, and to be able to compare changes that start from different price levels. My suspicion was that one should convert the price to logarithmic units to do meaningful analysis, and not just use the raw price. This is commonly done in financial analysis, but I wanted to check it. Converting to log prices would be justified if price fluctuations were multiplicative rather than additive, i.e., if they occurred in percentage terms, so that a change from $10 to $11 was as likely as a change from $100 to $110. This is suggested by the common observation that periods of low prices have lower variance in raw price than periods of higher prices. If fluctuations really are multiplicative, then with the price expressed in log units the distribution of price differences should stay roughly constant over time, whatever the price level. It is not clear that a log transform is exactly the right one, but log transforms are common regularizing transforms for natural processes.
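To make the multiplicative argument concrete, here is a small sketch using only the two example moves from the paragraph above. The helper name `log_change` is my own, chosen for illustration.

```python
import math

def log_change(p_start, p_end):
    """Change in log price between two prices (natural log)."""
    return math.log(p_end) - math.log(p_start)

# The two moves from the text: both are a 10% rise.
low_raw = 11.0 - 10.0        # raw change starting from a low price
high_raw = 110.0 - 100.0     # raw change starting from a high price
low_log = log_change(10.0, 11.0)
high_log = log_change(100.0, 110.0)

print("raw changes:", low_raw, high_raw)
print("log changes:", round(low_log, 6), round(high_log, 6))
```

The raw changes differ by a factor of ten, while both log changes equal ln(1.1), about 0.0953. That is the sense in which log units put changes at low and high prices on the same footing.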

Here is one year of the bitcoin price using linear and logarithmic axes. You can see that the size of the fluctuations (time derivatives of price) is similar at the beginning and the end of the year when the price is expressed in log units.

To validate a conversion to log price, I examined the graph of time differences of log price below:

Although there is a lot of fluctuation in the variance of the changes, it is apparent that the overall range of changes in log price is similar on the left and the right of the graph, despite there being a large increase in the absolute price from just over $10 on the left, to around $1000 on the right.

To validate this some more, I divided the price range over this one-year period into 4 bands of equal width in log price, and for each band computed a histogram of the time differences of log price. In other words, starting from any particular price, there is a change, either up or down, over the next six-hour interval in this data set. Each starting price was assigned to one of the 4 bands, and a histogram was built from all the changes that followed the prices in that band. These 4 histograms are plotted below:

You can see that despite the fact that, to the eye, the log price derivatives seem over time to be quite variable, when one looks at these histograms for different price bands, they all have approximately the same shape. The width of the highest band is a little greater, meaning more variability, but this would be much more extreme if we worked directly in raw prices. In conclusion, it is quite reasonable to work in log prices, because this allows changes at low and high prices to be compared more readily. I found this same shape for different sampling time scales, and also for stock market securities listed on the NASDAQ.
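The banding procedure described above can be sketched roughly as follows. This is not the original analysis script: the function names (`band_index`, `changes_by_band`, `histogram`) are hypothetical, and the price series is a synthetic multiplicative random walk standing in for the real six-hour bitcoin data.

```python
import math
import random

def band_index(log_price, lo, hi, n_bands):
    """Index of the equal-width log-price band that contains log_price."""
    frac = (log_price - lo) / (hi - lo)
    return min(int(frac * n_bands), n_bands - 1)  # top edge goes in last band

def changes_by_band(prices, n_bands=4):
    """Group each log-price change by the band of the price it started from."""
    logs = [math.log(p) for p in prices]
    lo, hi = min(logs), max(logs)
    if hi == lo:
        raise ValueError("prices must span a nonzero range")
    bands = [[] for _ in range(n_bands)]
    for start, nxt in zip(logs, logs[1:]):
        bands[band_index(start, lo, hi, n_bands)].append(nxt - start)
    return bands

def histogram(values, lo, hi, n_bins):
    """Counts of values in n_bins equal-width bins covering [lo, hi)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts

# Synthetic stand-in data (an assumption, not real prices).
random.seed(0)
prices = [10.0]
for _ in range(2000):
    prices.append(prices[-1] * math.exp(random.gauss(0.002, 0.03)))

bands = changes_by_band(prices)
for i, changes in enumerate(bands):
    print("band", i, "n =", len(changes), histogram(changes, -0.1, 0.1, 8))
```

With real data, the four printed histograms would be normalized by their counts and overlaid, so that their shapes can be compared directly.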

Looking at bitcoin data from November 2013 onwards, with finer time steps of 30 minutes, I created a histogram of all the (log) price changes (30-minute finite differences). You can see below that this histogram is well fitted by a double exponential distribution (also known as the Laplace distribution). Because the price is on average rising, I had to shift the distribution very slightly to the right along the x-axis.
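As a sketch of how such a fit can be obtained, the maximum likelihood estimates for a double exponential are simple: the location (the small shift along the x-axis) is the sample median, and the scale is the mean absolute deviation from that median. The code below assumes this estimator and uses synthetic changes rather than the real 30-minute data. The names `fit_laplace` and `laplace_pdf` are mine, and the values of `true_mu` and `true_b` are made up for the demonstration.

```python
import math
import random
import statistics

def fit_laplace(changes):
    """Maximum likelihood location and scale of a double exponential."""
    mu = statistics.median(changes)
    b = sum(abs(x - mu) for x in changes) / len(changes)
    return mu, b

def laplace_pdf(x, mu, b):
    """Double exponential density with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)

# Synthetic log price changes (an assumption, not real data). The
# difference of two exponential variates is double exponential.
random.seed(1)
true_mu, true_b = 0.0005, 0.01
sample = [
    true_mu
    + random.expovariate(1.0 / true_b)
    - random.expovariate(1.0 / true_b)
    for _ in range(20000)
]

mu_hat, b_hat = fit_laplace(sample)
print("fitted shift:", mu_hat, "fitted scale:", b_hat)
```

The fitted curve `laplace_pdf(x, mu_hat, b_hat)` can then be plotted over the normalized histogram of changes to judge the fit by eye.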

Often in the stock market, people assume a log-normal model for prices, that is, Gaussian changes in log price, as in the Black-Scholes option pricing model. Comparing the shape of the Gaussian with the histogram, it’s pretty clear that the Gaussian distribution is not correct here, because the distribution of price changes has long tails.

Actually, the double exponential fit may still underestimate the probability of rare, large price changes, but it’s hard to know exactly.

Let us also look at the price graphs for Apple and Dell stock over a 20-year period starting in 1988. Below you can see these plotted using a log price axis, and in the second graph you can see that their price changes also follow approximately the same double exponential form. In fact, I fitted both the Gaussian distribution and the double exponential distribution, using maximum likelihood estimation, to all of the historical data for stocks on the NASDAQ. All of them were better accounted for by the double exponential model, with a probability ratio of around 4.5.
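One way such a comparison can be set up is sketched below. It is an assumption about the method, not the script used for the NASDAQ fits: each model is fitted by maximum likelihood and the maximized log-likelihoods are compared. The per-observation ratio computed at the end is just one possible summary and is not necessarily how the figure of 4.5 was defined. All function names are hypothetical and the data are synthetic.

```python
import math
import random
import statistics

def gaussian_loglik(xs):
    """Maximized Gaussian log-likelihood (mean and variance fitted by MLE)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def laplace_loglik(xs):
    """Maximized double exponential log-likelihood (median and scale by MLE)."""
    n = len(xs)
    mu = statistics.median(xs)
    b = sum(abs(x - mu) for x in xs) / n
    return -n * (math.log(2.0 * b) + 1.0)

def per_sample_ratio(xs):
    """Geometric-mean likelihood ratio per observation, Laplace over Gaussian."""
    return math.exp((laplace_loglik(xs) - gaussian_loglik(xs)) / len(xs))

# Synthetic changes (assumptions, not real data): one long-tailed sample
# and one Gaussian sample for contrast.
random.seed(2)
laplace_sample = [
    random.expovariate(100.0) - random.expovariate(100.0) for _ in range(5000)
]
gauss_sample = [random.gauss(0.0, 0.01) for _ in range(5000)]

print("long-tailed data, ratio:", per_sample_ratio(laplace_sample))
print("Gaussian data, ratio:", per_sample_ratio(gauss_sample))
```

A ratio above 1 favors the double exponential model and a ratio below 1 favors the Gaussian. Because both models have two fitted parameters, comparing the maximized likelihoods directly is a reasonably fair contest.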

Next up, I discuss what this double exponential model means, and what we can do with it.