# Testing Time Series for Cointegration

Cointegration is an important concept when dealing with time series data. Here’s the corresponding definition on Wikipedia:

> Cointegration is a statistical property of time series variables. Two or more time series are cointegrated if they share a common stochastic drift.

In other (rather non-scientific) words: if two time series are both non-stationary and share a common trend (which can be explained by the existence of a common cause), then they are cointegrated.
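Stated a little more formally (a standard textbook formulation, not the Wikipedia wording): write $I(1)$ for a series that is non-stationary but becomes stationary after taking first differences, and $I(0)$ for a stationary series. Two $I(1)$ series are cointegrated if some linear combination of them is stationary:

$$
x_t^{(1)},\ x_t^{(2)} \sim I(1), \qquad s_t = x_t^{(1)} - \beta\, x_t^{(2)} \sim I(0) \quad \text{for some } \beta \neq 0.
$$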

Cointegration is not the same as correlation!

Correlation measures the co-movement between two time series. It answers the question: “How much do they move together?” It does not guarantee that the two series stay close to each other in the long run.

Cointegration means that the gap between two time series cannot grow without bound: the series may drift apart for a while, but sooner or later the gap closes again.

For this reason, two time series can be correlated but not cointegrated, cointegrated but not correlated, both, or neither.

Cointegration is a frequently encountered feature of economic and financial time series. A typical textbook example is a country’s consumption and income: the more people earn, the more they can consume (assuming stable prices). Both time series usually grow over time, so we would expect consumption and income to be cointegrated.

Or consider an investor who wants to build a portfolio. For cointegrated assets, a significant deviation between the two prices should close again before long. Gold and silver prices are a plausible example. If the two deviate significantly from each other, a (statistical) arbitrage opportunity may exist: one could buy the relatively cheaper metal, sell the relatively more expensive one, and wait for the gap to close again. There are some really interesting articles on this topic; see the references section.

What we need is a statistical test for cointegration. Several such tests exist, but probably the most common approach is to apply the Augmented Dickey-Fuller (ADF) test, which is strictly speaking a unit-root test, to the spread between the two series (the Engle-Granger two-step method). The ADF statistic is usually negative. The more negative it is, the stronger the evidence against the null hypothesis: “There is no cointegration between the compared time series.” While the ADF test is available in nearly all statistics software, unfortunately there is no simple Excel formula for it. (There is, however, an add-in provided by Kurt Annen.)
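For reference, the test regression behind the ADF test is commonly written as shown below. This is a standard formulation. Whether the constant $\alpha$ and the trend term $\delta t$ are included depends on the software and its options. The null hypothesis of a unit root (non-stationarity) corresponds to $\gamma = 0$, and the test statistic is the t-ratio of $\hat{\gamma}$:

$$
\Delta y_t = \alpha + \delta t + \gamma\, y_{t-1} + \sum_{i=1}^{k} \phi_i\, \Delta y_{t-i} + e_t,
\qquad
\tau = \frac{\hat{\gamma}}{\operatorname{se}(\hat{\gamma})}
$$

Under the null, $\tau$ does not follow the usual t-distribution, which is why Dickey-Fuller tables are needed for critical values and p-values. With $k = 0$ lagged differences, the test reduces to the plain Dickey-Fuller test.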

For the statistics software R, Paul Teetor has written a great introductory article, available at http://quanttrader.info/public/testForCoint.html. Besides explaining how to run an ADF test, it walks through importing your data into R from a CSV file and preparing it for analysis.

Testing two time series for cointegration in this way involves three steps.

First, we calculate a measure of the “co-movement” of the two series. For this purpose, we use a simple linear regression between the two time series. In principle it does not matter which one is chosen as the “dependent” and which as the “independent” series, because we do not claim that a “dependency relation” exists between the two. In finite samples, however, the test result can differ depending on this choice. Note that we are not interested in the intercept, only in the beta (β), the regression coefficient. Beta tells us how strongly a change in one time series is accompanied by a corresponding change in the other. We therefore solve the following regression equation:

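$$
x_t^{(1)} = \beta\, x_t^{(2)} + \varepsilon_t
$$

where $x_t^{(1)}$ and $x_t^{(2)}$ are the values of time series 1 and 2 at time $t$, and $\varepsilon_t$ is the error term.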

In R, we can use the `lm` function to estimate β. In Excel 2013, we can run a regression analysis under Data -> Data Analysis -> Regression (this requires the Analysis ToolPak add-in).
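The estimate of β in this first step can be sketched in plain Python (standard library only). This sketch assumes a regression without an intercept, as in R's `lm(x1 ~ x2 + 0)`. If you fit with an intercept and simply ignore it, β will generally come out somewhat different. The function name `hedge_ratio` and the price lists are hypothetical.

```python
from typing import Sequence


def hedge_ratio(x1: Sequence[float], x2: Sequence[float]) -> float:
    """OLS estimate of beta in x1_t = beta * x2_t + e_t (no intercept).

    Closed form for a regression through the origin:
    beta = sum(x1_t * x2_t) / sum(x2_t ** 2)
    """
    if len(x1) != len(x2) or not x1:
        raise ValueError("series must be non-empty and of equal length")
    s_xy = sum(a * b for a, b in zip(x1, x2))
    s_xx = sum(b * b for b in x2)
    if s_xx == 0:
        raise ValueError("x2 must not be all zeros")
    return s_xy / s_xx


# Hypothetical prices, for illustration only.
gold = [2.0, 4.1, 5.9, 8.0]
silver = [1.0, 2.0, 3.0, 4.0]
beta = hedge_ratio(gold, silver)
print(f"beta = {beta:.4f}")
```

Swapping the arguments (regressing silver on gold) gives a different β. This is the asymmetry mentioned above.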

Second, we calculate a new time series of “spreads” (differences) between the values of the two original time series:

$$
s_t = x_t^{(1)} - \beta\, x_t^{(2)}
$$

where $s_t$ is the spread at time $t$.
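The second step is essentially a one-liner. Below is a minimal Python sketch. The name `spread_series`, the prices, and the value β = 2.0 are hypothetical placeholders. In practice, β comes from the regression in the first step.

```python
from typing import List, Sequence


def spread_series(x1: Sequence[float], x2: Sequence[float], beta: float) -> List[float]:
    """Return the spread s_t = x1_t - beta * x2_t for every time step t."""
    if len(x1) != len(x2):
        raise ValueError("series must have equal length")
    return [a - beta * b for a, b in zip(x1, x2)]


# Hypothetical prices and a hypothetical beta, for illustration only.
gold = [2.0, 4.1, 5.9, 8.0]
silver = [1.0, 2.0, 3.0, 4.0]
spreads = spread_series(gold, silver, beta=2.0)
print(spreads)
```

For a cointegrated pair, the spreads should fluctuate around a constant level rather than trend away from it.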

Third, we apply the ADF test to the new spread series. Our null hypothesis is: “The spread time series is non-stationary.” If we can reject this null hypothesis at, say, the 5% significance level, we conclude in favor of the alternative hypothesis: “The spread time series is stationary.” In R, one option is the `adf.test` function from the `tseries` package. Setting `k=0` uses no lagged differences, which reduces the ADF test to the plain Dickey-Fuller test:

`adf.test(spreads, alternative="stationary", k=0)`
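To show what the statistic measures, here is a minimal Python sketch of the Dickey-Fuller t-statistic with no lagged differences (k = 0). It assumes a test regression with a constant and a linear trend. To our understanding, this is the regression `tseries::adf.test` uses; if so, the statistic should correspond to its output for `k=0`. The sketch does not compute p-values, which require Dickey-Fuller tables. The function names and the simulated noise are hypothetical.

```python
import math
import random
from typing import List, Sequence


def _partial_out_trend(v: Sequence[float], t: Sequence[float]) -> List[float]:
    """Residuals of v after regressing it on a constant and the trend t."""
    n = len(v)
    v_bar = sum(v) / n
    t_bar = sum(t) / n
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    b = sum((ti - t_bar) * (vi - v_bar) for ti, vi in zip(t, v)) / s_tt
    return [vi - v_bar - b * (ti - t_bar) for ti, vi in zip(t, v)]


def dickey_fuller_stat(s: Sequence[float]) -> float:
    """Dickey-Fuller t-statistic (k = 0) of gamma in the regression
    diff(s)_t = alpha + delta * t + gamma * s_{t-1} + e_t."""
    if len(s) < 5:
        raise ValueError("need at least 5 observations")
    dy = [s[i] - s[i - 1] for i in range(1, len(s))]  # first differences
    lag = [s[i - 1] for i in range(1, len(s))]         # lagged levels s_{t-1}
    trend = [float(i) for i in range(1, len(dy) + 1)]  # 1, 2, ..., n
    # Frisch-Waugh-Lovell: remove constant and trend from both sides,
    # then gamma is a simple regression slope through the origin.
    y_r = _partial_out_trend(dy, trend)
    x_r = _partial_out_trend(lag, trend)
    s_xx = sum(x * x for x in x_r)
    gamma = sum(x * y for x, y in zip(x_r, y_r)) / s_xx
    rss = sum((y - gamma * x) ** 2 for x, y in zip(x_r, y_r))
    dof = len(dy) - 3  # regressors: constant, trend, s_{t-1}
    se_gamma = math.sqrt(rss / dof / s_xx)
    return gamma / se_gamma


# Hypothetical data: i.i.d. noise is stationary, so the statistic
# should be strongly negative.
rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(250)]
print(f"DF statistic: {dickey_fuller_stat(noise):.2f}")
```

Partialling out the constant and trend first keeps the sketch free of matrix algebra. It yields the same coefficient and standard error as fitting all three regressors at once.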

`adf.test` returns the Dickey-Fuller statistic and, thankfully, also a p-value, which is easier to interpret. If the p-value is below 0.05 (the 5% significance threshold), the spread is likely to be mean-reverting, which means that the two time series are likely to be cointegrated. Otherwise, we cannot conclude that the spread is mean-reverting, and we have no evidence that the two time series are cointegrated.
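When no p-value is at hand (for example, when the statistic is computed by hand), the same decision can be made by comparing the statistic with a critical value. The sketch below assumes a test regression with a constant and a linear trend. It uses the commonly tabulated asymptotic Dickey-Fuller critical values for that case (-3.96 at 1%, -3.41 at 5%); finite-sample values are somewhat more negative. One caveat: because β is estimated before the test, the stricter Engle-Granger critical values strictly apply. These are more negative than the plain Dickey-Fuller ones, so this helper leans towards finding cointegration. The names `DF_TREND_CRITICAL` and `rejects_unit_root` are hypothetical.

```python
# Asymptotic Dickey-Fuller critical values for a test regression with a
# constant and a linear trend (assumption: large sample; finite-sample
# values are somewhat more negative).
DF_TREND_CRITICAL = {0.01: -3.96, 0.05: -3.41}


def rejects_unit_root(stat: float, level: float = 0.05) -> bool:
    """True if the DF statistic lies below the critical value, i.e. the null
    "the spread is non-stationary" is rejected at the given level."""
    if level not in DF_TREND_CRITICAL:
        raise ValueError(f"no critical value stored for level {level}")
    return stat < DF_TREND_CRITICAL[level]


print(rejects_unit_root(-4.2))        # below both thresholds
print(rejects_unit_root(-3.6, 0.01))  # rejected at 5% but not at 1%
```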

## References

This is a very good description of what cointegration between time series is:

Gekkoquant.com has some nice articles on ADF tests, cointegration, statistical arbitrage etc.:

Ernest Chan’s blog is another excellent source with many good articles on the topic. Search for “Cointegration” on the blog to find many more articles like these two:

Article on Pairs Trading at Godotfinance.com: