# Testing Time Series for Cointegration

Cointegration is an important concept when dealing with time series data. Here's the corresponding definition on Wikipedia:

> Cointegration is a statistical property of time series variables. Two or more time series are cointegrated if they share a common stochastic drift.

In other (rather non-scientific) words, if both time series are non-stationary *and* they share a common trend (which can be explained through the existence of a common cause), then they are *cointegrated*.

Cointegration is not the same as correlation!

It is therefore entirely possible for two time series to be correlated but not cointegrated, cointegrated but not correlated, both, or neither.
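The "correlated but not cointegrated" case can be illustrated with a small simulation. The following is a minimal Python sketch using made-up data: the series `a` and `b`, the drift of 0.1 per step, and the helper `pearson` are all assumptions chosen for illustration. Both series receive identical shocks, so their period-to-period changes are perfectly correlated, yet the gap between them grows steadily and never reverts.

```python
import random

def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((p - mean_u) * (q - mean_v) for p, q in zip(u, v))
    var_u = sum((p - mean_u) ** 2 for p in u)
    var_v = sum((q - mean_v) ** 2 for q in v)
    return cov / (var_u * var_v) ** 0.5

rng = random.Random(42)
shocks = [rng.gauss(0.0, 1.0) for _ in range(500)]

# Both series get the same shock each step, but b also drifts up by 0.1.
a, b = [100.0], [100.0]
for e in shocks:
    a.append(a[-1] + e)
    b.append(b[-1] + e + 0.1)

changes_a = [a[i] - a[i - 1] for i in range(1, len(a))]
changes_b = [b[i] - b[i - 1] for i in range(1, len(b))]
gap = [y - x for x, y in zip(a, b)]

print(round(pearson(changes_a, changes_b), 6))  # changes move in lockstep
print(round(gap[-1], 6))                        # gap after 500 steps keeps widening
```

Because the gap widens by 0.1 every step, no amount of waiting brings the two series back together, even though their changes move in lockstep.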

Cointegration is a frequently encountered feature of economic and financial time series. A typical textbook example is a country's consumption and income. The more people earn, the more they have left to consume, assuming of course that prices are stable. Both time series usually grow over time. We would therefore expect consumption and income to be cointegrated time series.

Or consider an investor who wants to build a portfolio. For two cointegrated stocks, a significant deviation between their prices tends to close again. A likely example is gold and silver prices. If the two deviate significantly from each other, an arbitrage opportunity may exist. (Someone could buy the relatively cheaper metal and sell the relatively more expensive metal, waiting for the gap to close again.) There are some really interesting articles out there on this topic; see the references section.

What we need is a statistical test for cointegration. There are several such tests, but probably the most common approach applies the Augmented Dickey-Fuller (ADF) test to the spread between the two time series. The ADF test statistic is usually a negative value. The more negative this value is, the stronger the evidence against the null hypothesis ("There is no cointegration present in the compared time series."). While the ADF test is available in nearly all statistics software, unfortunately there is no simple Excel formula for it. (There is, however, an add-in provided by Kurt Annen.)

For the statistics software R, there is a great introductory article by Paul Teetor at http://quanttrader.info/public/testForCoint.html. Besides explaining how to run an ADF test, it also shows how to import your data into R from a CSV file and how to prepare it for analysis.

Testing two time series for cointegration involves three steps.

First, we calculate a measure of the "co-movement" of both series. For this purpose, we use a simple linear regression between the two time series. It does not really matter which one is selected as the "dependent" and which one as the "independent" series, because we do not claim that there is a "dependency relation" between the two. Be aware that we are not interested in the intercept, but only in the *beta (β)*, that is, the regression coefficient. This *beta* tells us how strongly a change in one time series is accompanied by a corresponding change in the other time series. Therefore, we want to estimate the following regression formula:

$$X_{\text{time series 1}} = \beta \cdot X_{\text{time series 2}}$$

In R, we can use the `lm` function to fit this regression; in Excel 2013, we can perform a regression analysis (under Data -> Data Analysis -> Regression).
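To make the first step concrete, here is a minimal Python sketch of a regression without an intercept, which is roughly what `lm(series_1 ~ series_2 + 0)` would estimate in R. The function name `beta_through_origin` and the five data points are made up for illustration; dropping the intercept entirely is an assumption based on the formula above, which has no intercept term.

```python
def beta_through_origin(y, x):
    """Least-squares slope of y = beta * x, fitted without an intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Made-up illustrative values, not real market data.
series_1 = [20.5, 21.8, 24.1, 26.3, 27.9]
series_2 = [10.0, 11.0, 12.0, 13.0, 14.0]

beta = beta_through_origin(series_1, series_2)
print(round(beta, 4))  # 2.0089
```

A beta of about 2 says that, in this toy data, series 1 moves roughly twice as much as series 2.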

Second, we can calculate a new time series of "spreads" or "differences" between values of the two original time series using the formula:

$$\text{spread}_{t} = x_{t,\,\text{time series 1}} - \beta \cdot x_{t,\,\text{time series 2}}$$
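As a small worked example of the spread formula, the following Python sketch applies it element by element. The two series and the value `beta = 2.0` are made-up, illustrative assumptions; in practice, beta would come from the regression in the first step.

```python
# Made-up illustrative values, not real market data.
series_1 = [20.5, 21.8, 24.1, 26.3, 27.9]
series_2 = [10.0, 11.0, 12.0, 13.0, 14.0]
beta = 2.0  # hypothetical regression coefficient from step one

# spread_t = x_{t, series 1} - beta * x_{t, series 2}
spreads = [x1 - beta * x2 for x1, x2 in zip(series_1, series_2)]

print([round(s, 2) for s in spreads])  # [0.5, -0.2, 0.1, 0.3, -0.1]
```

The resulting values hover around zero, which is the kind of behaviour the stationarity test in the next step looks for.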

Third, we apply the ADF test to the new time series of spreads. Our null hypothesis is: "The spread time series is non-stationary." If we can reject this null hypothesis at, let's say, the 5% significance level (95% confidence), then we can accept the alternative hypothesis: "The spread time series is indeed stationary." In R, there is for instance the `adf.test` function from the `tseries` package.

`adf.test(spreads, alternative="stationary", k=0)`

With `k=0`, the test includes no lagged difference terms, which makes it the plain Dickey-Fuller test. The function returns a *Dickey-Fuller* test statistic and, thankfully, also a *p-value*, which is easier to interpret. If the *p-value* is < 0.05 (the critical 5% threshold), then the spread is likely to be mean-reverting, which means that the two time series are likely to be cointegrated. Otherwise, we cannot reject the null hypothesis that the spread is non-stationary, so there is no evidence that the two time series are cointegrated.
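To show what the test statistic measures, here is a minimal Python sketch of a plain Dickey-Fuller statistic (no lagged differences, comparable in spirit to `k=0`). It is not a reimplementation of `adf.test`: this sketch assumes a regression with a constant only, whereas `adf.test` also includes a linear trend, and it compares against the asymptotic 5% critical value of -2.86 for the constant-only case instead of computing a p-value. The function name `dickey_fuller_stat` and both simulated series are hypothetical.

```python
import random

def dickey_fuller_stat(series):
    """t-statistic of gamma in: delta_s[t] = alpha + gamma * s[t-1] + error."""
    lagged = series[:-1]
    deltas = [series[i] - series[i - 1] for i in range(1, len(series))]
    n = len(deltas)
    mean_x = sum(lagged) / n
    mean_y = sum(deltas) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, deltas))
    gamma = sxy / sxx
    alpha = mean_y - gamma * mean_x
    ssr = sum((y - alpha - gamma * x) ** 2 for x, y in zip(lagged, deltas))
    std_err = (ssr / (n - 2) / sxx) ** 0.5
    return gamma / std_err

rng = random.Random(0)

# Hypothetical mean-reverting spread: s[t] = 0.5 * s[t-1] + noise
reverting = [0.0]
for _ in range(500):
    reverting.append(0.5 * reverting[-1] + rng.gauss(0.0, 1.0))

# Hypothetical random walk: s[t] = s[t-1] + noise (no mean reversion)
walk = [0.0]
for _ in range(500):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))

CRITICAL_5PCT = -2.86  # asymptotic value, regression with constant, no trend

for name, s in (("reverting", reverting), ("random walk", walk)):
    stat = dickey_fuller_stat(s)
    verdict = "reject" if stat < CRITICAL_5PCT else "cannot reject"
    print(name, round(stat, 2), verdict, "non-stationarity")
```

The mean-reverting series should produce a strongly negative statistic, while a random walk will usually stay above the critical value. For real work, the p-value reported by a tested implementation such as `adf.test` is the better guide.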

# References

This is a very good description of what cointegration between time series is:

Gekkoquant.com has some nice articles on ADF tests, cointegration, statistical arbitrage etc.:

- http://gekkoquant.com/2012/12/17/statistical-arbitrage-testing-for-cointegration-augmented-dicky-fuller/
- http://gekkoquant.com/2012/10/21/statistical-arbitrage-correlation-vs-cointegration/

Another excellent blog with various good articles on the topic is by Ernest Chan. Search for "Cointegration" on the blog to find many more articles like these two:

- http://epchan.blogspot.ch/2006/11/cointegration-is-not-same-as.html
- http://epchan.blogspot.ch/2013/11/cointegration-trading-with-log-prices.html

Article on Pairs Trading at Godotfinance.com: