S&P 500 Sectors – Historical Holdings Data

“Diversification is protection against ignorance. It makes little sense if you know what you are doing.”
– Warren Buffett

Well, when it comes to selecting individual companies on the basis of value, I certainly don’t know what I am doing, and you know what? I don’t care to learn.

That is the #1 drawcard of ETFs: they provide diversification that protects me from my ignorance. Furthermore, because an ETF tracks the average of its stocks, the noise found in the data of each individual holding is largely canceled out, leaving numbers that are easier to decipher through technical analysis.
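To illustrate the noise-canceling point, here is a minimal sketch using made-up data. It assumes each stock's daily return is pure, independent random noise (the 2% daily volatility, the 50 stocks and the 500 days are arbitrary choices, not figures from our database). Real stocks are correlated with one another, so the reduction seen in practice will be smaller than in this toy case.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

N_STOCKS, N_DAYS = 50, 500  # arbitrary sizes for illustration

# One list of simulated daily returns per stock: independent noise, 2% daily vol.
returns = [[random.gauss(0.0, 0.02) for _ in range(N_DAYS)]
           for _ in range(N_STOCKS)]

# Equal-weighted basket return for each day (average across all stocks).
basket = [sum(day) / N_STOCKS for day in zip(*returns)]

# Compare the typical single-stock volatility with the basket's volatility.
typical_stock_vol = statistics.mean(statistics.pstdev(r) for r in returns)
basket_vol = statistics.pstdev(basket)

print(f"typical stock vol: {typical_stock_vol:.4f}")
print(f"basket vol:        {basket_vol:.4f}")
```

With independent noise the basket's volatility shrinks by roughly the square root of the number of holdings, which is why the averaged series is so much smoother than any single holding.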

BUT, the data from an ETF is NOT the data from the underlying assets. Yes, an ETF’s price changes reflect the net asset value (NAV) of its holdings, but nothing more. Quality breadth data is difficult to come by, and historical breadth data going back more than 5-10 years is almost non-existent. Access to such data is only a dream for most trading system engineers.
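As a rough example of what breadth data means, the sketch below computes one simple breadth measure: the share of index members that closed higher than on the previous day. The tickers and prices are invented, and the function name `advancers_share` is just a label chosen for this illustration. The point is that the calculation needs a price for every constituent, which is exactly what an ETF's own price series cannot give you.

```python
def advancers_share(prev_close, close, members):
    """Fraction of index members that closed above their previous close.

    Members missing a price on either day are skipped.
    Returns None if no member has both prices.
    """
    counted = [t for t in members if t in prev_close and t in close]
    if not counted:
        return None
    advancers = sum(1 for t in counted if close[t] > prev_close[t])
    return advancers / len(counted)


# Invented prices for a four-stock "index" (hypothetical tickers).
prev_close = {"AAA": 10.0, "BBB": 20.0, "CCC": 30.0, "DDD": 40.0}
close = {"AAA": 10.5, "BBB": 19.0, "CCC": 30.6, "DDD": 41.0}
members = ["AAA", "BBB", "CCC", "DDD"]

breadth = advancers_share(prev_close, close, members)
print(breadth)
```

Here three of the four members advanced. Running the same calculation historically requires knowing which stocks were members on each date, which is the hard part.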

We contacted ‘S&P Dow Jones’ looking for such information and discovered that historical constituent data for the S&P 500 would cost $1,800 USD a year… 20 years would cost $36,000, and to include each of the 9 S&P sectors they would do us a deal: just $120,000… We do have a budget for data, but…

As luck would have it, I managed to make friends with Mr XXXX from State Street, who was kind enough to give me monthly S&P sector constituent data back to 2001. But a lot has changed over the last 12 years. Many of the S&P 500 holdings have been de-listed, renamed, given new ticker codes, merged, acquired or broken up. Hunting down the last trading name, ticker code and clean data for these stocks is not a task for the faint of heart (or short of patience).

I could write a book about the difficulty of this task but instead will give you one example:

The old ‘General Motors’ (GM) stock was de-listed in March 2011 following bankruptcy. What remained of the old GM at that time was trading under the name ‘Motors Liquidation Company’ (MTLQQ). You will not find this name or ticker code in any historical holdings data for the S&P 500 or the S&P Consumer Discretionary Index because GM was removed from these indices in June 2009, before the name change. However, in November 2010 the new ‘General Motors’ was re-listed under the same name and symbol, and in June 2013 it returned to the S&P 500. Very confusing! Hundreds of similar yet different scenarios have faced the constituents of the S&P 500 over the last 23 years, so you can imagine how difficult it was to reconcile this database.
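The GM case shows why a ticker code alone cannot identify a stock in a historical database. Below is a minimal sketch of one way to handle it: give each security its own permanent id and store the dates it was an index member. This is not the layout of our actual database; the ids `GM_OLD` and `GM_NEW` are hypothetical, the 1990 entry date is only a placeholder for the start of the data, and the exit and re-entry dates are simply set to the first day of the months mentioned above.

```python
from datetime import date

# (security_id, ticker, member_from, member_until); None means still a member.
# Dates are placeholders at month granularity, not exact index change dates.
MEMBERSHIP = [
    ("GM_OLD", "GM", date(1990, 1, 1), date(2009, 6, 1)),
    ("GM_NEW", "GM", date(2013, 6, 1), None),
]


def members_on(day, records=MEMBERSHIP):
    """Return the security ids that were index members on the given day."""
    return [
        security_id
        for security_id, _ticker, start, end in records
        if start <= day and (end is None or day < end)
    ]


print(members_on(date(2005, 1, 3)))  # old GM is a member
print(members_on(date(2011, 1, 3)))  # neither GM is in the index
print(members_on(date(2014, 1, 2)))  # new GM is a member
```

Both records share the ticker GM, yet a lookup by date never confuses them, and the gap between June 2009 and June 2013 correctly returns no GM at all.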

Anyway, with that hard work done, we received some help from Frank Hassler over at Engineering Returns, who provided us with fairly clean S&P 500 holdings data back to 1990. Then the hard work began again: after multiple cross-checks, it was a matter of researching several hundred stocks individually (many of which had been de-listed for over 15 years) and classifying them into the corresponding sectors. Several sources were used for this process, including:

http://www.moodys.com
http://en.wikipedia.org
http://www.bloomberg.com
http://www.fundinguniverse.com/company-histories/
http://www.nytimes.com
http://www.nndb.com

We logged about 270 hours on the project and now have a very exciting, quality database to work with (proof the data is good). Realistically, most people wouldn’t know how to use this database even if they wanted to, but I am happy to provide you with a copy at no cost on request. All I ask is three things, or your request will be ignored: 1) let me know what ideas you want to test; 2) I must agree that these ideas are worth testing; 3) I kindly ask that you share your findings 🙂

Over the coming months we will be publishing a variety of tests using this data including:

  • Correlation, Beta and Volume – Does the tail now wag the dog?  Has there been an increase in the correlation of stocks since the proliferation of ETFs?
  • Momentum – Emulating the results seen in published papers on momentum and looking for new findings.
  • Volume – How can an index’s internal volume best be utilised in a trading system?
  • Breadth Data – What is effective?
  • Identifying The Best – A rising tide lifts all boats but how can one identify the best/worst performers within an asset group?

What kinds of tests would you like to see us perform?  Please leave your suggestions below: