Edward McQuarrie shed light on this issue in his paper “The Myth of 1926: How Much Do We Know About Long-Term Returns On U.S. Stocks?”

In the paper, he discusses missing and fragmented data, along with the biases in the early datasets used to construct the historical record. For years, financial planners and stock market pundits told us that we could expect an average return of 12% from the stock market, based on “long-term” historical market returns. Then, we were told we could expect either an 8% or 10% return, again based on historical returns. The Center for Research in Security Prices (CRSP), which is considered the authoritative source on historical stock market prices, seems to back up these figures. However, what McQuarrie discovered is that its database of stock market prices is incomplete and highly biased.

The published record on stocks up to about 1950 includes only about 10%-20% of all tradable stocks. So, when you hear, “the historical returns of the stock market are…”, what’s often left unsaid is how much data is missing from the database. The data that is published, and subsequently used to create financial plans for the overwhelming majority of individuals, is but a small slice of the total stocks people have traded and invested in over the last 80 or so years.

And 1926, which is usually used as the “start date” of the stock market, is arbitrary. Stocks have been trading for far, far longer than that.

In the early days of the stock market, most publicly traded stocks were banks and insurance companies. Both are conservative financial institutions. Around 1925, they were not included in any dataset. So, if you happened to invest heavily in bank stocks, you did not realize the return suggested by the historical data of “the stock market.” This practice of omitting certain industries and sectors continued up to about 1960. And, from 1926 up to about 1973, the CRSP database excluded more stocks than it included. The CRSP database also excludes the stock market corrections of 1818, 1837, 1857, 1873, 1893, and 1907, and includes only the 5 major crashes of 1929-32, 1937-38, 1973-74, 2000-2002, and, of course, 2008.

What may surprise readers is obvious to those close to the CRSP dataset: the database was never meant to be predictive, nor was it designed to be the definitive record of stock market prices that everyone thinks it is.

In fact, if you’re looking for the most comprehensive modern dataset on stock prices, don’t use anything before 1972.

And just in case you’re wondering, the historical return on stocks, using a more complete dataset from 1973 through today, is 6.35% (before investment fees and taxes).
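
To get a feel for how much the assumed return matters, here is a rough sketch in Python that compounds a lump sum at the rates mentioned in this article (12%, 10%, 8%, and 6.35%). The $10,000 starting amount, the 30-year horizon, and the function name are assumptions chosen purely for illustration. The sketch ignores fees, taxes, inflation, and year-to-year volatility, so treat it as simple arithmetic and not as a forecast.

```python
def future_value(principal, annual_return, years):
    """Grow a lump sum at a constant annual return, compounded once a year."""
    return principal * (1 + annual_return) ** years


# Illustrative assumptions (not from the paper): $10,000 invested for 30 years.
PRINCIPAL = 10_000
YEARS = 30

# Rates quoted in the article: the older 12%, 10% and 8% rules of thumb,
# and the 6.35% figure for 1973 onward.
RATES = [0.12, 0.10, 0.08, 0.0635]

for rate in RATES:
    value = future_value(PRINCIPAL, rate, YEARS)
    print(f"{rate:.2%} per year for {YEARS} years: ${value:,.0f}")
```

Under these assumptions, the 12% plan ends up with several times the balance of the 6.35% plan, which is why the choice of “historical return” in a financial plan matters so much.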

What does the future hold?

Who knows?

But we do know more and more about the past. And that past is teaching us that we know less than we think.

