Odd Correlations
December 7th, 2020

A few years ago I discovered a fun website, Spurious Correlations. When two variables are highly correlated (negatively or positively) but there is no sensible reason for them to be related, we call the correlation spurious.
I started working on this post a few months ago, and quite frankly, collecting data and finding patterns was frustrating. There are a lot more data sources that I'd like to explore, so I'm planning on collecting more data over time and expanding this post. For now, I'll share some exhibits that I found interesting.
For the purpose of this post, I compiled a dataset with over 1,600 columns from a variety of sources, including the World Bank, the Federal Reserve Bank of St. Louis, EIA, FBI, Wikipedia, and a few others (the full list is below).
Here are a few interesting spurious relationships I found (in case you don't feel like scrolling down). Dairy consumption in the US seems to correlate with a lot of not-so-good things, including increased CO2 emissions in certain countries. While I can buy this trend (though not the magnitude of the correlation coefficient), dairy consumption is also correlated with different types of crime in the US, with milk, cottage cheese, and regular ice cream being the worst offenders. Butter and cheese consumption, on the other hand, is negatively correlated with various types of crime in the US. Should Americans eat more butter and cheese to keep crime at bay? We would need a different kind of analysis to study any potential causal relationship.
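
For anyone who wants to try a similar scan, here is a minimal sketch of how it could work. It is not the code from my notebook: it assumes every series is a plain list of yearly values already aligned on the same years, it uses only the Python standard library, and the names `pearson` and `strongly_correlated_pairs`, the 0.9 threshold, and the toy numbers are all made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant series has no defined correlation
    return cov / sqrt(var_x * var_y)


def strongly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold, strongest first.

    `columns` maps a series name to its list of values (hypothetical layout).
    """
    hits = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        # A nan result fails this comparison, so constant series are skipped.
        if abs(r) >= threshold:
            hits.append((name_a, name_b, r))
    return sorted(hits, key=lambda hit: abs(hit[2]), reverse=True)


# Made-up toy numbers, NOT real data from any source used in this post.
toy = {
    "series_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "series_b": [10.0, 8.1, 6.2, 3.9, 2.0],  # falls as series_a rises
    "series_c": [3.0, 1.0, 4.0, 1.0, 5.0],   # no clear relationship
}

for name_a, name_b, r in strongly_correlated_pairs(toy):
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

With 1,600 columns, a scan like this compares roughly 1.3 million pairs, so some very high coefficients are bound to show up by chance alone, which is exactly why none of the exhibits below should be read as evidence of a real relationship.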

You can find my Jupyter notebook here.
# of Songs by the year in which they were written, first performed, published, recorded, or released vs CO2 Emissions (metric tons per capita) in Dominica

China/US Foreign Exchange Rate vs Renewable Electricity Output (as % of Total Electricity Output) in Arab World

Mexico/US Foreign Exchange Rate vs Air Transport, Registered Carrier Departures in Vietnam

Consumer Price Index (All Items) in Russia vs # of ATMs per 100,000 adults in Pakistan

Government Gross Debt in Russia (% of GDP, Not Seasonally Adjusted) vs Rural Population (% of Total Population) in South Korea

Banana Price vs # of ATMs per 100,000 adults in Lebanon

US Milk Consumption vs Rural Population (as % of Total Population) in North Korea

Cottage Cheese Consumption in the US vs US Forgery Crime

US Yogurt Consumption vs CO2 in China

# of Unmanned Space Launches vs # of ATMs per 100,000 adults in Afghanistan

US Motor Vehicle Theft vs China/US Foreign Exchange Rate

US Gambling Crimes vs # of ATMs per 100,000 adults in Denmark

Here you can explore other series (sorry, no support for Internet Explorer). You can view the JS code that powers this chart here.
Charts may not look exactly like the ones above because of different default y-axis ranges in Matplotlib (Python) and Google Charts (JavaScript).
Here is a full list of data sources I used:
- Number of Songs by Year
- The World Bank
- US Annual Unemployment, GDP, and Inflation Data
- Space Launches
- Historical Oil Prices
- EIA data
- USDA Dairy Consumption Data
- US Crime Data
- Federal Reserve Bank of St. Louis (Exchange Rates, Commodity Prices, Russian Time Series)
- Historical Crude Oil Prices
- Uranium Marketing Annual Report
- US Annual Coal Report
- Temperature Information