# Data Science Statistics Correlation vs Causality

## Correlation Does Not Imply Causality

Correlation measures the numerical relationship between two variables.

A high correlation coefficient (close to 1) does not, on its own, let us conclude that there is an actual relationship between the two variables.

A classic example:

• During the summer, ice cream sales at a beach increase
• At the same time, drowning accidents also increase

Does this mean that the increase in ice cream sales is a direct cause of the increase in drowning accidents?

## The Beach Example in Python

Here, we constructed a fictional data set for you to try:
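The listing below is a minimal sketch rather than the original data set: the values in `ice_cream_sale` and `drowning_accident` are made up for illustration, and `pearson_r` is a hypothetical helper that uses only the standard library.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Fictional values, invented for illustration only (one entry per period).
ice_cream_sale = [20, 40, 60, 80, 100, 120, 140, 160, 180, 200]
drowning_accident = [2, 3, 5, 6, 8, 9, 11, 12, 14, 15]

r = pearson_r(ice_cream_sale, drowning_accident)
print(f"Correlation coefficient: {r:.3f}")
```

The coefficient comes out close to 1, even though nothing in the data says that one variable causes the other. If pandas is available, `DataFrame.corr()` computes the same Pearson coefficient by default.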


## Correlation vs Causality – The Beach Example

In other words: can we use ice cream sales to predict drowning accidents?

The answer is: probably not.

It is likely that these two variables are correlated by coincidence, not because one causes the other.
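One way two variables can move together without either causing the other is a shared driver. The simulation below is a minimal sketch under a stated assumption: a made-up `temperature` series drives both ice cream sales and drowning accidents, and ice cream sales never enter the formula for drownings. All numbers, and the helper `pearson_r`, are hypothetical.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


random.seed(0)  # fixed seed so repeated runs give the same numbers

# Assumed model: temperature drives both series independently.
temperature = [random.uniform(15, 35) for _ in range(2000)]
ice_cream_sale = [10 * t + random.gauss(0, 20) for t in temperature]
drowning_accident = [0.5 * t + random.gauss(0, 2) for t in temperature]

r_all_days = pearson_r(ice_cream_sale, drowning_accident)

# Keep only days with nearly the same temperature (24 to 26 degrees).
similar_days = [
    (ice, drown)
    for t, ice, drown in zip(temperature, ice_cream_sale, drowning_accident)
    if 24 <= t < 26
]
ice_similar = [pair[0] for pair in similar_days]
drown_similar = [pair[1] for pair in similar_days]
r_similar_days = pearson_r(ice_similar, drown_similar)

print(f"All days:               {r_all_days:.2f}")
print(f"Similar-temperature days: {r_similar_days:.2f}")
```

Under these assumptions the coefficient over all days is clearly positive, while it shrinks toward zero once temperature is held roughly constant. That pattern is what a shared cause would produce, as opposed to ice cream sales causing drownings.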

What causes drowning then?

• Unskilled swimmers
• Waves
• Cramp
• Seizure disorders
• Lack of supervision
• Alcohol (mis)use
• etc.

Let us reverse the argument:

Does a low correlation coefficient (close to zero) mean that a change in x does not affect y?
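Not necessarily: the correlation coefficient only captures linear relationships. The sketch below assumes an illustrative relationship, y = x², which is not part of this tutorial's data. Here y is completely determined by x, yet the coefficient is zero because the rising and falling halves cancel out. `pearson_r` is a hypothetical helper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Illustrative data: x is symmetric around zero and y depends only on x.
x = list(range(-5, 6))
y = [value ** 2 for value in x]

r_nonlinear = pearson_r(x, y)
print(f"Correlation coefficient: {r_nonlinear:.2f}")
```

A coefficient near zero therefore rules out a linear relationship only. A scatter plot of the two variables is a quick way to check for other patterns.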

Back to the question:

• Can we conclude that Average_Pulse does not affect Calorie_Burnage because of a low correlation coefficient?