Personal code snippets of @tmasjc

Site powered by Hugo + Blogdown

Image by Mads Schmidt Rasmussen from unsplash.com

Minimal Bootstrap Theme by Zachary Betz

Spurious Correlaton: A Simple Demonstration

Jun 27, 2020

Very often, some business analyst tries to put some sequences on a line graph and determine if there is some correlation. This can be dangerously misleading. Let’s demonstrate this, we flip \(n\) coins, with head gives 1 point and tail minuses 1 point. Logically, there should not (and do not) be any correlation between them. However, merely looking at the graph may tell otherwise.

library(tidyverse)
library(ggthemes)
set.seed(1234)

coins = 4L
flips = 1e3

calc_score <- function(vec) { 
    map_dbl(1:length(vec), 
            ~ sum(vec[1:.x] == 1) - sum(vec[1:.x] == 0))
}

f   <- expression(rbinom(flips, 1, 0.5))

res <- rerun(coins, calc_score(eval(f))) %>% 
    set_names(paste("coin", 1:coins)) %>% 
    bind_rows() %>% 
    mutate(flip = row_number())

df <- res %>%
    pivot_longer(cols = -flip,
                 names_to = "coin",
                 values_to = "score")

df %>% 
    ggplot(aes(flip, score, col = coin)) +
    geom_line(aes(group = coin), size = 1.1) +
    coord_cartesian(ylim = c(-60, 60)) +
    scale_color_wsj() +
    theme_minimal(base_family = "Menlo") +
    labs(x = "# Flips", y = "Score", col = "Coin", 
         title = "Tossing 4 Coins Randomly",
         subtitle = "an example of spurious correlation")