Dunning-Kruger effect

A case of itself?

R
psychology
datavis
Published

May 5, 2020

Note: I originally wrote this in February 2019. Also, don’t hate me for including some links to Wikipedia; you were going to go there anyway. This isn’t a research paper.

Introduction

In my Organizational Psychology graduate class at West Chester University, one of our assigned readings for our week on emotions and moods was (Sheldon, Dunning, and Ames 2014). The article reported another finding related to the Dunning-Kruger effect in the workplace, this time on a task measuring emotional intelligence (EI). Around that time, I vaguely remember hearing, somewhere I will never recall, about some mathematical issues with this well-known psychological phenomenon, one mentioned in many introductory textbooks.

The Dunning-Kruger effect is founded on the idea that an individual who lacks expertise will be more confident in their abilities than is warranted, or will overestimate their performance on a task, while experts may underestimate their own performance or abilities, or estimate them more accurately. One thing we can derive from this is that those with lower skill will tend to overestimate their abilities while those who are more skilled will tend to underestimate theirs.

The following relates directly to the article by Sheldon, Dunning, and Ames (2014). They report a significant relationship between an individual’s actual performance and the difference between their perceived ability and actual performance in three conditions (\(r_1 = -0.83\), \(p_1 < .001\); \(r_2 = -0.87\), \(p_2 < .001\); \(r_3 = -0.84\), \(p_3 < .001\)).

They also used these two graphs to represent their findings:

Figure 1. Overestimation of emotional intelligence (left panel) and performance on the Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT; right panel) as a function of actual performance on the MSCEIT

We’ll work through why this can be misleading and how to replicate the Dunning-Kruger effect with random data. Yes, random data. Data that are random.

Set up

So let’s play with some data and see what we get. First, let’s set up our .Rmd file and choose a specific randomization seed so we can reproduce our results (Adams 1989):

Code
set.seed(42)

options(tidyverse.quiet = TRUE) ## silences tidyverse startup messages
library(tidyverse)
library(mark)                   ## percentile_rank() | github.com/jmbarbone/mark
#> 
#> Attaching package: 'mark'
#> The following object is masked from 'package:purrr':
#> 
#>     none
library(broom)                  ## tidying statistical outputs into tables
theme_set(theme_minimal())
Code
f_nbins <- function(x, n = 6) {
  ## assign each value to one of `n` equal-sized bins, labeling each
  ## bin by its upper bound as a fraction (e.g., 0.2, 0.4, ..., 1 for n = 5)
  dplyr::ntile(x, n) / n
}
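
A quick illustration of what this helper returns (a hypothetical check, not part of the original post):

Code
f_nbins(1:10, 5)
#> [1] 0.2 0.2 0.4 0.4 0.6 0.6 0.8 0.8 1.0 1.0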

Random data

We’ll start by creating a data frame with two vectors of independent, random data. These will be our randomly assigned percentile ranks of actual and estimated performance.

To clarify, the calculation of percentile rank is as follows:

\[\text{PR}_i = \frac{c_\ell + 0.5\,f_i}{N} \times 100\%\]

Where \(c_\ell\) is the count of scores lower than the score of interest, \(f_i\) is the frequency of the score of interest, and \(N\) is the total number of scores. With this formula, our percentile ranks will always satisfy \(0 < \text{PR}_i < 100\).
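
For the curious, here’s a minimal base-R sketch of that formula (my reading of what percentile_rank() from the mark package computes; note it returns a proportion in (0, 1) rather than a percentage, as you’ll see below):

Code
## sketch of the percentile rank formula, as a proportion in (0, 1)
pr <- function(x) {
  vapply(x, function(xi) {
    (sum(x < xi) + 0.5 * sum(x == xi)) / length(x)
  }, numeric(1))
}

pr(c(10, 20, 20, 30))
#> [1] 0.125 0.500 0.500 0.875

Now, on to the random data: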

Code
random_data <- 
  tibble(
    actual = rnorm(1000),
    estimate = rnorm(1000)
  ) %>% 
  mutate(across(everything(), percentile_rank))

random_data
#> # A tibble: 1,000 × 2
#>    actual estimate
#>     <dbl>    <dbl>
#>  1  0.920   0.988 
#>  2  0.292   0.704 
#>  3  0.656   0.846 
#>  4  0.738   0.646 
#>  5  0.670   0.150 
#>  6  0.462   0.274 
#>  7  0.946   0.576 
#>  8  0.464   0.0005
#>  9  0.980   0.198 
#> 10  0.480   0.798 
#> # … with 990 more rows

We also want to bin our data, just as in the article.

Code
bins <- 
  random_data %>% 
  mutate(
    difference = estimate - actual,
    bin = f_nbins(actual, 5)
  ) %>% 
  group_by(bin) %>% 
  summarise(
    n = n(),
    mean = mean(difference)
  )

bins
#> # A tibble: 5 × 3
#>     bin     n    mean
#>   <dbl> <int>   <dbl>
#> 1   0.2   200  0.398 
#> 2   0.4   200  0.208 
#> 3   0.6   200  0.0120
#> 4   0.8   200 -0.215 
#> 5   1     200 -0.402

Now let’s plot the data and take a look.

Code
ggplot(random_data, aes(x = actual, y = estimate - actual)) +
  geom_point(alpha = .2) +
  geom_smooth(formula = "y ~ x", method = lm, se = FALSE, col = "red") +
  geom_point(data = bins, aes(x = bin, y = mean), col = "blue", shape = 1, size = 5) +
  geom_line( data = bins, aes(x = bin, y = mean), col = "blue", size = 1) +
  geom_hline(yintercept = 0, linetype = 2) +
  labs(
    title = "Independent random samples of 'Actual' and 'Estimate' performance",
    x = "'Actual' performance (Percentile Rank)",
    y = "Percentile Overestimation\n(estimate - actual)"
  )

Already we’re seeing a trend very similar to that reported in the article. We also notice that the overestimation value is bounded as a function of the individual’s actual performance. An individual who performs at the 99th percentile has almost no room to overestimate their own performance (though they can be accurate), much as an individual in the lowest percentiles has little room to underestimate. This is made worse by the use of a score derived in reference to others.
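
To make these bounds concrete, here’s a small illustration (mine, not from the article) of the ceiling and floor on the difference score at a few actual percentile ranks:

Code
## illustrative only: since estimates live in (0, 1), the difference
## score (estimate - actual) is bounded by the actual score itself
tibble(actual = c(.01, .25, .50, .75, .99)) %>%
  mutate(
    max_over  = 1 - actual,  # largest possible overestimation
    max_under = -actual      # largest possible underestimation
  )
#> # A tibble: 5 × 3
#>   actual max_over max_under
#>    <dbl>    <dbl>     <dbl>
#> 1   0.01     0.99     -0.01
#> 2   0.25     0.75     -0.25
#> 3   0.5      0.5      -0.5
#> 4   0.75     0.25     -0.75
#> 5   0.99     0.01     -0.99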

Adjusting random data

So now we’re going to generate some data using rough estimates for the distributions. We’ll use the results from the study of interest. For simplicity, I’ll just use rough values for the n, means, and SDs reported in the first two studies.

We’ll shape our normal distributions around the values found in the paper. These values, to be clear, are percentile ranks: either the participants’ estimates of their EI or their actual EI performance relative to U.S. adults. As such, we won’t need to use percentile_rank() again.

Code
adj_random <- tibble(
  actual     = rnorm(161, 42.2, sd = 25.1) / 100,
  estimate   = rnorm(161, 77.5, sd = 13.1) / 100,
  difference = estimate - actual,  ## overestimation, as before
  bin        = f_nbins(actual, 5)
)

adj_bins <- 
  adj_random  %>% 
  group_by(bin) %>% 
  summarise(
    n = n(),
    mean = mean(difference)
  )

Let’s also take a look at the correlations we have. As expected, we have essentially no correlation with random data. The article reported correlations of .20 and .19 between estimated and actual performance. Clearly, people are not that great at estimating their own performance.

Code
cor.test(~ actual + estimate, data = adj_random)
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  actual and estimate
#> t = 1.3, df = 159, p-value = 0.1955
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  -0.05296245  0.25321085
#> sample estimates:
#>       cor 
#> 0.1025525

Well, it’s no surprise that our correlation is weaker and not statistically significant; we’re using random data, after all.

Now we’re going to run a correlation on the actual scores and the difference between the estimated and actual performance.

Code
cor_test_result <- cor.test(~ actual + difference, data = adj_random)
cor_test_result
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  actual and difference
#> t = -21.717, df = 159, p-value < 2.2e-16
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  -0.8991906 -0.8197724
#> sample estimates:
#>        cor 
#> -0.8647931

Now, look at that. We have found a strong, highly significant negative correlation, roughly similar to those reported in the article. And this is with data that have absolutely no relationship between the two variables, as we just established.

So why is this?
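
Part of the answer is simple algebra (an aside of mine, not from the original article). If actual (\(X\)) and estimate (\(Y\)) are independent, then \(\text{cov}(X, Y - X) = -\text{var}(X)\) while \(\text{var}(Y - X) = \text{var}(X) + \text{var}(Y)\), so the expected correlation is \(-\sigma_X / \sqrt{\sigma_X^2 + \sigma_Y^2}\). A strong negative correlation is baked in before we collect a single data point:

Code
## expected cor(actual, estimate - actual) for independent variables
sd_x <- 25.1  # SD used for 'actual'
sd_y <- 13.1  # SD used for 'estimate'
round(-sd_x / sqrt(sd_x^2 + sd_y^2), 3)
#> [1] -0.887

That’s close to the \(r = -0.865\) we just observed; the gap reflects sampling noise and the small positive sample correlation between the two columns.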

Plotting adjusted random data

Let’s graph out our results with a little more care this time.

Code
ggplot(adj_random, aes(x = actual, y = difference)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_point(alpha = .1) +
  geom_smooth(
    formula = "y ~ x",
    method = "lm",
    se = FALSE,
    col = "red"
  ) +
  geom_point(
    data = adj_bins,
    aes(x = bin, y = mean),
    col = "blue",
    shape = 1,
    size = 5
  ) +
  geom_line(
    data = adj_bins,
    aes(x = bin, y = mean),
    col = "blue",
    size = 1
  ) +
  labs(
    title = "Randomly generated differences in 'actual' vs 'estimated' performance",
    subtitle = "Estimate: M = 75, SD = 15; Actual: M = 5, SD = 25",
    x = "Actual performance",
    y = "Estimated - Actual performance"
  ) +
  annotate(
    geom = "text",
    label = glue::glue_data(cor_test_result, "r = {round(estimate, 3)}, p = {format(p.value)}"),
    x = .65,
    y = .50,
    hjust = "left"
  )

More random data

So what if we repeated this several times?

Code
random_helper <- function(x) {
  set.seed(42 + x)
  tibble(
    actual   = rnorm(161, 42.2, sd = 25.1) / 100,
    estimate = rnorm(161, 77.5, sd = 13.1) / 100,
  ) %>%
    mutate(across(everything(), percentile_rank))
}

sev_random <- 
  as.list(seq(100)) %>% 
  map(random_helper) %>% 
  bind_rows(.id = "id") %>% 
  mutate(id = as.numeric(id))

sev_bins <- sev_random %>% 
  group_by(id) %>% 
  mutate(
    difference = estimate - actual,
    bin = f_nbins(actual, 5) 
  ) %>% 
  group_by(id, bin) %>% 
  summarise(
    n = n(),
    mean_est = mean(estimate),
    mean_diff = mean(difference)
  )

ggplot(sev_bins, aes(x = bin, y = mean_est, col = factor(id))) +
  geom_point() +
  geom_line() +
  scale_color_discrete(guide = "none") +  ## suppress the 100-level legend
  scale_y_continuous(limits = c(0, 1)) + 
  labs(x = "Actual", y = "Estimate")

So what if we actually run a correlation on these numbers? We’ll create a nested helper function and use the broom package (loaded earlier) to tidy up our results.

Code
run_correlations <- function(x, item_x, item_y) {
  corr_helper <- function(x, item_x, item_y) {
    ## build a one-sided formula like `~ actual + estimate`
    formula <- as.formula(str_c("~", item_x, "+", item_y, sep = " "))
    cor.test(formula, data = x)
  }
  
  x %>% 
    nest(data = -id) %>% 
    mutate(
      corr = map(data, corr_helper, item_x, item_y),
      tidy = map(corr, tidy)
    ) %>% 
    unnest(tidy) %>% 
    select(where(negate(is.list)))
}

(x <- run_correlations(sev_random, "actual", "estimate") %>% arrange(p.value))
#> # A tibble: 100 × 9
#>       id estimate statistic p.value parameter conf.low conf.high method  alter…¹
#>    <dbl>    <dbl>     <dbl>   <dbl>     <int>    <dbl>     <dbl> <chr>   <chr>  
#>  1    35   -0.207     -2.67 0.00830       159 -0.351    -0.0545  Pearso… two.si…
#>  2    24    0.185      2.37 0.0188        159  0.0312    0.330   Pearso… two.si…
#>  3    99    0.170      2.17 0.0314        159  0.0154    0.316   Pearso… two.si…
#>  4    22   -0.164     -2.09 0.0379        159 -0.311    -0.00930 Pearso… two.si…
#>  5    90   -0.159     -2.03 0.0443        159 -0.306    -0.00417 Pearso… two.si…
#>  6     6    0.148      1.89 0.0610        159 -0.00685   0.296   Pearso… two.si…
#>  7     9    0.141      1.79 0.0747        159 -0.0141    0.289   Pearso… two.si…
#>  8    97    0.133      1.70 0.0914        159 -0.0216    0.282   Pearso… two.si…
#>  9    51   -0.127     -1.61 0.109         159 -0.276     0.0285  Pearso… two.si…
#> 10    89   -0.124     -1.57 0.118         159 -0.273     0.0316  Pearso… two.si…
#> # … with 90 more rows, and abbreviated variable name ¹​alternative
(y <- run_correlations(sev_bins, "bin", "mean_est") %>% arrange(p.value))
#> # A tibble: 100 × 9
#> # Groups:   id [100]
#>       id estimate statistic p.value parameter conf.low conf.high method  alter…¹
#>    <dbl>    <dbl>     <dbl>   <dbl>     <int>    <dbl>     <dbl> <chr>   <chr>  
#>  1    99    0.936      4.62  0.0191         3    0.312     0.996 Pearso… two.si…
#>  2    35   -0.936     -4.61  0.0192         3   -0.996    -0.309 Pearso… two.si…
#>  3    28   -0.907     -3.72  0.0337         3   -0.994    -0.122 Pearso… two.si…
#>  4    41    0.845      2.74  0.0715         3   -0.146     0.990 Pearso… two.si…
#>  5    68    0.836      2.64  0.0780         3   -0.177     0.989 Pearso… two.si…
#>  6     5   -0.804     -2.34  0.101          3   -0.987     0.269 Pearso… two.si…
#>  7     6    0.796      2.28  0.107          3   -0.290     0.986 Pearso… two.si…
#>  8    10    0.786      2.20  0.115          3   -0.314     0.985 Pearso… two.si…
#>  9    89   -0.761     -2.03  0.135          3   -0.983     0.369 Pearso… two.si…
#> 10    13   -0.758     -2.01  0.138          3   -0.983     0.376 Pearso… two.si…
#> # … with 90 more rows, and abbreviated variable name ¹​alternative

mean(x$p.value < .05)
#> [1] 0.05
mean(y$p.value < .05)
#> [1] 0.03

When we calculated these correlations, both on the raw scores and on the binned mean estimates, a few of our runs actually produced significant results. In fact, about 5% or fewer are statistically significant, which is just what we’d expect by chance at \(\alpha = .05\). Let’s pull those runs out and look at them again.
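
Before we do, a quick sanity check (my addition, not in the original analysis): an exact binomial test of whether the number of significant raw-score runs is consistent with a 5% false-positive rate. Unsurprisingly, it is (p = 1 for 5 out of 100).

Code
## 5 significant results out of 100 runs vs. the nominal 5% rate
binom.test(sum(x$p.value < .05), n = 100, p = .05)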

Code
significant_ids <- x %>% filter(p.value < .05) %>% pull(id) %>% as.character()
temp <- sev_bins %>% filter(id %in% significant_ids)

sev_random %>% 
  filter(id %in% significant_ids) %>% 
  ggplot(aes(x = actual, y = estimate, group = factor(id), color = factor(id))) +
  geom_point(alpha = .2) +
  geom_point(data = temp, aes(x = bin, y = mean_est)) +
  geom_line(data = temp, aes(x = bin, y = mean_est))

So there you have it. A successful replication of this ‘effect’ with random data.

But why is this? It’s partly because individuals in the lowest quantiles have a greater likelihood of overestimating their performance, while those in the highest quantiles are more likely to underestimate. An individual who performs at the 99th percentile has almost no choice but to estimate their performance as below reality (see also Nuhfer et al. 2016). This seems to be further worsened by the bounded nature of the scores. Were these scores and estimates something not bounded in this way (for instance, the speed with which an individual completes an assessment), examining the relationship between actual and estimated performance could yield more valid results. These graphical representations and analyses should be treated with caution, as they are not very meaningful for understanding the effect.

References

Adams, Douglas. 1989. The Hitchhiker’s Guide to the Galaxy. New York: Harmony Books.
Nuhfer, Edward, Christopher Cogan, Steven Fleisher, Eric Gaze, and Karl Wirth. 2016. “Random Number Simulations Reveal How Random Noise Affects the Measurements and Graphical Portrayals of Self-Assessed Competency.” Numeracy: Advancing Education in Quantitative Literacy 9 (1). https://doi.org/10.5038/1936-4660.9.1.4.
Sheldon, Oliver J, David Dunning, and Daniel R Ames. 2014. “Emotionally Unskilled, Unaware, and Uninterested in Learning More: Reactions to Feedback about Deficits in Emotional Intelligence.” Journal of Applied Psychology 99 (1): 125. https://doi.org/10.1037/a0034138.