A forum posting helpfully combining my next (and last!) 2 papers:
In the last chapter of the text, Cosmides and Tooby propose, via the frequentist hypothesis, that human cognitive mechanisms are designed to work with frequency representations (as both input and output) and are thus ill-equipped to work with percentage and probability data. The famous medical diagnosis problem is also presented, and it is shown that people perform poorly on the percentage/probability version, better on the frequency version, and best on the active pictorial frequency version.
However, the text mentions that 45% of experts at Harvard Medical School answered 95% (an answer obtained by ignoring the base rate of the disease and attending only to the false-positive rate) when the actual answer was about 2%. There are studies suggesting that this error rate is even higher. Eddy (1982) posed a similar medical diagnosis problem to practicing physicians, and 95% of their answers were close to 10 times the actual probability that a person who tested positive has the disease. In a more recent replication, Hoffrage and Gigerenzer (1998) found similar results with physicians.
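For concreteness, here is a minimal sketch of the calculation, assuming the commonly cited version of the problem (a prevalence of 1 in 1,000, a 5% false-positive rate, and, for simplicity, a test that never misses the disease); these figures are not from the text itself, but they reproduce the 2% answer mentioned above:

```python
# Medical diagnosis problem, using commonly cited figures (an assumption):
# prevalence 1/1000, false-positive rate 5%, and a test that never misses the disease.
prevalence = 1 / 1000            # P(disease)
false_positive_rate = 0.05       # P(positive | no disease)
sensitivity = 1.0                # P(positive | disease), assumed perfect

# Bayes' theorem: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
p_disease_given_positive = sensitivity * prevalence / p_positive

print(f"P(disease | positive) = {p_disease_given_positive:.3f}")   # ~0.020, i.e. about 2%

# The frequency framing makes the same point: out of 1,000 people, about 1 has the
# disease and roughly 50 healthy people test positive anyway, so only about 1 in 51
# positives is a true positive.
```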
It is probably the case that physicians receive little training in statistics, but a similar problem plagues scientists, including psychologists (who do receive sufficiently rigorous training in statistics), when practicing null hypothesis significance testing (NHST) in research. In a nutshell (for those who are unfamiliar), NHST requires that one specify a pair of hypotheses; the first is called the null hypothesis, which is usually a statement of little or no effect (in contrast to the researcher’s aim to find an effect). NHST yields a probability value, p, which is the probability of obtaining data at least as extreme as those observed, given that the null hypothesis is true. If this probability is low enough (a criterion of 5%, also known as the alpha level, is the norm), the null hypothesis is rejected in favour of the alternative hypothesis, which is what most researchers hope for.
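For the unfamiliar, here is a minimal sketch of the procedure on made-up data; the sample, effect size, and alpha level are purely illustrative:

```python
# A minimal NHST sketch on made-up data: a one-sample t-test of the null
# hypothesis that the population mean is zero.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.3, scale=1.0, size=30)   # hypothetical measurements

t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)
# p_value is P(data at least this extreme | null is true) --
# it is NOT the probability that the null hypothesis is true.
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

alpha = 0.05                      # the conventional criterion
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```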
Inadequacies of such a system aside, the p value can be expressed as P(data|null is true). However, many scientists wish instead to know P(null is true|data), the probability that the null hypothesis is true given that the data have occurred. The two are not the same; to obtain the latter one needs to perform a Bayesian maneuver, which requires information about the base rates of the null and alternative hypotheses being true. Many scientists commit a serious error here by ignoring this requirement. For example, Oakes (1986) found that 96 percent of academic psychologists believed that a significance test indicated the probability of either the null hypothesis being true (taking this probability to be the p value itself) or the alternative hypothesis being true (taking it to be 1 minus the p value).
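To see why the base rates matter, here is a rough sketch with made-up numbers: treat “the data” coarsely as “the test came out significant”, assume a power of 0.5, and vary the base rate of true nulls. None of these figures come from the text; they are purely illustrative:

```python
# How often is the null actually true when a test comes out significant?
# A coarse Bayesian sketch with assumed (illustrative) power and base rates.
alpha = 0.05      # P(significant | null is true)
power = 0.50      # P(significant | alternative is true), assumed

for p_null in (0.2, 0.5, 0.8):                   # assumed base rate of true nulls
    p_alt = 1 - p_null
    p_sig = alpha * p_null + power * p_alt       # P(significant)
    p_null_given_sig = alpha * p_null / p_sig    # Bayes' theorem
    print(f"P(null) = {p_null:.1f}  ->  P(null | significant) = {p_null_given_sig:.2f}")
```

Depending on the assumed base rate, P(null | significant) here ranges from about 2% to nearly 30%, which is exactly why the p value on its own cannot tell you how likely the null hypothesis is.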
This is a little different from the medical diagnosis problem, since base rates are usually unknown in psychological research. I think, however, that the same things underlie both problems: a failure to recognize that base rates are required for the calculation, together with poor facility with percentage and probability formats.
I’m not sure whether there is a good way to present the information NHST works on in frequency formats similar to what Cosmides and Tooby did, or whether such a format would reduce these inferential errors among NHST users, but I’m quite sure it would be a really troublesome modification to an already troublesome procedure. Efforts to train students and teachers alike in conditional probability and Bayesian reasoning have met with little success (Sedlmeier, 1999). Perhaps scientists should just realize that they are just as human as everyone else and that these reasoning errors are a constantly looming threat while conducting research. Just my two cents’ worth!
Incidentally "just some thoughts", "my two cents worth" and variants thereof are simultaneously the most annoying and most common forum signatures one can see, especially if followed by a smiley.