
8.11 – Potential Errors in Research Decisions

Now that we have run a research study, examined the data using inferential statistics, and reported our results, it is critical that we understand the limitations of any conclusions we have made.

Remember that our research decisions or conclusions are made using inference. An inference is an “educated guess” about the true meaning of our research results, but still a guess. We don’t know for sure. Instead, we are using probability to help us draw conclusions that are unlikely to be wrong. It is still possible that we are wrong.

One of the impressive and important aspects of the scientific and statistical processes, however, is that scientists have identified the ways that their research conclusions can be wrong so that they can manage these possibilities. These possibilities are called:

  • Type I Error
  • Type II Error

Type I Error

A Type I Error occurs when a researcher mistakenly rejects a true null hypothesis. In other words, the researcher is concluding that there is a treatment effect or difference or correlation when, in fact, there isn’t one.

Why would a researcher ever claim that there is an effect when there isn’t one? We never actually know if there truly is an effect or not. If we were gods, we would know the true reality of whether or not there is an effect. However, we are not gods, and thus we need to make guesses about the true nature of the world based on observable results. Because most scientific research can only study a sample of people from a population, we are only able to make a guess about the true nature of the world (what is true in the population) based on a limited amount of information (what happened with our sample).

Type I errors typically happen when the results of a research study appear to show an effect because the sample mean is noticeably different from what you would expect it to be without any treatment effect. However, this difference was simply due to chance, usually sampling error.

Let’s take our study of sleep deprivation’s impact on memory. Based on our inferential statistics, we concluded that sleep deprivation probably had an effect on memory. In other words, we were claiming that there probably was a treatment effect. In terms of our hypothesis and research decision, this means that we rejected the null hypothesis that sleep deprivation has no effect on memory.

Do we know for sure that we are correct? No. Again, we are only making a guess based on our statistical results. It is still possible that sleep deprivation does not truly impact memory. In other words, it is possible that our research results are simply due to sampling error. We might have accidentally sampled a bunch of people with poor memory skills, and thus, they scored lower on the memory test for reasons not related to sleep deprivation. If this were the case, we would have made a Type I Error by concluding that sleep deprivation impacted memory.
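This is easy to see in a simulation. The sketch below uses made-up numbers (a population memory score of 100 with a standard deviation of 15, and a sample size of 25 are assumptions for illustration): even when the null hypothesis is true and sleep deprivation has no effect at all, about 5% of samples will, by sampling error alone, land in the critical region and lead us to a Type I Error.

```python
import random
import statistics

# Assumed, illustrative numbers: population memory mean 100, SD 15,
# samples of 25 people, two-tailed test with alpha = .05 (z cutoff 1.96).
# Crucially, the null is TRUE here: there is no treatment effect.
random.seed(42)
POP_MEAN, POP_SD, N, CRITICAL_Z = 100, 15, 25, 1.96

trials = 10_000
type_i_count = 0
for _ in range(trials):
    # Draw a sample from the untreated population (no effect exists).
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    z = (statistics.mean(sample) - POP_MEAN) / (POP_SD / N ** 0.5)
    if abs(z) > CRITICAL_Z:   # sample landed in the critical region anyway
        type_i_count += 1     # we would wrongly "reject the null": Type I Error

print(f"Type I error rate: {type_i_count / trials:.3f}")
```

The printed rate comes out close to the alpha level of .05, which previews a point made later in this section: the alpha level directly sets how often Type I Errors occur when the null is true.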

Type II Error

A Type II Error occurs when a researcher mistakenly retains a false null hypothesis. In other words, the researcher is concluding that there is no treatment effect or difference or correlation when, in fact, there is one.

Why would a researcher ever claim that there is no effect when there is one? Again, we never actually know if there truly is an effect or not. If we were gods, we would know the true reality of whether or not there is an effect. However, we are not gods, and thus we need to make guesses about the true nature of the world based on observable results.

Type II errors typically happen when the results of a research study appear to show no effect because the sample mean is not noticeably different from what you would expect it to be without any treatment effect.

Let’s take our study of sleep deprivation’s impact on memory. Let’s say that our z-score was not in the critical region. In this case, we would then “retain the null” because the sample mean was not noticeably different from what we expect if the null hypothesis is true. We would then conclude that sleep deprivation probably had no effect on memory. In other words, we are claiming that there probably is no treatment effect. In terms of our hypothesis and research decision, this means that we retained the null hypothesis that sleep deprivation has no effect on memory.

Do we know for sure that we are correct? No. Again, we are only making a guess based on our statistical results. It is still possible that sleep deprivation does truly impact memory. In other words, it is possible that our research results are simply due to chance variability, often sampling error. We might have accidentally sampled a bunch of people with really good memory skills, and thus, while the sleep deprivation may have impaired their memory, they started with much higher levels of memory, and thus, the impairment is less pronounced and didn’t reach the critical region. If this were the case, we would have made a Type II Error by concluding that sleep deprivation did not impact memory.

Another common reason for Type II Errors is when the effect of the treatment is small. It is possible that sleep deprivation impairs memory, but only a little bit. As a result, it wouldn’t be unusual for a sample who experienced sleep deprivation to only have slightly lower memory scores. This small reduction in memory may not be enough for the z-score to reach the critical region, and thus, we would conclude, wrongly, that sleep deprivation didn’t have an impact. This type of situation can happen when sample sizes are relatively small.
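A simulation makes this small-effect scenario concrete. The numbers below are assumptions for illustration: the null claims a mean memory score of 100 (SD 15), but sleep deprivation truly lowers the mean slightly, to 97. With samples of only 25 people, most samples never reach the critical region, so we retain the null and commit a Type II Error most of the time.

```python
import random
import statistics

# Assumed, illustrative numbers: null mean 100, SD 15, but the TRUE mean
# under sleep deprivation is 97 (a real, but small, treatment effect).
random.seed(0)
NULL_MEAN, POP_SD, TRUE_MEAN, N, CRITICAL_Z = 100, 15, 97, 25, 1.96

trials = 10_000
misses = 0
for _ in range(trials):
    # Draw a sample from the treated population (the effect really exists).
    sample = [random.gauss(TRUE_MEAN, POP_SD) for _ in range(N)]
    z = (statistics.mean(sample) - NULL_MEAN) / (POP_SD / N ** 0.5)
    if abs(z) <= CRITICAL_Z:  # z never reached the critical region
        misses += 1           # we would wrongly "retain the null": Type II Error

print(f"Type II error rate (beta): {misses / trials:.2f}")
```

With these assumed numbers the Type II error rate is roughly .83, showing just how often a small effect goes undetected when the sample is small.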

Potential Research Decision Outcomes

How do we know whether we have made a Type I or Type II Error? One way to think about it is to understand that once a researcher makes a decision to either “reject the null” or “retain the null,” there are two possible outcomes:

  1. They are right
  2. They are wrong (and have made an error)

Now, as we’ve discussed above, because we are not gods, we don’t actually ever know if we are right or wrong. Instead, we know that there are simply two possible outcomes when we make a research decision, and one of them is that we’ve made an error.

Thus, determining whether we have made a Type I or Type II Error depends on the research decision:

  • If you “reject the null” and claim an effect/difference/correlation, then the only possible error you could make is a Type I Error
  • If you “retain the null” and claim no effect/difference/correlation, then the only possible error you could make is a Type II Error

Putting it all together, we can look at all the possible outcomes in a table:

The possible outcomes of research decisions, including Type I and Type II Errors:

                          True state of the world
  Researcher’s Decision   Null is true (no effect)   Null is false (effect exists)
  “Reject the null”       Type I Error               Correct Decision
  “Retain the null”       Correct Decision           Type II Error

To use the table, it can help to start on the left side, where the researcher makes a decision. There are two possible decisions:

  1. If the results of the test statistic are in the critical region, then the researcher will “reject the null” and claim that there is probably an effect, difference, or correlation.
  2. If the results of the test statistic are not in the critical region, then the researcher will “retain the null” and claim that there is probably no effect, difference, or correlation.
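The two-step decision rule above can be sketched as a small function. This is a simplified sketch for a two-tailed z-test; the 1.96 cutoff assumes an alpha level of .05.

```python
def research_decision(z_score: float, critical_value: float = 1.96) -> str:
    """Apply the two-tailed decision rule: is z in the critical region?"""
    if abs(z_score) > critical_value:
        return "reject the null (there is probably an effect)"
    return "retain the null (there is probably no effect)"

print(research_decision(2.5))   # in the critical region
print(research_decision(0.8))   # not in the critical region
```

Note that the function can only report the decision, never whether the decision is correct; that depends on the true state of the world, which the researcher cannot observe.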

Once the researcher makes this decision, there are two possible outcomes: they are right or they are wrong.

Using the table, if a researcher makes the decision to “reject the null,” they are in that row. Looking at the table, you can then see that there are two possible outcomes: Type I Error (they are wrong) or Correct Decision (they are right).


Alternatively, if a researcher makes the decision to “retain the null,” they are in that row. Looking at the table, you can then see that there are two possible outcomes: Correct Decision (they are right) or Type II Error (they are wrong).


Type I and Type II Errors and the U.S. Justice System

In order to understand Type I and Type II Errors, it can be helpful to use the analogy of the U.S. justice system because it has a very similar setup to hypothesis testing. Here are the analogous components:

  • Null Hypothesis (H0) = Innocent (did not commit the crime)
  • Alternative Hypothesis (H1) = Guilty (committed the crime)
  • Type I Error = An innocent individual is found guilty
  • Type II Error = A guilty individual is found not guilty

Like the scientific method, where the null hypothesis is assumed to be true and the goal is to try to disprove it, the U.S. justice system assumes that the defendant is “innocent until proven guilty.” In other words, the focus is on trying to disprove the defendant’s innocence. This is an important feature of the U.S. justice system because, as we saw with the scientific method, focusing on trying to disprove something makes us much less likely to fall for confirmation bias.

Additionally, jury decisions in the U.S. justice system are essentially inferential decisions because the jury has to make a decision based on a limited amount of information or evidence. Like researchers, they aren’t gods and thus don’t know the true state of affairs, and thus they have to make educated guesses about guilt or innocence. In order to help a jury make the final decision, like the hypothesis testing system when it draws a “line in the sand” to create the critical region(s), the U.S. justice system sets a criterion for when a jury can decide that the defendant is guilty. This line is described by the phrase: “Beyond a reasonable doubt.” In order to find a defendant guilty, the evidence needs to cross the line in the sand and remove any “reasonable doubt.” This doesn’t mean that their decision is correct. Instead, it means that they are “probably correct” because the amount of doubt isn’t enough to change the decision. It is still possible that the jury made a Type I Error; it’s just that, like in hypothesis testing, the likelihood of this is relatively small.

Consequences of Type I and Type II Errors

Because Type I and Type II Errors are “errors,” they can have negative consequences:

  • Consequences of Type I Errors:
    • False findings may get published (research journals tend to publish “significant” findings rather than “non-significant” findings)
    • Clinicians, other researchers, and consumers may act based on an assumption that the false finding is true
  • Consequences of Type II Errors:
    • The researchers are disappointed and usually unable to publish their results (research journals tend to publish “significant” findings rather than “non-significant” findings)
    • Potential effects are not detected

Because of these negative consequences, researchers work to reduce the chances of these errors. However, there is no perfect way to do this. For one, Type I Errors and Type II Errors are inversely related to each other. In other words, if you decrease the chance of making a Type I Error, you necessarily increase the chance of making a Type II Error, and vice versa. As a result, researchers have to decide which type of error is a bigger problem for their research and then use the factors described below to manage their potential errors.

Factors that Impact Type I and Type II Errors

The first and most direct way that researchers impact the chance of Type I and Type II Errors is through setting their alpha level (α). Technically, the alpha level directly determines the probability of making a Type I Error. For example, if you set your alpha level at α = 0.05, then there is a 0.05 probability (a 5% chance) that you will make a Type I Error when the null hypothesis is actually true.

Thus, the probability of making a Type I Error is denoted by the Greek letter “alpha.” On the other hand, the probability of making a Type II Error is denoted by the Greek letter “beta.”

  • α = probability of making a Type I Error
  • β = probability of making a Type II Error

While researchers can directly set their chance of making a Type I Error through setting the alpha level, the probability of making a Type II Error cannot be set by the researcher directly. Instead, it is measured and influenced by a number of factors:

  • Alpha Level (α)
  • Sample Size
  • Effect Size
  • Number of Tails (one-tailed or two-tailed)
  • Variance in the measurement

Note that the first four factors can be controlled by the researcher, and thus they can work to create a β that works for them. Just remember, however, that alpha and beta are inversely related, so it is not possible to have a very low probability of making a Type I Error while also having a very low probability of making a Type II Error. Instead, researchers have to balance alpha and beta depending on the research topic and the different consequences of making Type I Errors or Type II Errors.
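The inverse relationship between alpha and beta can be computed directly for a two-tailed z-test. The sketch below uses an assumed, illustrative effect: the true treatment shifts the sampling distribution of the mean by exactly one standard error. Lowering alpha pushes the critical region farther out, which makes it harder for the shifted distribution to reach it, so beta rises.

```python
from statistics import NormalDist

# Assumed, illustrative effect size: the true effect shifts the z
# distribution by 1.0 standard errors.
shift = 1.0

betas = {}
for alpha in (0.10, 0.05, 0.01):
    # Two-tailed critical value for this alpha level.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Power = chance the shifted test statistic lands in the critical region.
    power = (1 - NormalDist().cdf(z_crit - shift)) + NormalDist().cdf(-z_crit - shift)
    betas[alpha] = 1 - power
    print(f"alpha = {alpha:.2f} -> beta = {betas[alpha]:.2f}")
```

For this assumed effect, beta climbs from about .74 at α = .10 to about .94 at α = .01: every reduction in the chance of a Type I Error is paid for with a higher chance of a Type II Error.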

For example, imagine that researchers are testing a drug that is very expensive and has terrible side effects. In this case, the possibility of a Type I Error, where they might claim that the drug works but it actually doesn’t, would be a very problematic outcome. People could end up taking a drug with lots of negative impacts (high cost and bad side effects) that doesn’t actually work. To try to minimize this possibility, the researchers would probably want to decrease the chances that they might make a Type I Error by reducing their alpha level (α). In this situation, making a Type II error, where they might claim that the drug doesn’t work but it actually does, would not be as problematic, and so they could live with a higher beta level (β) and thus a higher chance of making a Type II Error.

On the other hand, imagine that researchers are testing a screening test for pregnancy. This simple and inexpensive test can be used in the privacy of your own home and could be used by individuals as an initial test to determine if they are pregnant. Depending on the test results, the next step would be to confirm their pregnancy with a blood draw with their physician. In this case, the possibility of a Type I Error, where the test says they are pregnant but they actually aren’t, would not be a very problematic outcome. While the individual would falsely think they were pregnant for some amount of time before getting a blood test done, there would be little to no negative health impacts. However, the possibility of making a Type II error, where the test says they are not pregnant but they actually are, would be very problematic because the pregnant mother might consume things (e.g., alcohol, nicotine, etc.) that could have harmful health effects on the fetus. As a result, the researchers in this situation would want to have a low beta level (β) and thus a small chance of making a Type II Error. Thus, they would be okay with a higher alpha level (α).

Interpreting Findings

While it is the job of researchers to design their studies and set their alpha levels with a mind toward managing Type I and Type II errors, it is the job of research consumers to understand the potential limitations of any research conclusions. While many empirical research studies are often phrased as if they “proved” that there was a treatment effect, a difference between groups, or a correlation between variables, it is important for consumers to remember that these findings are just educated guesses based on limited information. There is always the possibility that any research conclusion could be wrong (a Type I error or a Type II error).

While it is useful to maintain a reasonable level of skepticism regarding the results of any one research finding because of the possibility of Type I and Type II errors, it is also useful to remember that the scientific method is an iterative process where a community of researchers are all trying to understand the phenomenon at hand. Ultimately, it is through many research studies that the scientific community is able to truly understand a phenomenon.

Thus, don’t make a big deal out of any one research study, but feel free to make a big deal out of a large number of research studies with very similar results.


License


Introduction to Statistics and Statistical Thinking Copyright © 2022 by Eric Haas is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.
