
Limitations of Research Decisions

Now that we have run a research study, examined the data using inferential statistics, and reported our results, it is critical that we understand the limitations of any conclusions we have made.

Remember that our research decisions or conclusions are made using inference. An inference is an “educated guess” about the true meaning of our research results, but it is still a guess; we don’t know for sure. Instead, we are using probability to help us draw conclusions that are unlikely to be wrong. It is still possible that we are wrong.

One of the impressive and important aspects of the scientific and statistical processes, however, is that scientists have identified the ways that their research conclusions can be wrong so that they can manage these possibilities. These two kinds of mistakes are called:

  • Type I Error
  • Type II Error

Type I Error

A Type I Error occurs when a researcher mistakenly rejects a true null hypothesis. In other words, the researcher is concluding that there is a treatment effect or difference or correlation when, in fact, there isn’t one.

Why would a researcher ever claim that there is an effect when there isn’t one? We never actually know if there truly is an effect or not. If we were gods, we would know the true reality of whether or not there is an effect. However, we are not gods, and thus we need to make guesses about the true nature of the world based on observable results. Because most scientific research can only study a sample of people from a population, we are only able to make a guess about the true nature of the world (what is true in the population) based on a limited amount of information (what happened with our sample).

Type I errors typically happen when the results of a research study appear to show an effect because the sample mean is noticeably different than what you would expect it to be without any treatment effect. However, this difference was simply due to chance, usually sampling error.

Let’s take our study of sleep deprivation’s impact on memory. Based on our inferential statistics, we concluded that sleep deprivation probably had an effect on memory. In other words, we were claiming that there probably was a treatment effect. In terms of our hypothesis and research decision, this means that we rejected the null hypothesis that sleep deprivation has no effect on memory.

Do we know for sure that we are correct? No. Again, we are only making a guess based on our statistical results. It is still possible that sleep deprivation does not truly impact memory. In other words, it is possible that our research results are simply due to sampling error. We might have accidentally sampled a bunch of people with poor memory skills, and thus, they scored lower on the memory test for reasons not related to sleep deprivation. If this were the case, we would have made a Type I Error by concluding that sleep deprivation impacted memory.
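To see how often sampling error alone can produce a “significant” result, we can simulate many studies in which the null hypothesis is actually true. The numbers below (a memory test with population mean 100, standard deviation 15, and samples of 25 people) are hypothetical illustrations, not values from the study described here. A minimal sketch:

```python
import random
import statistics

# Hypothetical population values for illustration (not from the text):
POP_MEAN, POP_SD, N, Z_CRIT = 100, 15, 25, 1.96  # two-tailed test, alpha = .05

random.seed(42)

def one_study():
    """Run one study in which the null hypothesis is TRUE (no treatment effect)."""
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    se = POP_SD / N ** 0.5                       # standard error of the mean
    z = (statistics.mean(sample) - POP_MEAN) / se
    return abs(z) > Z_CRIT                       # True means "reject the null"

trials = 10_000
type_i_rate = sum(one_study() for _ in range(trials)) / trials
print(f"Type I error rate: {type_i_rate:.3f}")   # hovers close to 0.05
```

Even though every simulated sample came from an untreated population, roughly 5% of the studies land in the critical region purely by chance — each of those rejections is a Type I Error.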

Type II Error

A Type II Error occurs when a researcher mistakenly retains a false null hypothesis. In other words, the researcher is concluding that there is no treatment effect or difference or correlation when, in fact, there is one.

Why would a researcher ever claim that there is no effect when there is one? Again, we never actually know if there truly is an effect or not. If we were gods, we would know the true reality of whether or not there is an effect. However, we are not gods, and thus we need to make guesses about the true nature of the world based on observable results.

Type II errors typically happen when the results of a research study appear to show no effect because the sample mean is not noticeably different than what you would expect it to be without any treatment effect.

Let’s take our study of sleep deprivation’s impact on memory. Let’s say that our z-score was not in the critical region. In this case, we would then “retain the null” because the sample mean was not noticeably different from what we expect if the null hypothesis is true. We would then conclude that sleep deprivation probably had no effect on memory. In other words, we are claiming that there probably is no treatment effect. In terms of our hypothesis and research decision, this means that we retained the null hypothesis that sleep deprivation has no effect on memory.

Do we know for sure that we are correct? No. Again, we are only making a guess based on our statistical results. It is still possible that sleep deprivation does truly impact memory. In other words, it is possible that our research results are simply due to chance variability, often sampling error. We might have accidentally sampled a bunch of people with really good memory skills, and thus, while the sleep deprivation may have impaired their memory, they started with much higher levels of memory, and thus, the impairment is less pronounced and didn’t reach the critical region. If this were the case, we would have made a Type II Error by concluding that sleep deprivation did not impact memory.

Another common reason for Type II Errors is when the effect of the treatment is small. It is possible that sleep deprivation impairs memory, but only a little bit. As a result, it wouldn’t be unusual for a sample who experienced sleep deprivation to only have slightly lower memory scores. This small reduction in memory may not be enough for the z-score to reach the critical region, and thus, we would conclude, wrongly, that sleep deprivation didn’t have an impact. This type of situation can happen when sample sizes are relatively small.
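We can simulate this situation too: a treatment that truly works, but only a little. The numbers below (a true 3-point drop in memory scores, population standard deviation 15) are hypothetical illustrations chosen to represent a small effect. The sketch counts how often studies wrongly retain the null:

```python
import random
import statistics

# Hypothetical values for illustration: the treatment truly lowers scores
# by 3 points -- a real but small effect (numbers not from the text).
POP_MEAN, POP_SD, TRUE_EFFECT, Z_CRIT = 100, 15, -3, 1.96

random.seed(0)

def miss_rate(n, trials=10_000):
    """Fraction of studies that wrongly retain the null (Type II Errors)."""
    misses = 0
    for _ in range(trials):
        sample = [random.gauss(POP_MEAN + TRUE_EFFECT, POP_SD) for _ in range(n)]
        z = (statistics.mean(sample) - POP_MEAN) / (POP_SD / n ** 0.5)
        if abs(z) <= Z_CRIT:        # not in the critical region -> retain
            misses += 1
    return misses / trials

print(f"beta with n = 25:  {miss_rate(25):.2f}")   # most studies miss the effect
print(f"beta with n = 100: {miss_rate(100):.2f}")  # larger samples miss less often
```

With only 25 people per study, the great majority of simulated studies fail to reach the critical region even though the effect is real, which is exactly the small-sample Type II Error problem described above.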

Potential Research Decision Outcomes

How do we know whether we have made a Type I or Type II Error? One way to think about it is to understand that once a researcher makes a decision to either “reject the null” or “retain the null,” there are two possible outcomes:

  1. They are right
  2. They are wrong (and have made an error)

Now, as we’ve discussed above, because we are not gods, we don’t actually ever know if we are right or wrong. Instead, we know that there are simply two possible outcomes when we make a research decision, and one of them is that we’ve made an error.

Thus, determining whether we have made a Type I or Type II Error depends on the research decision:

  • If you “reject the null” and claim an effect/difference/correlation, then the only possible error you could make is a Type I Error
  • If you “retain the null” and claim no effect/difference/correlation, then the only possible error you could make is a Type II Error

Putting it all together, we can look at all the possible outcomes in a table:

A table of the possible outcomes of research decisions, including Type I and Type II Errors:

                            Null is actually true    Null is actually false
  “Reject the null”         Type I Error             Correct Decision
  “Retain the null”         Correct Decision         Type II Error

To use the table, it can help to start on the left side, where the researcher makes a decision. There are two possible decisions:

  1. If the test statistic falls in the critical region, then the researcher will “reject the null” and claim that there is probably an effect, difference, or correlation.
  2. If the test statistic does not fall in the critical region, then the researcher will “retain the null” and claim that there is probably no effect, difference, or correlation.
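The two-branch decision rule above can be written out directly. The critical value 1.96 below assumes a two-tailed z-test at α = 0.05, as in the earlier examples:

```python
def research_decision(z, z_crit=1.96):
    """Apply the two-tailed decision rule: reject the null only when the
    test statistic lands in the critical region (|z| beyond z_crit)."""
    if abs(z) > z_crit:
        return "reject the null: there is probably an effect"
    return "retain the null: there is probably no effect"

print(research_decision(2.40))  # in the critical region -> reject
print(research_decision(1.10))  # not in the critical region -> retain
```

Note that the function can tell you only which decision to make, never whether that decision is correct — that depends on the true state of the world, which the researcher cannot observe.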

Once the researcher makes this decision, there are two possible outcomes: they are right or they are wrong.

Using the table, if a researcher makes the decision to “reject the null,” they are in that row. Looking at the table, you can then see that there are two possible outcomes: Type I Error (they are wrong) or Correct Decision (they are right).


Alternatively, if a researcher makes the decision to “retain the null,” they are in that row. Looking at the table, you can then see that there are two possible outcomes: Correct Decision (they are right) or Type II Error (they are wrong).


Type I and Type II Errors and the U.S. Justice System

In order to understand Type I and Type II Errors, it can be helpful to use the analogy of the U.S. justice system because it has a very similar setup to hypothesis testing. Here are the analogous components:

  • Null Hypothesis (H0) = Innocent (did not commit the crime)
  • Alternative Hypothesis (H1) = Guilty (committed the crime)
  • Type I Error = An innocent individual is found guilty
  • Type II Error = A guilty individual is found innocent

Like the scientific method, where the null hypothesis is assumed to be true and the goal is to try to disprove it, the U.S. justice system assumes that the defendant is “innocent until proven guilty.” In other words, the focus will be on trying to disprove the defendant’s innocence. This is a very important feature of the U.S. justice system because, as we saw in the scientific method, focusing on trying to disprove something makes us much less likely to fall prey to confirmation bias.

Additionally, jury decisions in the U.S. justice system are essentially inferential decisions because the jury has to make a decision based on a limited amount of information or evidence. Like researchers, they aren’t gods and don’t know the true state of affairs, so they have to make educated guesses about guilt or innocence. In order to help a jury make the final decision, much like the hypothesis testing system when it draws a “line in the sand” to create the critical region(s), the U.S. justice system sets a criterion for when a jury can decide that the defendant is guilty. This line is described by the phrase “beyond a reasonable doubt.” In order to find a defendant guilty, the evidence needs to cross the line in the sand and remove any “reasonable doubt.” This doesn’t mean that the jury’s decision is correct. Instead, it means that they are “probably correct” because the amount of doubt isn’t enough to change the decision. It is still possible that the jury made a Type I Error; it’s just that, like in hypothesis testing, the likelihood of this is relatively small.

Consequences of Type I and Type II Errors

Because Type I and Type II Errors are “errors,” they can have negative consequences:

  • Consequences of Type I Errors:
    • False findings may get published (research journals tend to publish “significant” findings rather than “non-significant” findings)
    • Clinicians, other researchers, and consumers may act based on an assumption that the false finding is true
  • Consequences of Type II Errors:
    • The researchers are disappointed and usually unable to publish their results (research journals tend to publish “significant” findings rather than “non-significant” findings)
    • Potential effects are not detected

Because of these negative consequences, researchers work to reduce the chances of these errors. However, there is no perfect way to do this. For one, Type I Errors and Type II Errors are inversely related to each other. In other words, if you decrease the chance of making a Type I Error, you necessarily increase the chance of making a Type II Error, and vice versa. As a result, researchers have to decide which type of error is a bigger problem for their research and then use the following strategies to manage their potential errors.
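The inverse relationship between the two error rates can be computed directly for a two-tailed z-test. The numbers below (a true 3-point effect, standard deviation 15, sample size 25) are hypothetical illustrations; the sketch uses the standard normal distribution from Python’s standard library:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def beta_for_alpha(alpha, effect=3, sd=15, n=25):
    """Type II Error probability for a two-tailed z-test, assuming a true
    effect of `effect` points (hypothetical illustration values)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)        # critical value for this alpha
    delta = effect / (sd / n ** 0.5)         # true shift in standard-error units
    # we retain the null whenever z lands between the two critical values
    return Z.cdf(z_crit - delta) - Z.cdf(-z_crit - delta)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f} -> beta = {beta_for_alpha(alpha):.2f}")
```

Tightening alpha pushes the critical values further out, so more true effects fail to reach the critical region: beta rises as alpha falls, which is the trade-off described above.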

Factors that Impact Type I and Type II Errors

The first and most direct way that researchers impact the chance of Type I and Type II Errors is through setting their alpha level (α). Technically, the alpha level directly determines the probability of making a Type I Error. For example, if you set your alpha level at α = 0.05, then there is a 0.05 probability (or a 5% chance) that you will make a Type I Error.

Thus, the probability of making a Type I Error is denoted by the Greek letter alpha. The probability of making a Type II Error, on the other hand, is denoted by the Greek letter beta.

  • α = probability of making a Type I Error
  • β = probability of making a Type II Error

While researchers can directly set their chance of making a Type I Error by setting the alpha level, the probability of making a Type II Error cannot be set by the researcher directly. Instead, it is estimated, and it is influenced by a number of factors:

  • Alpha Level (α)
  • Sample Size
  • Effect Size
  • Number of Tails (one-tailed or two-tailed)
  • Variance in the measurement

Note that the first four factors can be controlled by the researcher, and thus they can work toward a β that suits their study. Just remember, however, that alpha and beta are inversely related, so for a given study design it is not possible to have a very low probability of making a Type I Error while also having a very low probability of making a Type II Error. Instead, researchers have to balance alpha and beta depending on the research topic and the different consequences of making Type I Errors or Type II Errors.
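Among the factors listed above, sample size is the one researchers most often adjust, because it lowers beta without loosening alpha. The sketch below computes beta for a two-tailed z-test at α = 0.05 with hypothetical illustration values (a true 3-point effect, standard deviation 15):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def beta(alpha=0.05, effect=3, sd=15, n=25):
    """Type II Error probability for a two-tailed z-test
    (effect, sd, and n are hypothetical illustration values)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    delta = effect / (sd / n ** 0.5)   # true shift in standard-error units
    return Z.cdf(z_crit - delta) - Z.cdf(-z_crit - delta)

# Growing the sample shrinks beta while alpha stays fixed at .05:
for n in (25, 100, 400):
    print(f"n = {n:4d} -> beta = {beta(n=n):.2f}")
```

This is why small effects call for large samples: with the alpha level held constant, only a bigger sample can pull the probability of a Type II Error down.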

For example, imagine that researchers are testing a drug that is very expensive and has terrible side effects. In this case, a Type I Error, where they claim that the drug probably works but it actually doesn’t, would be a very problematic outcome. People would be taking a drug with lots of negative impacts (expense and side effects) that doesn’t actually work. In this case, the researchers would probably want to lean more toward reducing their alpha level and decreasing the chances that they make a Type I Error. Making a Type II Error in this situation, wrongly concluding that the drug doesn’t work, would not be as problematic, and so they could live with a higher beta level and chance of making a Type II Error.

On the other hand, imagine that researchers are testing a screening test for pregnancy. This simple and inexpensive test can be used in the privacy of your own home, and acting on a false positive carries relatively little cost. Here, a Type II Error, failing to detect a real pregnancy, would be the more problematic outcome, so the researchers would likely accept a somewhat higher alpha level in order to keep beta low. The same logic applies to a treatment for phobias that is non-traumatizing: when the consequences of a Type I Error are mild, missing a real effect becomes the bigger concern.

License


Introduction to Statistics and Statistical Thinking Copyright © 2022 by Eric Haas is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.
