Research Rudiments Part III: Randomized-Controlled Trials

The Gold Standard

Apr 14, 2024

In Parts I and II of the Research Rudiments Series, where we’ve been breaking down different types of research studies, we covered animal and observational research.

Here, we’ll cover what is largely considered to be the “gold standard” of scientific research: human randomized-controlled trials (RCTs).

What Makes A Human RCT?

Before we get into the pros and cons, let’s identify some specifics about human RCTs.

In addition to using humans rather than non-humans, there are at least three defining features that set human RCTs apart from other study designs: intervention, control, and randomization.

First, unlike observational studies, where researchers collect data and identify associations (for example, in a group of 10- to 30-year-olds, researchers might find that the older you are, the more likely you are to consume alcohol), RCTs entail implementing an actual intervention and following what happens afterwards.

For example, in order to test whether consuming artificially-sweetened beverages increases hunger and/or calorie consumption, researchers might give individuals artificially-sweetened beverages and track how much food they eat in follow-on meals. In this case, giving the artificial sweeteners is the intervention.

Second, as the name implies, in randomized-controlled trials, researchers utilize a control group that doesn’t receive the intervention for comparison with the experimental group–the one receiving the intervention.

Continuing with our artificial sweetener example, the researchers might have the control group follow the same exact instructions and schedule as the experimental group, except the researchers wouldn’t give the subjects in the control group any artificially-sweetened beverages. Then, the researchers would track how much the control subjects ate during the meals and compare that to the experimental group’s consumption.

Since the researchers controlled for all of the other variables (that is, kept them constant), the only difference between the two groups was the artificially-sweetened beverages. That allows the researchers to conclude that the artificially-sweetened beverage consumption is the cause of any differences in outcomes.

Sometimes, to improve the control even further, researchers will add a comparison group that receives some type of standard intervention. This helps them discern whether simply doing something at all (in this case, drinking a beverage), rather than the specific intervention (consuming the artificial sweetener), is responsible for any differences in outcomes.

In our artificially-sweetened beverage example, this might look like adding a third group that consumes a similar beverage to the artificially-sweetened beverage group but with actual sugar in it.

To top things off, in a situation like this, the researchers might even use a crossover design, in which the subjects are tested once with each option in random order: no beverage, artificially-sweetened beverage, and sugar-sweetened beverage. (I)
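To make the crossover idea concrete, here is a minimal sketch in Python. It assumes the three conditions from our beverage example and gives every subject all three in an independently shuffled order. The function name, subject IDs, and seed are made up for illustration, and real trials often use more structured schemes to balance the orders across subjects.

```python
import random

# The three conditions from the beverage example.
CONDITIONS = [
    "no beverage",
    "artificially-sweetened beverage",
    "sugar-sweetened beverage",
]

def crossover_schedule(subject_ids, seed=None):
    """Give every subject all conditions, each in an independently shuffled order."""
    rng = random.Random(seed)  # a fixed seed makes the schedule reproducible
    schedule = {}
    for subject in subject_ids:
        order = CONDITIONS[:]  # copy so the master list stays untouched
        rng.shuffle(order)
        schedule[subject] = order
    return schedule

# Hypothetical subject IDs, for illustration only.
schedule = crossover_schedule(["S01", "S02", "S03", "S04"], seed=42)
for subject, order in schedule.items():
    print(subject, "->", ", ".join(order))
```

Because each subject serves as their own control, differences between subjects (age, body size, appetite) can’t explain differences between conditions.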

Together, these steps provide a high degree of control, allowing the researchers to attribute any differences in results to the variable of interest–in this case, the artificially-sweetened beverages.

Third, again, as the name implies, randomized-controlled trials involve randomly selecting which subjects go in each group. Commonly, the researchers will first collect baseline information from the subjects, including both demographic data and metrics that are relevant to the given study (for example, blood pressure, BMI, etc.). Then they will randomly sort subjects into the different study groups in a manner that balances these baseline factors.

In other words, in order to account for those baseline factors, the researchers will match the groups, often using a computer-based random number generator, so that each group has statistically similar baseline characteristics: a similar percentage of males and females, a similar average age, a similar average blood pressure, etc. (II, III)

“In RCTs the patients are randomly assigned to the different study groups. This is intended to ensure that all potential confounding factors are divided equally among the groups that will later be compared (structural equivalence). These factors are characteristics that may affect the patients’ response to treatment, e.g., weight, age, and sex. Only if the groups are structurally equivalent can any differences in the results be attributed to a treatment effect rather than the influence of confounders.” (III)
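One common way to get this kind of balance is stratified randomization: sort subjects into strata that share the same baseline traits, then randomize within each stratum. The Python sketch below is only an illustration of that idea, not the method of any study cited here. The function name, the subject records, and the choice of sex and age band as strata are all assumptions.

```python
import random
from collections import Counter, defaultdict

def stratified_assign(subjects, strata_keys, groups=("control", "experimental"), seed=None):
    """Randomly assign subjects to groups, balancing within each baseline stratum."""
    rng = random.Random(seed)

    # Bucket subjects that share the same baseline characteristics.
    strata = defaultdict(list)
    for subject in subjects:
        stratum = tuple(subject[key] for key in strata_keys)
        strata[stratum].append(subject)

    assignment = {}
    for stratum in sorted(strata):
        members = strata[stratum][:]
        rng.shuffle(members)  # random order within the stratum
        offset = rng.randrange(len(groups))  # random group gets any leftover subject
        for i, subject in enumerate(members):
            assignment[subject["id"]] = groups[(i + offset) % len(groups)]
    return assignment

# Hypothetical subjects, for illustration only.
subjects = [
    {"id": "S01", "sex": "F", "age_band": "18-39"},
    {"id": "S02", "sex": "F", "age_band": "18-39"},
    {"id": "S03", "sex": "F", "age_band": "40-65"},
    {"id": "S04", "sex": "F", "age_band": "40-65"},
    {"id": "S05", "sex": "M", "age_band": "18-39"},
    {"id": "S06", "sex": "M", "age_band": "18-39"},
    {"id": "S07", "sex": "M", "age_band": "40-65"},
    {"id": "S08", "sex": "M", "age_band": "40-65"},
]

assignment = stratified_assign(subjects, strata_keys=("sex", "age_band"), seed=7)
print(Counter(assignment.values()))
```

Chance still decides who lands in which group, but each group ends up with the same mix of sexes and age bands.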

Notably, when possible, in order to reduce bias, both the researchers and subjects will be “blind” to the group assignments, meaning that they won’t know which group each subject is in. (III)

In a double-blind RCT on a given medication, for example, neither the subject nor the researchers will know who is getting the actual medication pill and who is getting a placebo pill with nothing active in it. (III)

In single-blind designs, on the other hand, only one party, either the researchers or the subjects, doesn’t know which group is receiving the actual intervention. (III)

Pros

If we combine all three of the defining features above–intervention, control, and randomization–we end up with what is probably the greatest advantage of using RCTs: the ability to identify causation.

This is a vital piece of the research puzzle that can be filled neither by non-human models, due to a lack of specificity (i.e., just because it happens in rodents doesn’t mean it’ll happen in humans), nor by human observational studies, due to a lack of control (i.e., we don’t know if confounding variables are behind any differences in outcomes; correlation doesn’t imply causation).
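To see why randomization matters here, consider a small simulation in Python that borrows the alcohol example from earlier. It is only a sketch built on simulated data: the ages, the probability rule that makes older subjects more likely to drink, and the sample size are all assumptions for illustration. It compares the average age gap between groups when subjects self-select versus when a coin flip assigns them.

```python
import random
from statistics import mean

def age_gap(ages, in_group):
    """Mean age of subjects in the group minus mean age of everyone else."""
    inside = [age for age, flag in zip(ages, in_group) if flag]
    outside = [age for age, flag in zip(ages, in_group) if not flag]
    return mean(inside) - mean(outside)

rng = random.Random(0)

# Simulated subjects aged 10 to 30 (not real data).
ages = [rng.uniform(10, 30) for _ in range(10_000)]

# Observational setting: the older you are, the more likely you are to drink.
self_selected = [rng.random() < (age - 10) / 20 for age in ages]

# RCT setting: a coin flip decides the group, regardless of age.
randomized = [rng.random() < 0.5 for _ in ages]

observational_gap = age_gap(ages, self_selected)
randomized_gap = age_gap(ages, randomized)

print(f"Age gap with self-selection: {observational_gap:.2f} years")
print(f"Age gap with randomization:  {randomized_gap:.2f} years")
```

With self-selection, the drinkers are several years older on average, so age could explain any difference in outcomes between the groups. With random assignment, the age gap shrinks to nearly zero, which is what lets an RCT attribute differences to the intervention itself.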

RCTs can still get causation wrong if their design is flawed, for example, if the researchers fail to control for a variable because they aren’t aware at the time of the study that it exists or influences the outcome in question. However, due to their precise implementation of a specific intervention, control of confounding variables, and randomization and equalization of baseline characteristics between groups, RCTs are largely the best tool we have for discerning whether “X” causes “Y” to occur. (If we’re getting technical, I suppose you could argue that meta-analyses of RCTs, when done properly, could be the best tool we have, but that’s a topic for another post.)

In addition, RCTs offer the benefit of accurate and precise measurement. In many observational studies, accuracy and precision are questionable because much of the data comes from surveys, whereas in RCTs, measurements are taken in real time and with great care.

Consider how accurately and precisely you can answer the question, “How many times per week did you consume dairy products in the past year?” and you’ll understand my point about the potential lack of accuracy and precision of some observational research.

For example, while researchers in an observational study may be relying on subjects’ ability to correctly recall what and how much they ate in the past week, researchers in an RCT setting can directly record exactly what and how much subjects ate.

So, RCTs provide greater control, accuracy, and—maybe most importantly—ability to assign causation, but what do they give up in return?

Cons

From a practical standpoint, RCTs have the drawbacks that they’re expensive and difficult to run. (II)

In order to achieve the control I described above, researchers running an RCT have to spend more time both conceptualizing and executing the study design than they probably would if they were sending out surveys for an observational study.

Cost also comes into play when you consider all of the materials required for the interventions (think supplements, food/drinks, medications, etc.), as well as the testing required for the results. For example, some studies involve MRI, CT scans, DEXA scans, etc.

Additionally, one major limitation of RCTs is that they typically involve a small group of subjects.

Since achieving a high degree of control means getting subjects to follow stringent guidelines–like following a certain diet, refraining from eating certain foods, following a specific exercise routine, etc.–most RCTs have considerably smaller sample sizes than observational studies that, again, sometimes only require subjects to fill out a survey once every few months (think tens to hundreds of subjects vs. thousands to hundreds of thousands of subjects).

Another limitation of RCTs stems from ecological validity, or how applicable the study’s results are to real-life settings.

Since RCTs tend to involve following strict and specific protocols, they often don’t reflect exactly how we operate in the real world. For this reason, it’s questionable whether the results from said RCTs will translate when we apply them in real-world settings.

For example, will results from a study where subjects performed only calf-raises for 12 weeks apply to people who are training the rest of their body in addition to their calves in real life?

Or, for example, though subjects were able to reduce their calorie intake using artificial sweeteners in a study, will free-living subjects be able to do the same if they don’t have researchers checking in on them once per week to make sure they’re adhering to their diet plan?

“Although RCTs are the gold standard with regard to level of evidence, their generalizability, i.e., the extent to which their results can be extrapolated to the wider patient population (external validity) is often questioned, because standardized and controlled study conditions do not adequately reflect clinical reality. Moreover, the patients selected for a study are not necessarily representative, in that those seen in routine daily practice will often have numerous comorbidities and comedications.” (III)

One more limitation of RCTs is generalizability, or whether or not the study results can be applied broadly across the population.

Since, as we described above, RCTs tend to use small sample sizes, their samples of subjects tend to be less diverse than samples in observational studies that include thousands to hundreds of thousands of individuals.

Also, many RCT authors intentionally limit their samples’ diversity as a means of increasing control. For example, they may complete a study solely in males or females, or only on subjects with or without Type 2 diabetes or another medical condition—this removes more variables that could confound the results.

“RCTs can have their drawbacks, including their high cost in terms of time and money, problems with generalisability (participants that volunteer to participate might not be representative of the population being studied) and loss to follow up.” (II)

One way or another, since many RCTs involve smaller and less diverse samples, their results tend to be less generalizable than those of larger observational studies.

Conclusions

So, RCTs are largely considered the “gold standard” of research due to their control, accuracy, and precision, all of which enable them to assign causation or lack thereof. (II, III)

However, it is that same degree of control that breeds drawbacks of RCTs, such as cost, difficult execution, and lack of ecological validity and generalizability.

One way to account for some of these flaws is by combining RCT data in meta-analyses—our next topic in this Research Rudiments Series.

