Psychology

Experimental and Control Groups Explained




Dividing people (or other test subjects) into experimental and control groups is one of the most effective methods of establishing causality, and the search for causality is the key issue in much of statistics.  The trouble is, as statisticians will tell you over and over again, correlation does not imply causation.  In other words, if everyone who took a drug became less healthy than everyone who did not take it, that is a correlation, but it does not prove that the drug caused the deterioration.  What if the people who took the drug were less healthy to begin with and would have deteriorated whether they took it or not?

Statisticians have a few methods of getting around this problem, and one of them is the randomised controlled trial.  The idea is to eliminate 'selection bias': people deciding for themselves whether to have treatment, rather than being assigned by chance.  If the drug is given not only to the very ill but to a random selection of healthy and unhealthy people, then any change in their health can be ascribed to the drug.  Allow me to explain.

A randomised controlled trial works as follows.  A large number of people are randomly allocated to one of two groups: the experimental (or treatment) group and the control group.  The groups can be of any size and are unlikely to be exactly the same size, since each person is allocated independently by chance rather than by splitting the sample in half.  The experimental group is given the treatment; the control group is given a placebo.  Because the placebo is indistinguishable from the treatment, participants do not know which group they are in, which eliminates any psychological effects that believing you have taken a drug might have.
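The allocation step can be sketched in a few lines of Python; the subject list and the coin-flip assignment here are invented purely for illustration:

```python
import random

random.seed(42)  # fixed seed so the example split is reproducible

subjects = list(range(100))  # stand-ins for real participants
treatment, control = [], []
for s in subjects:
    # Each subject is assigned by an independent fair coin flip,
    # so the two groups will rarely be exactly the same size.
    if random.random() < 0.5:
        treatment.append(s)
    else:
        control.append(s)

print(len(treatment), len(control))
```

Because every assignment is an independent coin flip, nothing about a subject's health can influence which group they end up in.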

Because the element of choice has been eliminated, any observed effect is free from selection bias, so researchers can infer causality provided they are certain the assignment really was random.  There is then no reason, other than chance, for any person to be in one group rather than the other: the only systematic difference between the groups is the taking of the drug, so any difference in health between them must be down to the drug.  This should be confirmed by the absence of a similar response in the control group.  If the control group responds just as strongly, the apparent effect cannot be attributed to the treatment.

Researchers must be certain of three things for the trial to work: first, subjects must not interact with one another; second, they must not swap treatments; and third, they must take the treatment they are assigned.  If any of these conditions does not hold, the trial will be compromised.

Mathematically the above process works as follows:

Researchers want to find one thing in particular: the Average Effect of Treatment on the Treated (ATT).  This measures the effect of treatment, conditional on a person actually receiving it.

Unfortunately, this requires data that cannot be measured, known as counterfactuals.  The researcher needs to know what the outcome would have been for a treated person had they not received treatment, a data point which, of course, does not exist.
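A toy example (with made-up numbers) shows why the counterfactual is missing: each person has two potential outcomes, but the data only ever reveal one of them:

```python
# y1[i] and y0[i] are person i's potential outcomes with and without
# treatment; in reality only one of the two is ever observed.
y1 = [5, 7, 6, 8]   # hypothetical outcomes if treated
y0 = [4, 6, 6, 5]   # hypothetical outcomes if untreated
d  = [1, 1, 0, 0]   # treatment assignment (1 = treated)

# The observed outcome reveals y1 for the treated and y0 for the rest;
# the other value for each person (e.g. y0[0] = 4) stays hidden.
observed = [y1[i] if d[i] == 1 else y0[i] for i in range(len(d))]
print(observed)  # [5, 7, 6, 5]
```

The hidden values are exactly the counterfactuals the researcher would need in order to compute the ATT directly.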

Now:

ATT = E[Yi (1)| Di=1] - E[Yi (0)| Di=1]

Yi(1) is the potential outcome for person i if treated and Yi(0) the potential outcome if untreated; Di indicates assignment to treatment (1) or not (0).  Thus the ATT is the observed average outcome of the treated minus the counterfactual: what their average outcome would have been without treatment.  Also:

Selection bias = E[Yi (0)| Di=1] - E[Yi (0)| Di=0]

The researcher also knows that:

Observed Difference in Averages = ATT + Selection Bias

The ODA is simply the average outcome of the treated minus the average outcome of the untreated, which is exactly what the data tell us.  The problem is the right-hand side of the equation: we want the ATT in order to infer causality, but both terms on the right involve a counterfactual.
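On a small invented population where, unrealistically, both potential outcomes are known for everyone, the identity ODA = ATT + selection bias can be checked directly:

```python
# Made-up potential outcomes for four people; in real data we would
# never see both y1 and y0 for the same person.
y1 = [5.0, 7.0, 6.0, 8.0]
y0 = [4.0, 6.0, 6.0, 5.0]
d  = [1, 1, 0, 0]

treated   = [i for i in range(len(d)) if d[i] == 1]
untreated = [i for i in range(len(d)) if d[i] == 0]

def mean(xs):
    return sum(xs) / len(xs)

# Observed Difference in Averages: E[Y(1)|D=1] - E[Y(0)|D=0]
oda = mean([y1[i] for i in treated]) - mean([y0[i] for i in untreated])
# ATT: E[Y(1)|D=1] - E[Y(0)|D=1]
att = mean([y1[i] - y0[i] for i in treated])
# Selection bias: E[Y(0)|D=1] - E[Y(0)|D=0]
bias = mean([y0[i] for i in treated]) - mean([y0[i] for i in untreated])

print(oda, att, bias)           # 0.5 1.0 -0.5
assert abs(oda - (att + bias)) < 1e-12
```

Here the treated group was less healthy to begin with (negative selection bias), so the observed difference of 0.5 understates the true effect of 1.0 on the treated.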

Now, this is the clever bit.  Random assignment implies that:

E[Yi (0)| Di=1] = E[Yi (0)| Di=0]

This is because, when assignment is random, the potential outcome is independent of the assignment.  The conditional expectations above therefore both equal the unconditional expectation E[Yi(0)], so the two terms are equivalent.  Looking back at the equation for selection bias, you can see that it is now zero, being one term minus itself.  This means that:

Observed Difference in Averages = ATT

And thus the ODA, which the researcher can measure, is equal to the causal effect, which is what they are looking for.
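A quick simulation sketch (all parameters invented for illustration) shows the result in action: under random assignment the observed difference in averages converges on the true treatment effect:

```python
import random

random.seed(0)
n = 100_000
true_effect = 2.0  # the effect we hope the trial recovers

treated_sum = untreated_sum = 0.0
treated_n = untreated_n = 0
for _ in range(n):
    baseline = random.gauss(10, 3)            # health without treatment, Y(0)
    d = random.random() < 0.5                 # random assignment
    y = baseline + (true_effect if d else 0)  # observed outcome
    if d:
        treated_n += 1
        treated_sum += y
    else:
        untreated_n += 1
        untreated_sum += y

oda = treated_sum / treated_n - untreated_sum / untreated_n
print(round(oda, 2))  # close to the true effect of 2.0
```

If the assignment line were replaced by one that depends on `baseline` (the sicker taking the drug), the same calculation would pick up the selection bias instead of the pure effect.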

This is a pleasing result, and a very important one for all sorts of situations where researchers want to find causality, including the field of psychology.  But the method is not without its limitations.  Firstly, the problems set out above (non-compliance and contamination) can be hard to prevent.  And perhaps more importantly, a randomised controlled trial is not always possible.  Econometricians, for example, struggle to infer causality within an economy in this way, since it is usually impractical to assign whole economies at random to treatment and control groups.

Nevertheless, randomised controlled trials are an important tool in a statistician's toolkit.

More about this author: Algy Moncrieff
