Suppose you're given a test to determine whether or not you have a rare, incurable, fatal disease. If you do, there's a 99% chance that the test will be positive, but if you don't there's a 99% chance the test will be negative. The test comes back positive. What's the chance you have the disease?
Many would instantly think that since the test is 99% accurate, whether you have the disease or not, the chance you've got it is 99%. But that need not be the case.
To understand why, suppose that only 1% of the population has the disease. Then we know ahead of time that you probably don't have it; this is prior information, something we know even before administering the test. Now suppose we take 10,000 randomly selected people, of whom 100 (1%) are infected while 9,900 (99%) are not. Of the 100 infected people, we expect 99 (99%) to get positive test results while only one (1%) gets a negative test result (a false negative).
Of the 9,900 uninfected, we expect 9,801 (99%) to get a negative test result while 99 (1%) get a positive result (a false positive). There are 99 + 99 = 198 total positive test results, but of those, only 99 are actually infected. So if your test result is positive, the chance you're infected isn't 99%, it's only 50%.
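The counting argument above can be sketched in a few lines of code. This is an illustration using the numbers from the text; the variable names are my own, not anything from the original.

```python
# A sketch of the counting argument, using the numbers from the text.
population = 10_000
prior = 0.01        # 1% of the population is infected
accuracy = 0.99     # the test is right 99% of the time, infected or not

infected = population * prior        # 100 people
uninfected = population - infected   # 9,900 people

true_positives = infected * accuracy           # 99 infected people test positive
false_positives = uninfected * (1 - accuracy)  # 99 healthy people test positive

# Of everyone who tests positive, what fraction is actually infected?
chance_infected = true_positives / (true_positives + false_positives)
print(round(chance_infected, 4))  # prints 0.5
```

Notice that the 99% accuracy never changes; only the prior (the 1% infection rate) drives the surprising result.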
Don't plan your funeral yet.
This phenomenon, that what we know prior to making an observation can profoundly affect the implication of that observation, is captured by Bayes' theorem. The theorem is a way to evaluate an observation in the context of prior information. For the disease-testing example, it's crucial to apply Bayes' theorem; without that prior information we'd be led to a false (and very frightening) conclusion.
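Written out for the disease test, the theorem the text describes takes this form (the conditional-probability notation is standard, not taken from the original):

```latex
P(\text{disease} \mid \text{positive})
  = \frac{P(\text{positive} \mid \text{disease})\, P(\text{disease})}
         {P(\text{positive} \mid \text{disease})\, P(\text{disease})
          + P(\text{positive} \mid \text{no disease})\, P(\text{no disease})}
  = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.01 \times 0.99}
  = \frac{1}{2}
```

The numerator counts the positives who are truly infected; the denominator counts all positives, infected or not, which is why the rare prior cuts the answer to 50%.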
Bayes' theorem stems from the work of the English mathematician and Presbyterian minister Rev. Thomas Bayes in the 18th century. Only recently has it been fully appreciated and applied to a wide array of statistical analyses. In fact, at present it's "all the rage" to use Bayesian analysis when analyzing data.
The older, more traditional approach is called "frequentist" statistics.
Not everyone agrees that the Bayesian approach is the best way to go, because applying it requires specifying the precise nature of the prior information: the context in which we'll interpret what our observation tells us. Some claim that the choice of "prior" more often reflects the opinion of the researcher than a realistic appraisal of circumstances, and that Bayesian analysis should be applied only when the prior is known with a high degree of confidence (as in the disease-testing example). Others think the power of Bayes' theorem is so great that doing statistical analysis the old-fashioned way is an invitation to exactly the kind of misleading conclusion the disease-testing example makes obvious.