Bayesian statistics is an alternative to frequentist statistics, and it is a powerful tool for situations in which frequentist methods fall short.

The main idea: Bayesian analysis lets us reason backwards from one conditional probability to its reverse conditional probability.

That sounds a bit weird, but let me explain. It is a very powerful idea and worth understanding. This post will be a bit longer than others. Before we begin, we need to review the concept of conditional probability.

Conditional Probability

Suppose there are two jars, each containing red and white candies. Jar 1 has 60 white candies and 40 red candies. Jar 2 has 40 white candies and 60 red candies.

Let’s consider two probability questions:

1. What is the probability that a white candy will be drawn?

There are 200 candies in total: the 60 white in Jar 1 and the 40 white in Jar 2 make 100 white candies. If every candy is equally likely to be drawn, the probability of picking a white candy is 100/200, which is 50%.

2. What is the probability of picking a white candy given that it was picked from Jar 1?

Question 2 is a conditional probability question. We are asked to determine a probability given that another event has occurred. The word “given” signals that some condition is in place. A conditional probability is written P(A | B); the pipe between A and B stands for the word “given”, and the expression is read “the probability of A given B”. For this question we would write P(White Candy | Jar 1). The given condition tells us the probability space is reduced to just Jar 1. We know there are 100 candies in Jar 1, 60 of which are white, so the probability is 60%.
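To make the two questions concrete, here is a minimal Python sketch that counts candies. The dictionary layout and variable names are my own choices for illustration, and it assumes every candy is equally likely to be drawn.

```python
# Candy counts taken from the jar example above.
jars = {
    "Jar 1": {"white": 60, "red": 40},
    "Jar 2": {"white": 40, "red": 60},
}

# Question 1: every candy in both jars is a possible outcome.
total_candies = sum(sum(jar.values()) for jar in jars.values())
total_white = sum(jar["white"] for jar in jars.values())
p_white = total_white / total_candies

# Question 2: the condition "picked from Jar 1" shrinks the space to Jar 1 only.
jar_1 = jars["Jar 1"]
p_white_given_jar_1 = jar_1["white"] / sum(jar_1.values())

print(f"P(White) = {p_white:.0%}")                      # 50%
print(f"P(White | Jar 1) = {p_white_given_jar_1:.0%}")  # 60%
```

The only difference between the two calculations is the denominator: all 200 candies for Question 1, but only the 100 candies in Jar 1 for Question 2.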

Reverse Probability of Question 2
What if we wanted to determine the probability that a candy came from Jar 1 given that it is white: P(Jar 1 | White Candy)? This is a reverse conditional probability. It is not as straightforward, but Bayes’ Theorem lets us get to the answer.
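For the candy jars the reverse probability can also be found by simple counting, which gives a useful sanity check. This is a small sketch with variable names of my own, and it assumes the candy is drawn at random from all 200 candies.

```python
# White candy counts from the two jars described earlier.
white_in_jar_1 = 60
white_in_jar_2 = 40

# Given "the candy is white", the space shrinks to the white candies only.
all_white = white_in_jar_1 + white_in_jar_2
p_jar_1_given_white = white_in_jar_1 / all_white

print(f"P(Jar 1 | White) = {p_jar_1_given_white:.0%}")  # 60%
```

Here the reverse probability happens to equal P(White Candy | Jar 1), but only because both jars hold the same number of candies. In general the two directions give different numbers, as the medical example shows.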

Conditional Probability and Reverse Conditional Probability in Practice

Candies are nice, but they are not a real-world situation. Let’s look at a question that could happen to any of us. Let’s say that you are not feeling well. You have a cough and a headache. You go to an internet medical site and find several illnesses that include a cough and headache as symptoms. After reading several articles, you are pretty sure you must have a certain disease. Let’s call it statiscitis. The article on statiscitis says that:

  • 1% of people in the general population get statiscitis
  • The test for statiscitis is 80% accurate. In other words, 8 out of 10 tests given to people who have statiscitis correctly come back positive.
  • 10% of tests incorrectly detect statiscitis when it is not present. In other words, if one doesn’t have statiscitis, there is still a 10% chance the test will be positive.

You go to the doctor and a test is recommended. You are given the test and it comes back positive. You are worried because you remember reading that the test is 80% accurate.

Question: What is the probability you have statiscitis given you get a positive test?

Is it 80%? Many people think it’s 80%. But if we read the question carefully, we will see that it is a different question. Let’s assign the numbers from the article to some variable names. Let Dpos be that you have the disease and Dneg be that you do not have the disease. Tpos is that you get a positive test, and Tneg is that you get a negative test. We know:

  • P(Dpos) = 0.01
  • P(Tpos | Dpos) = 0.8
  • P(Tpos | Dneg) = 0.1

We know: P(Tpos | Dpos) = 0.8, the probability of a positive test given you have the disease.

We want to know: P(Dpos | Tpos), the probability you have the disease given a positive test.

We don’t know that second probability yet, but we can figure it out using Bayes’ Theorem:

\[P(T_{pos} \mid D_{pos}) = \frac{P(D_{pos} \mid T_{pos}) \, P(T_{pos})}{P(D_{pos})}\]

Through some calculations I won’t show here, Bayes’ Theorem lets us get to the reverse conditional probability:

\[P(D_{pos} \mid T_{pos}) = \frac{P(T_{pos} \mid D_{pos}) \, P(D_{pos})}{P(T_{pos})}\]

P(Dpos | Tpos) is exactly what we want: the probability you have the disease given a positive test.
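For readers who would like to see the skipped calculations, here is a short sketch of one standard derivation. It relies on the definition of conditional probability, which says the probability of having the disease and testing positive can be written in two ways:

\[P(D_{pos} \cap T_{pos}) = P(T_{pos} \mid D_{pos}) \, P(D_{pos}) = P(D_{pos} \mid T_{pos}) \, P(T_{pos})\]

Dividing the last two expressions by P(Tpos) isolates the probability we want:

\[P(D_{pos} \mid T_{pos}) = \frac{P(T_{pos} \mid D_{pos}) \, P(D_{pos})}{P(T_{pos})}\]

Dividing by P(Dpos) instead gives the first form of the equation, so the two versions are the same statement read in opposite directions.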

Let’s piece this together intuitively, step by step.

Step 1: Place the data we know into a 2×2 grid showing the truth (does one have statiscitis or not) and the test result (is it positive or negative).

| | Statiscitis (Dpos) | No Statiscitis (Dneg) | Total |
|---|---|---|---|
| Positive Test (Tpos) | 80% | 10% | |
| Negative Test (Tneg) | | | |
| Total | 1% | | |

Step 2: Fill in the blanks, using the fact that each column and the Total row must add up to 100%.

| | Statiscitis (Dpos) | No Statiscitis (Dneg) | Total |
|---|---|---|---|
| Positive Test (Tpos) | 80% | 10% | |
| Negative Test (Tneg) | 20% | 90% | |
| Total | 1% | 99% | 100% |

Step 3: Assume a population size (here, 1000 people) and convert the column totals to numbers of people by multiplying the population by each percentage.

| | Statiscitis (Dpos) | No Statiscitis (Dneg) | Total |
|---|---|---|---|
| Positive Test (Tpos) | 80% | 10% | |
| Negative Test (Tneg) | 20% | 90% | |
| Total | 1% (10 people) | 99% (990 people) | 100% (1000 people) |

Step 4: Convert the percentages in each column to numbers of people by multiplying that column’s total number of people by each percentage.

| | Statiscitis (Dpos) | No Statiscitis (Dneg) | Total |
|---|---|---|---|
| Positive Test (Tpos) | 80% (8 people) | 10% (99 people) | |
| Negative Test (Tneg) | 20% (2 people) | 90% (891 people) | |
| Total | 1% (10 people) | 99% (990 people) | 100% (1000 people) |

Step 5: Add up the number of people in each row to get the row totals.

| | Statiscitis (Dpos) | No Statiscitis (Dneg) | Total |
|---|---|---|---|
| Positive Test (Tpos) | 80% (8 people) | 10% (99 people) | 107 people |
| Negative Test (Tneg) | 20% (2 people) | 90% (891 people) | 893 people |
| Total | 1% (10 people) | 99% (990 people) | 100% (1000 people) |

Step 6: Convert the Total column to percentages by dividing each row total by the whole population.

| | Statiscitis (Dpos) | No Statiscitis (Dneg) | Total |
|---|---|---|---|
| Positive Test (Tpos) | 80% (8 people) | 10% (99 people) | 10.7% (107 people) |
| Negative Test (Tneg) | 20% (2 people) | 90% (891 people) | 89.3% (893 people) |
| Total | 1% (10 people) | 99% (990 people) | 100% (1000 people) |

Now we have what we need to answer the original question: What is the probability that you have the illness given a positive test? This is P(Dpos | Tpos).

We can now look at the table to see where this probability is found. Start at the intersection of the row Tpos and the column Dpos, which is 8 people.

We divide those 8 people by the 107 people who got a positive test (this is the given condition): P(Dpos | Tpos) = 8/107 = 7.5%.

Therefore, the probability of having the disease given a positive test is just 7.5%.
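The table-filling steps can also be sketched in a few lines of Python. The variable names are my own, and the population of 1000 is the same arbitrary assumption used in Step 3; any population size gives the same final percentage.

```python
population = 1000           # assumed population size (Step 3)
p_disease = 0.01            # P(Dpos)
p_pos_given_disease = 0.8   # P(Tpos | Dpos)
p_pos_given_healthy = 0.1   # P(Tpos | Dneg)

# Step 3: turn the column totals into head counts.
sick = round(population * p_disease)   # 10 people
healthy = population - sick            # 990 people

# Step 4: split each column into positive and negative tests.
sick_pos = round(sick * p_pos_given_disease)        # 8 people
sick_neg = sick - sick_pos                          # 2 people
healthy_pos = round(healthy * p_pos_given_healthy)  # 99 people
healthy_neg = healthy - healthy_pos                 # 891 people

# Step 5: row totals.
total_pos = sick_pos + healthy_pos   # 107 people
total_neg = sick_neg + healthy_neg   # 893 people

# The answer: of everyone who tested positive, what share is actually sick?
p_disease_given_pos = sick_pos / total_pos
print(f"P(Dpos | Tpos) = {p_disease_given_pos:.1%}")  # 7.5%
```

Seen this way, the low answer is less surprising: the 99 false positives from the large healthy group far outnumber the 8 true positives from the small sick group.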

Solving This with Bayes’ Theorem

Remember we stated that Bayes’ Theorem lets us take this equation

\[P(T_{pos} \mid D_{pos}) = \frac{P(D_{pos} \mid T_{pos}) \, P(T_{pos})}{P(D_{pos})}\]

and get to this equation:

\[P(D_{pos} \mid T_{pos}) = \frac{P(T_{pos} \mid D_{pos}) \, P(D_{pos})}{P(T_{pos})}\]

We now have all the parts we need:

P(Tpos | Dpos) = 8 / 10 = 80%

P(Dpos) = 1%

P(Tpos) = 10.7%

We plug these numbers into

\[P(D_{pos} \mid T_{pos}) = \frac{P(T_{pos} \mid D_{pos}) \, P(D_{pos})}{P(T_{pos})}\]

P(Dpos | Tpos) = (0.8 * 0.01) / 0.107 = 7.5%

This is the same number we got above: 8/107 = 7.5%.
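The same calculation can be wrapped in a small reusable function. This is a sketch rather than a definitive implementation: the function name `posterior` and its parameter names are hypothetical. Instead of reading P(Tpos) off the table, it computes it as the positives among the sick plus the positives among the healthy, which is where the 10.7% comes from.

```python
def posterior(prior, p_pos_given_disease, p_pos_given_healthy):
    """Return P(Dpos | Tpos) using Bayes' Theorem.

    prior               -- P(Dpos), the share of the population with the disease
    p_pos_given_disease -- P(Tpos | Dpos)
    p_pos_given_healthy -- P(Tpos | Dneg)
    """
    # P(Tpos): positive tests from the sick group plus the healthy group.
    p_pos = p_pos_given_disease * prior + p_pos_given_healthy * (1 - prior)
    return p_pos_given_disease * prior / p_pos


# Numbers from the statiscitis example: 0.008 / 0.107
print(f"{posterior(0.01, 0.8, 0.1):.1%}")  # 7.5%
```

Calling the function with a different first argument shows how strongly the answer depends on how common the disease is. For example, `posterior(0.10, 0.8, 0.1)` returns about 0.47, so the same positive test would mean something quite different if 10% of people had the disease.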
