# Application
## Medical Diagnosis
This example is taken from Dive into Deep Learning, Section 2.6.5.
Assume that a doctor administers an HIV test to a patient. This test is fairly
accurate: it fails only with 1% probability when the patient is healthy (by
reporting the patient as diseased), and it never fails to detect HIV if the patient actually has it.

In machine learning lingo, we can treat the diagnosis as a classifier. More
concretely, let the diagnosis $D \in \{0, 1\}$ be the classifier's prediction ($1$ if positive, $0$ if negative).
Let $H \in \{0, 1\}$ denote the true label, i.e., the patient's HIV status ($1$ if infected, $0$ if healthy).
To keep the notation similar to the original example, denote the first test as
$D_1$.
Then we can define the following conditional probabilities to describe the relationship between $D_1$ and $H$:
| Conditional probability | $H = 1$ | $H = 0$ |
| --- | --- | --- |
| $P(D_1 = 1 \mid H)$ | 1 | 0.01 |
| $P(D_1 = 0 \mid H)$ | 0 | 0.99 |
In machine learning, a similar representation is the confusion matrix.
Indeed, the true positive rate in the above table is $P(D_1 = 1 \mid H = 1) = 1$, and the false positive rate is $P(D_1 = 1 \mid H = 0) = 0.01$.
Note that the column sums are all 1 (but the row sums are not), since each column is a conditional distribution over $D_1$ given a fixed $H$.
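To make this concrete, here is a minimal sketch (the array layout and variable names are our own) that stores the likelihood table and verifies the column-sum property:

```python
import numpy as np

# Likelihood table for the first test D1.
# Rows: D1 = 1, D1 = 0; columns: H = 1, H = 0.
likelihood_d1 = np.array([
    [1.00, 0.01],  # P(D1 = 1 | H = 1), P(D1 = 1 | H = 0)
    [0.00, 0.99],  # P(D1 = 0 | H = 1), P(D1 = 0 | H = 0)
])

# Each column is a conditional distribution over D1 given a fixed H,
# so every column must sum to 1 (the rows need not).
assert np.allclose(likelihood_d1.sum(axis=0), 1.0)
```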
For our first task, let's compute the probability of the patient having HIV if the (first) test (classifier) comes back (predicts) positive, $P(H = 1 \mid D_1 = 1)$.
Intuitively this is going to depend on how common the disease is, since it affects the number of false alarms.
We further assume the prior probability of the patient having HIV is $P(H = 1) = 0.0015$, so that $P(H = 0) = 0.9985$.

Then we can invoke Bayes' theorem,

$$P(H = 1 \mid D_1 = 1) = \frac{P(D_1 = 1 \mid H = 1) \, P(H = 1)}{P(D_1 = 1)}.$$

We already know $P(D_1 = 1 \mid H = 1) = 1$ and $P(H = 1) = 0.0015$, but the marginal $P(D_1 = 1)$ is not given directly.
A point to note: the denominator can be obtained via the law of total probability,

$$P(D_1 = 1) = P(D_1 = 1 \mid H = 1) \, P(H = 1) + P(D_1 = 1 \mid H = 0) \, P(H = 0) = 1 \times 0.0015 + 0.01 \times 0.9985 = 0.011485.$$

This leads us to

$$P(H = 1 \mid D_1 = 1) = \frac{1 \times 0.0015}{0.011485} \approx 0.1306.$$

In other words, there is only a 13.06% chance that the patient actually has HIV, even though the test is fairly accurate.
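The computation is easy to check numerically. A minimal sketch in Python (the variable names are ours):

```python
p_h1 = 0.0015          # prior P(H = 1)
p_d1_pos_h1 = 1.0      # P(D1 = 1 | H = 1)
p_d1_pos_h0 = 0.01     # P(D1 = 1 | H = 0)

# Marginal P(D1 = 1) by the law of total probability.
p_d1_pos = p_d1_pos_h1 * p_h1 + p_d1_pos_h0 * (1 - p_h1)

# Bayes' theorem: P(H = 1 | D1 = 1).
posterior_one_test = p_d1_pos_h1 * p_h1 / p_d1_pos
print(posterior_one_test)  # ~0.1306
```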
What should a patient do upon receiving such terrifying news? Likely, the patient would ask the physician to administer another test to get clarity. The second test has different characteristics and it is not as good as the first one. You can think of it as using a second type of classifier (i.e. a different model) to predict the label of the patient.
| Conditional probability | $H = 1$ | $H = 0$ |
| --- | --- | --- |
| $P(D_2 = 1 \mid H)$ | 0.98 | 0.03 |
| $P(D_2 = 0 \mid H)$ | 0.02 | 0.97 |
Unfortunately, the second test comes back positive, too. Now our question becomes: what is the probability that the patient has HIV given that both tests come back positive? Formally stated: find $P(H = 1 \mid D_1 = 1, D_2 = 1)$.
As usual, we expand this quantity using Bayes' theorem:

$$P(H = 1 \mid D_1 = 1, D_2 = 1) = \frac{P(D_1 = 1, D_2 = 1 \mid H = 1) \, P(H = 1)}{P(D_1 = 1, D_2 = 1)}.$$

Now, assuming the two tests are conditionally independent given the HIV status,

$$P(D_1 = 1, D_2 = 1 \mid H = 1) = P(D_1 = 1 \mid H = 1) \, P(D_2 = 1 \mid H = 1) = 1 \times 0.98 = 0.98,$$

$$P(D_1 = 1, D_2 = 1 \mid H = 0) = P(D_1 = 1 \mid H = 0) \, P(D_2 = 1 \mid H = 0) = 0.01 \times 0.03 = 0.0003.$$

We now need to compute the marginal $P(D_1 = 1, D_2 = 1)$, again by the law of total probability:

$$P(D_1 = 1, D_2 = 1) = 0.98 \times 0.0015 + 0.0003 \times 0.9985 = 0.00176955.$$

Note that $P(H = 0) = 1 - P(H = 1) = 0.9985$, as before.
Finally, we can compute the probability that the patient has HIV given that both tests come back positive:

$$P(H = 1 \mid D_1 = 1, D_2 = 1) = \frac{0.98 \times 0.0015}{0.00176955} \approx 0.8307.$$
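The full two-test update can also be checked numerically. Again a minimal, self-contained sketch with our own variable names:

```python
p_h1 = 0.0015                            # prior P(H = 1)

# Likelihoods of a positive result for each test, given H.
p_d1_pos_h1, p_d1_pos_h0 = 1.0, 0.01     # first test
p_d2_pos_h1, p_d2_pos_h0 = 0.98, 0.03    # second test

# Joint likelihoods under conditional independence given H.
joint_h1 = p_d1_pos_h1 * p_d2_pos_h1     # P(D1=1, D2=1 | H=1) = 0.98
joint_h0 = p_d1_pos_h0 * p_d2_pos_h0     # P(D1=1, D2=1 | H=0) = 0.0003

# Marginal and posterior, exactly as in the single-test case.
p_both_pos = joint_h1 * p_h1 + joint_h0 * (1 - p_h1)
posterior_two_tests = joint_h1 * p_h1 / p_both_pos
print(posterior_two_tests)  # ~0.8307
```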
That is, the second test allowed us to gain much higher confidence that not all is well. Despite the second test being considerably less accurate than the first one, it still significantly improved our estimate. The assumption that both tests are conditionally independent of each other was crucial for our ability to generate a more accurate estimate. Take the extreme case where we run the same test twice: in this situation we would expect the same outcome both times, hence no additional insight is gained from running the same test again. The astute reader might have noticed that the diagnosis behaved like a classifier hiding in plain sight, where our ability to decide whether a patient is healthy increases as we obtain more features (test outcomes).
## Simplified Probit Model
This example is taken from [Chan, 2021], Section 5.3.2, Example 5.20.
Let $X$ be a random bit that takes the values $1$ and $0$ with equal probability.
Suppose that

$$Y = X + N,$$

where $N \sim \operatorname{Gaussian}(0, 1)$, such that $N$ is a normal random variable with mean 0 and variance 1.

Note that $X$ and $N$ are independent by definition. This means that observing $X$ does not change the probability of any event involving $N$. Ask why the independence of $X$ and $N$ does not imply that $Y$ is independent of $X$.

Note that $X$ and $Y$ are **not** independent. Can we justify this, or is our intuition wrong? Indeed, since $Y = X + N$, knowing $X$ shifts the distribution of $Y$, so observing $Y$ carries information about $X$; the simulation below makes this concrete.
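Here is a quick Monte Carlo sketch (our own construction, assuming the setup above with $P(X = 1) = \frac{1}{2}$): if $X$ and $Y$ were independent, conditioning on $X$ would not change the distribution of $Y$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.integers(0, 2, size=n)     # X: random bit, P(X = 1) = 1/2
noise = rng.standard_normal(n)     # N ~ Gaussian(0, 1), independent of X
y = x + noise                      # Y = X + N

# If X and Y were independent, conditioning on X = 1 would not change
# the distribution of Y; the two means below clearly disagree.
print(y[x == 1].mean())  # ~1.0
print(y.mean())          # ~0.5
```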
To find $P(X = 1 \mid Y > 0)$, we need the following:

- We first recall that to find the conditional probability, we need to find the conditional PDF first, or more concretely, $f_{Y \mid X}(y \mid x)$.
- We first note $P(X = 1 \mid Y > 0) = \frac{P(X = 1, Y > 0)}{P(Y > 0)}$. This is by definition of conditional probability; we can see that in Section 5.3.1 and also in the chapter on conditional probability.
- In particular, note that $P(X = 1, Y > 0) = P(Y > 0 \mid X = 1) \, P(X = 1)$, so it is indeed the numerator of the conditional probability.
- Overall, it is not clear how to find $P(X = 1, Y > 0)$ directly, though we can find $P(Y > 0 \mid X = 1)$.
- We will leave finding the denominator $P(Y > 0)$ for later. Note that finding it is equivalent to integrating the marginal PDF $f_Y(y)$ over all $y > 0$. But we will soon hit a wall if we try to find an expression for this PDF directly; instead, we can make use of the fact that the PDF of $N$ and the PMF of $X$ are given to solve this problem.

Now we instead use Bayes' theorem to say that

$$P(X = 1 \mid Y > 0) = \frac{P(Y > 0 \mid X = 1) \, P(X = 1)}{P(Y > 0)},$$

which translates to finding the RHS of the equation. Note the numerator is a consequence of $P(X = 1, Y > 0) = P(Y > 0 \mid X = 1) \, P(X = 1)$, which is the definition of conditional probability. The denominator is the marginal probability $P(Y > 0)$, which we will find later.
Note that $P(X = 1)$ is trivially equal to $\frac{1}{2}$. Even though it is not mentioned explicitly, we can treat $X$ as a Bernoulli random variable with $p = \frac{1}{2}$, since a random bit taking the values $1$ and $0$ with fixed probabilities fulfils the definition of a Bernoulli random variable.

Now to find $P(Y > 0 \mid X = 1)$, we need to find the conditional PDF $f_{Y \mid X}(y \mid 1)$.

To find the conditional distribution $Y \mid X = 1$, we first must be clear that $f_{Y \mid X}(y \mid 1)$ is a conditional PDF and not a probability yet, i.e., $P(Y > 0 \mid X = 1)$ is found by integrating this PDF! We must also be clear that this probability is all about $Y$, and therefore we will integrate over $y$ only instead of the usual double integral. Why? Because we are given $X = 1$; this means $X$ is fixed and there is nothing random about it. You can imagine the joint PDF in 3D space where the $x$-axis is fixed at $1$, and we are integrating over the area under the curve $f_{Y \mid X}(y \mid 1)$ with $y > 0$, i.e.,

$$P(Y > 0 \mid X = 1) = \int_{0}^{\infty} f_{Y \mid X}(y \mid 1) \, dy.$$

Now the difficult question is: what is $f_{Y \mid X}(y \mid 1)$? We can find clues by looking at the equation $Y = X + N$. In layman's terms, $Y \mid X = 1$ asks: what is $Y$ given $X = 1$? So we can simplify $Y = X + N$ to $Y \mid X = 1 = 1 + N$. We emphasise that this PDF is a function of $y$ only, and not $x$. But this does not mean $f_{Y \mid X}(y \mid 1) = f_N(y)$, as we will soon see.

Next, by the definition of shifting (linear transformation), if $N$ is a normal random variable with mean $\mu$ and variance $\sigma^2$, then shifting it by a constant merely shifts the mean by that constant while the variance remains the same [1]. This shows that $1 + N$ is actually still in the Gaussian family, same as $N$, but with a different mean and the same variance.

Therefore, $Y \mid X = 1$ is a normal random variable with mean $\mu + 1$ and variance $\sigma^2$. With $\mu = 0$ and $\sigma^2 = 1$, we have $Y \mid X = 1 \sim \operatorname{Gaussian}(1, 1)$.

Now we can find $f_{Y \mid X}(y \mid 1)$ by plugging the mean $1$ and variance $1$ into the Gaussian PDF, which gives

$$f_{Y \mid X}(y \mid 1) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(y - 1)^2}{2}\right).$$

Note that this is a PDF, not a probability yet.

To recover the probability, we must integrate over $y > 0$:

$$P(Y > 0 \mid X = 1) = \int_{0}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(y - 1)^2}{2}\right) dy.$$

This is because, given $X = 1$, the event $\{Y > 0\}$ is equivalent to $\{1 + N > 0\} = \{N > -1\}$.

We can now use the standard normal table to find the probability, which is approximately $0.8413$. See Chan's solution, which writes it as $1 - \Phi(-1)$.

Similarly, we can find $P(Y > 0 \mid X = 0)$ by plugging the mean $0$ and variance $1$ into the Gaussian PDF, so that $Y \mid X = 0 \sim \operatorname{Gaussian}(0, 1)$. We can then integrate over $y > 0$ to find the probability, $P(Y > 0 \mid X = 0) = 1 - \Phi(0) = 0.5$.
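Both tail probabilities can be double-checked numerically. Here is a minimal sketch using `scipy` (our choice of tooling, not Chan's) that integrates the conditional PDF directly and compares it against the closed form:

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

# Conditional PDF f_{Y|X}(y | 1): Gaussian with mean 1, variance 1.
pdf_given_x1 = lambda y: np.exp(-(y - 1) ** 2 / 2) / np.sqrt(2 * np.pi)

# P(Y > 0 | X = 1): integrate the conditional PDF over y > 0 ...
value, _ = integrate.quad(pdf_given_x1, 0, np.inf)
print(value)        # ~0.8413

# ... and compare against the closed form 1 - Phi(-1).
print(norm.sf(-1))  # ~0.8413
```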
As of now, we have recovered $P(Y > 0 \mid X = 1)$ and $P(Y > 0 \mid X = 0)$; what is left is the denominator $P(Y > 0)$. By the law of total probability, we have

$$P(Y > 0) = P(Y > 0 \mid X = 1) \, P(X = 1) + P(Y > 0 \mid X = 0) \, P(X = 0),$$

which is $0.8413 \times 0.5 + 0.5 \times 0.5 \approx 0.6707$.

Finally, we can now recover $P(X = 1 \mid Y > 0)$ by plugging in the values we have found:

$$P(X = 1 \mid Y > 0) = \frac{P(Y > 0 \mid X = 1) \, P(X = 1)}{P(Y > 0)} = \frac{0.8413 \times 0.5}{0.6707} \approx 0.6273,$$

which is the same as the answer given in the question.
Last but not least, to find $P(X = 0 \mid Y > 0)$, it is simply the complement of $P(X = 1 \mid Y > 0)$, which is $1 - 0.6273 = 0.3727$. This is again the same as the answer given in the question.
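Putting all the pieces together, here is a minimal sketch of the full computation (assuming, as above, the threshold $Y > 0$ and $p = \frac{1}{2}$):

```python
from scipy.stats import norm

p_x1 = 0.5                   # P(X = 1)
p_pos_x1 = norm.sf(-1)       # P(Y > 0 | X = 1) = 1 - Phi(-1) ~ 0.8413
p_pos_x0 = norm.sf(0)        # P(Y > 0 | X = 0) = 1 - Phi(0) = 0.5

# Denominator P(Y > 0) by the law of total probability.
p_pos = p_pos_x1 * p_x1 + p_pos_x0 * (1 - p_x1)

# Bayes' theorem for both posteriors.
p_x1_given_pos = p_pos_x1 * p_x1 / p_pos
print(p_x1_given_pos)        # ~0.6273
print(1 - p_x1_given_pos)    # ~0.3727
```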