Independence#
Definition (Independent Events)#
(Independent Events)
Let \(\P\) be a probability function defined over the probability space \(\pspace\).
Two events \(A, B \in \E\) are statistically independent if
\[
\P(A \cap B) = \P(A)\P(B)
\]
Intuition (Independence)#
The formula above is not at all intuitive.
A more intuitive way to think about it is that two events \(A\) and \(B\) are independent if the occurrence of \(B\) does not affect the probability of the occurrence of \(A\). This can be illustrated further with the following narrative.
Consider the scenario: given that \(B\) occurred (with or without event \(A\) occurring), what is the probability of event \(A\) occurring? In a Venn diagram of \(A\) and \(B\), we are now in the universe in which \(B\) occurred, which is the full right circle. In that right-hand circle (\(B\)), the probability of \(A\) is the area of \(A \cap B\) divided by the area of the circle. In other words, when outcomes are equally likely, the probability of \(A\) is the number of outcomes of \(A\) in the right circle, \(n(A \cap B)\), over the number of outcomes in the reduced sample space \(B\), \(n(B)\). Therefore, if we think of independence as the occurrence of \(B\) not affecting the occurrence of \(A\), then the probability of \(A\) given \(B\) is still the probability of \(A\), i.e. \(\P(A ~|~ B) = \P(A)\).
It follows immediately that
\[
\P(A \cap B) = \P(A)\P(B)
\]
So the intuition can be understood through conditional probability. If \(A\) and \(B\) are truly independent, then even if \(B\) happened, the probability of \(A\) should remain unchanged, which means that \(\P(A ~|~ B) = \P(A)\). Recall also the definition of conditional probability, \(\P(A ~|~ B) = \dfrac{\P(A \cap B)}{\P(B)}\). Equating the two gives the nice equation \(\P(A)\P(B) = \P(A \cap B)\).
See chapter 2, section 2.4.2 of [Chan, 2021] for more details.
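The two views of independence, \(\P(A ~|~ B) = \P(A)\) and \(\P(A \cap B) = \P(A)\P(B)\), can be checked numerically on a small sample space. The sketch below is only an illustration: the single fair die, the events `A` and `B`, and the helper names `prob` and `cond_prob` are all assumptions made for this example and do not come from [Chan, 2021].

```python
from fractions import Fraction


def prob(event, sample_space):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


def cond_prob(a, b, sample_space):
    """P(A | B) = P(A and B) / P(B); only valid when P(B) > 0."""
    return prob(a & b, sample_space) / prob(b, sample_space)


# Hypothetical example: one roll of a fair six-sided die.
omega = set(range(1, 7))
A = {2, 4, 6}     # the roll is even
B = {1, 2, 3, 4}  # the roll is at most 4

p_a = prob(A, omega)                  # unconditional probability of A
p_a_given_b = cond_prob(A, B, omega)  # probability of A inside the "universe" B
product_rule_holds = prob(A & B, omega) == prob(A, omega) * prob(B, omega)

print(p_a, p_a_given_b, product_rule_holds)
```

Here knowing that the roll is at most 4 leaves the chance of an even roll unchanged, so both checks agree that these two events are independent.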
Defining Independence in Terms of Conditional Probability#
We formally state the intuition in the previous section as follows.
(Definition of Independence in Terms of Conditional Probability)
Let \(\P\) be a probability function defined over the probability space \(\pspace\).
Let \(A\) and \(B\) be two events in \(\E\) such that \(\P(A) > 0\) and \(\P(B) > 0\). Then \(A\) and \(B\) are independent if
\[
\P(A ~|~ B) = \P(A) \quad \text{and} \quad \P(B ~|~ A) = \P(B)
\]
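To see why the conditional form agrees with the product form \(\P(A \cap B) = \P(A)\P(B)\), one can unfold the definition of conditional probability. A short derivation, assuming \(\P(A) > 0\) and \(\P(B) > 0\) so that both conditional probabilities are defined:
\[
\begin{aligned}
\P(A ~|~ B) = \P(A) &\iff \dfrac{\P(A \cap B)}{\P(B)} = \P(A) \iff \P(A \cap B) = \P(A)\P(B) \\
&\iff \dfrac{\P(A \cap B)}{\P(A)} = \P(B) \iff \P(B ~|~ A) = \P(B)
\end{aligned}
\]
Under the positivity assumption, either conditional equation on its own therefore implies the other.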
Disjoint vs Independence#
(Disjoint)
Let \(\P\) be a probability function defined over the probability space \(\pspace\).
Two events \(A\) and \(B\) are disjoint if \(A \cap B = \emptyset\).
(Disjoint vs Independence)
Let \(\P\) be a probability function defined over the probability space \(\pspace\).
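Let \(A\) and \(B\) be two events in \(\E\). Disjointness and independence are different notions in general. If \(A\) and \(B\) are disjoint, then they are independent if and only if \(\P(A) = 0\) or \(\P(B) = 0\): disjointness gives \(\P(A \cap B) = \P(\emptyset) = 0\), while independence requires \(\P(A \cap B) = \P(A)\P(B)\), and the product is zero only when one of the factors is zero. In this sense, the only condition under which \(\textbf{Disjoint} \iff \textbf{Independence}\) is \(\P(A) = 0\) or \(\P(B) = 0\).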
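A small numerical check can make the difference between disjoint and independent concrete. The sketch below assumes a single fair die with equally likely outcomes; the events and the helper `is_independent` are hypothetical names chosen for this illustration.

```python
from fractions import Fraction


def prob(event, sample_space):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


def is_independent(a, b, sample_space):
    """Check the product rule P(A and B) == P(A) * P(B)."""
    return prob(a & b, sample_space) == prob(a, sample_space) * prob(b, sample_space)


# Hypothetical example: one roll of a fair six-sided die.
omega = set(range(1, 7))
A = {1, 2}
B = {3, 4}        # disjoint from A, and both have positive probability
null_event = set()  # the empty event has probability 0

# Disjoint with positive probabilities: P(A and B) = 0 but P(A) * P(B) = 1/9.
disjoint_positive = is_independent(A, B, omega)

# Disjoint where one event has probability 0: both sides of the product rule are 0.
disjoint_with_null = is_independent(A, null_event, omega)

print(disjoint_positive, disjoint_with_null)
```

Two disjoint events with positive probability are strongly dependent: once one of them occurs, the other cannot.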
Exercise (Independence)#
In [Chan, 2021], chapter 2, section 2.4.2, the author gave an example that is not easy to visualize.
Consider the experiment of throwing a die twice. It should be clear from the context that the outcomes are in the form of a tuple \((\textbf{dice_1}, \textbf{dice_2})\) and the sample space is:
\[
\Omega = \{(i, j) ~|~ i, j \in \{1, 2, \ldots, 6\}\}
\]
Define the three events below:
We want to find out whether events \(A\) and \(B\) are independent, and likewise for \(A\) and \(C\).
We focus on the independence of \(A\) and \(C\) first. The author's intuitive test is to ask: given that event \(C\) has happened, does this affect the probability of \(A\) happening? I take this to mean that we first need the probability of event \(A\) on its own, without \(C\).
We can enumerate and see that event \(A\) has the following set representation:
\[
A = \{(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6)\}
\]
which amounts to \(\P(A) = \frac{6}{36} = \frac{1}{6}\). Now if \(C\) happened, we know that the two rolls have a sum of \(8\), and we cannot construct a sum of \(8\) with a roll of \(1\). My immediate reading is that event \(A\) can no longer contain the outcome that has a \(1\) in the second roll, so its outcomes should be limited to \(5\) instead of \(6\), and hence dependence is established.
I believe my intuition is flawed somewhere, because the author writes:
If you like a more intuitive argument, you can imagine that C has happened, i.e., the sum is 8. Then the probability for the first die to be 1 is 0 because there is no way to construct 8 when the first die is 1. As a result, we have eliminated one choice for the first die, leaving only five options. Therefore, since C has influenced the probability of A, they are dependent.
I do not understand why the author refers to the “first die” when, in event \(A\), the first die is already a \(3\).
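One way to test the argument is to enumerate all 36 outcomes directly. The sketch below assumes two fair dice and takes \(A\) to be "the first roll is 3" and \(C\) to be "the two rolls sum to 8", as the discussion above suggests; the variable names are chosen for this illustration only.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first roll, second roll) of two fair dice.
omega = set(product(range(1, 7), repeat=2))

# Assumed event definitions, read off the surrounding discussion.
A = {(i, j) for (i, j) in omega if i == 3}      # first roll is 3
C = {(i, j) for (i, j) in omega if i + j == 8}  # the two rolls sum to 8


def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(omega))


p_a = prob(A)             # 6 of 36 outcomes
p_c = prob(C)             # 5 of 36 outcomes
p_ac = prob(A & C)        # only (3, 5) lies in both events
p_a_given_c = p_ac / p_c  # probability of A once we know C happened

# Values the first die can still take once C is known to have happened.
first_die_given_c = sorted({i for (i, j) in C})

print(p_a, p_c, p_ac, p_a_given_c, first_die_given_c)
```

Under these assumptions, conditioning on \(C\) shrinks the sample space to the five outcomes \((2,6), (3,5), (4,4), (5,3), (6,2)\). The first die is then equally likely to be any of \(2, 3, 4, 5, 6\), which is one reading of the author's remark about the first die: before conditioning it had six equally likely values, afterwards only five, so \(\P(A ~|~ C) = \frac{1}{5} \neq \frac{1}{6} = \P(A)\).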
One can find more explanation here [1] and here [2].