Concept#

PMF and CDF of Bernoulli Distribution#

Definition 89 (Bernoulli Trials)

A Bernoulli trial is an experiment with two possible outcomes: success or failure, often denoted as 1 or 0 respectively.

The three assumptions for Bernoulli trials are:

  1. Each trial has two possible outcomes: 1 or 0 (success or failure);

  2. The probability of success (\(p\)) is the same for every trial, and so is the probability of failure (\(1-p\));

  3. The trials are independent: the outcome of previous trials has no influence on any subsequent trial.
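
The three assumptions can be illustrated with a small simulation. This is a minimal sketch using only the Python standard library; the helper name `bernoulli_trial`, the seed, and the value \(p = 0.3\) are illustrative assumptions, and a pseudo-random generator only approximates truly independent trials.

```python
import random


def bernoulli_trial(p: float, rng: random.Random) -> int:
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    # rng.random() is uniform on [0, 1), so the comparison holds with probability p.
    return 1 if rng.random() < p else 0


rng = random.Random(0)  # fixed seed so the run is repeatable
p = 0.3                 # the same success probability for every trial (assumption 2)

# Every call draws a fresh uniform number and ignores earlier outcomes (assumption 3).
outcomes = [bernoulli_trial(p, rng) for _ in range(10_000)]

print(set(outcomes))                  # only the outcomes 0 and 1 occur (assumption 1)
print(sum(outcomes) / len(outcomes))  # relative frequency of success, close to p
```

With many trials, the relative frequency of success should settle near \(p\), while each individual outcome remains either 0 or 1.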

Definition 90 (Bernoulli Distribution (PMF))

Let \(X\) be a Bernoulli random variable with parameter \(p\). Then the probability mass function (PMF) of \(X\) is given by

\[\begin{split} \begin{align} \P(X=x) = \begin{cases} p &\quad \text{ if } x=1 \\ 1-p &\quad \text{ if } x=0 \\ 0 &\quad \text{ otherwise } \end{cases} \end{align} \end{split}\]

where \(0 \leq p \leq 1\) is called the Bernoulli parameter.

A Bernoulli distribution describes the outcome of a single Bernoulli trial.

Some conventions:

  1. We denote \(X \sim \bern(p)\) if \(X\) follows a Bernoulli distribution with parameter \(p\).

  2. The states of \(X\) are \(x \in \{0,1\}\). This means \(X\) only has two (binary) states, 0 and 1.

  3. We denote \(1\) as success and \(0\) as failure and consequently \(p\) as the probability of success and \(1-p\) as the probability of failure.

  4. Bear in mind that \(X\) is defined over \(\pspace\), and when we say \(\P \lsq X=x \rsq\), we are also saying \(\P \lsq E \rsq\) where \(E \in \E\). Imagine a coin toss: if \(E\) is the event that the coin lands on heads, then \(E = \{X=1\}\).

  5. Note further that a Bernoulli trial is a single experiment with only two possible outcomes. This will be the main difference when we learn the Binomial distribution (i.e. one trial versus \(n\) trials).
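
The PMF in Definition 90 translates almost line by line into code. The function below is a hedged sketch, not part of the `omnivault` library; the name `bernoulli_pmf` and the choice to raise `ValueError` for an invalid parameter are assumptions made for illustration.

```python
def bernoulli_pmf(x: float, p: float) -> float:
    """Evaluate P(X = x) for X ~ Bernoulli(p), following the three cases of the PMF."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("the Bernoulli parameter p must lie in [0, 1]")
    if x == 1:
        return p          # success
    if x == 0:
        return 1.0 - p    # failure
    return 0.0            # any other value is not a state of X


p = 0.2
print(bernoulli_pmf(1, p))   # probability of success
print(bernoulli_pmf(0, p))   # probability of failure
print(bernoulli_pmf(2, p))   # a value outside {0, 1} has probability zero

# The probabilities over the two states sum to one (up to floating-point rounding).
total = bernoulli_pmf(0, p) + bernoulli_pmf(1, p)
print(total)
```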

Definition 91 (Bernoulli Distribution (CDF))

Let \(X\) be a Bernoulli random variable with parameter \(p\). Then the cumulative distribution function (CDF) of \(X\) is given by

\[\begin{split} \begin{align} \cdf(x) = \begin{cases} 0 &\quad \text{ if } x < 0 \\ 1-p &\quad \text{ if } 0 \leq x < 1 \\ 1 &\quad \text{ if } x \geq 1 \end{cases} \end{align} \end{split}\]

where \(0 \leq p \leq 1\) is called the Bernoulli parameter.
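
The CDF is a step function with jumps at \(x=0\) and \(x=1\). A minimal sketch of Definition 91 follows; the name `bernoulli_cdf` is hypothetical and chosen only for this example.

```python
def bernoulli_cdf(x: float, p: float) -> float:
    """Evaluate P(X <= x) for X ~ Bernoulli(p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("the Bernoulli parameter p must lie in [0, 1]")
    if x < 0:
        return 0.0        # no state of X is below 0
    if x < 1:
        return 1.0 - p    # only the state 0 has been accumulated
    return 1.0            # both states 0 and 1 have been accumulated


p = 0.2
for x in (-0.5, 0, 0.5, 1, 2):
    print(x, bernoulli_cdf(x, p))
```

The size of the jump at \(x=0\) is \(1-p\) and the size of the jump at \(x=1\) is \(p\), which are exactly the PMF values at those two states.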

Plotting PMF and CDF of Bernoulli Distribution#

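The PMF plots for \(p=0.2\) and \(p=0.8\) are shown below, followed by empirical histograms of \(100\) and \(1000\) samples drawn with \(p=0.2\).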

import matplotlib.pyplot as plt

from omnivault.utils.probability_theory.plot import plot_bernoulli_pmf

# Two panels sharing the y-axis: PMF for p=0.2 (left) and p=0.8 (right).
_fig, axes = plt.subplots(1, 2, figsize=(8.4, 4.8), sharey=True, dpi=125)
plot_bernoulli_pmf(p=0.2, ax=axes[0])
plot_bernoulli_pmf(p=0.8, ax=axes[1])
plt.show()

Figure: PMF of Bernoulli(\(p=0.2\)) (left) and Bernoulli(\(p=0.8\)) (right).

import matplotlib.pyplot as plt

from omnivault.utils.probability_theory.plot import plot_bernoulli_pmf, plot_empirical_bernoulli

fig, axes = plt.subplots(1, 2, figsize=(8.4, 4.8), sharey=True, dpi=125)

# Left panel: true PMF overlaid with the relative frequencies of 100 samples.
plot_bernoulli_pmf(p=0.2, ax=axes[0])
plot_empirical_bernoulli(p=0.2, size=100, ax=axes[0])

# Right panel: the same PMF overlaid with the relative frequencies of 1000 samples.
plot_bernoulli_pmf(p=0.2, ax=axes[1])
plot_empirical_bernoulli(p=0.2, size=1000, ax=axes[1])

fig.supylabel("relative frequency")
fig.suptitle("Histogram of Bernoulli($p=0.2$) based on $100$ and $1000$ samples.")
plt.show()

Figure: Histogram of Bernoulli(\(p=0.2\)) based on \(100\) and \(1000\) samples.

Assumptions#

The three assumptions for Bernoulli trials are:

  1. Each trial has two possible outcomes: 1 or 0 (success or failure);

  2. The probability of success (\(p\)) is the same for every trial, and so is the probability of failure (\(1-p\));

  3. The trials are independent: the outcome of previous trials has no influence on any subsequent trial.

Expectation and Variance#

Property 9 (Expectation of Bernoulli Distribution)

Let \(X \sim \bern(p)\) be a Bernoulli random variable with parameter \(p\). Then the expectation of \(X\) is given by

\[ \begin{align} \expectation \lsq X \rsq = p \end{align} \]

Proof. Applying the definition of expectation for a discrete random variable to the PMF of \(X\),

\[ \expectation \lsq X \rsq = \sum_{x \in X(\S)} x \cdot \P(X=x) = 1 \cdot p + 0 \cdot (1-p) = p \]
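
The sum in the proof can be checked numerically. The sketch below, which assumes \(p = 1/5\) and an arbitrary seed, evaluates the weighted sum exactly with `fractions.Fraction` and compares it with the mean of simulated trials; the sample mean is only an approximation and will vary with the seed and sample size.

```python
import random
from fractions import Fraction

p = Fraction(1, 5)
pmf = {0: 1 - p, 1: p}  # states of X mapped to their probabilities

# Exact expectation: sum of x * P(X = x) over the two states.
exact_mean = sum(x * prob for x, prob in pmf.items())

# Empirical counterpart: average of many simulated Bernoulli(p) outcomes.
rng = random.Random(42)
n = 20_000
samples = [1 if rng.random() < float(p) else 0 for _ in range(n)]
sample_mean = sum(samples) / n

print("exact expectation:", exact_mean)
print("sample mean:", round(sample_mean, 3))
```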

Property 10 (Variance of Bernoulli Distribution)

Let \(X \sim \bern(p)\) be a Bernoulli random variable with parameter \(p\). Then the variance of \(X\) is given by

\[ \begin{align} \var \lsq X \rsq = p(1-p) \end{align} \]

Proof. The proof is as follows

\[ \begin{align} \var \lsq X \rsq = \sum_{x \in X(\S)} (x - \expectation \lsq X \rsq)^2 \cdot \P(X=x) = (1 - p)^2 \cdot p + (0 - p)^2 \cdot (1-p) = p(1-p) \end{align} \]

It can also be shown using the second moment of \(X\), since \(\expectation \lsq X^2 \rsq = 1^2 \cdot p + 0^2 \cdot (1-p) = p\):

\[ \begin{align} \var \lsq X \rsq = \expectation \lsq X^2 \rsq - \expectation \lsq X \rsq^2 = p - p^2 = p(1-p) \end{align} \]
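
Both routes to the variance can be verified with exact rational arithmetic. This is an illustrative sketch; the function name `bernoulli_variance_two_ways` and the chosen values of \(p\) are assumptions made for the example.

```python
from fractions import Fraction


def bernoulli_variance_two_ways(p: Fraction) -> tuple[Fraction, Fraction]:
    """Return the variance computed from the definition and from the second moment."""
    pmf = {0: 1 - p, 1: p}
    mean = sum(x * prob for x, prob in pmf.items())

    # Route 1: expected squared deviation from the mean.
    var_from_definition = sum((x - mean) ** 2 * prob for x, prob in pmf.items())

    # Route 2: second moment minus the squared mean.
    second_moment = sum(x**2 * prob for x, prob in pmf.items())
    var_from_moments = second_moment - mean**2

    return var_from_definition, var_from_moments


for p in (Fraction(1, 5), Fraction(1, 2), Fraction(9, 10)):
    v_def, v_mom = bernoulli_variance_two_ways(p)
    # Both routes agree with the closed form p(1 - p).
    print(p, v_def, v_mom, p * (1 - p))
```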

Maximum Variance#

Minimum and Maximum Variance of Coin Toss#

This example is taken from [Chan, 2021], page 140.

Consider a coin toss, following a Bernoulli distribution. Define \(X \sim \bern(p)\).

Suppose we toss the coin \(n\) times. We ask what the minimum and maximum variance of a single toss can be as \(p\) varies.

Recall from Definition 84 that the variance measures how much the data deviates from the mean.

If the coin is biased at \(p=1\), then the variance is \(p(1-p) = 0\) because the coin always lands on heads. The intuition is that the coin is “deterministic”, and hence there is no variance at all. If the coin is biased at \(p=0.9\), then there is a little variance, \(0.9 \times 0.1 = 0.09\), because the coin lands on heads \(90\%\) of the time. If the coin is biased at \(p=0.5\), then the variance is \(0.5 \times 0.5 = 0.25\), because the coin is fair and has a 50-50 chance of landing on heads or tails. Though the coin is fair, the variance is at its maximum here: \(p(1-p)\) is never larger than \(0.25\) for \(0 \leq p \leq 1\).
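
The claim can be checked by evaluating \(p(1-p)\) over a grid of parameter values. Setting the derivative \(1 - 2p\) to zero gives the maximizer \(p = 0.5\); the sketch below confirms this on an assumed grid with step \(0.1\), using exact fractions to avoid rounding ties. The helper name `bernoulli_variance` is hypothetical.

```python
from fractions import Fraction


def bernoulli_variance(p: Fraction) -> Fraction:
    """Variance of a Bernoulli(p) random variable, p(1 - p)."""
    return p * (1 - p)


# Grid of parameter values 0, 0.1, ..., 1 stored as exact fractions.
grid = [Fraction(k, 10) for k in range(11)]
variances = {p: bernoulli_variance(p) for p in grid}

p_at_max = max(variances, key=variances.get)
smallest = min(variances.values())
p_at_min = [p for p, v in variances.items() if v == smallest]

for p in grid:
    print(f"p = {float(p):.1f}  Var[X] = {float(variances[p]):.2f}")

print("largest variance:", variances[p_at_max], "at p =", p_at_max)
print("smallest variance:", smallest, "at p =", [str(p) for p in p_at_min])
```

On this grid the variance is zero at both deterministic coins (\(p=0\) and \(p=1\)) and largest for the fair coin.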

References and Further Readings#

  • Chan, Stanley H. “Chapter 3.5.1. Bernoulli random variable.” In Introduction to Probability for Data Science, 137-142. Ann Arbor, Michigan: Michigan Publishing Services, 2021.