Probability & Statistics

I am currently reading “The Undoing Project” by Michael Lewis, about the relationship and work of Daniel Kahneman and Amos Tversky. It gives some examples of how, knowing the underlying probability (base rate) of an event and given some initial sampling, you can forecast the likely result for the whole sample. Bayes' Theorem is one tool for this. It can yield unexpected results that differ substantially from the underlying probability when the sample is small. This seems interesting and important because a lot of sampling is limited, so these outcomes are everywhere. How probabilities can be updated as new information comes in is also interesting. I am pretty sure I have learned this before, but I will look at it again to understand its full significance.

Looking again into the world of chance. Below is a work in progress…

Expressing Likelihood: Odds & Probability

Probabilities of an event can be stated as percentages or as decimals, with values between $0$ (won't happen) and $1$ or $100\%$ (certain to happen). Another way of expressing such probabilities is the odds for and against an event. Odds express the ratio of the event occurring to it not occurring: $$Occurrence : Non\mbox{-}Occurrence$$ This assumes the outcome will be clearly decided, i.e. one of the two options must occur. Their probabilities therefore sum to one: $$P(Occurrence) + P(Non\mbox{-}Occurrence) = 1$$

For example, odds of 9 to 1 on state that the event is $9$ times more likely to happen than not to happen. This corresponds to a probability of $9/(9+1) = 90\%$.

To convert from probability to odds

$$ P(Occurrence) \mbox{ to } (1 - P(Occurrence)) = P(Occurrence) \mbox{ to } P(Non\mbox{-}Occurrence)$$ So if the probability is 10% or 0.10, then the odds are $0.1$ to $(1-0.1) = 0.9$. That is ‘1 to 9’ or '1:9' in favour, which is the same as '9 to 1' against.

To convert from odds to a probability

$$P(Occurrence) = \frac{Odds \mbox{ } for}{Odds \mbox{ } for + Odds \mbox{ } against}$$ So odds of 1:9 (1 to 9 in favour) convert to a probability of $1/(1+9) = 0.10$, or 10%.
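The two conversions can be checked with a small sketch. It uses Python's `fractions` module to keep the ratios exact. The function names `probability_to_odds` and `odds_to_probability` are my own illustrative choices.

```python
from fractions import Fraction

def probability_to_odds(p):
    """Return (for, against) odds in lowest terms for a probability 0 < p < 1.
    Pass a Fraction or a string such as "0.1": the float 0.1 is not exactly 1/10."""
    p = Fraction(p)
    ratio = p / (1 - p)          # P(occurrence) : P(non-occurrence) as one ratio
    return ratio.numerator, ratio.denominator

def odds_to_probability(odds_for, odds_against):
    """Probability = odds for / (odds for + odds against)."""
    return Fraction(odds_for, odds_for + odds_against)

print(probability_to_odds(Fraction(1, 10)))  # (1, 9): 1 to 9 in favour, i.e. 9 to 1 against
print(odds_to_probability(1, 9))             # 1/10
print(odds_to_probability(9, 1))             # 9/10
```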

Betting odds

Fractional odds are used in the UK and Ireland, in the form of

  • Profit/Stake.

Example: 7/4 means you will make 7€ profit for a 4€ stake, with a total of 11€ (7+4) returned upon winning. These odds are always given as integers, but not always in lowest terms. This makes different odds easier to compare, e.g. 3/2 is given as 6/4 to allow easy comparison with 9/4.

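Betting odds include the bookmaker's profit margin in addition to the underlying probability. As a result, the probabilities implied by the quoted odds for all outcomes of an event typically add up to more than 100%.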

Decimal odds give the total return (profit + stake) per unit stake, i.e. Total Return : Stake. So fractional even odds of 1/1 become 2, and 7/4 becomes 11/4 = 2.75.
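A minimal sketch of the fractional-to-decimal conversion, plus the probability implied by a fractional price. The implied probability here ignores the bookmaker's margin mentioned above. The helper names are hypothetical.

```python
from fractions import Fraction

def fractional_to_decimal(profit, stake):
    """Decimal odds = total return (profit + stake) per unit stake."""
    return Fraction(profit + stake, stake)

def implied_probability(profit, stake):
    """Probability implied by fractional odds profit/stake (bookmaker's margin ignored)."""
    return Fraction(stake, profit + stake)

print(float(fractional_to_decimal(7, 4)))   # 2.75
print(float(fractional_to_decimal(1, 1)))   # 2.0 (evens)
print(Fraction(6, 4) == Fraction(3, 2))     # True: 6/4 is 3/2 written without reducing
print(float(implied_probability(7, 4)))     # 4/11 as a decimal
```

The implied probability is simply the reciprocal of the decimal odds, so either form can be used to compare prices.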


Combining Events & Conditional Probability

If event $A$ and event $B$ are independent, then $$P(A \mbox{ and } B) = P(A \cap B) = P(A) \cdot P(B) \label{indepprob}\tag{1}$$ Bayes' theorem only becomes useful when the events are dependent, i.e. when conditional probability is involved. If the probability of $A$ depends on whether $B$ happens, then $P(A | B)$ is the (conditional) probability that you'll see $A$ given that you are already seeing $B$. For example, the probability of getting wet depends on whether or not it's raining. $$ P(A \cap B) = P(A|B) \cdot P(B) \label{condprob}\tag{2}$$

Consider the chances of getting sick when there is a contagion in the local population. If half the people are sick, then the probability of a random person you meet having the disease is: $$P(B) = 0.5$$ The probability of you catching the disease from a sick person you come into contact with is $20\%$: $$P(A|B) = 0.2$$ Using equation ($\ref{condprob}$), the probability of you getting sick from contact with one random person is then: $$P(A \cap B)= 0.2 \times 0.5 = 0.1 = 10\%$$

Also note two special cases. If the sick people are not contagious, then $P(A|B) = 0$. If you do not come into contact with sick people, then $P(B) = 0$. In either case it follows that $P(A \cap B) = 0$ and there is no chance you get sick!
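Equation (2) can be checked with a quick Monte Carlo simulation of the contagion example. This is a sketch under the same simplifying assumption as the text: a single contact with one randomly chosen person. The function name, trial count and seed are arbitrary choices of mine.

```python
import random

def simulate_contagion(p_sick=0.5, p_catch_given_sick=0.2, trials=100_000, seed=1):
    """Estimate P(A and B) for one contact: the person met is sick with
    probability P(B), and you then catch it with probability P(A|B)."""
    rng = random.Random(seed)
    got_sick = 0
    for _ in range(trials):
        contact_is_sick = rng.random() < p_sick                    # event B
        if contact_is_sick and rng.random() < p_catch_given_sick:  # event A, given B
            got_sick += 1
    return got_sick / trials

estimate = simulate_contagion()
exact = 0.2 * 0.5   # equation (2): P(A|B) * P(B)
print(f"simulated {estimate:.3f}, exact {exact:.3f}")
```

Setting either `p_sick` or `p_catch_given_sick` to zero reproduces the "no chance you get sick" cases above.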

Permutations & Combinations

Picking $k$ items from a total of $n$ options, how many variations are there? For combinations, this is also the number of different ways of distributing $k$ successes in a sequence of $n$ trials.

Permutations - order does matter $$ P(n,k) = \frac{n\,!}{(n-k)\,!} $$

Combinations - order doesn't matter: $$ C(n,k) = \binom{n}{k} = \frac{n\,!}{(n-k)\,! k\,!} $$

This is also known as the binomial coefficient. The values for different $n$ and $k$ can be seen graphically in Pascal's triangle, where the rows are numbered $n = 0, 1, 2, 3, \ldots$ from the top and $k$ increases from left to right within each row $(0, 1, 2, 3, \ldots)$.
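Python's standard library has both counts built in: `math.perm` and `math.comb`, available from Python 3.8. The sketch below checks them against the factorial formulas. It also builds rows of Pascal's triangle from the binomial coefficients. `pascal_row` is an illustrative helper of my own.

```python
import math

n, k = 10, 3
print(math.perm(n, k))   # 720: ordered selections, n!/(n-k)!
print(math.comb(n, k))   # 120: unordered selections, n!/((n-k)! k!)

# The same values straight from the factorial formulas
assert math.perm(n, k) == math.factorial(n) // math.factorial(n - k)
assert math.comb(n, k) == math.factorial(n) // (math.factorial(n - k) * math.factorial(k))

def pascal_row(n):
    """Row n of Pascal's triangle: C(n, 0), C(n, 1), ..., C(n, n)."""
    return [math.comb(n, k) for k in range(n + 1)]

for row in range(5):
    print(pascal_row(row))   # [1], [1, 1], [1, 2, 1], [1, 3, 3, 1], [1, 4, 6, 4, 1]
```

Each inner entry of a row is the sum of the two entries above it, which is how the triangle is usually drawn by hand.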

Bernoulli series, Binomial & Gaussian distribution

Building on the ideas of combinations, you can look at two kinds of process: single trials with multiple outcomes, or multiple trials with a limited number of possible outcomes. An example of the latter, where the outcome of each trial is binary (yes or no, 0 or 1), is known as a Bernoulli process. The number of successes in a fixed number of such trials is described by the binomial distribution.

A coin flip has only 2 outcomes (heads or tails), so a series of $n$ flips is a series of Bernoulli trials. The probability of any particular number of heads can be calculated using the binomial distribution formula. This works for both fair and unfair coins.

What's the probability of getting 3 heads and 7 tails if one flips a fair coin 10 times?

You do $n=10$ trials. The probability of success in any given trial is $p=\frac{1}{2}$. You want $k=3$ successes (heads) and $n-k=7$ failures (tails). The probability is:

$$ P(n,k,p) = \binom{n}{k} p^k (1-p)^{n-k} \\ P\Big(10,3,\frac{1}{2}\Big) = \binom{10}{3} \cdot \bigg ( \frac{1}{2} \bigg )^3 \cdot \bigg ( \frac{1}{2} \bigg )^7 = 120 \cdot \frac{1}{8} \cdot \frac{1}{128} = \frac{15}{128} \approx 0.117 \approx 11.7\% $$

You want $k$ successes (probability $p^k$) and $n-k$ failures (probability $(1-p)^{n-k}$). The successes can occur anywhere in the sequence of trials. There are $\binom{n}{k}$ ways to arrange $k$ successes in $n$ trials, which gives the factor $\binom{n}{k}$ in front.
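The formula translates directly into a small function. This sketch uses `Fraction` so that the fair-coin result comes out as the exact $\frac{15}{128}$. The name `binomial_pmf` is my own.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, k, p):
    """P(n, k, p) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

half = Fraction(1, 2)
prob = binomial_pmf(10, 3, half)
print(prob, float(prob))   # 15/128 0.1171875

# The probabilities of every possible number of heads add up to 1
print(sum(binomial_pmf(10, k, half) for k in range(11)))   # 1
```

For an unfair coin, pass a different `p`, e.g. `Fraction(3, 10)`.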

The binomial distribution $P(n, k, p)$ is approximately normal or Gaussian with mean $np$ and variance $np(1 - p)$ for large $n$ and for $p$ not too close to zero or one.

$$ f(x| \mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{- \frac{(x-\mu)^2}{2 \sigma^2}} $$


  • $\mu$ is the mean or expectation of the distribution (and also its median and mode).
  • $\sigma$ is the standard deviation.
  • $\sigma^{2}$ is the variance.

Figure: Gaussian distribution for $\mu=5$ and $\sigma=1$.
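A sketch of the Gaussian density $f(x|\mu,\sigma^2)$. It evaluates the peak of the $\mu=5$, $\sigma=1$ curve. It also compares an exact binomial probability with the normal approximation, using mean $np$ and variance $np(1-p)$. The choice of 100 flips and 50 heads is my own illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian density f(x | mu, sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

# Peak of the mu = 5, sigma = 1 curve: 1/sqrt(2*pi), about 0.399
print(normal_pdf(5, 5, 1))

# Normal approximation to the binomial: 100 fair coin flips, exactly 50 heads
n, p, k = 100, 0.5, 50
exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)
approx = normal_pdf(k, n * p, math.sqrt(n * p * (1 - p)))
print(f"binomial {exact:.5f}, normal approximation {approx:.5f}")
```

With $n = 100$ and $p = \frac{1}{2}$ the two values agree to within a few ten-thousandths.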

Bayesian Vs Frequentist

If the fairness of the coin is not known, then the probability of heads (success) $p$ is unknown. From a fixed set of trials, the fairness of the coin can be estimated to a certain confidence level. The larger the sample, the more precise the estimate. The estimate can also be updated after each successive trial.

This leads to two approaches. The Frequentist approach does not take prior experience into account. The Bayesian approach does, in the form of a prior probability distribution.
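The difference can be sketched for the coin. The frequentist estimate of $p$ is just the observed fraction of heads. For the Bayesian update, I assume a Beta prior on $p$; this is the standard conjugate choice for coin flips but an assumption on my part. After each head or tail, the prior simply gains one count. The flip sequence and function names are hypothetical.

```python
from fractions import Fraction

def frequentist_estimate(flips):
    """Estimate of p from the data alone: the fraction of heads."""
    return Fraction(sum(flips), len(flips))

def bayesian_update(flips, prior_heads=1, prior_tails=1):
    """Update a Beta(prior_heads, prior_tails) prior one flip at a time.
    Returns the posterior mean of p after each flip."""
    a, b = prior_heads, prior_tails
    means = []
    for flip in flips:            # 1 = heads, 0 = tails
        if flip:
            a += 1
        else:
            b += 1
        means.append(Fraction(a, a + b))   # mean of the Beta(a, b) posterior
    return means

flips = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]     # hypothetical data: 7 heads in 10 flips
print(frequentist_estimate(flips))          # 7/10
print(bayesian_update(flips)[-1])           # 2/3 with a flat Beta(1, 1) prior
print(bayesian_update(flips, 50, 50)[-1])   # 57/110: a strong "fair coin" prior barely moves
```

The stronger the prior belief that the coin is fair, the more data is needed to move the estimate away from $\frac{1}{2}$.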

Updating probabilities as new information comes in

Consider a family planning to have 5 children. What is the probability of them all being girls? Assuming each birth is a girl with probability $50\%$, these are 5 independent Bernoulli trials. Extending equation ($\ref{indepprob}$) for independent events to five trials: $$P_{All Girls} = 0.5^5 = \frac{1}{32} \approx 3.1\% $$ But if you already have $4$ children who are all girls, then the chance of having $5$ girls becomes $50\%$, since only the last birth is still undecided. With the additional information provided by the births of the first $4$ children, the probability of having $5$ girls rises considerably. New information changes the probabilities. This may sound obvious, but it is the basis of many statistics problems, e.g. the Monty Hall Problem.
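Enumerating every possible family makes the effect of new information concrete. This sketch lists all $2^5$ birth sequences and counts them before and after conditioning on the first four children being girls. It assumes, as above, that each birth is independently a girl with probability $\frac{1}{2}$.

```python
from fractions import Fraction
from itertools import product

# All 2**5 equally likely birth sequences (assumes P(girl) = 1/2, independent births)
families = list(product("GB", repeat=5))

all_girls = [f for f in families if f == ("G",) * 5]
print(Fraction(len(all_girls), len(families)))   # 1/32, about 3.1%

# New information: the first four children are girls
first_four_girls = [f for f in families if f[:4] == ("G",) * 4]
all_girls_given_four = [f for f in first_four_girls if f[4] == "G"]
print(Fraction(len(all_girls_given_four), len(first_four_girls)))   # 1/2
```

Conditioning just throws away the sequences that are no longer possible. Only 2 of the 32 families survive, and one of them is all girls.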

Bayes' Theorem

$$ P(H|E) = \frac{ P(E|H) \cdot P(H) } { P(E) } $$

$H$ stands for any hypothesis whose probability may be affected by new evidence $E$. The evidence corresponds to new data that were not used in computing the prior probability $P(H)$.

  • $P(H|E)$, the posterior probability, is the probability of $H$ given $E$, i.e. after $E$ is observed. This is what we want to know: the probability of a hypothesis given the observed evidence.
  • $P(E|H)$ is the probability of observing $E$ given $H$. As a function of $E$ with $H$ fixed, this is the likelihood: it indicates the compatibility of the evidence with the given hypothesis. The likelihood function is a function of the evidence, $E$, while the posterior probability is a function of the hypothesis, $H$.
  • $P(E)$ is sometimes termed the marginal likelihood or “model evidence”. This factor is the same for all possible hypotheses being considered. This is evident from the fact that the hypothesis $H$ does not appear anywhere in the symbol, unlike in all the other factors. So this factor does not enter into determining the relative probabilities of different hypotheses.

The factor $\frac{P(E|H)}{P(E)}$ can be interpreted as the impact of the evidence $E$ on the hypothesis $H$. Often there are competing hypotheses, and the task is to determine which is the most probable.
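When there are several competing hypotheses, $P(E)$ can be computed as the sum of $P(E|H) \cdot P(H)$ over all of them. This sketch applies Bayes' theorem to a dictionary of hypotheses. The fair-versus-biased coin example and its numbers are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' theorem over competing hypotheses.
    priors[h] = P(H), likelihoods[h] = P(E|H); P(E) = sum of P(E|H) * P(H)."""
    evidence = sum(likelihoods[h] * priors[h] for h in priors)    # P(E)
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}

# Hypothetical example: is a coin fair (p = 1/2) or biased (p = 4/5)?
priors = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
p_heads = {"fair": Fraction(1, 2), "biased": Fraction(4, 5)}   # P(E|H) for E = one head
post = posterior(priors, p_heads)
print(post)                      # {'fair': Fraction(5, 13), 'biased': Fraction(8, 13)}
print(max(post, key=post.get))   # biased
```

Because $P(E)$ is shared by all hypotheses, it only rescales the results so that they sum to 1. It does not change which hypothesis is most probable.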

Bayes' Theorem - Examples

From “Thinking Fast & Slow” Chapter 6 - example given without good explanation. Happily, I found an explanation here.

Restating the problem: a city has 85% green cabs and 15% blue cabs. A cab is involved in an accident, and a witness identifies it as blue. We are told that witnesses in these circumstances correctly identify the colour 80% of the time. What is the probability that the cab was blue?

The example is posed to show the effect of ignoring the base-rate probability. If there were no witness, the chance of the cab being blue would be 15%. With the witness involved, however, the temptation is to ignore this base rate and focus on the 80% reliability. That is, needless to say, a mistake.

Formulate the question:
Hypothesis $H$: the cab is blue, given the evidence $E$ that the witness reported it as blue. What is the probability of the cab being blue (B) given that the witness identified (ID) it as blue, $P(B|IDB)$?

$$ P(B|IDB) = \frac{ P(IDB|B) \cdot P(B) } { P(IDB) } $$

Breaking this down: $$ P(B) = \mbox{Probability of the cab being blue} = 0.15 $$ $$ P(IDB|B) = \mbox{Probability that the cab would be ID'd as blue if the cab was blue} = 0.8 $$

That leaves the last term, $P(IDB)$. The events $B$ and $G$ are mutually exclusive and cover all possibilities for the cab: it is either blue or green. A green cab is wrongly identified as blue with probability $P(IDB|G) = 1 - 0.8 = 0.2$, so:

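$$ P(IDB) = P(IDB|B) \cdot P(B) + P(IDB|G) \cdot P(G) \\ = 0.8 \cdot 0.15 + 0.2 \cdot 0.85 = 0.12 + 0.17 = 0.29 $$ $$P(B|IDB) = \frac{0.8 \cdot 0.15}{0.29} = \frac{0.12}{0.29} \approx 0.41$$ So despite the 80% reliable witness, the probability that the cab was blue is only about 41%.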
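The same calculation in code, using exact fractions. The variable names mirror the notation above. As in the text, the sketch assumes the witness is 80% reliable whichever colour the cab really is.

```python
from fractions import Fraction

p_blue = Fraction(15, 100)                   # P(B): base rate of blue cabs
p_green = 1 - p_blue                         # P(G)
p_id_blue_given_blue = Fraction(80, 100)     # P(IDB|B): witness is right
p_id_blue_given_green = 1 - p_id_blue_given_blue   # P(IDB|G): witness is wrong (same 80% reliability assumed)

p_id_blue = p_id_blue_given_blue * p_blue + p_id_blue_given_green * p_green   # P(IDB)
p_blue_given_id_blue = p_id_blue_given_blue * p_blue / p_id_blue               # P(B|IDB)

print(p_id_blue, float(p_id_blue))                                   # 29/100 0.29
print(p_blue_given_id_blue, round(float(p_blue_given_id_blue), 2))   # 12/29 0.41
```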

What happens if you have more than one witness?
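One way to explore this is to repeat the Bayesian update, using each posterior as the prior for the next witness. This sketch assumes that every witness independently says "blue" with the same 80% reliability. That independence assumption is mine, not the book's, and may not hold for real witnesses. The function name is hypothetical.

```python
from fractions import Fraction

def update_on_blue_report(p_blue, reliability=Fraction(4, 5)):
    """One Bayesian update: P(B|IDB) from the current P(B), assuming the witness
    names the true colour with probability `reliability` whatever the colour."""
    p_report = reliability * p_blue + (1 - reliability) * (1 - p_blue)   # P(IDB)
    return reliability * p_blue / p_report

p = Fraction(15, 100)            # base rate of blue cabs
for witness in range(1, 4):
    p = update_on_blue_report(p)     # the posterior becomes the prior for the next witness
    print(witness, p, round(float(p), 3))
```

Under these assumptions, one "blue" report gives 12/29 (about 0.41), a second gives 48/65 (about 0.74), and a third gives 192/209 (about 0.92).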


In the more general notation of Bayes' theorem, $P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$:

  • $P(A)$, the prior, is the initial degree of belief in $A$.
  • $P(A|B)$, the “posterior”, is the degree of belief having accounted for $B$.
  • The quotient $P(B|A) / P(B)$ represents the support $B$ provides for $A$.
  • $P(A)$ and $P(B)$ are the probabilities of observing $A$ and $B$ without regard to each other.
  • $P(A | B)$, a conditional probability, is the probability of observing event $A$ given that $B$ is true.
  • $P(B | A)$ is the probability of observing event $B$ given that $A$ is true.

Suppose a drug test is 99% sensitive and 99% specific. That is, the test will produce 99% true positive results for drug users and 99% true negative results for non-drug users. Suppose that 0.5% of people are users of the drug. If a randomly selected individual tests positive, what is the probability that he is a user?

  • If 1000 individuals are tested, there are expected to be 995 non-users and 5 users.
  • From the 995 non-users, 0.01 × 995 ≃ 10 false positives are expected.
  • From the 5 users, 0.99 × 5 ≈ 5 true positives are expected.
  • Out of 15 positive results, only 5, about 33%, are genuine.
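The frequency argument above rounds 9.95 false positives up to 10. Applying Bayes' theorem directly gives the exact figure. This sketch uses the sensitivity, specificity and prevalence from the example as default parameters; the parameter names are my own.

```python
def p_user_given_positive(sensitivity=0.99, specificity=0.99, prevalence=0.005):
    """Bayes' theorem: P(user | positive test)."""
    true_pos = sensitivity * prevalence                # P(+|user) * P(user)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(+|non-user) * P(non-user)
    return true_pos / (true_pos + false_pos)

print(round(p_user_given_positive(), 3))   # 0.332
```

Changing `prevalence` shows how strongly the answer depends on the base rate. If half of those tested were users, a positive result would be right about 99% of the time.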

blog/bayesian_statistics.txt · Last modified: 2019/09/09 23:42 (external edit)