Laws

Law of total expectation (Adam’s Law)

\begin{equation} \mathbb{E}[X] = \mathbb{E}\big[\mathbb{E}[X|N]\big] \end{equation}

In the following example, $\mathbb{E}[X|N] = \mu N$, so the law gives $\mathbb{E}[X] = \mathbb{E}[\mu N] = \mu\mathbb{E}[N]$ (verified by the simulation after the list):

  • $X$ is the total amount of money all customers spend in a shop in a day
  • $N$ is the number of customers who visit the shop that day
  • $\mu$ is the mean amount of money a customer spends
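
A quick Monte Carlo check of this example (a minimal sketch; the Poisson customer count and exponential spending amounts are illustrative assumptions, not part of the law):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, mean_n, days = 5.0, 20.0, 100_000

totals = []
for _ in range(days):
    n = rng.poisson(mean_n)                      # N: number of customers that day
    totals.append(rng.exponential(mu, n).sum())  # X: each customer spends mu on average

print(np.mean(totals))  # empirical E[X], ~ 100
print(mu * mean_n)      # mu * E[N] = 100.0
```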

Law of total variance (Eve’s law)
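
\begin{equation} \mathbb{V}[X] = \mathbb{E}\big[\mathbb{V}[X|N]\big] + \mathbb{V}\big[\mathbb{E}[X|N]\big] \end{equation}

i.e., the total variance is the mean of the conditional variance (the "within" part) plus the variance of the conditional mean (the "between" part). In the shop example, this splits the variability of daily revenue into the part caused by per-customer spending and the part caused by the fluctuating customer count.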

Inequalities

Cauchy-Schwarz inequality

\begin{equation} \big|\mathbb{E}[XY]\big| \le \sqrt{\mathbb{E}[X^2]\,\mathbb{E}[Y^2]} \end{equation}

Jensen’s inequality

If $f$ is a convex function, then

\begin{equation} f\big(\mathbb{E}[X]\big) \le \mathbb{E}\big[f(X)\big] \end{equation}

Markov’s inequality

Suppose $X$ is a non-negative random variable with pdf $f$, and $a > 0$. Then

\begin{equation} \mathbb{P}\big(X > a\big) \le \frac{\mathbb{E}[X]}{a} \end{equation}

Proof:

\[\begin{align} \mathbb{E}[X] &= \int_0^\infty x f(x) dx \\ &= \int_0^a x f(x) dx + \int_a^\infty x f(x) dx \label{eq:markov_before_inequality} \\ &\ge 0 + \int_a^\infty a f(x) dx \label{eq:markov_to_inequality} \\ &= a \int_a^\infty f(x) dx \\ &= a \mathbb{P}(X > a) \\ \mathbb{P}(X > a) &\le \frac{\mathbb{E}[X]}{a} \end{align}\]

Note, from Eq. $\eqref{eq:markov_before_inequality}$ to $\eqref{eq:markov_to_inequality}$, we just used the simple facts:

  • $\int_0^a x f(x) dx \ge 0$ and
  • $x \ge a$ in the integral $\int_a^\infty x f(x) dx$.

The reason that $X$ needs to be non-negative is that otherwise the first integral would become $\int_{-\infty}^{a} x f(x) dx$, which isn’t necessarily $\ge 0$.
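
A quick numerical check of the bound (a minimal sketch; the exponential distribution and the value of $a$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)  # non-negative r.v. with E[X] = 1

a = 3.0
print((x > a).mean())  # empirical P(X > a); exact value is e^{-3} ~ 0.0498
print(x.mean() / a)    # Markov bound E[X]/a ~ 0.333
```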

Chebyshev’s inequality (specific version)

\begin{equation} \mathbb{P}\left( \left|X - \mu \right| \gt a \right) \le \frac{\mathbb{V}[X]}{a^2} \end{equation}

where $\mu=\mathbb{E}[X]$, and $a \gt 0$.

Proof:

We can derive Chebyshev’s inequality by applying Markov’s inequality:

\[\begin{align} \frac{\mathbb{V}[X]}{a^2} &=\frac{\mathbb{E}\left[ (X - \mu)^2 \right ]}{a^2} \label{eq:chebyshev_def_var} \\ &\ge \mathbb{P}\left( (X - \mu)^2 > a^2 \right ) \label{eq:chebyshev_apply_markov} \\ &= \mathbb{P}(|X - \mu| > a ) \end{align}\]

Note,

  • Eq. $\eqref{eq:chebyshev_def_var}$ is just the definition of variance.
  • Eq. $\eqref{eq:chebyshev_apply_markov}$ is the application of Markov’s inequality with $(X - \mu)^2$ as the random variable.
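
For instance, with $X \sim \mathcal{N}(\mu, \sigma^2)$ and $a = 2\sigma$, the bound gives $\mathbb{P}(|X - \mu| > 2\sigma) \le 1/4$, while the true probability is about $0.0455$. A quick check (the normal distribution here is an illustrative choice):

```python
from scipy.stats import norm

# Chebyshev bound with a = 2*sigma: V[X] / a^2 = sigma^2 / (4 sigma^2) = 1/4
print(1 / 4)
# Exact normal tail: P(|X - mu| > 2 sigma) = 2 * (1 - Phi(2))
print(2 * (1 - norm.cdf(2)))  # ~ 0.0455
```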

Chebyshev’s inequality (general version)

\begin{equation} \mathbb{P}\big(g(X) \ge r\big) \le \frac{\mathbb{E}[g(X)]}{r} \end{equation}

where $g(X)$ is a non-negative function, and $r > 0$.

Proof:

\[\begin{align} \mathbb{E}[g(X)] &= \int_{-\infty}^{\infty} g(x) f(x) dx \\ &= \int_{g(x) < r} g(x) f(x) dx + \int_{g(x) \ge r} g(x) f(x) dx \\ &\ge 0 + r \int_{g(x) \ge r} f(x) dx \\ &= r\mathbb{P} \left( g(X) \ge r \right) \\ \mathbb{P} \left( g(X) \ge r \right) &\le \frac{\mathbb{E}[g(X)]}{r} \end{align}\]

The proof uses the same idea as that for Markov’s inequality, but is more general.

  • When $g(X) = |X|$ and $r = a$, the general-version Chebyshev’s inequality becomes Markov’s inequality.
  • When $g(X) = (X - \mu)^2$ and $r = a^2$, it becomes the specific-version Chebyshev’s inequality.

Hoeffding’s inequality

TODO: add the proof; see the All of Statistics book.
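
One standard two-sided form, for reference: if $X_1, \dots, X_n$ are independent random variables with $a_i \le X_i \le b_i$, then for any $t > 0$,

\begin{equation} \mathbb{P}\left( \left| \overline{X} - \mathbb{E}\big[\overline{X}\big] \right| \ge t \right) \le 2\exp\left( -\frac{2n^2 t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right) \end{equation}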

Approximation

Approximate the binomial distribution $\text{Bin}(n, p)$ with the

  • Poisson distribution with $\lambda = np$, when $n$ is large and $p$ is small ($\to 0$);
  • normal distribution $\mathcal{N}\big(np, np(1-p)\big)$, when $n$ is large and $p$ is close to $1/2$ (see the numerical check below).
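
A minimal numerical sketch of both approximations using scipy.stats (the values of $n$, $p$ and $k$ are illustrative choices):

```python
from scipy.stats import binom, norm, poisson

# Poisson approximation: n large, p small, lambda = n * p
n, p = 1000, 0.003
lam = n * p
for k in (0, 1, 3, 6):
    print(f"k={k}: binom={binom.pmf(k, n, p):.5f} poisson={poisson.pmf(k, lam):.5f}")

# Normal approximation: n large, p near 1/2 (with continuity correction)
n, p = 1000, 0.5
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5
for k in (480, 500, 520):
    print(f"k={k}: binom={binom.cdf(k, n, p):.5f} normal={norm.cdf((k + 0.5 - mu) / sigma):.5f}")
```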

Population statistics

Suppose the population consists of \(N\) individuals in total.

Population mean

\begin{equation} \mu = \frac{1}{N}\sum_{i=1}^{N}X_i \label{eq:populationMean} \end{equation}

Population variance

\begin{equation} \sigma ^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu) ^2 \label{eq:populationVariance} \end{equation}

$\sigma$ is the population standard deviation.

Sample statistics

Suppose we take a sample of size \(n\) from the population.

Sample mean

Note: \(\overline{X}\) is used instead of \(\mu\).

\begin{equation} \overline{X} = \frac{1}{n}\sum_{i=1}^{n}X_i \label{eq:sampleMean} \end{equation}

Sample variance

\(n-1\) is used instead of \(n\); this is Bessel’s correction. The intuition behind the correction is that the sample variance computed with divisor \(n\) tends to underestimate the population variance (the data points are closer to \(\overline{X}\) than to \(\mu\)), so we intentionally enlarge it a bit. Please see Bessel’s correction for more details.

Note: \(s^2\) is used instead of \(\sigma ^2\).

\begin{equation} s ^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(X_i - \overline{X}) ^2 \label{eq:sampleVariance} \end{equation}

and $s$ is the sample standard deviation.
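
In code, the divisor is controlled by numpy’s ddof (“delta degrees of freedom”) argument; a quick illustration with made-up data:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

print(np.var(x, ddof=0))  # population-style variance, divisor n (numpy's default)
print(np.var(x, ddof=1))  # sample variance s^2, divisor n - 1 (Bessel's correction)
print(np.std(x, ddof=1))  # sample standard deviation s
```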

Standard error

The full name is the standard error of the mean (SEM), which is the standard deviation of the sample mean \(\overline{X}\) (itself a random variable). Since the population \(\sigma\) is usually unknown, it is estimated with the sample standard deviation \(s\):

\begin{equation} \textrm{SEM} = \frac{s}{\sqrt n} \label{eq:standardErrorCorrected} \end{equation}
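
A small simulation sketch (the normal population, \(\sigma\) and \(n\) are illustrative assumptions): draw many samples and compare the empirical standard deviation of \(\overline{X}\) with \(\sigma/\sqrt{n}\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, reps = 25, 2.0, 100_000

# sample mean of each of `reps` independent samples of size n
means = rng.normal(loc=0.0, scale=sigma, size=(reps, n)).mean(axis=1)

print(means.std(ddof=1))  # empirical SD of the sample mean
print(sigma / n ** 0.5)   # theoretical SEM = sigma / sqrt(n) = 0.4
```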

Standardization

Standardization is a common transformation that centers the data at 0 and scales it to unit standard deviation.

Let’s denote the transformed value of \(X_i\) as \(X_i'\),

\begin{equation} X'_i = \frac{X_i - \overline{X}}{s} \label{eq:standardization} \end{equation}
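
A quick empirical check (the data values are made up):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
x_std = (x - x.mean()) / x.std(ddof=1)  # standardize with sample mean and s

print(x_std.mean())       # ~ 0 (up to floating-point error)
print(x_std.std(ddof=1))  # exactly 1
```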

Clearly, the mean after standardization, \(\overline{X'}\), becomes 0. Let’s calculate the variance of the transformed data,