Notes on common statistics
Laws
Law of total expectation (Adam’s law)
\begin{equation} \mathbb{E}[X] = \mathbb{E}\big[\mathbb{E}[X|N]\big] = \mathbb{E}[\mu N] = \mu\mathbb{E}[N] \end{equation}
e.g., assuming the amount each customer spends is independent of $N$:
- $X$ is the total amount of money all customers spend in a shop in a day
- $N$ is the number of customers who visited the shop that day
- $\mu$ is the mean amount of money a customer spends
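A quick simulation can make the shop example concrete. This is only a sketch under assumed distributions that are not part of the notes: $N$ is uniform on $0, \dots, 10$ and each customer's spend is exponential with mean $\mu = 20$, independent of $N$.

```python
import random

random.seed(0)

MU = 20.0     # assumed mean spend per customer
MAX_N = 10    # assumed: N is uniform on 0..10, so E[N] = 5


def daily_total():
    """Simulate X: the total spend of N customers in one day."""
    n = random.randint(0, MAX_N)
    # Each customer's spend is exponential with mean MU (an assumption).
    return sum(random.expovariate(1 / MU) for _ in range(n))


days = 100_000
mean_total = sum(daily_total() for _ in range(days)) / days
expected = MU * MAX_N / 2   # mu * E[N]

print(mean_total, expected)
```

The simulated average of $X$ should land close to $\mu\mathbb{E}[N] = 100$, up to sampling noise.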
Law of total variance (Eve’s law)
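For reference, the standard statement of the law, written with the same $X$ and $N$ as in the shop example:

\begin{equation} \mathbb{V}[X] = \mathbb{E}\big[\mathbb{V}[X|N]\big] + \mathbb{V}\big[\mathbb{E}[X|N]\big] \end{equation}

As a sketch under an added assumption (customers spend independently of each other and of $N$, each with mean $\mu$ and variance $v$, where $v$ is a symbol introduced here only for illustration), this gives $\mathbb{V}[X] = v\,\mathbb{E}[N] + \mu^2\,\mathbb{V}[N]$.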
Inequalities
Cauchy-Schwarz inequality
\begin{equation} \big|\mathbb{E}[XY]\big| \le \sqrt{\mathbb{E}[X^2]\mathbb{E}[Y^2]} \end{equation}
Jensen’s inequality
If $f$ is a convex function
\begin{equation} f\big(\mathbb{E}[X]\big) \le \mathbb{E}[f(X)] \end{equation}
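A small numerical check of Jensen's inequality, as a sketch only. The choices $X \sim \text{Uniform}(0, 1)$ and $f(x) = x^2$ are assumptions made for illustration.

```python
import random

random.seed(1)

# Assumed example: X ~ Uniform(0, 1) and the convex function f(x) = x**2.
xs = [random.random() for _ in range(100_000)]


def f(x):
    return x * x


mean_x = sum(xs) / len(xs)
lhs = f(mean_x)                          # f(E[X]), about 1/4
rhs = sum(f(x) for x in xs) / len(xs)    # E[f(X)], about 1/3

print(lhs, rhs, lhs <= rhs)
```

The gap between the two sides here is the variance of $X$, since $\mathbb{E}[X^2] - \mathbb{E}[X]^2 = \mathbb{V}[X]$.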
Markov’s inequality
\begin{equation} \text{P}\big(\left|X\right| \ge a\big) \le \frac{\mathbb{E}\big[\left|X\right|\big]}{a} \end{equation}
where $a \gt 0$.
Chebyshev’s inequality
\begin{equation} \text{P}\big(\left|X - \mu\right| \ge a \big) \le \frac{\mathbb{V}[X]}{a^2} \end{equation}
where $\mu=\mathbb{E}[X]$, and $a \gt 0$.
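Both tail bounds can be checked empirically. The sketch below assumes $X$ is exponential with mean 1 and uses the threshold $a = 3$; both are illustrative choices, not something taken from the notes.

```python
import random

random.seed(2)

# Assumed example: X ~ Exponential with mean 1, so E[X] = 1 and V[X] = 1.
xs = [random.expovariate(1.0) for _ in range(100_000)]
n = len(xs)
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

a = 3.0
markov_tail = sum(abs(x) >= a for x in xs) / n        # P(|X| >= a)
markov_bound = sum(abs(x) for x in xs) / n / a        # E|X| / a
cheb_tail = sum(abs(x - mean) >= a for x in xs) / n   # P(|X - mu| >= a)
cheb_bound = var / a ** 2                             # V[X] / a^2

print(markov_tail, markov_bound)
print(cheb_tail, cheb_bound)
```

In this example both bounds hold with a lot of room to spare, which is typical: they trade tightness for making almost no assumptions about the distribution.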
Hoeffding’s inequality
TODO, see All of Statistics book
Approximation
Approximate binomial distribution with
- Poisson distribution (with rate $\lambda = np$) when $n$ is large and $p$ is small ($p \to 0$).
- Normal distribution when $n$ is large and $p$ is close to $1/2$.
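To see how close the two approximations get, one can compare probability mass functions directly. This is a sketch with assumed parameters ($n = 1000$, with $p = 0.003$ for the Poisson case and $p = 0.5$ for the normal case); the helper functions are written here for illustration.

```python
import math


def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Large n, small p: compare with Poisson(lambda = n * p) for k = 0..20.
n, p = 1000, 0.003
poisson_gap = max(
    abs(binom_pmf(k, n, p) - poisson_pmf(k, n * p)) for k in range(21)
)

# Large n, p = 1/2: compare with Normal(mean = n*p, variance = n*p*(1-p)).
n2, p2 = 1000, 0.5
normal_gap = max(
    abs(binom_pmf(k, n2, p2) - normal_pdf(k, n2 * p2, n2 * p2 * (1 - p2)))
    for k in range(n2 + 1)
)

print(poisson_gap, normal_gap)
```

Both gaps are the largest pointwise difference between the binomial probabilities and the approximation, so small values mean the curves nearly coincide.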
Population statistics
Suppose the population consists of \(N\) values \(X_1, \dots, X_N\) in total.
Population mean
\begin{equation} \mu = \frac{1}{N}\sum_{i=1}^{N}X_i \label{eq:populationMean} \end{equation}
Population variance
\begin{equation} \sigma ^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu) ^2 \label{eq:populationVariance} \end{equation}
$\sigma$ is the population standard deviation.
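The population formulas translate directly into code. A minimal sketch on a made-up population of $N = 8$ values:

```python
import math
import statistics

# Hypothetical population of N = 8 values.
population = [2, 4, 4, 4, 5, 5, 7, 9]
N = len(population)

mu = sum(population) / N                              # population mean
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance
sigma = math.sqrt(sigma2)                             # population standard deviation

print(mu, sigma2, sigma)  # 5.0 4.0 2.0

# The standard library offers the same population (divide-by-N) versions.
print(statistics.pvariance(population), statistics.pstdev(population))
```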
Sample statistics
Suppose we draw a sample of size \(n\) from the population.
Sample mean
Note: \(\overline{X}\) is used instead of \(\mu\).
\begin{equation} \overline{X} = \frac{1}{n}\sum_{i=1}^{n}X_i \label{eq:sampleMean} \end{equation}
Sample variance
The divisor is \(n-1\) rather than \(n\); this is Bessel’s correction. Intuitively, the deviations are measured from the sample mean \(\overline{X}\), which is itself computed from the same sample, so dividing by \(n\) tends to underestimate the population variance; dividing by \(n-1\) compensates for that. Please see Bessel’s correction for more details.
Note: \(s^2\) is used instead of \(\sigma ^2\).
\begin{equation} s ^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(X_i - \overline{X}) ^2 \label{eq:sampleVariance} \end{equation}
and $s$ is the sample standard deviation.
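The effect of Bessel's correction shows up clearly in simulation. The sketch below assumes independent draws from a normal distribution with variance 4 (rather than a finite population) and a small sample size $n = 5$; both are illustrative choices.

```python
import random

random.seed(3)

# Assumed setup: independent draws from Normal(0, sd = 2), so the true variance is 4.
TRUE_VAR = 4.0
n = 5
trials = 50_000

biased_sum = 0.0
unbiased_sum = 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, 2.0) for _ in range(n)]
    x_bar = sum(sample) / n
    ss = sum((x - x_bar) ** 2 for x in sample)   # sum of squared deviations
    biased_sum += ss / n                         # divides by n
    unbiased_sum += ss / (n - 1)                 # divides by n - 1

biased_mean = biased_sum / trials       # expect about (n - 1) / n * 4 = 3.2
unbiased_mean = unbiased_sum / trials   # expect about 4

print(biased_mean, unbiased_mean, TRUE_VAR)
```

Averaged over many samples, the divide-by-$n$ version settles near $\frac{n-1}{n}\sigma^2$, while the divide-by-$(n-1)$ version settles near $\sigma^2$.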
Standard error
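The full name is the standard error of the mean (SEM): the standard deviation of the sample mean \(\overline{X}\), which is itself a random variable (r.v.). For independent draws it equals \(\sigma/\sqrt{n}\); since \(\sigma\) is usually unknown, it is estimated by using \(s\) in its place: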
\begin{equation} \textrm{SEM} = \frac{s}{\sqrt n} \label{eq:standardErrorCorrected} \end{equation}
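One way to see that the SEM really is the spread of the sample mean is to draw many samples and look at how their means vary. This sketch assumes independent normal draws with mean 10 and standard deviation 2, and $n = 25$; none of these values come from the notes.

```python
import math
import random
import statistics

random.seed(4)

SIGMA = 2.0   # assumed true standard deviation
n = 25
trials = 20_000

means = []
for _ in range(trials):
    sample = [random.gauss(10.0, SIGMA) for _ in range(n)]
    means.append(sum(sample) / n)

spread_of_means = statistics.pstdev(means)   # empirical SD of the sample mean
theory = SIGMA / math.sqrt(n)                # sigma / sqrt(n) = 0.4

# SEM estimated from one sample only, using s in place of sigma.
one_sample = [random.gauss(10.0, SIGMA) for _ in range(n)]
sem_estimate = statistics.stdev(one_sample) / math.sqrt(n)

print(spread_of_means, theory, sem_estimate)
```

The first two numbers should agree closely; the third comes from a single sample, so it is noisier but of the same order.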
Standardization
Standardization is a common transformation that brings data to be centered at 0 with unit standard deviation.
Let’s denote the transformed value of \(X_i\) as \(X_i'\),
\begin{equation} X_i' = \frac{X_i - \overline{X}}{s} \label{eq:standardization} \end{equation}
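As a numerical sanity check of this transformation, here is a minimal sketch on made-up sample values, using the sample standard deviation (divisor \(n-1\)) as in the formula above.

```python
import statistics

# Hypothetical sample values.
xs = [2, 4, 4, 4, 5, 5, 7, 9]

x_bar = statistics.mean(xs)
s = statistics.stdev(xs)   # sample standard deviation (n - 1 divisor)

standardized = [(x - x_bar) / s for x in xs]

new_mean = statistics.mean(standardized)
new_var = statistics.variance(standardized)   # sample variance (n - 1 divisor)

print(new_mean, new_var)
```

Up to floating-point rounding, the transformed values have mean 0 and sample variance 1.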
Clearly, the mean after standardization, \(\overline{X'}\), is 0. Let’s calculate the variance of the transformed data,