### 1. Random Variables

In the study of probability and statistics, a random variable can be thought of as a variable whose value is subject to variation due to chance (i.e., randomness, in a mathematical sense). A random variable can take on a set of possible different values, each with an associated probability, in contrast to an ordinary mathematical variable, which has a single, fixed value.

### 2. Types of Random Variables

There are two main types of random variables, Discrete Random Variables (DRVs) and Continuous Random Variables (CRVs). Discrete random variables take on a countable number of distinct values. Examples of DRVs include things like the number of heads when flipping three coins or the number of defective items in a batch. In contrast, continuous random variables can take on any value within a specified range, and that range can be infinite. Examples include the height of a person or the time spent waiting for a bus.

#### 2.1 Discrete Random Variables (DRVs)

A Discrete Random Variable is one which may take on only a countable number of distinct values such as 0, 1, 2, 3, 4, .... The probability mass function (PMF) of a discrete random variable is given by:

$P(X = x) = p(x), \quad \text{for all } x$

And the cumulative distribution function (CDF) is:

$F(x) = P(X \leq x) = \sum_{t \leq x} p(t), \quad \text{for all } x$

The expected value (or mean, denoted $\mu$), variance (denoted $\sigma^2$), and standard deviation (denoted $\sigma$) of a discrete random variable are calculated using the following formulas:

Expected value: $\mu = E(X) = \sum_{x} xP(X = x)$

Variance: $\sigma^2 = Var(X) = \sum_{x} (x - \mu)^2 P(X = x)$

Standard deviation: $\sigma = \sqrt{Var(X)}$
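As a minimal sketch (in Python, which the text does not prescribe), the formulas above can be applied to the three-coin-flip example mentioned earlier:

```python
from math import comb, sqrt, isclose

# X = number of heads in 3 fair coin flips: P(X = k) = C(3, k) / 2^3
pmf = {k: comb(3, k) / 8 for k in range(4)}

mu = sum(x * p for x, p in pmf.items())               # E(X) = sum of x * P(X = x)
var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # Var(X)
sigma = sqrt(var)                                     # standard deviation
```

For a fair coin, this gives $\mu = 1.5$ and $\sigma^2 = 0.75$, matching $np$ and $np(1-p)$ for a Binomial(3, 0.5) variable.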

#### 2.2 Continuous Random Variables (CRVs)

A Continuous Random Variable is one which can take on an uncountably infinite number of possible values. Continuous random variables are usually measurements. Examples include height, weight, the amount of sugar in an orange, and the time required to run a mile. The probability density function (PDF) of a continuous random variable is a function $f(x)$ satisfying:

$f(x) \geq 0 \text{ for all } x, \qquad \int_{-\infty}^{\infty} f(x)\, dx = 1$

Probabilities are obtained by integration: $P(a \leq X \leq b) = \int_{a}^{b} f(x)\, dx$.

And the cumulative distribution function (CDF) is:

$F(x) = P(X \leq x) = \int_{-\infty}^{x} f(t) dt, \quad \text{for all } x$

The expected value, variance, and standard deviation of a continuous random variable are calculated using the following formulas:

Expected value: $\mu = E(X) = \int_{-\infty}^{\infty} x f(x) dx$

Variance: $\sigma^2 = Var(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) dx$

Standard deviation: $\sigma = \sqrt{Var(X)}$
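These integrals can be approximated numerically. A rough sketch (assuming Python, and using the Uniform(0, 1) density $f(x) = 1$ for concreteness) with a midpoint Riemann sum:

```python
# Midpoint-rule approximation of E(X) and Var(X) for X ~ Uniform(0, 1),
# whose density is f(x) = 1 on [0, 1]; any density works the same way.
N = 100_000
dx = 1.0 / N
xs = [(i + 0.5) * dx for i in range(N)]

f = lambda x: 1.0                                   # density of Uniform(0, 1)
mu = sum(x * f(x) * dx for x in xs)                 # E(X), exactly 1/2
var = sum((x - mu) ** 2 * f(x) * dx for x in xs)    # Var(X), approx. 1/12
```

The results agree with the closed-form values $\mu = 1/2$ and $\sigma^2 = 1/12$ derived later for the uniform distribution.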

### 3. Probability Distributions

A probability distribution describes how the values of a random variable are distributed. It is defined by the probability mass function for discrete variables and the probability density function for continuous variables. The cumulative distribution function, which accumulates these probabilities, applies to both discrete and continuous variables.

#### 3.1 Binomial Distribution

The binomial distribution with parameters $n$ and $p$ is the discrete probability distribution of the number of successes in a sequence of $n$ independent experiments, each asking a yes–no question, and each with its own boolean-valued outcome. A success/failure experiment is also called a Bernoulli experiment or Bernoulli trial. The Binomial distribution is used when there are exactly two mutually exclusive outcomes of a trial. These outcomes are appropriately labeled "success" and "failure".

The probability mass function (PMF) of a binomial distribution is:

$P(X=k) = C(n, k) p^k (1-p)^{n-k}, \quad k = 0, 1, 2, ..., n$

where $C(n, k) = \frac{n!}{k!(n-k)!}$ is the binomial coefficient. The expected value, variance, and standard deviation are given by:

Expected value: $\mu = np$

Variance: $\sigma^2 = np(1-p)$

Standard deviation: $\sigma = \sqrt{np(1-p)}$
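A small sketch (in Python, which the text does not prescribe) that evaluates the PMF directly and confirms the mean and variance formulas by summing over all outcomes:

```python
from math import comb, isclose

def binom_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3
probs = [binom_pmf(k, n, p) for k in range(n + 1)]

mu = sum(k * pk for k, pk in enumerate(probs))               # equals np = 3.0
var = sum((k - mu) ** 2 * pk for k, pk in enumerate(probs))  # equals np(1-p) = 2.1
```

The probabilities sum to 1, and the directly computed moments match $\mu = np$ and $\sigma^2 = np(1-p)$.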

##### 3.1.1 Moment Generating Function, Characteristic Function, and Probability Generating Function for Binomial Distribution

The Moment Generating Function (MGF) of a Binomial distribution is given by:

$M(t) = E[e^{tX}] = (pe^t + q)^n, \quad \text{where } q = 1 - p$

The Characteristic Function (CF) of a Binomial distribution is given by:

$\phi(t) = E[e^{itX}] = (pe^{it} + q)^n$

The Probability Generating Function (PGF) of a Binomial distribution is given by:

$G(z) = E[z^X] = (pz + q)^n$
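Generating functions encode the whole distribution; for instance $G(1) = 1$ (total probability) and $G'(1) = E[X] = np$. A quick numerical check of the PGF (a sketch, assuming Python and a central-difference derivative):

```python
from math import isclose

n, p = 10, 0.3
q = 1 - p
G = lambda z: (p * z + q) ** n   # PGF of Binomial(n, p)

# G(1) should equal 1, and G'(1) should equal E[X] = np.
# Approximate the derivative with a central difference.
h = 1e-6
dG = (G(1 + h) - G(1 - h)) / (2 * h)
```

Within floating-point tolerance, `dG` matches $np = 3$.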

##### 3.1.2 Trivia: Binomial Distribution

Did you know that the term "binomial distribution" is derived from the fact that each trial has 'two' possible outcomes (success or failure)? Also, the shape of the binomial distribution changes with the probability of success ($p$) and the number of trials ($n$). If the probability of success is high, the distribution is skewed to the left; if the probability of success is low, it is skewed to the right. If the probability is around 0.5, the distribution resembles a symmetric bell shape, much like the normal distribution.

#### 3.2 Poisson Distribution

The Poisson distribution is the discrete probability distribution of the number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.

The PMF of a Poisson distribution is:

$P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \dots$

where $\lambda$ is the average number of events per interval (the rate parameter). The expected value, variance, and standard deviation are given by:

Expected value: $\mu = \lambda$

Variance: $\sigma^2 = \lambda$

Standard deviation: $\sigma = \sqrt{\lambda}$
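The property that mean and variance both equal $\lambda$ can be verified directly from the PMF. A minimal sketch (assuming Python), truncating the infinite sum where the tail is negligible:

```python
from math import exp, factorial, isclose

def poisson_pmf(k, lam):
    # P(X = k) = lam^k * e^(-lam) / k!
    return lam ** k * exp(-lam) / factorial(k)

lam = 4.0
ks = range(100)   # truncate the infinite sum; the tail beyond k = 100 is negligible

mu = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum((k - mu) ** 2 * poisson_pmf(k, lam) for k in ks)
```

Both sums come out to $\lambda = 4$ up to floating-point error.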

##### 3.2.1 Moment Generating Function, Characteristic Function, and Probability Generating Function for Poisson Distribution

The Moment Generating Function (MGF) of a Poisson distribution is given by:

$M(t) = E[e^{tX}] = e^{\lambda(e^t - 1)}$

The Characteristic Function (CF) of a Poisson distribution is given by:

$\phi(t) = E[e^{itX}] = e^{\lambda(e^{it} - 1)}$

The Probability Generating Function (PGF) of a Poisson distribution is given by:

$G(z) = E[z^X] = e^{\lambda(z - 1)}$

##### 3.2.2 Trivia: Poisson Distribution

Did you know that the Poisson distribution is named after the French mathematician Siméon Denis Poisson? He introduced this distribution to describe the number of times a gambler would win a rarely won game of chance in a large number of tries. It's interesting that real-life phenomena such as telephone calls to a radio show, goals in a World Cup match, or decay of radioactive atoms, all exhibit the Poisson distribution.

#### 3.3 Normal Distribution

The normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean. In graph form, the normal distribution appears as a bell curve.

The PDF of a normal distribution is:

$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{ - \frac{(x-\mu)^2}{2\sigma^2} }$

The expected value, variance, and standard deviation are given by:

Expected value: $\mu$

Variance: $\sigma^2$

Standard deviation: $\sigma$

The normal distribution is extensively used in the natural and social sciences. A key reason is the Central Limit Theorem (CLT), which states that, for a large enough sample, the distribution of the sample mean approaches a normal distribution, regardless of the shape of the population distribution.
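The CLT can be observed empirically. A sketch (assuming Python) that draws many samples from a strongly skewed exponential population and looks at the distribution of their means:

```python
import random
from statistics import mean, stdev

random.seed(42)

# Draw many samples of size n from a skewed (exponential, rate 1) population
# and record each sample mean. By the CLT these sample means cluster around
# the population mean (1/lambda = 1) in an approximately normal shape.
n, reps = 30, 20_000
sample_means = [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

grand_mean = mean(sample_means)    # close to the population mean, 1.0
spread = stdev(sample_means)       # close to sigma / sqrt(n) = 1 / sqrt(30)
```

Even though individual exponential draws are far from bell-shaped, the sample means concentrate around 1 with spread $\sigma/\sqrt{n}$, as the CLT predicts.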

##### 3.3.1 Normal Variate

In the context of the normal distribution, the term 'normal variate' refers to a random variable that follows a normal distribution. If the random variable $X$ follows a normal distribution with mean $\mu$ and standard deviation $\sigma$, we denote it as $X \sim N(\mu, \sigma^2)$.

##### 3.3.2 Moment Generating Function and Characteristic Function for Normal Distribution

The Moment Generating Function (MGF) of a Normal distribution is given by:

$M(t) = E[e^{tX}] = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$

The Characteristic Function (CF) of a Normal distribution is given by:

$\phi(t) = E[e^{itX}] = e^{i\mu t - \frac{1}{2}\sigma^2 t^2}$

Note: The normal distribution does not have a Probability Generating Function, since the PGF is defined only for discrete random variables.

##### 3.3.3 Trivia: Normal Distribution

The normal distribution, also known as the Gaussian distribution, was first introduced by Abraham de Moivre in an article in 1733. However, it was fully developed by the German mathematician Carl Friedrich Gauss, hence the name 'Gaussian distribution'. This distribution is considered the most prominent probability distribution in statistics because it fits many natural phenomena like heights, blood pressure, IQ scores, etc. Plus, thanks to the Central Limit Theorem, it allows statisticians to make inferences about population means.

#### 3.4 Uniform Distribution

The uniform distribution is a type of probability distribution in which all outcomes are equally likely. Drawing a suit from a well-shuffled deck follows a uniform distribution because hearts, clubs, diamonds, and spades are equally likely; so does a fair coin toss, where heads and tails have the same probability. These are discrete examples; the formulas below describe the continuous uniform distribution on an interval $[a, b]$.

The PDF of a uniform distribution in the interval $[a, b]$ is:

$f(x) = \frac{1}{b - a}, \quad a \leq x \leq b$

The expected value, variance, and standard deviation are given by:

Expected value: $\mu = \frac{a + b}{2}$

Variance: $\sigma^2 = \frac{(b - a)^2}{12}$

Standard deviation: $\sigma = \frac{b - a}{\sqrt{12}}$

##### 3.4.1 Moment Generating Function and Characteristic Function for Uniform Distribution

The Moment Generating Function (MGF) of a Uniform distribution is given by:

$M(t) = E[e^{tX}] = \frac{e^{tb} - e^{ta}}{t(b - a)}, \quad t \neq 0$

The Characteristic Function (CF) of a Uniform distribution is given by:

$\phi(t) = E[e^{itX}] = \frac{e^{itb} - e^{ita}}{it(b - a)}, \quad t \neq 0$

Note: The uniform distribution does not have a Probability Generating Function, since the PGF is defined only for discrete random variables.

##### 3.4.2 Trivia: Uniform Distribution

One interesting aspect of the uniform distribution is that it is the maximum entropy probability distribution for a random variable X under no constraint other than that it is contained in the distribution's support. This means that for a given set of events, the uniform distribution represents the maximum ignorance about which event will occur, because they are all equally likely.

#### 3.5 Exponential Distribution

The exponential distribution is often concerned with the amount of time until some specific event occurs. For example, the amount of time (beginning now) until an earthquake occurs has an exponential distribution. Other examples include the length of time, in minutes, of long distance business telephone calls, and the amount of time, in months, a car battery lasts.

The PDF of an exponential distribution with rate $\lambda$ is:

$f(x) = \lambda e^{-\lambda x}, \quad x \geq 0$

The expected value, variance, and standard deviation are given by:

Expected value: $\mu = \frac{1}{\lambda}$

Variance: $\sigma^2 = \frac{1}{\lambda^2}$

Standard deviation: $\sigma = \frac{1}{\lambda}$
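The fact that mean and standard deviation are both $1/\lambda$ can be checked by simulation. A sketch (assuming Python's `random.expovariate`, which samples from this distribution):

```python
import random
from statistics import mean, stdev

random.seed(0)
lam = 2.0
xs = [random.expovariate(lam) for _ in range(200_000)]

m = mean(xs)    # close to 1/lam = 0.5
s = stdev(xs)   # also close to 1/lam = 0.5
```

With a large sample, both statistics land near $1/\lambda = 0.5$.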

##### 3.5.1 Moment Generating Function and Characteristic Function for Exponential Distribution

The Moment Generating Function (MGF) of an Exponential distribution is given by:

$M(t) = E[e^{tX}] = \frac{\lambda}{\lambda - t}, \quad t < \lambda$

The Characteristic Function (CF) of an Exponential distribution is given by:

$\phi(t) = E[e^{itX}] = \frac{\lambda}{\lambda - it}$

Note: The exponential distribution does not have a Probability Generating Function as it is defined only for discrete random variables.

##### 3.5.2 Trivia: Exponential Distribution

An interesting fact about the exponential distribution is its 'memoryless' property. This means that the remaining time until an event occurs does not depend on how much time has already passed. For example, if you're waiting for a bus that arrives on average every 15 minutes, and you've already been waiting for 30 minutes, the time you still have to wait has the same distribution as when you first arrived. This makes the exponential distribution a natural model for survival and reliability analysis in engineering.
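The memoryless property can be demonstrated by simulation. A sketch (assuming Python) using the bus example, where waits are exponential with mean 15 minutes:

```python
import random
from statistics import mean

random.seed(1)
lam = 1 / 15.0   # buses arrive on average every 15 minutes
waits = [random.expovariate(lam) for _ in range(400_000)]

overall_mean = mean(waits)                 # close to 15 minutes

# Condition on having already waited 30 minutes: the *remaining* wait
# w - 30 is distributed just like a fresh wait (memorylessness).
residual = [w - 30 for w in waits if w > 30]
residual_mean = mean(residual)             # also close to 15 minutes
```

The conditional mean of the remaining wait matches the unconditional mean, exactly as memorylessness predicts.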

### 4. Transformation of Random Variables

Often in practice, we're interested not only in a random variable $X$ but also in some function of the random variable $Y = g(X)$. Given the probability distribution of $X$, we may want to find the distribution of $Y$. This is known as the transformation of random variables. It's useful in various branches of statistics and probability, such as hypothesis testing and estimation theory.

The transformation methods depend on whether the random variable is discrete or continuous and whether the transformation function is one-to-one or many-to-one. For a one-to-one function, we use the change of variables method, and for many-to-one functions, we employ the method of distribution functions.

#### 4.1 Transformation of Continuous Random Variables

For a continuous random variable, let $Y = g(X)$ be a transformation of $X$. If the transformation is strictly increasing or decreasing, we can find the PDF of $Y$ by:

$$f_Y(y) = f_X(g^{-1}(y))\left|\frac{dg^{-1}(y)}{dy}\right|$$

where $f_X(x)$ is the PDF of $X$, $g^{-1}(y)$ is the inverse of the transformation function, and the absolute value indicates that we consider the magnitude of the derivative.
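As a worked sketch (assuming Python), take $X \sim \text{Uniform}(0, 1)$ with $f_X(x) = 1$ and the strictly decreasing transformation $Y = g(X) = -\ln(X)/\lambda$. The change-of-variables formula then recovers the exponential density:

```python
from math import exp, isclose

lam = 2.0
# X ~ Uniform(0, 1), f_X(x) = 1, and Y = g(X) = -ln(X)/lam (strictly decreasing).
# Then g^{-1}(y) = e^{-lam*y} and |d g^{-1}(y)/dy| = lam * e^{-lam*y}, so
# f_Y(y) = f_X(g^{-1}(y)) * |d g^{-1}(y)/dy| = lam * e^{-lam*y}: Exponential(lam).
g_inv = lambda y: exp(-lam * y)
jac = lambda y: lam * exp(-lam * y)                       # |d g^{-1}(y)/dy|
f_Y = lambda y: (1.0 * jac(y)) if 0 < g_inv(y) < 1 else 0.0
```

This is the inverse-transform construction of the exponential distribution and is commonly used for random number generation.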

#### 4.2 Transformation of Discrete Random Variables

For a discrete random variable, we can directly calculate the probability mass function (PMF) of $Y = g(X)$ by summing the probabilities for each value $x$ such that $g(x) = y$:

$$P_Y(y) = \sum_{x:g(x) = y} P_X(x)$$

where $P_X(x)$ is the PMF of $X$.
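A small sketch (assuming Python) of this summation for a many-to-one transformation: let $X$ be a fair six-sided die roll and $Y = g(X) = X \bmod 2$.

```python
from math import isclose

# X = outcome of a fair six-sided die; Y = g(X) = X % 2 is many-to-one.
pmf_X = {x: 1 / 6 for x in range(1, 7)}

pmf_Y = {}
for x, p in pmf_X.items():
    y = x % 2
    pmf_Y[y] = pmf_Y.get(y, 0.0) + p   # sum P_X(x) over all x with g(x) = y
```

Three outcomes map to each parity, so $P_Y(0) = P_Y(1) = 1/2$.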

### 5. Relations Among Distributions

Many probability distributions are related to others, and understanding these relationships can often simplify analysis or provide insights into the underlying processes generating the data. Here, we'll explore some of the relationships among the binomial, Poisson, normal, uniform, and exponential distributions.

#### 5.1 Relation Between Binomial and Poisson Distributions

The Poisson distribution can be used as an approximation to the binomial distribution under certain conditions, known as the Poisson limit theorem. Specifically, if a binomial distribution has a large number of trials ($n$) and a small probability of success ($p$) such that $np = \lambda$ is of moderate size, then the distribution is approximately Poisson with parameter $\lambda$.
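The quality of this approximation is easy to check numerically. A sketch (assuming Python) comparing the two PMFs for large $n$ and small $p$:

```python
from math import comb, exp, factorial

# Large n, small p, with lam = n*p moderate: Binomial(n, p) ~ Poisson(lam).
n, p = 1000, 0.003
lam = n * p   # 3.0

binom = lambda k: comb(n, k) * p ** k * (1 - p) ** (n - k)
pois = lambda k: lam ** k * exp(-lam) / factorial(k)

# Largest pointwise disagreement over the bulk of the support.
max_gap = max(abs(binom(k) - pois(k)) for k in range(20))
```

The two PMFs agree to within roughly $10^{-3}$ at every point, illustrating the Poisson limit theorem.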

#### 5.2 Relation Between Binomial and Normal Distributions

The binomial distribution tends to the normal distribution as the number of trials goes to infinity, thanks to the Central Limit Theorem (CLT). The approximation is considered reasonable when $n$ is large and both $np$ and $n(1-p)$ are reasonably large (a common rule of thumb is $np \geq 5$ and $n(1-p) \geq 5$). The corresponding normal distribution has mean $\mu = np$ and variance $\sigma^2 = np(1-p)$.

#### 5.3 Relation Between Poisson and Normal Distributions

Just like the binomial distribution, the Poisson distribution tends to the normal distribution as the parameter $\lambda$ goes to infinity. This is due to the Central Limit Theorem (CLT) and is usually reasonable for $\lambda > 30$. The corresponding normal distribution has mean $\mu = \lambda$ and variance $\sigma^2 = \lambda$.

#### 5.4 Relation Between Exponential and Poisson Distributions

The exponential distribution and the Poisson distribution are closely related. If the times between random events follow an exponential distribution with rate $\lambda$, then the number of events in a time interval of length $t$ follows a Poisson distribution with parameter $\lambda t$. This connection is often used in queueing theory and reliability analysis of systems.

#### 5.5 Relation Between Uniform and Normal Distributions

If a large number of independent random variables, each with a standard uniform distribution, are added together, their normalized sum tends toward a normal distribution (Irwin–Hall distribution) due to the Central Limit Theorem (CLT). For instance, the sum of 12 independent standard uniform random variables (subtracting 6 to shift the mean to 0) approximates a standard normal distribution quite closely.
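The 12-uniform construction mentioned above is easy to reproduce. A sketch (assuming Python): since each standard uniform has mean $1/2$ and variance $1/12$, the sum of 12 of them minus 6 has mean 0 and variance 1.

```python
import random
from statistics import mean, stdev

random.seed(7)

# Z = (sum of 12 standard uniforms) - 6 has mean 0 and variance 12 * (1/12) = 1,
# and its distribution is approximately standard normal.
zs = [sum(random.random() for _ in range(12)) - 6 for _ in range(100_000)]

m, s = mean(zs), stdev(zs)
inside = sum(1 for z in zs if abs(z) < 1.96) / len(zs)   # about 0.95 for N(0, 1)
```

The empirical mean, standard deviation, and central 95% coverage all match the standard normal closely, even though no normal sampler was used.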

### 6. Covariance and Correlation

When dealing with multiple random variables, it is often useful to examine how these variables interact with each other. Covariance and correlation are two measures that provide insights into the relationship between two random variables.

#### 6.1 Covariance

Covariance is a measure of how much two random variables vary together. For two random variables $X$ and $Y$, the covariance is defined as:

$$\text{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$$

where $E[X]$ is the expected value of $X$, and $E[Y]$ is the expected value of $Y$. If the covariance is positive, $X$ and $Y$ tend to be above their expected values at the same time. If the covariance is negative, one tends to be above its expected value when the other is below.
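The definition translates directly into a sample estimate. A minimal sketch (assuming Python and equal weights on the data points):

```python
from statistics import mean

def cov(xs, ys):
    # Sample analogue of E[(X - E[X]) * (Y - E[Y])]
    mx, my = mean(xs), mean(ys)
    return mean((x - mx) * (y - my) for x, y in zip(xs, ys))

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]   # y = 2x: positive covariance, equal to 2 * Var(X)
```

Here $\text{Var}(X) = 2$, so the covariance comes out to $2 \cdot 2 = 4$; a constant $Y$ gives covariance 0.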

#### 6.2 Correlation

Correlation, specifically the Pearson correlation coefficient, is a measure of the linear relationship between two random variables. It is the covariance of the two variables divided by the product of their standard deviations. The correlation coefficient is always between -1 and 1. For two random variables $X$ and $Y$, the correlation is defined as:

$$\text{Corr}(X, Y) = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X)\text{Var}(Y)}}$$

where $\text{Var}(X)$ is the variance of $X$, and $\text{Var}(Y)$ is the variance of $Y$.
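The formula above can be sketched directly (assuming Python; the mean-centering and normalization follow the definition of the Pearson coefficient):

```python
from math import sqrt, isclose
from statistics import mean

def corr(xs, ys):
    # Pearson correlation: Cov(X, Y) / sqrt(Var(X) * Var(Y)).
    # The common 1/n factors cancel, so raw centered sums suffice.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

x = [1, 2, 3, 4, 5]
y = [10, 8, 6, 4, 2]   # perfect decreasing linear relationship
```

A perfect decreasing linear relationship gives $-1$, and a perfect increasing one gives $+1$, the two extremes of the coefficient's range.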