Probability - The Science of Uncertainty and Data
by Fabián Kozynski
PROBABILITY
Probability models and axioms
Definition (Sample space) A sample space Ω is the set of all possible outcomes. The set’s elements must be mutually exclusive, collectively exhaustive and at the right granularity.
Definition (Event) An event is a subset of the sample space. Probability is assigned to events.
Definition (Probability axioms) A probability law \mathbb{P} assigns probabilities to events and satisfies the following axioms:
a. Nonnegativity: \mathbb{P}\left(A\right) \ge 0 for all events A.
b. Normalization: \mathbb{P}\left(\Omega\right) = 1.
c. (Countable) additivity: For every sequence of events A_1, A_2, \ldots with A_i \cap A_j = \varnothing for i \neq j\colon \mathbb{P}\left(\bigcup\limits_{i} A_i\right) = \sum\limits_i \mathbb{P}(A_i).
Corollaries (Consequences of the axioms):
1. \mathbb{P}\left(\varnothing\right) = 0.
2. For any finite collection of disjoint events A_1, \ldots, A_n,
\mathbb{P}\left(\bigcup\limits_{i=1}^n A_i\right) = \sum\limits_{i=1}^n \mathbb{P}(A_i).
3. \mathbb{P}\left(A\right) + \mathbb{P}\left(A^c\right) = 1.
4. \mathbb{P}\left(A\right) \le 1.
5. \textrm{If } A \subset B, \textrm{ then } \mathbb{P}(A) \leq \mathbb{P}(B).
6. \mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B).
7. \mathbb{P}(A \cup B) \leq \mathbb{P}(A) + \mathbb{P}(B).
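A quick illustration of corollary 6, assuming a fair six-sided die (equally likely outcomes): with A = \{2, 4, 6\} (even roll) and B = \{1, 2, 3\}, we have A \cap B = \{2\}, so \mathbb{P}(A \cup B) = \tfrac{1}{2} + \tfrac{1}{2} - \tfrac{1}{6} = \tfrac{5}{6}.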
Example (Discrete uniform law) Assume Ω is finite and consists of n equally likely elements, and let A \subset \Omega be an event with k elements. Then \mathbb{P}(A) = \tfrac{k}{n}.
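For instance, drawing one card uniformly at random from a standard 52-card deck: if A is the event that the card is a heart, then \mathbb{P}(A) = \tfrac{13}{52} = \tfrac{1}{4}.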
Conditioning and Bayes’ rule
Definition (Conditional probability) Given that event B has occurred and that \mathbb{P}(B) \gt 0, the probability that A occurs is
\mathbb{P}(A|B) \triangleq \frac{\mathbb{P}(A\cap B)}{\mathbb{P}(B)}.
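For example, roll a fair six-sided die and let B = \{2, 4, 6\} (even outcome) and A = \{2\}. Then \mathbb{P}(A|B) = \frac{1/6}{1/2} = \tfrac{1}{3}.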
Remark (Properties of conditional probabilities) Conditional probabilities satisfy the same axioms as ordinary probabilities. Assuming \mathbb{P}(B) > 0:
a. \mathbb{P}(A|B) \ge 0.
b. \mathbb{P}(\Omega|B) = 1.
c. \mathbb{P}(B|B) = 1.
d. \textrm{If } A \cap C = \varnothing, \textrm{ then } \mathbb{P}(A \cup C |B) = \mathbb{P}(A | B) + \mathbb{P}(C|B).
Proposition (Multiplication rule)
\mathbb{P}(A_1 \cap A_2 \cap \cdots \cap A_n) = \mathbb{P}(A_1) \cdot \mathbb{P}(A_2|A_1) \cdots \mathbb{P}(A_n|A_1 \cap A_2 \cap \cdots \cap A_{n-1}).
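For example, the probability of drawing two aces in the first two cards from a well-shuffled 52-card deck is \mathbb{P}(A_1)\mathbb{P}(A_2|A_1) = \tfrac{4}{52} \cdot \tfrac{3}{51} = \tfrac{1}{221}, where A_i is the event that the i-th card is an ace.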
Theorem (Total probability theorem) Given a partition \{A_1, A_2, \ldots\} of the sample space, meaning that \bigcup\limits_i A_i = \Omega and the events are disjoint, with \mathbb{P}(A_i) > 0 for all i, then for every event B we have
\mathbb{P}(B) = \sum\limits_i{\mathbb{P}(A_i)\mathbb{P}(B|A_i)}.
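Example (Total probability) In a hypothetical setup, a part is produced by machine A_1 with probability 0.6 or by machine A_2 with probability 0.4, and the defect probabilities are \mathbb{P}(B|A_1) = 0.01 and \mathbb{P}(B|A_2) = 0.05, where B is the event that the part is defective. Then \mathbb{P}(B) = 0.6 \cdot 0.01 + 0.4 \cdot 0.05 = 0.026.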
Theorem (Bayes’ rule) Given a partition \{A_1, A_2, \ldots\} of the sample space, meaning that \bigcup \limits_{i} A_i = \Omega and the events are disjoint, and if \mathbb{P}(A_i) > 0 for all i, then for every event B, the conditional probabilities \mathbb{P}(A_i | B) can be obtained from the conditional probabilities \mathbb{P}(B|A_i) and the initial probabilities \mathbb{P}(A_i) as follows:
\mathbb{P}(A_i | B) = \frac{\mathbb{P}(A_i)\mathbb{P}(B|A_i)}{\sum_j \mathbb{P}(A_j)\mathbb{P}(B|A_j)}.
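Continuing the machine example above, the probability that a defective part came from machine A_2 is \mathbb{P}(A_2|B) = \frac{0.4 \cdot 0.05}{0.026} = \frac{0.020}{0.026} \approx 0.77.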
Independence
Definition (Independence of events) Two events are independent if occurrence of one provides no information about the other. We say that A and B are independent if
\mathbb{P}(A \cap B) = \mathbb{P}(A)\mathbb{P}(B).
Equivalently, as long as \mathbb{P}(A) > 0 and \mathbb{P}(B)>0,
\mathbb{P}(B|A) = \mathbb{P}(B)\qquad\mathbb{P}(A|B) = \mathbb{P}(A).
Remarks
a. The definition of independence is symmetric with respect to A and B.
b. The product definition applies even if \mathbb{P}(A) = 0 or \mathbb{P}(B) = 0.
Corollaries If A and B are independent, then A and B^c are independent. Similarly for A^c and B, and for A^c and B^c.
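For example, in two independent tosses of a fair coin, let A be the event that the first toss is heads and B the event that the second toss is heads. Then \mathbb{P}(A \cap B) = \tfrac{1}{4} = \mathbb{P}(A)\mathbb{P}(B), so A and B are independent.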
Definition (Conditional independence) We say that A and B are independent conditioned on C, where \mathbb{P}(C) > 0, if
\mathbb{P}(A \cap B | C) = \mathbb{P}(A|C)\mathbb{P}(B|C).
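Note that independence does not imply conditional independence. With A and B as in the coin example above, let C be the event that exactly one toss is heads. Then \mathbb{P}(A|C) = \mathbb{P}(B|C) = \tfrac{1}{2}, but \mathbb{P}(A \cap B | C) = 0.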
Definition (Independence of a collection of events) We say that events A_1, A_2, \ldots, A_n are independent if for every collection of distinct indices i_1, i_2, \ldots, i_k, we have
\mathbb{P}(A_{i_1} \cap \cdots \cap A_{i_k}) = \mathbb{P}(A_{i_1}) \cdot \mathbb{P}(A_{i_2}) \cdots \mathbb{P}(A_{i_k}).
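Remark Pairwise independence does not imply independence: with A, B, C as in the coin examples above, each pair of events is independent (each pairwise intersection has probability \tfrac{1}{4} = \tfrac{1}{2} \cdot \tfrac{1}{2}), yet \mathbb{P}(A \cap B \cap C) = 0 \neq \mathbb{P}(A)\mathbb{P}(B)\mathbb{P}(C) = \tfrac{1}{8}.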