Probability
Probability Basics
Definitions
Sample space S: set of all possible outcomes
Event A: subset of the sample space
Probability: P(A) = |A| / |S| (equally likely outcomes)
Axioms:
0 <= P(A) <= 1
P(S) = 1
P(∅) = 0
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Complement:
P(A') = 1 - P(A)
"Probability of NOT A"
Often easier: P(at least one) = 1 - P(none)
Examples
Fair die: P(even) = 3/6 = 1/2
Two dice: P(sum = 7) = 6/36 = 1/6
Card draw: P(ace) = 4/52 = 1/13
Birthday problem:
P(at least 2 share birthday in n people)
= 1 - P(all different)
= 1 - (365/365)(364/365)(363/365)...(365-n+1)/365
n=23: P ≈ 50.7%
n=50: P ≈ 97%
Hash collision analogy: birthday attack
Conditional Probability
Formulas
P(A|B) = P(A ∩ B) / P(B) "probability of A given B"
Multiplication rule:
P(A ∩ B) = P(A|B) × P(B) = P(B|A) × P(A)
Independence:
A and B independent iff P(A ∩ B) = P(A) × P(B)
iff P(A|B) = P(A)
iff P(B|A) = P(B)
Bayes' theorem
P(A|B) = P(B|A) × P(A) / P(B)
Expanded form (total probability):
P(A|B) = P(B|A) × P(A) / [P(B|A)×P(A) + P(B|A')×P(A')]
Example: malware detection
P(malware) = 0.01 (1% of files are malware)
P(detect|malware) = 0.95 (95% detection rate)
P(detect|clean) = 0.03 (3% false positive rate)
P(malware|detect) = (0.95 × 0.01) / (0.95×0.01 + 0.03×0.99)
= 0.0095 / 0.0392
≈ 0.242 (24.2%)
Even with 95% detection, only 24% of alerts are real!
This is why false positive rate matters enormously.
Common Distributions
Discrete distributions
Bernoulli(p): Single trial, success (1) with probability p
P(X=1) = p, P(X=0) = 1-p
E[X] = p, Var(X) = p(1-p)
Binomial(n, p): Number of successes in n independent trials
P(X=k) = C(n,k) × p^k × (1-p)^(n-k)
E[X] = np, Var(X) = np(1-p)
Example: 100 packets, 2% loss rate
P(exactly 5 lost) = C(100,5) × 0.02^5 × 0.98^95 ≈ 0.0353
Geometric(p): Trials until first success
P(X=k) = (1-p)^(k-1) × p
E[X] = 1/p
Example: average rolls until 6 = 1/(1/6) = 6
Poisson(λ): Rare events in fixed interval
P(X=k) = e^(-λ) × λ^k / k!
E[X] = λ, Var(X) = λ
Example: 3 incidents/month average
P(0 incidents) = e^(-3) ≈ 0.050
P(5+ incidents) = 1 - sum_{k=0}^{4} P(X=k) ≈ 0.185
Normal distribution
N(μ, σ^2): Bell curve centered at mean μ, spread σ
68-95-99.7 rule:
68% of data within 1σ of mean
95% of data within 2σ of mean
99.7% of data within 3σ of mean
Z-score: z = (x - μ) / σ (how many SDs from mean)
Central Limit Theorem:
Average of many independent random variables → approximately normal
Regardless of original distribution
This is why the normal distribution appears everywhere
Expected Value and Variance
Expected value
E[X] = sum of (value × probability) for all outcomes
E[X] = sum_{i} x_i × P(X = x_i)
Properties:
E[aX + b] = a×E[X] + b (linearity)
E[X + Y] = E[X] + E[Y] (always, even if dependent)
E[XY] = E[X]×E[Y] (only if independent)
Variance and standard deviation
Var(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2
SD(X) = sqrt(Var(X))
Properties:
Var(aX + b) = a^2 × Var(X)
Var(X + Y) = Var(X) + Var(Y) (if independent)
Example: die roll
E[X] = (1+2+3+4+5+6)/6 = 3.5
E[X^2] = (1+4+9+16+25+36)/6 = 91/6
Var(X) = 91/6 - 12.25 = 2.917
SD(X) ≈ 1.708
Probability in Security
Brute force and key space
Key space K, brute force probability after t tries:
P(crack in t tries) = t / |K| (without replacement)
P(crack in t tries) ≈ 1 - e^(-t/|K|) (with replacement, large K)
Expected tries to crack: |K| / 2
AES-128: |K| = 2^128
At 10^12 guesses/sec: 2^128 / 10^12 / (3.15×10^7 sec/yr)
≈ 10^19 years (safe)
4-digit PIN: |K| = 10^4 = 10,000
Expected: 5,000 tries (seconds for a computer)
Availability calculations
Single system, uptime p:
Availability = p
Two systems in series (both must work):
A = p1 × p2
Two systems in parallel (either works):
A = 1 - (1-p1)(1-p2)
Example: 99.9% uptime each
Series: 0.999 × 0.999 = 0.998 (99.8%)
Parallel: 1 - 0.001^2 = 0.999999 (99.9999%)
Nines:
99% = 3.65 days downtime/year
99.9% = 8.77 hours/year
99.99% = 52.6 minutes/year
99.999% = 5.26 minutes/year