Biology
Cell biology, DNA as information system, evolutionary algorithms, and bioinformatics — biology through the lens of computing.
Cell Biology and DNA
Central dogma
DNA → RNA → Protein
DNA: double helix, base pairs A-T and G-C
Storage medium — the "source code"
4 bases: Adenine, Thymine, Guanine, Cytosine
RNA: single strand, U replaces T
Transcription: DNA → mRNA (copy of gene)
Translation: mRNA → protein (ribosome reads codons)
Codon: 3-base sequence encoding one amino acid
4^3 = 64 codons → 61 encode the 20 amino acids, 3 are stop signals
Redundancy: multiple codons per amino acid (error tolerance)
Genome: complete DNA of an organism
Human: ~3.2 billion base pairs, ~20,000 genes
E. coli: ~4.6 million base pairs, ~4,300 genes
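The transcription and translation steps above can be sketched in a few lines of Python. This is a minimal illustration, not a bioinformatics tool: the function names `transcribe` and `translate` are made up for this example, the input is assumed to be the coding strand, and `CODON_TABLE` holds only the handful of codons the toy sequence needs (the full standard table has 64 entries).

```python
# Partial codon table: only the codons used in this toy example.
# The full standard genetic code has 64 entries.
CODON_TABLE = {
    "AUG": "Met",                  # also the usual start codon
    "UUU": "Phe", "UUC": "Phe",    # two codons, one amino acid (redundancy)
    "GGA": "Gly", "GGC": "Gly",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA: same sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read the mRNA three bases at a time until a stop codon appears."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


mrna = transcribe("ATGTTTGGCTAA")
print(mrna)             # AUGUUUGGCUAA
print(translate(mrna))  # ['Met', 'Phe', 'Gly']
```

Swapping UUU for UUC or GGC for GGA in the input leaves the protein unchanged, which is the redundancy noted above.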
DNA as information system
Storage density: up to ~215 petabytes per gram of DNA (demonstrated in the lab)
Encoding: 2 bits per base pair (A=00, C=01, G=10, T=11)
Error correction: DNA repair enzymes (biological ECC)
Replication: DNA polymerase with proofreading and mismatch repair gives ~10^-9 errors per base copied
~3 billion bases × 10^-9 ≈ 3 mutations per genome copy (per cell division)
Parallel to computing:
DNA = storage (hard drive)
mRNA = temporary working copy of a gene (like code loaded into RAM)
Protein = executable program
Ribosome = CPU
Mutations = bit flips
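As a rough sketch of the 2-bit encoding and the mutation arithmetic above: the helper names `pack`, `unpack`, and `expected_mutations` are hypothetical, and the bit assignment is the A=00, C=01, G=10, T=11 mapping listed in this section.

```python
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def unpack(value: int, length: int) -> str:
    """Inverse of pack; length is required because leading A's encode as zeros."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


def expected_mutations(genome_length: float, error_rate: float) -> float:
    """Expected copy errors per replication = bases copied x error rate per base."""
    return genome_length * error_rate


packed = pack("GATTACA")
print(bin(packed))         # 0b10001111000100
print(unpack(packed, 7))   # GATTACA
print(expected_mutations(3e9, 1e-9))  # roughly 3
```

Seven bases fit in 14 bits here, versus 56 bits as ASCII text, which is where the 2-bits-per-base figure comes from.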
Evolution and Algorithms
Natural selection
Variation: individuals differ (random mutation, recombination)
Selection: fittest survive and reproduce more
Inheritance: traits pass to offspring
Genetic algorithm (inspired by evolution):
1. Initialize population of random solutions
2. Evaluate fitness of each
3. Select parents (tournament, roulette wheel)
4. Crossover: combine parent genes → offspring
5. Mutation: random small changes
6. Repeat steps 2-5 until convergence
Applications:
Network topology optimization
Scheduling problems
Machine learning hyperparameter tuning
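The six genetic-algorithm steps listed above can be sketched on a toy problem. Everything specific here is an assumption for illustration: the objective (OneMax, maximize the number of 1 bits), tournament size, mutation rate, population size, and the choice to carry the best individual over unchanged (elitism) are not prescribed by the outline.

```python
import random


def fitness(genome):
    """Toy objective (OneMax): count the 1 bits."""
    return sum(genome)


def tournament(population, rng, k=3):
    """Pick k random individuals and return the fittest."""
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    """Single-point crossover: prefix of one parent, suffix of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(genome, rng, rate=0.01):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]


def evolve(genome_len=32, pop_size=40, generations=200, seed=0):
    rng = random.Random(seed)
    # 1. Initialize population of random solutions
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # 2. Evaluate fitness; stop early once a perfect genome appears
        best = max(population, key=fitness)
        if fitness(best) == genome_len:
            break
        # 3-5. Select parents, cross over, mutate.
        # The current best is copied over unchanged (elitism, an added assumption).
        children = [best]
        while len(children) < pop_size:
            child = crossover(tournament(population, rng),
                              tournament(population, rng), rng)
            children.append(mutate(child, rng))
        # 6. Repeat with the new generation
        population = children
    return max(population, key=fitness)


best = evolve()
print(fitness(best), "of", len(best), "bits set")
```

For a real application such as scheduling or hyperparameter tuning, only `fitness` and the genome representation would need to change; the loop stays the same.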
Bioinformatics Basics
Sequence alignment
Problem: find best match between two sequences
ACGTACGT
AC-TAGGT
  ^  ^    mismatches/gaps have costs
Dynamic programming (like edit distance):
Match: +1, Mismatch: -1, Gap: -2
Same algorithm family as diff(1) and git merge
BLAST: fast approximate search against sequence databases
Analogous to grep for DNA/protein sequences
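The dynamic-programming idea above can be sketched as a small global aligner in the style of Needleman-Wunsch, using the scores given in this section (match +1, mismatch -1, gap -2). The function name `align` and the linear gap penalty are assumptions for illustration; production tools use more elaborate scoring.

```python
def align(a: str, b: str, match=1, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming (Needleman-Wunsch style)."""

    def pair_score(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # align both
                score[i - 1][j] + gap,                                 # gap in b
                score[i][j - 1] + gap,                                 # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = align("ACGTACGT", "ACTAGGT")
print(total)   # 3
print(top)     # ACGTACGT
print(bottom)  # AC-TAGGT
```

The result reproduces the example alignment above: six matches, one mismatch, and one gap give 6 - 1 - 2 = 3.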
Phylogenetic trees
Tree showing evolutionary relationships
Same data structure as file system hierarchy
    ┌─── Human
  ┌─┤
  │ └─── Chimpanzee
──┤
  │ ┌─── Dog
  └─┤
    └─── Cat
Distance: number of mutations between sequences
Algorithms: neighbor-joining, maximum likelihood
Related to: hierarchical clustering, dendrogram
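To make the distance-based idea concrete, here is a minimal sketch that counts mutations as Hamming distance and builds a tree by average-linkage hierarchical clustering (UPGMA-like). This is a simpler stand-in, not neighbor-joining or maximum likelihood. The sequences in `toy` are invented for the example and are not real genomic data; `build_tree` and the other names are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def leaves(tree):
    """Flatten a nested-tuple tree into a list of leaf names."""
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])


def cluster_distance(t1, t2, seqs):
    """Average pairwise distance between the leaves of two subtrees."""
    pairs = [(a, b) for a in leaves(t1) for b in leaves(t2)]
    return sum(hamming(seqs[a], seqs[b]) for a, b in pairs) / len(pairs)


def build_tree(seqs: dict):
    """Repeatedly merge the two closest clusters into a nested tuple."""
    clusters = list(seqs)
    while len(clusters) > 1:
        t1, t2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(pair[0], pair[1], seqs))
        clusters = [c for c in clusters if c not in (t1, t2)]
        clusters.append((t1, t2))
    return clusters[0]


# Invented toy sequences, chosen so close relatives differ by one base.
toy = {
    "Human":      "ACGTACGTAC",
    "Chimpanzee": "ACGTACGTAT",
    "Dog":        "ACTTAGGTCC",
    "Cat":        "ACTTAGGTCA",
}
print(build_tree(toy))  # (('Human', 'Chimpanzee'), ('Dog', 'Cat'))
```

The nested tuple is the same shape as the tree drawn above: each tuple is an internal node, each string a leaf.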
See Also
- Information Theory: DNA encoding and error correction parallel information theory
- Cryptography: hashing and error detection share algorithmic roots