Biology

Cell biology, DNA as an information system, evolutionary algorithms, and bioinformatics: biology through the lens of computing.

Cell Biology and DNA

Central dogma
DNA → RNA → Protein

DNA:     double helix, base pairs A-T and G-C
         Storage medium — the "source code"
         4 bases: Adenine, Thymine, Guanine, Cytosine

RNA:     single strand, U replaces T
         Transcription: DNA → mRNA (copy of gene)
         Translation: mRNA → protein (ribosome reads codons)

Codon:   3-base sequence encoding one amino acid
         4^3 = 64 codons: 61 encode the 20 amino acids, 3 are stop signals
         Redundancy: most amino acids have several codons, so many third-base changes are silent (error tolerance)

Genome:  complete DNA of an organism
         Human: ~3.2 billion base pairs (haploid), ~20,000 protein-coding genes
         E. coli: ~4.6 million base pairs, ~4,300 genes
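A minimal Python sketch of transcription and translation, under simplifying assumptions: the input is the coding strand of a gene with no introns, translation starts at the first AUG, and only the standard genetic code is used. `CODON_TABLE` is built from a 64-letter string listing the amino acid for each codon in U, C, A, G order (`*` marks a stop). The function names are illustrative, not from any library. The last lines check the codon arithmetic: 64 codons, 61 sense codons, 20 distinct amino acids.

```python
# Standard genetic code only (assumption: no alternative codes, no introns).
# Codons are enumerated in U, C, A, G order: UUU, UUC, UUA, UUG, UCU, ...
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"  # "*" = stop
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def transcribe(coding_strand: str) -> str:
    """mRNA has the same sequence as the gene's coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon; returns one-letter amino acid codes."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, no protein
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

sense = [aa for aa in CODON_TABLE.values() if aa != "*"]
print(len(CODON_TABLE), len(sense), len(set(sense)))  # 64 61 20
```

For example, `translate(transcribe("CCATGGCCTTTTGAAA"))` returns `"MAF"`: reading starts at AUG (Met), continues through GCC (Ala) and UUU (Phe), and stops at UGA.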

DNA as information system
Storage density:  1 gram of DNA ≈ 215 petabytes achieved experimentally (theoretical limit is higher)
Encoding:         2 bits per base pair (e.g. A=00, C=01, G=10, T=11)
Error correction: DNA repair enzymes (biological ECC)
Replication:      ~10^-9 errors per base per division (DNA polymerase plus proofreading and mismatch repair)
                  3 billion bases × 10^-9 ≈ 3 mutations per cell division
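The 2-bit encoding can be sketched in Python by packing four bases per byte. The mapping is the example one listed above; `pack` and `unpack` are hypothetical helper names, and the caller has to keep the sequence length because the last byte may carry padding. The final constant just restates the arithmetic: at 2 bits per base, a ~3.2 billion base pair genome needs about 800 million bytes.

```python
# Assumed 2-bit mapping (any fixed one-to-one mapping works).
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = "ACGT"  # list index = 2-bit code

def pack(seq: str) -> bytes:
    """Pack four bases per byte, first base in the two highest bits."""
    out = bytearray()
    for start in range(0, len(seq), 4):
        chunk = seq[start:start + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | ENCODE[base]
        byte <<= 2 * (4 - len(chunk))  # pad a short final chunk with zero bits
        out.append(byte)
    return bytes(out)

def unpack(data: bytes, length: int) -> str:
    """Inverse of pack; `length` drops the padding bases in the last byte."""
    bases = [DECODE[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)]
    return "".join(bases[:length])

# 2 bits per base: ~3.2 billion base pairs -> ~800 million bytes.
HUMAN_GENOME_BYTES = 3_200_000_000 * 2 // 8
```

Storing the length separately is a design choice of this sketch; padding bits decode as "A", so without the length a trailing partial byte would be ambiguous.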

Parallel to computing:
  DNA       = storage (hard drive)
  mRNA      = temporary working copy (data loaded into RAM)
  Protein   = executable program
  Ribosome  = CPU
  Mutations = bit flips

Evolution and Algorithms

Natural selection
Variation:    individuals differ (random mutation, recombination)
Selection:    fittest survive and reproduce more
Inheritance:  traits pass to offspring

Genetic algorithm (inspired by evolution):
  1. Initialize population of random solutions
  2. Evaluate fitness of each
  3. Select parents (tournament, roulette wheel)
  4. Crossover: combine parent genes → offspring
  5. Mutation: random small changes
  6. Repeat from step 2 until fitness stops improving or a generation limit is reached
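A small Python sketch of the six steps on a toy problem, OneMax (maximize the number of 1 bits in a 20-bit genome), chosen only because its optimum is known. Tournament selection, one-point crossover, bit-flip mutation, and carrying the best individual over unchanged (elitism) are one common set of choices, not the only ones; every parameter value here is an arbitrary assumption.

```python
import random

GENOME_LEN = 20              # bits per candidate solution
POP_SIZE = 30
MUTATION_RATE = 1 / GENOME_LEN
MAX_GENERATIONS = 200

def fitness(genome: list[int]) -> int:
    # OneMax: fitness is simply the number of 1 bits
    return sum(genome)

def tournament(pop, rng, k=3):
    # Step 3: pick k random individuals, keep the fittest
    return max(rng.sample(pop, k), key=fitness)

def crossover(a, b, rng):
    # Step 4: one-point crossover, prefix from one parent, suffix from the other
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(genome, rng):
    # Step 5: flip each bit independently with a small probability
    return [bit ^ 1 if rng.random() < MUTATION_RATE else bit for bit in genome]

def run_ga(seed: int = 0):
    rng = random.Random(seed)
    # Step 1: random initial population
    pop = [[rng.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
    for gen in range(MAX_GENERATIONS):
        # Step 2: evaluate fitness; stop once the known optimum appears
        best = max(pop, key=fitness)
        if fitness(best) == GENOME_LEN:
            return gen, best
        # Steps 3-5: breed the next generation, keeping the current best (elitism)
        children = [mutate(crossover(tournament(pop, rng), tournament(pop, rng), rng), rng)
                    for _ in range(POP_SIZE - 1)]
        pop = [best] + children
    # Step 6: generation limit reached, return the best found so far
    return MAX_GENERATIONS, max(pop, key=fitness)
```

Usage: `generation, best = run_ga(seed=1)`. Elitism makes the best fitness non-decreasing, which keeps a tiny demo like this from losing good solutions to unlucky crossover or mutation.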

Applications:
  Network topology optimization
  Scheduling problems
  Machine learning hyperparameter tuning

Bioinformatics Basics

Sequence alignment
Problem: find the best-scoring alignment between two sequences

  ACGTACGT
  AC-TAGGT
    ^  ^     mismatches/gaps have costs

Dynamic programming, like edit distance (Needleman-Wunsch for global, Smith-Waterman for local alignment):
  Match: +1, Mismatch: -1, Gap: -2
  Same algorithm family as diff(1) and git merge (longest common subsequence)
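For global alignment, the dynamic program above is the Needleman-Wunsch algorithm. A compact Python sketch with the stated scores, assuming a linear gap cost (every gap position costs -2): `score[i][j]` holds the best score for aligning the first `i` bases of `a` with the first `j` bases of `b`, and a traceback recovers one optimal alignment. The function name is illustrative.

```python
MATCH, MISMATCH, GAP = 1, -1, -2

def needleman_wunsch(a: str, b: str) -> tuple[int, str, str]:
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP          # a[:i] against all gaps
    for j in range(1, m + 1):
        score[0][j] = j * GAP          # b[:j] against all gaps

    def pair(i: int, j: int) -> int:
        return MATCH if a[i - 1] == b[j - 1] else MISMATCH

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                              score[i - 1][j] + GAP,             # gap in b
                              score[i][j - 1] + GAP)             # gap in a

    # Traceback from the bottom-right corner, following any step that explains the score
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

`needleman_wunsch("ACGTACGT", "ACTAGGT")` returns `(3, "ACGTACGT", "AC-TAGGT")`, the alignment in the example: 6 matches, 1 mismatch and 1 gap give 6 - 1 - 2 = 3. The table costs O(n·m) time and memory.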

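BLAST (Basic Local Alignment Search Tool): fast heuristic search against sequence databases
  Finds short exact "seed" matches, then extends them into scored local alignments
  Analogous to grep for DNA/protein sequences, but tolerant of mismatches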
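A toy Python sketch of the seed-and-extend idea behind BLAST, not BLAST itself: index every k-mer of the database (much like a search index for grep), look up each query k-mer to find exact seeds, then extend each seed left and right while bases keep matching. Real BLAST adds neighborhood words, a scoring matrix, X-drop and gapped extension, and E-value statistics; `k=3` and all function names here are arbitrary assumptions.

```python
from collections import defaultdict

def build_index(db: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the database to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(db) - k + 1):
        index[db[pos:pos + k]].append(pos)
    return index

def extend(query: str, db: str, q: int, d: int) -> tuple[int, int, int]:
    """Grow an exact seed left and right while bases agree (ungapped, unscored)."""
    while q > 0 and d > 0 and query[q - 1] == db[d - 1]:
        q, d = q - 1, d - 1
    length = 0
    while (q + length < len(query) and d + length < len(db)
           and query[q + length] == db[d + length]):
        length += 1
    return q, d, length

def seed_and_extend(query: str, db: str, k: int = 3) -> list[tuple[int, int, int]]:
    """Return sorted (query_start, db_start, length) for each distinct extended hit."""
    index = build_index(db, k)
    hits = set()  # seeds on the same diagonal usually extend to the same hit
    for q in range(len(query) - k + 1):
        for d in index.get(query[q:q + k], []):
            hits.add(extend(query, db, q, d))
    return sorted(hits)

print(seed_and_extend("ACGTA", "TTACGTACGGA"))  # [(0, 2, 5), (0, 6, 3)]
```

Building the index once and reusing it across many queries is what makes this faster than aligning the query against every database position with full dynamic programming.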

Phylogenetic trees
Tree showing evolutionary relationships: leaves are species, internal nodes are common ancestors
Same data structure (a rooted tree) as a file system hierarchy

      ┌─── Human
    ┌─┤
    │ └─── Chimpanzee
  ──┤
    │ ┌─── Dog
    └─┤
      └─── Cat

Distance: number of differing positions (a proxy for mutations) between aligned sequences
Algorithms: neighbor-joining, maximum likelihood
Related to: hierarchical clustering, dendrogram
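A Python sketch of building a tree from distances. It swaps in UPGMA, a simple hierarchical clustering method, rather than neighbor-joining or maximum likelihood; the distance is the count of differing positions between equal-length, pre-aligned sequences. The four sequences are made up so that the result matches the example tree; they are not real genomic data, and branch lengths are omitted.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length, pre-aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def upgma(seqs: dict[str, str]):
    """Return the tree topology as nested tuples (no branch lengths in this sketch)."""
    names = list(seqs)
    # cluster id -> (subtree, number of leaves in it)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # pairwise distances keyed by (smaller id, larger id)
    dist = {(i, j): hamming(seqs[names[i]], seqs[names[j]])
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair; ties go to the earliest pair
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        del dist[(i, j)]
        for k in clusters:
            # UPGMA update: leaf-count-weighted average of the two old distances
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree

# Made-up toy sequences, not real genomic data.
toy = {
    "Human":      "ACGTACGTAC",
    "Chimpanzee": "ACGTACGTAA",
    "Dog":        "TCGAACCTAC",
    "Cat":        "TCGAACCTTC",
}
print(upgma(toy))  # (('Human', 'Chimpanzee'), ('Dog', 'Cat'))
```

The nested tuples mirror the diagram: each inner tuple is an internal node, much like a directory containing subdirectories. UPGMA assumes a roughly constant mutation rate across lineages; neighbor-joining drops that assumption, which is one reason it is more widely used.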

See Also

  • Information Theory — DNA encoding and error correction parallel information theory

  • Cryptography — hashing and error detection share algorithmic roots