Graph Theory

Graph Fundamentals

Definitions
Graph:   G = (V, E) where V = vertices (nodes), E = edges (connections)

Undirected edge: {u, v}    no direction (friendship)
Directed edge:   (u, v)    from u to v (follows, depends-on)

Degree: deg(v) = number of edges incident to v
  Directed: in-degree (incoming) + out-degree (outgoing)

Handshaking lemma: sum of all degrees = 2|E|
  Every edge contributes 2 to total degree count
Representations
Adjacency list:   {A: [B,C], B: [A,D], ...}
  Space: O(V+E)
  Neighbor lookup: O(deg(v))
  Best for: sparse graphs, iteration

Adjacency matrix:  M[i][j] = 1 if edge (i,j) exists
  Space: O(V^2)
  Edge lookup: O(1)
  Best for: dense graphs, matrix operations

Weighted graph: edges carry values (cost, distance, capacity)
  List: {A: [(B,5), (C,3)]}
  Matrix: M[i][j] = weight

Paths and Connectivity

Path types
Walk:     sequence of vertices/edges (repeats allowed)
Path:     walk with no repeated vertices
Cycle:    path that starts and ends at same vertex

Connected:   path exists between every pair (undirected)
Strongly connected: directed path both ways between every pair

Connected components: maximal connected subgraphs
  Each component is an isolated cluster
  Count: run BFS/DFS from each unvisited vertex
Special graphs
Tree:       connected acyclic graph, |E| = |V| - 1
            Remove any edge → disconnects
            Add any edge → creates cycle

Bipartite:  vertices split into two sets, edges only between sets
            Equivalent to: no odd-length cycles
            Test: BFS 2-coloring

Complete:   K_n = edge between every pair, |E| = n(n-1)/2
DAG:        directed acyclic graph (dependency graphs)

Graph Traversal

BFS — Breadth-First Search
Strategy:    Explore level by level using a queue
Complexity:  O(V + E)
Finds:       shortest path in unweighted graphs
             connected components
             bipartiteness

Algorithm:
  1. Enqueue start vertex, mark visited
  2. While queue not empty:
     a. Dequeue vertex u
     b. For each neighbor v of u:
        If v not visited: mark visited, enqueue v
DFS — Depth-First Search
Strategy:    Go deep using stack/recursion, backtrack when stuck
Complexity:  O(V + E)
Finds:       cycles
             topological order (DAG)
             connected components
             strongly connected components (Tarjan/Kosaraju)

Algorithm:
  1. Push start vertex, mark visited
  2. While stack not empty:
     a. Pop vertex u
     b. For each neighbor v of u:
        If v not visited: mark visited, push v

Shortest Path Algorithms

Algorithm selection
Unweighted:                    BFS from source → O(V + E)
Non-negative weights:          Dijkstra → O((V+E) log V)
Negative weights allowed:      Bellman-Ford → O(V × E)
All pairs:                     Floyd-Warshall → O(V^3)
DAG:                           Topological sort + relax → O(V + E)
Dijkstra
Greedy: always expand the cheapest unvisited vertex
Uses priority queue (min-heap)

1. Set dist[start] = 0, all others = ∞
2. Insert start into priority queue
3. While PQ not empty:
   a. Extract minimum distance vertex u
   b. For each neighbor v of u:
      If dist[u] + weight(u,v) < dist[v]:
        Update dist[v]
        Insert/decrease-key v in PQ

Fails with negative edges (greedy assumption violated)
Bellman-Ford
Relaxes ALL edges V-1 times
Detects negative cycles on Vth pass

1. Set dist[start] = 0, all others = ∞
2. Repeat V-1 times:
   For each edge (u, v, w):
     If dist[u] + w < dist[v]: update dist[v]
3. Check for negative cycles:
   For each edge (u, v, w):
     If dist[u] + w < dist[v]: negative cycle exists!

Trees and Spanning Trees

Minimum spanning tree
Spanning tree: tree connecting all vertices of the graph
MST: spanning tree with minimum total edge weight

Kruskal's algorithm:
  1. Sort edges by weight
  2. For each edge (lightest first):
     If adding it doesn't create a cycle: include it
  Uses union-find data structure
  O(E log E)

Prim's algorithm:
  1. Start from any vertex
  2. Repeatedly add cheapest edge connecting tree to non-tree vertex
  Uses priority queue
  O(E log V)
Topological sort (DAG only)
Linear ordering where u comes before v if edge (u,v) exists

Applications:
  Build systems (Makefile dependency order)
  Package managers (install dependencies first)
  Course prerequisites

Algorithm (DFS-based):
  1. DFS from each unvisited vertex
  2. Add vertex to result AFTER all descendants processed
  3. Reverse the result

Centrality and Network Analysis

Centrality measures
Degree centrality:
  How connected? = deg(v) / (|V| - 1)

Betweenness centrality:
  How much a bridge? = fraction of shortest paths through v

Closeness centrality:
  How accessible? = (|V|-1) / sum of shortest paths from v

Eigenvector centrality:
  Connected to important nodes? (iterative)
  PageRank is a variant with damping factor d ≈ 0.85
Community detection
Goal: find densely connected groups with sparse connections between them

Louvain algorithm:    greedy modularity optimization
Label propagation:    each node adopts majority neighbor label
Spectral clustering:  eigenvalues of graph Laplacian

Application in association engine:
  Vertices = concepts (tools, topics, projects)
  Edges = relationships (uses, requires, enables)
  Weight = association strength
  Communities = knowledge clusters

Classic Problems

Graph problems
Euler path:       visits every EDGE exactly once
  Exists if 0 or 2 vertices have odd degree

Hamiltonian path: visits every VERTEX exactly once
  NP-complete (no known efficient algorithm)

Graph coloring:   assign colors so no adjacent vertices share a color
  Chromatic number χ(G) = minimum colors needed
  Bipartite iff χ(G) = 2
  Four Color Theorem: planar graphs need at most 4 colors

Clique:           complete subgraph
  Finding max clique is NP-hard

Graph isomorphism: same structure, different labels
  G1 ~ G2 if bijection preserving adjacency exists