Graph Theory
Graph Fundamentals
Definitions
Graph: G = (V, E) where V = vertices (nodes), E = edges (connections)
Undirected edge: {u, v} no direction (friendship)
Directed edge: (u, v) from u to v (follows, depends-on)
Degree: deg(v) = number of edges incident to v
Directed: in-degree (incoming) + out-degree (outgoing)
Handshaking lemma: sum of all degrees = 2|E|
Every edge contributes 2 to total degree count
Representations
Adjacency list: {A: [B,C], B: [A,D], ...}
Space: O(V+E)
Neighbor lookup: O(deg(v))
Best for: sparse graphs, iteration
Adjacency matrix: M[i][j] = 1 if edge (i,j) exists
Space: O(V^2)
Edge lookup: O(1)
Best for: dense graphs, matrix operations
Weighted graph: edges carry values (cost, distance, capacity)
List: {A: [(B,5), (C,3)]}
Matrix: M[i][j] = weight
Paths and Connectivity
Path types
Walk: sequence of vertices/edges (repeats allowed)
Path: walk with no repeated vertices
Cycle: path that starts and ends at same vertex
Connected: path exists between every pair (undirected)
Strongly connected: directed path both ways between every pair
Connected components: maximal connected subgraphs
Each component is an isolated cluster
Count: run BFS/DFS from each unvisited vertex
Special graphs
Tree: connected acyclic graph, |E| = |V| - 1
Remove any edge → disconnects
Add any edge → creates cycle
Bipartite: vertices split into two sets, edges only between sets
Equivalent to: no odd-length cycles
Test: BFS 2-coloring
Complete: K_n = edge between every pair, |E| = n(n-1)/2
DAG: directed acyclic graph (dependency graphs)
Graph Traversal
BFS — Breadth-First Search
Strategy: Explore level by level using a queue
Complexity: O(V + E)
Finds: shortest path in unweighted graphs
connected components
bipartiteness
Algorithm:
1. Enqueue start vertex, mark visited
2. While queue not empty:
a. Dequeue vertex u
b. For each neighbor v of u:
If v not visited: mark visited, enqueue v
DFS — Depth-First Search
Strategy: Go deep using stack/recursion, backtrack when stuck
Complexity: O(V + E)
Finds: cycles
topological order (DAG)
connected components
strongly connected components (Tarjan/Kosaraju)
Algorithm:
1. Push start vertex, mark visited
2. While stack not empty:
a. Pop vertex u
b. For each neighbor v of u:
If v not visited: mark visited, push v
Shortest Path Algorithms
Algorithm selection
Unweighted: BFS from source → O(V + E)
Non-negative weights: Dijkstra → O((V+E) log V)
Negative weights allowed: Bellman-Ford → O(V × E)
All pairs: Floyd-Warshall → O(V^3)
DAG: Topological sort + relax → O(V + E)
Dijkstra
Greedy: always expand the cheapest unvisited vertex
Uses priority queue (min-heap)
1. Set dist[start] = 0, all others = ∞
2. Insert start into priority queue
3. While PQ not empty:
a. Extract minimum distance vertex u
b. For each neighbor v of u:
If dist[u] + weight(u,v) < dist[v]:
Update dist[v]
Insert/decrease-key v in PQ
Fails with negative edges (greedy assumption violated)
Bellman-Ford
Relaxes ALL edges V-1 times
Detects negative cycles on Vth pass
1. Set dist[start] = 0, all others = ∞
2. Repeat V-1 times:
For each edge (u, v, w):
If dist[u] + w < dist[v]: update dist[v]
3. Check for negative cycles:
For each edge (u, v, w):
If dist[u] + w < dist[v]: negative cycle exists!
Trees and Spanning Trees
Minimum spanning tree
Spanning tree: tree connecting all vertices of the graph
MST: spanning tree with minimum total edge weight
Kruskal's algorithm:
1. Sort edges by weight
2. For each edge (lightest first):
If adding it doesn't create a cycle: include it
Uses union-find data structure
O(E log E)
Prim's algorithm:
1. Start from any vertex
2. Repeatedly add cheapest edge connecting tree to non-tree vertex
Uses priority queue
O(E log V)
Topological sort (DAG only)
Linear ordering where u comes before v if edge (u,v) exists
Applications:
Build systems (Makefile dependency order)
Package managers (install dependencies first)
Course prerequisites
Algorithm (DFS-based):
1. DFS from each unvisited vertex
2. Add vertex to result AFTER all descendants processed
3. Reverse the result
Centrality and Network Analysis
Centrality measures
Degree centrality:
How connected? = deg(v) / (|V| - 1)
Betweenness centrality:
How much a bridge? = fraction of shortest paths through v
Closeness centrality:
How accessible? = (|V|-1) / sum of shortest paths from v
Eigenvector centrality:
Connected to important nodes? (iterative)
PageRank is a variant with damping factor d ≈ 0.85
Community detection
Goal: find densely connected groups with sparse connections between them
Louvain algorithm: greedy modularity optimization
Label propagation: each node adopts majority neighbor label
Spectral clustering: eigenvalues of graph Laplacian
Application in association engine:
Vertices = concepts (tools, topics, projects)
Edges = relationships (uses, requires, enables)
Weight = association strength
Communities = knowledge clusters
Classic Problems
Graph problems
Euler path: visits every EDGE exactly once
Exists if 0 or 2 vertices have odd degree
Hamiltonian path: visits every VERTEX exactly once
NP-complete (no known efficient algorithm)
Graph coloring: assign colors so no adjacent vertices share a color
Chromatic number χ(G) = minimum colors needed
Bipartite iff χ(G) = 2
Four Color Theorem: planar graphs need at most 4 colors
Clique: complete subgraph
Finding max clique is NP-hard
Graph isomorphism: same structure, different labels
G1 ~ G2 if bijection preserving adjacency exists