Phase 8: Seed Data

Objective

Populate the association graph with real relationships from your domus-captures knowledge base. This is where the engine becomes useful — not as a programming exercise, but as a tool that reveals connections across your projects, certifications, skills, and infrastructure.

Steps

1. Organize by domain

Create one YAML file per domain in the data/ directory:

data/
├── certifications.yml    # CISSP, CCNP, and what they cover
├── projects.yml          # domus-* projects and their dependencies
├── skills.yml            # Tools, languages, and what uses them
└── infrastructure.yml    # Network components and their relations

Each file is independent. The load_directory() method merges them at runtime. This means you can add, remove, or reorganize files without changing code.
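
To make the merge step concrete, here is a minimal sketch of what combining several domain files could look like. It is not the engine's actual load_directory() implementation: the function name merge_association_docs is hypothetical, the inputs are plain dicts standing in for parsed YAML (for example the result of yaml.safe_load on each file), and dropping exact duplicates is an assumption about desirable behavior rather than something the engine is known to do.

```python
from typing import Iterable, Optional

Association = dict[str, str]


def merge_association_docs(docs: Iterable[Optional[dict]]) -> list[Association]:
    """Concatenate the `associations` lists of several parsed documents.

    Hypothetical helper: an empty file parses to None, so it is skipped,
    and exact duplicate records repeated across files are kept only once.
    """
    merged: list[Association] = []
    seen: set[tuple[str, str, str]] = set()
    for doc in docs:
        for item in (doc or {}).get("associations", []):
            key = (item["source"], item["relation"], item["target"])
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged


# Stand-ins for two parsed domain files.
certifications = {"associations": [
    {"source": "CCNP", "relation": "covers", "target": "BGP"},
]}
skills = {"associations": [
    {"source": "Vault", "relation": "covers", "target": "PKI"},
    {"source": "CCNP", "relation": "covers", "target": "BGP"},  # repeated record
]}

merged = merge_association_docs([certifications, skills])
print(len(merged))
```

Because the merge only looks at the associations key, moving a record from one file to another does not change the resulting graph.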

2. Certifications

Create data/certifications.yml:

associations:
  # --- CISSP domains ---
  - source: CISSP
    relation: covers
    target: security-risk-management

  - source: CISSP
    relation: covers
    target: asset-security

  - source: CISSP
    relation: covers
    target: security-architecture

  - source: CISSP
    relation: covers
    target: communication-network-security

  - source: CISSP
    relation: covers
    target: identity-access-management

  - source: CISSP
    relation: covers
    target: security-assessment-testing

  - source: CISSP
    relation: covers
    target: security-operations

  - source: CISSP
    relation: covers
    target: software-development-security

  # --- CCNP ---
  - source: CCNP
    relation: covers
    target: routing

  - source: CCNP
    relation: covers
    target: switching

  - source: CCNP
    relation: covers
    target: BGP

  - source: CCNP
    relation: covers
    target: OSPF

  - source: CCNP
    relation: covers
    target: network-automation

  # --- Cross-certification links ---
  - source: CISSP
    relation: relates-to
    target: CCNP

  - source: communication-network-security
    relation: relates-to
    target: routing

  - source: communication-network-security
    relation: relates-to
    target: switching

3. Projects

Create data/projects.yml:

associations:
  # --- association-engine ---
  - source: association-engine
    relation: uses
    target: Python

  - source: association-engine
    relation: uses
    target: FastAPI

  - source: association-engine
    relation: uses
    target: typer

  - source: association-engine
    relation: uses
    target: PyYAML

  - source: association-engine
    relation: uses
    target: pytest

  - source: association-engine
    relation: teaches
    target: classes

  - source: association-engine
    relation: teaches
    target: dicts

  - source: association-engine
    relation: teaches
    target: testing

  # --- domus-api ---
  - source: domus-api
    relation: uses
    target: Python

  - source: domus-api
    relation: uses
    target: FastAPI

  - source: domus-api
    relation: relates-to
    target: association-engine

  # --- domus-captures ---
  - source: domus-captures
    relation: uses
    target: AsciiDoc

  - source: domus-captures
    relation: uses
    target: Antora

  - source: domus-captures
    relation: relates-to
    target: association-engine

  # --- domus-infra-ops ---
  - source: domus-infra-ops
    relation: uses
    target: Ansible

  - source: domus-infra-ops
    relation: uses
    target: Vault

  - source: domus-infra-ops
    relation: covers
    target: 802.1X

  - source: domus-infra-ops
    relation: covers
    target: DNS

  - source: domus-infra-ops
    relation: covers
    target: PKI

4. Skills and tools

Create data/skills.yml:

associations:
  # --- Languages ---
  - source: Python
    relation: uses
    target: pip

  - source: Python
    relation: uses
    target: uv

  - source: Python
    relation: uses
    target: pytest

  - source: Python
    relation: uses
    target: ruff

  # --- CLI tools ---
  - source: awk
    relation: relates-to
    target: sed

  - source: awk
    relation: relates-to
    target: grep

  - source: jq
    relation: relates-to
    target: awk

  - source: jq
    relation: covers
    target: JSON

  # --- Frameworks ---
  - source: FastAPI
    relation: uses
    target: Pydantic

  - source: FastAPI
    relation: uses
    target: uvicorn

  - source: Antora
    relation: uses
    target: AsciiDoc

  - source: Antora
    relation: uses
    target: Node.js

  # --- Security tools ---
  - source: Vault
    relation: covers
    target: PKI

  - source: Vault
    relation: covers
    target: secrets-management

  - source: ISE
    relation: covers
    target: 802.1X

  - source: ISE
    relation: covers
    target: RADIUS

  - source: ISE
    relation: relates-to
    target: Active-Directory

5. Infrastructure

Create data/infrastructure.yml:

associations:
  # --- Network layers ---
  - source: 802.1X
    relation: requires
    target: RADIUS

  - source: 802.1X
    relation: requires
    target: PKI

  - source: 802.1X
    relation: requires
    target: Active-Directory

  - source: RADIUS
    relation: uses
    target: ISE

  - source: PKI
    relation: uses
    target: Vault

  # --- DNS ---
  - source: DNS
    relation: uses
    target: BIND

  - source: DNS
    relation: requires
    target: Active-Directory

  # --- Services ---
  - source: Active-Directory
    relation: covers
    target: Kerberos

  - source: Active-Directory
    relation: covers
    target: LDAP

  - source: Active-Directory
    relation: covers
    target: DNS
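
With four hand-written files, typos in keys or relation names are easy to introduce. The following sketch shows one possible pre-load check. The helper find_problems is hypothetical, and the allowed vocabulary is simply the five relations used in this phase's seed files (covers, uses, teaches, relates-to, requires); the engine itself may not enforce any such list.

```python
# Assumption: the relation vocabulary is exactly what the seed files above use.
ALLOWED_RELATIONS = {"covers", "uses", "teaches", "relates-to", "requires"}
REQUIRED_KEYS = {"source", "relation", "target"}


def find_problems(records: list[dict]) -> list[str]:
    """Return human-readable problems found in a list of association records."""
    problems = []
    for index, record in enumerate(records):
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            problems.append(f"record {index}: missing {sorted(missing)}")
            continue
        # YAML may parse unquoted values such as 802.1 or yes as non-strings.
        if not all(isinstance(record[key], str) for key in REQUIRED_KEYS):
            problems.append(f"record {index}: non-string value")
            continue
        if record["relation"] not in ALLOWED_RELATIONS:
            problems.append(f"record {index}: unknown relation {record['relation']!r}")
        if record["source"] == record["target"]:
            problems.append(f"record {index}: self-reference {record['source']!r}")
    return problems


records = [
    {"source": "802.1X", "relation": "requires", "target": "RADIUS"},
    {"source": "DNS", "relation": "use", "target": "BIND"},  # typo in relation
    {"source": "Vault", "target": "PKI"},                    # relation missing
]
for problem in find_problems(records):
    print(problem)
```

An empty result means every record has the three expected keys, string values, and a relation from the assumed vocabulary.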

6. Query for hidden connections

Now run queries and look for relationships you did not explicitly think about:

# What does 802.1X require?
uv run assoc query 802.1X | jq

# What is PKI used by? (reverse query)
uv run assoc reverse PKI | jq

# What connects to Active-Directory?
uv run assoc reverse Active-Directory | jq

# How many entities are in the graph?
uv run assoc list | wc -l

# What relations exist?
uv run assoc relations

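# The interesting query: how does CISSP connect to CCNP topics?
# Walk the chain one hop at a time:
uv run assoc query CISSP | jq                            # covers communication-network-security
uv run assoc query communication-network-security | jq   # relates-to routing
uv run assoc reverse routing | jq                        # routing is covered by CCNP
# Chain: CISSP → communication-network-security → routing → CCNP → network-automation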

The graph does not traverse chains automatically yet; that would require a graph traversal algorithm (BFS or DFS), which is a future enhancement. But even single-hop queries reveal connections: PKI appears in projects (domus-infra-ops covers it), tools (Vault covers it), and infrastructure (802.1X requires it). That convergence is the point.
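
As an illustration of what multi-hop traversal could look like, here is a minimal breadth-first search sketch. It is not part of the engine: find_chain and build_neighbors are hypothetical names, the edge list is a small subset of the seed data, and edges are followed in both directions because the chain in the query comments relies on one reverse lookup. The direct CISSP relates-to CCNP record is left out of the subset on purpose so that the longer chain stays visible.

```python
from collections import defaultdict, deque

# Subset of the seed data as (source, relation, target) triples.
EDGES = [
    ("CISSP", "covers", "communication-network-security"),
    ("communication-network-security", "relates-to", "routing"),
    ("CCNP", "covers", "routing"),
    ("CCNP", "covers", "network-automation"),
    ("802.1X", "requires", "PKI"),
]


def build_neighbors(edges):
    """Map each entity to the entities it touches, ignoring edge direction."""
    neighbors = defaultdict(list)
    for source, _relation, target in edges:
        neighbors[source].append(target)
        neighbors[target].append(source)
    return neighbors


def find_chain(edges, start, goal):
    """Return a shortest chain of entities from start to goal, or None."""
    neighbors = build_neighbors(edges)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for entity in neighbors.get(path[-1], []):
            if entity not in visited:
                visited.add(entity)
                queue.append(path + [entity])
    return None


chain = find_chain(EDGES, "CISSP", "network-automation")
print(" -> ".join(chain))
```

BFS visits entities in order of distance from the start, so the first chain it finds is also a shortest one. With the direct CISSP relates-to CCNP record included, the result would collapse to a two-hop chain.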

7. Validate the data

# Count total associations across all files
awk '/source:/' data/*.yml | wc -l

# Find orphans: entities with no forward and no reverse associations
uv run assoc list | while read -r entity; do
  fwd=$(uv run assoc query "$entity" 2>/dev/null | jq 'length')
  rev=$(uv run assoc reverse "$entity" 2>/dev/null | jq 'length')
  [ "$fwd" = "0" ] && [ "$rev" = "0" ] && echo "ORPHAN: $entity"
done
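
The shell loop starts two processes per entity, which gets slow as the graph grows. Assuming entities only ever enter the graph through association records, a listed entity will normally have at least one connection, so a more informative check may be to look for entities that appear in exactly one record. The sketch below does that in memory; appearance_counts and single_appearance are hypothetical helpers, and the records are a small stand-in for the merged seed data.

```python
from collections import Counter


def appearance_counts(records):
    """Count how often each entity appears as a source or a target."""
    counts = Counter()
    for record in records:
        counts[record["source"]] += 1
        counts[record["target"]] += 1
    return counts


def single_appearance(records):
    """Entities that occur in exactly one record (weakly connected leaves)."""
    counts = appearance_counts(records)
    return sorted(entity for entity, n in counts.items() if n == 1)


records = [
    {"source": "802.1X", "relation": "requires", "target": "RADIUS"},
    {"source": "802.1X", "relation": "requires", "target": "PKI"},
    {"source": "Vault", "relation": "covers", "target": "PKI"},
    {"source": "DNS", "relation": "uses", "target": "BIND"},
]
print(single_appearance(records))
```

Entities in this list are not errors. They are candidates for further associations, or leaves that are expected to stay leaves.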

Checklist

  • data/certifications.yml created with CISSP and CCNP domains

  • data/projects.yml created with domus-* project relationships

  • data/skills.yml created with tools and language relationships

  • data/infrastructure.yml created with network component relationships

  • uv run assoc list shows all entities

  • uv run assoc query CISSP returns 8 domains under covers

  • uv run assoc reverse PKI reveals multiple sources

  • No orphan entities (every entity has at least one connection)

  • All tests still pass

Verification

# Data loads without errors
uv run assoc list | head -5

# Specific queries work
uv run assoc query 802.1X | jq -e '.requires | length > 0'

# Tests unaffected
uv run pytest tests/ -v --tb=short

The graph now holds real knowledge. Every project, certification, tool, and infrastructure component is connected. Phase 9 outlines the future Go port — after the Python version has proven its value.