Documentation Scraper

Overview

The netapi docs command group is a general-purpose documentation scraper with dedicated subcommands and selector presets for common documentation sites.

Commands

ise

Scrape the Cisco ISE Admin Guide:

# All chapters
netapi docs ise --version 3.2 --chapters all

# Specific chapter
netapi docs ise --version 3.2 --chapters 2

# Markdown output
netapi docs ise --version 3.2 --format markdown --chapters 1

arch

Scrape the Arch Linux wiki:

netapi docs arch pacman --output-dir /tmp/arch-docs
netapi docs arch "System maintenance" --output-dir /tmp/arch-docs

github

Scrape documentation from a GitHub repository:

netapi docs github pallets/flask --output-dir /tmp/flask-docs
netapi docs github pallets/flask --docs-path docs --output-dir /tmp/docs

scrape

Generic single-page scraper:

netapi docs scrape "https://example.com/docs/page" \
  --selector "article.content" \
  --output-dir /tmp/docs

scrape-guide

Multi-chapter guide scraper:

netapi docs scrape-guide "https://docs.example.com" \
  --toc-selector "nav.toc a" \
  --output-dir /tmp/guide
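
To illustrate what a TOC selector such as "nav.toc a" picks out, here is a minimal sketch using only the Python standard library. It is not netapi's implementation: the class TocLinkParser, the function toc_links, and the sample HTML are hypothetical names chosen for this example, and the selector is hard-coded rather than parsed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TocLinkParser(HTMLParser):
    """Collect hrefs of <a> elements nested inside <nav class="toc">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.nav_depth = 0  # > 0 while inside a matching <nav class="toc">
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "nav":
            classes = (attrs.get("class") or "").split()
            # Enter a matching nav, or track a nav nested inside one.
            if self.nav_depth or "toc" in classes:
                self.nav_depth += 1
        elif tag == "a" and self.nav_depth and attrs.get("href"):
            # Resolve relative links against the guide's base URL.
            url = urljoin(self.base_url, attrs["href"])
            if url not in self.links:
                self.links.append(url)

    def handle_endtag(self, tag):
        if tag == "nav" and self.nav_depth:
            self.nav_depth -= 1


def toc_links(html, base_url):
    """Return chapter URLs in document order, without duplicates."""
    parser = TocLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page: two chapter links in the TOC, one unrelated link.
SAMPLE = """
<nav class="toc">
  <a href="/intro.html">Intro</a>
  <a href="chapter-2.html">Chapter 2</a>
</nav>
<a href="/unrelated.html">Footer link</a>
"""

print(toc_links(SAMPLE, "https://docs.example.com/"))
```

Only the two links inside the nav element are returned; the footer link is ignored because it falls outside the selector's scope.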

Supported Sites

The scraper auto-detects optimal CSS selectors for common sites:

Site                 Selector               Notes
wiki.archlinux.org   #mw-content-text       Wiki pages
man.archlinux.org    main.container         Man pages
imslp.org            #wiki-body             Sheet music wiki
docs.python.org      div.body               Python docs
readthedocs.io       div.document           RTD projects
github.com           article.markdown-body  READMEs
developer.cisco.com  article.content        DevNet
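
One way such auto-detection could work is a hostname lookup against the presets listed above. The sketch below is an assumption about the approach, not netapi's actual code: detect_selector, SITE_SELECTORS, the subdomain matching rule, and the "body" fallback are all hypothetical.

```python
from urllib.parse import urlparse

# Host-to-selector presets copied from the supported-sites list.
SITE_SELECTORS = {
    "wiki.archlinux.org": "#mw-content-text",
    "man.archlinux.org": "main.container",
    "imslp.org": "#wiki-body",
    "docs.python.org": "div.body",
    "readthedocs.io": "div.document",
    "github.com": "article.markdown-body",
    "developer.cisco.com": "article.content",
}


def detect_selector(url, override=None, fallback="body"):
    """Pick a CSS selector for url.

    An explicit override (as with --selector) wins. Otherwise the host is
    matched exactly or as a subdomain of a known site. The "body" fallback
    is an assumption made for this sketch.
    """
    if override:
        return override
    host = (urlparse(url).hostname or "").lower()
    for site, selector in SITE_SELECTORS.items():
        if host == site or host.endswith("." + site):
            return selector
    return fallback


print(detect_selector("https://wiki.archlinux.org/title/Pacman"))
print(detect_selector("https://example-project.readthedocs.io/en/latest/"))
```

Matching on the host suffix lets a single readthedocs.io entry cover every project subdomain.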

Output Formats

Format    Description
asciidoc  AsciiDoc (default). Best for Antora.
markdown  Markdown. Compatible with GitHub and Obsidian.

Options

Option        Default   Description
--output-dir  ./docs    Output directory
--format      asciidoc  Output format
--selector    (auto)    Custom CSS selector
--delay       1.0       Delay between requests (seconds)
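
The effect of a delay between requests can be sketched as a generator that pauses between items but not before the first one. This is an illustration only: throttled and its sleep parameter are hypothetical names, and netapi may implement the delay differently.

```python
import time


def throttled(items, delay=1.0, sleep=time.sleep):
    """Yield items one by one, pausing delay seconds between them."""
    for index, item in enumerate(items):
        if index:  # no pause before the first item
            sleep(delay)
        yield item


# Hypothetical chapter URLs; a real run would fetch each one.
urls = [
    "https://docs.example.com/chapter-1",
    "https://docs.example.com/chapter-2",
]
for url in throttled(urls, delay=0.1):
    print("would fetch", url)
```

Passing the sleep function as a parameter keeps the pacing logic testable without actually waiting.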