Multi-Rig Masterclass: Orchestrating a Recording Fleet
Table of Contents
- 1. Fleet Architecture: The Hub & Spoke Model
- Why not just build one giant machine?
- 2. Shared Storage & 10GbE Networking
- The Storage Flow Diagram
- Recommended Setup:
- 3. Worker Node Configuration
- config.yaml Node Mapping
- Syncing Hooks across the Fleet
- 4. Remote Monitoring & Management
- Monitoring Checklist:
- 5. Common Fleet Failures (and Mitigation)
- Summary: The 3-Rig Test Case
A single powerhouse rig is great, but eventually you’ll hit a hardware wall. Whether that wall is VRAM limits on your GPU, PCIe lane starvation, or thermal throttling, professional archivists ultimately move to a Multi-Rig Fleet.
This masterclass outlines the architecture, networking, and configuration required to manage a 3+ rig recording cluster.
1. Fleet Architecture: The Hub & Spoke Model
For a reliable fleet, we recommend the Hub & Spoke model. This separates the “Brain” from the “Muscle”:
- The Monitor Node (The Hub): A low-power machine (even a NUC or Laptop) running the Multi-Rig Monitor. It polls the workers and displays the unified HUD.
- The Worker Nodes (The Spokes): High-performance machines with dedicated GPUs that handle the actual FFmpeg ingest and disk writes.
- The Storage Node (The Sink): A dedicated NAS or ZFS server where all recordings are saved via 10GbE.
Why not just build one giant machine?
Redundancy. If one rig in a 3-rig fleet fails, you only lose 33% of your capture capacity. If your “mega-rig” motherboard dies, you are 100% dark.
2. Shared Storage & 10GbE Networking
In a multi-rig setup, Network I/O is the primary bottleneck: every worker funnels its writes into the Storage Node’s uplink. 1GbE networking is insufficient once you are pushing 10+ concurrent 4K streams to a NAS.
The Storage Flow Diagram
[Worker 1] --(10GbE)--> [Core Switch] <--(10GbE)-- [Storage Node (ZFS)]
[Worker 2] --(10GbE)--> [Core Switch]
Recommended Setup:
- Networking: Intel X540-T2 or ConnectX-3 10GbE NICs in every worker.
- Protocol: NFS (Linux) or SMB Direct with RDMA (Windows).
- ZFS Configuration: Use a 1MB recordsize on your archive pool to match CaptureGem’s large sequential write pattern.
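A minimal storage-side sketch under those recommendations, assuming a ZFS pool named tank, Linux workers, and a placeholder 10.0.0.0/24 worker subnet; adjust names and addresses to your environment:

# On the Storage Node: create the archive dataset with a 1M recordsize
zfs create -o recordsize=1M tank/archive
# Export it over NFS to the worker subnet (placeholder address range)
zfs set sharenfs="rw=@10.0.0.0/24,no_root_squash" tank/archive

# On each Worker: mount the share at the path referenced in config.yaml
mkdir -p /mnt/archive
mount -t nfs -o rsize=1048576,wsize=1048576,hard,tcp storage-node:/tank/archive /mnt/archive

The hard mount option makes workers retry rather than return I/O errors if the Storage Node briefly drops off the network mid-recording.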
3. Worker Node Configuration
Each worker needs a unique identity but shared configuration logic.
config.yaml Node Mapping
# Worker Node 01
node_id: "rig-alpha-01"
api:
  port: 8080
  allow_remote: true
storage:
  path: "/mnt/archive/node-01/"        # Map to NAS share
  temp_path: "/local/nvme/staging/"    # Record locally, move on finish
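The temp_path pattern keeps the hot write on local NVMe and only touches the network once per finished file. Below is a minimal on-finish hook sketch for that move; it assumes the hook is invoked with the finished file’s path as its first argument, which is an illustrative assumption rather than a documented interface:

#!/usr/bin/env bash
# Hypothetical on-finish hook: move a completed recording from NVMe staging to the NAS share.
# Assumption: the hook receives the finished file's path as $1.
set -euo pipefail

SRC="$1"
DEST="/mnt/archive/node-01/"   # Must match storage.path in config.yaml

# rsync with --remove-source-files frees the staging SSD once the copy completes
rsync -a --remove-source-files "$SRC" "$DEST"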
Syncing Hooks across the Fleet
Don’t manually copy your scripts. Use a simple rsync or Git-based approach to ensure all workers are running the latest Automation Library scripts.
# Example: Syncing hooks from your Monitor Node
rsync -avz /home/user/master_hooks/ worker-01:/home/user/capturegem/hooks/
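To push the same hook set to every worker in one pass, wrap that command in a loop; the hostnames below are placeholders:

# Push the master hooks to every worker (hostnames are placeholders)
for host in worker-01 worker-02 worker-03; do
  rsync -avz --delete /home/user/master_hooks/ "${host}:/home/user/capturegem/hooks/"
done

The --delete flag removes stale scripts on the workers, so a renamed hook can’t keep running as an orphaned old copy.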
4. Remote Monitoring & Management
Enable the Monitor API on every node. Then, add them to your Unified HUD.
Monitoring Checklist:
- Heartbeat: Does every node report “LIVE”?
- IO Wait: Use iostat to ensure the NAS isn’t stalling (see the sketch after this list).
- Entropy: Ensure your Proxy rotation isn’t using the same IP across multiple workers (this triggers site-level bans).
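The first two checks are easy to script from the Monitor Node. The sketch below assumes each worker exposes its Monitor API on port 8080 and that a /api/status endpoint returns a LIVE/OFFLINE string; the endpoint path and hostnames are placeholders, so substitute whatever your Monitor API actually serves:

#!/usr/bin/env bash
# Fleet health sweep from the Monitor Node (endpoint path and hostnames are placeholders).
WORKERS="worker-01 worker-02 worker-03"

for host in $WORKERS; do
  # Heartbeat: query the worker's Monitor API
  status=$(curl -s --max-time 5 "http://${host}:8080/api/status" || echo "UNREACHABLE")
  # IO Wait: sample %iowait; sustained values above ~20% suggest the NAS is stalling
  iowait=$(ssh "$host" "iostat -c 1 2 | awk '/avg-cpu/ {getline; v=\$4} END {print v}'")
  echo "${host}: heartbeat=${status} iowait=${iowait}%"
done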
5. Common Fleet Failures (and Mitigation)
| Failure | Impact | Mitigation |
|---|---|---|
| Split-Brain | Two nodes trying to record the same model. | Centralize your watchlist management using the CaptureGem API. |
| Storage Lock | NAS pool hits 100% utilization. | Implement Automated Library Pruning on the Storage Node. |
| PCIe Lane Saturation | Worker drops frames despite low CPU. | Ensure NIC and GPU are on separate CPU-direct PCIe lanes. |
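For the Storage Lock row, the cleanest fix is CaptureGem’s Automated Library Pruning, but a standalone safety net on the Storage Node is easy to cron. This is a hedged sketch: the mount path, file extension, and 90% threshold are placeholders, and it simply deletes the oldest recordings until utilization falls back under the line:

#!/usr/bin/env bash
# Emergency prune on the Storage Node (path, extension, and threshold are placeholders).
set -eu

POOL_MOUNT="/tank/archive"
THRESHOLD=90   # begin pruning above 90% utilization

usage_pct() { df --output=pcent "$POOL_MOUNT" | tail -n 1 | tr -dc '0-9'; }

while [ "$(usage_pct)" -gt "$THRESHOLD" ]; do
  # Find the single oldest recording, delete it, then re-check utilization
  oldest=$(find "$POOL_MOUNT" -type f -name '*.mp4' -printf '%T@ %p\n' | sort -n | head -n 1 | cut -d' ' -f2-)
  [ -z "$oldest" ] && break   # nothing left to prune
  echo "Pruning: $oldest"
  rm -f -- "$oldest"
done

Run it from cron every few minutes so the pool is checked before it locks up the whole fleet’s writes.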
Summary: The 3-Rig Test Case
- Rig A (Ingest): 1080p Standard resolution captures (40+ streams).
- Rig B (High-Fidelity): 4K/8K High-bitrate captures (10+ streams).
- Rig C (Transcode): Handles post-processing and Cloud Archival.
By specializing your hardware, you create a workflow that is impossible to achieve on a single workstation.