Optimizing for Dual-GPU Recording: Scaling to 100+ Concurrent Streams
Table of Contents
- The Dual-GPU Architecture
- Optimal Hardware Combinations
- Mastering CaptureGem GPU Flags
- Targeting Specific Devices
- The config.yaml Multi-GPU Map
- NVENC Session Management
- 💡 Pro Tip: VRAM is the Real Limit
- Avoiding PCIe Bandwidth Starvation
- Operating System Setup
- Docker Passthrough
- Benchmarks: Single vs. Dual
- Cooling & Power
- Summary Checklist
When your recording rig scales beyond 30-40 concurrent streams, a single GPU, even a powerhouse like the RTX 4090, can become a bottleneck. The cause is not necessarily raw performance but NVENC session limits and PCIe bandwidth starvation.
This masterclass covers how to architect a dual-GPU setup that substantially raises your capture capacity; the benchmarks below show roughly 63% more stable 4K streams.
The Dual-GPU Architecture
In a professional multi-GPU setup, we move away from the “One Card Does All” approach. Instead, we define Primary and Secondary roles:
- Primary GPU (The Ingestor): Responsible for high-fidelity 4K/8K recording and initial stream capture.
- Secondary GPU (The Transcoder): Handles background transcoding (e.g., converting RAW to HEVC for cloud archival) or managing lower-priority 1080p captures.
Optimal Hardware Combinations
| Role | Recommended GPU | Why? |
|---|---|---|
| Primary | RTX 4090 / 4080 | 24GB (4090) or 16GB (4080) VRAM for high-res buffers; dual NVENC engines with AV1 support. |
| Secondary | RTX 4060 Ti (16GB) | High VRAM-to-Price ratio; 8th Gen NVENC for efficient transcoding. |
Mastering CaptureGem GPU Flags
CaptureGem allows you to target specific GPUs using FFmpeg-level device mapping.
Targeting Specific Devices
To assign a specific model recording to your second GPU, use the following flag in your config.yaml or CLI:
```bash
# CaptureGem CLI example targeting Device 1 (Secondary GPU)
capturegem --model "ModelName" --gpu 1 --ffmpeg-flags "-c:v h264_nvenc -gpu 1"
```
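To make the device mapping concrete, here is a minimal Python sketch that assembles a plain FFmpeg command line pinned to one NVENC device. The `-gpu` option belongs to FFmpeg's `h264_nvenc` encoder; the function name, file names, and audio passthrough are illustrative assumptions, and how CaptureGem forwards these flags internally is not shown here.

```python
import shlex


def build_nvenc_command(input_url: str, output_path: str, gpu_index: int) -> list[str]:
    """Build an FFmpeg argv that encodes on one specific NVENC device.

    Hypothetical helper for illustration; only the FFmpeg options are real.
    """
    if gpu_index < 0:
        raise ValueError("gpu_index must be 0 or greater")
    return [
        "ffmpeg",
        "-i", input_url,
        "-c:v", "h264_nvenc",    # NVIDIA hardware H.264 encoder
        "-gpu", str(gpu_index),  # encoder option: 0 = first GPU, 1 = second
        "-c:a", "copy",          # pass audio through untouched (assumption)
        output_path,
    ]


# Placeholder file names; print the command instead of running it.
cmd = build_nvenc_command("input.ts", "output.mp4", gpu_index=1)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems if you later hand it to `subprocess.run`.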
The config.yaml Multi-GPU Map
```yaml
# Balanced load distribution
instances:
  - id: "rig-alpha"
    gpu_id: 0
    models: ["Model_A", "Model_B", "Model_C"]
  - id: "rig-beta"
    gpu_id: 1
    models: ["Model_D", "Model_E", "Model_F"]
```
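If you maintain a long model list, a map like the one above can be generated rather than hand-edited. The sketch below splits a list into contiguous, near-equal groups and emits the same `id` / `gpu_id` / `models` shape; the function name and the idea of generating the file are assumptions, not a CaptureGem feature.

```python
def split_models(models: list[str], instance_gpus: dict[str, int]) -> list[dict]:
    """Split models into contiguous, near-equal groups, one per instance.

    instance_gpus maps an instance id (e.g. "rig-alpha") to its GPU index.
    Hypothetical helper; the output keys mirror the config.yaml example.
    """
    if not instance_gpus:
        raise ValueError("at least one instance is required")
    base, extra = divmod(len(models), len(instance_gpus))
    instances, start = [], 0
    for i, (instance_id, gpu_id) in enumerate(instance_gpus.items()):
        size = base + (1 if i < extra else 0)  # first `extra` groups get one more
        instances.append({
            "id": instance_id,
            "gpu_id": gpu_id,
            "models": models[start:start + size],
        })
        start += size
    return instances


models = ["Model_A", "Model_B", "Model_C", "Model_D", "Model_E", "Model_F"]
layout = split_models(models, {"rig-alpha": 0, "rig-beta": 1})
for instance in layout:
    print(instance)
```

An even split is only a starting point. If the cards differ in capacity, as with a 4090 paired with a 4060 Ti, you would likely weight the groups instead.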
NVENC Session Management
Consumer NVIDIA cards are driver-limited to a small number of concurrent NVENC sessions, typically 5-8 depending on the driver version. While some driver “hacks” exist to bypass this, the safest professional approach is splitting the load.
💡 Pro Tip: VRAM is the Real Limit
Even if you unlock sessions, you will hit a VRAM wall. A 4K stream buffer typically consumes ~400MB of VRAM. An RTX 4090 (24GB) can safely handle ~45-50 high-bitrate sessions before encountering `Out of Memory` errors in the logs.
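The arithmetic behind that estimate can be sketched as follows. The 24GB and ~400MB figures come from the paragraph above; the reserve fraction (VRAM held back for the driver, desktop, and decode buffers) is an assumption chosen to show how headroom changes the result.

```python
def max_sessions(vram_gb: float, per_stream_mb: float = 400.0,
                 reserve_fraction: float = 0.20) -> int:
    """Estimate how many stream buffers fit in VRAM after reserving headroom.

    reserve_fraction is a hypothetical safety margin, not a measured value.
    """
    if not 0.0 <= reserve_fraction < 1.0:
        raise ValueError("reserve_fraction must be in [0, 1)")
    usable_mb = vram_gb * 1024 * (1.0 - reserve_fraction)
    return int(usable_mb // per_stream_mb)


for reserve in (0.0, 0.20, 0.25):
    print(f"reserve {reserve:.0%}: {max_sessions(24, reserve_fraction=reserve)} sessions")
```

With no reserve the same card would nominally fit 61 buffers, while a 20-25% reserve gives 49 and 46, which lines up with the ~45-50 range quoted above.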
Avoiding PCIe Bandwidth Starvation
Most consumer motherboards (Z790/X670) offer a single x16 slot wired to the CPU. The second physical x16 slot often runs at x4 through the chipset.
The Bottleneck: If your primary GPU is saturating the PCIe bus with multi-stream writes, your secondary GPU may suffer from high latency, leading to dropped frames even if its own utilization is low.
The Solution: Use a Threadripper or Xeon platform with 64+ lanes, or ensure your Secondary GPU is handling lower-bitrate tasks that don’t saturate the chipset link.
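To put numbers on the x16 versus x4 gap, the sketch below computes the theoretical one-direction bandwidth of a PCIe link from its generation and lane count. The transfer rates and 128b/130b encoding are standard PCIe figures; treating both slots as Gen 4 is an assumption, so check your motherboard manual.

```python
# Per-lane transfer rate in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}
# 128b/130b line coding, used by PCIe 3.0 and later.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gbs(generation: int, lanes: int) -> float:
    """Theoretical one-direction link bandwidth in GB/s."""
    if generation not in GT_PER_S:
        raise ValueError(f"unsupported PCIe generation: {generation}")
    return GT_PER_S[generation] * ENCODING_EFFICIENCY / 8 * lanes


cpu_slot = link_bandwidth_gbs(4, 16)      # x16 slot wired to the CPU
chipset_slot = link_bandwidth_gbs(4, 4)   # x4 slot behind the chipset
print(f"Gen4 x16: {cpu_slot:.2f} GB/s")
print(f"Gen4 x4:  {chipset_slot:.2f} GB/s")
```

These are ceilings, not measurements. A chipset-attached slot also shares the chipset's uplink with other devices, so sustained throughput can be lower still.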
Operating System Setup
Docker Passthrough
When running in containers, you must pass specific device IDs (indices as in the snippet below, or the GPU UUIDs reported by nvidia-smi -L) to ensure Docker doesn’t “fight” over the primary device.
```yaml
# docker-compose.yml snippet
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: ['0', '1'] # Maps both GPUs to the container
          capabilities: [compute, utility, video]
```
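Device indices can change between boots or after a hardware swap, which is one reason to prefer UUIDs. The sketch below parses the output of `nvidia-smi -L` into an index-to-UUID map. The sample text uses made-up UUIDs, and the function name is hypothetical.

```python
import re

# Matches lines such as: GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-xxxxxxxx-...)
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-fA-F-]+)\)$")


def parse_gpu_list(text: str) -> dict[int, dict[str, str]]:
    """Map each GPU index to its name and UUID from `nvidia-smi -L` output."""
    gpus = {}
    for line in text.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            index, name, uuid = match.groups()
            gpus[int(index)] = {"name": name, "uuid": uuid}
    return gpus


# Illustrative sample only; these UUIDs are placeholders, not real devices.
SAMPLE = """\
GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-11111111-2222-3333-4444-555555555555)
GPU 1: NVIDIA GeForce RTX 4060 Ti (UUID: GPU-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee)
"""

gpus = parse_gpu_list(SAMPLE)
for index, info in gpus.items():
    print(index, info["name"], info["uuid"])
```

On a real host you would feed the function the text from `nvidia-smi -L` and copy the resulting UUID strings into `device_ids` in place of `'0'` and `'1'`.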
Benchmarks: Single vs. Dual
| Metric | Single RTX 4090 | Dual (4090 + 4060 Ti) | Improvement |
|---|---|---|---|
| Max Stable 4K Streams | 38 | 62 | +63% |
| Transcode Latency | 120ms | 45ms | -62% |
| System Thermal Load | 82°C | 74°C (Distributed) | -8°C |
Cooling & Power
A dual-GPU rig can pull 800W+ under full recording load.
- PSU: Use a 1200W Platinum rated unit.
- Spacing: Ensure at least 2 slots of air between cards. Blowers are preferred for the bottom card to exhaust heat directly out the rear.
Summary Checklist
- Assign IDs: Map your high-priority models to GPU 0.
- Set Flags: Use the `-gpu` flag in your FFmpeg strings.
- Monitor VRAM: Use `nvidia-smi` to ensure neither card exceeds 90% utilization.
- Check Lanes: Verify PCIe link speed in GPU-Z.
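The last two checks can be scripted as well. The sketch below parses CSV text produced by `nvidia-smi --query-gpu=index,memory.used,memory.total,pcie.link.gen.current,pcie.link.width.current --format=csv,noheader,nounits` and flags cards above the 90% VRAM mark or on a narrow PCIe link. The sample readings are invented, and the minimum lane count is an assumed threshold.

```python
import csv
import io


def check_gpus(csv_text: str, vram_limit: float = 0.90, min_lanes: int = 8) -> list[str]:
    """Return warnings for GPUs over the VRAM limit or below min_lanes.

    min_lanes is a hypothetical threshold; adjust it to your cards and slots.
    """
    warnings = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        index, used_mib, total_mib, gen, width = (field.strip() for field in row)
        usage = float(used_mib) / float(total_mib)
        if usage > vram_limit:
            warnings.append(f"GPU {index}: VRAM at {usage:.0%}, above {vram_limit:.0%}")
        if int(width) < min_lanes:
            warnings.append(f"GPU {index}: PCIe Gen{gen} link is only x{width}")
    return warnings


# Invented readings: index, used MiB, total MiB, PCIe generation, lane width.
SAMPLE_READINGS = """\
0, 22800, 24564, 4, 16
1, 9100, 16380, 4, 4
"""

report = check_gpus(SAMPLE_READINGS)
for warning in report:
    print(warning)
```

Run on a schedule, a check like this can catch a card creeping toward the VRAM wall before `Out of Memory` errors show up in the logs.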