vibescoder

AI-NT-No-Problem: Cramming a 9950X3D and RTX 5090 Into an SFF Custom Loop

My homelab workstation — hostname AI-NT-No-Problem — has been running a Ryzen 9 9950X3D and an RTX 5090 in an Antec Performance 1 FT full tower for months. It does local AI inference with llama.cpp, hosts my Coder server for remote development, runs Tailscale, a Cloudflare tunnel, Docker, RustDesk, and whatever else I throw at it. It's the nerve center of the whole operation.

It also sounds like a drone trying to fly away.

The RTX 5090's stock triple-fan cooler is the main offender. Under inference load — four concurrent Qwen3-Coder-30B-A3B requests pulling 386W average — those fans spin to nearly 50%. In a room where I'm trying to work, that's unacceptable. So I decided to move everything into an SFF open-frame case with custom hardline water cooling. The question wasn't whether I wanted to do it. It was whether 2×240mm slim radiators could actually handle a 575W-TDP GPU and a 16-core CPU sharing a single loop.

One way to find out: measure everything before, measure everything after, let the data decide.

The Hardware Swap

This wasn't just a cooler change — it was a full platform migration. New motherboard, new case, new form factor.

| Component | Before | After |
|---|---|---|
| Case | Antec Performance 1 FT (full tower) | Hardline Nexus Morph R2 (SFF open-frame) |
| Motherboard | ASRock X870 Pro-A WiFi (E-ATX) | Asus ROG Strix X870-I (Mini-ITX) |
| CPU Cooling | 360mm AIO (dedicated) | Alphacool Core 1 block + Thermal Grizzly AM5 contact frame |
| GPU Cooling | Stock NVIDIA air cooler (triple fan) | Alphacool Core RTX 5090 full-cover block + KryoSheet |
| Radiators | AIO-integrated 360mm | 2× Alphacool NexXxoS HPE-30 240mm slim (30mm) |
| Fans | AIO fans + case fans | 6× Alphacool Apex Stealth Metal Aurora 120mm (push-only) |
| Pump/Res | AIO-integrated | Alphacool Core Flat Reservoir 240 + VPP Apex D5 |
| Tubing | N/A | Corsair 14mm hardline, satin white |
| Coolant Sensor | None | Alphacool G1/4 inline T-sensor → motherboard T_Sensor header |
| Cooling Architecture | Independent: CPU and GPU decoupled | Single shared loop: CPU and GPU thermally coupled |

Unchanged: CPU (9950X3D), GPU (RTX 5090), RAM (2×32GB DDR5 SK Hynix @ 6000 MT/s EXPO), NVMe drives (Samsung 9100 PRO 2TB + Crucial P510 2TB), OS (Ubuntu 24.04.4), NVIDIA driver (590.48.01), CUDA 13.1.

The loop runs reservoir → bottom rad → top rad → CPU block → GPU block → reservoir, with a stubbed drain port off the reservoir bottom. Putting both radiators in series ahead of the blocks means the coolant is as cool as the loop can make it before it picks up any component heat. The CPU block comes before the GPU block because the 9950X3D adds only modest heat compared to the 5090, so the coolant reaching the GPU is still close to radiator-outlet temperature.
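To put a rough number on "modest heat", here is a minimal sketch of the coolant temperature rise across each block, using the steady-state relation ΔT = P / (ṁ·c_p). The 200 W and 310 W inputs are the combined-load figures reported later in this post. The 150 L/h flow rate is purely an assumption for illustration, since flow was not measured in this build, and the coolant is treated as plain water.

```python
# Properties of plain water; the loop's actual coolant mix may differ slightly.
WATER_CP_J_PER_KG_K = 4186.0
WATER_DENSITY_KG_PER_L = 1.0

# Hypothetical flow rate: not measured in this build, chosen only for illustration.
ASSUMED_FLOW_L_PER_H = 150.0


def coolant_rise_c(heat_w: float, flow_l_per_h: float) -> float:
    """Steady-state temperature rise (in C) of coolant absorbing heat_w watts."""
    mass_flow_kg_per_s = flow_l_per_h * WATER_DENSITY_KG_PER_L / 3600.0
    return heat_w / (mass_flow_kg_per_s * WATER_CP_J_PER_KG_K)


cpu_rise = coolant_rise_c(200.0, ASSUMED_FLOW_L_PER_H)  # CPU under combined load
gpu_rise = coolant_rise_c(310.0, ASSUMED_FLOW_L_PER_H)  # GPU under combined load
print(f"CPU block adds about {cpu_rise:.1f} C, GPU block adds about {gpu_rise:.1f} C")
```

Under that assumed flow rate, the CPU block warms the coolant by only about a degree before it reaches the GPU, which is why block order matters far less than total radiator capacity. A lower real-world flow rate would raise both numbers proportionally.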

Fan Curve: Coolant Temp, Not CPU Temp

One detail that matters more than it sounds: the six radiator fans are controlled by coolant temperature, not CPU temperature. The Alphacool G1/4 inline temp sensor feeds the Asus X870-I's T_Sensor header, and all chassis fans follow a manual PWM curve tied to that reading:

| Coolant Temp | Fan Duty |
|---|---|
| 35°C | 30% |
| 40°C | 50% |
| 45°C | 70% |
| 50°C | 90% |
| 55°C | 100% |

This eliminates fan hunting — the rapid spin-up/spin-down you get when fans chase CPU Tctl spikes. Coolant temperature changes slowly (high thermal mass), so the fans ramp gradually. The pump runs at full speed on the AIO_PUMP header. D5 pumps are quiet at any RPM, so there's no reason to throttle it.
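For anyone who wants to reproduce the curve in software, here is a minimal sketch of the mapping from coolant temperature to fan duty. It assumes linear interpolation between the table's points and clamping outside them; the board firmware may behave differently between points, so treat this as an approximation rather than a description of what the BIOS does. The name `fan_duty` is hypothetical.

```python
from bisect import bisect_right

# (coolant temp in C, fan duty in %) points copied from the table above.
FAN_CURVE = [(35.0, 30.0), (40.0, 50.0), (45.0, 70.0), (50.0, 90.0), (55.0, 100.0)]


def fan_duty(coolant_c: float, curve=FAN_CURVE) -> float:
    """Return fan duty in percent, linearly interpolated and clamped at both ends."""
    temps = [t for t, _ in curve]
    if coolant_c <= temps[0]:
        return curve[0][1]
    if coolant_c >= temps[-1]:
        return curve[-1][1]
    i = bisect_right(temps, coolant_c)  # index of the first point above coolant_c
    (t0, d0), (t1, d1) = curve[i - 1], curve[i]
    return d0 + (d1 - d0) * (coolant_c - t0) / (t1 - t0)


for temp in (30.0, 37.5, 45.0, 52.5, 60.0):
    print(f"{temp:.1f} C -> {fan_duty(temp):.0f}%")
```

Because coolant temperature moves slowly, even a coarse five-point curve like this produces smooth ramps without extra hysteresis logic.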

The Test Harness: Vibe-Coded, Obviously

I needed a reproducible thermal test I could run identically before and after the migration. So I did what I always do: I vibe-coded it with a Coder agent.

The agent SSHed from a Docker-based Coder workspace into the host, discovered all available sensors by walking /sys/class/hwmon, and wrote an 873-line bash script that polls every sensor at 1-second intervals across six sequential phases:

| Phase | Duration | Workload |
|---|---|---|
| Idle | 5 min | None (baseline temps) |
| CPU Stress | 10 min | stress-ng all-core matrixprod |
| Inference | 10 min | 4× concurrent llama.cpp requests (Qwen3-Coder-30B-A3B) |
| Gaming | 10 min | glmark2 via PRIME offload |
| Combined | 10 min | stress-ng + inference simultaneously |
| Storage | 5 min | fio mixed random+sequential on boot NVMe |
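The sensor-discovery step can be sketched as follows. The real harness is a bash script that is not reproduced in this post, so this is an illustrative Python equivalent and not the author's code. It relies only on the standard hwmon sysfs layout: each `hwmonN` directory has a `name` file, `temp*_input` files in millidegrees Celsius, and optional `temp*_label` files. The function name `discover_temp_sensors` is hypothetical.

```python
from pathlib import Path


def discover_temp_sensors(root: str = "/sys/class/hwmon") -> dict[str, float]:
    """Return {"hwmonN:chip/label": degrees C} for each readable temp*_input."""
    readings: dict[str, float] = {}
    for chip_dir in sorted(Path(root).glob("hwmon*")):
        name_file = chip_dir / "name"
        chip = name_file.read_text().strip() if name_file.exists() else "unknown"
        for input_file in sorted(chip_dir.glob("temp*_input")):
            stem = input_file.name[: -len("_input")]  # e.g. "temp1"
            label_file = chip_dir / f"{stem}_label"
            label = label_file.read_text().strip() if label_file.exists() else stem
            try:
                millidegrees = int(input_file.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor file exists but could not be read or parsed
            # Prefix with the hwmonN directory so two chips with the same
            # driver name (for example two NVMe drives) do not collide.
            readings[f"{chip_dir.name}:{chip}/{label}"] = millidegrees / 1000.0
    return readings
```

Calling `discover_temp_sensors()` once per polling interval and appending the values to a CSV row gives the same kind of log the harness produces. Sensors that are not exposed through hwmon will not appear in the result.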

Total runtime: 50 minutes. Both runs produced exactly 2,726 sensor readings. Seven bugs found and fixed during development. The migration checklist itself was also built as a Vercel web app with an API endpoint so the agent could check off steps programmatically. When I say this project was vibe-coded end to end, I mean it.
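A quick sanity check on those numbers: a strict 1-second interval over 50 minutes would give 3,000 readings, not 2,726. The sketch below works out the effective interval, assuming each "reading" is one full pass over all sensors. The likely cause, per-pass polling overhead on top of the 1-second sleep, is an interpretation and was not measured.

```python
# Phase durations in minutes, copied from the table above.
PHASES_MIN = {
    "idle": 5,
    "cpu_stress": 10,
    "inference": 10,
    "gaming": 10,
    "combined": 10,
    "storage": 5,
}

READINGS_PER_RUN = 2726  # reported for both the before and after runs

total_seconds = sum(PHASES_MIN.values()) * 60
effective_interval_s = total_seconds / READINGS_PER_RUN

print(f"Total runtime: {total_seconds} s")
print(f"Effective polling interval: {effective_interval_s:.2f} s per reading")
```

What matters for the comparison is that both runs landed on the same count, so the before and after datasets are sampled identically.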

Results: The Big Picture

Here's the hero table — every key sensor, every phase, before vs. after.

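| Phase | Sensor | Before Avg | Before Max | After Avg | After Max | Δ Avg (°C) | Δ Max (°C) |
|---|---|---|---|---|---|---|---|
| Idle | CPU Tctl | 49.6°C | 49.8°C | 54.7°C | 56.1°C | +5.1 | +6.3 |
| Idle | GPU Temp | 54.7°C | 57.0°C | 31.9°C | 33.0°C | -22.8 | -24.0 |
| Idle | NVMe0 | 44.9°C | 44.9°C | 38.0°C | 38.9°C | -6.9 | -6.0 |
| CPU Stress | CPU Tctl | 72.2°C | 73.0°C | 73.9°C | 76.4°C | +1.7 | +3.4 |
| CPU Stress | GPU Temp | 48.9°C | 55.0°C | 36.7°C | 39.0°C | -12.2 | -16.0 |
| Inference | CPU Tctl | 63.9°C | 72.6°C | 72.3°C | 75.1°C | +8.4 | +2.5 |
| Inference | GPU Temp | 64.4°C | 66.0°C | 48.7°C | 51.0°C | -15.7 | -15.0 |
| Combined | CPU Tctl | 73.6°C | 77.8°C | 80.1°C | 83.6°C | +6.5 | +5.8 |
| Combined | GPU Temp | 63.0°C | 66.0°C | 47.7°C | 52.0°C | -15.3 | -14.0 |
| Storage | NVMe0 | 68.3°C | 69.8°C | 59.7°C | 63.9°C | -8.6 | -5.9 |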

Two stories: the GPU got dramatically cooler, the CPU got moderately warmer. Both within safe limits. NVMe improved across the board.
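The delta columns are simply after minus before. Here is a small sketch that recomputes them from the averages and maxima in the table, which is a handy check if you rerun the test and want to compare against these numbers. The data structure and the `deltas` helper are illustrative names, not part of the original harness.

```python
# (phase, sensor): (before_avg, before_max, after_avg, after_max) in C,
# copied from the results table above.
RESULTS = {
    ("Idle", "CPU Tctl"): (49.6, 49.8, 54.7, 56.1),
    ("Idle", "GPU Temp"): (54.7, 57.0, 31.9, 33.0),
    ("Idle", "NVMe0"): (44.9, 44.9, 38.0, 38.9),
    ("CPU Stress", "CPU Tctl"): (72.2, 73.0, 73.9, 76.4),
    ("CPU Stress", "GPU Temp"): (48.9, 55.0, 36.7, 39.0),
    ("Inference", "CPU Tctl"): (63.9, 72.6, 72.3, 75.1),
    ("Inference", "GPU Temp"): (64.4, 66.0, 48.7, 51.0),
    ("Combined", "CPU Tctl"): (73.6, 77.8, 80.1, 83.6),
    ("Combined", "GPU Temp"): (63.0, 66.0, 47.7, 52.0),
    ("Storage", "NVMe0"): (68.3, 69.8, 59.7, 63.9),
}


def deltas(row):
    """Return (delta_avg, delta_max), each rounded to one decimal place."""
    before_avg, before_max, after_avg, after_max = row
    return round(after_avg - before_avg, 1), round(after_max - before_max, 1)


for (phase, sensor), row in RESULTS.items():
    d_avg, d_max = deltas(row)
    print(f"{phase:<10} {sensor:<9} avg {d_avg:+.1f}  max {d_max:+.1f}")
```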

GPU: The Star of the Show

The RTX 5090 never exceeded 52°C in the after test. Under sustained inference — the workload this machine exists to run — the GPU dropped from 66°C peak to 51°C. A full-cover water block with KryoSheet graphite on the die will do that.

The GPU fan speed column is the satisfying one: 0% across all six phases. Not because the fans are off — because they don't exist anymore. The stock cooler was physically removed. Cooling is handled entirely by the water block and the loop's radiator fans. This is the single biggest contributor to the noise reduction.

But the surprise was power efficiency. Under inference, the GPU draws 26W less (386W → 360W) while doing the same work. Under combined load, the drop is 70W (380W → 310W). The likely explanation is that a cooler die leaks less current and stays clear of thermal limits, so the card holds its clocks at lower power. That's not just a thermal win. It's an efficiency win that reduces total heat into the loop. The system helps itself.
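Expressed as percentages, the savings look like this. The sketch uses only the average power figures quoted above; the `power_reduction` helper is a hypothetical name.

```python
def power_reduction(before_w: float, after_w: float) -> tuple[float, float]:
    """Return (watts saved, percent saved relative to the before figure)."""
    saved_w = before_w - after_w
    return saved_w, 100.0 * saved_w / before_w


# Average GPU power per phase in watts: (before, after).
GPU_POWER_W = {"Inference": (386.0, 360.0), "Combined": (380.0, 310.0)}

for phase, (before_w, after_w) in GPU_POWER_W.items():
    saved_w, saved_pct = power_reduction(before_w, after_w)
    print(f"{phase}: {saved_w:.0f} W saved ({saved_pct:.1f}%)")
```

That works out to roughly 7% less GPU power under inference and about 18% less under combined load.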

CPU: The Honest Tradeoff

The CPU is warmer. That's expected and I want to be upfront about it.

Before, the CPU had a dedicated 360mm AIO all to itself: 50% more radiator than a single 240mm, with zero thermal coupling to the GPU. Now it shares 480mm of total radiator with a GPU that dumps 360W into the loop during inference.

The worst case — combined phase peak of 83.6°C — still leaves 11.4°C of headroom below the 9950X3D's 95°C Tctl throttle limit. No throttling occurred during any test phase. Under CPU-only stress, the delta is just +1.7°C average. The CPU block and loop handle CPU-only loads almost as well as the 360mm AIO did. It's the thermal coupling during mixed workloads that creates the gap.
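The headroom figure is just the throttle limit minus the peak Tctl. A short sketch that applies it to every phase in the after run, using the After Max values from the results table and the 95°C limit cited above:

```python
TCTL_LIMIT_C = 95.0  # 9950X3D throttle point cited above

# Peak CPU Tctl per phase in the after run (C), from the results table.
PEAK_TCTL_C = {
    "Idle": 56.1,
    "CPU Stress": 76.4,
    "Inference": 75.1,
    "Combined": 83.6,
}

headroom_c = {
    phase: round(TCTL_LIMIT_C - peak, 1) for phase, peak in PEAK_TCTL_C.items()
}
worst_phase = min(headroom_c, key=headroom_c.get)

for phase, margin in headroom_c.items():
    print(f"{phase:<10} {margin:.1f} C below the limit")
print(f"Tightest margin: {worst_phase}")
```

Every phase other than combined leaves at least 18°C of margin, so the shared loop only gets close to the limit when both chips are loaded at once.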

Does it matter for the actual workload? For AI inference, the GPU is the bottleneck — not the CPU. The CPU's job is tokenization and prompt processing, which is lightweight. Running 6-8°C warmer doesn't affect inference throughput at all.

Everything Else

NVMe temps improved 6-9°C in the phases tabulated above (idle and storage). The Asus Mini-ITX board's M.2 heatsink is effective, and the open-frame case has decent airflow around the drives. The Samsung 9100 PRO's controller hotspot peaked at 76.8°C under storage stress, down from 79.8°C and within Samsung's 83.8°C threshold.

The motherboard swap brought sensor changes worth noting: the Asus board exposes DDR5 SPD Hub temps via the spd5118 driver (idle: 35.8°C) — the ASRock didn't. The ASRock's Realtek NIC had a hwmon temp sensor; the Intel I226-V doesn't expose one. No functional loss — NIC temps were never actionable.

Thermal capacity math vs. reality: The pre-migration estimate of 570-775W peak combined heat was conservative. The actual combined load landed at ~510W (200W CPU + 310W GPU avg) because inference doesn't push the GPU to its 575W TDP, and the cooler GPU draws less power for the same work. The inline T-sensor confirmed coolant equilibrium in the range the fan curve was designed for. The system found its own balance.

Gotchas

  1. Secure Boot MOK enrollment after motherboard swap. The NVIDIA driver is a DKMS kernel module (nvidia-dkms-590-open). After moving to the new motherboard, nvidia-smi failed with Key was rejected by service — the new board's Secure Boot database didn't have the Machine Owner Key. Fix: sudo mokutil --import /var/lib/shim-signed/mok/MOK.der, reboot, and catch the blue MOK enrollment screen before the OS boots. I missed it the first time and had to repeat the whole cycle. If you've never seen it before, you'll blow right past it.

  2. Bottom fan orientation matters in an open frame. The bottom radiator fans were initially configured exhausting downward. Corrected to exhaust upward — pull config through the rad — to create coherent bottom-to-top airflow through the open frame. Pull vs push performance delta is ~5%, but the airflow direction delta is significant.

  3. The test harness doesn't monitor the T-sensor. The inline coolant temp sensor feeds the motherboard's T_Sensor header for fan control, but it isn't exposed as a standard hwmon device that the bash script's auto-discovery picks up. I have the fan curve working correctly, but the thermal test CSV doesn't include coolant temperature as a logged column. Future improvement.

What I'd Change

  • Thicker radiators or push-pull. The 30mm slim rads in push-only are the minimum viable configuration. 45mm rads in push-pull would give substantially more thermal headroom and lower coolant equilibrium. The CPU would directly benefit.
  • Dedicated CPU loop. With unlimited budget, dual loops would eliminate the thermal coupling entirely. The CPU would get its own 240mm rad and perform close to the old 360mm AIO. But the shared loop works — it's just not optimal for the CPU.
  • Log coolant temp in the test harness. The T-sensor drives the fan curve perfectly, but I want it in the CSV for correlation analysis. That means either wiring up the motherboard's sensor reading via lm-sensors config or adding a USB temperature probe the script can poll directly.

The system went from a full tower I couldn't sit next to during inference to an SFF build that's effectively silent. The GPU peaks 14-24°C lower and averages 12-23°C lower, depending on the phase. The CPU runs warmer because it shares the loop, but nowhere near throttling. Power efficiency improved because the GPU doesn't fight thermal limits. The NVMe drives got cooler too, somehow.

2,726 sensor readings don't lie. AI-NT-No-Problem earned its name.

Was the CPU tradeoff worth the silence? For an inference-bound workload, I'd make the same call every time.

By the Numbers

  • 2,726 sensor readings per test run (1-second intervals, 50-minute test)
  • 873 lines of bash in the vibe-coded thermal test harness
  • 7 bugs found and fixed during script development
  • 6 test phases: idle, CPU, inference, gaming, combined, storage
  • 23°C GPU temperature drop at idle (55°C → 32°C)
  • 14°C GPU temperature drop under combined load (66°C → 52°C)
  • 70W GPU power reduction under combined load
  • 0% GPU fan speed across all phases (because the fans don't exist)
  • 11.4°C headroom to CPU throttle limit at worst case
  • 2×240mm slim radiators handling a ~510W combined thermal load
  • 1 Secure Boot MOK enrollment screen missed on first attempt
  • 1 very quiet homelab
