Building a Multi-GPU AI Workstation on a Budget

Author: Dennis Garcia

Published: Monday, March 30, 2026

Performance

A workstation like this is considerably different from what most armchair AI enthusiasts, Macintosh zealots and PC gamers with a 50-series GPU will put together. Most budget AI systems are limited to a single GPU, likely consumer grade at best. So, to provide a little context, I have some performance numbers to share.

Ollama Inference

The first table compares the Ollama inference performance of Gemma3 12B @ Q8 and Gemma3 27B @ Q6 across two GPU configurations: 2x RTX A4500 and 1x RTX 5080.

Prompt: “Tell me about the moon”.

GPU	Gemma3 12B @ Q8	Gemma3 27B @ Q6
2x RTX A4500	38.14 Tok/s	24.85 Tok/s
1x RTX 5080	33.18 Tok/s	8.51 Tok/s
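
For readers who want to collect similar numbers, Ollama's REST API returns token counts and timings with each response. The sketch below is not necessarily how the figures above were gathered. It assumes a local Ollama server on the default port 11434, and the helper names (`tokens_per_second`, `measure`) are made up for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Generation speed from an Ollama generate response.

    eval_count is the number of generated tokens and eval_duration is
    the time spent generating them, in nanoseconds.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def measure(model: str, prompt: str) -> float:
    """Send one non-streaming request and return Tok/s for the reply."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return tokens_per_second(json.load(reply))
```

Calling `measure("gemma3:12b", "Tell me about the moon")` would return a Tok/s figure for one run. The model tag here is a placeholder and may not match the Q8 build used in the table, and a single run can vary, so averaging several is a reasonable precaution.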

What I find interesting is that Ollama's memory management allowed me to run both Gemma3 models from GPU memory, whereas if I were using vLLM I might not be able to run the 27B model at all.

The biggest difference here is the memory disparity. My RTX A4500s each come with 20GB while the RTX 5080 has 16GB, and despite its faster core clock the RTX 5080 was slightly slower on the 12B model. Because of the limited VRAM on the RTX 5080, the 27B model was mostly offloaded to DRAM and run from the CPU, making inference considerably slower.
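
A back-of-the-envelope estimate shows why the 27B model spills over on the 16GB card. This is a rough sketch only: the bits-per-weight values are assumptions based on llama.cpp's Q8_0 and Q6_K formats, the parameter counts are the nominal 12 and 27 billion, and the KV cache and runtime overhead are ignored. Treating two 20GB cards as a single 40GB pool is also a simplification.

```python
# Assumed bits per weight (llama.cpp Q8_0 and Q6_K); the exact builds
# used in the benchmark may differ.
BITS_PER_WEIGHT = {"Q8": 8.5, "Q6": 6.5625}


def weight_size_gb(params_billions: float, quant: str) -> float:
    """Rough size of the model weights in GB (10^9 bytes)."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8


def fits_in_vram(params_billions: float, quant: str, vram_gb: float) -> bool:
    """True if the weights alone fit; ignores KV cache and overhead."""
    return weight_size_gb(params_billions, quant) <= vram_gb


models = [("Gemma3 12B", 12, "Q8"), ("Gemma3 27B", 27, "Q6")]
gpus = [("2x RTX A4500", 40), ("1x RTX 5080", 16)]

for name, params, quant in models:
    size = weight_size_gb(params, quant)
    for gpu, vram in gpus:
        fits = fits_in_vram(params, quant, vram)
        verdict = "fits" if fits else "spills to system DRAM"
        print(f"{name} @ {quant} (~{size:.1f} GB) on {gpu}: {verdict}")
```

Under these assumptions the 12B model needs roughly 13GB and the 27B model roughly 22GB, which is consistent with only the 27B run on the RTX 5080 falling back to system memory.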

Qwen Image 2512 Generation

The second table uses the Qwen Image 2512 FP8 image generation model in ComfyUI. When generating images the system will default to using only a single GPU and will use system DRAM if the model is too large for VRAM. The GPUs are the same as before, and the final image has dimensions of 1328x1328.

Prompt: “D&D style dungeon background, wide shot, atmospheric, dramatic lighting

Stone walls, moss and vines covering sections of the walls, flickering torch lights casting long shadows, damp stone, rough-hewn stone blocks, arched doorway in the distance, slightly overgrown, fantasy setting, detailed, high resolution

digital painting, concept art, highly detailed, realistic textures, cinematic lighting, 8k”

Model	RTX A4500	RTX 5080
Qwen Image 2512	237.50 Seconds	155.64 Seconds

In this test the 50-series GPU is the clear winner, finishing the image roughly 1.5x faster (155.64 versus 237.50 seconds). That is one of the primary reasons for wanting a faster GPU for this particular workload.
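
To put both benchmarks on the same footing, the results can be expressed as relative speedups. The short sketch below uses only the numbers reported in the two tables, and the helper names are made up for illustration. Throughput (Tok/s) is better when higher and elapsed time is better when lower, so the two ratios are taken in opposite directions.

```python
def speedup_from_rates(faster_rate: float, slower_rate: float) -> float:
    """Ratio of two throughputs (for example Tok/s); higher is better."""
    return faster_rate / slower_rate


def speedup_from_times(slower_seconds: float, faster_seconds: float) -> float:
    """Ratio of two elapsed times; lower time is better."""
    return slower_seconds / faster_seconds


# Values copied from the tables above.
gemma_12b = speedup_from_rates(38.14, 33.18)    # 2x RTX A4500 vs 1x RTX 5080
gemma_27b = speedup_from_rates(24.85, 8.51)     # 2x RTX A4500 vs 1x RTX 5080
qwen_image = speedup_from_times(237.50, 155.64)  # RTX 5080 vs RTX A4500

print(f"Gemma3 12B: 2x RTX A4500 is {gemma_12b:.2f}x faster")
print(f"Gemma3 27B: 2x RTX A4500 is {gemma_27b:.2f}x faster")
print(f"Qwen Image 2512: RTX 5080 is {qwen_image:.2f}x faster")
```

The pattern matches the discussion above: the A4500 pair leads by a small margin when the model fits on both setups, by close to 3x when the RTX 5080 runs out of VRAM, and the RTX 5080 leads by about 1.5x on image generation.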