Building a Multi-GPU AI Workstation on a Budget
Author: Dennis Garcia
Performance
A workstation like this is considerably different from what most armchair AI enthusiasts, Macintosh zealots and PC gamers with a 50-series GPU will put together. Most budget AI systems will be limited to a single GPU, likely consumer grade at best. So, to help provide a little context, I have some performance numbers to share.
The first table compares Ollama inference performance for Gemma3 12B @ Q8 and Gemma3 27B @ Q6 across two GPU configurations: 2x RTX A4500 and 1x RTX 5080.
Prompt: “Tell me about the moon”.
| GPU | Gemma3 12B @ Q8 | Gemma3 27B @ Q6 |
| --- | --- | --- |
| 2x RTX A4500 | 38.14 Tok/s | 24.85 Tok/s |
| 1x RTX 5080 | 33.18 Tok/s | 8.51 Tok/s |
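For anyone wanting comparable Tok/s numbers on their own hardware, here is a minimal sketch of one way to get them. It is not necessarily how the figures above were collected. It assumes a local Ollama server on the default port and relies on the `eval_count` and `eval_duration` fields (the latter in nanoseconds) that Ollama returns from `/api/generate`. The model tag in the usage comment is a placeholder.

```python
import json
import urllib.request


def tokens_per_second(response: dict) -> float:
    """Generation speed from an Ollama reply (eval_duration is in nanoseconds)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def generate(model: str, prompt: str, host: str = "http://localhost:11434") -> dict:
    """Send one non-streaming request to a local Ollama server and return the JSON reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


# Example (needs a running Ollama server; the model tag is a placeholder):
# reply = generate("gemma3:12b", "Tell me about the moon")
# print(f"{tokens_per_second(reply):.2f} Tok/s")
```

Short prompts like this one produce fairly noisy results, so averaging a few runs is probably wise before comparing GPUs.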
What I find interesting is that Ollama's memory management allowed me to run both Gemma3 models from GPU memory, whereas if I were using vLLM I might not be able to run the 27B model at all.
The biggest difference here is the memory disparity. Each of my RTX A4500s comes with 20GB of VRAM (40GB across the pair) while the RTX 5080 has 16GB. Despite its faster core clock, the RTX 5080 was slightly slower on the 12B model. With the 27B model, the limited VRAM on the RTX 5080 meant the model was mostly offloaded to DRAM and run from the CPU, making inference considerably slower.
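The memory argument can be checked with some rough arithmetic. The sketch below estimates weight size as parameters times bits per weight, which is an assumption that understates the real footprint: actual quantized files use slightly more bits per weight, and the KV cache and runtime overhead also need VRAM. The 40GB figure assumes the two 20GB cards are pooled for one model.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight footprint in GB: billions of parameters times bytes per weight."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float, vram_gb: float) -> bool:
    """True if the weights alone fit; ignores KV cache and runtime overhead."""
    return weight_size_gb(params_billion, bits_per_weight) <= vram_gb


models = [("Gemma3 12B @ Q8", 12, 8), ("Gemma3 27B @ Q6", 27, 6)]
gpus = [("2x RTX A4500", 40), ("1x RTX 5080", 16)]  # 40GB assumes pooled VRAM

for model_name, params, bits in models:
    size = weight_size_gb(params, bits)
    for gpu_name, vram in gpus:
        verdict = "fits" if fits_in_vram(params, bits, vram) else "spills to DRAM"
        print(f"{model_name} (~{size:.1f} GB) on {gpu_name} ({vram} GB): {verdict}")
```

By this estimate the 12B model needs about 12 GB and the 27B model about 20 GB for weights alone, which lines up with the 27B result collapsing on the 16GB card.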
The second table uses the Qwen Image 2512 FP8 image generation model in ComfyUI. When generating images, the system defaults to using only a single GPU and will use system DRAM if the model is too large for VRAM. The GPUs are the same as before, and the final image has dimensions of 1328x1328.
Prompt: “D&D style dungeon background, wide shot, atmospheric, dramatic lighting
Stone walls, moss and vines covering sections of the walls, flickering torch lights casting long shadows, damp stone, rough-hewn stone blocks, arched doorway in the distance, slightly overgrown, fantasy setting, detailed, high resolution
digital painting, concept art, highly detailed, realistic textures, cinematic lighting, 8k”
| Model | RTX A4500 | RTX 5080 |
| --- | --- | --- |
| Qwen Image 2512 | 237.50 Seconds | 155.64 Seconds |
In this test the 50-series GPU is the clear winner, which is one of the primary reasons for wanting a faster GPU for this particular workload.
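To put the two tables on the same footing, the results can be reduced to simple ratios. This is a small sketch using only the numbers reported above; the `speedup` helper is a name introduced here for illustration.

```python
def speedup(larger_is_baseline: float, other: float) -> float:
    """Ratio of the first measurement to the second."""
    return larger_is_baseline / other


# Tok/s: higher is better, so divide the A4500 pair's rate by the 5080's rate.
gemma_12b = speedup(38.14, 33.18)
gemma_27b = speedup(24.85, 8.51)

# Seconds: lower is better, so divide the A4500's time by the 5080's time.
qwen_image = speedup(237.50, 155.64)

print(f"Gemma3 12B: 2x RTX A4500 is {gemma_12b:.2f}x faster")
print(f"Gemma3 27B: 2x RTX A4500 is {gemma_27b:.2f}x faster")
print(f"Qwen Image 2512: RTX 5080 is {qwen_image:.2f}x faster")
```

The A4500 pair comes out roughly 1.15x ahead on the 12B model and about 2.9x ahead on the 27B model, while the RTX 5080 is about 1.5x faster at image generation.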

