    Building a Multi-GPU AI Workstation on a Budget


    Performance

    A workstation like this is considerably different from what most armchair AI enthusiasts, Macintosh zealots, and PC gamers with a 50-series GPU will put together.  Most budget AI systems are limited to a single GPU, likely consumer grade at best.  To provide a little context, I have some performance numbers to share.

    Ollama Inference

    The first table compares the Ollama inference performance of Gemma3 12B @ Q8 and Gemma3 27B @ Q6 across two GPU configurations: 2x RTX A4500 and 1x RTX 5080.

    Prompt: “Tell me about the moon”.

    GPU             Gemma3 12B @ Q8    Gemma3 27B @ Q6
    2x RTX A4500    38.14 Tok/s        24.85 Tok/s
    1x RTX 5080     33.18 Tok/s        8.51 Tok/s
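    For readers who want to collect a comparable Tok/s figure, one option is to use the timing fields Ollama attaches to a finished response: `eval_count` (tokens generated) and `eval_duration` (nanoseconds spent generating them). The sketch below is only an illustration of that arithmetic, not necessarily how the numbers above were gathered; the function name and the sample values are made up for the example.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from the final response of an Ollama generate/chat call.

    Expects the `eval_count` and `eval_duration` fields that Ollama reports
    once a response is complete. `eval_duration` is in nanoseconds.
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


# Hypothetical sample values for illustration only, not a real measurement.
sample = {"eval_count": 500, "eval_duration": 20_000_000_000}
print(f"{tokens_per_second(sample):.2f} Tok/s")
```

    Running `ollama run <model> --verbose` also prints an eval rate after each reply, which is usually the quickest way to get the same number without writing any code.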

    What I find interesting is that Ollama's memory management allowed me to run both Gemma3 models from GPU memory, whereas if I were using vLLM I might not be able to run the 27B model at all.

    The biggest difference here is the memory disparity.  Each RTX A4500 comes with 20GB (40GB across the pair) while the RTX 5080 has 16GB.  On the 12B model the RTX 5080 was slightly slower despite its faster core clock.  On the 27B model the limited RTX 5080 VRAM meant most of the model was offloaded to DRAM and run from the CPU, making inference considerably slower.
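    A rough back-of-the-envelope estimate shows why the 27B model spills out of 16GB. The sketch below multiplies parameter count by bits per weight, under several assumptions that are not from the measurements above: that "Q8" and "Q6" correspond to the GGUF Q8_0 (8.5 bits per weight) and Q6_K (6.5625 bits per weight) formats, that the nominal 12B and 27B parameter counts are close enough, and that KV cache, context buffers, and runtime overhead can be ignored. Real memory use will be higher.

```python
def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


# Assumed mapping of the article's quant labels to GGUF formats.
models = {
    "Gemma3 12B @ Q8": weight_size_gb(12, 8.5),     # Q8_0, assumed
    "Gemma3 27B @ Q6": weight_size_gb(27, 6.5625),  # Q6_K, assumed
}

# VRAM figures as stated in the text: 2 x 20GB and 1 x 16GB.
vram_gb = {"2x RTX A4500": 40, "1x RTX 5080": 16}

for model, size in models.items():
    for gpu, vram in vram_gb.items():
        verdict = "fits" if size < vram else "does not fit"
        print(f"{model}: ~{size:.1f} GB of weights {verdict} in {gpu} ({vram} GB)")
```

    Under these assumptions the 12B weights come to roughly 12.8 GB and the 27B weights to roughly 22.1 GB, which is consistent with the 27B model needing to offload on a single 16GB card.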

    Qwen Image 2512 Generation

    The second table uses the Qwen Image 2512 FP8 image generation model in ComfyUI.  When generating images the system defaults to using only a single GPU and will use system DRAM if the model is too large for VRAM.  The GPUs are the same as before and the final image has dimensions of 1328x1328.

    Prompt: “D&D style dungeon background, wide shot, atmospheric, dramatic lighting

    Stone walls, moss and vines covering sections of the walls, flickering torch lights casting long shadows, damp stone, rough-hewn stone blocks, arched doorway in the distance, slightly overgrown, fantasy setting, detailed, high resolution

    digital painting, concept art, highly detailed, realistic textures, cinematic lighting, 8k”

    Model              RTX A4500         RTX 5080
    Qwen Image 2512    237.50 Seconds    155.64 Seconds

    In this test the 50-series GPU is the clear winner, finishing in roughly two-thirds of the time.  That is one of the primary reasons for wanting a faster GPU for this particular workload.
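    To put the two tables on a common footing, the results can be expressed as speedup ratios. The short sketch below uses only the figures reported above; the variable and function names are illustrative.

```python
# Ollama inference, tokens per second (higher is better).
tok_s = {
    "Gemma3 12B @ Q8": {"2x RTX A4500": 38.14, "1x RTX 5080": 33.18},
    "Gemma3 27B @ Q6": {"2x RTX A4500": 24.85, "1x RTX 5080": 8.51},
}

# Qwen Image 2512 generation, seconds per image (lower is better).
image_seconds = {"RTX A4500": 237.50, "RTX 5080": 155.64}


def speedup_higher_is_better(fast: float, slow: float) -> float:
    """Ratio for throughput-style metrics such as Tok/s."""
    return fast / slow


def speedup_lower_is_better(slow_time: float, fast_time: float) -> float:
    """Ratio for time-style metrics such as seconds per image."""
    return slow_time / fast_time


for model, result in tok_s.items():
    ratio = speedup_higher_is_better(result["2x RTX A4500"], result["1x RTX 5080"])
    print(f"{model}: 2x RTX A4500 is {ratio:.2f}x the RTX 5080")

image_ratio = speedup_lower_is_better(image_seconds["RTX A4500"], image_seconds["RTX 5080"])
print(f"Qwen Image 2512: RTX 5080 is {image_ratio:.2f}x the RTX A4500")
```

    The ratios make the pattern easier to see: the dual A4500 setup has a modest lead when the model fits in VRAM on both systems, a much larger lead when the single card has to offload, and the 5080 pulls ahead on the single-GPU image workload.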