Fine-Tuning LLMs on a Local Multi-GPU AI Workstation


    Introduction

What fun is having an AI workstation if you don’t dive deeply into what makes AI work? Inference accounts for perhaps 98% of AI workloads, from image generation to machine learning and even basic chats and summaries. We rely on model makers to do the heavy lifting and provide models that we can use, either in the cloud or locally.

Fine-tune training is a way to customize an LLM for a particular task. The model shifts from being general purpose to domain specific, and may not even behave the same as before. A good example would be teaching an LLM how to read and write FORTRAN, or creating a model that can understand the Classic ASP built by a cowboy coder in the early 2000s. The point is that by changing the model you change how it responds, narrowing its usable focus to reduce hallucinations.
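As a concrete illustration, supervised fine-tuning data is usually just a set of prompt/response pairs written to disk. The snippet below is a hypothetical sketch of such a dataset for the FORTRAN example; the file name and the "instruction"/"response" field names are assumptions, since different training frameworks expect slightly different schemas:

```python
import json

# Hypothetical instruction/response pairs for teaching a model FORTRAN.
# The field names here are an assumption, not a fixed standard.
examples = [
    {"instruction": "Write a FORTRAN loop that sums an array of 10 reals.",
     "response": "REAL A(10), S\nS = 0.0\nDO 10 I = 1, 10\n   S = S + A(I)\n10 CONTINUE"},
    {"instruction": "What does IMPLICIT NONE do in FORTRAN?",
     "response": "It disables implicit typing, forcing every variable to be declared."},
]

# One JSON object per line (JSONL) is a common on-disk format for fine-tune data.
with open("fortran_sft.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

Each line becomes one training example; the trainer tokenizes the pair and teaches the model to produce the response given the instruction.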

    Be sure to check out my previous AI focused articles for more information on the build I am using and software packages I am working with.

Fine-Tune Training

Training LLMs is extremely memory intensive, often requiring roughly ten times the memory needed simply to run inference on the model.

Model Parameters    Inference (FP16)    Training (FP16)
4B                  8 GB                80 GB
8B                  16 GB               160 GB
12B                 24 GB               240 GB
30B                 60 GB               600 GB
70B                 140 GB              1400 GB
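The figures in the table follow from a common rule of thumb: FP16 inference needs about 2 bytes per parameter, while full FP16 training with an Adam-style optimizer needs roughly 20 bytes per parameter (weights, gradients, optimizer states, and activation overhead). A minimal sketch of that arithmetic, where the bytes-per-parameter constants are the assumption:

```python
def training_memory_gb(params_billions: float,
                       inference_bytes: int = 2,    # FP16 weights only
                       training_bytes: int = 20):   # weights + grads + optimizer states + overhead
    """Rough estimate of inference and training memory, in GB."""
    params = params_billions * 1e9
    return params * inference_bytes / 1e9, params * training_bytes / 1e9

# Reproduce the table above.
for size in (4, 8, 12, 30, 70):
    inf, train = training_memory_gb(size)
    print(f"{size}B: {inf:.0f} GB inference, {train:.0f} GB training")
```

These are ballpark numbers: techniques like LoRA, gradient checkpointing, or 8-bit optimizers can cut the training figure dramatically.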


When training in the cloud, you’ll use these figures to determine what class of VM you will need to complete the training. As you can imagine, this costs money, and despite how simple the process looks, you will find that fine-tune training is iterative and requires a fair amount of trial and error to get right.

Due to my aversion to spending money on cloud services, I have started exploring options for training models offline and discovering the unique hardware demands that entails.