Fine-Tuning LLMs on a Local Multi-GPU AI Workstation


    Introduction

What fun is having an AI workstation if you don’t dive deeply into what makes AI work? Inference accounts for perhaps 98% of AI workloads, from image generation to machine learning and even basic chats and summaries. We rely on model makers to do the heavy lifting and provide models that we can use, either in the cloud or locally.

Fine-tune training is a way to customize an LLM for a particular task. The model shifts from being general purpose to domain specific, and may not even behave the same as before. A good example would be teaching an LLM how to read and write FORTRAN, or creating a model that can understand the Classic ASP built by a cowboy coder in the early 2000s. The point is that by changing the model you change how it responds, narrowing its usable focus to reduce hallucinations.
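As a concrete illustration, supervised fine-tuning data is usually just a set of prompt/response pairs written to disk. The snippet below is a hypothetical sketch of such a dataset for the FORTRAN example; the file name and the "instruction"/"response" field names are assumptions, since different training frameworks expect slightly different schemas:

```python
import json

# Hypothetical instruction/response pairs for teaching a model FORTRAN.
# The field names here are an assumption, not a fixed standard.
examples = [
    {"instruction": "Write a FORTRAN loop that sums an array of 10 reals.",
     "response": "REAL A(10), S\nS = 0.0\nDO 10 I = 1, 10\n   S = S + A(I)\n10 CONTINUE"},
    {"instruction": "What does IMPLICIT NONE do in FORTRAN?",
     "response": "It disables implicit typing, forcing every variable to be declared."},
]

# One JSON object per line (JSONL) is a common on-disk format for fine-tune data.
with open("fortran_sft.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

Each line becomes one training example; the trainer tokenizes the pair and teaches the model to produce the response given the instruction.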

    Be sure to check out my previous AI focused articles for more information on the build I am using and software packages I am working with.

Fine-Tune Training

Training LLMs is extremely memory intensive, often requiring roughly ten times the memory needed simply to run inference on the model.

Model Parameters    Inference (FP16)    Training (FP16)
4B                  8 GB                80 GB
8B                  16 GB               160 GB
12B                 24 GB               240 GB
30B                 60 GB               600 GB
70B                 140 GB              1400 GB
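The figures in the table follow from a common rule of thumb: FP16 inference needs about 2 bytes per parameter, while full FP16 training with an Adam-style optimizer needs roughly 20 bytes per parameter (weights, gradients, optimizer states, and activation overhead). A minimal sketch of that arithmetic, where the bytes-per-parameter constants are the assumption:

```python
def training_memory_gb(params_billions: float,
                       inference_bytes: int = 2,    # FP16 weights only
                       training_bytes: int = 20):   # weights + grads + optimizer states + overhead
    """Rough estimate of inference and training memory, in GB."""
    params = params_billions * 1e9
    return params * inference_bytes / 1e9, params * training_bytes / 1e9

# Reproduce the table above.
for size in (4, 8, 12, 30, 70):
    inf, train = training_memory_gb(size)
    print(f"{size}B: {inf:.0f} GB inference, {train:.0f} GB training")
```

These are ballpark numbers: techniques like LoRA, gradient checkpointing, or 8-bit optimizers can cut the training figure dramatically.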


When training in the cloud, you’ll use these figures to determine what class of VM you will need to complete the training. As you can imagine, this costs money, and despite how simple the process looks, you will find that fine-tune training is iterative and requires a fair amount of trial and error to get right.

Due to my aversion to spending money on cloud services, I have started exploring options for training models offline and discovering the unique hardware demands that entails.