Deploying Large Language Models locally

Material for a training on AI tools

Large Language Models (LLMs) are a class of machine learning models that have recently gained a lot of attention. These models are trained on large amounts of data and after training can be used in many applications.

Although models from OpenAI and Google can be used as services online, it is often desirable to have a model that can be used offline. This training will show you how to deploy and use such models locally.

Learning outcomes

When you complete this training you will

understand what LLMs are and how they are trained;
be able to use a pre-trained LLM for text generation;
be able to use Retrieval Augmented Generation (RAG) for question answering on your own data;
understand how quantization works and how it can be used to reduce the size of a model;
be able to fine-tune a pre-trained LLM for a specific task.

Schedule

Total duration: 6.5 hours.

Subject	Duration
preparation	20 min.
introduction and motivation	10 min.
neural networks: the basics	15 min.
large language models	90 min.
local deployment	30 min.
simple applications	10 min.
Retrieval Augmented Generation (RAG)	50 min.
quantization	60 min.
fine-tuning models	60 min.
wrap up	10 min.

Training materials

Slides are available in the GitHub repository, as well as example code and hands-on material.

Target audience

This training is for you if you need to deploy Large Language Models (LLMs) on your own infrastructure.

Prerequisites

You will need experience programming in Python. This is not a training that starts from scratch.

Familiarity with Linux or HPC environments is recommended.

Quick self-assessment

If you can do most of the tasks below, you are likely ready for this training.

run Python code in a script or notebook;
install or activate a Python environment and import installed packages;
use the command line to run commands and inspect output;
explain at a high level what a machine-learning model does during inference;
understand the difference between local files and online services;
work with text files or small document collections as input data;
log in to a remote Linux or HPC system if that is where the examples will run;
make a small change to an example command or script and run it again.

If several of these items still feel difficult, the training will probably move too fast. In that case, it is better to first refresh basic Python, command-line use, and the local or remote environment you plan to use.

Software and access requirements

For following along hands-on, you need

a laptop or desktop with internet access;
if you want to use an HPC system: a system set up so you can connect to it, an account on that HPC system (e.g., VSC, CECI, …), and compute credits if these are required to run jobs;
if you want to use your own system: a Python environment that can run Jupyter Lab (note that most of the examples need a GPU to work);
access to Google Colaboratory if you prefer not to install software.
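
If you plan to use your own system, a quick check along the following lines may help confirm that your Python environment is usable before the training. This is only a minimal sketch: the package names in `ASSUMED_PACKAGES` are illustrative assumptions, not the official requirements of the training, so replace them with whatever the repository's installation instructions list.

```python
import importlib.util
import sys

# Hypothetical package list for illustration only; the actual requirements
# are defined by the training materials, not by this sketch.
ASSUMED_PACKAGES = ["jupyterlab", "torch"]


def is_installed(package_name):
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(package_name) is not None


def check_environment(package_names):
    """Map each package name to whether it is available in this environment."""
    return {name: is_installed(name) for name in package_names}


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    for name, available in check_environment(ASSUMED_PACKAGES).items():
        status = "found" if available else "missing"
        print(f"{name}: {status}")
```

Run the script inside the environment you intend to use. A package reported as missing usually means that the environment is not activated or that the package still needs to be installed.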

Level

Introductory: 30 %
Intermediate: 40 %
Advanced: 30 %

Trainer(s)

Geert Jan Bex (geertjan.bex@uhasselt.be)