OpenMP GPU offloading

Training material for the session 'OpenMP GPU offloading'.

Leveraging the compute power of GPU accelerators is becoming increasingly important in scientific computing. Of the many programming models available for GPUs, OpenMP offers a vendor-agnostic approach: the same source code can run on a wide variety of GPU hardware. This training covers GPU programming with OpenMP and how to optimize the performance of offloaded code.

Learning outcomes

When you complete this training you will

Schedule

Total duration: about 4 hours.

| Subject | Duration |
|---|---|
| introduction and motivation | 5 min. |
| GPU hardware/programming model | 80 min. |
| OpenMP worksharing | 25 min. |
| coffee break | 10 min. |
| OpenMP data movements | 30 min. |
| OpenMP kernels | 10 min. |
| examples | 60 min. |
| wrap up | 10 min. |

Training materials

The slide deck is available as a Quarto RevealJS presentation.

Slides, source code, and supporting material are available in the GitHub repository. The repository contains C and Fortran examples with CMake build files.

Target audience

This training is for you if you want to do GPU programming in C, C++, or Fortran and want a vendor-agnostic approach.

Prerequisites

You need programming experience in C, C++, or Fortran and should be familiar with the OpenMP programming model.

If you plan to do GPU programming in a Linux or HPC environment, you should be familiar with that environment as well.

More concretely, participants should already be comfortable with the following:

Familiarity with C-style raw pointers helps a lot for the C and C++ examples. You do not need prior experience with GPU programming, OpenMP target offloading, device memory management, target data regions, or GPU-specific performance tuning. Those are part of the training itself.
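As a preview of the kind of code the training works towards, the following is a minimal sketch of an offloaded loop in C. It is not taken from the training repository; the function name `saxpy_offload` is hypothetical and chosen for illustration only. It shows where raw pointers come in: the `map` clauses describe, in terms of pointer-based array sections, which data is copied to the device and which is copied back.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch (hypothetical name, not from the training repository).
 * Computes y = a * x + y on the default device if one is available;
 * without a device, or without OpenMP enabled, the loop runs on the host. */
void saxpy_offload(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    /* map(to: ...)     copies x to the device before the loop;
     * map(tofrom: ...) copies y to the device and back to the host afterwards. */
#pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

If the compiler is invoked without OpenMP support, the pragma is ignored and the function still produces the same result serially, which makes it easy to compare host and device runs.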

Quick self-assessment

If you can do most of the tasks below without looking up basic language, OpenMP, or shell syntax, you are likely ready for this training.

If several of these items still feel difficult, the training will probably move too fast. In that case, it is better to first refresh your base language and basic OpenMP shared-memory programming.
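As a rough yardstick for "basic OpenMP shared-memory programming", consider the sketch below. It is an illustration only, not an official readiness check, and the function name `dot_product` is an assumption made for this example. If the worksharing directive and the role of the `reduction` clause read naturally to you, the shared-memory background is probably in place.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch (hypothetical name): dot product of two vectors
 * using a host-side OpenMP parallel loop. */
double dot_product(size_t n, const double *x, const double *y)
{
    assert(x != NULL && y != NULL);

    double sum = 0.0;

    /* Each thread accumulates a private partial sum; the reduction clause
     * combines the partial sums into sum when the loop ends. Without the
     * clause, concurrent updates of sum would be a data race. */
#pragma omp parallel for reduction(+: sum)
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

With GCC, OpenMP is enabled with the `-fopenmp` flag; without it the pragma is ignored and the loop runs serially with the same result for this input size.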

Software and access requirements

To follow the hands-on parts, you need a system with GPU hardware and a compiler toolchain that supports OpenMP target offloading for that hardware. The example code is organized as C and Fortran CMake projects.

More concretely, you need:

The C and Fortran example directories contain local README files with notes on setting compiler environment variables such as CC and FC when using the NVIDIA HPC SDK.
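One way to check whether a system and toolchain meet these requirements is a small probe like the sketch below. It is not part of the training repository, and the function names `offload_device_count` and `target_region_ran_on_device` are hypothetical. It relies only on the standard OpenMP runtime routines `omp_get_num_devices` and `omp_is_initial_device`, and falls back to "no device" when the code is compiled without OpenMP.

```c
#include <assert.h>
#if defined(_OPENMP)
#include <omp.h>
#endif

/* Illustrative sketch (hypothetical name): number of non-host devices the
 * OpenMP runtime can see, or 0 if OpenMP (4.0 or later) is not enabled. */
int offload_device_count(void)
{
#if defined(_OPENMP) && _OPENMP >= 201307
    int count = omp_get_num_devices();
    assert(count >= 0); /* the runtime reports a non-negative device count */
    return count;
#else
    return 0;
#endif
}

/* Illustrative sketch (hypothetical name): returns 1 if a target region
 * actually executed on a device, 0 if it fell back to the host. */
int target_region_ran_on_device(void)
{
    int on_host = 1;

#pragma omp target map(tofrom: on_host)
    {
#if defined(_OPENMP) && _OPENMP >= 201307
        on_host = omp_is_initial_device();
#endif
    }
    return !on_host;
}
```

A device count of zero, or a target region that reports running on the host, suggests that either no supported GPU is visible or the compiler was not asked to generate offload code. With the NVIDIA HPC SDK compilers, GPU offloading is typically requested with `-mp=gpu`; the flags actually used for the examples are defined in the repository's build files and README notes.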

Level of the material

For participants who already have programming experience in C, C++, or Fortran, and basic OpenMP experience, the material in this training is approximately

These percentages describe the level of the GPU-offloading and OpenMP topics covered in the training, not the required entry level in the base programming language itself.

Trainer(s)