
Heterogeneous programming with Kokkos

Training material for the session 'Heterogeneous programming with Kokkos'

As hardware accelerators grow more diverse and CPUs keep adding cores, it becomes increasingly important to write software that can run in a heterogeneous environment. This training introduces Kokkos, a C++ programming model that lets you write parallel code for CPUs, GPUs, and other accelerators.

Kokkos provides abstractions for parallel execution and data management, and it can target different hardware platforms through backends such as Serial, OpenMP, CUDA, HIP, and SYCL.

Learning outcomes

When you complete this training, you will be able to

Schedule

Total duration: 6 hours, split over two sessions.

Session one

Total duration: 3 hours

Subject                                       Duration
introduction and motivation                   5 min.
GPU hardware and general programming model    85 min.
coffee break                                  10 min.
installation and building                     20 min.
Kokkos views                                  60 min.

Session two

Total duration: 3 hours

Subject                                       Duration
Kokkos execution patterns                     30 min.
Kokkos execution spaces                       30 min.
code examples                                 30 min.
coffee break                                  10 min.
Kokkos nested parallelism                     30 min.
Kokkos scratch pads                           20 min.
Kokkos libraries and kernels                  20 min.
wrap-up and questions                         10 min.

Training materials

The slide deck is available as a Quarto RevealJS presentation.

Slides, source code, environment recipes, and supporting material are available in the GitHub repository.

Target audience

This training is for you if you write C++ software and want to understand how Kokkos can help target CPUs, GPUs, and other accelerators without rewriting the core algorithm for each backend.

It is especially relevant for scientific-computing, high-performance-computing, and technical-software developers who need portable performance across changing hardware platforms.

Prerequisites

You should be familiar with C++. Familiarity with a Linux environment is also required.

More concretely, participants should already be comfortable with the following:

You do not need prior experience with Kokkos itself, execution spaces, views, team policies, device memory, or backend-specific GPU programming such as CUDA or HIP. Those are part of the training itself.

Quick self-assessment

If you can do most of the tasks below without looking up basic C++ or shell syntax, you are likely ready for this training.

If several of these items still feel difficult, the training will probably move too fast. In that case, it is better to first refresh modern C++ basics and, if needed, take a short introduction to parallel programming concepts.

Software and access requirements

To follow the hands-on parts, you need a C++ development environment with CMake and Kokkos available. For CPU-only examples, a modern C++ compiler and a Kokkos build with Serial or OpenMP support are sufficient.

For GPU examples, you need access to appropriate accelerator hardware and a matching Kokkos backend, such as CUDA for NVIDIA GPUs or HIP for AMD GPUs. Some examples in source-code/cuda_required only compile when Kokkos is configured with CUDA support.

The repository contains example CMake projects under source-code/ and environment recipes under environments/. The environment recipes are based on the NVIDIA HPC SDK container and can be used to build a Docker, Apptainer, or Singularity image for the training.

If you use an HPC system, make sure you know how to load compilers, CMake, Kokkos, and GPU runtime modules provided by that system.

Level of the material

For participants who already have basic to intermediate C++ programming experience, the material in this training is approximately

These percentages describe the level of the heterogeneous-programming and Kokkos topics covered in the training, not the required entry level in C++ itself.

Trainer(s)