Python-for-HPC

Repository for participants of the "Python for HPC" training

Although vanilla Python is fairly slow and hence not an obvious candidate for HPC, there are several options to significantly increase the efficiency of Python programs.

Learning outcomes

When you complete this training you will

understand and identify performance bottlenecks of Python;
know some libraries that can help improve performance for scientific computing such as numpy, numexpr and numba;
be able to use Cython to improve your code’s performance;
be able to wrap C, C++ and Fortran code to use it from Python;
understand the opportunities and pitfalls of multi-threaded programming with Python;
be able to write distributed applications using MPI;
have an understanding of how frameworks for distributed computing such as dask and pyspark work.
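
As a small illustration of the first outcome, the sketch below uses the standard library's cProfile and pstats modules to find the hot spot in a toy workload. It is not taken from the training material; the function names (sum_of_squares, closed_form, workload, profile_workload) are hypothetical and chosen only for this example.

```python
import cProfile
import io
import pstats


def sum_of_squares(n):
    """Sum i*i for i in range(n) with an explicit Python loop (the slow part)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def closed_form(n):
    """Same sum via the closed-form expression (n-1) * n * (2n-1) / 6."""
    return (n - 1) * n * (2 * n - 1) // 6


def workload(n):
    """Toy workload that mixes an expensive and a cheap function."""
    return sum_of_squares(n), closed_form(n)


def profile_workload(n):
    """Run workload(n) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = workload(n)
    profiler.disable()

    # Collect the report in a string instead of printing it directly.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = profile_workload(200_000)
    print(report)
```

In the printed report, the loop-based sum_of_squares should account for nearly all of the cumulative time, which marks it as the function worth optimizing first.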

Schedule

Total duration: 8 hours.

Subject	Duration
introduction and motivation	5 min.
performance and profiling	25 min.
libraries	10 min.
Cython	90 min.
interfacing with C/C++/Fortran	60 min.
multi-threaded programming	60 min.
MPI	120 min.
dask	30 min.
pyspark	20 min.
wrap up	10 min.

Training materials

Slides are available in the GitHub repository, as well as example code and hands-on material.

Software environment

Instructions on how to create the required software environment are available.

Target audience

This training is for you if you need to use Python for computationally intensive scientific computing.

Prerequisites

You will need experience programming in Python and using numpy, as well as a passing familiarity with C/C++. This is not a training that starts from scratch.

If you plan to do Python programming in a Linux or HPC environment you should be familiar with these as well.

Level

Introductory: 10 %
Intermediate: 30 %
Advanced: 60 %

Trainer(s)

Geert Jan Bex (geertjan.bex@uhasselt.be)