An important type of scientific workload is quite easy to parallelize, e.g.,
- parameter exploration, i.e., running software on a data set with many different parameter settings;
- running software on many different input files; or
- a combination of the two scenarios above.
It is in fact so easy that it is called embarrassingly parallel. Since this type of workload is so common, we developed frameworks to support it and to take the bookkeeping burden off your shoulders as much as possible: worker and atools.
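To make the scenarios above concrete, here is a minimal sketch of how a parameter exploration breaks down into independent tasks. Each combination of parameter values and input file becomes one row in a table, and no row depends on any other, which is what makes the workload embarrassingly parallel. The parameter names, values, and file names are hypothetical and serve only as illustration. The CSV layout (a header row, then one task per row) is an assumption about a convenient exchange format, not a statement of what worker or atools require.

```python
import csv
import io
import itertools


def parameter_grid(**axes):
    """Return one dict per combination of the given parameter values."""
    names = list(axes)
    return [
        dict(zip(names, values))
        for values in itertools.product(*axes.values())
    ]


def to_csv(tasks):
    """Serialize the tasks as CSV text: a header row, then one row per task."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(tasks[0]), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(tasks)
    return buffer.getvalue()


# Hypothetical parameters and input files, for illustration only.
tasks = parameter_grid(
    temperature=[280, 300, 320],
    pressure=[1.0, 2.0],
    input_file=["a.dat", "b.dat"],
)
csv_text = to_csv(tasks)
print(f"{len(tasks)} independent tasks")
print(csv_text)
```

Three temperatures, two pressures, and two input files yield 3 × 2 × 2 = 12 tasks. Keeping track of which of those tasks have run, failed, or still need to run is the bookkeeping that the frameworks are meant to take over.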
Learning outcomes
When you complete this training you will be able to
- use the worker framework;
- use atools;
- choose between the two tools depending on the situation and your requirements;
- understand weak versus strong parallel scaling;
- recognize and avoid potential pitfalls.
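As a pointer for the scaling outcome listed above, the sketch below uses the standard definitions of parallel efficiency. In strong scaling the total problem size is fixed and workers are added, so the ideal run time on p workers is the single-worker time divided by p. In weak scaling the problem size per worker is fixed, so the ideal run time stays constant as workers are added. The function names and the timings are hypothetical and not taken from the training materials.

```python
def strong_scaling_efficiency(t1, tp, p):
    """Efficiency for a fixed total problem size.

    t1: run time on one worker, tp: run time on p workers.
    Ideal run time on p workers is t1 / p, so efficiency is t1 / (p * tp).
    """
    return t1 / (p * tp)


def weak_scaling_efficiency(t1, tp):
    """Efficiency for a fixed problem size per worker.

    t1: run time for one unit of work on one worker,
    tp: run time for p units of work on p workers.
    Ideal run time stays equal to t1, so efficiency is t1 / tp.
    """
    return t1 / tp


# Hypothetical timings in seconds, for illustration only.
strong = strong_scaling_efficiency(t1=100.0, tp=30.0, p=4)
weak = weak_scaling_efficiency(t1=100.0, tp=110.0)
print(f"strong scaling efficiency: {strong:.2f}")
print(f"weak scaling efficiency:   {weak:.2f}")
```

An efficiency close to 1 means the extra workers are being used well. Values well below 1 suggest that overhead such as task startup or file I/O is eating into the gain.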
Schedule
Total duration: 3 hours.
Subject | Duration |
---|---|
introduction and motivation | 15 min. |
worker framework | 75 min. |
coffee break | 15 min. |
atools | 45 min. |
use cases & comparison | 20 min. |
wrap up | 10 min. |
Training materials
Slides are available in the GitHub repository, as well as example code and job scripts.
Repositories and documentation for the tools covered:
- worker: repository, documentation
- atools: repository, documentation
- parameter-weaver: repository, documentation
- datasink: repository, documentation
- mem_io: repository, documentation
Video materials
Video recordings of this training are available on YouTube.
- Introduction (1 minute)
- worker: parameter exploration (11 minutes)
- Get example scripts (1 minute)
- worker: Bash example (6 minutes)
- worker: MapReduce (2.5 minutes)
- worker features (17 minutes)
- parameter-weaver (5 minutes)
- worker: tuning (19 minutes)
- atools: parameter exploration (6.5 minutes)
- atools: features (7 minutes)
- atools: demo (6 minutes)
- atools: tuning (2 minutes)
- Comparison between worker and atools (2 minutes)
- File I/O and performance (4.5 minutes)
- Conclusions (1 minute)
- Implementation (3.5 minutes)
Target audience
This training is for you if you need to use HPC resources effectively for embarrassingly parallel workloads.
Prerequisites
You will need to be comfortable using Linux and the HPC environment. If necessary, attend the appropriate training sessions on those subjects.
Trainer(s)
- Geert Jan Bex (geertjan.bex@uhasselt.be)