An important type of scientific workload is quite easy to parallelize, e.g.,
- parameter exploration, i.e., running software on a data set with many different parameter settings;
- running software on many different input files; or
- a combination of the two scenarios above.
It is in fact so easy that it is called embarrassingly parallel. Since this type of workload is so common, we developed two frameworks to support it and to take the bookkeeping burden off your shoulders as much as possible: worker and atools.
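To make the parameter exploration scenario above concrete, here is a minimal Python sketch that enumerates every combination of a few parameter values and writes them as CSV, one row per independent task. The parameter names (`temperature`, `pressure`) and the helper functions are hypothetical and only serve as illustration; check the worker and atools documentation for the exact input format each tool expects.

```python
import csv
import itertools
import sys


def parameter_grid(params):
    """Yield one dict per combination of the given parameter values."""
    names = list(params)
    for values in itertools.product(*(params[name] for name in names)):
        yield dict(zip(names, values))


def write_grid_csv(params, stream):
    """Write all combinations as CSV with a header row; return the row count."""
    writer = csv.DictWriter(stream, fieldnames=list(params), lineterminator="\n")
    writer.writeheader()
    rows = list(parameter_grid(params))
    writer.writerows(rows)
    return len(rows)


# Hypothetical parameters: 3 temperatures x 2 pressures = 6 independent tasks.
example_params = {"temperature": [280, 300, 320], "pressure": [1.0, 2.0]}
n_tasks = write_grid_csv(example_params, sys.stdout)
```

Each row describes a computation that does not depend on any other row, which is what makes the workload embarrassingly parallel.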
When you complete this training, you will be able to
- use the worker framework;
- use atools;
- choose between the two tools depending on the situation and your requirements;
- understand weak versus strong parallel scaling;
- recognize and avoid potential pitfalls.
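As a rough illustration of the weak versus strong scaling objective, the sketch below uses the standard textbook definitions: strong scaling keeps the total problem size fixed while adding workers, and weak scaling grows the problem size in proportion to the number of workers. The function names and the timings are made up for illustration and are not measurements from the training material.

```python
def strong_scaling(t_one_worker, t_n_workers, n_workers):
    """Fixed total problem size: return (speedup, parallel efficiency)."""
    speedup = t_one_worker / t_n_workers
    efficiency = speedup / n_workers
    return speedup, efficiency


def weak_scaling_efficiency(t_one_worker, t_n_workers):
    """Problem size grows with the worker count: ideal run time stays constant."""
    return t_one_worker / t_n_workers


# Illustrative numbers only: 100 s on 1 worker, 25 s on 8 workers.
speedup, efficiency = strong_scaling(100.0, 25.0, 8)
print(f"strong scaling: speedup {speedup:.1f}, efficiency {efficiency:.2f}")

# Illustrative numbers only: 10 s for 1 unit of work on 1 worker,
# 12.5 s for 8 units of work on 8 workers.
weak = weak_scaling_efficiency(10.0, 12.5)
print(f"weak scaling: efficiency {weak:.2f}")
```

An efficiency well below 1 in either case suggests that overhead, for example task startup or file I/O, is taking up a significant share of the run time.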
Total duration: 3 hours.
|Subject|Duration|
|---|---|
|introduction and motivation|15 min.|
|worker framework|75 min.|
|coffee break|15 min.|
|use cases & comparison|20 min.|
|wrap up|10 min.|
Slides are available in the GitHub repository, as well as example code and job scripts.
Repositories and documentation of the tools covered:
- worker: repository, documentation
- atools: repository, documentation
- parameter-weaver: repository, documentation
- datasink: repository, documentation
- mem_io: repository, documentation
Video recordings of this training are available on YouTube.
- Introduction (1 minute)
- worker: parameter exploration (11 minutes)
- Get example scripts (1 minute)
- worker: Bash example (6 minutes)
- worker: MapReduce (2.5 minutes)
- worker features (17 minutes)
- parameter-weaver (5 minutes)
- worker: tuning (19 minutes)
- atools: parameter exploration (6.5 minutes)
- atools: features (7 minutes)
- atools: demo (6 minutes)
- atools: tuning (2 minutes)
- Comparison between worker and atools (2 minutes)
- File I/O and performance (4.5 minutes)
- Conclusions (1 minute)
- Implementation (3.5 minutes)
This training is for you if you need to use HPC resources effectively for embarrassingly parallel workloads.
You will need to be comfortable using Linux and the HPC environment. If necessary, attend the appropriate training sessions on those subjects.
- Geert Jan Bex (firstname.lastname@example.org)