worker-and-atools

Repository for participants of the "worker & atools" training

An important class of scientific workloads is quite easy to parallelize, for example:

parameter exploration, i.e., running software on a data set with many different parameter settings;
running software on many different input files; or
a combination of the two scenarios above.

It is in fact so easy that it is called embarrassingly parallel. Since this kind of workload is so common, we developed two frameworks to support it and to take the bookkeeping burden off your shoulders as much as possible: worker and atools.
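To make "embarrassingly parallel" concrete, here is a minimal Python sketch of a parameter exploration. It does not use worker or atools and says nothing about their interfaces. The function `simulate` and the parameter names `temperature` and `pressure` are hypothetical stand-ins for your own software and settings, and the thread pool only stands in for the independent tasks that such a framework would distribute on a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate(temperature, pressure):
    """Hypothetical stand-in for one run of your software."""
    return temperature * pressure


def explore(temperatures, pressures, max_workers=4):
    """Run simulate() once for every parameter combination.

    Each task depends only on its own parameters, so the tasks can run
    in any order and on any worker without communicating.
    """
    # Every combination of the two parameter lists is one independent task.
    tasks = list(product(temperatures, pressures))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns the results in the same order as the tasks.
        results = list(pool.map(lambda task: simulate(*task), tasks))
    # Bookkeeping: remember which result belongs to which parameters.
    return dict(zip(tasks, results))


if __name__ == "__main__":
    for (temperature, pressure), value in explore([280, 300], [1, 2]).items():
        print(temperature, pressure, value)
```

Because no task needs the result of another, the outcome is the same for any number of workers. Tracking which parameter combination produced which result is the kind of bookkeeping that grows tedious once there are thousands of tasks.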

Learning outcomes

When you complete this training you will be able to

use the worker framework;
use atools;
choose between the two tools depending on the situation and your requirements;
understand weak versus strong parallel scaling;
recognize and avoid potential pitfalls.

Schedule

Total duration: 3 hours.

Subject	Duration
introduction and motivation	15 min.
worker framework	75 min.
coffee break	15 min.
atools	45 min.
use cases & comparison	20 min.
wrap up	10 min.

Training materials

Slides are available in the GitHub repository, as well as example code and job scripts.

Repositories and documentation for the tools covered:

worker: repository, documentation
atools: repository, documentation
parameter-weaver: repository, documentation
datasink: repository, documentation
mem_io: repository, documentation

Video materials

Video recordings of this training are available on YouTube.

Introduction (1 minute)
worker: parameter exploration (11 minutes)
Get example scripts (1 minute)
worker: Bash example (6 minutes)
worker: MapReduce (2.5 minutes)
worker features (17 minutes)
parameter-weaver (5 minutes)
worker: tuning (19 minutes)
atools: parameter exploration (6.5 minutes)
atools: features (7 minutes)
atools: demo (6 minutes)
atools: tuning (2 minutes)
Comparison between worker and atools (2 minutes)
File I/O and performance (4.5 minutes)
Conclusions (1 minute)
Implementation (3.5 minutes)

Target audience

This training is for you if you need to use HPC resources effectively for embarrassingly parallel workloads.

Prerequisites

You will need to be comfortable using Linux and the HPC environment. If necessary, attend the appropriate training sessions on those subjects.

Trainer(s)

Geert Jan Bex (geertjan.bex@uhasselt.be)