Python for data science

Repository for participants of the "Python for data science" training

The Python programming language is increasingly popular. It is a versatile general-purpose language that is accessible to novice programmers, and it is also increasingly used for data science applications. This training introduces modules that are useful in that context.

Learning outcomes

When you complete this training you will

be able to use pandas to represent, compute with and query data;
be able to visualize data with seaborn and holoviews;
be able to create data visualizations with matplotlib and bokeh;
be able to parse textual information using regular expressions;
be able to interact with relational databases using SQLAlchemy;
be able to extract information from web pages using Beautiful Soup;
be able to represent and query geographical information using geopandas.
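
To give a flavor of one of these outcomes, here is a minimal sketch of parsing textual information with regular expressions, using only Python's standard `re` module. The log lines, the pattern, and the function name `parse_finished_jobs` are invented for this illustration and are not taken from the training materials.

```python
import re

# Hypothetical log lines, invented for this illustration.
LOG_LINES = [
    "2024-01-15 10:32:07 INFO job 4211 finished in 12.5 s",
    "2024-01-15 10:35:42 ERROR job 4212 failed",
    "2024-01-15 10:40:03 INFO job 4213 finished in 3.0 s",
]

# Named groups make the extracted fields self-describing.
PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"job (?P<job_id>\d+) finished in (?P<seconds>\d+\.\d+) s"
)


def parse_finished_jobs(lines):
    """Return (job_id, seconds) for each line that reports a finished job."""
    results = []
    for line in lines:
        # fullmatch requires the whole line to fit the pattern,
        # so lines reporting a failure are skipped.
        match = PATTERN.fullmatch(line)
        if match is not None:
            job_id = int(match.group("job_id"))
            seconds = float(match.group("seconds"))
            results.append((job_id, seconds))
    return results


finished = parse_finished_jobs(LOG_LINES)
print(finished)
```

Compiling the pattern once and using named groups keeps the extraction readable. The resulting list of tuples could then be loaded into a pandas data frame for further analysis.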

Schedule

Total duration: 4 hours.

Subject	Duration
introduction and motivation	5 min.
pandas & seaborn or polars & seaborn	105 min.
coffee break	10 min.
text parsing with regular expressions	40 min.
querying relational databases	30 min.
web scraping	10 min.
geographical information with geopandas	30 min.
wrap up	10 min.

Training materials

Slides are available in the GitHub repository, as well as example code and hands-on material.

Target audience

This training is for you if you need to use Python for data analysis.

Prerequisites

You will need experience programming in Python. This is not a training that starts from scratch. Familiarity with numpy is not required, but would be beneficial.

If you plan to do Python programming in a Linux or HPC environment you should be familiar with these as well.

For following along hands-on, you need

a laptop or desktop with internet access;
if you want to use an HPC system: a system set up so you can connect to it, an account on that HPC system (e.g., VSC, CECI, …), and compute credits if those are required to run jobs;
a Python environment that can run Jupyter Lab if you want to use your own system;
access to Google Colaboratory if you prefer not to install software.

Level

Introductory: 30 %
Intermediate: 50 %
Advanced: 20 %

Trainer(s)

Geert Jan Bex (geertjan.bex@uhasselt.be)