The Python programming language is increasingly popular. It is a versatile general-purpose language that is accessible to novice programmers, and it is also increasingly used for data science applications. This training introduces modules that are useful in that context.
When you complete this training you will
- be able to use pandas to represent, compute with and query data;
- be able to visualize data with seaborn and holoviews;
- be able to create data visualizations with matplotlib and bokeh;
- be able to parse textual information using regular expressions;
- be able to interact with relational databases using SQLAlchemy;
- be able to extract information from web pages using Beautiful Soup;
- be able to represent and query geographical information using geopandas.
Total duration: 4 hours.
|introduction and motivation|5 min.|
|pandas & seaborn|105 min.|
|coffee break|10 min.|
|text parsing with regular expressions|40 min.|
|querying relational databases|30 min.|
|web scraping|10 min.|
|geographical information with geopandas|30 min.|
|wrap up|10 min.|
Slides are available in the GitHub repository, as well as example code and hands-on material.
This training is for you if you need to use Python for data analysis.
You will need experience programming in Python; this training does not start from scratch. Familiarity with NumPy is not required, but would be beneficial.
If you plan to do Python programming in a Linux or HPC environment, you should be familiar with that environment as well.
- Geert Jan Bex (firstname.lastname@example.org)