Python in Spark

This course will give you an understanding of PySpark, the Python interface to the distributed processing tool Spark. With it, you will be able to process, query, and manipulate data sets that are too large to handle comfortably on a single machine.

Over two days, the course will cover the why and how of distributed processing, give a strong introduction to the key data structure of PySpark, and teach you how to investigate data, combine it, query it, and run complex transformations on it.

The course is hands-on: you will write a lot of code throughout, trying out each technique as soon as you have learnt it, before finishing with a pair of case studies designed to combine everything covered over the two days.

By the end of the course, you will be able to ingest, investigate, and manipulate very large data sets and draw meaningful conclusions from them. You will also be able to produce simple visualisations of your data, and will have the knowledge needed to handle data efficiently and effectively.


No prior knowledge of PySpark or distributed processing is needed, but experience in Python is essential. An internet-capable laptop is also required.

Please note that you will need to bring your own laptop to this course.

Book now via Eventbrite.