Statistics for Data Science

Version 1.0

This course introduces the basics of carrying out a statistical analysis in Python. It covers exploratory data analysis, the construction and interpretation of linear and generalised linear models, and an introduction to Bayesian modelling.

Course objectives

By the end of the course, students will be comfortable implementing and interpreting linear models and generalised linear models in Python, and will be familiar with the concepts of Bayesian modelling.

Learning objectives

  • What is tidy data?
  • What is a variable, value, and observation?
  • Several Python commands to explore the structure of the data;
  • What is the difference between a continuous and categorical variable?
  • What is variation and covariation?
  • Where Exploratory Data Analysis fits within data analysis;
  • How to use plots to explore variation in continuous and categorical variables;
  • How to use plots to explore covariation between two categorical variables, two continuous variables or a categorical and continuous variable.
Model Basics
  • What is a model family and fitted model?
  • What is the difference between a response and an explanatory variable?
Model Construction
  • How to construct a linear model in Python;
  • What are the slope and intercept in a linear model?
  • Picking out key information from the model table;
  • How to extract specific parameters from the model object.
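
As a rough illustration of the slope and intercept objectives above, the following is a minimal sketch of an ordinary least squares fit using only the Python standard library. The data, the variable names and the helper fit_simple_linear_model are made up for illustration and are not taken from the course materials, which may use a dedicated modelling library instead.

```python
from statistics import mean


def fit_simple_linear_model(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: covariation of x and y divided by the variation in x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up illustrative data (hypothetical, not from the course).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]     # explanatory variable
score = [3.0, 5.0, 7.0, 9.0, 11.0]    # response variable

intercept, slope = fit_simple_linear_model(hours, score)

# Residuals: observed response minus the value predicted by the fitted line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(hours, score)]

print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

Here the slope is the expected change in the response for a one-unit increase in the explanatory variable, and the intercept is the expected response when the explanatory variable is zero.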
Assessing Model Fit
  • How to inspect model residuals to assess model fit;
  • How to pick out key information from the table produced by a fitted model;
  • How to use Adjusted R-squared and AIC to compare models;
  • What is probability?
  • What is a random variable?
  • What a probability distribution is and how it differs for continuous vs. discrete random variables;
  • Several common probability distributions used to model variation in the response variable: Binomial, Normal, Poisson and Negative Binomial;
  • How to implement a generalised linear model in Python;
  • What Bayes’s rule is and how it is used in Bayesian statistics;
  • How Bayesian and Frequentist schools of thought differ;
  • How to implement a simple Bayesian linear model in Python.
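
To give a flavour of the Bayes’s rule objective above, the sketch below updates a prior over three candidate success probabilities after observing Binomial data, using only the Python standard library. The candidate values, the prior and the data are all hypothetical and chosen purely for illustration. This is not the Bayesian linear model taught in the course, only the underlying updating rule.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of k successes in n independent trials, each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior(prior, likelihood):
    """Bayes's rule over a discrete set of hypotheses: posterior is prior times likelihood, normalised."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalised.values())  # probability of the data under the prior
    return {h: w / evidence for h, w in unnormalised.items()}


# Hypothetical prior: three candidate success probabilities, equally plausible.
prior = {0.25: 1 / 3, 0.5: 1 / 3, 0.75: 1 / 3}

# Made-up data: 7 successes in 10 trials.
likelihood = {p: binomial_pmf(7, 10, p) for p in prior}

post = posterior(prior, likelihood)
for p, prob in post.items():
    print(f"P(p = {p} | data) = {prob:.3f}")
```

With these made-up numbers, most of the posterior probability moves to the candidate closest to the observed proportion of successes, which is the basic behaviour a Bayesian model builds on.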

Course type

E-learning – Not available

Self-learning – Not available

Face-to-face – Available

Skill level

Participants should be familiar with Python but do not need any prior statistical training.

Course materials

All course materials can be found on the Data Science Campus GitHub page.

Booking

To discuss booking this course for remote delivery, please contact the Data Science Campus.