Introduction to Natural Language Processing in Python

Version 1.0

Natural Language Processing is a sub-field of Artificial Intelligence concerned with processing and analysing large amounts of natural language data. Applications include search engines (such as Google), text classification (spam filters), sentiment analysis (identifying the sentiment expressed about a product), topic modelling (discovering abstract topics in a collection of documents) and machine translation.

The course covers cleaning text, exploring datasets through methods rooted in Corpus Linguistics, and applying feature engineering techniques to transform textual data into a numerical representation. Key techniques such as word embedding and language modelling are also introduced, with illustrations of how they can be applied to a dataset.
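As a rough illustration of what cleaning text and transforming it into a numerical representation can look like, here is a minimal sketch using only the Python standard library. The function names (clean, bag_of_words) and the two sample sentences are hypothetical and are not taken from the course material; a bag-of-words count vector is only one of several possible representations.

```python
import re
from collections import Counter


def clean(text):
    """Lowercase the text and keep only runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs):
    """Return (vocabulary, vectors): one word-count vector per document."""
    tokenised = [clean(doc) for doc in docs]
    # The vocabulary is the sorted set of all tokens seen in any document.
    vocab = sorted({token for doc in tokenised for token in doc})
    vectors = []
    for doc in tokenised:
        counts = Counter(doc)
        # Counter returns 0 for tokens that do not occur in this document.
        vectors.append([counts[token] for token in vocab])
    return vocab, vectors


# Hypothetical toy corpus, for illustration only.
docs = ["The cat sat.", "The dog sat, the dog barked!"]
vocab, vectors = bag_of_words(docs)
print(vocab)
print(vectors)
```

Each document becomes a list of integers whose positions line up with the shared vocabulary, which is the kind of input that clustering or classification methods can work with.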

Course objectives

Participants should gain competency in using core techniques to handle natural language content, analyse it to detect patterns and derive insights, and develop applications like those mentioned in the course summary.

Learning objectives

  • Describe the main components of language structure;
  • Perform pre-processing (cleaning) operations on text;
  • Apply methods from Corpus Linguistics to gain greater insight into a corpus;
  • Produce word-clouds, bar charts and other basic visualisations on variables of interest;
  • Produce clusters using the k-means algorithm to uncover patterns in a corpus;
  • Transform text to vectors using the approaches covered in the course;
  • Produce word embeddings for a corpus;
  • Calculate the probability of a sentence using a language modelling approach.
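To give a flavour of the last objective, the sketch below estimates the probability of a sentence with a bigram language model and add-one (Laplace) smoothing. This is only one possible language modelling approach and is not necessarily the one taught in the course; the names train_bigram and sentence_probability, the boundary markers and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their contexts over whitespace-tokenised sentences."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        # <s> and </s> mark the start and end of each sentence.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))
        # Every token that can be predicted, including the end marker.
        vocab.update(tokens[1:])
    return context_counts, bigram_counts, vocab


def sentence_probability(sentence, model):
    """Multiply add-one smoothed bigram probabilities along the sentence."""
    context_counts, bigram_counts, vocab = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    probability = 1.0
    for previous, word in zip(tokens, tokens[1:]):
        numerator = bigram_counts[(previous, word)] + 1
        denominator = context_counts[previous] + len(vocab)
        probability *= numerator / denominator
    return probability


# Hypothetical toy corpus, for illustration only.
corpus = ["the cat sat", "the dog sat", "the dog barked"]
model = train_bigram(corpus)
p_seen = sentence_probability("the dog sat", model)
p_unseen = sentence_probability("sat the barked", model)
print(p_seen, p_unseen)
```

A word order that matches the training sentences receives a higher probability than a scrambled one, while smoothing keeps unseen word pairs from driving the result to zero.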

Course type

E-learning – Available

Self-learning – Not available

Face-to-face – Not available

Skill level

Competency in using the Python programming language to perform basic data manipulation is required.


To discuss booking this course for remote delivery, please contact the Data Science Campus.