Text Analysis Learning Journey


It is well established that today’s world is awash with data, and much of it is in the form of unstructured text. Textual data pours in from all fronts, from social media outlets to blogs to emails. To tap into the knowledge and insights embedded in the text, analysts, economists and scientists of all descriptions require basic skills in handling textual data. The Natural Language Processing (NLP) series of courses offers a comprehensive introduction to unlocking the value of this ubiquitous but overlooked resource.

This learning journey covers how to preprocess text, parse it to extract linguistic structures of interest, and perform robust exploratory analysis to determine key features of the body of text under examination. The intermediate course imparts skills in building matrices of linguistic features of interest, which drive downstream machine learning applications. Key concepts, such as the language models that inform ‘intelligent’ conversational agents and other applications, are explained. The advanced courses offer stand-alone modules in key specialist areas such as topic modelling, information retrieval and sentiment analysis.
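
As a rough illustration of the preprocessing and feature-matrix steps described above, the following minimal Python sketch lowercases and tokenises two short documents and builds a document-term count matrix. It is not taken from the course materials: the function names `tokenize` and `document_term_matrix` and the sample sentences are hypothetical, and only the Python standard library is assumed.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(documents):
    """Return (vocabulary, matrix), where matrix[i][j] is the count of
    vocabulary[j] in documents[i]."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary is every distinct token across all documents, sorted.
    vocabulary = sorted(set().union(*counts))
    matrix = [[doc_counts[term] for term in vocabulary] for doc_counts in counts]
    return vocabulary, matrix


# Hypothetical sample documents, for illustration only.
docs = ["The cat sat on the mat.", "The dog chased the cat!"]
vocabulary, matrix = document_term_matrix(docs)

print(vocabulary)
for row in matrix:
    print(row)
```

Each row of the matrix describes one document and each column one term, which is the general shape of input that downstream machine learning methods typically expect. In practice a dedicated library would usually replace this hand-rolled version.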

Learning outcomes

Learners should be able to:

  • Demonstrate a thorough understanding of natural language processing (NLP) principles in a computational setting
  • Ingest, clean and pre-process text data using industry-standard open-source tools and libraries
  • Understand and practise standard statistical machine learning approaches including decision trees, regression, classification and clustering
  • Apply statistical machine learning principles to natural language data for a variety of use cases

Pathway detail

This pathway takes learners from an introduction to the R or Python programming frameworks through to intermediate natural language processing, utilising machine learning to derive insight from large volumes of text. After the fundamental principles of working with the programming framework of choice, essential data preparation techniques are covered within the Introduction to NLP courses. Once the preparatory NLP techniques have been embedded, exploratory data analysis and statistical techniques are used to assist in formulating and testing hypotheses about trends in data. The machine learning courses provide a firm basis in identifying patterns in big data, using popular machine learning libraries to identify trends where human perception is limited. The pathway culminates in intermediate NLP, deploying the power of ML with text data and unlocking the potential of the world’s most ubiquitous data type, unstructured text.

Courses in this learning journey

This pathway can be completed using either the R or the Python programming language.

Text Analysis Learning Journey in R

Course name                                         Skill level     Duration
Introduction to R                                   Beginner        2 days
Introduction to Natural Language Processing in R    Intermediate    2 days
Statistics in R                                     Intermediate    16 hours
Introduction to Machine Learning in R               Intermediate    3 days

Text Analysis Learning Journey in Python

Course name                                         Skill level     Duration
Introduction to Python                              Beginner        2 days
Introduction to NLP in Python                       Intermediate    2 days
Statistics in Python                                Intermediate    16 hours
Introduction to Machine Learning in Python          Intermediate    3 days
Natural Language Processing in Python               Intermediate    4 to 6 hours