Introduction to Natural Language Processing in Python
Natural Language Processing (NLP) is a sub-field of Artificial Intelligence concerned with processing and analysing large amounts of natural language. Applications include search engines (such as Google), text classification (such as spam filters), sentiment analysis (identifying the sentiment expressed about a product), topic modelling (discovering abstract topics in a collection of documents) and machine translation.
Concepts covered include text cleaning, exploring datasets through methods rooted in Corpus Linguistics, and feature engineering techniques that transform textual data into a numerical representation. Key techniques such as word embedding and language modelling are also introduced, with illustrations of how they can be applied to a dataset.
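As a rough illustration of cleaning text and turning it into a numerical representation, the sketch below builds a simple bag-of-words count vector per document using only the Python standard library. The function names `clean` and `bag_of_words` and the toy corpus are assumptions made for this example, not course material, and the course may well use other libraries or other representations.

```python
import re
from collections import Counter


def clean(text):
    """Lowercase the text and keep only runs of the letters a-z as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs):
    """Map each document to a vector of term counts over a shared vocabulary."""
    tokenised = [clean(doc) for doc in docs]
    # The vocabulary is every distinct token in the corpus, in sorted order.
    vocab = sorted({token for doc in tokenised for token in doc})
    vectors = []
    for doc in tokenised:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


# Hypothetical toy corpus, used only to show the shape of the output.
docs = ["The cat sat.", "The dog sat on the cat!"]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['cat', 'dog', 'on', 'sat', 'the']
print(vectors)  # [[1, 0, 0, 1, 1], [1, 1, 1, 1, 2]]
```

Each position in a vector corresponds to one vocabulary term, so documents of different lengths become vectors of the same length that can be compared or clustered.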
Participants should gain competency in the core techniques for handling natural language content, so that they can analyse text to detect patterns and derive insights for developing applications like those mentioned in the course summary.
- Describe the main components of language structure;
- Perform pre-processing (cleaning) operations on text;
- Apply methods from Corpus Linguistics to gain greater insight into a corpus;
- Produce word-clouds, bar charts and other basic visualisations on variables of interest;
- Produce clusters using the k-means algorithm to uncover patterns in a corpus;
- Transform text to vectors using the approaches covered in the course;
- Produce word embeddings for a corpus;
- Calculate the probability of a sentence using a language modelling approach.
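To give a flavour of the last outcome, the sketch below estimates the probability of a sentence with a bigram language model and add-one (Laplace) smoothing, using only the Python standard library. The function names, the `<s>` and `</s>` boundary markers and the two-sentence training corpus are assumptions made for this example; the course itself may use a different modelling approach.

```python
from collections import Counter

START, END = "<s>", "</s>"  # assumed sentence boundary markers


def train_bigram(sentences):
    """Count contexts and bigrams over whitespace-tokenised, lowercased sentences."""
    contexts, bigrams = Counter(), Counter()
    vocab = {END}  # END is a possible next token, START is not
    for sentence in sentences:
        words = sentence.lower().split()
        vocab.update(words)
        tokens = [START] + words + [END]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return contexts, bigrams, len(vocab)


def sentence_probability(sentence, contexts, bigrams, vocab_size):
    """Multiply add-one smoothed bigram probabilities along the sentence."""
    tokens = [START] + sentence.lower().split() + [END]
    probability = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        probability *= (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
    return probability


# Hypothetical toy training corpus.
corpus = ["the cat sat", "the dog sat"]
contexts, bigrams, vocab_size = train_bigram(corpus)

p_seen = sentence_probability("the cat sat", contexts, bigrams, vocab_size)
p_scrambled = sentence_probability("sat cat the", contexts, bigrams, vocab_size)
print(p_seen > p_scrambled)  # True
```

Smoothing keeps unseen word pairs from forcing the whole product to zero, at the cost of shifting some probability mass away from pairs that were actually observed.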
E-learning – Available
Self-learning – Not available
Face-to-face – Not available
Competency in using the Python programming language to perform basic data manipulation is required.
To discuss booking this course for remote delivery, please contact the Data Science Campus.