Duration: 3 Days

Overview: This course focuses on applying natural language processing tools and techniques to extract insights from large collections of text data. It is designed to guide delegates through the most important topics in NLP and to show how to apply them when building solutions to real-world problems. The course is delivered in Python.

At Course Completion: After attending this course, delegates will be able to:

  • Understand the data analytics solutions that can be built using text data and natural language processing, and the problems those solutions solve
  • Understand the structure of text corpora and the challenges of working with different types of text, including long and short text; formal and informal text; structured, semi-structured and unstructured text; and specialised and general text
  • Perform exploratory analysis of text collections using tokenisation, concordances, n-gram frequency counting, regular expressions, sentiment analysis, and keyword and key-phrase extraction
  • Apply core NLP techniques (including part-of-speech tagging, syntactic parsing, language modelling, chunking, and named entity recognition) to extract features from text documents and text collections for use in machine learning models
  • Generate word embeddings from text (using techniques such as word2vec or GloVe) and use both these and pre-trained word embeddings
  • Build predictive models from text collections using different machine learning techniques (e.g. naïve Bayes models, support vector machines, and deep neural networks)
  • Perform topic modelling on large text collections, and interpret and fine-tune the resulting topic models
  • Utilise external resources such as WordNet, DBpedia, SentiWordNet, or the Unified Medical Language System
  • Generate text using recurrent neural networks for tasks such as machine translation or question answering

Who Should Attend: This course is relevant to people who already work with data analytics and machine learning tools and would like to harness the potential of unstructured text data, as well as structured tabular data, within their organisations. It is ideally suited to people working as data analysts, data scientists, business analysts, statisticians, or in similar roles.

Prerequisites: To attend this course, delegates should be familiar with fundamental concepts in data manipulation, descriptive statistics, and machine learning. Specifically, delegates should be comfortable building and evaluating classification models (using techniques such as logistic regression, decision trees, support vector machines or random forests). In addition, delegates should be able to write code in the Python programming language. In particular, delegates should be comfortable manipulating data with the pandas package and performing machine learning tasks with the scikit-learn package.

Outline: The course runs over three days and will broadly follow the timetable shown below. It is delivered through presentations, real-world examples, discussions, and workshops.

Day     Time        Topic
Day 1   Morning     Using Text for Data Analytics Through NLP
                    Exploratory Analysis of Text Collections
        Afternoon   Fundamentals of NLP
Day 2   Morning     Predictive Modelling with Text
        Afternoon   Building and Using Word Embeddings
Day 3   Morning     Topic Modelling
        Afternoon   Generative Text Models Using Deep Neural Networks