Duration: 1 day

An enormous amount of the data that we create in our personal and professional lives is unstructured free text. Too often the potential of this data lies untapped because it is harder to analyse than tabular data. Natural language processing (NLP) is a set of tools and techniques for unlocking insight in text data. Through real-world examples, discussions, and live code demonstrations, this one-day workshop, designed for analytics professionals, introduces the most important NLP techniques.

At Course Completion:

This workshop has been designed to equip delegates with the most important natural language processing techniques, and with an understanding of how to apply them to build solutions to real-world problems. After completing the workshop, delegates will be able to:

  • Understand the data analytics solutions that can be built from text data using natural language processing, and the problems that they solve
  • Understand the structure of text corpora and the challenges of working with different types of text, including long and short text; formal and informal text; structured, semi-structured and unstructured text; and specialised and general text
  • Perform exploratory analysis of text collections using tokenisation, concordances, n-gram frequency counting, regular expressions, sentiment analysis, and keyword and key-phrase extraction
  • Apply core NLP techniques (including part-of-speech tagging, syntactic parsing, language modelling, chunking, and named entity recognition) to extract features from text documents and text collections for use in machine learning models
  • Generate and utilise word embeddings from texts (using techniques such as word2vec or GloVe) and use pre-trained word embeddings
  • Build predictive models from text collections using different machine learning techniques (e.g. naïve Bayes models, support vector machines, and deep neural networks)
  • Perform topic modelling on large text collections, and interpret and fine-tune the resulting topic models
  • Utilise external resources such as WordNet and DBpedia
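
As a flavour of the exploratory techniques listed above, the following minimal Python sketch tokenises a tiny corpus with a regular expression and counts bigram frequencies. The sample sentences and the function names `tokenise` and `ngram_counts` are illustrative assumptions, not workshop material, and the sketch uses only the standard library rather than nltk.

```python
import re
from collections import Counter


def tokenise(text):
    """Lowercase the text and split it into word tokens with a simple regex."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_counts(docs, n=2):
    """Count n-grams across a collection of documents.

    N-grams are built per document, so none spans a document boundary.
    """
    counts = Counter()
    for doc in docs:
        tokens = tokenise(doc)
        # zip over n shifted copies of the token list to form n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Hypothetical mini-corpus, invented for illustration only.
docs = [
    "Text data is hard to analyse.",
    "Text data is everywhere.",
]

bigrams = ngram_counts(docs, n=2)
print(bigrams.most_common(2))
```

In practice a library tokeniser would usually replace the regular expression, which ignores details such as hyphenation and non-ASCII characters.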


  • Using text for data analytics through NLP
  • Exploratory analysis of text collections
  • Fundamentals of NLP
  • Demonstration: Exploring, explaining, and exploiting a large text corpus
  • Predictive modelling for text
  • Workshop: Building a text categorisation model
  • Building and using word embeddings
  • Demonstration: [king – man + woman = queen] exploring word embeddings
  • Topic modelling
  • Demonstration: Finding topics in a large text corpus
  • Future directions in natural language processing
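
The [king – man + woman = queen] demonstration named in the outline rests on simple vector arithmetic followed by a nearest-neighbour search. A minimal sketch of the idea is given below, assuming hand-made two-dimensional toy vectors (real embeddings such as word2vec or GloVe have hundreds of learned dimensions); the vector values and the helper names `cosine` and `analogy` are invented for illustration.

```python
import math

# Toy "embeddings" invented for illustration only.
# The two axes loosely stand for (royalty, maleness).
vectors = {
    "king": [0.9, 0.9],
    "queen": [0.9, 0.1],
    "man": [0.1, 0.9],
    "woman": [0.1, 0.1],
    "apple": [0.0, 0.5],
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, vectors):
    """Return the word whose vector is closest to vec(a) - vec(b) + vec(c).

    The three query words are excluded from the candidates, as is
    conventional for this kind of analogy test.
    """
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("king", "man", "woman", vectors))
```

With these toy vectors the nearest remaining word is "queen"; with real embeddings the result depends on the training corpus and is not guaranteed.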


To attend this course delegates should be familiar with fundamental concepts in data manipulation, descriptive statistics, and machine learning. Specifically, delegates should be comfortable building and evaluating classification models (using techniques such as logistic regression, decision trees, support vector machines or random forests).


The live code demonstrations during the workshop will use the Python programming language and relevant Python packages (e.g. pandas, scikit-learn, and nltk). Familiarity with these is not required, but it would be useful. A list of the specific functionality with which delegates should be familiar, together with suggested online revision materials, will be circulated to delegates before the workshop.


About The Trainers

Prof. John Kelleher
Prof. John Kelleher is a lecturer in the School of Computing at the Dublin Institute of Technology (DIT), and is a founding member of the Applied Intelligence Research Centre at DIT. John is co-author of the textbook Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies, published by MIT Press in 2015. John has conducted research and presented work internationally in the broad areas of Artificial Intelligence, Data Analytics, Natural Language Processing and Computational Linguistics. Some of the specific topics on which John has worked are:
• machine learning
• machine translation
• activity recognition
• grounding language in perception
• reference resolution and generation
• dialog systems and human-robot interaction
• spatial cognition and computational models of spatial semantics