Duration: 1 day

While many machine learning tasks, such as propensity modelling, have become standardised to the point of near automation, detecting anomalies in large, complex datasets remains a fundamental challenge that often requires bespoke, creative solutions. There is, however, a core set of techniques and patterns that can be built upon for anomaly detection problems in domains such as fraud detection, risk identification, and classification of rare events. Through real-world examples, discussions, and live code demonstrations, this one-day workshop, designed for analytics professionals, introduces the most important anomaly detection techniques.

At Course Completion:

This workshop has been designed to equip delegates with the most important anomaly detection techniques, and an understanding of how they should be applied to build solutions to real-world problems. After completing the workshop, delegates will be able to:

  • Frame a wide range of problems as anomaly detection problems and determine the appropriate techniques and patterns to solve them
  • Apply appropriate techniques to perform univariate outlier detection
  • Select and apply appropriate techniques for detecting anomalies in time series data
  • Perform anomaly detection in multivariate data using machine learning techniques
  • Design and implement solutions for anomaly detection in datasets of specific formats such as graph data or transactional data
  • Evaluate the performance of anomaly detection techniques

 

Content:

  • Statistical techniques for univariate outlier detection (e.g. Z-test)
  • Demonstration: Hunting for outliers
  • Techniques for detecting anomalies in time series data
  • Demonstration: Finding anomalies in time series data
  • Multivariate machine learning methods for anomaly detection (clustering, one-class classification methods, and autoencoders)
  • Evaluating anomaly detection methods

Figure: Single-linkage cluster analysis on a Gaussian-distribution-based data set
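
As a flavour of the first content item, univariate outlier detection with a z-score can be sketched in a few lines of Python. This is a minimal illustration and not the workshop's own demonstration code: the function name `zscore_outliers`, the threshold of 3, and the transaction amounts are all assumptions made for this example.

```python
from statistics import mean, stdev


def zscore_outliers(values, threshold=3.0):
    """Return (index, value, z) for points whose |z-score| exceeds threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical: nothing can be flagged
    scored = ((i, v, (v - mu) / sigma) for i, v in enumerate(values))
    return [(i, v, z) for i, v, z in scored if abs(z) > threshold]


# Hypothetical data: transaction amounts with one injected extreme value
amounts = [102.0, 98.5, 101.2, 99.8, 100.4, 97.9, 103.1,
           100.9, 99.2, 101.7, 98.8, 100.1, 250.0]

flagged = zscore_outliers(amounts, threshold=3.0)
for index, value, z in flagged:
    print(f"index={index} value={value} z={z:.2f}")
```

Note that the extreme value itself inflates both the mean and the standard deviation, which is why robust alternatives based on the median are often preferred on small samples.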

Prerequisites:

To attend this course delegates should be familiar with fundamental concepts in data manipulation, descriptive statistics, and machine learning. Specifically, delegates should be comfortable building and evaluating classification models (using techniques such as logistic regression, decision trees, support vector machines or random forests).

Demonstrations:

The live code demonstrations during the workshop will use the Python programming language and relevant Python packages (e.g. pandas, scikit-learn, and nltk). Familiarity with these is not required, but it would be useful. A list of specific functionality with which delegates should be familiar, together with suggested online revision materials, will be circulated to delegates before the workshop.


About The Trainers

Aoife D'Arcy
Aoife has over 14 years’ experience in analytics consultancy. Working with major national and international companies in banking, finance, insurance, gaming and manufacturing, Aoife has developed particular expertise in customer insight analytics, fraud analytics, and risk analytics. Aoife’s passionate belief in the importance of developing in-house analytics talent in organisations underlies the design and delivery of her training courses. Degrees in statistics, computer science, and financial and industrial mathematics add academic rigour to all of Aoife’s work.