Duration: 1 day
While many machine learning tasks, such as propensity modelling, have become standardised to the point of near automation, detecting anomalies in large, complex datasets remains a fundamental challenge that often requires bespoke, creative solutions. There is, however, a core set of techniques and patterns that can be built upon for anomaly detection problems in domains such as fraud detection, risk identification, and the classification of rare events. Through real-world examples, discussions, and live code demonstrations, this one-day workshop for analytics professionals introduces the most important anomaly detection techniques.
At Course Completion:
This workshop has been designed to equip delegates with the most important anomaly detection techniques and an understanding of how to apply them to build solutions to real-world problems. After completing the workshop, delegates will be able to:
- Frame a wide range of problems as anomaly detection problems and determine the appropriate techniques and patterns to solve them
- Apply appropriate techniques to perform univariate outlier detection
- Select and apply appropriate techniques for detecting anomalies in time series data
- Perform anomaly detection in multivariate data using machine learning techniques
- Design and implement solutions for anomaly detection in datasets of specific formats such as graph data or transactional data
- Evaluate the performance of anomaly detection techniques
Course Outline:
- Statistical techniques for univariate outlier detection (e.g. z-scores)
- Demonstration: Hunting for outliers
- Techniques for detecting anomalies in time series data
- Demonstration: Finding anomalies in time series data
- Multivariate machine learning methods for anomaly detection (clustering, one-class classification methods, and autoencoders)
- Evaluating anomaly detection methods
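To give a flavour of the univariate techniques listed above, here is a minimal sketch of z-score outlier flagging in Python, assuming NumPy is available. It standardises each value by the sample mean and (population) standard deviation and flags values whose absolute z-score exceeds a threshold. The function name `zscore_outliers` and its default threshold of 3 are illustrative choices, not taken from the workshop materials.

```python
import numpy as np


def zscore_outliers(values, threshold=3.0):
    """Return a boolean mask marking values whose |z-score| exceeds threshold.

    Hypothetical helper for illustration only; uses the population
    standard deviation (NumPy's default, ddof=0).
    """
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    std = x.std()
    if std == 0:
        # All values are identical, so nothing can stand out as an outlier
        return np.zeros(x.shape, dtype=bool)
    # Standardise: distance from the mean in units of standard deviation
    z = (x - mean) / std
    return np.abs(z) > threshold


daily_counts = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
print(zscore_outliers(daily_counts, threshold=2.5))
```

In a sample of n values, no point can have an absolute z-score (computed this way) greater than sqrt(n - 1), so the default threshold of 3 cannot flag anything among the 10 values above, even the obvious 50. A single extreme value also inflates the mean and standard deviation it is measured against. This is why the usage line lowers the threshold for such a small sample.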
Prerequisites:
To attend this course, delegates should be familiar with fundamental concepts in data manipulation, descriptive statistics, and machine learning. Specifically, delegates should be comfortable building and evaluating classification models (using techniques such as logistic regression, decision trees, support vector machines, or random forests).
The live code demonstrations during the workshop will use the Python programming language and relevant Python packages (e.g. pandas and scikit-learn). While familiarity with these is not required, it would be useful. A list of specific functionality with which delegates should be familiar, along with suggested online revision materials, will be circulated to delegates before the workshop.