Abstract: Data is everywhere, and its prevalence drives decisions in almost every industry. However, anomalies in data can lead to incorrect or out-of-date decisions. Whether you are doing exploratory data analysis and trying to clean your data, monitoring the health of a computer system to make sure things are working properly, or trying to catch fraudulent claims in life insurance, anomaly detection helps surface outliers before they become too much of a problem for decision makers.
This course will examine anomaly detection through the example of fraud, but all of these techniques can be applied to other areas as well. We will start with the importance of feature creation and transformation. We will then cover statistically based approaches to anomaly detection. Finally, we will cover machine learning-based approaches so that learners can tackle anomalies from any angle and industry need.
1. Introduction to Fraud
The Problem of Fraud - How can we analytically define fraud? There are important characteristics of fraud that put the modeling and identification of fraud in better perspective.
Detection and Prevention - The two biggest pieces that any holistic fraud solution should have are detection of previous instances of fraud and prevention of new instances. This section also defines the typical fraud identification process in organizations.
Analytical Solution - Now that we know what fraud is, as well as the organizational structure for dealing with fraud, we need to introduce the analytical approaches that make an organization mature at detecting and preventing fraud.
2. Data Preparation
Feature Engineering - The best way to glean information from data is to develop good features that help detect and identify fraud. We discuss and develop strategies for creating good features for anomaly detection.
RFM Features - Thinking about new features in terms of recency, frequency, and monetary impact helps define important characteristics of fraud. This is where the session gets interactive as participants put on their "fraudster hat" and try to think like a criminal to help develop new features.
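The idea above can be sketched in a few lines of pandas. This is a minimal, hypothetical example (the table, column names, and snapshot date are all assumptions, not course materials) that rolls a transaction log up into one recency/frequency/monetary row per customer:

```python
# Minimal sketch of RFM feature creation, assuming a hypothetical
# transactions table with columns: customer_id, txn_date, amount.
import pandas as pd

txns = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "txn_date": pd.to_datetime([
        "2024-01-05", "2024-02-10", "2024-03-01",
        "2024-01-20", "2024-03-15", "2024-02-28",
    ]),
    "amount": [120.0, 55.5, 300.0, 42.0, 42.0, 9999.0],
})

snapshot = pd.Timestamp("2024-04-01")  # "as of" date for computing recency

rfm = txns.groupby("customer_id").agg(
    recency_days=("txn_date", lambda d: (snapshot - d.max()).days),
    frequency=("txn_date", "count"),
    monetary=("amount", "sum"),
).reset_index()

print(rfm)
```

Each of the three columns can then feed the univariate and multivariate detectors covered later; customer 3's single large transaction, for instance, stands out in the monetary column.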
Categorical Feature Engineering - This section will cover ways to use categorical pieces of information to create even richer features for our anomaly detection.
3. Anomaly Models
Non-statistical Techniques - This section covers Benford's Law and why it was used (and still is) for basic anomaly detection.
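Benford's Law says the leading digit d of many naturally occurring amounts appears with probability log10(1 + 1/d), so digit 1 should lead about 30% of the time. A minimal sketch of the check, with hypothetical amounts standing in for real claims data:

```python
# Minimal sketch of a Benford's Law check: compare the observed
# leading-digit distribution of some hypothetical amounts against
# Benford's expected proportion log10(1 + 1/d).
import math
from collections import Counter

amounts = [123.45, 19.99, 1500.0, 87.2, 342.0, 29.0, 118.0, 9.75,
           201.0, 1750.0, 46.0, 132.0]

def leading_digit(x: float) -> int:
    """Return the first nonzero digit of a positive amount."""
    s = str(abs(x)).lstrip("0.")
    return int(s[0])

observed = Counter(leading_digit(a) for a in amounts)
n = len(amounts)

for d in range(1, 10):
    expected = math.log10(1 + 1 / d)   # Benford proportion for digit d
    actual = observed.get(d, 0) / n
    print(f"digit {d}: expected {expected:.3f}, observed {actual:.3f}")
```

In practice the comparison is run over many records, and a large gap between observed and expected frequencies (often measured with a chi-squared test) is what flags a dataset for review.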
Univariate Analysis - When addressing anomalies one variable at a time, we can use a variety of techniques. This section covers z-scores, robust z-scores, the IQR rule, and the adjusted IQR rule.
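Three of those rules can be sketched with the standard library alone. This hypothetical example plants one outlier and shows a point the course makes in practice: the outlier inflates the mean and standard deviation enough that the classic z-score misses it, while the robust z-score (median/MAD) and the IQR rule both catch it.

```python
# Minimal sketch of three univariate outlier rules on hypothetical data.
import statistics

values = [10, 12, 11, 13, 12, 11, 10, 12, 95]  # 95 is the planted outlier

# Classic z-score: |z| > 3. The outlier inflates the standard deviation,
# so this rule misses it here (a known weakness called masking).
mean = statistics.mean(values)
sd = statistics.stdev(values)
z_flags = [v for v in values if abs((v - mean) / sd) > 3]

# Robust z-score: median and MAD instead of mean and standard deviation.
med = statistics.median(values)
mad = statistics.median(abs(v - med) for v in values)
# 0.6745 rescales the MAD to be comparable to a standard deviation.
robust_flags = [v for v in values if abs(0.6745 * (v - med) / mad) > 3]

# IQR rule: flag points beyond 1.5 * IQR outside the quartiles.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
iqr_flags = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print(z_flags, robust_flags, iqr_flags)
```

The adjusted IQR rule extends the last check by widening or narrowing the fences based on a skewness measure (the medcouple), which this sketch omits.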
Multivariate Analysis - This is where the biggest improvements in anomaly detection have happened over the past decade. We will start with more statistical approaches like Mahalanobis distances (and their robust counterparts) as well as k-Nearest Neighbors (k-NN) and the Local Outlier Factor (LOF). Then we will move into more advanced machine learning approaches to anomaly detection like isolation forests and classifier-adjusted density estimation (CADE).
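Two of those multivariate detectors can be sketched briefly. This hypothetical example (simulated 2-D data with one planted outlier, NumPy and scikit-learn assumed available) computes Mahalanobis distances by hand and fits an isolation forest:

```python
# Minimal sketch of two multivariate detectors on hypothetical 2-D data:
# Mahalanobis distances via NumPy, and an isolation forest via scikit-learn.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(200, 2))
X = np.vstack([X, [[10.0, 10.0]]])   # planted multivariate outlier (row 200)

# Mahalanobis distance of each row from the sample mean, which accounts
# for the correlation structure that univariate rules ignore.
mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mu
mahal = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
print("most distant row:", int(np.argmax(mahal)))

# Isolation forest: anomalies are isolated in fewer random splits;
# predict() returns -1 for points it labels as anomalies.
iso = IsolationForest(contamination=0.01, random_state=0).fit(X)
labels = iso.predict(X)
print("planted point flagged:", labels[200] == -1)
```

The robust Mahalanobis variant mentioned above swaps the sample mean and covariance for robust estimates (e.g., the minimum covariance determinant) so that the outliers themselves cannot distort the distance calculation.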
Wrap-up - Here we will summarize everything we have done to build up our anomaly detection toolkit and preview the next course on more advanced fraud detection models.
Prerequisites: Introductory knowledge of statistics, enough to understand means and standard deviations; an introduction to basic machine learning, to grasp the concepts behind the advanced anomaly detection techniques; and familiarity with either Python or R.
Bio: Bio Coming Soon!