Abstract: The Association of Certified Fraud Examiners (ACFE) consistently estimates that organizations lose approximately 5% of their revenues to fraud. Applied to world GDP estimates, that amounts to roughly $4-5 trillion annually. Fraud is one of the most interesting problems to try to solve. Data science techniques are now at the forefront of the fight against criminals in banking, cybersecurity, and more.
This course outlines the typical fraud framework at an organization and where data science can play a role, and lays out how to build an analytically advanced fraud system. It then covers statistical and machine learning approaches to anomaly detection. Moving beyond anomaly detection, supervised and unsupervised approaches to fraud modeling help an organization combat the ever-present problem of fraud. These approaches can also be used in other industries to help find unique customers or problems that exist.
1. Introduction to Fraud and Data Preparation with Anomaly Detection:
Section-1: The Problem of Fraud - How can we analytically define fraud? Several important characteristics of fraud put the modeling and identification of fraud in better perspective.
Section-2: Detection and Prevention - The two biggest pieces that any holistic fraud solution should have are detection of previous instances of fraud and prevention of new instances. This section also defines the typical fraud identification process in organizations.
Section-3: Analytical Solution - Now that we know what fraud is and how organizations are structured to deal with it, we introduce the analytical approaches that help an organization mature in detecting and preventing fraud.
Section-4: Feature Engineering - The best way to glean information from data is to develop good features that help detect and identify fraud. We discuss and build strategies for creating good features for anomaly detection.
Section-5: Anomaly Detection with Statistical Techniques - This section details how to detect anomalies with classical techniques such as Benford’s Law, z-scores, and Mahalanobis distances.
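As a rough illustration of the three classical techniques named in Section-5, the sketch below uses only the Python standard library. It is not taken from the course materials: the function names are hypothetical, the Benford check is a simple mean absolute deviation rather than a formal test, and the Mahalanobis helper is restricted to two features so the covariance matrix can be inverted by hand.

```python
import math
import statistics
from collections import Counter


def leading_digit(x):
    """First significant digit of a nonzero number."""
    x = abs(x)
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


def benford_deviation(amounts):
    """Mean absolute gap between observed first-digit shares and
    the Benford's Law expectation log10(1 + 1/d) for d = 1..9."""
    digits = [leading_digit(a) for a in amounts if a != 0]
    counts = Counter(digits)
    n = len(digits)
    gaps = [abs(counts[d] / n - math.log10(1 + 1 / d)) for d in range(1, 10)]
    return sum(gaps) / 9


def z_scores(values):
    """Distance of each value from the mean, in sample standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def mahalanobis_2d(points):
    """Mahalanobis distance of each (x, y) point from the sample mean."""
    xs, ys = zip(*points)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = statistics.variance(xs)
    syy = statistics.variance(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (len(points) - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Squared distance using the closed-form inverse of the 2x2 covariance.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances
```

A larger Benford deviation, a z-score far from zero, or a large Mahalanobis distance each flags a record for review; the cutoffs are a judgment call that depends on the data.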
Section-6: Anomaly Detection with Machine Learning Techniques - This is where the biggest improvements in anomaly detection have happened over the past decade. We will start with k-Nearest Neighbors (k-NN) and the Local Outlier Factor (LOF). Then we will move into more advanced machine learning approaches to anomaly detection like isolation forests, classifier-adjusted density estimation (CADE), and one-class support vector machines (SVMs).
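The two starting points in Section-6, k-NN distance scoring and LOF, can be sketched in a few lines of plain Python. This is a simplified illustration and not the course's implementation: the function names are hypothetical, ties in neighbor distance are broken by index rather than kept, and duplicate points are assumed absent (they would cause a division by zero in the LOF density step).

```python
import math


def knn_scores(points, k=3):
    """Anomaly score per point: mean distance to its k nearest neighbors."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def lof_scores(points, k=3):
    """Simplified Local Outlier Factor; values well above 1 suggest outliers."""
    n = len(points)
    neighbors, k_dist = [], []
    for i in range(n):
        ranked = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )[:k]
        neighbors.append([j for _, j in ranked])
        k_dist.append(ranked[-1][0])  # distance to the k-th nearest neighbor

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [
            max(k_dist[j], math.dist(points[i], points[j])) for j in neighbors[i]
        ]
        lrd.append(len(reach) / sum(reach))

    # LOF: average neighbor density relative to the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]
```

The k-NN score measures how isolated a point is in absolute terms, while LOF compares a point's local density with that of its neighbors, which helps when clusters differ in density.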
Section-7: Sampling Concerns - Fraud is (hopefully) a rare event in your data. This makes modeling harder, because models tend to predict that no one will commit fraud. We need to learn how to adjust our data beforehand to better aid the model.
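One common way to adjust rare-event data before modeling is random undersampling of the non-fraud class. The sketch below assumes each record is a dict with a hypothetical 0/1 "fraud" field; the function name, the field name, and the default 50% target rate are illustrative choices, not details from the course.

```python
import random


def undersample(rows, label_key="fraud", target_rate=0.5, seed=0):
    """Keep every fraud row and randomly drop non-fraud rows until
    fraud makes up roughly target_rate of the returned sample."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    fraud = [r for r in rows if r[label_key] == 1]
    legit = [r for r in rows if r[label_key] == 0]
    wanted = round(len(fraud) * (1 - target_rate) / target_rate)
    n_legit = min(len(legit), wanted)
    sample = fraud + rng.sample(legit, n_legit)
    rng.shuffle(sample)
    return sample
```

A model trained on the resampled data sees an inflated fraud rate, so its predicted probabilities generally need to be adjusted back to the original rate before they are used for decisions.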
Background Knowledge:
- Introductory R/Python
- Basic introduction to supervised modeling
- Basic introduction to classification models like logistic regression, decision trees, etc. (this isn't required, but helpful for understanding)
Bio: A Teaching Associate Professor in the Institute for Advanced Analytics, Dr. Aric LaBarr is passionate about helping people solve challenges using their data. There he helps design the innovative program to prepare a modern workforce to wisely communicate and handle a data-driven future at the nation's first Master of Science in Analytics degree program. He teaches courses in predictive modeling, forecasting, simulation, financial analytics, and risk management. Previously, he was Director and Senior Scientist at Elder Research, where he mentored and led a team of data scientists and software engineers. As director of the Raleigh, NC office he worked closely with clients and partners to solve problems in the fields of banking, consumer product goods, healthcare, and government. Dr. LaBarr holds a B.S. in economics, as well as a B.S., M.S., and Ph.D. in statistics — all from NC State University.