Abstract: Fraud detection in credit card transactions is a broad and complex field. Over the years, a number of techniques have been proposed, mostly stemming from the anomaly detection branch of data science. That said, most approaches fall into one of two situations, depending on the available dataset:
Situation 1: The dataset has a sufficient number of fraud examples.
Situation 2: The dataset has no (or just a negligible number of) fraud examples.
The first situation is the more standard one. Here we can treat fraud detection as a classic supervised classification problem, and any supervised classification algorithm will do, e.g. random forest or logistic regression.
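The tutorial itself uses KNIME, but the idea behind Situation 1 can be sketched in a few lines of Python. This is a minimal illustration assuming scikit-learn and NumPy are installed and using invented synthetic data (two features, 980 legitimate and 20 fraudulent transactions), not the tutorial's dataset or workflow. Because fraud is rare, the sketch sets `class_weight="balanced"` so the minority class is not ignored during training.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical synthetic data: legitimate transactions around 0, fraud around 3
X_legit = rng.normal(0.0, 1.0, size=(980, 2))
X_fraud = rng.normal(3.0, 1.0, size=(20, 2))
X = np.vstack([X_legit, X_fraud])
y = np.concatenate([np.zeros(980, dtype=int), np.ones(20, dtype=int)])  # 1 = fraud

# Stratified split keeps the (small) share of fraud cases in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Reweight classes so the rare fraud class counts as much as the majority class
clf = LogisticRegression(class_weight="balanced")
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred, digits=3))
```

With real data, precision and recall on the fraud class are usually more informative than overall accuracy, since a model that never predicts fraud would still be about 98% accurate here.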
The second situation is a bit trickier. Here we have no examples of fraudulent transactions, so we need to be more creative. Instead of a classifier, we can turn to outlier or anomaly detection techniques, such as isolation forests or autoencoders, which learn what normal transactions look like and flag the ones that deviate from it.
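For Situation 2, a minimal Python sketch of an isolation forest looks like the following. Again this assumes scikit-learn and NumPy and uses invented synthetic data: the model is fitted only on "legitimate" transactions, then asked to judge three new ones, the last of which is deliberately placed far from the training data. It is an illustration of the technique, not the tutorial's KNIME workflow.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical training data: legitimate transactions only, no fraud labels
X_train = rng.normal(0.0, 1.0, size=(1000, 2))

iso = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
iso.fit(X_train)

# Two typical transactions and one far away from everything seen in training
X_new = np.array([[0.1, -0.2], [0.5, 0.3], [6.0, 6.0]])

labels = iso.predict(X_new)        # +1 = inlier (normal), -1 = outlier (suspicious)
scores = iso.score_samples(X_new)  # lower score = easier to isolate = more anomalous
print(labels, scores)
```

The isolation forest needs no fraud examples at all: points that can be separated from the rest with only a few random splits receive low scores and are treated as candidate fraud.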
In this hands-on tutorial you will learn how to handle both situations, using logistic regression for the first and an isolation forest or an autoencoder for the second.
The tool of choice for this tutorial is the open source KNIME Analytics Platform. After a short introduction to the tool, we will split into two groups, each focusing on one of the two situations.
Please bring your own laptop with KNIME Analytics Platform pre-installed. To install KNIME Analytics Platform, follow the instructions provided in these YouTube videos:
If you would like to get familiar with KNIME Analytics Platform, you can explore the content of our E-learning course (https://www.knime.com/knime-introductory-course).
Bio: Maarit Widmann is a data scientist at KNIME. She started out in quantitative sociology and holds a Bachelor's degree in social sciences. The University of Konstanz made her drop the "social" part when she completed her Master of Science! She now communicates the concepts behind data science in videos and blog articles.