AI in a Minefield: Learning from Poisoned Data


Data poisoning is one of the main threats to AI systems. When malicious actors have even limited control over the data used to train a model, they can try to make the training process fail, prevent it from converging, skew the model, or install so-called ML backdoors: areas where the model makes incorrect decisions, usually areas of interest to the attacker. This threat is especially relevant when security technologies use anomaly detection on top of a normality model built from previously seen traffic. When that traffic originates from unreliable sources, which may be partially controlled by malicious actors, the learning process must be designed under the assumption that data poisoning attempts are very likely to occur.

In this talk, we will present the challenges of learning from dirty data, give an overview of data poisoning attacks on different systems such as spam detection, image classification and rating systems, discuss the problem of learning from web traffic (probably the dirtiest data in the world), and explain different approaches for learning from dirty and poisoned data. We will focus on threshold learning as a mitigation for data poisoning, aiming to reduce the impact of any single data source, and discuss a mundane but crucial aspect of threshold learning: memory complexity. We will present a robust learning scheme optimized to work efficiently on streamed data with bounded memory consumption, and give examples from the web security arena with robust learning of URLs, parameters, character sets, cookies and more.
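To make the idea of threshold learning with bounded memory concrete, here is a minimal sketch under stated assumptions; it is not the speaker's actual scheme. It assumes each observation is a (value, source) pair, where a value could be a URL path, parameter name, character set or cookie name, and a source is something like a client IP. A value is accepted as "normal" only after it has been seen from at least `min_sources` distinct sources, so a single attacker-controlled source cannot teach the model anything by repetition alone. Memory for candidate values is bounded: each pending value stores at most `min_sources` source IDs, and at most `max_pending` values are tracked at once, with least-recently-seen eviction. The class name `RobustValueLearner` and its parameters are hypothetical.

```python
from collections import OrderedDict


class RobustValueLearner:
    """Hypothetical sketch of source-threshold learning on a stream.

    A value is learned only once it has been observed from at least
    `min_sources` distinct sources. Pending state is bounded:
      * each pending value keeps at most `min_sources` source IDs, and
      * at most `max_pending` pending values are tracked; when the cap
        is reached, the least recently seen pending value is evicted.
    The learned set itself is not capped in this sketch.
    """

    def __init__(self, min_sources=3, max_pending=1000):
        self.min_sources = min_sources
        self.max_pending = max_pending
        self.pending = OrderedDict()  # value -> set of distinct source IDs
        self.learned = set()          # values accepted as normal

    def observe(self, value, source):
        if value in self.learned:
            return  # already learned; no per-source state needed any more
        sources = self.pending.get(value)
        if sources is None:
            if len(self.pending) >= self.max_pending:
                self.pending.popitem(last=False)  # evict least recently seen
            sources = set()
            self.pending[value] = sources
        else:
            self.pending.move_to_end(value)  # mark as most recently seen
        sources.add(source)
        if len(sources) >= self.min_sources:
            # Threshold reached: promote and drop the source set.
            del self.pending[value]
            self.learned.add(value)

    def is_normal(self, value):
        return value in self.learned


if __name__ == "__main__":
    learner = RobustValueLearner(min_sources=3, max_pending=100)
    # One source repeating a value 1000 times does not get it learned.
    for _ in range(1000):
        learner.observe("/admin/backdoor", source="10.0.0.66")
    # Three independent sources do.
    for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
        learner.observe("/index.html", source=ip)
    print(learner.is_normal("/index.html"))      # True
    print(learner.is_normal("/admin/backdoor"))  # False
```

Least-recently-seen eviction is only one possible policy and has a trade-off: an attacker who floods many distinct values can push legitimate but rare candidates out of the pending table before they reach the threshold. Evicting the pending value with the fewest distinct sources is an alternative, at the cost of a more expensive eviction step.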


Experienced Data Scientist and Tech Lead in Imperva’s threat research group, where I work on creating machine learning algorithms that help protect our customers against web app and DDoS attacks. Before joining Imperva, I obtained a B.Sc. and M.Sc. in Bioinformatics from Bar Ilan University.

Open Data Science



