Abstract: Machine Learning has achieved incredible feats in predicting real-world phenomena; however, accurate predictions are sometimes insufficient and a true understanding is necessary. Lay-offs guided by neural networks have resulted in class-action lawsuits, and opaque lending logic has brought fines to companies deploying contemporary Machine Learning technology. These are only two of many examples where the public and regulators require transparency in decision-making logic, while the most popular Machine Learning tools are completely opaque in this respect. Big Data is valued for its richness and complexity; extracting value from it under the constraint of transparency therefore poses an inherent problem.
Currently, the most common approach to this problem is hypothesis-led science, where novel hypotheses are mostly derived from lower-level principles (e.g. the behaviour of a molecule from the behaviour of its atoms). In the study of complex systems, however, such as financial markets or systems-level biology, it is often impractical to develop hypotheses in this way (e.g. market behaviour from the psychology of market actors). Instead, the hypotheses have to be inspired by the data itself, through Exploratory Data Analysis.
In this talk, I argue for the importance of Exploratory Data Analysis as a tool for gaining knowledge about datasets in a hypothesis-free manner. The method, pioneered by J. Tukey in the 1970s, relies on picturing the datasets and applying human reasoning to the patterns observed. However, the dimensionality of the datasets we work with these days is far larger than it was in the 1970s, and usually too high for human perception to deal with. Manifold Learning is an area of Machine Learning that aims to render datasets in their lower-dimensional intrinsic space, exposing the fundamental principles governing the dataset. Current methods, however, have severe limitations because the assumptions they make about the data are almost never met. I will introduce a unique Topological approach to the problem, which is nearly assumption-free and allows the user to carry out useful Exploratory Data Analysis, resulting in the generation of new knowledge about real-world datasets.
Bio: George is currently Chief Data Scientist at illumr, a company developing solutions for Topological Data Analysis. The mission of illumr is to go beyond black-box machine learning models and instead leverage machine learning to provide human-understandable insights to its clients. George completed a PhD in Computational Neuroscience at the University of Cambridge, characterising the algorithm used by animals to learn in the face of uncertainty.