I have always appreciated the unusual, unexpected, and surprising in science and in data. As the science author Isaac Asimov is famously credited with saying, “The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’ (I found it) but ‘That’s funny!’” This is the primary reason that I encouraged most of the doctoral students I mentored at GMU to work on some variation of Novelty Discovery (or Surprise Discovery) for their Ph.D. dissertations.
“Surprise discovery” is for me a much more positive, exciting phrase than “outlier detection” or “anomaly detection”, and it is much richer in meaning, in algorithms, and in new opportunities. Finding the surprising, unexpected thing in your data is what inspires the exclamation “That’s funny!” that may be signaling a great discovery: about your data’s quality, about your data pipeline’s deficiencies, or about some wholly new scientific concept. As the famous astronomer Vera Rubin said, “Science progresses best when observations force us to alter our preconceptions.”
My two training sessions will look at two different topics from a common perspective that reflects the theme of “novelty” through the study of some uncommon examples. Specifically, some (hopefully, most) of these examples may alter participants’ preconceptions (in a positive way) about their data science applications and the typical machine learning algorithms that they use every day. Each training session will present a series of examples (approximately 10 each) to demonstrate the over-arching idea represented in the title of the corresponding session.
My training sessions will focus on novel approaches and ways of thinking about common machine learning techniques and algorithms that data scientists frequently use. These include Bayes’ theorem, independent component analysis, Markov modeling, recommender engines, K-means clustering, K-nearest neighbors, neural networks, deep learning, TensorFlow, knowledge graphs, and more.
The machine learning cold-start problem is the focus of my first session. It will explore examples of meta-learning and optimization when there is very little initial knowledge about where to start in model hyperparameter space. This is a frequent challenge in data science applications, encountered either when there is very little labeled data to adequately train a supervised learning model, or when our goal is to figure out what the data are telling us (i.e., applying unsupervised learning to explore the data without the added baggage of our preconceptions about what we think the data are revealing). We will review backpropagation and TensorFlow in this same context.
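To make the cold-start idea concrete, here is a minimal sketch (not from the session itself) of perhaps the simplest response to having no prior knowledge of hyperparameter space: random search, where each trial samples uniformly from the search ranges. The objective function, parameter names, and ranges below are all hypothetical toy choices for illustration.

```python
import random

def random_search(objective, space, n_trials=200, seed=42):
    """Random search: a basic cold-start strategy when we have no prior
    knowledge of which region of hyperparameter space is promising."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        # Sample each hyperparameter uniformly from its (low, high) range.
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy "validation loss" surface with its minimum at lr=0.1, reg=1.0.
def toy_loss(p):
    return (p["lr"] - 0.1) ** 2 + (p["reg"] - 1.0) ** 2

space = {"lr": (0.0001, 1.0), "reg": (0.0, 10.0)}
best, loss = random_search(toy_loss, space)
print(best, loss)
```

More sophisticated meta-learning approaches (e.g., Bayesian optimization) replace the uniform sampling with a model of past trials, but they start from the same position of initial ignorance.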
My second training session will examine atypical applications of some typical machine learning algorithms. Examples will include predicting tropical storm intensification with retail market basket analysis, and predicting the impact of solar storms on astronauts in space with customer journey mapping techniques. There will even be examples from Formula 1 racing and the search for a cure for cancer. The most surprising example might be the one where a company achieved a 100,000% ROI on a data analytics investment to reduce customer churn – and they used perhaps the simplest algorithm in the known Universe.
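For readers unfamiliar with market basket analysis, the core computation is just two frequency ratios: support (how often an itemset occurs) and confidence (how often one itemset co-occurs with another). The grocery transactions below are a hypothetical toy dataset, not the session’s storm data; the analogy is that “items” could just as well be co-occurring atmospheric conditions preceding intensification.

```python
# Toy transactions for illustrating support and confidence,
# the two quantities behind association-rule mining.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent): of the transactions
    containing the antecedent, how many also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"milk", "bread"}))       # 3 of 5 transactions
print(confidence({"milk"}, {"bread"}))  # 3 of the 4 milk transactions
```

A rule like {milk} → {bread} with high support and confidence is the retail version of “when these conditions co-occur, this outcome tends to follow” – which is exactly the shape of a storm-intensification question.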
When a novel look at the methods and algorithms that we use every day leads to unexpected and surprising discoveries in our data, that should get us excited for each new day with data.
Note: Kirk will present two training sessions at the ODSC East 2021 Virtual Conference. One will focus on “Solving the Data Scientist’s Cold-Start Problem with Machine Learning Examples” and the other will look at “Atypical Applications of Typical Machine Learning Algorithms.”