Editor’s Note: Kirk will present his talk “Adapting Machine Learning Algorithms to Novel Use Cases” at ODSC West 2019.
If there were a metric for success in the data science profession, it would require a multi-dimensional scoring model. This metric would cover a data scientist’s technical skills and talents, analytic literacies and ways of thinking, and soft skills and aptitudes.
Soft skills include a collection of aptitudes that I call the “seven C’s of successful data scientists”: Collaboration (data science as a team sport), Communication (data storytelling), Computational thinking, Critical thinking, Creativity, Curiosity, Continuous lifelong learning, Complex problem-solving, Compassion (design thinking), Consultative (active listening), Community-focused, and Cool under pressure (“tolerance for ambiguity”). Okay, that’s more than seven things, but they represent my perspective on the journey to data science maturity as “sailing on the seven seas.” This idiom comes from an ancient allegory that associated the seven known seas on Earth with the seven known moving objects in the heavens. (As an astronomer, I love that astronomy reference.)
Adopting Analogies for the Profession
The “seven seas” is not only a nice metaphor for the journey; it also connects to another seafaring analogy for data scientists, whose passion for their work is another strong characteristic of the community. That analogy links the passion of seafarers with the passion of data scientists, which is often demonstrated through data hackathons and “data for social good” projects. This passion is expressed in a famous quote: “If you want to build a ship, don’t drum up people to gather wood and don’t assign them tasks and work, but rather teach them to yearn for the vast and endless sea” (Antoine de Saint-Exupéry). Similarly, data scientists yearn to explore vast and endless seas of data.
Adapting Machine Learning Algorithms in Practice
Another key aptitude of successful data science practice blends soft skills (such as creativity) with the technical skills (of mathematics, algorithms, and coding). That is adaptability—adapting common methods and techniques to new applications and novel use cases. Here is an example: Kurtosis.
Measuring the kurtosis (fourth moment) of a statistical distribution may seem like a pedantic exercise for an introductory statistics class, after one learns about the first moment (mean), second moment (variance), and third moment (skewness). However, kurtosis can be used for some unexpected and significant use cases, particularly when employed as a special case of Independent Component Analysis (ICA) for unsupervised learning. ICA is a variant of PCA (Principal Component Analysis) for situations in which the data distribution contains subcomponents that are statistically independent of each other, though generally not orthogonal.
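To make the four moments concrete, here is a minimal sketch that computes them with NumPy. The function name `moments` and the two synthetic samples are illustrative assumptions, not part of the talk. Kurtosis is reported as excess kurtosis, so a Gaussian scores near zero, while a blend of two components with the same mean but different variances produces the fat tails described above.

```python
import numpy as np

def moments(x):
    """Return mean, variance, skewness, and excess kurtosis of a 1-D sample."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()                                # first moment
    centered = x - mean
    variance = np.mean(centered**2)                # second central moment
    std = np.sqrt(variance)
    skewness = np.mean(centered**3) / std**3       # third standardized moment
    # Fourth standardized moment minus 3, so a Gaussian gives roughly 0.
    excess_kurtosis = np.mean(centered**4) / std**4 - 3.0
    return mean, variance, skewness, excess_kurtosis

rng = np.random.default_rng(0)

# Hypothetical data: one pure Gaussian sample ...
gaussian = rng.normal(size=100_000)

# ... and a blend of two components with equal means but different spreads.
mixture = np.concatenate([
    rng.normal(scale=1.0, size=50_000),
    rng.normal(scale=4.0, size=50_000),
])

for name, sample in [("gaussian", gaussian), ("mixture", mixture)]:
    mean, var, skew, kurt = moments(sample)
    print(f"{name}: mean={mean:.3f} var={var:.3f} skew={skew:.3f} kurt={kurt:.3f}")
```

The mixture's mean and skewness stay close to the Gaussian's, so only the fourth moment reveals that more than one component is present.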
ICA is a method for blind source separation, the best-known instance of which is the “cocktail party problem.” In that problem, you try to isolate a specific speech signal out of a superposition of many independent voices. In large data collections, these independent components are unlikely to have the same mean, variance, and skewness. Consequently, a broad (fat tail) statistical data distribution that is identified through high kurtosis may be an indicator of the presence of multiple components in a complex signal. Exploring the data with the goal of discovering the specific cross-cut slice(s) through the data that yield(s) the highest kurtosis will help the creative data scientist to identify and verify the existence of those independent sources (see diagram). A subsequent application of a mixture modeling algorithm can then assist in the separation of those independent and unknown (blind) sources.
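The search for informative slices can be sketched as a brute-force scan over projection directions. This is an illustration under stated assumptions rather than the method from the talk or a full ICA algorithm: the two sources, the mixing matrix, and all variable names are hypothetical, and the data are whitened first so that the independent directions become orthogonal. The scan reports the directions with the highest and the lowest excess kurtosis, because the sign depends on whether a source is heavy-tailed or flat.

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (roughly 0 for a Gaussian)."""
    c = x - x.mean()
    return np.mean(c**4) / np.mean(c**2) ** 2 - 3.0

rng = np.random.default_rng(42)
n = 10_000

# Two hypothetical independent sources: one heavy-tailed, one flat.
sources = np.vstack([
    rng.laplace(size=n),            # positive excess kurtosis
    rng.uniform(-1.0, 1.0, size=n)  # negative excess kurtosis
])

# Arbitrary illustrative mixing matrix: the observer only sees the blend.
mixing = np.array([[1.0, 0.6],
                   [0.4, 1.0]])
observed = mixing @ sources  # shape (2, n)

# Whiten: center, then rescale along the covariance eigenvectors.
centered = observed - observed.mean(axis=1, keepdims=True)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered))
whitened = (eigvecs / np.sqrt(eigvals)).T @ centered

# Scan unit directions over a half circle and score each 1-D projection.
angles = np.linspace(0.0, np.pi, 180, endpoint=False)
directions = np.column_stack([np.cos(angles), np.sin(angles)])
projections = directions @ whitened  # shape (180, n)
kurt = np.array([excess_kurtosis(p) for p in projections])

heavy_tailed = projections[np.argmax(kurt)]  # slice with the highest kurtosis
flat = projections[np.argmin(kurt)]          # slice with the most negative kurtosis

# Compare each recovered slice with the source it should match (up to sign).
corr_heavy = abs(np.corrcoef(heavy_tailed, sources[0])[0, 1])
corr_flat = abs(np.corrcoef(flat, sources[1])[0, 1])
print(f"max kurtosis {kurt.max():.2f}, |corr| with heavy-tailed source {corr_heavy:.3f}")
print(f"min kurtosis {kurt.min():.2f}, |corr| with flat source {corr_flat:.3f}")
```

A grid scan like this only works in two dimensions. In higher dimensions, an iterative ICA routine would replace the loop over angles, but the scoring idea is the same.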
What makes a novel use case?
Novel use cases may be surprising, unexpected, and perhaps even delightful applications of common methods and algorithms. Adapting algorithms to novel use cases includes these examples:
- How a statistical tautology attributed to an 18th-century Presbyterian minister may be used to estimate the mass density function of galaxies across the Universe.
- How a marketing segmentation algorithm could be used to protect Mars-bound astronauts from certain death.
- How a crash in a Formula 1 car race during the 1950s might inspire one of the greatest data science use cases for the Internet of Things.
- How a violation of the triangle inequality theorem in mathematics is being used to seek out and discover a cure for cancer.
There are many similar opportunities for data scientists to demonstrate their curiosity, creativity, and competence in exploring novel applications of algorithms within vast and endless seas of data. Such efforts demonstrate how data scientists can go beyond that which is expected in order to create even more value for their organizations from their data assets and from their algorithmic talents, skills, and aptitudes.
Want to learn more? At ODSC West 2019, Booz Allen Hamilton’s Principal Data Scientist Kirk Borne will present several well-known machine learning algorithms with examples of how they have been adopted for specific use cases or applied in specific business domains. Dr. Borne will then show how each one of these algorithms can be adapted to a novel use case that may be less obvious, perhaps producing significantly surprising results in some other domain, including those use cases mentioned above.