
Abstract: pomegranate is a Python package for probabilistic modeling with a primary emphasis on ease of use and a secondary emphasis on speed. In keeping with the first emphasis, it has a consistent sklearn-like API for training models and making predictions, and a convenient "lego API" that allows complex models to be built out of simple components without needing to think about how the math works. In keeping with the second emphasis, the computationally intensive parts are written in efficient Cython code, and all models support both multithreaded parallelism and out-of-core computation for training on massive datasets. Currently, pomegranate allows you to use basic probability distributions to build general mixture models, naive Bayes classifiers, Markov chains, hidden Markov models, factor graphs, and Bayesian networks. In this talk I will show how to build models of increasing complexity, with code examples and descriptions of the types of phenomena they model well, drawing examples from "popular culture" and inadvertently proving how out of touch I am with today's youth. I will showcase pomegranate's speed and flexibility at each step with comparisons to other well-known packages such as numpy, scipy, and scikit-learn. Finally, I will show the simplicity of training a mixture of hidden Markov models in parallel using pomegranate.
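To give a flavor of the "lego" idea described in the abstract, here is a minimal pure-Python sketch of a mixture model assembled from simple distribution objects, with sklearn-style `fit` and `predict` methods. This is not pomegranate's code or its actual API: the class names `Normal` and `Mixture`, their method signatures, and the toy data are all assumptions made up for illustration, and the fitting loop is plain expectation-maximization with none of the Cython or parallel machinery the abstract mentions.

```python
import math
import random


class Normal:
    """One-dimensional Gaussian component (hypothetical; not pomegranate's class)."""

    def __init__(self, mu, sigma):
        self.mu = mu
        self.sigma = sigma

    def log_probability(self, x):
        z = (x - self.mu) / self.sigma
        return -0.5 * z * z - math.log(self.sigma) - 0.5 * math.log(2 * math.pi)

    def fit(self, X, weights):
        # Weighted maximum-likelihood update of the mean and standard deviation.
        total = sum(weights)
        if total <= 0:
            return
        self.mu = sum(w * x for x, w in zip(X, weights)) / total
        var = sum(w * (x - self.mu) ** 2 for x, w in zip(X, weights)) / total
        self.sigma = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component


class Mixture:
    """Mixture built from any components exposing log_probability and fit (hypothetical)."""

    def __init__(self, components):
        self.components = components
        self.weights = [1.0 / len(components)] * len(components)

    def _responsibilities(self, x):
        # Posterior probability of each component given x, computed in log space.
        logs = [math.log(w) + c.log_probability(x)
                for w, c in zip(self.weights, self.components)]
        top = max(logs)
        probs = [math.exp(value - top) for value in logs]
        norm = sum(probs)
        return [p / norm for p in probs]

    def fit(self, X, n_iter=50):
        for _ in range(n_iter):
            # E-step: soft-assign every point to every component.
            resp = [self._responsibilities(x) for x in X]
            # M-step: refit each component and its mixing weight.
            for k, component in enumerate(self.components):
                column = [r[k] for r in resp]
                component.fit(X, column)
                self.weights[k] = max(sum(column) / len(X), 1e-12)
        return self

    def predict(self, X):
        # Hard-assign each point to its most probable component.
        labels = []
        for x in X:
            r = self._responsibilities(x)
            labels.append(r.index(max(r)))
        return labels


# Toy data: two well-separated clusters, generated here purely for the example.
random.seed(0)
X = [random.gauss(0, 1) for _ in range(200)] + [random.gauss(10, 1) for _ in range(200)]

model = Mixture([Normal(1.0, 2.0), Normal(8.0, 2.0)]).fit(X)
labels = model.predict([0.0, 10.0])
```

Because `Mixture` only relies on each component having `log_probability` and `fit`, a different distribution could be swapped in without touching the mixture code, which is the kind of composition the abstract refers to.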
Bio: Jacob Schreiber is a graduate student at the University of Washington, Seattle, where he studies how to leverage large-scale machine learning (big data) systems to solve problems in genome science. He is also a core developer of scikit-learn.
Speaker: Jacob Schreiber
Title: Core Developer of scikit-learn
Category: east2017 | east2017workshop
