Abstract: Scikit-learn is a machine learning library in Python that has become a valuable tool for many data science practitioners. This talk will cover some of the more advanced aspects of scikit-learn, such as building complex machine learning pipelines, model evaluation, parameter search, and out-of-core learning. Beyond metrics for model evaluation, we will cover how to evaluate model complexity, how to tune parameters with grid search and randomized parameter search, and the trade-offs between these two approaches. We will also cover out-of-core text feature processing via feature hashing.
Bio: Thomas Fan is a Software Developer at Columbia University's Data Science Institute. He collaborates with the scikit-learn community to develop features, review code, and resolve issues. In his free time, Thomas contributes to skorch, a scikit-learn-compatible neural network library that wraps PyTorch.