Abstract: This is a hands-on short tutorial on how to write your own estimator (transformer, regressor, classifier, or meta-estimator) which can be used in a scikit-learn pipeline and works seamlessly with the library's other meta-estimators. It also covers how such estimators can be conveniently validated with a simple set of tests. In many data science tasks, use-case-specific requirements force us to slightly modify the behavior of some of the estimators present in scikit-learn. Some of the relevant tips and requirements are not well documented by the library, and it can be cumbersome to track down those details. In this short tutorial, we go through an example of writing our own estimator, test it against scikit-learn's common tests, and see how it behaves inside a pipeline and a grid search. There have also been recent developments in the general estimator API which require slight modifications by third-party developers. In this talk we cover these changes, point you to the activities worth watching, and introduce some of the private utilities you can use to improve your experience of developing an estimator.
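As a minimal sketch of the kind of estimator the tutorial builds, a custom transformer only needs to follow the `fit`/`transform` contract to drop into a `Pipeline` and to be checked with scikit-learn's common tests. The `MeanCenterer` name and its behavior here are illustrative examples, not taken from the talk itself:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.utils.estimator_checks import check_estimator
from sklearn.utils.validation import check_array, check_is_fitted


class MeanCenterer(TransformerMixin, BaseEstimator):
    """Illustrative transformer that subtracts the per-feature mean."""

    def fit(self, X, y=None):
        # Validate the input and learn the per-column means.
        X = check_array(X)
        self.mean_ = X.mean(axis=0)
        self.n_features_in_ = X.shape[1]
        return self  # fit must return self for pipeline/meta-estimator use

    def transform(self, X):
        # Raise NotFittedError if fit has not been called yet.
        check_is_fitted(self)
        X = check_array(X)
        if X.shape[1] != self.n_features_in_:
            raise ValueError(
                f"X has {X.shape[1]} features, expected {self.n_features_in_}."
            )
        return X - self.mean_


# Run scikit-learn's common API tests against the estimator; this raises
# if the estimator violates the API contract.
check_estimator(MeanCenterer())

# The estimator now composes like any built-in transformer.
pipe = Pipeline([("center", MeanCenterer()), ("clf", LogisticRegression())])
```

Because the hyperparameter-free `__init__` inherited from `BaseEstimator` already satisfies `get_params`/`set_params`, this class also works inside `GridSearchCV` without extra code.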
Bio: He is a computer scientist / bioinformatician who became a core developer of `scikit-learn` and `fairlearn`, and works as a Machine Learning Engineer at Hugging Face. He is also an organizer of PyData Berlin.
These days he mostly focuses on aspects of machine learning, and on tools that help create more ethical and fair decision-making systems. This interest has led him to work on `fairlearn`, and on the parts of `scikit-learn` that allow tools such as `fairlearn` to interoperate more smoothly with the package. At Hugging Face, his focus is on enabling the communities around these libraries to share their models more easily and be more open about their work.