Abstract: In this short talk, we'll cover how to run SQL from Jupyter notebooks using JupySQL, a popular open-source tool. The main focus will be a real-life industry use case (predicting customer churn) in which we connect to a data source (we'll use an example dataset, but the same approach works with a database, data warehouse, or data lake) and query the data. Once our data is ready, we'll run exploratory data analysis, plotting the data and gaining insights from it. We'll then be ready to fit a model and evaluate its results. Once we've settled on the model, dataset, and hyperparameters, we'll create a report we can share with the rest of the team and with business stakeholders.
Jupyter is the most common tool today for data science. Its support for working with data interactively and transforming it is matched by no other IDE. We believe that with JupySQL and a few other packages, Jupyter can become even better, overcoming some of the challenges that currently exist in the platform.
This lecture is at a beginner/intermediate level and is intended for data scientists, data engineers, or data analysts who are looking for a better way to query their data stores from Jupyter notebooks.
This will be the layout of the talk:
1. Introduction of the problem, use case, and background story
2. Introduction to JupySQL and the dataset
3. EDA and fitting the model
4. Generating a stakeholder report
5. Questions and answers
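To give a flavor of step 2, in the notebook itself the querying is done with JupySQL's `%sql` magics; as a rough stand-in that runs outside Jupyter, the sketch below issues the same kind of EDA query with the standard-library sqlite3 module against a small, hypothetical churn table (the table name, columns, and values are invented for illustration, not the talk's actual dataset).

```python
import sqlite3

# Hypothetical stand-in for the churn dataset queried in the talk.
# Inside a notebook this would be JupySQL:
#   %load_ext sql
#   %sql sqlite://
#   %sql SELECT ...
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, tenure_months INTEGER, churned INTEGER);
    INSERT INTO customers VALUES
        (1, 3, 1), (2, 24, 0), (3, 1, 1), (4, 36, 0);
""")

# A typical EDA query: churn rate by tenure bucket.
rows = conn.execute("""
    SELECT CASE WHEN tenure_months < 12 THEN 'new' ELSE 'tenured' END AS bucket,
           AVG(churned) AS churn_rate
    FROM customers
    GROUP BY bucket
    ORDER BY bucket
""").fetchall()
print(rows)  # [('new', 1.0), ('tenured', 0.0)]
```

In the talk, the same SQL would be written directly in a notebook cell, with JupySQL handling the connection and returning results ready for plotting.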
Bio: Ido Michael, a seasoned data engineering and data science professional, co-founded Ploomber & JupySQL with the mission of empowering data scientists to build faster and more efficient solutions. Prior to this, he led data engineering and science teams at Amazon Web Services (AWS), where he played an instrumental role in building hundreds of data pipelines across various customer engagements.
A proud alumnus of Columbia University, Ido moved to New York to pursue his Master's degree in Computer Engineering. It was during his time at Columbia that he identified the challenges of working with multiple data sources and Jupyter notebooks for reliable model development. This realization inspired him to focus on building Ploomber, a platform designed to address these issues and streamline the data science workflow.