Unified Data Science Platform for Accelerating Data Insights


At LinkedIn, we use Jupyter notebooks extensively for ad-hoc data analysis, and our data scientists, engineers, and developers spend a lot of time iterating over the query development lifecycle. We have built a hosted notebook platform for our internal users called Darwin (Data Analytics & Relevance Workbench at LinkedIn): a one-stop solution for the complete Jupyter notebook/query lifecycle of query development, query testing, and query productionizing. Query development requires connecting to various data sources such as HDFS, MySQL, and Kafka; utilizing data engines such as Apache Hadoop, Apache Spark, and Trino; and using libraries like TensorFlow to build state-of-the-art machine learning models. Query testing requires viewing results in tabular format, pivoting over them, analyzing them with visualization tools, and collaborating with peers using Git and RB. Query productionizing requires creating a shareable report link for executions, scheduling the query at a set frequency, and publishing the query to another internal app. We will also present the capabilities we have added to our hosted notebook platform to iterate, pivot, and visualize using a custom extension we developed called a workbook, along with features for productionizing notebooks: creating custom schedules, building customized dashboards, and sharing seamlessly with other users.

We will share the advancements made in the Jupyter notebook ecosystem over the past few years that enable any user to be more productive with ad-hoc analysis. These learnings will help data scientists and machine learning engineers easily iterate on their findings and share them with the broader community.

LinkedIn is a data-driven company: every team consumes and produces data that improves the user experience on LinkedIn. Join Swasti Kakker and Manu Ram Pandit to explore the scalable, extensible unified platform LinkedIn is building with JupyterHub, Jupyter Notebook, Docker, Kubernetes, MySQL, Git, and Rest.li, which enables productive data science and improves the development experience.


Manu Ram Pandit is a staff software engineer on the data analytics and infrastructure team at LinkedIn, where he has influenced the design and implementation of hosted notebooks, providing a seamless experience to end users. Manu has built multiple features of the platform, such as sharing and choosing custom Docker environments, and is currently involved in efforts to effectively visualize big data. He works closely with customers, engineers, and product managers to understand and define the requirements and design of the system. He has extensive experience building complex, scalable applications. Previously, he was with Paytm, Amadeus, and Samsung, where he built scalable applications for various domains.

Open Data Science



