A Complete Real-Time Data Application in 90 Minutes: From Kafka to Streamlit

Abstract: 

Real-time data applications are among the most intimidating projects for junior data scientists and even data engineers. However, the demand for building and managing real-time systems is growing rapidly. For most data scientists and researchers from non-engineering backgrounds, working with real-time data applications is far from trivial. In this workshop, we will build a real-time data application in 90 minutes to give the audience hands-on experience. The application is simple yet complete, and it is designed to be friendly and approachable. To serve a broad audience, the session will mix conceptual content with practical coding.

The workshop will follow the four conventional stages of a data pipeline: ingestion, storage, transformation, and delivery. I will also cover some machine learning DevOps practices to demystify the jargon in the context of this pipeline.

The project starts with the public Twitter API, which streams live data from Twitter feeds. The data will be stored in a document database with a proper schema, as well as in a relational database for persistence. A machine learning model, initially trained offline on batch data, will be retrained iteratively on the incoming data. Kafka or Faust bridges the API, the databases, and the model. Basic concepts of the publish/subscribe (pub/sub) model will be introduced for beginners with no prior knowledge.
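To make the pub/sub idea concrete before the workshop, here is a minimal sketch in plain Python. It is not Kafka or Faust and does not use their APIs: `InMemoryBroker`, `OnlineWordModel`, the `tweets` topic name, and the `text`/`label` message fields are all hypothetical names chosen for illustration. The sketch assumes each incoming message already carries a label, and it shows one publisher feeding two independent subscribers, a store and a model that updates one message at a time.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Message = Dict[str, str]
Handler = Callable[[Message], None]


class InMemoryBroker:
    """Toy stand-in for a broker such as Kafka: topics map to subscriber callbacks."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a callback that receives every message published to the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> int:
        """Deliver the message to each subscriber and return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


class OnlineWordModel:
    """Hypothetical perceptron-style text scorer that learns one message at a time."""

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.weights: Dict[str, float] = {}
        self.learning_rate = learning_rate
        self.messages_seen = 0

    def score(self, text: str) -> float:
        """Sum the learned weights of the words in the text (unknown words count as 0)."""
        return sum(self.weights.get(word, 0.0) for word in text.lower().split())

    def update(self, message: Message) -> None:
        """Adjust word weights only when the current score disagrees with the label."""
        target = 1.0 if message["label"] == "positive" else -1.0
        if self.score(message["text"]) * target <= 0:
            for word in message["text"].lower().split():
                self.weights[word] = self.weights.get(word, 0.0) + self.learning_rate * target
        self.messages_seen += 1


# Wire two independent subscribers to one topic: a store and the model.
broker = InMemoryBroker()
stored_messages: List[Message] = []
model = OnlineWordModel()
broker.subscribe("tweets", stored_messages.append)
broker.subscribe("tweets", model.update)

# The publisher knows only the topic name, not who is listening.
broker.publish("tweets", {"text": "great workshop", "label": "positive"})
broker.publish("tweets", {"text": "boring workshop", "label": "negative"})
```

The publisher never calls the store or the model directly, which is the decoupling that a real broker provides at scale. A production setup would add persistence, partitioning, and delivery guarantees that this toy version omits.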

At the next stage, Streamlit queries the document database to serve users’ requests on the client-facing end. I will cover the life cycle of the request/response scenario and its implementation.
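The request/response life cycle can be sketched as a single function: validate the input, query the store, and shape a response. The example below is an assumption-laden illustration, not the workshop's implementation. `handle_search_request`, the `text` and `created_at` fields, and the in-memory `sample_documents` list (standing in for the document database) are hypothetical. In a Streamlit app, the keyword would typically come from a widget such as `st.text_input` and the response would be rendered with something like `st.write`.

```python
from typing import Dict, List

Document = Dict[str, str]


def handle_search_request(
    documents: List[Document], keyword: str, limit: int = 10
) -> Dict[str, object]:
    """Validate the request, filter the documents, and build a response payload."""
    needle = keyword.strip().lower()
    if not needle:
        # Reject empty input before touching the data store.
        return {"status": "error", "count": 0, "results": []}
    matches = [doc for doc in documents if needle in doc.get("text", "").lower()]
    # ISO 8601 timestamps sort correctly as strings, so the newest comes first.
    matches.sort(key=lambda doc: doc.get("created_at", ""), reverse=True)
    # "count" is the total number of matches; "results" is capped at `limit`.
    return {"status": "ok", "count": len(matches), "results": matches[:limit]}


# Hypothetical sample documents standing in for the document database.
sample_documents: List[Document] = [
    {"text": "Kafka makes streaming approachable", "created_at": "2021-03-01T10:00:00"},
    {"text": "Streamlit dashboards in minutes", "created_at": "2021-03-02T09:30:00"},
    {"text": "Learning kafka consumers", "created_at": "2021-03-03T08:15:00"},
]

response = handle_search_request(sample_documents, "kafka", limit=1)
```

Keeping the handler free of UI code makes it easy to test on its own, with the front end acting only as a thin layer that collects input and displays the returned payload.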

Bio: 

Ron Li is a data science instructor and senior data scientist at Galvanize, Inc. Before that, he worked on machine learning and knowledge graphs at the Information Sciences Institute. Ron is the author of the book Essential Statistics for Non-STEM Data Analysts, which holds a 4.5-star rating. He has also authored or co-authored several academic papers, taught data science to non-STEM professionals as a pro bono service, and given talks at conferences such as PyData.

Open Data Science