Under The Hood: Creating Your Own Spark Datasources

Abstract: 

Apache Spark has become the tool of choice for data engineers and data scientists for data discovery, data munging and pipelining, general ETL and many other kinds of scalable distributed data processing tasks. Two key factors contributing to this success are its consistent, easy-to-use distributed framework, which supports Scala, Java, Python and R, and the ability to connect Spark to a variety of data sources. Some connectors come bundled with Spark, and others are available as part of the ecosystem from other projects and vendors. However, there is often a need to integrate Spark with a source, destination or system for which no connector is available.

This workshop addresses that problem by providing an under-the-hood understanding of static and structured streaming sources in Spark by way of building a rudimentary data source. With the knowledge gained, attendees will also be able to understand, appreciate and make optimal use of existing data connectors and integrations, such as those for Kafka or Hadoop.

Bio: 

Jayesh Thakrar is a senior software engineer at Conversant, where he designed and built systems covering Hadoop, HBase, Cassandra, Flume, Kafka, Hive and OpenTSDB. For the past year he has been working on Spark application development using the big data systems he created. Jayesh is an avid learner who is passionate about big data, and he often speaks at meetups and conferences, sharing his experiences with Apache and other open source projects.

Open Data Science

 

 

 

