
ODSC hosts a fantastic lineup of some of the best and brightest expert speakers and core contributors to data science

Robert F. Dougherty, PhD
Vice President, Digital Health Research, Compass Pathways
Pedro Domingos, PhD
Pedro Domingos is a professor emeritus of computer science and engineering at the University of Washington and the author of The Master Algorithm. He is a winner of the SIGKDD Innovation Award and the IJCAI John McCarthy Award, two of the highest honors in data science and AI. He is a Fellow of the AAAS and AAAI, and has received an NSF CAREER Award, a Sloan Fellowship, a Fulbright Scholarship, an IBM Faculty Award, several best paper awards, and other distinctions. Pedro received an undergraduate degree (1988) and M.S. in Electrical Engineering and Computer Science (1992) from IST, in Lisbon, and an M.S. (1994) and Ph.D. (1997) in Information and Computer Science from the University of California at Irvine. He is the author or co-author of over 200 technical publications in machine learning, data mining, and other areas. He is a member of the editorial board of the Machine Learning journal, co-founder of the International Machine Learning Society, and past associate editor of JAIR. He was program co-chair of KDD-2003 and SRL-2009, and has served on the program committees of AAAI, ICML, IJCAI, KDD, NIPS, SIGMOD, UAI, WWW, and others. He has written for the Wall Street Journal, Spectator, Scientific American, Wired, and others. He helped start the fields of statistical relational AI, data stream mining, adversarial learning, machine learning for information integration, and influence maximization in social networks.
Secrets of Successful AI Projects(Keynote)

Daphne Koller, PhD

Raluca Ada Popa, PhD
Raluca Ada Popa is the Robert E. and Beverly A. Brooks associate professor of computer science at UC Berkeley, working in computer security, systems, and applied cryptography. She is a co-founder and co-director of the RISELab and SkyLab at UC Berkeley, as well as a co-founder of Opaque Systems and PreVeil, two cybersecurity companies. Raluca received her PhD in computer science, as well as her master’s and two BS degrees, in computer science and in mathematics, from MIT. She is the recipient of the 2021 ACM Grace Murray Hopper Award, a Sloan Foundation Fellowship award, the Jay Lepreau Best Paper Award at OSDI 2021, a Distinguished Paper Award at IEEE Euro S&P 2022, the Jim and Donna Gray Excellence in Undergraduate Teaching Award, an NSF Career Award, Technology Review 35 Innovators under 35, a Microsoft Faculty Fellowship, and a George M. Sprowls Award for best MIT CS doctoral thesis.
Confidential Data Computing and Collaboration for Data Scientists(Keynote)

Jeff Clune, PhD
Jeff Clune is an Associate Professor of Computer Science at the University of British Columbia, a Faculty Member at the Vector Institute, and a Senior Research Advisor at DeepMind.
Previously, he was a Research Team Leader at OpenAI. Before that, he was a Senior Research Manager and founding member of Uber AI Labs, which was formed after Uber acquired a startup he was part of. Prior to Uber, he was the Loy and Edith Harris Associate Professor in Computer Science at the University of Wyoming.
He conducts research in three related areas of machine learning (and combinations thereof):
– Deep Learning: Improving our understanding of deep neural networks, harnessing them in novel applications, and advancing deep reinforcement learning
– Evolving Neural Networks: Investigating open questions in evolutionary biology regarding how intelligence evolved and harnessing those discoveries to improve our ability to evolve more complex, intelligent neural networks
– Robotics: Making robots more like animals in being adaptable and resilient
A good way to learn about Jeff’s research is by visiting his Google Scholar page, which lists all of his publications.
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos(Track Keynote)

Dr. Jon Krohn
Jon Krohn is Co-Founder and Chief Data Scientist at the machine learning company Nebula. He authored the book Deep Learning Illustrated, an instant #1 bestseller that was translated into seven languages. He is also the host of SuperDataScience, the data science industry’s most listened-to podcast. Jon is renowned for his compelling lectures, which he offers at leading universities and conferences, as well as via his award-winning YouTube channel. He holds a PhD from Oxford and has been publishing on machine learning in prominent academic journals since 2010.
Deep Learning with PyTorch and TensorFlow(Training)
NLP with GPT-4 and other LLMs: From Training to Deployment with Hugging Face and PyTorch Lightning(Training)

Eve Psalti
Eve Psalti is a tech and business leader with 20+ years of experience, currently the Senior Director at Microsoft’s Azure AI engineering organization, responsible for scaling & commercializing artificial intelligence solutions.
She was previously the Head of Strategic Platforms at Google Cloud where she worked with F500 companies helping them grow their businesses through digital transformation initiatives.
Prior to Google, Eve held business development, sales and marketing leadership positions at Microsoft and startups across the US and Europe leading 200-people teams and $600M businesses.
A native of Greece, she holds a Master’s degree and several technology and business certifications from London Business School and the University of Washington. Eve currently serves on the board of WE Global Studios, a full-stack startup innovation studio supporting female entrepreneurs.
Infuse Generative AI in your Apps Using Azure OpenAI Service(Keynote)

Ted Kwartler
Ted Kwartler is the Field CTO at DataRobot. Ted sets product strategy for explainable and ethical uses of data technology. Ted brings unique insights and experience utilizing data, business acumen and ethics to his current and previous positions at Liberty Mutual Insurance and Amazon. In addition to having 4 DataCamp courses, he teaches graduate courses at the Harvard Extension School and is the author of “Text Mining in Practice with R.” Ted is an advisor to the US Government Bureau of Economic Affairs, sitting on a Congressionally mandated committee called the “Advisory Committee for Data for Evidence Building” advocating for data-driven policies.

Jay Jackson
Jay is a VP of the Artificial Intelligence and Machine Learning organization at Oracle Cloud. He completed a degree in neuroscience and started his career in technology at Oracle, maintaining an idea that these two paths would converge.
Unlocking the Power of Large Language Models: Why Owning Your Own Model is Critical—and Within Reach(Keynote)

Yaron Haviv
Yaron Haviv is a serial entrepreneur who has been applying his deep technological experience in AI, cloud, data and networking to leading startups and enterprises since the late 1990s. As the Co-Founder and CTO of Iguazio, Yaron drives the strategy for the company’s MLOps platform and led the shift towards a production-first approach to data science that caters to real-time AI use cases. He also initiated and built Nuclio, a leading open source serverless framework with over 4,000 GitHub stars, and MLRun, a cutting-edge open source MLOps orchestration framework.
Prior to co-founding Iguazio in 2014, Yaron was the Vice President of Datacenter Solutions at Mellanox (now NVIDIA – NASDAQ: NVDA), where he led technology innovation, software development and solution integrations. He also served as the CTO and Vice President of R&D at Voltaire, a high-performance computing, IO and networking company which floated on the NYSE in 2007 and was later acquired by Mellanox (NASDAQ:MLNX).
Yaron is an active contributor to the CNCF Working Group and was one of the foundation’s first members. He sits on the Data Science Committee of the AI Infrastructure Alliance (AIIA), of which Iguazio is a founding member. He is co-authoring a book on Implementing MLOps in the Enterprise for O’Reilly. Yaron presents at major industry events worldwide and writes tech content for leading publications including TheNewStack, Hackernoon, DZone, Towards Data Science and more.
Implementing Gen AI in Practice(Track Keynote)

Tamilla Triantoro, PhD
Tamilla Triantoro is an Associate Professor of Computer Information Systems at Quinnipiac University and a leader of the Masters Program in Business Analytics. She was previously an Academic Director of Data Analytics at the University of Connecticut. Dr. Triantoro is an author, speaker, researcher, and educator in the fields of artificial intelligence, data analytics, user experience with technology, and the future of work. She received her Ph.D. from the City University of New York where she researched online user behavior. Dr. Triantoro presents her research around the world, attempting to demystify the complexity of today’s digital world and to make it understandable and relevant to business professionals and the general audience.
Graph Viz: Exploring, Analyzing and Visualizing Graphs and Networks with Gephi and ChatGPT(Workshop)
AI + Human: A Powerful Partnership for Success(Women Ignite)

Hagay Lupesko
Hagay Lupesko is the VP of Engineering at MosaicML, where he focuses on making generative AI training and inference efficient, fast, and accessible. Prior to MosaicML, Hagay held AI engineering leadership roles at Meta, AWS, and GE Healthcare. He shipped products across various domains: from 3D medical imaging, through global-scale web systems, and up to deep learning systems that power apps and services used by billions of people worldwide.
Unlocking the Power of Large Language Models: Why Owning Your Own Model is Critical—and Within Reach(Keynote)

Audrey Reznik Guidera
Audrey is a Sr. Principal Software Engineer in the Red Hat Cloud Services – Red Hat OpenShift Data Science team, focusing on helping customers with managed services, AI/ML workloads and next-generation platforms. She holds a degree in Computer Information Systems and has been working in the IT industry for over 20 years in roles ranging from full-stack development to data science. Audrey is passionate about data science and in particular the current opportunities with AI/ML at the Edge and Open Source technologies.
Data Science Software Acceleration at the Edge(AiX Keynote)

Sheamus McGovern
Sheamus McGovern is the founder of ODSC (The Open Data Science Conference). He is also a software architect, data engineer, and AI expert. He started his career in finance by building stock and bond trading systems and risk assessment platforms and has worked for numerous financial institutions and quant hedge funds. Over the last decade, Sheamus has consulted with dozens of companies and startups to build leading-edge data-driven applications in finance, healthcare, eCommerce, and venture capital. He holds degrees from Northeastern University, Boston University, Harvard University, and a CQF in Quantitative Finance.

Stefanie Molin
Stefanie Molin is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of “Hands-On Data Analysis with Pandas,” which is currently in its second edition. She holds a bachelor’s of science degree in operations research from Columbia University’s Fu Foundation School of Engineering and Applied Science, as well as a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.

Aric LaBarr, PhD
A Teaching Associate Professor in the Institute for Advanced Analytics, Dr. Aric LaBarr is passionate about helping people solve challenges using their data. There he helps design the innovative program to prepare a modern workforce to wisely communicate and handle a data-driven future at the nation’s first Master of Science in Analytics degree program. He teaches courses in predictive modeling, forecasting, simulation, financial analytics, and risk management. Previously, he was Director and Senior Scientist at Elder Research, where he mentored and led a team of data scientists and software engineers. As director of the Raleigh, NC office he worked closely with clients and partners to solve problems in the fields of banking, consumer product goods, healthcare, and government. Dr. LaBarr holds a B.S. in economics, as well as a B.S., M.S., and Ph.D. in statistics — all from NC State University.

Joe Dery, PhD
Joe Dery joined Western Governors University’s College of IT as the VP & Dean of Data Analytics in summer 2022. At WGU, Joe is working to help more than 3,000 current analytics students learn how to effect change in their professional roles – surgically balancing a combination of mathematics, data management, programming, and business influence skills. Prior to joining academia full-time, Joe spent much of his corporate career working for EMC – and later, Dell Technologies – where he joined as a “hands-on-keyboard” Data Scientist in 2011. Joe went on to hold leadership positions in Dell’s Sales, Finance, and Supply Chain organizations driving efforts in Data Science, Business Intelligence, Digital Strategy, and Digital Transformation. Across these domains, Joe’s efforts touched a wide variety of business problems, including ML-driven sales quota allocations, sales forecasting & opportunity prioritization, customer cross-sell/whitespace targeting, addressable marketing opportunity sizing, sales territory optimization, supply chain planning optimization, data/analytics literacy training, and self-service BI. Building from his experiences, Joe is often invited to speak on the crucial role of decision intelligence frameworks, change management, and “improv” in bringing analytics solutions to life. Joe holds a Ph.D. in Business Analytics & an M.S. in Marketing Analytics, both from Bentley University.
Unlock the Power of Data Science for Real Change: A Blueprint for Decision Intelligence(Track Keynote)

Cansu Canca, PhD
Cansu joined the Institute of Experiential AI as the ethics lead and a research associate professor. She also has an affiliation with the Department of Philosophy and Religion and the Ethics Institute in the College of Social Sciences and Humanities. She has a doctorate in philosophy specializing in applied ethics.
Cansu is the founder of AI Ethics Lab, one of the first initiatives focusing exclusively on advising practitioners and conducting multidisciplinary research on the ethics of artificial intelligence. She remains the director of the AI Ethics Lab, where she leads teams of computer scientists, philosophers, legal scholars, and other experts in research, the development of toolkits, and consulting.
Cansu developed the Puzzle-solving in Ethics (PiE) Model, a dynamic and collaborative model for integrating ethics into the AI innovation cycle that organizations have implemented through consulting work with the AI Ethics Lab. She brings the PiE Model to EAI along with her industry experience and her background in philosophy.
Cansu serves as an ethics expert in various ethics, advisory, and editorial boards. She is a founding editor for the international peer-reviewed journal AI & Ethics (Springer Nature), serves as an ethics expert for EU-funded research projects focusing on the ethics of AI, robotics, human enhancement, and law enforcement AI technologies, and chairs the Institute of Electrical and Electronics Engineers (IEEE) AI Experts Network Criteria Committee.
The Box: Operationalizing AI Ethics Principles(AiX Track Keynote)

Dan Roth, PhD
Dan Roth is the Eduardo D. Glandt Distinguished Professor at the Department of Computer and Information Science, University of Pennsylvania, a VP/Distinguished Scientist at Amazon AWS, and a Fellow of the AAAS, the ACM, AAAI, and the ACL.
In 2017 Roth was awarded the John McCarthy Award, the highest award the AI community gives to mid-career AI researchers. Roth was recognized “for major conceptual and theoretical advances in the modeling of natural language understanding, machine learning, and reasoning.”
Roth has published broadly in machine learning, natural language processing, knowledge representation and reasoning, and learning theory, and has developed advanced machine learning based tools for natural language applications that are being used widely. Until February 2017 Roth was the Editor-in-Chief of the Journal of Artificial Intelligence Research (JAIR). Roth has been involved in several startups; most recently he was a co-founder and chief scientist of NexLP, a startup that leverages the latest advances in Natural Language Processing (NLP), Cognitive Analytics, and Machine Learning in the legal and compliance domains. NexLP was acquired by Reveal in 2020. Prof. Roth received his B.A. summa cum laude in Mathematics from the Technion, Israel, and his Ph.D. in Computer Science from Harvard University in 1995.

Matt Harrison
Matt Harrison has been using Python since 2000. He runs MetaSnake, a Python and Data Science consultancy and corporate training shop. In the past, he has worked across the domains of search, build management and testing, business intelligence, and storage.
He has presented and taught tutorials at conferences such as Strata, SciPy, SCALE, PyCon, and OSCON as well as local user conferences.
Machine Learning with XGBoost(Workshop)
Idiomatic Pandas(Workshop)

Julia Lintern
Julia Lintern currently works as a Director of Data Science at Gartner. Previously, she worked as a Data Scientist for the New York Times. Julia began her career as a structures engineer designing repairs for damaged aircraft. Julia holds an MA in applied math from Hunter College, where she focused on visualizations of various numerical methods and discovered a deep appreciation for the combination of mathematics and visualizations. During certain seasons of her career, she has also worked on creative side projects such as Lia Lintern, her own fashion label.
Introduction to Machine Learning(Bootcamp)

Liran Hason
Liran Hason is the Co-Founder and CEO of Aporia, a full-stack ML observability platform used by Fortune 500 companies and data science teams across the world to ensure responsible AI. Prior to founding Aporia, Liran was an ML Architect at Adallom (acquired by Microsoft), and later an investor at Vertex Ventures. Liran created Aporia after seeing first-hand the effects of AI without guardrails. In 2022, Forbes named Aporia one of the “Next Billion-Dollar Companies”.

Julien Simon
Julien is currently Chief Evangelist at Hugging Face. He previously spent 6 years at Amazon Web Services, where he was the Global Technical Evangelist for AI & Machine Learning. Prior to joining AWS, Julien served for 10 years as CTO/VP Engineering in large-scale startups.
Hyper-productive NLP with Hugging Face Transformers(Workshop)

Reah Miyara
Reah Miyara is the VP of Product at Aporia. Aporia is the leading ML observability platform empowering organizations to monitor, visualize, and improve their ML models in production. Reah is a 2014 graduate of UC Berkeley, majoring in Electrical Engineering and Computer Science. Reah spent three years at IBM where, as Senior Product Manager/Watson Industry Applications, he was the founding member of the product team that designed & launched an AI-powered legal & compliance assistant. He also led technical strategy for over ten engineers & researchers. Reah then worked at Google AI in Product, Research & Machine Intelligence. At Google, he focused on driving product and technical strategy for tools & research related to large-scale optimization, market economics, graph-based and neural structured machine learning. Reah is certified in Machine Learning (from Stanford University) and in IBM Product Fundamentals (having established the curriculum and led the course). Reah recently relocated to Tel Aviv.
Session Title: Turning Model Insights into Actions with Aporia’s Root Cause Analysis
Abstract:
The process of investigating model performance issues in production environments has long been plagued by complexity, inefficiency, and limited success. In this demo talk, we unveil a new paradigm for streamlining performance diagnostics and performing effective RCA. Our Root Cause Analysis enables you to effortlessly slice and dice production data and identify previously hidden relationships and valuable insights from your raw data. Easily collaborate to find when, where, and why issues originated to expedite response and remediation. Investigate any use case, any production issue, in every model type, and turn deep insights into actionable success.

Thomas J. Fan
Thomas J. Fan is a Staff Software Engineer at Quansight Labs and is a maintainer for scikit-learn, an open-source machine learning library for Python. Previously, Thomas worked at Columbia University to improve interoperability between scikit-learn and AutoML systems. He is a maintainer for skorch, a neural network library that wraps PyTorch. Thomas has a Masters in Mathematics from NYU and a Masters in Physics from Stony Brook University.
Introduction to scikit-learn: Machine Learning in Python (Training)

Daniel Gerlanc
Daniel Gerlanc has worked as a data scientist for more than a decade and has been writing software for nearly 20 years. He frequently teaches live trainings on oreilly.com and is the author of the video course Programming with Data: Python and Pandas. He has coauthored several open source R packages, published in peer-reviewed journals, and is a graduate of Williams College.
Programming with Data: Python and Pandas(Bootcamp)

Jimmy Whitaker
Jimmy Whitaker is HPE’s Chief Scientist of AI & Strategy, specializing in the application of machine learning to diverse industries. With a strong background in Natural Language Processing (NLP) and Speech Recognition, Jimmy focuses on innovative solutions to enhance data-driven decision making and the application of AI at scale. He received his Masters in Computer Science at the University of Oxford, co-authored the textbook Deep Learning for NLP and Speech Recognition (Springer), and was previously Chief Scientist at Pachyderm (acquired by HPE), where he focused on applying data versioning and data-driven pipeline capabilities to ML problems.
Session Outline:
Data-Centric AI: Moving Beyond Model-Centric Approaches with Pachyderm
Abstract:
This technical talk delves into the paradigm shift from model-centric to data-centric AI, emphasizing the importance of data quality in improving machine learning outcomes. We will explore the current AI landscape and discuss the reasons behind this shift. Focusing on the Pachyderm platform for data-driven processing and versioning, attendees will learn practical steps and principles to streamline their data-centric AI efforts. This talk aims to equip practitioners with the knowledge and tools necessary to harness AI’s full potential by embracing a data-driven approach and leveraging Pachyderm’s innovative platform.

Dr. Hongxia Yang, PhD
Dr. Hongxia Yang, who holds a PhD from Duke University, led the team that developed open-source AI platforms and systems such as AliGraph, M6, and Luoxi. Dr. Yang has published nearly 100 papers in top conferences and journals and holds more than 20 patents. She has been awarded the highest prize of the 2019 World Artificial Intelligence Conference, Super AI Leader (SAIL Award), the second prize of the 2020 National Science and Technology Progress Award (China’s top tech award), the first prize of Science and Technology Progress of the Chinese Institute of Electronics in 2021, and the Forbes China Top 50 Women in Science and Technology in 2022. She previously worked as Senior Staff Data Scientist and Director at Alibaba Group, Principal Data Scientist at Yahoo! Inc., Research Staff Member at IBM T.J. Watson Research Center, and joint adjunct professor at Zhejiang University Shanghai Advanced Research Institute.
Towards the Next Generation of Artificial Intelligence with its Applications in Practice(Talk)

Noah Giansiracusa, PhD
Noah Giansiracusa (PhD in math from Brown University) is a tenured associate professor of mathematics and data science at Bentley University, a business school near Boston. His research interests range from algebraic geometry to machine learning to empirical legal studies. After publishing the book How Algorithms Create and Prevent Fake News in July 2021, Noah has gotten more involved in public writing and policy discussions concerning data-driven algorithms and their role in society. He’s written op-eds for Barron’s, Boston Globe, Wired, Slate, and Fast Company and is currently working on a second book, Robin Hood Math: How to Fight Back When the World Treats You Like a Number, with a Foreword by Nobel Prize-winning economist Paul Romer.
Deepfakes: How’re They Made, Detected, and How They Impact Society(Tutorial)

Jacob Andreas, PhD
Jacob Andreas is the X Consortium Assistant Professor at MIT. His research aims to build intelligent systems that can communicate effectively using language and learn from human guidance. Jacob earned his Ph.D. from UC Berkeley, his M.Phil. from Cambridge (where he studied as a Churchill scholar) and his B.S. from Columbia. As a researcher at Microsoft Semantic Machines, he founded the language generation team and helped develop core pieces of the technology that powers conversational interaction in Microsoft Outlook. He has been the recipient of Samsung’s AI Researcher of the Year award, MIT’s Kolokotrones teaching award, and paper awards at NAACL and ICML.
Interpreting Features in Deep Networks(Tutorial)

Bill Franks
Bill Franks is the Director of the Center for Statistics and Analytical Research at Kennesaw State University. He is also Chief Analytics Officer for The International Institute For Analytics (IIA) and serves on several corporate advisory boards. Franks is also the author of the books Winning The Room, Taming The Big Data Tidal Wave, The Analytics Revolution, and 97 Things About Ethics Everyone In Data Science Should Know. He is a sought-after speaker and frequent blogger who has over the years been ranked a top global big data influencer, a top global artificial intelligence and big data influencer, and a top AI influencer, and was an inaugural inductee into the Analytics Hall of Fame. His work, including several years as Chief Analytics Officer for Teradata (NYSE: TDC), has spanned clients in a variety of industries for companies ranging in size from Fortune 100 companies to small non-profit organizations. You can learn more at http://www.bill-franks.com.
Winning The Room: Creating And Delivering An Effective Data-Driven Presentation(Business Talk)

Eric Eager, PhD
Eric Eager is the VP of Research and Development at SumerSports, a football analytics startup founded by Paul Tudor Jones and Jack Jones. Prior to joining Sumer, he held similar roles at Pro Football Focus, and is responsible for many of the insights that have grown the game of American football to this day. Eric holds a PhD in Mathematical Biology from the University of Nebraska, and has taught at Wharton, DataCamp and the University of Wisconsin – La Crosse, publishing over 25 academic papers during his career.
Using Data Science to Better Evaluate American Football Players(Talk)

Irina Rish, PhD
Irina Rish is an Associate Professor in the Computer Science and Operations Research Department at the Université de Montréal (UdeM) and a core faculty member of MILA – Quebec AI Institute. She holds the Canada Excellence Research Chair (CERC) in Autonomous AI and a Canadian Institute for Advanced Research (CIFAR) Canada AI Chair. She received her MSc and PhD in AI from the University of California, Irvine and an MSc in Applied Mathematics from Moscow Gubkin Institute. Dr. Rish’s research focus is on machine learning, neural data analysis and neuroscience-inspired AI. Before joining UdeM and MILA in 2019, Irina was a research scientist at the IBM T.J. Watson Research Center, where she worked on various projects at the intersection of neuroscience and AI, and led the Neuro-AI challenge. She received multiple IBM awards, including the IBM Eminence & Excellence Award and IBM Outstanding Innovation Award in 2018, the IBM Outstanding Technical Achievement Award in 2017, and the IBM Research Accomplishment Award in 2009. Dr. Rish holds 64 patents and has published over 80 research papers in peer-reviewed conferences and journals, several book chapters, three edited books, and a monograph on Sparse Modeling.
Recent Advances in Foundation Models: Scaling Laws, Emergent Behaviors, and AI Democratization(Talk)

Pradeep Ravikumar, PhD
Pradeep Ravikumar is a Professor in the Machine Learning Department, School of Computer Science at Carnegie Mellon University. He was previously an Associate Director at the Center for Big Data Analytics, at the University of Texas at Austin. His thesis has received honorable mentions in the ACM SIGKDD Dissertation award and the CMU School of Computer Science Distinguished Dissertation award. He is a Sloan Fellow, a Siebel Scholar, a recipient of the NSF CAREER Award, and was Program Chair for the International Conference on Artificial Intelligence and Statistics (AISTATS) in 2013. He is Associate Editor-in-Chief for IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), and action editor for the Machine Learning journal, and the Journal of Machine Learning Research.
Dr. Ravikumar’s research group at CMU works on the foundations of statistical machine learning, with recent focus on “next generation” machine learning systems, that are explainable, robust to train and test time corruptions, and resilient to distribution shifts, and are learnt under resource constraints by leveraging or discovering various notions of “structure” and domain knowledge.
Robustness to Adversarial Inputs and Tail Risk via Boosting(Talk)

David P. Woodruff, PhD
David Woodruff is a professor at Carnegie Mellon University in the Computer Science Department. Before that he was a research scientist at the IBM Almaden Research Center, which he joined in 2007 after completing his Ph.D. at MIT in theoretical computer science. His research interests include data stream algorithms, distributed algorithms, machine learning, numerical linear algebra, optimization, sketching, and sparse recovery. He is the recipient of the 2020 Simons Investigator Award, the 2014 Presburger Award, and Best Paper Awards at STOC 2013, PODS 2010, and PODS 2020. At IBM he was a member of the Academy of Technology and a Master Inventor.
Testing Positive Semidefiniteness and Eigenvalue Approximation(Talk)

Jordan Boyd-Graber, PhD
Jordan is an associate professor in the University of Maryland Computer Science Department (tenure home), Institute of Advanced Computer Studies, iSchool, and Language Science Center. Previously, he was an assistant professor at Colorado’s Department of Computer Science (tenure granted in 2017). He was a graduate student at Princeton with David Blei.
His research focuses on making machine learning more useful, more interpretable, and able to learn from and interact with humans. This helps users sift through decades of documents; discover when individuals lie, reframe, or change the topic in a conversation; or compete against humans in games that are based in natural language.
If We Want AI to be Interpretable, We Need to Measure Interpretability(Talk)

Chandra Khatri
Chandra Khatri is the Chief Scientist and Head of AI at Got It AI, where his team is transforming the AI space by leveraging state-of-the-art technologies to deliver the world’s first fully autonomous Conversational AI system. Under his leadership, Got It AI is democratizing Conversational AI and related ecosystems through automation. Prior to Got It AI, Chandra led various applied AI and research groups at Uber, Amazon Alexa and eBay.
At Uber, he was leading Conversational AI, Multi-modal AI, and Recommendation Systems. At Amazon he was the founding member of the Alexa Prize Competition and Alexa AI, wherein he was leading the R&D and got the opportunity to significantly advance the field of Conversational AI, particularly Open-domain Dialog Systems, which is considered as the holy-grail of Conversational AI and is one of the open-ended problems in AI. And at eBay he was driving NLP, Deep Learning, and Recommendation Systems related applied research projects.
He graduated from Georgia Tech with a specialization in Deep Learning in 2015 and holds an undergraduate degree from BITS Pilani, India. His current areas of research include Artificial and General Intelligence, Democratization of AI, Reinforcement Learning, Language and Multi-modal Understanding, and Introducing Common Sense within Artificial Agents.
Truth Checker: Generative Large Language Models and Hallucinations(Talk)

Moran Beladev
Moran is a machine learning manager at Booking.com, researching and developing computer vision and NLP models for the tourism domain. Moran is a Ph.D. candidate in information systems engineering at Ben Gurion University, researching NLP aspects in temporal graphs. Moran previously worked as a Data Science Team Leader at Diagnostic Robotics, building ML solutions for the medical domain and NLP algorithms to extract clinical entities from medical visit summaries.
Leverage Reviews Data for Multi Label Topics Classification in Booking.com(Talk)

Panos Alexopoulos, PhD
Panos Alexopoulos has been working since 2006 at the intersection of data, semantics, and software, building intelligent systems that deliver value to business and society. Born and raised in Athens, Greece, he currently works as Head of Ontology at Textkernel, in Amsterdam, Netherlands, where he leads a team of Data Professionals in developing and delivering a large cross-lingual Knowledge Graph in the HR and Recruitment domain. Panos holds a PhD in Knowledge Engineering and Management from National Technical University of Athens, and has published more than 60 papers at international conferences, journals and books. He is the author of the book “Semantic Modeling for Data – Avoiding Pitfalls and Breaking Dilemmas” (O’Reilly, 2020), and a regular speaker and trainer in both academic and industry venues.

Leonardo De Marchi
Leonardo De Marchi holds a Master’s in Artificial Intelligence and has worked as a Data Scientist in the sports world, with clients such as the New York Knicks. He now works at Thomson Reuters as VP of Labs, and also provides consultancy and training for small and large companies. His previous experience includes being Head of Data Science and Analytics at Bumble, the largest dating site with over 500 million users, heading the team through acquisition and an IPO.

Ali Rossi
Ali Rossi is a Data Science Tech Lead at Foursquare, working closely with their first-party foot traffic panel to deliver insights against a broad range of client business questions. She is passionate about consumer behavioral data, with experience building consumer panels, researching normalization methodologies, and developing methods to derive actionable insights. Previously, she worked in product management at Foursquare, Amazon and Nielsen, mainly focused on building analytics products using consumer-sourced data. She studied chemistry and mathematics at the University of Connecticut and is currently pursuing a Master of Science in computer science at the Georgia Institute of Technology.
Uncovering Behavioral Segments by Applying Unsupervised Learning to Location Data(Talk)

Moez Ali
Innovator, Technologist, and a Data Scientist turned Product Manager with proven track record of building and scaling data products, platforms, and communities. Experienced in building and leading teams of data scientists, data engineers, and product managers. Strongly opinionated tech visionary and a thought partner to C-level leadership.
Moez Ali is an inventor and creator of PyCaret. PyCaret is an open-source, low-code, machine learning software. Ranked in top 1%, 8M+ downloads, 7K+ GitHub stars, 100+ contributors, and 1000+ citations.
Globally recognized personality for open-source work on PyCaret. Keynote speaker and top ten most-read writer in the field of artificial intelligence. Teaching AI and ML courses at Cornell, NY and Queens University, CA. Currently building world’s first hyper-focused Data and ML Platform.
Automate Machine Learning Workflows with PyCaret 3.0(Workshop)

Timo Walther
Timo Walther is a Principal Software Engineer at Confluent and a long-time member of Apache Flink’s management committee. He studied Computer Science at TU Berlin and was part of the Database Group there – the origins of Apache Flink. He worked as a software engineer at DataArtisans and led the SQL team at Ververica. He was a Co-Founder of Immerok, which was acquired by Confluent in 2023. In Flink, he is working on various topics in the Table & SQL ecosystem to make stream processing accessible for everyone.

Tom Shafer, PhD
Tom Shafer works as a Lead Data Scientist at Elder Research, a recognized leader in data science, machine learning, and artificial intelligence consulting since its founding in 1995. As a lead scientist, Tom contributes technically to a wide variety of projects across the company, mentors data scientists, and helps to direct the company’s technical vision. His current interests focus on Bayesian modeling, interpretable ML, and data science workflow. Before joining Elder Research, Tom completed a PhD in Physics at the University of North Carolina, modeling nuclear radioactive decays using high-performance computing.
Beyond Credit Scoring: Interpretable Models for Responsible Machine Learning(Talk)

Meg Kurdziolek, PhD
Meg is currently the Lead UXR for Intrinsic.ai, where she focuses her work on making it easier for engineers to adopt and automate with industrial robotics. She is a “Xoogler”, and prior to Intrinsic worked on the Explainable AI services on Google Cloud. Meg has had a varied career working for start-ups and large corporations alike, and she has published on topics such as user research, information visualization, educational-technology design, voice user interface (VUI) design, explainable AI (XAI), and human-robot interaction (HRI). Meg is also a proud alumnus of Virginia Tech, where she received her Ph.D. in Human-Computer Interaction.

Nikolay Manchev, PhD
Nikolay is an experienced Data Science professional who currently leads the EMEA Data Science team at Domino Data Lab. He holds an MSc in Software Technologies, an MSc in Data Science, and is currently undertaking postgraduate research at King’s College London. His area of expertise is Statistics, Mathematics, and Data Science in general, and his research interests are in Neural Networks with emphasis on biological plausibility. He writes articles and blogs regularly and speaks at various European conferences (ODSC, Big Data Spain, Strata, Big Data London etc.) to build awareness about data science and artificial intelligence. He is also the organizer of the London Data Science and Machine Learning meetup and recipient of several technical mastery awards like the Oracle ACE Award and the IBM Outstanding Technical Achievement Award.

Jesse Johnson
Jesse Johnson is Vice President of Data Science and Data Engineering at Dewpoint Therapeutics, a drug development Biotech startup founded in 2019 around a scientific field called biomolecular condensates. In this role, Jesse’s diverse set of experiences from academic math departments, engineering teams at Google, and data science teams at large, medium and small life science companies provide a unique perspective on the ways that data and wet lab teams communicate differently, or sometimes don’t communicate at all.
Development Principles for Biotech Data Teams(Business Talk)

Iryna Gurevych, PhD
Iryna Gurevych (PhD 2003, U. Duisburg-Essen, Germany) is professor of Computer Science and director of the Ubiquitous Knowledge Processing (UKP) Lab at the Technical University (TU) of Darmstadt in Germany. Her main research interests are in machine learning for large-scale language understanding and text semantics. Iryna’s work has received numerous awards, including the ACL Fellow award in 2020 and the first Hessian LOEWE Distinguished Chair award (2.5 million euros) in 2021. Iryna is co-director of the NLP program within ELLIS, a European network of excellence in machine learning. She is currently the president of the Association for Computational Linguistics. In 2022, she received an ERC Advanced Grant to support her vision for the next big step in NLP, “InterText – Modeling Text as a Living Object in a Cross-Document Context”.
SQuARE: Towards Multi-Domain and Few-Shot Collaborating Question Answering Agents(Talk)

Haritz Puerto
Haritz Puerto is a Ph.D. candidate in Machine Learning & Natural Language Processing at UKP Lab in TU Darmstadt, supervised by Prof. Iryna Gurevych. His main research interests are reasoning for Question Answering and Graph Neural Networks. Previously, he worked at the Coleridge Initiative, where he co-organized the Kaggle Competition Show US the Data. He got his master’s degree from the School of Computing at KAIST, where he was a research assistant at IR&NLP Lab and was advised by Prof. Sung-Hyon Myaeng.
SQuARE: Towards Multi-Domain and Few-Shot Collaborating Question Answering Agents(Talk)

Daniel Whitenack, PhD
Daniel Whitenack (aka Data Dan) is a Ph.D. trained data scientist working with SIL International on NLP and speech technology for local languages in emerging markets. He has more than ten years of experience developing and deploying machine learning systems at scale. Daniel co-hosts the Practical AI podcast, has spoken at conferences around the world (Applied Machine Learning Days, O’Reilly AI, QCon AI, GopherCon, KubeCon, and more), and occasionally teaches data science/analytics at Purdue University.
Modern NLP: Pre-training, Fine-tuning, Prompt Engineering, and Human Feedback(Workshop)

Dean Pleban
Dean has a background combining physics and computer science. He’s worked on quantum optics and communication, computer vision, software development, and design. He’s currently CEO at DagsHub, where he builds products that enable data scientists to work together and get their models to production, using popular open-source tools.
He’s also the host of the MLOps Podcast, where he speaks with industry experts about ML in production.

David Talby, PhD
David Talby is the Chief Technology Officer at John Snow Labs, helping companies apply artificial intelligence to solve real-world problems in healthcare and life science. David is the creator of Spark NLP – the world’s most widely used natural language processing library in the enterprise.
He has extensive experience building and running web-scale software platforms and teams – in startups, for Microsoft’s Bing in the US and Europe, and to scale Amazon’s financial systems in Seattle and the UK.
David holds a Ph.D. in Computer Science and Master’s degrees in both Computer Science and Business Administration. He was named USA CTO of the Year by the Global 100 Awards and GameChangers Awards in 2022.

Freddy Boulton
Freddy Boulton started his career as a data scientist for Nielsen where he built predictive models of television viewing behavior to make television ratings more accurate. This gave him a first-hand view of one of the biggest challenges faced by industry data scientists – being able to easily communicate and share machine learning models with stakeholders. He is currently solving that problem by working on Gradio, an open-source Python library that lets data scientists create fully interactive demos of machine learning models with just a few lines of code.
A Practical Tutorial on Building Machine Learning Demos with Gradio(Workshop)

Akash Tandon
Akash Tandon is co-founder and CTO of Looppanel where he builds software to help product teams record, store and analyze user research data. He is a co-author of Advanced Analytics with PySpark, published by O’Reilly. Previously, Akash worked as a senior data engineer at Atlan, SocialCops and RedCarpet where he built data infrastructure for enterprise, government and finance use-cases. He has also been a participant and mentor in the Google Summer of Code program with the R Project for Statistical Computing.
From Big Data to NLP insights: Getting started with PySpark and Spark NLP(Workshop)

Daniel Lenton, PhD
Daniel Lenton is the creator of Ivy, which is an open-source framework with an ambitious mission to unify all other ML frameworks. Prior to starting Ivy, Daniel was a PhD student at Imperial College London, where he published research in the areas of machine learning, robotics and computer vision.
Unifying ML With One Line of Code(Tutorial)

Christina Qi
Christina Qi is the CEO of Databento, an on-demand market data platform. She formerly founded Domeyard LP, a hedge fund focused on high frequency trading (HFT) that traded up to $7.1 billion USD per day. Failing to earn a job offer after a Wall Street internship, Christina started Domeyard from her dorm room with $1000 in savings, about 9 years ago. Her fund was a tiny minnow amongst the tigers of the hedge fund world, but after Michael Lewis’s Flash Boys came out in 2014 and HFT firms hid from the spotlight, Domeyard accidentally found itself in the center of the ring. Over the next decade, her company’s story was featured on the front page of Forbes and Nikkei, and quoted in the Wall Street Journal, Bloomberg, CNN, NBC, and the Financial Times as a result of the controversy and fascination with HFT. By a series of accidents, Christina became a voice in her industry, contributing to the World Economic Forum’s research on AI in finance, guest lecturing at dozens of universities, and teaching Domeyard’s case study at Harvard Business School. She is grateful to be able to open up about her mistakes, and to help people turn failures into opportunities.
No amount of therapy has quashed Christina’s impostor syndrome, but she will always be proud of her non-profit volunteer work. Christina was elected as a Member of the MIT Corporation, MIT’s Board of Trustees. She is Co-Chair of the Board of Invest in Girls, bringing financial literacy education to underserved populations across the US. Christina also sits on the Board of Directors of The Financial Executives Alliance (FEA) Hedge Fund Group, drives entrepreneurship efforts at the MIT Sloan Boston Alumni Association (MIT SBAA), and served on the U.S. Non-Profit Boards Committee of 100 Women in Finance. Although “X Under X” lists are a gimmick, she’ll admit that Forbes 30 Under 30 made a positive impact on her life by giving her a community – friends who dragged her out of bed during the lowest days of her life. Christina holds a Bachelor of Science in Management Science from MIT and is a CAIA Charterholder.
When Robots Beat Humans: How ChatGPT is Changing the Financial Industry(Business Talk)

Tomasz Adamusiak, MD, PhD
Tomasz Adamusiak, MD, Ph.D., is a Chief Scientist in the Clinical Insights & Innovation Cell at MITRE. He leads a multi-disciplinary group driving high-impact contributions to private and public sectors in Clinical and Genomic Data Science. Before MITRE, Tomasz was the Head of Data Science in the Pfizer Innovation Research (PfIRe) Lab. His team was responsible for developing novel digital endpoints, designing decentralized approaches for clinical trials, and applying AI/machine learning methods to generate novel insights from clinical data. Tomasz has served in leadership and advisory roles in the American Medical Informatics Association, SNOMED International, and the Epic Research Data Network.
Unlocking the Potential of Protein Prediction in Drug Discovery(Business Talk)

Allen Downey, PhD
Allen Downey is a Staff Scientist at DrivenData and professor emeritus at Olin College. He is the author of several books related to computer science and data science, including Think Python, Think Stats, Think Bayes, and Think Complexity. His blog, Probably Overthinking It, features articles about Bayesian statistics. He received his Ph.D. in Computer Science from U.C. Berkeley, and M.S. and B.S. degrees from MIT.
Causation, Collision, and Confusion: Avoiding the most dangerous error in statistics(Talk)

Tendü Yoğurtçu, PhD
Tendü Yoğurtçu, Ph.D., is the Chief Technology Officer (CTO) at Precisely. In this role, she directs the company’s technology strategy and innovation, leading all product research and development programs.
Prior to becoming Chief Technology Officer, Tendü served as General Manager of Big Data for Syncsort, the precursor to Precisely, leading the global software business for Data Integration, Hadoop, and Cloud. She previously held several engineering leadership roles at the company, directing the development of the Integrate family of products.
Tendü has over 25 years of software industry experience, with a focus on Big Data and Cloud technologies. She has also spent time in academia, working as a Computer Science Adjunct Faculty Member at Stevens Institute of Technology.
In 2019, Tendü was named CTO of the Year at the prestigious Women in IT Awards, and in 2018 was recognized as an Outstanding Executive in Technology by Advancing Women in Technology (AWT).
Tendü received her Ph.D. in Computer Science from Stevens Institute of Technology, NJ, a Master of Science in Industrial Engineering, and a B.S. in Computer Engineering from Bosphorus University in Istanbul.
Power trusted AI/ML Outcomes with Data Integrity(Business Talk)

Florian Jacta
Florian Jacta is a specialist in Taipy, a low-code open-source Python package that enables any Python developer to easily develop a production-ready AI application. He handles pre-sales and after-sales functions for the package. He is a Data Scientist for Groupe Les Mousquetaires (Intermarche) and ATOS, where he developed several predictive models as part of strategic AI projects. Florian holds a master’s degree in Applied Mathematics from INSA, with a major in Data Science and Mathematical Optimization.
How to Build Stunning Data Science Web applications in Python – Taipy Tutorial(Workshop)
Bringing AI to Retail and Fast Food with Taipy’s Applications(Track Keynote)
Demo Session Title: Turning your Data/AI algorithms into full web apps in no time with Taipy
Abstract:
In the Python open-source ecosystem, many packages are available that cater to:
– the building of great algorithms
– the visualization of data
Despite this, over 85% of Data Science Pilots remain pilots and do not make it to the production stage.
With Taipy, a new open-source Python framework, Data Scientists/Python Developers are able to build great pilots as well as stunning production-ready applications for end-users.
Taipy provides two independent modules: Taipy GUI and Taipy Core.
In this talk, we will demonstrate how:
1. Taipy-GUI goes way beyond the capabilities of the standard graphical stack: Gradio, Streamlit, Dash, etc.
2. Taipy Core fills a void in the standard Python back-end stack.

Albert Vu
Albert applies his skills in machine learning and big data to solve (financial) optimization problems. He developed projects of different skill levels for Taipy’s tutorial videos. He holds a Bachelor of Science from McGill University, with a major in Computer Science & Statistics and a minor in Finance.
How to build stunning Data Science Web applications in Python – Taipy Tutorial(Workshop)
Bringing AI to Retail and Fast Food with Taipy’s Applications(Track Keynote)
Demo Talk Session Title: Turning your Data/AI Algorithms into full web apps in no time with Taipy

Madhav Thaker
Madhav is a Senior Data Scientist at Shopify where he focuses on building/evaluating recommendation systems. His role includes prototyping potential solutions and scaling them for production. Prior to Shopify, Madhav was a data science consultant where he focused on NLP projects for pharmaceutical companies. He then transitioned to Disney to develop personalized movie recommendations which sparked his passion for recommendation systems. In his free time, Madhav hosts free Q&A sessions for aspiring data scientists who are looking to get into this space.
Generating Content-based Recommendations for Millions of Merchants and Products(Talk)

Arvind Neelakantan, PhD
Arvind Neelakantan is a Research Lead and Manager at OpenAI working on deep learning research for real-world applications. He got his PhD from UMass Amherst where he was also a Google PhD Fellow. His work has received best paper awards at NeurIPS and at the Automated Knowledge Base Construction workshop.
Text and Code Embeddings(Talk)

Matt Bezdek, PhD
Matt Bezdek is a Senior Data Scientist at Elder Research. In his work, he empowers commercial clients to make better business decisions, with expertise in machine learning, forecast modeling, natural language processing, and visualization. He has a PhD in Cognitive Psychology from Stony Brook University and has conducted neuroimaging research at Georgia Tech and Washington University in St. Louis.
Topic Modeling using pre-trained large language model embeddings(Talk)

Nils Reimers
Nils Reimers is an NLP / Deep Learning researcher with extensive experience in representing text in dense vector spaces and using those representations for various applications. During his research career, he created sentence-transformers, which became the foundation for many of today’s semantic search applications.
In 2022, Nils joined Cohere.com to lead the team on smarter semantic search technologies and how to connect LLMs to enterprise data. Here, his teams develop new foundation models that can understand and reason over complex data.
Connecting Large Language Models – Common Pitfalls & Challenges(Talk)

Jonas Mueller
Jonas Mueller is Chief Scientist and Co-Founder at Cleanlab, a software company providing data-centric AI tools to efficiently improve ML datasets. Previously, he was a senior scientist at Amazon Web Services developing AutoML and Deep Learning algorithms which now power ML applications at hundreds of the world’s largest companies. In 2018, he completed his PhD in Machine Learning at MIT, also doing research in NLP, Statistics, and Computational Biology.
Jonas has published over 30 papers in top ML and Data Science venues (NeurIPS, ICML, ICLR, AAAI, JASA, Annals of Statistics, etc). This research has been featured in Wired, VentureBeat, Technology Review, World Economic Forum, and other media. He has also contributed open-source software, including the fastest-growing open-source libraries for AutoML (https://github.com/awslabs/autogluon) and Data-Centric AI (https://github.com/cleanlab/cleanlab).
How to Practice Data-Centric AI and Have AI improve its Own Dataset(Tutorial)

Tejaswini Pedapati
Tejaswini Pedapati works at IBM Research. Her research is focused on interpretability and automating deep learning. To that end, she was involved in developing tools and algorithms to provide these capabilities for IBM products. She has a master’s degree from Columbia University.
Introduction to AutoML: Hyperparameter Optimization and Neural Architecture Search(Tutorial)

Dan Shiebler
As the Head of Machine Learning at Abnormal Security, Dan builds cybercrime detection algorithms to keep people and businesses safe. Before joining Abnormal, Dan worked at Twitter: first as an ML researcher working on recommendation systems, and then as the head of web ads machine learning. Before Twitter, Dan built smartphone sensor algorithms at TrueMotion and Computer Vision systems at the Serre Lab.

Brian Lucena, PhD
Brian Lucena is Principal at Numeristical, where he advises companies of all sizes on how to apply modern machine learning techniques to solve real-world problems with data. He is the creator of three Python packages: StructureBoost, ML-Insights, and SplineCalib. In previous roles he has served as Principal Data Scientist at Clover Health, Senior VP of Analytics at PCCI, and Chief Mathematician at Guardian Analytics. He has taught at numerous institutions including UC-Berkeley, Brown, USF, and the Metis Data Science Bootcamp.
Uncertainty Quantification: Approaches and Methods(Training)

Connor Shorten, PhD
Connor Shorten is a Research Scientist at Weaviate, an Open-Source Vector Search Database. Connor has had a role in the development of Ref2Vec, Hybrid Search, Generative Search, Weaviate’s Pipe API, and Re-Ranking. Connor has also hosted 34 episodes of the Weaviate podcast featuring guests from OpenAI, Cohere, You.com, MosaicML, Jina AI, Deepset, Neural Magic and many others! Connor also co-hosts Weaviate meetups in Boston and New York City! Prior to Weaviate, Connor earned a Ph.D. in Computer Science from Florida Atlantic University. His Ph.D. focused primarily on Data Augmentation in Deep Learning and Applications of Deep Learning for COVID-19. Connor’s publication “A survey on image data augmentation in deep learning” has achieved over 5,000 citations.
Building Recommendation Systems(Workshop)

Brennan Smith
Brennan is an experienced Machine Learning professional with a background in Information Technology solutions, Business Analytics, big data and AI. Currently, he serves as Senior Machine Learning Engineer at Iguazio (acquired by McKinsey & Company), bringing his expertise to help Data Scientists, Data Engineers, and ML Engineers work together to deploy AI/ML applications faster, more efficiently and in a reproducible way. Before that he spent 8 years at SAS in various technology roles. Brennan holds a BS in Computer Science from UNC Wilmington and previously served in the Marine Corps. He lives in North Carolina with his family, and when he’s not tangling with big data for customers, he enjoys tangling with big fish!
Session Title: Building an ML Factory with OS MLOps Orchestration tool MLRun
Abstract:
MLRun is an open-source MLOps orchestration framework. It exists to accelerate the integration of AI/ML applications into existing business workflows. MLRun introduces Data Scientists to a simple Python SDK that transforms their code into a production-quality application. It does so by abstracting the many layers involved in the MLOps pipeline. Developers can build, test, and tune their work anywhere and leverage MLRun to integrate with other components of their business workflow.
The capabilities of MLRun are extensive, and we will cover the basics to get you started. You will leave this session with enough information to:
– Get started with MLRun on your own in 10 minutes, so you can automate and accelerate your path to production and have your first AI app running in 20 minutes
– Run locally, then move to Kubernetes
– Understand how your Python code can run as a Kubernetes job with no code changes
– Track your experiments
– Get an introduction to advanced MLOps topics using MLRun

Emily Curtin
Emily is a Staff MLOps Engineer at Intuit Mailchimp, meaning she gets paid to say “it depends” and “well actually.” Professionally she leads a crazy good team focused on helping Data Scientists do higher quality work faster and more intuitively. Non-professionally she paints huge landscapes and hurricanes in oils, crushes sweet V1s (as long as they’re not too crimpy), rides her bike, reads a lot, and bothers her cats. She lives in Atlanta, GA, which is inarguably the best city in the world, with her husband Ryan who’s a pretty darn cool guy.
Containers + GPUs In Depth(Talk)
Extinguishing the Garbage Fire of ML Testing(Lightning Talks)

Elliott Cordo
Elliott is an expert in data engineering, data warehousing, information management, and technology innovation with a passion for helping transform data into powerful information. He has more than a decade of experience implementing cutting-edge, data-driven applications. He has a passion for helping organizations understand the true potential in their data by working as a leader, architect, and hands-on contributor.
Elliott has built nearly a dozen cloud-native data platforms on AWS, ranging from data warehouses and data lakes, to real-time activation platforms in companies ranging from small startups to large enterprises.

Nick Singh
Nick Singh is an Ex-Facebook & Google Engineer turned best-selling author of Ace the Data Science Interview, and founder of SQL Interview Platform DataLemur.com. His career advice on LinkedIn has earned him 100,000 followers, and he’s successfully career coached 578 people to land their dream job in data!
Ace the Data Job Hunt(Career Talk)
Ace the Data Science Interview with Nick Singh(Career Workshop)

Han Wang
Han Wang is the tech lead of Lyft Machine Learning Platform, focusing on distributed computing solutions. Before joining Lyft, he worked at Microsoft, Hudson River Trading, Amazon and Quantlab. Han is the creator of the Fugue project, aiming at democratizing distributed computing and machine learning.

Mélissa Rollot
Mélissa is a data scientist engineer. Over the past 7 years working at Quinten Health in the healthcare sector as a Project Manager in data science, she has participated in the development of several decision support solutions powered by AI, e.g. for rare disease diagnosis, disease progression modelling and endotyping, or evaluation of population heterogeneity. She leads multiple studies of real-world data using advanced analytics methods to characterize phenotypes and disease progression for neurological conditions, cardiovascular diseases, and oncology, for pharma companies, research organizations and care providers. Currently, she manages the development of AI-powered solutions at Quinten Health to support R&D decisions using real-world data for its clients.
A Natural Language Processing (NLP) Approach to Automate Patients’ Testimonials Analysis(Tutorial)

Andrew Zaldivar, PhD
Andrew Zaldivar is a member of the Responsible AI & Human-Centered Technology organization in Google Research. His role is to advocate for the responsible development and use of AI by disseminating and democratizing research findings from his organization. Andrew works with researchers and designers that are examining and shaping the socio-technical processes underpinning AI technologies through participatory, culturally-inclusive, and intersectional equity-oriented approaches. Before joining Google Research, Andrew was a Senior Strategist in Google’s Trust and Safety team, protecting the integrity of some of Google’s key products by utilizing machine learning to scale, optimize, and automate abuse fighting efforts. Andrew also holds a doctorate in cognitive neuroscience from the University of California, Irvine and was an Insight Data Science fellow.
The Data Cards Playbook: A Toolkit for Transparency in Dataset Documentation(Tutorial)

Mahima Pushkarna
Mahima Pushkarna is a design lead at the People + AI Research Initiative at Google. She brings design thinking and human-centered design into Human-AI Research. Her work explores advanced technologies, including generative AI, and draws from a mix of human-centered, participatory, and speculative design practices to bridge the gap between upstream developer practices and their impact on end user experiences and society. Mahima has designed tools and frameworks for explainability and interpretability that are widely used across industries and academia. She believes design can be a powerful tool for understanding and addressing the needs of people impacted by technology. Mahima is also interested in exploring the intersection of design, technology, and society, and is always looking for new ways to use design to make the world a better place. Mahima holds a master’s degree in Information Design and Data Visualization from Northeastern University, Boston, MA. She has published in leading academic journals and conferences, including IEEE Vis, FAccT, and workshops at NeurIPS. Prior to Google, Mahima worked as a product designer at Innovation by Design, a global think-tank, consulted at MIT’s Design Lab, and designed visualization tools at Ion Interactive. This bio was written with assistance from a language-driven model.
The Data Cards Playbook: A Toolkit for Transparency in Dataset Documentation(Tutorial)

Matteo Pirotta
Bio Coming Soon!
Exploration in Reinforcement Learning(Tutorial)

Swagata Ashwani
Swagata is a Data Professional with over 6 years of experience in the Healthcare, Retail and Platform Integration industries. She is an avid blogger and writes about state-of-the-art developments in the AI space. She is particularly interested in Natural Language Processing, and focuses on researching how to make NLP models work in practical settings. In her spare time, she loves to play her guitar, sip masala chai and find new spots for doing Yoga. Connect with her here – https://www.linkedin.com/in/swagata-ashwani/
Creating a Custom Vocabulary for NLP Tasks Using exBERT and spaCy(Tutorial)

Frank DeFalco
Frank DeFalco is the Director of Epidemiology Analytics at Janssen Research and Development where he architects software solutions and data platforms for the analysis and application of observational data sources. He is currently the leader and Benevolent Dictator of the OHDSI open source architecture working group. Frank is a presenter and panelist at OHDSI symposiums and has served as faculty for OHDSI symposium tutorial classes on architecture and common data model vocabulary. In addition to leading the OHDSI Architecture working group, Frank initiated development of a standardized platform for observational analytics known as ATLAS. He is an active contributor to the open source software repositories developed and released by OHDSI including ATLAS, WebAPI, Achilles, Circe, Arachne, Visualizations, Hermes, Helios and others. Frank’s areas of expertise include computational epidemiology, large scale data platforms, software development and architecture, data visualization and informatics. Prior to joining Janssen Research and Development, Frank held the position of Senior Principal and Director of Collaboration and Analytics at British Telecom where he was a strategic advisor for multiple Fortune 100 companies across sectors including Consumer Products, Telecommunications and Pharmaceuticals. Frank received his undergraduate degrees in Computer Science and Psychology at Rutgers University.
Patient Level Prediction with Supervised Learning Models in Federated Data Networks(Tutorial)

James Demmel, PhD
James Demmel is the Dr. Richard Carl Dehmel Distinguished Professor of Computer Science and Mathematics at the University of California at Berkeley, and former Chair of the EECS Dept. He also serves as Chief Strategy Officer for the start-up HPC-AI Tech, whose goal is to make large-scale machine learning much more efficient, with little programming effort required by users. Demmel’s research is in high performance computing, numerical linear algebra, and communication avoiding algorithms. He is known for his work on the widely used LAPACK and ScaLAPACK linear algebra libraries. He is a member of the National Academy of Sciences, National Academy of Engineering, and American Academy of Arts and Sciences; a Fellow of the AAAS, ACM, AMS, IEEE and SIAM; and winner of the IPDPS Charles Babbage Award, IEEE Computer Society Sidney Fernbach Award, the ACM Paris Kanellakis Award, the J. H. Wilkinson Prize in Numerical Analysis and Scientific Computing, and numerous best paper prizes.
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training(Tutorial)

Yang You, PhD
Yang You is a Presidential Young Professor at National University of Singapore. He is on an early career track at NUS for exceptional young academic talents with great potential to excel. He received his PhD in Computer Science from UC Berkeley. His advisor is Prof. James Demmel, who was the former chair of the Computer Science Division and EECS Department. Yang You’s research interests include Parallel/Distributed Algorithms, High Performance Computing, and Machine Learning. The focus of his current research is scaling up deep neural networks training on distributed systems or supercomputers. In 2017, his team broke the world record of ImageNet training speed, which was covered by the technology media like NSF, ScienceDaily, Science NewsLine, and i-programmer. In 2019, his team broke the world record of BERT training speed. The BERT training techniques have been used by many tech giants like Google, Microsoft, and NVIDIA. Yang You’s LARS and LAMB optimizers are available in industry benchmark MLPerf. He is a winner of IPDPS 2015 Best Paper Award (0.8%), ICPP 2018 Best Paper Award (0.3%) and ACM/IEEE George Michael HPC Fellowship. Yang You is a Siebel Scholar and a winner of Lotfi A. Zadeh Prize. Yang You was nominated by UC Berkeley for ACM Doctoral Dissertation Award (2 out of 81 Berkeley EECS PhD students graduated in 2020). He also made Forbes 30 Under 30 Asia list (2021) and won IEEE CS TCHPC Early Career Researchers Award for Excellence in High Performance Computing. For more information, please check his lab’s homepage at https://ai.comp.nus.edu.sg/
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training(Tutorial)

Isaac Slavitt
Isaac is a co-founder and Principal Data Scientist at DrivenData, Inc, where he leads client engagements and spearheads development of the data science competition platform. He holds a master’s in Computational Science and Engineering from Harvard’s School of Engineering and Applied Sciences and a BS in Operations Research from the U.S. Coast Guard Academy, and previously spent seven years as a Coast Guard officer serving in a variety of operational and quantitative roles.
Data Science Hiring is Broken—How Can We Fix It?(Business Talk)

Philip Wauters
Philip Wauters is Customer Success Manager and Value Engineer at Tangent Works, working on practical applications of time series machine learning for customers from various industries such as Siemens, BASF, Borealis and Volkswagen. With a commercial background and experience with data engineering, analysis and data science, his goal is to find and extract the business value in the enormous amounts of time-series data that exist at companies today.
Learn how to Efficiently Build and Operationalize Time Series Models in 2023(Workshop)

Ayush Patel
Ayush is the co-founder of TwelveFold, an AI start-up studio, where he manages a portfolio of MLOps and Generative AI companies with entrepreneurs. He also works as the CEO of Censius, an AI Observability platform that helps to optimize AI models' real-world performance. As a seasoned professional, he has closely worked with customers across industry verticals, AI teams, and research projects to build reliable and compliant AI solutions to solve everyday business problems and scale models at production.
Why do AI Models go Rogue? A Guide to Detect and Fix Silent Model Failures(Business Talk)

Bob Foreman
Bob has worked with the HPCC Systems technology platform and the ECL programming language for over a decade and has been a technical trainer for over 30 years. He is the developer and designer of the HPCC Systems Online Training Courses and is the Senior Instructor for all classroom and remote based training.

Mihir Mathur
Mihir Mathur is the lead Product Manager for Machine Learning at Lyft, where he works on building ML/AI tools that power Lyft’s automated intelligent decisions across realtime pricing, ETAs, fraud detection, safety classification etc. In the past Mihir has worked on building delightful products for millions of users at Quora, Houzz, and Thomson Reuters and spoken about his work at conferences such as MLOps World and ODSC. Mihir graduated magna cum laude from UCLA with a Bachelor’s and Master’s in Computer Science.
Powering Millions of Real-time Decisions with Distributed Model Serving(Talk)

Melanie Veale, PhD
Melanie Veale, Ph.D. is a recovering Astrophysicist, currently working as a Data Solutions Architect at Anomalo. Her Ph.D. research on galaxy dynamics introduced her to statistical and computational Python, as well as other languages and tools like C++, Fortran, IDL, R, bash, and SLURM. She has also dabbled in AWS infrastructure, Kubernetes, Docker, Spark, Ray, Dask and more as a Field Engineer and Field Data Scientist at Domino Data Lab, helping analytics and machine learning teams modernize their collaboration and deployment workflows. Nowadays she is a troubleshooting enthusiast anywhere on the Data, Analytics, and MLOps tech stacks, and enjoys melding her passions for crisp technical communication, good visualizations, and first-principles thinking into helping organizations get the most out of their data.

Robert Blanchard
Robert is a Principal Data Scientist at SAS where he builds end-to-end artificial intelligence applications. He also researches, consults, and teaches machine learning with an emphasis on deep learning and computer vision for SAS. Robert has authored an introductory book on computer vision and has written several professional courses on topics including neural networks, deep learning, and optimization modeling. Before joining SAS, Robert worked under the Senior Vice Provost at North Carolina State University, where he built models pertaining to student success, faculty development, and resource management. Prior to working in academia, Robert was a member of the research and development group on the Workforce Optimization team at Travelers Insurance. His models at Travelers focused on forecasting and optimizing resources. Robert graduated with a master’s degree in Business Analytics and Project Management from the University of Connecticut and a master’s degree in Applied and Resource Economics from East Carolina University.
Building Computer Vision Models and Optimizing Hyperparameters using PyTorch and SAS Viya(Workshop)

Bonny P McClain
Bonny is a geospatial analyst and self described human geographer and social anthropologist. Exploring geographic properties that capture complex interactions, dynamic shifts in ecosystem balance and how activities influence eco-geomorphic conceptual frameworks across a wide variety of environments are the topics of popular public talks and panel discussions.
The ability to apply advanced data analytics, including data engineering and geo-enrichment, to poverty, race, and gender discussions targets judgments about structural determinants, racial equity, and elements of intersectionality to illuminate the confluence of metrics contributing to poverty.
Bonny is the author of the books Python for Geospatial Data Analysis: Theory, Tools, and Practice for Location Intelligence (publisher, O’Reilly Media) and Geospatial Analysis with SQL: A hands on guide to performing geospatial analysis by unlocking the syntax of spatial SQL published by Packt Press. Current projects include a new book in progress with Locate Press, Geospatial Data Science & the Art of Storytelling.

Ari Zitin
Ari Zitin holds bachelor’s degrees in both physics and mathematics from UNC-Chapel Hill. His research focused on collecting and analyzing low energy physics data to better understand the neutrino. Ari taught introductory and advanced physics and scientific programming courses at UC-Berkeley while working on a master’s in physics with a focus on nonlinear dynamics. While at SAS, Ari has worked to develop courses that teach how to use Python code to control SAS analytical procedures.
Building Computer Vision Models and Optimizing Hyperparameters using PyTorch and SAS Viya(Workshop)

Yuval Fernbach
Yuval Fernbach is the Co-founder & CTO of Qwak, where he is focused on building next-generation ML Infrastructure for ML teams of various sizes. Before Qwak, Yuval was an ML Specialist at AWS, where he helped AWS customers across EMEA with their ML challenges. Prior to that, he was the CTO of the IT department of the IDF (“Mamram”).

Zairah Mustahsan
Zairah is a Data Scientist at you.com, the AI search engine, where she leverages her expertise in statistical and machine-learning techniques to build analytics and experimentation platforms. She recently spoke at NeurIPS 2022 and shared her expertise on data-driven decision-making in a privacy-focused AI-first startup. Previously, Zairah was a Data Scientist at IBM Research, researching Natural Language Processing (NLP) and AI Fairness topics. She has published research and holds patents in these domains. Zairah obtained her M.S. in Computer Science from the University of Pennsylvania, where she researched scikit-learn model performance. Her findings have since been used as guidelines for applying machine learning to supervised classification tasks. Zairah has published her work in top AI conferences such as AAAI and has over 300 citations. Aside from work, Zairah enjoys adventure sports and poetry.
From Zero to 100: Lakehouse Architecture for a Privacy Focused Search Engine(Talk)

Andras Zsom, PhD
Andras Zsom is an Assistant Professor of the Practice and Director of Graduate Studies at the Data Science Initiative at Brown University, Providence, RI. He is teaching two mandatory courses in the data science master’s program, and helps the students navigate through their studies and curriculum. He also supervises interns on various research projects related to missing data, interpretability, and developing machine learning pipelines.

Ori Nakar
Ori Nakar is a principal cyber-security researcher, a data engineer, and a data scientist at Imperva Threat Research group. Ori has many years of experience as a software engineer and engineering manager, focused on cloud technologies and big data infrastructure. Ori also has an AWS Data Analytics certification. In the Threat Research group, Ori is responsible for the data infrastructure and involved in analytics projects, machine learning, and innovation projects.
Botnets Detection at Scale – Lesson Learned from Clustering Billions of Web Attacks into Botnets(Talk)

Daniel J. Smith, PhD
Daniel J. Smith, PhD, MBA has worked at WGU for 3 years. He has experience in several industries in analytics through the director level in insurance, health care administration, and higher education. His experience is in AI and machine learning applications in industry using R, Tableau, SAS and Python. He enjoys working with students to improve their analytical, programming, and communication skills.
Hands-On Workshop: Competency-Based Experience Featuring Python & Predictive Modeling(Bootcamp)

Jeffrey Yau, PhD
Jeffrey Yau is currently Chief Data & A.I. Officer at Fanatics Collectibles. Most recently, he served as Global Head of Data Science, Analytics & Engineering at Amazon Music where he oversaw multiple teams who developed both insights-packed analytics and end-to-end statistical and machine learning systems. Prior to Amazon, Jeffrey worked at WalmartLabs as the VP of Data Science & Engineering where he led the team responsible for powering Walmart store mobile apps and the entire store finance system. Further, his team created end-to-end machine learning systems for key business initiatives and had a multi-billion dollar impact annually on Walmart U.S.
Over the years, he has held various senior level positions in quantitative finance at global investment management firm AllianceBernstein, consulting firm Silicon Valley Data Science, multinational financial services company Charles Schwab Corporation, and the world’s leading professional services firm KPMG. He began his career as a tenure-track Assistant Professor of Economics at Virginia Tech, and he was an adjunct professor at UC Berkeley, Cornell, and NYU, teaching machine learning and advanced statistical modeling for finance and business.

Benjamin Batorsky, PhD
Ben is a Senior Data Scientist at the Institute for Experiential AI at Northeastern University. He obtained his Masters in Public Health (MPH) from Johns Hopkins and his PhD in Policy Analysis from the Pardee RAND Graduate School. Since 2014, he has been working in data science for government, academia and the private sector. His major focus has been on Natural Language Processing (NLP) technology and applications. Throughout his career, he has pursued opportunities to contribute to the larger data science community. He has presented his work at conferences, published articles, taught courses in data science and NLP, and is co-organizer of the Boston chapter of PyData. He also contributes to volunteer projects applying data science tools for public good.
Bagging to BERT – A Tour of Applied NLP(Workshop)

Leticia Rabor
Leticia Rabor worked as a professional Software and Systems Engineer in the Defense and Aerospace industries for over 13 years. She has designed, implemented, and tested various image formation subsystem components for ground system development.
She has also worked in Academia since 2012. Her roles include program chair and instructor. Leticia is currently an adjunct professor at Fort Hays State University and a full-time senior instructor at Western Governors University.
She has a Master of Science degree in Information Assurance and a bachelor’s degree in Computer Science. Her yearly activities include conducting an external one-hour workshop in both mobile development and JavaScript at the Geek Girls Tech Conference at University of San Diego (USD). She participated as one of the panel experts for “The future of mobile development” at the Geek Girls Tech Conference in San Diego, California. She is a member of Women Who Code (WWC) and a recipient of the “Faculty of the Year” award in 2017.
Hands-On Workshop: Competency-Based Experience Featuring Python & Predictive Modeling(Bootcamp)

Mingo Sanchez
Mingo is a Senior Sales Engineer at Plotly. After graduating from Bowdoin College with a degree in computer science, he started working with organizations in the master data management and data science spaces. Throughout his career, Mingo has partnered with large financial institutions, life sciences organizations, retail companies, and government agencies to help them better understand their data and more effectively serve their customers. Mingo enjoys building relationships with people to understand their pain points and help them solve their most challenging business and technical problems.
Learn how to Build Interactive Data Apps with Plotly Dash(Workshop)

Andrew Lamb
Andrew Lamb is the chair of the Apache Arrow Program Management Committee (PMC) and a Staff Software Engineer at InfluxData. He works on InfluxDB IOx, a time series database engine written in Rust, that heavily uses the Apache Arrow ecosystem. He actively contributes to many open source software projects including the Apache Arrow Rust implementation and the Apache Arrow DataFusion query engine.
Tutorial: Introduction to Apache Arrow and Apache Parquet, using Python and Pyarrow(Workshop)

Fabiana Clemente
Fabiana Clemente is the co-founder and CDO of YData, combining Data Understanding, Causality, and Privacy as her main fields of work and research, with the mission to make data actionable for organizations. Passionate about data, Fabiana has vast experience leading data science teams in startups and multinational companies. She is the host of the “When Machine Learning meets privacy” podcast and has been a guest speaker at Datacast and Privacy Please. A previous WebSummit speaker, she was recently awarded “Founder of the Year” by the South Europe Startup Awards.
Missing Data: A Synthetic Data Approach for Missing Data Imputation(Workshop)

Gary Nakanelua
Gary Nakanelua is a professional technologist with over 17 years of experience and the author of Experiment or Expire. Gary is the Managing Director of Innovation at Blueprint, a data intelligence company based in Bellevue, WA. He’s responsible for the experimentation and creation of Blueprint’s transformative solutions and accelerators. With his diverse background, Gary brings a different perspective to problems that businesses are facing today to create quantifiable solutions driven through a high level of collaborative thought processing, strategic planning, and cannibalization.
Streamlining Your Streaming Analytics with Delta Lake & Rust(Talk)

Greg West
A member of CSI for a decade, Greg has developed a wealth of expertise on knowledge graph technology. His true speciality lies in demonstrating and developing custom solutions that leverage Anzo’s unique capabilities.
Session Title: Accelerating AI/ML Initiatives with Knowledge Graph
Abstract: Integrating and unifying data from diverse sources is foundational to AI and ML workflows. This workshop will demonstrate how Anzo’s knowledge graph platform can create an enterprise scale knowledge graph from several sources – setting organizations up for sustainable success with collective intelligence. During this workshop, users will:
Create a sample knowledge graph from several sources.
Demonstrate flexible data preparation for training datasets.
Analyze the knowledge graph with native visualizations and graph algorithms
Connect to the knowledge graph for additional data science operations
From its hyper agile in-memory MPP graph engine to its point-and-click user experience and open flexible architecture, Anzo transcends the limitations of traditional knowledge graphs and gives you all the capabilities and flexibilities that complex, enterprise-scale solutions need.
Join this demo to see why Anzo might be the solution you need.

Dani Herzberg
Dani Herzberg is an Analyst on the Product Management and Development team at S&P Global Market Intelligence. On this team, she creates notebooks in Databricks, assists in analytic visualizations of S&P Global data, and provides SQL query support. She holds a Master of Science in Business Analytics from Georgetown University.
Session Title: Data Visualizations Utilizing S&P Global Marketplace Workbench
Abstract: We will be using the plotly library to create visualizations in S&P Global Marketplace Workbench, which is powered by Databricks, and showcasing a Databricks Dashboard built from the different charts. This demo talk is best suited for a beginner to intermediate audience.

Alexandra Ebert
Alexandra Ebert is a Responsible AI, synthetic data & privacy expert and serves as Chief Trust Officer at MOSTLY AI. As a member of the company’s senior leadership team, she is engaged in public policy issues in the emerging field of synthetic data and Ethical AI and is responsible for engaging with the privacy community, with regulators, the media, and with customers. She regularly speaks at international conferences on AI, privacy, and digital banking and hosts The Data Democratization Podcast, where she discusses emerging digital policy trends as well as Responsible AI and privacy best practices with regulators, policy experts and senior executives.
Apart from her work at MOSTLY AI, she serves as the chair of the IEEE Synthetic Data IC expert group and was pleased to be invited to join the group of AI experts for the #humanAIze initiative, which aims to make AI more inclusive and accessible to everyone.
Before joining the company, she researched GDPR’s impact on the deployment of artificial intelligence in Europe and its economic, societal, and technological consequences. Besides being an advocate for privacy protection, Alexandra is deeply passionate about Ethical AI and ensuring the fair and responsible use of machine learning algorithms. She is the co-author of an ICLR paper and a popular blog series on fairness in AI and fair synthetic data, which was featured in Forbes, IEEE Spectrum, and by distinguished AI expert Andrew Ng.
When Privacy Meets AI – Your Kick-Start Guide to Machine Learning with Synthetic Data(Tutorial)

Pavel Klushin
Pavel Klushin is a seasoned solution architecture expert who currently leads the function at Qwak. With years of experience in the technology industry, he is known for his exceptional ability to design and deliver innovative solutions that meet the specific needs of his clients. Pavel previously led the solution architecture team at Spot (acquired by NetApp).
Session Title: End to End Machine Learning Pipeline Management
Abstract: Join this demo to find out how to centralize your ML pipeline and cut down operational complexities at each stage along the way. Qwak’s platform supports multiple use cases across any business vertical and allows data teams to productionize their models more efficiently and without depending on engineering resources. Join us to watch how Pavel Klushin uses Qwak to create features from data and build, train and deploy models into production, all under a single platform and with unprecedented simplicity.

Seth Juarez
My name is Seth Juarez. I currently live near Redmond, Washington and work for Microsoft.
I received my bachelor’s degree in Computer Science at UNLV with a minor in Mathematics. I also completed a master’s degree at the University of Utah in the field of Computer Science. I am currently interested in Artificial Intelligence, specifically in the realm of Machine Learning. I currently work as a Program Manager in the Azure Artificial Intelligence Product Group.
I’ve been married now for 21 years to a fabulously talented woman and have two beautiful daughters, and two feisty sons.
Session Title: Ask the Experts! ML Pros Deep-Dive into Machine Learning Techniques and MLOps
Abstract: Experienced machine learning engineers and data scientists care about ways to easily get their models up and running quickly and share ML assets across teams for collaboration. Collaborate and streamline the management of thousands of models across teams with new, innovative features in Azure Machine Learning. Come and join us in this interactive session with our product experts and get your questions answered on the latest capabilities in Azure Machine Learning!

Kerstin Frailey
Kerstin is CEO and Co-founder of SuperUse, a collaboration platform. She has led data science initiatives at startups across industries, from healthcare to CPG. She takes pride in mentoring fantastic data scientists and nurturing talent. A builder at heart, she regularly pushes code, trains models, and uncovers insights. She has master’s degrees in Mathematical Computer Science and Mathematical Statistics. She is expecting her PhD from Cornell in early 2023. She spends her free time going on long hikes with her two small dogs through the big mountains outside Seattle.

Jared Lander
Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a master’s from Columbia University in statistics and a bachelor’s from Muhlenberg College in mathematics, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts. He specializes in data management, multilevel models, machine learning, generalized linear models, and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R Programming geared toward Data Scientists and Non-Statisticians alike, and is creating a course on glmnet with DataCamp.
Machine Learning in R Part I & II(Training)

Peter Wang
Peter Wang is the CEO and co-founder of Anaconda, Inc. Prior to founding Anaconda (formerly Continuum Analytics), Peter spent 15 years in software design and development across a broad range of areas, including 3D graphics, geophysics, large data simulation and visualization, financial risk modeling, and medical imaging. As a creator of the PyData community and conferences, he devotes time and energy to growing the Python data science community and advocating for increasing data literacy around the world. Peter holds a BA in Physics from Cornell University.
Open Source, Community Innovation, and Democratization of AI(Talk)

Temilade Oyeniyi, CFA
Temilade (“Temi”) Oyeniyi, CFA is Vice President at S&P Global Market Intelligence’s Quantamental Research Group, which is responsible for building global equity strategies for institutional investors.

Sagar Samtani, PhD
Dr. Sagar Samtani is an Assistant Professor and Grant Thornton Scholar in the Department of Operations and Decision Technologies at Indiana University. Dr. Samtani received his Ph.D. from the AI Lab at the University of Arizona. Dr. Samtani’s research interests are in AI for Cybersecurity, developing deep learning approaches for cyber threat intelligence, vulnerability assessment, open-source software, AI risk management, and Dark Web analytics. He has received funding from NSF’s SaTC, CICI, and SFS programs and has published over 40 peer-reviewed articles in leading information systems, machine learning, and cybersecurity venues. He is deeply involved with industry, serving on the Board of Directors for the DEFCON AI Village and Executive Advisory Council for the CompTIA ISAO.

Avi Pfeffer, PhD
Dr. Avi Pfeffer is Chief Scientist at Charles River Analytics. Dr. Pfeffer is a leading researcher on a variety of computational intelligence techniques including probabilistic reasoning, machine learning, and computational game theory. Dr. Pfeffer has developed numerous innovative probabilistic representation and reasoning frameworks, such as probabilistic programming, which enables the development of probabilistic models using the full power of programming languages, and statistical relational learning, which provides the ability to combine probabilistic and relational reasoning. He is the lead developer of Charles River Analytics’ Figaro™ probabilistic programming language. As an Associate Professor at Harvard, he developed IBAL, the first general-purpose probabilistic programming language. While at Harvard, he also produced systems for representing, reasoning about, and learning the beliefs, preferences, and decision making strategies of people in strategic situations. Prior to joining Harvard, he invented object-oriented Bayesian networks and probabilistic relational models, which form the foundation of the field of statistical relational learning. Dr. Pfeffer serves as Action Editor of the Journal of Machine Learning Research and served as Associate Editor of Artificial Intelligence Journal and as Program Chair of the Conference on Uncertainty in Artificial Intelligence. He has published many journal and conference articles and is the author of a text on probabilistic programming. Dr. Pfeffer received his Ph.D. in computer science from Stanford University and his B.A. in computer science from the University of California, Berkeley.

Laura Skylaki, PhD
Laura Skylaki is a Manager of Applied Research in Thomson Reuters Labs, where she leads advanced machine learning projects in the domain of Legal and Tax AI. With a career spanning more than a decade at the intersection of research and practical application, she has contributed technical expertise in diverse fields such as bioinformatics and stem cell biology, image processing and natural language processing. She holds a doctorate in stem cell bioinformatics from the University of Edinburgh, UK, and has been publishing on machine learning applications in leading academic journals since 2012.
NLP Fundamentals(Training)

Rehgan Avon
Rehgan Avon is the co-founder & CEO of AlignAI, a Knowledge Management Platform helping companies sustainably transform their organizations to effectively work with data & Artificial Intelligence. With a background in Integrated Systems Engineering and a strong focus on building technology to support analytics and machine learning, Rehgan has worked on architecting solutions and products to operationalize machine learning models at scale within the large enterprise. Rehgan’s previous experience has been fueled by a passion for early-stage startups and product development.
Rehgan has built an extensive community of analytics & data experts through Women in Analytics, a global organization she founded in 2016 to provide more visibility to diverse individuals making an impact in this space. She hosts a global annual conference that has put over 250 women on the stage. The community has over 5000 members from around the world that participate in tutorials, learning groups, discussion boards, and mentorship programs. She was also inducted into the inaugural class of Columbus CEO’s Future 50.
Building a Capability Roadmap: The Maturity Stages of Data & AI(Business Talk)

Joshy George, PhD
Joshy George is a bioinformatics researcher with a Ph.D. in Bioinformatics from the University of Melbourne, Australia, and a Master's in Computer Science from the Indian Institute of Science. With his background in data science and machine learning, Dr. George has co-authored over 100 peer-reviewed scientific articles, showcasing expertise in developing principled methods to solve complex biological problems. In his current role, he leads a team that is focused on building predictive models for cancer precision medicine and understanding the molecular mechanisms leading to diseases.
Is Machine Learning Necessary to Solve Problems in Biology(Talk)

Sanja Cvijic, PhD
Dr. Sanja Cvijic is a Senior Scientist at Charles River Analytics who leads our Probabilistic AI Representations and Reasoning Systems group and has pioneered the application of Scruff to real-world problems in ISR and maintenance. Dr. Cvijic’s research activities are centered around applications of probabilistic programming to condition monitoring, fault detection and prediction systems. She developed a prognostic health management tool in Scruff for assessing the health and status of power transformers. She also developed a probabilistic tool in Scruff that improves space domain awareness by assessing risks to satellites in space. Previously, she worked as a Director of Software and a Consultant in the power industry at New Electricity Transmission Software Solutions. She earned her Doctoral degree in Electrical and Computer Engineering, Power Systems, at Carnegie Mellon University in 2013. She earned her Bachelors in Electrical and Computer Engineering at the University of Belgrade, Serbia, in 2008.

Robert F. Dougherty, PhD
As the Vice President of Digital Health Research at COMPASS Pathways, Bob is leading the data science and machine learning efforts aimed at improving the safety, efficacy, and scalability of psilocybin therapy. He is an accomplished neuroscientist and engineer with deep expertise in measuring human brain and behavior, and building data-driven solutions to mental health care challenges. Prior to joining COMPASS Pathways, Bob was VP of Research at Mindstrong, leading the research and data science teams in the development of digital biomarkers for mental health. Prior to Mindstrong, Bob was the Research Director of the Stanford Center for Neurobiological Imaging. He has published over one hundred peer-reviewed articles in the fields of psychology, psychiatry, neuroscience, statistics, and magnetic resonance technology over his 30+ year scientific career. Bob completed his PhD in Experimental Psychology at the University of California at Santa Cruz, and postdoctoral fellowships at the University of British Columbia and Stanford University.

Anna Jung
Anna Jung is a Senior ML Open Source Engineer at VMware, leading the open source team as part of the VMware AI Labs. She currently contributes to various upstream ML-related open source projects focusing on the project’s overall health, adoption, and innovation. She believes in the importance of giving back to the community and is passionate about increasing diversity in open source. When away from the keyboard, Anna is often at film festivals supporting independent filmmakers.

Jenna Reps, PhD
Jenna Reps is a Director at Janssen Research and Development where she is focusing on developing novel solutions to personalize risk prediction. Jenna’s areas of expertise include applying machine learning and data mining techniques to develop solutions for various healthcare problems. She is currently working within the patient level prediction OHDSI workgroup with the aim of developing open source and user friendly software for developing risk models using data sets in the OMOP Common Data Model format. Prior to joining Janssen Research and Development, Jenna was a Senior Research Fellow at the University of Nottingham where she developed supervised learning techniques to signal adverse drug reactions using UK primary care data and acted as a data consultant to other researchers within the University. Jenna received her BSc in Mathematics and MSc in Mathematical Biology at the University of Bath and her PhD in Computer Science at the University of Nottingham.
Patient Level Prediction with Supervised Learning Models in Federated Data Networks(Tutorial)

Christian Ramirez
Christian is Machine Learning Technical Leader at Mercado Libre, the largest e-commerce/fintech company in Latin America, where he dedicates his efforts to creating tools for monitoring and quality of learning models. He is a Computer Engineer and Master in Science with a major in Astronomy from UNAM (Universidad Nacional Autonoma de Mexico). He is a “Xoogler” and has more than 15 years of experience in the field of machine learning. He has lectured in almost a dozen countries.
Introduction to Topological Data Analysis Workshop(Tutorial)

Hakan Baba
Hakan is a staff software engineer on the ML Platform team at Lyft. The team builds ML development, training, and serving systems that support 40+ teams. Previously, Hakan was a staff engineer at Box, where he helped build cloud content management applications focused on security and scaled Kubernetes clusters and service meshes in an on-premise infrastructure. He started his career at the hardware level, building ASICs, and transitioned to distributed systems software at a startup. Hakan is passionate about wearing many hats, switching abstraction levels, operational excellence and mentorship, and loves challenges and solving problems that take the whole team to address.
Powering Millions of Real-time Decisions with Distributed Model Serving(Talk)

Shanshan Bi, PhD
Shanshan is a Lead Data Scientist at Novelis. Her work focuses on advanced operational data analytics and AI implementation in aluminum rolling and recycling. Her team is leading the build-out of the AI ecosystem at Novelis. She received her PhD from Missouri S&T and worked for the Center for Intelligent Maintenance Systems, focusing on fault diagnosis, prognosis, and predictive maintenance in IIoT systems.
The Power of AI in Aluminum Manufacturing(Lightning Talk)

RJ He
RJ He is Zoox’s Director of Perception, where he is responsible for the Zoox robotaxis’ ability to see and understand the world around them. He also leads the Zoox Boston office, where he is assembling a world-class team of AI engineers. RJ was previously co-founder & CEO of Strio AI, an agriculture robotics startup, as well as VP Eng at Optimus Ride, an AV startup. RJ has also commanded a mechanized infantry unit, and holds an MIT PhD in autonomous systems.
Leveraging MLOps to Accelerate Autonomous Vehicle Development(Lightning Talks)

Mehrnoosh Sameki, PhD
Mehrnoosh Sameki is a principal PM manager at Microsoft, where she leads emerging Responsible AI technology and tools for the Azure Machine Learning platform. She has cofounded Error Analysis, Fairlearn and Responsible AI Toolbox and has been a contributor to the InterpretML offering. She earned her PhD degree in computer science at Boston University, where she currently serves as an adjunct assistant professor, offering courses in responsible AI. Previously, she was a data scientist in the retail space, incorporating data science and machine learning to enhance customers’ personalized shopping experiences.

Dr. Douglas Blank
Dr. Blank is Professor Emeritus at Bryn Mawr College and Head of Research at Comet ML. Doug has 30 years of experience in Deep Learning and Robotics, was one of the founders of the area of Developmental Robotics, and is a contributor to the open source Jupyter Project, a core tool in Data Science. He currently lives in San Francisco, California, along with his family and animals.
How to Explore and Analyze Mixed-Media Data Quickly and Easily(Talk)

Dr. Mohammed Taboun
Dr. Mohammed Taboun is a Principal Data Scientist at Precisely, where he uses his experience in analytics, optimization, and machine learning to drive innovation and business growth. Over the past 15 years, Mohammed has applied his expertise across various industries, including technology, oil and gas, energy and utilities, and telecommunications. With a strong academic background, Mohammed holds a PhD in Mechanical Engineering, specializing in Intelligent Control Systems, as well as a Master of Applied Science (MASc) and a Bachelor of Applied Science (BASc) in Industrial Engineering, focusing on Operations Research.

Hajime Takeda
Hajime is a data professional with five years of expertise in marketing, retail, and eCommerce, working across Japan and the United States.
As a Data Analyst at Procter and Gamble and MIKI HOUSE Americas, Hajime has led data-driven strategy formulation and implemented technology initiatives such as e-commerce expansion, advertising optimization, and the identification of growth opportunities.
As an organizer of PyData NYC, Hajime is dedicated to fostering a vibrant community centered around the exchange of knowledge on open-source technologies in New York. Additionally, Hajime lends his expertise as a contributing technical writer for Towards Data Science.
Media Mix Modeling: How to Measure the Effectiveness of Advertising in Python(Talk)

Danny Bharat
Danny Bharat is a seasoned supply chain industry professional and the Senior Vice President of Analytics at Cedric Millar Integrated Solutions. As a co-founder of Beacon Analytics, powered by Cedric Millar, he leads a growing team of solutions architects and data scientists in delivering comprehensive business intelligence and supply-chain solutions for end-to-end operations. With a deep focus on corporate planning, strategy, and digital transformation, Danny has accumulated a wealth of experience in multiple industries. He is dedicated to encouraging continuous professional growth and development through mentorship. Danny strongly believes that leaders with technical competence are more effective, and he practices what he preaches by being a self-taught dabbler in Python and DAX languages. He is passionate about using his expertise to help businesses succeed and deliver exceptional results for their customers.
Demo Session Title: Achieving Flexibility and Speed with Schema-on-Read Architecture: Moving Beyond SQL and RDBMS
Abstract:
Beacon Analytics helps customers transition from rigid and monolithic data solutions to flexible microservices architecture, enabling better performance and faster access to critical information. By breaking up data into smaller, independent services, customers gain greater access and modification capabilities. The team recommends using the Polars library, which is based on Apache Arrow, in combination with Dash Plotly to create easy to maintain, high-performance solutions at an excellent price-to-performance ratio. Join Danny Bharat, Senior Vice President of Analytics at Cedric Millar and co-founder of Beacon Analytics, as he shares how his team’s innovative approach to data solutions allows them to build comprehensive 360° intelligence and deliver actionable insights. Beacon Analytics empowers customers to achieve success in a rapidly changing business and technology landscape by utilizing schema-on-read approaches, unstructured data storage, and on-the-fly analysis and transformation.

Kristen Kehrer
Kristen is a Developer Advocate at CometML. Since 2010, Kristen has been delivering innovative and actionable statistical modeling solutions in the utilities, healthcare, and eCommerce industries. Kristen was a LinkedIn Top Voice – Data Science & Analytics in 2018. Previously Kristen was Faculty/SME at Emeritus Institute of Management and Creator of Data Moves Me, LLC. Kristen holds an MS in Applied Statistics from Worcester Polytechnic Institute and a BS in Mathematics.
Session Title: On the Scent: Detecting Dogs on Edge Devices With YOLOv8 and Comet
Abstract:
Proper tracking is crucial for ensuring the reproducibility of results obtained during model development and fostering effective collaboration among multiple developers on a machine learning project. In this talk, Kristen will discuss the process of developing a dog detection system using YOLOv8 on edge devices and the role of Comet, an experiment management platform, in handling the intricacies of the project.
Kristen will guide you through the entire process, from generating a data artifact to deploying the model, emphasizing the benefits of utilizing Comet at each stage. She will showcase how Comet was employed to monitor experiment metrics, visualize model performance, and illustrate the ease with which the selected model can be tracked in production. Participants will gain valuable insights on how to leverage an experiment tracking and monitoring solution like Comet to enhance their model development process, making it more transparent and reproducible.

Eric Vogelpohl
Eric Vogelpohl is the Managing Director of Tech Strategy at Blueprint. He’s a proven IT professional with more than 20 years of experience and a high degree of technical and business acumen. He has an insatiable passion for all-things-tech, pro-cloud/SaaS, leadership, learning, and sharing ideas on how technology can turn data into information & transform user experiences.
Session Title: Top 5 Cool Tricks of Delta for Data Scientists – Why Your Data Lake Should be a Delta Lake
Abstract:
In this 25-minute demo, we will explore the top 5 cool tricks of Delta for data scientists and discuss why your data lake should be a Delta Lake. Delta Lake is an open-source storage layer that brings reliability to data lakes by providing ACID transactions, scalable metadata handling, and data versioning. We will first introduce the concept of Delta Lake and explain how it helps data scientists to manage their data pipelines with ease. We will then dive into the top 5 cool tricks of Delta Lake, which include performance optimizations, time travel, schema enforcement, automatic data merging, and data validation. We will demonstrate these tricks using real-world examples and show how they can simplify your data pipeline and reduce your development time. By the end of this talk, you will have a better understanding of Delta Lake’s features and how it can help you to manage your data lake efficiently. You will also have learned about the benefits of using Delta Lake and why it’s a must-have for data scientists working with large data sets.

Bernard Kleynhans
Bernard is a Director in the AI Center of Excellence at Fidelity Investments working on personalization and recommender systems. His work is primarily concentrated in recommender systems and optimization, and he regularly presents on these topics, most recently at the IJCAI’21 and CPAIOR’21 conferences. He is the lead developer of the open-source libraries Selective, MABWiser and Mab2Rec. He holds an MS in Computational Science and Engineering from Harvard University.
Mab2Rec: A Modular Approach to Building Bandit-based Recommenders(Lightning Talks)

Minsoo Thigpen
Minsoo is a Senior Product Manager at Microsoft Azure Machine Learning designing and building out Responsible AI tools for data scientists. She’s worked with OSS tools such as InterpretML, Fairlearn, Responsible AI Toolbox and contributed to the UX of the Responsible AI dashboard now released in Azure Machine Learning. She has bachelor’s degrees in Applied Mathematics and Painting from Brown University and Rhode Island School of Design (RISD). Coming from an interdisciplinary background with experience in building machine learning models and products, analyzing data, and designing UX, she is always finding work at the intersection of AI/ML, design, and social sciences to empower data and ML practitioners to work ethically and responsibly end-to-end.

Raghu Marwaha
Raghu Marwaha has been in the IT industry for the past 25 years. His recent interest in AI has led to his quest to find the best tools for various applications that could benefit from the recent advances in AI. As a director at IntraEdge, he oversees multiple IT projects, products and teams.
Session Title: Easy tools for Business Professionals to quickly build Ontologies, Data Taxonomies, and Keyword Lists.
Abstract:
Yet another meeting to discuss hairsplitting details of your data taxonomies and keyword lists?
It shouldn’t take a team of domain experts, Excel specialists, Python developers and Data Scientists weeks or months to build it. It is a simple problem that requires a simple solution.
You should be able to quickly and accurately analyze contracts, customer comments and any other text-based content while easily building explainable NLP models.
Stop scrubbing through volumes of data to find key examples and then reducing the content to specific keywords and variations.
Join us as we explore a new and exciting solution using human language to easily develop ontologies, data taxonomies and keyword lists, which you can share across your business with just a few simple clicks.
Accelerate these NLP tasks in every project and help eliminate those long, drawn-out meetings to discuss keywords for data taxonomies. Unless you enjoy those nuisance meetings 🙂

Guillaume Moutier
Guillaume Moutier is a Sr. Principal Data Engineering Architect at Red Hat, focusing his work on data services, AI/ML workloads and data science platforms. Former Project Manager, Architect and CTO for large organizations, he is constantly looking for and promoting new and innovative solutions, always with a focus on usability and business alignment brought by 20 years of IT architecture and management experience. When he’s not tinkering with IT, electronics or other high tech toys, Guillaume plays music (guitar, bass, drums, keyboards), is a video games enthusiast, and a reading addict.
AI/ML, Edge Computing and 5G in Action: Anatomy of an Intelligent Agriculture Architecture!(Workshop)

Tom Corcoran
Tom Corcoran is a Principal Solution Architect at Red Hat.
Tom’s areas of specialty are Red Hat Application Services, AI-ML Ops, and Openshift. Having held this role with Red Hat for the past five years, Tom has managed projects throughout Europe, the US, and ANZ.
Tom has over 15 years of experience in Java development across a variety of industry sectors and geographies, which ensures deep technical expertise across Red Hat’s products and solutions spanning application and AI/ML workloads. His extensive experience lends a sharp focus to the solution architect role and he brings a passion for delivering outstanding technical and business value to Red Hat’s customers’ projects and products.

Mayank Kasturia
Mayank Kasturia is a Senior Sales Engineer at Precisely responsible for developing interactive demo applications, demonstrating proof of concepts (POC), and implementing solutions for customers using Precisely’s Geo Addressing, Spatial Analytics and Data Enrichment capabilities in big data and cloud-native environments.
Session Title: Leverage Geospatial Data and Machine Learning to Discover Hidden Insights – A Live Demo
Abstract:
Location intelligence can provide valuable insights by leveraging geospatial data in machine learning. This demonstration will showcase how machine learning and location information can work together to help organizations extract more value from their data. We will use a comprehensive suite of geocoding, spatial analytics, and data enrichment capabilities to visualize and analyze data, identify patterns, and derive insights that can be used to make informed business decisions.
In this session, we will use Amazon SageMaker to train a machine-learning model using property attributes, historical weather data, and fire data. The goal will be to predict the fire risk for a property. You will see how quickly and efficiently we can build and train a machine-learning model using various algorithms, such as decision trees and neural networks, to find the best approach for our dataset. While this demo example will highlight fire predictions, these location intelligence solutions can be applied across multiple industries, including financial services, telecommunications, insurance, retail, real estate, and more. We look forward to discussing how you can leverage geospatial data in your machine-learning models.

Sean Martin
Sean’s experience covers multiple aspects of starting and growing a software company, including holding various titles from President through to co-lead dish washer. He continues in a leadership role as CTO and serves on the board.

Kaitlyn Abdo
Kaitlyn Abdo is an Associate Technical Marketing Manager, working on technical enablement surrounding the AI/ML products and services at Red Hat. She has been at Red Hat for 2 years, and is interested in discovering and learning about new and innovative solutions in the AI/ML space. In her free time, Kaitlyn enjoys building Legos, cooking and spending time with animals.

Vatsala Sarathy
A multi-faceted leader with in-depth experience in technology, marketing, finance, and operations, Vatsala strives to connect the dots between strategy and execution. She works at Stanford University’s Graduate School of Business as the Managing Director of Technology and Finance for Executive Education. As an ICF-credentialed leadership coach, she works with people–especially women executives, entrepreneurs, and youth–to uncover possibilities within themselves that even they are surprised by. She is an award-winning speaker and panel moderator, and has most recently spoken at the Women in Technology Conference, Argyle Forum, Dreamforce, and Stanford’s IT Unconference.
Data-curiosity: How to Create and Nurture a Data-curious Culture in your Organization(Career Talk)

Colleen Molloy Farrelly
Colleen M. Farrelly is a lead data scientist whose expertise spans generative AI, topological data analysis, network science, and NLP, among others. She’s recently focused her research on the geometry of generative AI models and how this impacts their performance on tasks such as bias detection, and her volunteer work includes mentoring African machine learning students. She and Dr. Yae Gaba are the authors of The Shape of Data, an overview of machine learning from a geometric perspective.

Matthew Dzugan
Matt is Director of Data Science with over a decade of experience solving complex business problems with data, modeling and simulation. Over the past year in his tenure at project44, Matt has been scaling the data science team from a few disparate efforts to a full department of 30 team members around the globe. The data science team at project44 uses the billions of shipments that are tracked through project44’s platform to extract insights that help customers make data-driven decisions: everything from “estimated time of delivery” to “impact of the latest disruptions”. Project44’s data science team uses state-of-the-art Machine Learning techniques to capture the dynamic trends and patterns of today’s supply chain. Despite the pandemic and the global nature of Matt’s team, the data scientists at project44 routinely hold “virtual whiteboarding sessions” where they brainstorm, trade ideas about statistical techniques, and also discuss their latest Netflix favorites.

Allen Roush
Allen is a Principal Machine Learning Architect and AI Researcher working for Oracle Cloud Infrastructure.
Enabling MLOps at scale with Oracle Cloud(Talk)
Session Title: Open Source Generative AI: The Future of Game Asset Creation on Oracle Cloud
Abstract:
Oracle Cloud Infrastructure (OCI) is proud to showcase a new product demo of Stable Diffusion for game content creation using popular user interfaces and the 3D modeling tool Blender. OCI’s demo of Stable Diffusion is powered by NVIDIA A10 Tensor Core GPUs in the Oracle cloud. Stable Diffusion, an innovative deep learning model released in 2022, has been primarily used for generating detailed images based on text descriptions. However, its capabilities extend to creating game textures, models, depthmaps, skins, and other game content. Diffusion models can even be utilized for other modalities, enabling tasks as diverse as music generation. The combination of Stable Diffusion and Blender allows artists to create high-quality game assets with complete control over the creative process, while benefitting from quicker creative iterations. Artists can further train Stable Diffusion on their individual styles and develop complete workflows that allow for greater creative freedom and flexibility in game development.

Cassie Thompson
Cassie is a Senior Product Marketing Manager at CloudFactory, serving as the bridge between technical expertise and creative communications for their AI data labeling products. She holds an MBA from North Carolina State University and has spent her career in product marketing roles for B2B technology companies.

Mohammad Soltanieh-ha, PhD
Mohammad Soltanieh-ha is a Clinical Assistant Professor in the Information Systems department at Boston University’s Questrom School of Business. He specializes in data science programming, big data analytics, and business applications. He earned his Ph.D. in computational physics from Northeastern University and currently focuses his research on computer vision applications in cancer diagnosis, macroeconomic forecasting, and high-performance computing. Mohammad holds leadership roles at Google and the American Physical Society (APS). He founded APS’s data science unit in 2019 and serves as a Faculty Expert at Google Cloud, where he supports cloud computing education and best practices for fellow faculty members.
Google Cloud Big Data Essentials(Tutorial)

Matt Beale
Originally from Cambridge, Matt now helps clients move to a data-centric ML approach, having worked with clients across autonomous vehicles, green energy and fintech whilst providing meaningful work in the developing world. Away from work, Matt has a passion for photography, traveling and unusual cars. In fact, his passion for unusual cars led him to import a Nissan Stagea from Japan to the UK.
Train and Sustain: Why data leaders need to pay attention to HITL(Talk)

Dr. Aaron Cheng
Aaron is currently the Vice President of Data Science and Solutions at dotData. As a data science practitioner with 14 years of research and industrial experience, he has held various leadership positions in spearheading new product development in the fields of data science and business intelligence. At dotData, Aaron leads the data science team in working directly with clients and solving their most challenging problems.
Prior to joining dotData, he was a Data Science Principal Manager with Accenture Digital, responsible for architecting data science solutions and delivering business value for the tech industry on the West Coast. He was instrumental in the strategic expansion of Accenture Digital’s footprint in the data science market in North America.
Aaron received his Ph.D. degree in Applied Physics from Northwestern University.
How Programmatic Feature Discovery Changes the Data Science Workflow(Track Keynote)

Roozbeh Davari
Roozbeh Davari is a highly experienced data scientist and technology leader with a diverse background in research, development, software engineering, and product management. He has a track record of developing and deploying innovative solutions that leverage AI and data science to solve complex business problems. He holds a Ph.D. in Astrophysics from the University of California, Riverside and Carnegie Observatories. Currently, he serves as the Director of Data Science at Aisera, a software company that provides an AI-driven service management platform. Prior to joining Aisera, he worked as a Data Scientist at The Honest Company and Happy Money, where he built predictive models and data analytics solutions for various business applications.
NLP for AIOPS: Leveraging Natural Language Processing to Automate and Optimize IT Operations(Lightning Talk)

Danica Fine
Danica Fine is a Senior Developer Advocate at Confluent where she helps others get the most out of Kafka and their event-driven pipelines. In her previous role as a software engineer on a streaming infrastructure team, she predominantly worked on Kafka Streams- and Kafka Connect-based projects to support computing financial market data at scale. She can be found on Twitter, tweeting about tech, plants, and baking @TheDanicaFine.
Practical Pipelines: A Houseplant Alerting System with ksqlDB(Talk)

Rebecca Vislay-Wade, PhD
Rebecca Vislay-Wade is a Principal Data Scientist at Moderna, where she leads a team of scientists developing AI applications for clinical operations, regulatory science, and pharmacovigilance. Prior to Moderna, she worked as Senior Research Data Scientist at Highmark Health. Rebecca holds a PhD in biochemistry from Harvard University and did postdoctoral work in neuroscience at the NIH and Children’s National Medical Center in Washington, DC. She currently lives in the Boston area with her family.
Data Science @ Moderna: Accelerating Regulatory Communication with Natural Language Processing(Talk)

Cal Al-Dhubaib
Cal Al-Dhubaib is a data scientist, entrepreneur, and innovator in responsible artificial intelligence, specializing in high-risk sectors such as healthcare, energy, and defense. He is the founder and CEO of Pandata, a consulting company that helps organizations to design and develop AI-driven solutions for complex business challenges. Their clients include globally recognized organizations like the Cleveland Clinic, Progressive Insurance, University Hospitals, and Parker Hannifin.
Cal frequently speaks on topics including AI ethics, change management, data literacy, and the unique challenges of implementing AI solutions in high-risk industries. His insights have been featured in numerous publications such as Forbes, OhioX, the Marketing AI Institute, Open Data Science, and AI Business News. Cal has also received recognition among Crain’s Cleveland Notable Immigrant Leaders, Notable Entrepreneurs, and most recently, Notable Technology Executives.

Katie Roberts, PhD
Katie is a Data Science Solution Architect at Neo4j. She completed her degree in Cognitive Neuroscience at Harvard University. Passionate about people and problem solving, she transitioned to focusing on helping people and businesses leverage data for impactful outcomes. As a customer-facing data scientist, she has had the opportunity to work with large and small organizations across a variety of industries. At Neo4j she helps teams up-level their data science practice with graph data science.
Unlock Hidden Signals in Your Data with Graph Data Science(Talk)

Rajeev Prabhakar
Rajeev Prabhakar is a Machine Learning Platform Engineer at Lyft. Currently he is focused on building model observability at scale for a wide range of ML applications across teams at Lyft. Prior to Lyft, he worked at Quantcast on the ML platform team. There, his work included enabling distributed computing with Spark and notebooks on k8s, building control systems for optimal spend budget allocation, and optimising real-time prediction latency in a low-latency serving environment.
Being well informed: Building a ML Model Observability Pipeline(Tutorial)

Elijah Meeks
Elijah Meeks is a co-founder and Chief Innovation Officer of Noteable, a startup focused on evolving how we analyze and communicate data. He is known for his pioneering work in the digital humanities while at Stanford, where he was the technical lead for acclaimed works like ORBIS and Kindred Britain. He was Netflix’s first Senior Data Visualization Engineer, and while at Netflix and Apple worked to develop the charting library Semiotic as well as bring cutting-edge data visualization techniques to analytical applications for stakeholders across the organization including A/B testing, conversation flows, algorithms, membership, people analytics, content, image testing and social media. He is a prolific writer, speaker and leader in the field of data visualization and the co-founder and first executive director of the Data Visualization Society.
The Future Is Notebooks(Talk)

Kshetrajna Raghavan
Kshetrajna is a Staff Data Scientist at Shopify working in the Merchant Services Org. Over the last 10 years of his career he has built and productionalized many ML models in various domains including retail, ad-tech and healthcare. His interests are mainly applied ML and ML systems, and he enjoys solving complex problems to help use machine learning at scale. Outside of work, Kshetrajna loves to spend time with his dogs, play music on his guitar, and is an avid gamer.
Product Classification with Structured Metadata for Online Retail(Talk)

Sydney Beckett
Sydney “Syd” became a graph enthusiast through her work with clients to build graph-based solutions as well as supporting data science teams during her time at Deloitte and Accenture. Now she uses her graph expertise to help customers realize the value of graph technology for their organization. She also contributes by teaching Neo4j graph database and data science training classes. Syd’s hobbies include interior design and defeating her car navigation system’s estimated drive time.
Session Title: Unlocking the Value of Graph Data Science in the Age of AI
Abstract:
Thinking about incorporating relationships into your data to improve predictions and machine learning models? Maybe you are creating a knowledge graph or looking for a way to improve customer 360, fraud detection, or supply chain performance. Relationships are highly predictive of behavior. With graphs, they’re embedded in the data itself, making it easy to unlock and add predictive capabilities in your existing practices.
Join us for a demo to learn why graph databases are a top choice for scalable analytics, intelligent app development and advanced AI/ML pipelines. We’ll showcase graphs using Neo4j’s enterprise-ready graph data platform. You’ll see firsthand how easy it is to get started and we’ll highlight a graph use case using Neo4j’s cloud platform for Graph Data Science. All attendees will get a link to download and try Neo4j for free using your own data.

Andrew Cheesman
Andrew is the head of data science at Bigeye, a data observability company. Prior to joining Bigeye, Andrew built ML-powered tools for Citi and (as a consultant) a range of top consumer banks; he specialized in pricing and underwriting problems. In his free time, Andrew enjoys cooking, travel, and using his TVR Chimaera to escape New York.
Human-in-the-Loop: Strategies for Improving Time Series Anomaly Detection(Talk)
Session Title: Data Observability for Data Science Teams
Abstract:
When putting models into production it’s critical to know how they’re performing over time. As the last mile of the data pipeline, models can be impacted by a variety of issues, often outside the control of the data science team. “Observability” promises to help teams detect and prevent issues that could impact their models—but what is observability vs. data observability vs. ML observability? Get practical answers and recommendations from Kyle Kirwan, former product leader for Uber’s metadata tools, and founder of data observability company, Bigeye.

Afrah Shafquat, PhD
Afrah Shafquat is a Senior Data Scientist II at Medidata, a Dassault Systèmes company, where she leads synthetic data solutions in clinical trials. At Medidata, her work focuses on innovative solutions to generate synthetic data, synthetic data evaluation (fidelity and privacy metrics), and new use cases for synthetic data. She has a Ph.D. in Computational Biology from Cornell University and an S.B. in Biological Engineering from Massachusetts Institute of Technology.
Revolutionizing Healthcare with Synthetic Clinical Trial Data(Talk)

Zain Hasan
Zain Hasan is a Senior Developer Advocate at Weaviate, an open-source vector database. He is an engineer and data scientist by training, who pursued his undergraduate and graduate work at the University of Toronto, building artificially intelligent assistive technologies for elderly patients. He then founded his company, developing a digital health platform that leveraged machine learning to remotely monitor chronically ill patients using data from their medical devices. More recently, he practised as a consultant senior data scientist in Toronto. He is passionate about the field of data science and machine learning and loves to share his love for the field with anyone interested in the domain.

Oluleye H Babatunde, Ph.D
Dr Hezekiah O Babatunde is a faculty member at the University of Virginia’s College at Wise, VA, USA and a Machine Learning Consultant. He completed his PhD in Computer Science at the School of Computer and Security Science, Edith Cowan University, Perth, WA, Australia in 2015. He is a certified Big Data Consultant and Scientist from Arcitura Certification. He also holds a B.Sc degree in Mathematical Sciences (computer major) and three MSc degrees in Applied Mathematics, Computer Sciences and Organizational Leadership from FUNNAB, University of Ibadan and Charleston Southern University (USA) respectively. He worked as a Postdoctoral Research Associate in Systems Biology at Professor John Yin’s Laboratory at the University of Wisconsin, Madison, USA.
Annexing MATLAB Map-Reduce Capability for Big Data Analytics(Tutorial)

Drazen Dodik
Bio Coming Soon!
Session Title: Driving AI Forward: Continental Tire’s Journey to MLOps Excellence
Abstract:
In this session, we will hear from Continental Tire about their journey towards implementing MLOps since 2015. We will explore how they enable data scientists from diverse backgrounds to easily build models with the languages, frameworks, and tools they are comfortable with.
The session will delve into the challenges faced by Continental Tire’s data science teams, and the strategies they have used to address them. Additionally, the session will cover important considerations for those starting on their MLOps journey, including what to keep in mind when building infrastructure and workflows for data science projects.
The session will conclude with a demo and overview of the Valohai platform, which has been used by Continental Tire to streamline their MLOps workflows.

Lior Durahly
Lior Durahly is a data and ML engineer at Superwise, where he is responsible for researching and developing monitoring capabilities related to Responsible AI, including feature importance, fairness, and explainability. He is also the key contributor to the OSS package Elemeta, a meta-feature extractor for NLP and vision. Prior to Superwise, Lior held positions as a software and data engineer at APM observability leader Coralogix and a data science engineer at the Israeli Defense Forces 8200 intelligence unit. He is currently in the second year of a B.Sc. in Computer Science (with a focus on Data Science) at the Open University of Israel. He’s also passionate about physics and medicine and how they intersect with artificial intelligence. In his free time, Lior studies violin, a new passion that he picked up only last year, or can be found hunting for eateries in Tel Aviv with Asian food or unique themes.
Session Title: Introducing Elemeta: OSS meta-feature extractor for NLP and vision
Abstract:
With DALLE and ChatGPT, we have reached incredible capabilities and results, fundamentally changing our ability to tap into and leverage unstructured data in machine learning. With that said, general architectural understanding and intuition into how these models make decisions do not translate into minute detail interpretability.
We’re at a crossroads. This new “breed” of ML applications is here to stay, and unstructured data is only growing, but they are black boxes, and black boxes fail silently. So how can we as practitioners leverage NLP and vision while enjoying similar monitoring, interpretability, and explainability available to their tabular counterparts?
In this talk, we will introduce Elemeta, our OSS meta-feature extractor library in Python, which applies a structured approach to unstructured data by extracting information from text and images to create enriched tabular representations. With Elemeta, practitioners can utilize structured ML monitoring techniques in addition to the typical latent embedding visualizations and engineer alternative features to be utilized in simpler models such as decision trees.
In this talk, we’ll introduce you to Elemeta through a live notebook example and explain how it can be applied to text and images.

Hiro Kobashi
Hiro Kobashi is the head of the Artificial Intelligence Division at Fujitsu Research of America, where he leads a team of researchers in both the United States and Japan working on AutoML (Automation for Machine Learning) to realize sustainable and efficient AI creation. He joined Fujitsu in 2003 and has worked at Fujitsu research organizations in both Japan and the United Kingdom. His research interests include artificial intelligence, machine learning, and distributed systems.
Session Title: Fujitsu AI Innovation Platform: Advanced AI Technologies Ready for Customer Adoption
Abstract:
In this session we will introduce Fujitsu’s unique and advanced AI technologies, which are being demonstrated at the Fujitsu booth as part of the Fujitsu AI Innovation Platform. The first is Actlyzer technology, which can automatically sense human behavior and relationships between people and the environment, and predict future actions via human and context sensing, to support applications in many industries including retail, security, and manufacturing. Secondly, we will present Fujitsu’s AutoML technology for structured data, which creates high-quality ML models quickly with less data and limited resources and automatically generates production-ready ML code, accelerating AI adoption by enterprises. Finally, we will present the Galileo XAI solution, jointly developed by Fujitsu and our partner Larus. Galileo XAI enables extraction of insights with built-in explainability from graphs, which are ubiquitous in today’s connected world, leading to several practical applications including fraud detection, business process optimization, pandemic tracking, and threat analysis.

Anindya Saha
Anindya Saha is a Staff Machine Learning Platform Engineer at Lyft, focusing on distributed computing solutions for machine learning and data engineering. He led and implemented Spark on Kubernetes support on the ML platform for feature engineering at scale with ephemeral Spark clusters on k8s. He is currently working on enabling scalable distributed model training on the ML platform.
Being well informed: Building a ML Model Observability Pipeline(Tutorial)

Ido Michael
Ido Michael, a seasoned data engineering and science professional, co-founded Ploomber & JupySQL with the mission of empowering data scientists to build faster and more efficient solutions. Prior to this, he led data engineering and science teams at Amazon Web Services (AWS), where he played an instrumental role in building hundreds of data pipelines during various customer engagements, working closely with his team.
A proud alumnus of Columbia University, Ido moved to New York to pursue his Master’s degree in Computer Engineering. It was during his time at Columbia that he identified the challenges in working with multiple data sources and Jupyter notebooks for reliable model development. This realization inspired him to concentrate on building Ploomber, a platform designed to address these issues and streamline the data science workflow.
SQL driven ML(Talk)

Jon Malloy
Jon Malloy is a Data Strategist at Snowplow, where he is responsible for helping customers get the most value from their pipeline and derive meaningful insights. Prior to joining Snowplow, Jon spent 4 years as a Technical Analyst in the US health care communications industry and 4 years as a Data Scientist in the US health care communications / finance industry. He holds a Master of Science in Business Analytics from Bentley University and resides in Boston, MA.
Allow Data Scientists to Seize the Means of Production(Talk)

Adam Ross Nelson
Dr. Adam Ross Nelson is a career coach and a data science consultant. As a career coach he helps others enter and level up in data related professions. As a data science consultant he provides research, data science, machine learning, and data governance services. Previously, he was the inaugural data scientist at The Common Application, which provides undergraduate college application platforms for institutions around the world. He holds a PhD from The University of Wisconsin – Madison in Educational Leadership & Policy Analysis. Adam is also formerly an attorney with a history of working in higher education, teaching all ages, and working as an educational administrator. Adam sees it as important for him to focus time, energy, and attention on projects that may promote access, equity, and integrity in the field of data science. This commitment means he strives to find ways for his work to challenge systemic oppression, injustice, and inequity.
For Data’s Rising Stars: How Individual Contributor Data Science Pros Can Amplify Their Impacts(Career Talk)

Alison Cossette
Alison Cossette is a dynamic Data Science Strategist, Educator, and Podcast Host. As a Developer Advocate at Neo4j specializing in Graph Data Science, she brings a wealth of expertise to the field. With her strong technical background and exceptional communication skills, Alison bridges the gap between complex data science concepts and practical applications.
Alison’s passion for responsible AI shines through in her work. She actively promotes ethical and transparent AI practices and believes in the transformative potential of responsible AI for industries and society. Through her engagements with industry professionals, policymakers, and the public, she advocates for the responsible development and deployment of AI technologies.
Alison’s academic journey includes pursuing her Master of Science in Data Science program, specializing in Artificial Intelligence, at Northwestern University and research with Stanford University Human-Computer Interaction Crowd Research Collective. Alison combines academic knowledge with real-world experience. She leverages this expertise to educate and empower individuals and organizations in the field of data science.
Overall, Alison Cossette’s multifaceted background, commitment to responsible AI, and expertise in data science make her a respected figure in the field. Through her role as a Developer Advocate at Neo4j and her podcast, she continues to drive innovation, education, and responsible practices in the exciting realm of data science and AI.
Bridging the Gap: Light Code Solutions to Uniting Social Science and Modern Knowledge Graphs(Workshop)

Andreas Spanner
Andreas leads Cloud Strategy & Transformation for Red Hat across Australia & New Zealand. His hands-on experience in startups as well as large scale enterprise transformation programs has given Andreas a solid understanding of business drivers and value creation. Andreas has worked on a wide range of initiatives across different industries in Europe, North America and APAC including full-scale ERP migrations, HR, finance and accounting, manufacturing, supply chain logistics transformations and scalable core banking strategies to support regional business growth. Since joining Red Hat in 2015, Andreas has focused on helping Red Hat customers build the necessary capabilities and make the best-fit technology, methodology and architecture choices to be a successful digital competitor. Andreas is part of Red Hat’s global #redhatchiefs network and works closely with the CTO office on engineering topics related to emerging technologies. Andreas got his first Commodore 64 when he was 12 years old and started to work as a software developer in 1996 with Krauss-Maffei in Munich building full mission simulators. Andreas holds an Engineering degree from the University of Ravensburg, Germany.

Shan Chidambaram
Shan is a data analytics, AI, and management consulting practitioner with over 20 years of experience in solving complex business problems for clients through innovative technology solutions. Shan joined Fujitsu in 2017 to grow and lead the data analytics business at Fujitsu North America. In his current role as the Head of AI offerings, Shan is responsible for shaping the AI go-to-market strategy, and product offerings, and for promoting AI adoption amongst Fujitsu’s clients. Shan has published thought leadership articles/whitepapers for the Fujitsu Global blog on technology and industry topics such as AI Enabled Trusted Society, AI and Advanced Analytics, Mobility Industry, and Smart Cities. Shan lives in Dallas, Texas, and is an avid NBA fan and motorcyclist.

Alexander Antony, Ph.D.
Alex Antony is a senior staff data scientist at GE Aerospace, where he leads modeling, analytic development, reporting, and forecasting for the market intelligence function. He has 10 years of experience in the data science field and 15 years of experience working with the Department of Defense. He holds an MS in Applied Statistics and a PhD from Indiana University, where he focused on Computational and Quantitative Social Science.
Can You Forecast the Next Two Weeks? How about the Next 20 Years?: Digital Transformation in Market Forecasting at GE Aerospace(Lightning Talk)

Carol Willing
Carol Willing is the VP of Engineering at Noteable, a three-time Python Steering Council member, a Python Core Developer, PSF Fellow, and a Project Jupyter core contributor. In 2019, she was awarded the Frank Willison Award for technical and community contributions to Python. As part of the Jupyter core team, Carol was awarded the 2017 ACM Software System Award for Project Jupyter’s lasting influence. She’s also a leader in open science and open-source governance serving on Quansight Labs Advisory Board and the CZI Open Science Advisory Board. She’s driven to make open science accessible through open tools and learning materials.
The Future Is Notebooks(Talk)

Max Cembalest
Max Cembalest is a researcher at Arthur focused on simplifying and explaining machine learning models. Previously, he received an M.S. in Data Science from Harvard University, where he concentrated on interpretability and graph-based models. He is particularly excited about recent advances in applying abstract algebra, topology, and category theory to neural network design.
Reckoning with the Disagreement Problem: Post-Hoc Explanation Agreement as a Training Objective(Talk)

Viraj Parekh
Bio Coming Soon!
Why Orchestration and Airflow is the secret ingredient in MLOps(Lightning Talk)

Aishwarya Naresh Reganti
Bio Coming Soon!
Building Robust Graph Embeddings for Massive Real World Graphs(Talk)

Amber Roberts
Amber Roberts is an ML Growth Lead at Arize AI, an ML observability company built for maintaining models in production. Previously, Amber was a product manager of AI at Splunk and the Head of Artificial Intelligence at Insight Data Science. A Carnegie Fellow, Amber has an MS in Astrophysics from the Universidad de Chile.
Troubleshooting Large Language Models in Production with Embeddings and Evals(Talk)

Kenny Dahill
Kenny’s experience in startup product management and marketing led him to join the Gram X team. He became interested in the AI industry because of its potential impact on every industry, and he believes that the most successful companies will owe their success to leveraging AI. Fun fact: Kenny has been to 43 states.
Session Title: Easy tools for Business Professionals to quickly build Ontologies, Data Taxonomies, and Keyword Lists.
Abstract:
Yet another meeting to discuss nuisance hairsplitting details for your data taxonomies and keywords list?
It shouldn’t take a team of domain experts, Excel specialists, Python developers and Data Scientists weeks or months to build it. It is a simple problem that requires a simple solution.
You should be able to quickly and accurately analyze contracts, customer comments and any other text-based content while easily building explainable NLP models.
Stop scrubbing through volumes of data to find key examples and then reducing the content to specific keywords and variations.
Join us as we explore a new and exciting solution using human language to easily develop ontologies, data taxonomies and keyword lists, which you can share across your business with just a few simple clicks.
Accelerate these NLP tasks in every project and help eliminate those long-drawn meetings to discuss keywords for data taxonomies. Unless you enjoy those nuisance meetings 🙂

Oren Netzer
Oren is a serial entrepreneur with 23 years of experience. In 2007, Oren founded DoubleVerify (NYSE: DV), which pioneered the ad verification category and grew to become a global leader in advertising measurement and analytics. In 2012, as CEO of DoubleVerify, Oren received the distinguished Technology Pioneers Award from the World Economic Forum in Davos. Later, Oren went on to start cClearly, an advertising optimization company, before starting his most recent company, DataHeroes.
Session Title: For Better ML Models, Use Less Data…
Abstract:
As companies collect larger and larger amounts of data and apply more complex ML models, the time and resources required to build and maintain the models continue to grow. In fact, in the past decade, training compute time has been growing at a staggering pace of 10x per year! But do we really need so much data to build better models?
In this talk, we will walk you through DataHeroes’ Python-based framework that uses a unique data reduction methodology to reduce your dataset size by orders of magnitude while maintaining the statistical properties and corner cases of the full dataset. We will demonstrate how having the reduced dataset makes it significantly easier and faster to clean the data and train and tune the model, which produces a better and more accurate model, at a fraction of the time and cost.

Eitan Netzer
Eitan has 10 years of experience as a data scientist and has taught data science and machine learning at the Technion, Israel’s Institute of Technology. Prior to his data science career, Eitan served as a Systems Engineer in the Israeli Military Forces 8200 Unit.
Session Title: For Better ML Models, Use Less Data…
Abstract:
As companies collect larger and larger amounts of data and apply more complex ML models, the time and resources required to build and maintain the models continue to grow. In fact, in the past decade, training compute time has been growing at a staggering pace of 10x per year! But do we really need so much data to build better models?
In this talk, we will walk you through DataHeroes’ Python-based framework that uses a unique data reduction methodology to reduce your dataset size by orders of magnitude while maintaining the statistical properties and corner cases of the full dataset. We will demonstrate how having the reduced dataset makes it significantly easier and faster to clean the data and train and tune the model, which produces a better and more accurate model, at a fraction of the time and cost.

Ruth Yakubu
Ruth Yakubu is a Principal Cloud Advocate at Microsoft. Ruth specializes in Java, Advanced Analytics, Data Platforms and Artificial Intelligence (AI).
In addition, she has been a tech speaker at several conferences, including Microsoft Ignite, O’Reilly Velocity, Devoxx UK, Grace Hopper Dublin, TechSummit, Web Summit and numerous other developer conferences. Prior to Microsoft, she worked for companies such as Unisys, Accenture and DIRECTV, where she gained extensive experience in software architectural design and programming. She was awarded DZone.com’s Most Valued Blogger.

Bhaktipriya Radharapu
Bhakti is a Responsible AI Tech Lead at Google Research, where she develops fair, safe, and robust AI systems. She has spearheaded numerous projects at Google, including YouTube, Maps, Android, and Ads, making significant advancements to ensure that ML in these applications is fair, transparent, and safe for all. She is also a strong supporter of open-source technology and is the maintainer of several offerings in the TF Responsible AI toolkit, used globally by developers in the industry to make their ML workflows more responsible.
The 10 Ways Machine Learning Systems Can Fail and How to Avoid Them(Talk)

Alex Sherstinsky
Alex Sherstinsky is a staff machine learning and data products engineer on the team developing the core platform of Great Expectations, the leading open source data quality platform. Previously, Alex developed augmented intelligence systems that harness machine learning and gig work models to transform and scale customer service at Directly, Inc. He was a product and technical co-founder at GrowthHackers.com and Qualaroo, and a product/engineering executive at other venture capital-backed startups. Alex earned his Ph.D. in machine learning from MIT, with research conducted at the Media Lab. His scientific publications appear in refereed journals and conference proceedings; he holds 5 U.S. patents.

Haritha Yanam
Haritha Yanam is the Director of Data Science, Innovation at Liberty Mutual Insurance, where she focuses on building AI/ML solutions that help mitigate insurance risk. Before joining Liberty, Haritha worked at several Fortune 500 companies leading data science and data engineering teams. Haritha has a strong background in data science/machine learning, data engineering and data analytics. She enjoys teaching and currently works as a part-time adjunct professor at the University of Maryland Baltimore, teaching data science.
Let’s Build Explainable AI!!(Lightning Talk)

Peter Hunt
Pete joined Elementl as head of engineering in early 2022, and took over the reins as CEO in November of that year. Pete was previously co-founder and CEO of Smyte, an anti-abuse provider that was acquired by Twitter. Prior to this, Pete led Instagram’s web team, built Instagram’s business analytics products, and helped to open source Facebook’s React.js.

Christy King
Bio Coming Soon!
Exploring Pharma Industry Voices to Drive Insights(Women Ignite)

Aneeta Xavier
Bio Coming Soon!
From Analyst to Authority: Developing Early Career Leadership Skills(Women Ignite)

Sergey Yurgenson
Sergey is a data scientist with a background in physics and neurobiology. FeatureByte is Sergey’s second startup. He was one of the first employees at DataRobot where he created and led a professional services group and helped the company grow into a unicorn. Sergey is widely known for being a Kaggle Grandmaster and holding the #1 rank on Kaggle in the past. Multiple times he was mentioned as one of the top data scientists by various publications. Sergey’s passion is in machine learning, predictive modeling and inventive feature engineering.
Integrating Language Models for Automating Feature Engineering Ideation(Talk)

Razi Raziuddin
Razi Raziuddin is the Co-Founder and CEO of FeatureByte. His analytics and growth experience spans the leadership team of two unicorn startups. Razi helped scale DataRobot from 10 to 850 employees in under 6 years. He pioneered a services-led go-to-market strategy that became the hallmark of DataRobot’s rapid growth. At Netezza, an IBM company, he ran Product Marketing and Regional Sales, bringing the most successful Data Warehousing appliance to the global market. These experiences led him to co-found FeatureByte to tackle a problem that’s been an Achilles Heel for data science teams since the very early days of Enterprise AI.
Session Title: Key to Scaling AI: Self-Service Data Environment for Data Scientists
Abstract:
Great AI starts with great features. While the modern data stack has made self-service ingestion and consumption a reality for BI, AI data remains a huge challenge. Feature engineering is non-standard, ML pipelines are manual, and data governance is a nightmare, limiting the scalability you can achieve with AI. We will discuss the shortcomings of the modern data stack for AI, and practical approaches for creating a self-service data environment for data scientists. Learn about strategies to accelerate feature engineering and experimentation, shorten the time to deploy feature pipelines, and govern the data and infrastructure. And ultimately, to truly scale AI in your organization.

Youssef Idelcaid
An applied mathematics engineer by training, Youssef Idelcaid is the head of CMG Data Science at Genentech, renowned for his pioneering work in leveraging machine learning to tackle complex health equity and customer engagement challenges. With a proven track record of success across various industries, Youssef has held key positions at Levi Strauss & Co as the global head of AI and director of digital products, where he created the company’s first AI-powered Live Streaming Demand platform. His career began in the food industry, where he served as a data analyst at Danone before joining L’Oreal Research & Innovation as a scientific computing manager, later becoming the director of the brand’s technology incubator in the US. At L’Oreal, Youssef led the development of the company’s first ML-powered formulation assistant for chemists and, together with other scientists, initiated L’Oreal’s connected and augmented beauty initiatives.
Session Title: Key to Scaling AI: Self-Service Data Environment for Data Scientists
Abstract:
Great AI starts with great features. While the modern data stack has made self-service ingestion and consumption a reality for BI, AI data remains a huge challenge. Feature engineering is non-standard, ML pipelines are manual, and data governance is a nightmare, limiting the scalability you can achieve with AI. We will discuss the shortcomings of the modern data stack for AI, and practical approaches for creating a self-service data environment for data scientists. Learn about strategies to accelerate feature engineering and experimentation, shorten the time to deploy feature pipelines, and govern the data and infrastructure. And ultimately, to truly scale AI in your organization.
ODSC EAST 2024 - April 23-25th
As part of the global data science community we value inclusivity, diversity, and fairness in the pursuit of knowledge and learning. We seek to deliver a conference agenda, speaker program, and attendee participation that moves the global data science community forward with these shared goals. Learn more on our code of conduct, speaker submissions, or speaker committee pages.