
ODSC hosts a fantastic lineup of some of the best and brightest expert speakers and core contributors to data science.

Pedro Domingos, PhD
Pedro Domingos is a professor emeritus of computer science and engineering at the University of Washington and the author of The Master Algorithm. He is a winner of the SIGKDD Innovation Award and the IJCAI John McCarthy Award, two of the highest honors in data science and AI. He is a Fellow of the AAAS and AAAI, and has received an NSF CAREER Award, a Sloan Fellowship, a Fulbright Scholarship, an IBM Faculty Award, several best paper awards, and other distinctions. Pedro received an undergraduate degree (1988) and M.S. in Electrical Engineering and Computer Science (1992) from IST, in Lisbon, and an M.S. (1994) and Ph.D. (1997) in Information and Computer Science from the University of California at Irvine. He is the author or co-author of over 200 technical publications in machine learning, data mining, and other areas. He is a member of the editorial board of the Machine Learning journal, co-founder of the International Machine Learning Society, and past associate editor of JAIR. He was program co-chair of KDD-2003 and SRL-2009, and has served on the program committees of AAAI, ICML, IJCAI, KDD, NIPS, SIGMOD, UAI, WWW, and others. He has written for the Wall Street Journal, Spectator, Scientific American, Wired, and others. He helped start the fields of statistical relational AI, data stream mining, adversarial learning, machine learning for information integration, and influence maximization in social networks.
Secrets of Successful AI Projects(Keynote)

Raluca Ada Popa, PhD
Raluca Ada Popa is the Robert E. and Beverly A. Brooks associate professor of computer science at UC Berkeley working in computer security, systems, and applied cryptography. She is a co-founder and co-director of the RISELab and SkyLab at UC Berkeley, as well as a co-founder of Opaque Systems and PreVeil, two cybersecurity companies. Raluca received her PhD in computer science as well as her Masters and two BS degrees, in computer science and in mathematics, from MIT. She is the recipient of the 2021 ACM Grace Murray Hopper Award, a Sloan Foundation Fellowship award, Jay Lepreau Best Paper Award at OSDI 2021, Distinguished Paper Award at IEEE Euro S&P 2022, Jim and Donna Gray Excellence in Undergraduate Teaching Award, NSF CAREER Award, Technology Review 35 Innovators under 35, Microsoft Faculty Fellowship, and a George M. Sprowls Award for best MIT CS doctoral thesis.
Confidential Data Computing and Collaboration for Data Scientists(Keynote)

Eve Psalti
Eve Psalti is a tech and business leader with 20+ years of experience, currently the Senior Director at Microsoft’s Azure AI engineering organization responsible for scaling & commercializing artificial intelligence solutions. She was previously the Head of Strategic Platforms at Google Cloud where she worked with F500 companies helping them grow their businesses through digital transformation initiatives. Prior to Google, Eve held business development, sales and marketing leadership positions at Microsoft and startups across the US and Europe leading 200-people teams and $600M businesses. A native of Greece, she holds a Master’s degree and several technology and business certifications from London Business School and the University of Washington. Eve currently serves on the board of WE Global Studios, a full-stack startup innovation studio supporting female entrepreneurs.

Dr. Jon Krohn
Jon Krohn is Chief Data Scientist at the machine learning company untapt. He authored the book Deep Learning Illustrated, which was released by Addison-Wesley in 2019 and became an instant #1 bestseller that was translated into six languages. Jon is renowned for his compelling lectures, which he offers in-person at Columbia University, New York University, and the NYC Data Science Academy, as well as online via O’Reilly, YouTube, and his A4N podcast on A.I. news. Jon holds a doctorate in neuroscience from Oxford and has been publishing on machine learning in leading academic journals since 2010.

Jeff Clune, PhD
Jeff Clune is an Associate Professor of Computer Science at the University of British Columbia and a Faculty Member at the Vector Institute.
Previously, he was a Research Team Leader at OpenAI. Before that he was a Senior Research Manager and founding member of Uber AI Labs, which was formed after Uber acquired his startup. Prior to Uber, he was the Loy and Edith Harris Associate Professor in Computer Science at the University of Wyoming.
He conducts research in three related areas of machine learning (and combinations thereof):
– Deep Learning: Improving our understanding of deep neural networks, harnessing them in novel applications, and advancing deep reinforcement learning
– Evolving Neural Networks: Investigating open questions in evolutionary biology regarding how intelligence evolved and harnessing those discoveries to improve our ability to evolve more complex, intelligent neural networks
– Robotics: Making robots more like animals in being adaptable and resilient
A good way to learn about Jeff’s research is by visiting his Google Scholar page, which lists all of his publications.

Tamilla Triantoro, PhD
Tamilla Triantoro is an Associate Professor of Computer Information Systems at Quinnipiac University and a leader of the Masters Program in Business Analytics. She was previously an Academic Director of Data Analytics at the University of Connecticut. Dr. Triantoro is an author, speaker, researcher, and educator in the fields of artificial intelligence, data analytics, user experience with technology, and the future of work. She received her Ph.D. from the City University of New York where she researched online user behavior. Dr. Triantoro presents her research around the world, attempting to demystify the complexity of today’s digital world and to make it understandable and relevant to business professionals and the general audience.
Graph Viz: Exploring, Analyzing and Visualizing Graphs and Networks with Gephi and ChatGPT(Workshop)

Sheamus McGovern
Sheamus McGovern is the founder of ODSC (The Open Data Science Conference). He is also a software architect, data engineer, and AI expert. He started his career in finance by building stock and bond trading systems and risk assessment platforms and has worked for numerous financial institutions and quant hedge funds. Over the last decade, Sheamus has consulted with dozens of companies and startups to build leading-edge data-driven applications in finance, healthcare, eCommerce, and venture capital. He holds degrees from Northeastern University, Boston University, Harvard University, and a CQF in Quantitative Finance.

Irina Rish, PhD
Irina Rish is a Full Professor in the Computer Science and Operations Research Department at the Université de Montréal (UdeM) and a core faculty member of MILA – Quebec AI Institute. She holds a Canada Excellence Research Chair (CERC) in Autonomous AI and a Canadian Institute for Advanced Research (CIFAR) Canada AI Chair. She received her MSc and PhD in AI from the University of California, Irvine, and an MSc in Applied Mathematics from Moscow Gubkin Institute. Dr. Rish’s research focus is on machine learning, neural data analysis and neuroscience-inspired AI. Before joining UdeM and MILA in 2019, Irina was a research scientist at the IBM T.J. Watson Research Center, where she worked on various projects at the intersection of neuroscience and AI, and led the Neuro-AI challenge. She received multiple IBM awards, including IBM Eminence & Excellence Award and IBM Outstanding Innovation Award in 2018, IBM Outstanding Technical Achievement Award in 2017, and IBM Research Accomplishment Award in 2009. Dr. Rish holds 64 patents, has published over 80 research papers in peer-reviewed conferences and journals, several book chapters, three edited books, and a monograph on Sparse Modeling.

Aric LaBarr, PhD
A Teaching Associate Professor in the Institute for Advanced Analytics, Dr. Aric LaBarr is passionate about helping people solve challenges using their data. There he helps design the innovative program to prepare a modern workforce to wisely communicate and handle a data-driven future at the nation’s first Master of Science in Analytics degree program. He teaches courses in predictive modeling, forecasting, simulation, financial analytics, and risk management. Previously, he was Director and Senior Scientist at Elder Research, where he mentored and led a team of data scientists and software engineers. As director of the Raleigh, NC office he worked closely with clients and partners to solve problems in the fields of banking, consumer product goods, healthcare, and government. Dr. LaBarr holds a B.S. in economics, as well as a B.S., M.S., and Ph.D. in statistics — all from NC State University.
Advanced Fraud Modeling & Anomaly Detection with Python & R part 1(Training)
Advanced Fraud Modeling & Anomaly Detection with Python & R part 2(Training)

Stefanie Molin
Stefanie Molin is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of “Hands-On Data Analysis with Pandas,” which is currently in its second edition. She holds a bachelor’s of science degree in operations research from Columbia University’s Fu Foundation School of Engineering and Applied Science, as well as a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.

Joe Dery, PhD
Joe Dery joined Western Governors University’s College of IT as the VP & Dean of Data Analytics in summer 2022. At WGU, Joe is working to help more than 3,000 current analytics students learn how to effect change in their professional roles – surgically balancing a combination of mathematics, data management, programming, and business influence skills. Prior to joining academia full-time, Joe spent much of his corporate career working for EMC – and later, Dell Technologies – where he joined as a “hands-on-keyboard” Data Scientist in 2011. Joe went on to hold leadership positions in Dell’s Sales, Finance, and Supply Chain organizations driving efforts in Data Science, Business Intelligence, Digital Strategy, and Digital Transformation. Across these domains, Joe’s efforts touched a wide variety of business problems, including ML-driven sales quota allocations, sales forecasting & opportunity prioritization, customer cross-sell/whitespace targeting, addressable marketing opportunity sizing, sales territory optimization, supply chain planning optimization, data/analytics literacy training, and self-service BI. Building from his experiences, Joe is often invited to speak on the crucial role of decision intelligence frameworks, change management, and “improv” in bringing analytics solutions to life. Joe holds a Ph.D. in Business Analytics & an M.S. in Marketing Analytics, both from Bentley University.
Unlock the Power of Data Science for Real Change: A Blueprint for Decision Intelligence(Track Keynote)

Dan Roth, PhD
Dan Roth is the Eduardo D. Glandt Distinguished Professor at the Department of Computer and Information Science, University of Pennsylvania, a VP/Distinguished Scientist at Amazon AWS, and a Fellow of the AAAS, the ACM, AAAI, and the ACL.
In 2017 Roth was awarded the John McCarthy Award, the highest award the AI community gives to mid-career AI researchers. Roth was recognized “for major conceptual and theoretical advances in the modeling of natural language understanding, machine learning, and reasoning.”
Roth has published broadly in machine learning, natural language processing, knowledge representation and reasoning, and learning theory, and has developed advanced machine learning based tools for natural language applications that are being used widely. Until February 2017 Roth was the Editor-in-Chief of the Journal of Artificial Intelligence Research (JAIR). Roth has been involved in several startups; most recently he was a co-founder and chief scientist of NexLP, a startup that leverages the latest advances in Natural Language Processing (NLP), Cognitive Analytics, and Machine Learning in the legal and compliance domains. NexLP was acquired by Reveal in 2020. Prof. Roth received his B.A. summa cum laude in Mathematics from the Technion, Israel, and his Ph.D. in Computer Science from Harvard University in 1995.

Matt Harrison
Matt Harrison has been using Python since 2000. He runs MetaSnake, a Python and Data Science consultancy and corporate training shop. In the past, he has worked across the domains of search, build management and testing, business intelligence, and storage.
He has presented and taught tutorials at conferences such as Strata, SciPy, SCALE, PyCON, and OSCON as well as local user conferences.
Machine Learning with XGBoost(Workshop)
Idiomatic Pandas(Workshop)

Julia Lintern
Julia Lintern currently works as a Director of Data Science at Gartner. Previously, she worked as a Data Scientist for the New York Times. Julia began her career as a structures engineer designing repairs for damaged aircraft. Julia holds an MA in applied math from Hunter College, where she focused on visualizations of various numerical methods and discovered a deep appreciation for the combination of mathematics and visualizations. During certain seasons of her career, she has also worked on creative side projects such as Lia Lintern, her own fashion label.
Introduction to Machine Learning(Bootcamp)

Julien Simon
Julien is currently Chief Evangelist at Hugging Face. He previously spent 6 years at Amazon Web Services where he was the Global Technical Evangelist for AI & Machine Learning. Prior to joining AWS, Julien served for 10 years as CTO/VP Engineering in large-scale startups.
Hyper-productive NLP with Hugging Face Transformers(Workshop)

Chandra Khatri
Chandra Khatri is the Chief Scientist and Head of AI at Got It AI, where his team is transforming the AI space by leveraging state-of-the-art technologies to deliver the world’s first fully autonomous Conversational AI system. Under his leadership, Got It AI is democratizing Conversational AI and related ecosystems through automation. Prior to Got It AI, Chandra led various applied and research AI groups at Uber, Amazon Alexa and eBay.
At Uber, he led Conversational AI, Multi-modal AI, and Recommendation Systems. At Amazon, he was a founding member of the Alexa Prize Competition and Alexa AI, where he led R&D and had the opportunity to significantly advance the field of Conversational AI, particularly Open-domain Dialog Systems, which are considered the holy grail of Conversational AI and one of the open-ended problems in AI. At eBay, he drove applied research projects in NLP, Deep Learning, and Recommendation Systems.
He graduated from Georgia Tech with a specialization in Deep Learning in 2015 and holds an undergraduate degree from BITS Pilani, India. His current areas of research include Artificial and General Intelligence, Democratization of AI, Reinforcement Learning, Language and Multi-modal Understanding, and Introducing Common Sense within Artificial Agents.
Self-Supervised and Unsupervised Learning for Conversational AI and NLP(Workshop)

Daniel Gerlanc
Daniel Gerlanc has worked as a data scientist for more than a decade and has been writing software for nearly 20 years. He frequently teaches live trainings on oreilly.com and is the author of the video course Programming with Data: Python and Pandas. He has coauthored several open source R packages, published in peer-reviewed journals, and is a graduate of Williams College.
Programming with Data: Python and Pandas(Bootcamp)

Thomas J. Fan
Thomas J. Fan is a Staff Software Engineer at Quansight Labs and is a maintainer for scikit-learn, an open-source machine learning library for Python. Previously, Thomas worked at Columbia University to improve interoperability between scikit-learn and AutoML systems. He is a maintainer for skorch, a neural network library that wraps PyTorch. Thomas has a Masters in Mathematics from NYU and a Masters in Physics from Stony Brook University.
Introduction to scikit-learn: Machine Learning in Python (Training)

Jacob Andreas, PhD
Jacob Andreas is the X Consortium Assistant Professor at MIT. His research aims to build intelligent systems that can communicate effectively using language and learn from human guidance. Jacob earned his Ph.D. from UC Berkeley, his M.Phil. from Cambridge (where he studied as a Churchill scholar) and his B.S. from Columbia. As a researcher at Microsoft Semantic Machines, he founded the language generation team and helped develop core pieces of the technology that powers conversational interaction in Microsoft Outlook. He has been the recipient of Samsung’s AI Researcher of the Year award, MIT’s Kolokotrones teaching award, and paper awards at NAACL and ICML.
Interpreting Features in Deep Networks(Tutorial)

Dr. Hongxia Yang, PhD
Dr. Hongxia Yang, who holds a PhD from Duke University, led the team that developed open-source AI platforms and systems such as AliGraph, M6, and Luoxi. Dr. Yang has published nearly 100 top conference and journal papers and holds more than 20 patents. She has been awarded the highest prize of the 2019 World Artificial Intelligence Conference, Super AI Leader (SAIL Award), the second prize of the 2020 National Science and Technology Progress Award (China’s top tech award), the first prize of Science and Technology Progress of the Chinese Institute of Electronics in 2021, and the Forbes China Top 50 Women in Science and Technology in 2022. She previously worked as Senior Staff Data Scientist and Director at Alibaba Group, Principal Data Scientist at Yahoo! Inc., Research Staff Member at IBM T.J. Watson Research Center, and joint adjunct professor at Zhejiang University Shanghai Advanced Research Institute.
Towards the Next Generation of Artificial Intelligence with its Applications in Practice(Talk)

Noah Giansiracusa, PhD
Noah Giansiracusa (PhD in math from Brown University) is a tenured associate professor of mathematics and data science at Bentley University, a business school near Boston. His research interests range from algebraic geometry to machine learning to empirical legal studies. After publishing the book How Algorithms Create and Prevent Fake News in July 2021, Noah has gotten more involved in public writing and policy discussions concerning data-driven algorithms and their role in society. He’s written op-eds for Barron’s, Boston Globe, Wired, Slate, and Fast Company and is currently working on a second book, Robin Hood Math: How to Fight Back When the World Treats You Like a Number, with a Foreword by Nobel Prize-winning economist Paul Romer.
Deepfakes: How’re They Made, Detected, and How They Impact Society(Tutorial)

Bill Franks
Bill Franks is the Director of the Center for Statistics and Analytical Research at Kennesaw State University. He is also Chief Analytics Officer for The International Institute For Analytics (IIA) and serves on several corporate advisory boards. Franks is also the author of the books Winning The Room, Taming The Big Data Tidal Wave, The Analytics Revolution, and 97 Things About Ethics Everyone In Data Science Should Know. He is a sought-after speaker and frequent blogger who has over the years been ranked a top global big data influencer, a top global artificial intelligence and big data influencer, and a top AI influencer, and was an inaugural inductee into the Analytics Hall of Fame. His work, including several years as Chief Analytics Officer for Teradata (NYSE: TDC), has spanned clients in a variety of industries for companies ranging in size from Fortune 100 companies to small non-profit organizations. You can learn more at http://www.bill-franks.com.
Winning The Room: Creating And Delivering An Effective Data-Driven Presentation(Business Talk)

Eric Eager, PhD
Eric Eager is the VP of Research and Development at SumerSports, a football analytics startup founded by Paul Tudor Jones and Jack Jones. Prior to joining Sumer, he held similar roles at Pro Football Focus, and is responsible for many of the insights that have grown the game of American football to this day. Eric holds a PhD in Mathematical Biology from the University of Nebraska, and has taught at Wharton, DataCamp and the University of Wisconsin – La Crosse, publishing over 25 academic papers during his career.
Using Data Science to Better Evaluate American Football Players(Talk)

Pradeep Ravikumar, PhD
Pradeep Ravikumar is a Professor in the Machine Learning Department, School of Computer Science at Carnegie Mellon University. He was previously an Associate Director at the Center for Big Data Analytics, at the University of Texas at Austin. His thesis has received honorable mentions in the ACM SIGKDD Dissertation award and the CMU School of Computer Science Distinguished Dissertation award. He is a Sloan Fellow, a Siebel Scholar, a recipient of the NSF CAREER Award, and was Program Chair for the International Conference on Artificial Intelligence and Statistics (AISTATS) in 2013. He is Associate Editor-in-Chief for IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), and action editor for the Machine Learning journal, and the Journal of Machine Learning Research.
Dr. Ravikumar’s research group at CMU works on the foundations of statistical machine learning, with recent focus on “next generation” machine learning systems, that are explainable, robust to train and test time corruptions, and resilient to distribution shifts, and are learnt under resource constraints by leveraging or discovering various notions of “structure” and domain knowledge.

Ariel Procaccia, PhD
Ariel Procaccia is Gordon McKay Professor of Computer Science at Harvard University. He works on a broad and dynamic set of problems related to AI, algorithms, economics, and society. He has helped create systems and platforms that are widely used to solve everyday fair division problems, resettle refugees, mitigate bias in peer review and select citizens’ assemblies. To make his research accessible to the public, he regularly writes opinion and exposition pieces for publications such as the Washington Post, Bloomberg, Wired and Scientific American. His distinctions include the Social Choice and Welfare Prize (2020), Guggenheim Fellowship (2018), IJCAI Computers and Thought Award (2015) and Sloan Research Fellowship (2015).

David P. Woodruff, PhD
David Woodruff is a professor at Carnegie Mellon University in the Computer Science Department. Before that he was a research scientist at the IBM Almaden Research Center, which he joined in 2007 after completing his Ph.D. at MIT in theoretical computer science. His research interests include data stream algorithms, distributed algorithms, machine learning, numerical linear algebra, optimization, sketching, and sparse recovery. He is the recipient of the 2020 Simons Investigator Award, the 2014 Presburger Award, and Best Paper Awards at STOC 2013, PODS 2010, and PODS 2020. At IBM he was a member of the Academy of Technology and a Master Inventor.

Jordan Boyd-Graber, PhD
Jordan is an associate professor in the University of Maryland Computer Science Department (tenure home), Institute of Advanced Computer Studies, iSchool, and Language Science Center. Previously, he was an assistant professor at Colorado’s Department of Computer Science (tenure granted in 2017). He was a graduate student at Princeton with David Blei.
His research focuses on making machine learning more useful, more interpretable, and able to learn from and interact with humans. This helps users sift through decades of documents; discover when individuals lie, reframe, or change the topic in a conversation; or compete against humans in games that are based in natural language.
If We Want AI to be Interpretable, We Need to Measure Interpretability(Talk)

Moran Beladev
Moran is a machine learning manager at booking.com, researching and developing computer vision and NLP models for the tourism domain. Moran is a Ph.D. candidate in information systems engineering at Ben Gurion University, researching NLP aspects in temporal graphs. Previously, Moran worked as a Data Science Team Leader at Diagnostic Robotics, building ML solutions for the medical domain and NLP algorithms to extract clinical entities from medical visit summaries.
Leverage Reviews Data for Multi Label Topics Classification in Booking.com(Talk)

Panos Alexopoulos, PhD
Panos Alexopoulos has been working since 2006 at the intersection of data, semantics, and software, building intelligent systems that deliver value to business and society. Born and raised in Athens, Greece, he currently works as Head of Ontology at Textkernel, in Amsterdam, Netherlands, where he leads a team of Data Professionals in developing and delivering a large cross-lingual Knowledge Graph in the HR and Recruitment domain. Panos holds a PhD in Knowledge Engineering and Management from National Technical University of Athens, and has published more than 60 papers at international conferences, journals and books. He is the author of the book “Semantic Modeling for Data – Avoiding Pitfalls and Breaking Dilemmas” (O’Reilly, 2020), and a regular speaker and trainer in both academic and industry venues.
Mastering Adversarial Evaluation for NLP: A Practical Workshop(Workshop)

Leonardo De Marchi
Leonardo De Marchi holds a Master’s in Artificial Intelligence and has worked as a Data Scientist in the sports world, with clients such as the New York Knicks. He now works at Thomson Reuters as VP of Labs, and also provides consultancy and training for small and large companies. His previous experience includes being Head of Data Science and Analytics at Bumble, the largest dating site with over 500 million users, heading the team through acquisition and an IPO.
Creative AI(Training)
NLP Fundamentals(Training)

Ali Rossi
Ali Rossi is a Data Science Tech Lead at Foursquare, working closely with their first-party foot traffic panel to deliver insights against a broad range of client business questions. She is passionate about consumer behavioral data, with experience building consumer panels, researching normalization methodologies, and developing methods to derive actionable insights. Previously, she worked in product management at Foursquare, Amazon and Nielsen, mainly focused on building analytics products using consumer-sourced data. She studied chemistry and mathematics at the University of Connecticut and is currently pursuing a Master of Science in computer science at the Georgia Institute of Technology.
Uncovering Behavioral Segments by Applying Unsupervised Learning to Location Data(Talk)

Freddy Boulton
Freddy Boulton started his career as a data scientist for Nielsen where he built predictive models of television viewing behavior to make television ratings more accurate. This gave him a firsthand view of one of the biggest challenges faced by industry data scientists – being able to easily communicate and share machine learning models with stakeholders. He is currently solving that problem by working on Gradio, an open-source Python library that lets data scientists create fully interactive demos of machine learning models with just a few lines of code.
A Practical Tutorial on Building Machine Learning Demos with Gradio(Workshop)

Moez Ali
Innovator, Technologist, and a Data Scientist turned Product Manager with a proven track record of building and scaling data products, platforms, and communities. Experienced in building and leading teams of data scientists, data engineers, and product managers. Strongly opinionated tech visionary and a thought partner to C-level leadership.
Moez Ali is an inventor and creator of PyCaret. PyCaret is an open-source, low-code, machine learning software. Ranked in top 1%, 8M+ downloads, 7K+ GitHub stars, 100+ contributors, and 1000+ citations.
Globally recognized personality for open-source work on PyCaret. Keynote speaker and top ten most-read writer in the field of artificial intelligence. Teaching AI and ML courses at Cornell, NY and Queens University, CA. Currently building world’s first hyper-focused Data and ML Platform.
Automate Machine Learning Workflows with PyCaret 3.0(Workshop)

Tomasz Adamusiak, MD, PhD
Tomasz Adamusiak, MD, Ph.D., is a Chief Scientist in the Clinical Insights & Innovation Cell at MITRE. He leads a multi-disciplinary group driving high-impact contributions to private and public sectors in Clinical and Genomic Data Science. Before MITRE, Tomasz was the Head of Data Science in the Pfizer Innovation Research (PfIRe) Lab. His team was responsible for developing novel digital endpoints, designing decentralized approaches for clinical trials, and applying AI/machine learning methods to generate novel insights from clinical data. Tomasz served in leadership and advisory roles in the American Medical Informatics Association, SNOMED International, and the Epic Research Data Network.
Unlocking the Potential of Protein Prediction in Drug Discovery(Business Talk)

Timo Walther
Timo Walther is a Principal Software Engineer at Confluent and a long-time member of Apache Flink’s management committee. He studied Computer Science at TU Berlin and was part of the Database Group there – the origins of Apache Flink. He worked as a software engineer at DataArtisans and led the SQL team at Ververica. He was a Co-Founder of Immerok, which was acquired by Confluent in 2023. In Flink, he is working on various topics in the Table & SQL ecosystem to make stream processing accessible for everyone.

Tom Shafer, PhD
Tom Shafer works as a Lead Data Scientist at Elder Research, a recognized leader in data science, machine learning, and artificial intelligence consulting since its founding in 1995. As a lead scientist, Tom contributes technically to a wide variety of projects across the company, mentors data scientists, and helps to direct the company’s technical vision. His current interests focus on Bayesian modeling, interpretable ML, and data science workflow. Before joining Elder Research, Tom completed a PhD in Physics at the University of North Carolina, modeling nuclear radioactive decays using high-performance computing.
Beyond Credit Scoring: Hybrid Scorecard Models for Accuracy and Interpretability(Talk)

Meg Kurdziolek, PhD
Meg is currently the Lead UXR for Intrinsic.ai, where she focuses her work on making it easier for engineers to adopt and automate with industrial robotics. She is a “Xoogler”, and prior to Intrinsic worked on the Explainable AI services on Google Cloud. Meg has had a varied career working for start-ups and large corporations alike, and she has published on topics such as user research, information visualization, educational-technology design, voice user interface (VUI) design, explainable AI (XAI), and human-robot interaction (HRI). Meg is also a proud alumnus of Virginia Tech, where she received her Ph.D. in Human-Computer Interaction.

Nikolay Manchev, PhD
Nikolay is an experienced Data Science professional who currently leads the EMEA Data Science team at Domino Data Lab. He holds an MSc in Software Technologies, an MSc in Data Science, and is currently undertaking postgraduate research at King’s College London. His area of expertise is Statistics, Mathematics, and Data Science in general, and his research interests are in Neural Networks with emphasis on biological plausibility. He writes articles and blogs regularly and speaks at various European conferences (ODSC, Big Data Spain, Strata, Big Data London etc.) to build awareness about data science and artificial intelligence. He is also the organizer of the London Data Science and Machine Learning meetup and recipient of several technical mastery awards like the Oracle ACE Award and the IBM Outstanding Technical Achievement Award.

Jesse Johnson
Jesse Johnson is Vice President of Data Science and Data Engineering at Dewpoint Therapeutics, a drug development biotech startup founded in 2019 around a scientific field called biomolecular condensates. In this role, Jesse’s diverse set of experiences from academic math departments, engineering teams at Google, and data science teams at large, medium and small life science companies provides a unique perspective on the ways that data and wet lab teams communicate differently, or sometimes don’t communicate at all.
Development Principles for Biotech Data Teams(Business Talk)

Iryna Gurevych, PhD
Iryna Gurevych (PhD 2003, U. Duisburg-Essen, Germany) is professor of Computer Science and director of the Ubiquitous Knowledge Processing (UKP) Lab at the Technical University (TU) of Darmstadt in Germany. Her main research interests are in machine learning for large-scale language understanding and text semantics. Iryna’s work has received numerous awards. Examples are the ACL fellow award 2020 and the first Hessian LOEWE Distinguished Chair award (2,5 mil. Euro) in 2021. Iryna is co-director of the NLP program within ELLIS, a European network of excellence in machine learning. She is currently the president of the Association for Computational Linguistics. In 2022, she received an ERC Advanced Grant to support her vision for the next big step in NLP “InterText – Modeling Text as a Living Object in a Cross-Document Context”.
SQuARE: Towards Multi-Domain and Few-Shot Collaborating Question Answering Agents(Talk)

Haritz Puerto
Haritz Puerto is a Ph.D. candidate in Machine Learning & Natural Language Processing at UKP Lab in TU Darmstadt, supervised by Prof. Iryna Gurevych. His main research interests are reasoning for Question Answering and Graph Neural Networks. Previously, he worked at the Coleridge Initiative, where he co-organized the Kaggle Competition Show US the Data. He got his master’s degree from the School of Computing at KAIST, where he was a research assistant at IR&NLP Lab and was advised by Prof. Sung-Hyon Myaeng.
SQuARE: Towards Multi-Domain and Few-Shot Collaborating Question Answering Agents(Talk)

Daniel Whitenack, PhD
Daniel Whitenack (aka Data Dan) is a Ph.D. trained data scientist working with SIL International on NLP and speech technology for local languages in emerging markets. He has more than ten years of experience developing and deploying machine learning systems at scale. Daniel co-hosts the Practical AI podcast, has spoken at conferences around the world (Applied Machine Learning Days, O’Reilly AI, QCon AI, GopherCon, KubeCon, and more), and occasionally teaches data science/analytics at Purdue University.
Modern NLP: Pre-training, Fine-tuning, Prompt Engineering, and Human Feedback(Workshop)

Dean Pleban
Dean has a background combining physics and computer science. He’s worked on quantum optics and communication, computer vision, software development, and design. He’s currently CEO at DagsHub, where he builds products that enable data scientists to work together and get their models to production, using popular open-source tools.
He’s also the host of the MLOps Podcast, where he speaks with industry experts about ML in production.

David Talby, PhD
David Talby is the Chief Technology Officer at John Snow Labs, helping companies apply artificial intelligence to solve real-world problems in healthcare and life science. David is the creator of Spark NLP – the world’s most widely used natural language processing library in the enterprise.
He has extensive experience building and running web-scale software platforms and teams – in startups, for Microsoft’s Bing in the US and Europe, and to scale Amazon’s financial systems in Seattle and the UK.
David holds a Ph.D. in Computer Science and Master’s degrees in both Computer Science and Business Administration. He was named USA CTO of the Year by the Global 100 Awards and GameChangers Awards in 2022.

Arun Verma, PhD
Arun heads the Bloomberg Quantitative Research Solutions Team. Arun’s work initially focused on Stochastic Volatility Models for Derivatives & Exotics pricing/hedging and more generally around asset pricing using traditional quantitative finance methods. More recently, he has enjoyed working at the intersection of diverse areas such as data science, innovative quantitative finance models and using AI/Machine Learning methods to help reveal embedded signals in traditional & alternative data such as Company Financials, ESG, News/Social, Supply Chain, Geolocational & Extreme Weather and their potential impact on capital markets. Most recently, in an attempt to come full circle, he has been exploring the use of ML methods in asset pricing, e.g. derivatives pricing and illiquid instrument pricing.
Prior to joining Bloomberg, he earned his Ph.D. from Cornell University in the areas of computer science and applied mathematics and a B.Tech. in Computer Science from IIT Delhi, India. Arun is also an editorial board member of The Journal of Financial Data Science.
Machine Learning Models for Quantitative Finance and Trading(Talk)

Akash Tandon
Akash Tandon is co-founder and CTO of Looppanel where he builds software to help product teams record, store and analyze user research data. He is a co-author of Advanced Analytics with PySpark, published by O’Reilly. Previously, Akash worked as a senior data engineer at Atlan, SocialCops and RedCarpet where he built data infrastructure for enterprise, government and finance use-cases. He has also been a participant and mentor in the Google Summer of Code program with the R Project for Statistical Computing.
From Big Data to NLP insights: Getting started with PySpark and Spark NLP(Workshop)

Daniel Lenton, PhD
Daniel Lenton is the creator of Ivy, which is an open-source framework with an ambitious mission to unify all other ML frameworks. Prior to starting Ivy, Daniel was a PhD student at Imperial College London, where he published research in the areas of machine learning, robotics and computer vision.
Unifying ML With One Line of Code(Tutorial)

Brian Lucena, PhD
Brian Lucena is Principal at Numeristical, where he advises companies of all sizes on how to apply modern machine learning techniques to solve real-world problems with data. He is the creator of three Python packages: StructureBoost, ML-Insights, and SplineCalib. In previous roles he has served as Principal Data Scientist at Clover Health, Senior VP of Analytics at PCCI, and Chief Mathematician at Guardian Analytics. He has taught at numerous institutions including UC-Berkeley, Brown, USF, and the Metis Data Science Bootcamp.
Advanced Gradient Boosting (I): Fundamentals, Interpretability, and Categorical Structure(Training)
Advanced Gradient Boosting (II): Calibration, Probabilistic Regression and Conformal Prediction(Training)

Christina Qi
Christina Qi is the CEO of Databento, an on-demand market data platform. She formerly founded Domeyard LP, a hedge fund focused on high frequency trading (HFT) that traded up to $7.1 billion USD per day. Failing to earn a job offer after a Wall Street internship, Christina started Domeyard from her dorm room with $1000 in savings, about 9 years ago. Her fund was a tiny minnow amongst the tigers of the hedge fund world, but after Michael Lewis’s Flash Boys came out in 2014 and HFT firms hid from the spotlight, Domeyard accidentally found itself in the center of the ring. Over the next decade, her company’s story was featured on the front page of Forbes and Nikkei, and quoted in the Wall Street Journal, Bloomberg, CNN, NBC, and the Financial Times as a result of the controversy and fascination with HFT. By a series of accidents, Christina became a voice in her industry, contributing to the World Economic Forum’s research on AI in finance, guest lecturing at dozens of universities, and teaching Domeyard’s case study at Harvard Business School. She is grateful to be able to open up about her mistakes, and to help people turn failures into opportunities.
No amount of therapy has quashed Christina’s impostor syndrome, but she will always be proud of her non-profit volunteer work. Christina was elected as a Member of the MIT Corporation, MIT’s Board of Trustees. She is Co-Chair of the Board of Invest in Girls, bringing financial literacy education to underserved populations across the US. Christina also sits on the Board of Directors of The Financial Executives Alliance (FEA) Hedge Fund Group, drives entrepreneurship efforts at the MIT Sloan Boston Alumni Association (MIT SBAA), and served on the U.S. Non-Profit Boards Committee of 100 Women in Finance. Although “X Under X” lists are a gimmick, she’ll admit that Forbes 30 Under 30 made a positive impact on her life by giving her a community – friends who dragged her out of bed during the lowest days of her life. Christina holds a Bachelor of Science in Management Science from MIT and is a CAIA Charterholder.
When Robots Beat Humans: How ChatGPT is Changing the Financial Industry(Business Talk)

Allen Downey, PhD
Allen Downey is a Staff Scientist at DrivenData and professor emeritus at Olin College. He is the author of several books related to computer science and data science, including Think Python, Think Stats, Think Bayes, and Think Complexity. His blog, Probably Overthinking It, features articles about Bayesian statistics. He received his Ph.D. in Computer Science from U.C. Berkeley, and M.S. and B.S. degrees from MIT.
Causation, Collision, and Confusion: Avoiding the most dangerous error in statistics(Talk)

Tendü Yoğurtçu, PhD
Tendü Yoğurtçu, Ph.D., is the Chief Technology Officer (CTO) at Precisely. In this role, she directs the company’s technology strategy and innovation, leading all product research and development programs.
Prior to becoming Chief Technology Officer, Tendü served as General Manager of Big Data for Syncsort, the precursor to Precisely, leading the global software business for Data Integration, Hadoop, and Cloud. She previously held several engineering leadership roles at the company, directing the development of the Integrate family of products.
Tendü has over 25 years of software industry experience, with a focus on Big Data and Cloud technologies. She has also spent time in academia, working as a Computer Science Adjunct Faculty Member at Stevens Institute of Technology.
In 2019, Tendü was named CTO of the Year at the prestigious Women in IT Awards, and in 2018 was recognized as an Outstanding Executive in Technology by Advancing Women in Technology (AWT).
Tendü received her Ph.D. in Computer Science from Stevens Institute of Technology, NJ, a Master of Science in Industrial Engineering, and a B.S. in Computer Engineering from Bosphorus University in Istanbul.
Power trusted AI/ML Outcomes with Data Integrity(Business Talk)

Florian Jacta
Florian Jacta is a specialist in Taipy, a low-code open-source Python package that enables any Python developer to easily build a production-ready AI application. His role covers pre-sales and after-sales functions for the package. He is a Data Scientist for Groupe Les Mousquetaires (Intermarche) and ATOS, where he developed several predictive models as part of strategic AI projects. Florian holds a master’s degree in Applied Mathematics from INSA, with a major in Data Science and Mathematical Optimization.
How to build stunning Data Science Web applications in Python – Taipy Tutorial(Workshop)
Demo Talk Session Title: Turning your Data/AI Algorithms into full web apps in no time with Taipy
Abstract:
In the Python open-source ecosystem, many packages are available that cater to:
– the building of great algorithms
– the visualization of data
Despite this, over 85% of Data Science Pilots remain pilots and do not make it to the production stage.
With Taipy, a new open-source Python framework, Data Scientists/Python Developers are able to build great pilots as well as stunning production-ready applications for end-users.
Taipy provides two independent modules: Taipy GUI and Taipy Core.
In this talk, we will demonstrate how:
– Taipy GUI goes way beyond the capabilities of the standard graphical stack: Gradio, Streamlit, Dash, etc.
– Taipy Core fills a void in the standard Python back-end stack.

Albert Vu
Albert applies his skills in machine learning and big data to solve (financial) optimization problems. He developed projects at different skill levels for Taipy’s tutorial videos. He holds a Bachelor of Science from McGill University, with a major in Computer Science & Statistics and a minor in Finance.
How to build stunning Data Science Web applications in Python – Taipy Tutorial(Workshop)
Demo Talk Session Title: Turning your Data/AI Algorithms into full web apps in no time with Taipy

Madhav Thaker
Madhav is a Senior Data Scientist at Shopify where he focuses on building/evaluating recommendation systems. His role includes prototyping potential solutions and scaling them for production. Prior to Shopify, Madhav was a data science consultant where he focused on NLP projects for pharmaceutical companies. He then transitioned to Disney to develop personalized movie recommendations which sparked his passion for recommendation systems. In his free time, Madhav hosts free Q&A sessions for aspiring data scientists who are looking to get into this space.
Generating Content-based Recommendations for Millions of Merchants and Products(Talk)

Arvind Neelakantan, PhD
Arvind Neelakantan is a Research Lead and Manager at OpenAI working on deep learning research for real-world applications. He got his PhD from UMass Amherst, where he was also a Google PhD Fellow. His work has received best paper awards at NeurIPS and at the Automated Knowledge Base Construction workshop.
Text and Code Embeddings(Talk)

Matt Bezdek, PhD
Matt Bezdek is a Senior Data Scientist at Elder Research. In his work, he empowers commercial clients to make better business decisions, with expertise in machine learning, forecast modeling, natural language processing, and visualization. He has a PhD in Cognitive Psychology from Stony Brook University and has conducted neuroimaging research at Georgia Tech and Washington University in St. Louis.
Topic Modeling using pre-trained large language model embeddings(Talk)

Chen Karako-Argaman
Chen is a Senior Data Science Manager at Shopify, where she leads the Discovery Experience data team. Chen has focused on building search and discovery products using machine learning techniques, experimenting and running A/B tests to improve and measure feature impact, and collaborating with cross-disciplinary teams. She enjoys building high impact data science teams, and providing technical and strategic leadership. Aside from day to day work, Chen is also interested in fairness in AI and has published research in this domain. Prior to joining Shopify, Chen obtained an M.Sc. in astrophysics from McGill University, where she discovered 30 radio pulsars by developing signal processing algorithms for telescope data.
Generating Content-based Recommendations for Millions of Merchants and Products(Talk)

Nils Reimers
Nils Reimers is an expert on search relevance using pre-trained transformer networks. In 2018, he authored and open-sourced the sentence-transformers library, which is the most popular framework for designing semantic search applications. Recently, he joined cohere.ai as director of machine learning to lead the Search-as-a-Service team in developing new state-of-the-art neural search models and making them broadly accessible as API endpoints.
Semantic Search(Talk)

Jonas Mueller
Jonas Mueller is Chief Scientist and Co-Founder at Cleanlab, a software company providing data-centric AI tools to efficiently improve ML datasets. Previously, he was a senior scientist at Amazon Web Services developing AutoML and Deep Learning algorithms which now power ML applications at hundreds of the world’s largest companies. In 2018, he completed his PhD in Machine Learning at MIT, also doing research in NLP, Statistics, and Computational Biology.
Jonas has published over 30 papers in top ML and Data Science venues (NeurIPS, ICML, ICLR, AAAI, JASA, Annals of Statistics, etc). This research has been featured in Wired, VentureBeat, Technology Review, World Economic Forum, and other media. He has also contributed open-source software, including the fastest-growing open-source libraries for AutoML (https://github.com/awslabs/autogluon) and Data-Centric AI (https://github.com/cleanlab/cleanlab).
Improving ML Datasets with Cleanlab, a Standard Framework for Data-Centric AI(Talk)

Tejaswini Pedapati
Tejaswini Pedapati works at IBM Research. Her research is focused on interpretability and automating deep learning. To that end, she was involved in developing tools and algorithms to provide these capabilities for IBM products. She has a master’s degree from Columbia University.
Introduction to AutoML: Hyperparameter Optimization and Neural Architecture Search(Tutorial)

Dan Shiebler
As the Head of Machine Learning at Abnormal Security, Dan builds cybercrime detection algorithms to keep people and businesses safe. Before joining Abnormal, Dan worked at Twitter: first as an ML researcher working on recommendation systems, and then as the head of web ads machine learning. Before Twitter, Dan built smartphone sensor algorithms at TrueMotion and Computer Vision systems at the Serre Lab.

Ahmed Alaa, PhD
Ahmed Alaa is an Assistant Professor of Computational Precision Health at UC Berkeley and UCSF, with affiliations in the EECS and Statistics departments at UC Berkeley. Previously, he was a postdoctoral associate at Massachusetts Institute of Technology (MIT CSAIL and IMES) and the Broad Institute of MIT and Harvard University. He was also a joint postdoctoral scholar at Cambridge University, Cambridge Center for AI in Medicine and the University of California, Los Angeles (UCLA). He obtained his Ph.D. in Electrical and Computer Engineering from UCLA, where he received the 2021 Edward K. Rice Outstanding Doctoral Student Award from the UCLA Samueli School of Engineering. His research interests include machine learning for healthcare, computer vision for medical imaging, clinical informatics, statistics, and causal inference.
Synthetic Data in Healthcare: Methods, Challenges, and Use Cases(Talk)

Connor Shorten, PhD
Connor Shorten is a Research Scientist at Weaviate, an Open-Source Vector Search Database. Connor has had a role in the development of Ref2Vec, Hybrid Search, Generative Search, Weaviate’s Pipe API, and Re-Ranking. Connor has also hosted 34 episodes of the Weaviate podcast featuring guests from OpenAI, Cohere, You.com, MosaicML, Jina AI, Deepset, Neural Magic and many others! Connor also co-hosts Weaviate meetups in Boston and New York City! Prior to Weaviate, Connor earned a Ph.D. in Computer Science from Florida Atlantic University. Connor’s Ph.D. focused primarily on Data Augmentation in Deep Learning and Applications of Deep Learning for COVID-19. Connor’s publication “A survey on image data augmentation in deep learning” has received over 5,000 citations.
Building Recommendation Systems(Workshop)

Emily Curtin
Emily is a Staff MLOps Engineer at Intuit Mailchimp, meaning she gets paid to say “it depends” and “well actually.” Professionally she leads a crazy good team focused on helping Data Scientists do higher quality work faster and more intuitively. Non-professionally she paints huge landscapes and hurricanes in oils, crushes sweet V1s (as long as they’re not too crimpy), rides her bike, reads a lot, and bothers her cats. She lives in Atlanta, GA, which is inarguably the best city in the world, with her husband Ryan who’s a pretty darn cool guy.

Elliott Cordo
Elliott is an expert in data engineering, data warehousing, information management, and technology innovation with a passion for helping transform data into powerful information. He has more than a decade of experience implementing cutting-edge, data-driven applications. He has a passion for helping organizations understand the true potential in their data by working as a leader, architect, and hands-on contributor.
Elliott has built nearly a dozen cloud-native data platforms on AWS, ranging from data warehouses and data lakes, to real-time activation platforms in companies ranging from small startups to large enterprises.

Nick Singh
Nick Singh is an Ex-Facebook & Google Engineer turned best-selling author of Ace the Data Science Interview, and founder of SQL Interview Platform DataLemur.com. His career advice on LinkedIn has earned him 100,000 followers, and he’s successfully career coached 578 people to land their dream job in data!
Ace the Data Job Hunt(Career Talk)
Ace the Data Science Interview with Nick Singh(Career Workshop)

Han Wang
Han Wang is the tech lead of Lyft Machine Learning Platform, focusing on distributed computing solutions. Before joining Lyft, he worked at Microsoft, Hudson River Trading, Amazon and Quantlab. Han is the creator of the Fugue project, aiming at democratizing distributed computing and machine learning.
Why Dataframe is not Always the Best Option for Distributed Computing(Talk)

Andrew Zaldivar
Bio Coming Soon!
The Data Cards Playbook: A Toolkit for Transparency in Dataset Documentation(Tutorial)

Mahima Pushkarna
Bio Coming Soon!
The Data Cards Playbook: A Toolkit for Transparency in Dataset Documentation(Tutorial)

Avi Pfeffer, PhD
Dr. Avi Pfeffer is Chief Scientist at Charles River Analytics. Dr. Pfeffer is a leading researcher on a variety of computational intelligence techniques including probabilistic reasoning, machine learning, and computational game theory. Dr. Pfeffer has developed numerous innovative probabilistic representation and reasoning frameworks, such as probabilistic programming, which enables the development of probabilistic models using the full power of programming languages, and statistical relational learning, which provides the ability to combine probabilistic and relational reasoning. He is the lead developer of Charles River Analytics’ Figaro™ probabilistic programming language. As an Associate Professor at Harvard, he developed IBAL, the first general-purpose probabilistic programming language. While at Harvard, he also produced systems for representing, reasoning about, and learning the beliefs, preferences, and decision making strategies of people in strategic situations. Prior to joining Harvard, he invented object-oriented Bayesian networks and probabilistic relational models, which form the foundation of the field of statistical relational learning. Dr. Pfeffer serves as Action Editor of the Journal of Machine Learning Research and served as Associate Editor of Artificial Intelligence Journal and as Program Chair of the Conference on Uncertainty in Artificial Intelligence. He has published many journal and conference articles and is the author of a text on probabilistic programming. Dr. Pfeffer received his Ph.D. in computer science from Stanford University and his B.A. in computer science from the University of California, Berkeley.

Swagata Ashwani
Swagata is a Data Professional with over 6 years of experience in the Healthcare, Retail and Platform Integration industries. She is an avid blogger and writes about state-of-the-art developments in the AI space. She is particularly interested in Natural Language Processing, and focuses on researching how to make NLP models work in practical settings. In her spare time, she loves to play her guitar, sip masala chai and find new spots for doing Yoga. Connect with her here – https://www.linkedin.com/in/swagata-ashwani/
Creating a Custom Vocabulary for NLP Tasks Using exBERT and spaCY(Tutorial)

Frank DeFalco
Frank DeFalco is the Director of Epidemiology Analytics at Janssen Research and Development, where he architects software solutions and data platforms for the analysis and application of observational data sources. He is currently the leader and Benevolent Dictator of the OHDSI open source architecture working group. Frank is a presenter and panelist at OHDSI symposiums and has served as faculty for OHDSI symposium tutorial classes on architecture and common data model vocabulary. In addition to leading the OHDSI Architecture working group, Frank initiated development of a standardized platform for observational analytics known as ATLAS. He is an active contributor to the open source software repositories developed and released by OHDSI, including ATLAS, WebAPI, Achilles, Circe, Arachne, Visualizations, Hermes, Helios and others. Frank’s areas of expertise include computational epidemiology, large scale data platforms, software development and architecture, data visualization and informatics. Prior to joining Janssen Research and Development, Frank held the position of Senior Principal and Director of Collaboration and Analytics at British Telecom, where he was a strategic advisor for multiple Fortune 100 companies across sectors including Consumer Products, Telecommunications and Pharmaceuticals. Frank received his undergraduate degrees in Computer Science and Psychology at Rutgers University.
Patient Level Prediction with Supervised Learning Models in Federated Data Networks(Tutorial)

James Demmel, PhD
James Demmel is the Dr. Richard Carl Dehmel Distinguished Professor of Computer Science and Mathematics at the University of California at Berkeley, and former Chair of the EECS Dept. He also serves as Chief Strategy Officer for the start-up HPC-AI Tech, whose goal is to make large-scale machine learning much more efficient, with little programming effort required by users. Demmel’s research is in high performance computing, numerical linear algebra, and communication avoiding algorithms. He is known for his work on the widely used LAPACK and ScaLAPACK linear algebra libraries. He is a member of the National Academy of Sciences, National Academy of Engineering, and American Academy of Arts and Sciences; a Fellow of the AAAS, ACM, AMS, IEEE and SIAM; and winner of the IPDPS Charles Babbage Award, IEEE Computer Society Sidney Fernbach Award, the ACM Paris Kanellakis Award, the J. H. Wilkinson Prize in Numerical Analysis and Scientific Computing, and numerous best paper prizes.
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training(Tutorial)

Yang You, PhD
Yang You is a Presidential Young Professor at National University of Singapore. He is on an early career track at NUS for exceptional young academic talents with great potential to excel. He received his PhD in Computer Science from UC Berkeley. His advisor is Prof. James Demmel, who is the former chair of the Computer Science Division and EECS Department. Yang You’s research interests include Parallel/Distributed Algorithms, High Performance Computing, and Machine Learning. The focus of his current research is scaling up deep neural network training on distributed systems or supercomputers. In 2017, his team broke the world record of ImageNet training speed, which was covered by technology media such as NSF, ScienceDaily, Science NewsLine, and i-programmer. In 2019, his team broke the world record of BERT training speed. The BERT training techniques have been used by many tech giants like Google, Microsoft, and NVIDIA. Yang You’s LARS and LAMB optimizers are available in the industry benchmark MLPerf. He is a winner of the IPDPS 2015 Best Paper Award (0.8%), the ICPP 2018 Best Paper Award (0.3%) and the ACM/IEEE George Michael HPC Fellowship. Yang You is a Siebel Scholar and a winner of the Lotfi A. Zadeh Prize. Yang You was nominated by UC Berkeley for the ACM Doctoral Dissertation Award (2 out of 81 Berkeley EECS PhD students graduated in 2020). He also made the Forbes 30 Under 30 Asia list (2021) and won the IEEE CS TCHPC Early Career Researchers Award for Excellence in High Performance Computing. For more information, please check his lab’s homepage at https://ai.comp.nus.edu.sg/
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training(Tutorial)

Isaac Slavitt
Isaac is a co-founder and Principal Data Scientist at DrivenData, Inc, where he leads client engagements and spearheads development of the data science competition platform. He holds a master’s in Computational Science and Engineering from Harvard’s School of Engineering and Applied Sciences and a BS in Operations Research from the U.S. Coast Guard Academy, and previously spent seven years as a Coast Guard officer serving in a variety of operational and quantitative roles.
Data Science Hiring is Broken—How Can We Fix It?(Business Talk)

Bonny P McClain
Bonny is a geospatial analyst and self-described human geographer and social anthropologist. Her popular public talks and panel discussions explore geographic properties that capture complex interactions, dynamic shifts in ecosystem balance, and how activities influence eco-geomorphic conceptual frameworks across a wide variety of environments.
The ability to apply advanced data analytics, including data engineering and geo-enrichment, to poverty, race, and gender discussions targets judgments about structural determinants, racial equity, and elements of intersectionality to illuminate the confluence of metrics contributing to poverty.
Bonny is the author of the books Python for Geospatial Data Analysis: Theory, Tools, and Practice for Location Intelligence (publisher, O’Reilly Media) and Geospatial Analysis with SQL: A hands on guide to performing geospatial analysis by unlocking the syntax of spatial SQL published by Packt Press. Current projects include a new book in progress with Locate Press, Geospatial Data Science & the Art of Storytelling.

Benjamin Batorsky, PhD
Ben is a Senior Data Scientist at the Institute for Experiential AI at Northeastern University. He obtained his Masters in Public Health (MPH) from Johns Hopkins and his PhD in Policy Analysis from the Pardee RAND Graduate School. Since 2014, he has been working in data science for government, academia and the private sector. His major focus has been on Natural Language Processing (NLP) technology and applications. Throughout his career, he has pursued opportunities to contribute to the larger data science community. He has presented his work at conferences, published articles, taught courses in data science and NLP, and is co-organizer of the Boston chapter of PyData. He also contributes to volunteer projects applying data science tools for public good.
Bagging to BERT – A Tour of Applied NLP(Workshop)

Andras Zsom, PhD
Andras Zsom is an Assistant Professor of the Practice and Director of Graduate Studies at the Data Science Initiative at Brown University, Providence, RI. He is teaching two mandatory courses in the data science master’s program, and helps the students navigate through their studies and curriculum. He also supervises interns on various research projects related to missing data, interpretability, and developing machine learning pipelines.

Ayush Patel
Ayush is the co-founder of TwelveFold, an AI start-up studio, where he manages a portfolio of MLOps and Generative AI companies with entrepreneurs. He also works as the CEO of Censius, an AI Observability platform that helps to optimize AI models’ real-world performance. As a seasoned professional, he has worked closely with customers across industry verticals, AI teams, and research projects to build reliable and compliant AI solutions that solve everyday business problems and scale models in production.
Why do AI Models go Rogue? A Guide to Detect and Fix Silent Model Failures(Business Talk)

Bob Foreman
Bob has worked with the HPCC Systems technology platform and the ECL programming language for over a decade and has been a technical trainer for over 30 years. He is the developer and designer of the HPCC Systems Online Training Courses and is the Senior Instructor for all classroom and remote based training.
Relational Dataset Analytics for Clear Customer Insights(Workshop)

Alexandra Ebert
Alexandra Ebert is a Responsible AI, synthetic data & privacy expert and serves as Chief Trust Officer at MOSTLY AI. As a member of the company’s senior leadership team, she is engaged in public policy issues in the emerging field of synthetic data and Ethical AI and is responsible for engaging with the privacy community, with regulators, the media, and with customers. She regularly speaks at international conferences on AI, privacy, and digital banking and hosts The Data Democratization Podcast, where she discusses emerging digital policy trends as well as Responsible AI and privacy best practices with regulators, policy experts and senior executives.
Apart from her work at MOSTLY AI, she serves as the chair of the IEEE Synthetic Data IC expert group and was pleased to be invited to join the group of AI experts for the #humanAIze initiative, which aims to make AI more inclusive and accessible to everyone.
Before joining the company, she researched GDPR’s impact on the deployment of artificial intelligence in Europe and its economic, societal, and technological consequences. Besides being an advocate for privacy protection, Alexandra is deeply passionate about Ethical AI and ensuring the fair and responsible use of machine learning algorithms. She is the co-author of an ICLR paper and a popular blog series on fairness in AI and fair synthetic data, which was featured in Forbes, IEEE Spectrum, and by distinguished AI expert Andrew Ng.

Philip Wauters
Philip Wauters is Customer Success Manager and Value Engineer at Tangent Works, working on practical applications of time series machine learning with customers from various industries such as Siemens, BASF, Borealis and Volkswagen. With a commercial background and experience in data engineering, analysis and data science, his goal is to find and extract the business value in the enormous amounts of time-series data that exist at companies today.
Learn how to Efficiently Build and Operationalize Time Series Models in 2023(Workshop)
Demo Talk Session Title: Customer Success Manager and Value engineer
Abstract: Modeling time series data is difficult due to its large quantities and constantly evolving nature. Existing techniques have limitations in scalability, agility, explainability, and accuracy. Despite 50 years of research, current techniques often fall short when applied to time series data. The Tangent Information Modeler (TIM) offers a game-changing approach with efficient and effective feature engineering based on Information Geometry. This multivariate modeling co-pilot can handle a wider range of time series use cases with award-winning results and incredible performance.
During this demo session we will showcase how best-in-class and very transparent time series models can be built with just one iteration through the data. We will cover several concrete use cases for advanced time series forecasting, anomaly detection and root cause analysis.

Mihir Mathur
Mihir Mathur is the lead Product Manager for Machine Learning at Lyft, where he works on building ML/AI tools that power Lyft’s automated intelligent decisions across realtime pricing, ETAs, fraud detection, safety classification etc. In the past Mihir has worked on building delightful products for millions of users at Quora, Houzz, and Thomson Reuters and spoken about his work at conferences such as MLOps World and ODSC. Mihir graduated magna cum laude from UCLA with a Bachelor’s and Master’s in Computer Science.
Powering Millions of Real-time Decisions with Distributed Model Serving(Talk)

Melanie Veale, PhD
Melanie Veale, Ph.D. is a recovering Astrophysicist, currently working as a Data Solutions Architect at Anomalo. Her Ph.D. research on galaxy dynamics introduced her to statistical and computational Python, as well as other languages and tools like C++, Fortran, IDL, R, bash, and SLURM. She has also dabbled in AWS infrastructure, Kubernetes, Docker, Spark, Ray, Dask and more as a Field Engineer and Field Data Scientist at Domino Data Lab, helping analytics and machine learning teams modernize their collaboration and deployment workflows. Nowadays she is a troubleshooting enthusiast anywhere on the Data, Analytics, and MLOps tech stacks, and enjoys melding her passions for crisp technical communication, good visualizations, and first-principles thinking into helping organizations get the most out of their data.

Robert Blanchard
Robert is a Principal Data Scientist at SAS where he builds end-to-end artificial intelligence applications. He also researches, consults, and teaches machine learning with an emphasis on deep learning and computer vision for SAS. Robert has authored an introductory book on computer vision and has written several professional courses on topics including neural networks, deep learning, and optimization modeling. Before joining SAS, Robert worked under the Senior Vice Provost at North Carolina State University, where he built models pertaining to student success, faculty development, and resource management. Prior to working in academia, Robert was a member of the research and development group on the Workforce Optimization team at Travelers Insurance. His models at Travelers focused on forecasting and optimizing resources. Robert graduated with a master’s degree in Business Analytics and Project Management from the University of Connecticut and a master’s degree in Applied and Resource Economics from East Carolina University.
Building Computer Vision Models and Optimizing Hyperparameters using PyTorch and SAS Viya(Workshop)

Ari Zitin
Ari Zitin holds bachelor’s degrees in both physics and mathematics from UNC-Chapel Hill. His research focused on collecting and analyzing low energy physics data to better understand the neutrino. Ari taught introductory and advanced physics and scientific programming courses at UC-Berkeley while working on a master’s in physics with a focus on nonlinear dynamics. While at SAS, Ari has worked to develop courses that teach how to use Python code to control SAS analytical procedures.
Building Computer Vision Models and Optimizing Hyperparameters using PyTorch and SAS Viya(Workshop)

Yuval Fernbach
Yuval Fernbach is the Co-founder & CTO of Qwak, where he is focused on building next-generation ML Infrastructure for ML teams of various sizes. Before Qwak, Yuval was an ML Specialist at AWS, where he helped AWS customers across EMEA with their ML challenges. Prior to that, he was the CTO of the IT department of the IDF (“Mamram”).

Zairah Mustahsan
Zairah is a Data Scientist at you.com, the AI search engine, where she leverages her expertise in statistical and machine-learning techniques to build analytics and experimentation platforms. She recently spoke at NeurIPS 2022 and shared her expertise on data-driven decision-making in a privacy-focused AI-first startup. Previously, Zairah was a Data Scientist at IBM Research, researching Natural Language Processing (NLP) and AI Fairness topics. She has published research and holds patents in these domains. Zairah obtained her M.S. in Computer Science from the University of Pennsylvania, where she researched scikit-learn model performance. Her findings have since been used as guidelines for applying machine learning to supervised classification tasks. Zairah has published her work in top AI conferences such as AAAI and has over 300 citations. Aside from work, Zairah enjoys adventure sports and poetry.
From Zero to 100: Lakehouse Architecture for a Privacy Focused Search Engine(Talk)

Jeffrey Yau, PhD
Jeffrey Yau is currently Chief Data & A.I. Officer at Fanatics Collectibles. Most recently, he served as Global Head of Data Science, Analytics & Engineering at Amazon Music where he oversaw multiple teams who developed both insights-packed analytics and end-to-end statistical and machine learning systems. Prior to Amazon, Jeffrey worked at WalmartLabs as the VP of Data Science & Engineering where he led the team responsible for powering Walmart store mobile apps and the entire store finance system. Further, his team created end-to-end machine learning systems for key business initiatives and had a multi-billion dollar impact annually on Walmart U.S.
Over the years, he has held various senior-level positions in quantitative finance at global investment management firm AllianceBernstein, consulting firm Silicon Valley Data Science, multinational financial services company Charles Schwab Corporation, and the world’s leading professional services firm KPMG. He began his career as a tenure-track Assistant Professor of Economics at Virginia Tech, and he was an adjunct professor at UC Berkeley, Cornell, and NYU, teaching machine learning and advanced statistical modeling for finance and business.

Eric van Heck, PhD
Eric van Heck is Professor of Information Management and Markets at Rotterdam School of Management, Erasmus University, Netherlands. His research is focused on circular and digital business design, explainable AI, and designing AI algorithms for circular business. He led the ‘AI in the Floriculture Chain’ project, which was awarded the AIS Impact Award and the INFORMS ISS Design Science Award. He is the author of Technology Meets Flowers: Unlocking the Circular and Digital Economy (Springer Nature, 2021). He holds an MSc and a PhD from Wageningen University.
Five Ways to Improve Your Algorithms for Circular Business(Talk)

Ori Nakar
Ori Nakar is a principal cyber-security researcher, a data engineer, and a data scientist at Imperva Threat Research group. Ori has many years of experience as a software engineer and engineering manager, focused on cloud technologies and big data infrastructure. Ori also has an AWS Data Analytics certification. In the Threat Research group, Ori is responsible for the data infrastructure and involved in analytics projects, machine learning, and innovation projects.
Botnets detection at scale – Lesson learned from clustering billions of web attacks into botnets(Talk)

Hakan Baba
Hakan is a staff software engineer on the ML Platform team at Lyft. The team builds ML development, training and serving systems that help 40+ teams. Previously, Hakan was a staff engineer at Box. He helped build cloud content management applications focused on security and also scaled Kubernetes clusters and service meshes in an on-premise infrastructure. He started his career at the hardware level, building ASICs, and transitioned to distributed systems software at a startup. Hakan is passionate about wearing many hats, switching abstraction levels, operational excellence and mentorship, and loves challenges and solving problems that take the whole team to address.
Powering Millions of Real-time Decisions with Distributed Model Serving(Talk)

Eric Lagally, PhD
Eric Lagally, PhD earned a B.S. in Physics from Washington University in St. Louis and a Ph.D. in Bioengineering from the University of California, Berkeley. He has served as an assistant professor of Chemical and Biological Engineering at the University of British Columbia in Vancouver, Canada, and has taught high school, undergraduate, and graduate learners both in-person and online in subjects including physics, biology, math, and chemical engineering. Eric joined Western Governors University in 2013 as a course instructor in General Education and has served as a Program Manager in the College of Information Technology beginning in 2015, as a Senior Manager beginning in 2017, and as Administrative Director in 2018. His current role is Program Director and Associate Dean for the data analytics programs in the College of Information Technology. His key goals are to expand access and equity in higher education using innovative instructional and organizational approaches.

Leticia Rabor
Leticia Rabor worked as a professional Software and Systems Engineer in the Defense and Aerospace industries for over 13 years. She has designed, implemented, and tested various image formation subsystem components for ground system development.
She has also worked in Academia since 2012. Her roles include program chair and instructor. Leticia is currently an adjunct professor at Fort Hays State University and a full-time senior instructor at Western Governors University.
She has a Master of Science degree in Information Assurance and a bachelor’s degree in Computer Science. Her yearly activities include conducting an external one-hour workshop in both mobile development and JavaScript at the Geek Girls Tech Conference at University of San Diego (USD). She participated as one of the panel experts for “The future of mobile development” at the Geek Girls Tech Conference in San Diego, California. She is a member of Women Who Code (WWC) and a recipient of the “Faculty of the Year” award in 2017.

Daniel J. Smith, PhD
Daniel J. Smith, PhD, MBA has worked at WGU for 3 years. He has experience in several industries in analytics through the director level in insurance, health care administration, and higher education. His experience is in AI and machine learning applications in industry using R, Tableau, SAS and Python. He enjoys working with students to improve their analytical, programming, and communication skills.

Fabiana Martins Clemente
Fabiana Clemente is the co-founder and CDO of YData, combining Data Understanding, Causality, and Privacy as her main fields of work and research, with the mission to make data actionable for organizations. Passionate about data, Fabiana has vast experience leading data science teams in startups and multinational companies. She hosts the “When Machine Learning meets privacy” podcast, has been a guest speaker on Datacast and Privacy Please, is a previous WebSummit speaker, and was recently awarded “Founder of the Year” by the South Europe Startup Awards.
Hands-on Data-Centric AI: Data Preparation Tuning – Why and How?(Workshop)

Mingo Sanchez
Mingo is a Senior Sales Engineer at Plotly. After graduating from Bowdoin College with a degree in computer science, he started working with organizations in the master data management and data science spaces. Throughout his career, Mingo has partnered with large financial institutions, life sciences organizations, retail companies, and government agencies to help them better understand their data and more effectively serve their customers. Mingo enjoys building relationships with people to understand their pain points and help them solve their most challenging business and technical problems.
Learn how to Build Interactive Data Apps with Plotly Dash(Workshop)

Andrew Lamb
Andrew Lamb is the chair of the Apache Arrow Program Management Committee (PMC) and a Staff Software Engineer at InfluxData. He works on InfluxDB IOx, a time series database engine written in Rust, that heavily uses the Apache Arrow ecosystem. He actively contributes to many open source software projects including the Apache Arrow Rust implementation and the Apache Arrow DataFusion query engine.

Kyle Kirwan
Kyle Kirwan is the co-founder and CEO of Bigeye, the data observability company. Before starting Bigeye, Kyle led the development of Uber’s internal data operations tools: a data catalog, data lineage collector, data pipeline testing, and incident management tools. He enjoys hiking and tiki bars.
Session Title: Data Observability for Data Science Teams
Abstract: When putting models into production it’s critical to know how they’re performing over time. As the last mile of the data pipeline, models can be impacted by a variety of issues, often outside the control of the data science team. “Observability” promises to help teams detect and prevent issues that could impact their models—but what is observability vs. data observability vs. ML observability? Get practical answers and recommendations from Kyle Kirwan, former product leader for Uber’s metadata tools, and founder of data observability company, Bigeye.

Gary Nakanelua
Gary Nakanelua is a professional technologist with over 17 years of experience and the author of Experiment or Expire. Gary is the Managing Director of Innovation at Blueprint, a data intelligence company based in Bellevue, WA. He’s responsible for the experimentation and creation of Blueprint’s transformative solutions and accelerators. With his diverse background, Gary brings a different perspective to problems that businesses are facing today to create quantifiable solutions driven through a high level of collaborative thought processing, strategic planning, and cannibalization.
Streamlining Your Streaming Analytics with Delta Lake & Rust(Talk)

Greg West
A member of CSI for a decade, Greg has developed a wealth of expertise on knowledge graph technology. His true speciality lies in demonstrating and developing custom solutions that leverage Anzo’s unique capabilities.
Session Title: Accelerating AI/ML Initiatives with Knowledge Graph
Abstract: Integrating and unifying data from diverse sources is foundational to AI and ML workflows. This workshop will demonstrate how Anzo’s knowledge graph platform can create an enterprise scale knowledge graph from several sources – setting organizations up for sustainable success with collective intelligence. During this workshop, users will:
Create a sample knowledge graph from several sources.
Demonstrate flexible data preparation for training datasets.
Analyze the knowledge graph with native visualizations and graph algorithms.
Connect to the knowledge graph for additional data science operations.
From its hyper agile in-memory MPP graph engine to its point-and-click user experience and open flexible architecture, Anzo transcends the limitations of traditional knowledge graphs and gives you all the capabilities and flexibilities that complex, enterprise-scale solutions need.
Join this demo to see why Anzo might be the solution you need.

Pavel Klushin
Pavel Klushin is a seasoned solution architecture expert who currently leads the function at Qwak. With years of experience in the technology industry, he is known for his exceptional ability to design and deliver innovative solutions that meet the specific needs of his clients. Pavel previously led the solution architecture team at Spot (Acquired by NetApp).
Session Title: End to end Machine learning pipeline management
Abstract: Join this demo to find out how to centralize your ML pipeline and cut down operational complexities at each stage along the way. Qwak’s platform supports multiple use cases across any business vertical and allows data teams to productionize their models more efficiently and without depending on engineering resources. Join us to watch how Pavel Klushin uses Qwak to create features from data and build, train and deploy models into production, all under a single platform and with unprecedented simplicity.

Seth Juarez
My name is Seth Juarez. I currently live near Redmond, Washington and work for Microsoft.
I received my Bachelor’s degree in Computer Science at UNLV with a minor in Mathematics. I also completed a Master’s degree in Computer Science at the University of Utah. I am currently interested in Artificial Intelligence, specifically in the realm of Machine Learning. I currently work as a Program Manager in the Azure Artificial Intelligence Product Group.
I’ve been married now for 21 years to a fabulously talented woman and have two beautiful daughters, and two feisty sons.
Session Title: Ask the Experts! ML Pros Deep-Dive into Machine Learning Techniques and MLOps
Abstract: Experienced machine learning engineers and data scientists care about ways to easily get their models up and running quickly and share ML assets across teams for collaboration. Collaborate and streamline the management of thousands of models across teams with new, innovative features in Azure Machine Learning. Come and join us in this interactive session with our product experts and get your questions answered on the latest capabilities in Azure Machine Learning!

Kerstin Frailey
Kerstin is CEO and Co-founder of SuperUse, a collaboration platform. She has led data science initiatives at startups across industries, from healthcare to CPG. She takes pride in mentoring fantastic data scientists and nurturing talent. A builder at heart, she regularly pushes code, trains models, and uncovers insights. She has Masters degrees in Mathematical Computer Science and Mathematical Statistics. She is expecting her PhD from Cornell in early 2023. She spends her free time going on long hikes with her two small dogs through the big mountains outside Seattle.

Peter Wang
Peter Wang is the CEO and co-founder of Anaconda, Inc. Prior to founding Anaconda (formerly Continuum Analytics), Peter spent 15 years in software design and development across a broad range of areas, including 3D graphics, geophysics, large data simulation and visualization, financial risk modeling, and medical imaging. As a creator of the PyData community and conferences, he devotes time and energy to growing the Python data science community and advocating for increasing data literacy around the world. Peter holds a BA in Physics from Cornell University.

Temilade Oyeniyi, CFA
Temilade (“Temi”) Oyeniyi, CFA is Vice President at S&P Global Market Intelligence’s Quantamental Research Group, which is responsible for building global equity strategies for institutional investors.

Sagar Samtani, PhD
Dr. Sagar Samtani is an Assistant Professor and Grant Thornton Scholar in the Department of Operations and Decision Technologies at Indiana University. Dr. Samtani graduated with his Ph.D. from the AI Lab at the University of Arizona. Dr. Samtani’s research interests are in AI for Cybersecurity, developing deep learning approaches for cyber threat intelligence, vulnerability assessment, open-source software, AI risk management, and Dark Web analytics. He has received funding from NSF’s SaTC, CICI, and SFS programs and has published over 40 peer-reviewed articles in leading information systems, machine learning, and cybersecurity venues. He is deeply involved with industry, serving on the Board of Directors for the DEFCON AI Village and Executive Advisory Council for the CompTIA ISAO.

Laura Skylaki, PhD
Laura Skylaki is a Manager of Applied Research in Thomson Reuters Labs, where she leads advanced machine learning projects in the domain of Legal and Tax AI. With a career spanning more than a decade at the intersection of research and practical application, she has contributed technical expertise in diverse fields such as bioinformatics and stem cell biology, image processing and natural language processing. She holds a doctorate in stem cell bioinformatics from the University of Edinburgh, UK, and has been publishing on machine learning applications in leading academic journals since 2012.
NLP Fundamentals(Training)

Rehgan Avon
Rehgan Avon is the co-founder & CEO of AlignAI, a Knowledge Management Platform helping companies sustainably transform their organizations to effectively work with data & Artificial Intelligence. With a background in Integrated Systems Engineering and a strong focus on building technology to support analytics and machine learning, Rehgan has worked on architecting solutions and products to operationalize machine learning models at scale within the large enterprise. Rehgan’s previous experience has been fueled by a passion for early-stage startups and product development.
Rehgan has built an extensive community of analytics & data experts through Women in Analytics, a global organization she founded in 2016 to provide more visibility to diverse individuals making an impact in this space. She hosts a global annual conference that has put over 250 women on the stage. The community has over 5000 members from around the world that participate in tutorials, learning groups, discussion boards, and mentorship programs. She was also inducted into the inaugural class of Columbus CEO’s Future 50.
Building a Capability Roadmap: The Maturity Stages of Data & AI(Business Talk)

Joshy George, PhD
Joshy George is a bioinformatics researcher with a Ph.D. in Bioinformatics from the University of Melbourne, Australia, and a Master's in Computer Science from the Indian Institute of Science. With his background in data science and machine learning, Dr. George has co-authored over 100 peer-reviewed scientific articles, showcasing expertise in developing principled methods to solve complex biological problems. In his current role, he leads a team that is focused on building predictive models for cancer precision medicine and understanding the molecular mechanisms leading to diseases.
Is Machine Learning Necessary to Solve Problems in Biology(Talk)

Sanja Cvijic, PhD
Dr. Sanja Cvijic is a Senior Scientist at Charles River Analytics who leads our Probabilistic AI Representations and Reasoning Systems group and has pioneered the application of Scruff to real-world problems in ISR and maintenance. Dr. Cvijic’s research activities are centered around applications of probabilistic programming to condition monitoring, fault detection and prediction systems. She developed a prognostic health management tool in Scruff for assessing the health and status of power transformers. She also developed a probabilistic tool in Scruff for improved space domain awareness, assessing risks to satellites in space. Previously, she worked as a Director of Software and a Consultant in the power industry at New Electricity Transmission Software Solutions. She earned her Doctoral degree in Electrical and Computer Engineering, Power Systems, at Carnegie Mellon University in 2013. She earned her Bachelors in Electrical and Computer Engineering at the University of Belgrade, Serbia, in 2008.

Jared Lander
Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a masters from Columbia University in statistics and bachelors from Muhlenberg College in mathematics, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts. He specializes in data management, multilevel models, machine learning, generalized linear models, and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R Programming geared toward Data Scientists and Non-Statisticians alike, and is creating a course on glmnet with DataCamp.

Robert F. Dougherty, PhD
As the Vice President of Digital Health Research at COMPASS Pathways, Bob is leading the data science and machine learning efforts aimed at improving the safety, efficacy, and scalability of psilocybin therapy. He is an accomplished neuroscientist and engineer with deep expertise in measuring human brain and behavior, and building data-driven solutions to mental health care challenges. Prior to joining COMPASS Pathways, Bob was VP of Research at Mindstrong, leading the research and data science teams in the development of digital biomarkers for mental health. Prior to Mindstrong, Bob was the Research Director of the Stanford Center for Neurobiological Imaging. He has published over one hundred peer-reviewed articles in the fields of psychology, psychiatry, neuroscience, statistics, and magnetic resonance technology over his 30+ year scientific career. Bob completed his PhD in Experimental Psychology at the University of California at Santa Cruz, and postdoctoral fellowships at the University of British Columbia and Stanford University.

Anna Jung
Anna Jung is a Senior ML Open Source Engineer at VMware, contributing to various open source projects related to Machine Learning. She believes in the importance of giving back to the community and is passionate about increasing diversity in open source. When away from the keyboard, Anna is often at film festivals supporting independent filmmakers.
Do You Know About The People Behind The Tools?(Career Talk)

Jenna Reps, PhD
Jenna Reps is a Director at Janssen Research and Development where she is focusing on developing novel solutions to personalize risk prediction. Jenna’s areas of expertise include applying machine learning and data mining techniques to develop solutions for various healthcare problems. She is currently working within the patient level prediction OHDSI workgroup with the aim of developing open source and user friendly software for developing risk models using data sets in the OMOP Common Data Model format. Prior to joining Janssen Research and Development, Jenna was a Senior Research Fellow at the University of Nottingham where she developed supervised learning techniques to signal adverse drug reactions using UK primary care data and acted as a data consultant to other researchers within the University. Jenna received her BSc in Mathematics and MSc in Mathematical Biology at the University of Bath and her PhD in Computer Science at the University of Nottingham.
Patient Level Prediction with Supervised Learning Models in Federated Data Networks(Tutorial)

Christian Ramirez
Christian is Machine Learning Technical Leader at Mercado Libre, the largest e-commerce/fintech company in Latin America, where he dedicates his efforts to creating tools for monitoring and quality of learning models. He is a Computer Engineer and Master in Science with a major in Astronomy from UNAM (Universidad Nacional Autonoma de Mexico). He is a “Xoogler” and has more than 15 years of experience in the field of machine learning. He has lectured in almost a dozen countries.
Introduction to Topological Data Analysis and its Advantages in Machine Learning(Lightning Talk)

Shanshan Bi, PhD
Shanshan works for Novelis as Lead Data Scientist. Her work focuses on advanced operations data analytics and AI implementation in aluminum rolling and recycling. Her team is leading the build-up of the AI ecosystem at Novelis. She received her PhD from Missouri S&T, and worked for the Center of Intelligent Maintenance Systems, focusing on fault diagnosis, prognosis and predictive maintenance in IIoT systems.
The Power of AI in Aluminum Manufacturing(Lightning Talk)

Minsoo Thigpen
Minsoo is a Senior Product Manager at Microsoft Azure Machine Learning designing and building out Responsible AI tools for data scientists. She’s worked with OSS tools such as InterpretML, Fairlearn, Responsible AI Toolbox and contributed to the UX of the Responsible AI dashboard now released in Azure Machine Learning. She has bachelor’s degrees in Applied Mathematics and Painting from Brown University and Rhode Island School of Design (RISD). Coming from an interdisciplinary background with experience in building machine learning models and products, analyzing data, and designing UX, she is always finding work at the intersection of AI/ML, design, and social sciences to empower data and ML practitioners to work ethically and responsibly end-to-end.

Mehrnoosh Sameki, PhD
Mehrnoosh Sameki is a principal PM manager at Microsoft, where she leads emerging Responsible AI technology and tools for the Azure Machine Learning platform. She has cofounded Error Analysis, Fairlearn and Responsible AI Toolbox and has been a contributor to the InterpretML offering. She earned her PhD degree in computer science at Boston University, where she currently serves as an adjunct assistant professor, offering courses in responsible AI. Previously, she was a data scientist in the retail space, incorporating data science and machine learning to enhance customers’ personalized shopping experiences.

Dr Douglas Blank
Dr. Blank is Professor Emeritus at Bryn Mawr College and Head of Research at Comet ML. Doug has 30 years of experience in Deep Learning and Robotics, was one of the founders of the area of Developmental Robotics, and is a contributor to the open source Jupyter Project, a core tool in Data Science. He currently lives in San Francisco, California, along with his family and animals.

Yaron Haviv
Yaron Haviv is a serial entrepreneur who has been applying his deep technological experience in AI, cloud, data and networking to leading startups and enterprises since the late 1990s. As the Co-Founder and CTO of Iguazio, Yaron drives the strategy for the company’s MLOps platform and led the shift towards a production-first approach to data science that caters to real-time AI use cases. He also initiated and built Nuclio, a leading open source serverless framework with over 4,000 GitHub stars, and MLRun, an open source MLOps orchestration framework. Prior to co-founding Iguazio in 2014, Yaron was the Vice President of Datacenter Solutions at Mellanox (now NVIDIA – NASDAQ: NVDA), where he led technology innovation, software development and solution integrations. He also served as the CTO and Vice President of R&D at Voltaire, a high-performance computing, IO and networking company which floated on the NYSE in 2007 and was later acquired by Mellanox (NASDAQ:MLNX). Yaron is an active contributor to the CNCF Working Group and was one of the foundation’s first members. He sits on the Data Science Committee of the AI Infrastructure Alliance (AIIA), of which Iguazio is a founding member. He is co-authoring a book on Implementing MLOps in the Enterprise for O’Reilly. Yaron presents at major industry events worldwide and writes tech content for leading publications including TheNewStack, Hackernoon, DZone, Towards Data Science and more.

Hajime Takeda
Hajime is a data professional with five years of expertise in marketing, retail, and eCommerce, working across Japan and the United States.
As a Data Analyst at Procter and Gamble and MIKI HOUSE Americas, Hajime has led data-driven strategy formulation and implemented technology initiatives such as e-commerce expansion, advertising optimization, and the identification of growth opportunities.
As an organizer of PyData NYC, Hajime is dedicated to fostering a vibrant community centered around the exchange of knowledge on open-source technologies in New York. Additionally, Hajime lends his expertise as a contributing technical writer for Towards Data Science.
Media Mix Modeling: How to Measure the Effectiveness of Advertising in Python(Talk)

Danny Bharat
Danny Bharat is a seasoned supply chain industry professional and the Senior Vice President of Analytics at Cedric Millar Integrated Solutions. As a co-founder of Beacon Analytics, powered by Cedric Millar, he leads a growing team of solutions architects and data scientists in delivering comprehensive business intelligence and supply-chain solutions for end-to-end operations. With a deep focus on corporate planning, strategy, and digital transformation, Danny has accumulated a wealth of experience in multiple industries. He is dedicated to encouraging continuous professional growth and development through mentorship. Danny strongly believes that leaders with technical competence are more effective, and he practices what he preaches by being a self-taught dabbler in Python and DAX languages. He is passionate about using his expertise to help businesses succeed and deliver exceptional results for their customers.
Demo Session Title: Achieving Flexibility and Speed with Schema-on-Read Architecture: Moving Beyond SQL and RDBMS
Abstract:
Beacon Analytics helps customers transition from rigid and monolithic data solutions to flexible microservices architecture, enabling better performance and faster access to critical information. By breaking up data into smaller, independent services, customers gain greater access and modification capabilities. The team recommends using the Polars library, which is based on Apache Arrow, in combination with Dash Plotly to create easy to maintain, high-performance solutions at an excellent price-to-performance ratio. Join Danny Bharat, Senior Vice President of Analytics at Cedric Millar and co-founder of Beacon Analytics, as he shares how his team’s innovative approach to data solutions allows them to build comprehensive 360° intelligence and deliver actionable insights. Beacon Analytics empowers customers to achieve success in a rapidly changing business and technology landscape by utilizing schema-on-read approaches, unstructured data storage, and on-the-fly analysis and transformation.
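For readers unfamiliar with the schema-on-read approach mentioned in the abstract, the idea can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the Python standard library rather than Polars or Dash Plotly, and the sample records, the `ORDER_SCHEMA` mapping, and the `read_with_schema` helper are hypothetical names invented for this example, not part of the Beacon Analytics solution.

```python
import json

# Raw records are stored exactly as they arrive; no schema is enforced at write time.
RAW_EVENTS = [
    '{"order_id": "A1", "qty": "3", "warehouse": "Toronto"}',
    '{"order_id": "A2", "qty": 5}',
    '{"order_id": "A3", "qty": "n/a", "carrier": "rail"}',
]

# Hypothetical read-time schema: field name -> (converter, default value).
ORDER_SCHEMA = {
    "order_id": (str, ""),
    "qty": (int, 0),
    "warehouse": (str, "unknown"),
}


def read_with_schema(raw_lines, schema):
    """Parse raw JSON lines and coerce each record to the schema at read time."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, (convert, default) in schema.items():
            try:
                row[field] = convert(record[field])
            except (KeyError, ValueError, TypeError):
                # Missing or unparseable values fall back to the default.
                row[field] = default
        rows.append(row)
    return rows


orders = read_with_schema(RAW_EVENTS, ORDER_SCHEMA)
total_qty = sum(row["qty"] for row in orders)
```

Because the schema lives with the reader rather than the store, a different consumer could apply a different mapping to the same raw records without rewriting them.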

Kristen Kehrer
Kristen is a Developer Advocate at CometML. Since 2010, Kristen has been delivering innovative and actionable statistical modeling solutions in the utilities, healthcare, and eCommerce industries. Kristen was a LinkedIn Top Voice – Data Science & Analytics in 2018. Previously Kristen was Faculty/SME at Emeritus Institute of Management and Creator of Data Moves Me, LLC. Kristen holds an MS in Applied Statistics from Worcester Polytechnic Institute and a BS in Mathematics.
Session Title: On the Scent: Detecting Dogs on Edge Devices With YOLOv8 and Comet
Abstract:
Proper tracking is crucial for ensuring the reproducibility of results obtained during model development and fostering effective collaboration among multiple developers on a machine learning project. In this talk, Kristen will discuss the process of developing a dog detection system using YOLOv8 on edge devices and the role of Comet, an experiment management platform, in handling the intricacies of the project.
Kristen will guide you through the entire process, from generating a data artifact to deploying the model, emphasizing the benefits of utilizing Comet at each stage. She will showcase how Comet was employed to monitor experiment metrics, visualize model performance, and illustrate the ease with which the selected model can be tracked in production. Participants will gain valuable insights on how to leverage an experiment tracking and monitoring solution like Comet to enhance their model development process, making it more transparent and reproducible.

Eric Vogelpohl
Eric Vogelpohl is the Managing Director of Tech Strategy at Blueprint. He’s a proven IT professional with more than 20 years of experience and a high degree of technical and business acumen. He has an insatiable passion for all-things-tech, pro-cloud/SaaS, leadership, learning, and sharing ideas on how technology can turn data into information & transform user experiences.
Session Title: Top 5 Cool Tricks of Delta for Data Scientists – Why Your Data Lake Should be a Delta Lake
Abstract:
In this 25-minute demo, we will explore the top 5 cool tricks of Delta for data scientists and discuss why your data lake should be a Delta Lake. Delta Lake is an open-source storage layer that brings reliability to data lakes by providing ACID transactions, scalable metadata handling, and data versioning. We will first introduce the concept of Delta Lake and explain how it helps data scientists to manage their data pipelines with ease. We will then dive into the top 5 cool tricks of Delta Lake, which include performance optimizations, time travel, schema enforcement, automatic data merging, and data validation. We will demonstrate these tricks using real-world examples and show how they can simplify your data pipeline and reduce your development time. By the end of this talk, you will have a better understanding of Delta Lake’s features and how it can help you to manage your data lake efficiently. You will also have learned about the benefits of using Delta Lake and why it’s a must-have for data scientists working with large data sets.
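Two of the features named in the abstract, schema enforcement and time travel, can be illustrated with a toy model. The sketch below is plain Python and is not the Delta Lake API: the `VersionedTable` class and its methods are hypothetical, and it keeps full in-memory snapshots for simplicity, whereas Delta Lake itself records changes in a transaction log over data files.

```python
class VersionedTable:
    """Toy append-only table: every successful commit creates a new readable version."""

    def __init__(self, schema):
        self.schema = schema      # column name -> expected Python type
        self._versions = [[]]     # version 0 is the empty table

    def append(self, rows):
        # Schema enforcement: reject the whole batch if any row does not match.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"unexpected columns: {sorted(row)}")
            for column, expected in self.schema.items():
                if not isinstance(row[column], expected):
                    raise ValueError(f"bad type for {column!r}")
        # All-or-nothing commit: the new snapshot is published only after validation.
        self._versions.append(self._versions[-1] + list(rows))
        return len(self._versions) - 1

    def read(self, version=None):
        # Time travel: read any earlier snapshot by its version number.
        if version is None:
            version = len(self._versions) - 1
        return list(self._versions[version])


table = VersionedTable({"id": int, "label": str})
v1 = table.append([{"id": 1, "label": "dog"}])
v2 = table.append([{"id": 2, "label": "cat"}])

try:
    table.append([{"id": "3", "label": "bird"}])  # wrong type for "id"
    rejected = False
except ValueError:
    rejected = True
```

Here `table.read(version=1)` returns the table as it stood after the first commit, and the rejected batch leaves the latest version unchanged.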
More Speakers Added Weekly
ODSC East 2023 | May 9th - 11th
Participate at ODSC East 2023
As part of the global data science community we value inclusivity, diversity, and fairness in the pursuit of knowledge and learning. We seek to deliver a conference agenda, speaker program, and attendee participation that moves the global data science community forward with these shared goals. Learn more on our code of conduct, speaker submissions, or speaker committee pages.
ODSC Newsletter
Stay current with the latest news and updates in open source data science. In addition, we’ll inform you about our many upcoming virtual and in-person events in Boston, NYC, Sao Paulo, San Francisco, and London. And keep a lookout for special discount codes, available only to our newsletter subscribers!