Abstract: Declarative, large-scale machine learning (ML) aims to simplify the development and usage of custom, large-scale ML algorithms. In Apache SystemML, data scientists specify ML algorithms in a high-level language with R-like syntax, and the system automatically generates hybrid runtime execution plans that combine single-node, in-memory operations with distributed operations on MapReduce or Spark. The compilation of large-scale ML programs offers many opportunities for automatic optimization, which is crucial for achieving high efficiency and, where required, scalability. In this talk, we motivate declarative ML using real-world use cases and provide an up-to-date overview of SystemML, including its various APIs for different deployments. We also discuss selected optimization and runtime techniques to help understand the performance characteristics and limitations of the underlying system.
Bio: Matthias Boehm is a Research Staff Member at IBM Research - Almaden, where he has been working since 2012 on optimization and runtime techniques for declarative, large-scale machine learning in SystemML. Since Apache SystemML's open-source release in 2015, he has also been a member of its incubator PMC. He received his Ph.D. from Technische Universitaet Dresden in 2011 with a dissertation on cost-based optimization of integration flows under the supervision of Prof. Wolfgang Lehner. His previous research also includes systems support for time series forecasting as well as in-memory indexing and query processing. In 2016, he received the VLDB Best Paper Award.