
Abstract: Having access to data is crucial for machine learning success. This brings us to the question:
Where does data live these days?
A Great Deal in Databases
Data in today's databases keeps growing in volume, variety, and velocity. Is there a better place to bring machine learning than straight into the database? Database users already master the most important aspect of applied machine learning: understanding which predictive questions matter and which data is relevant to answering them.
Add to that the statistical analysis needed to create the most appropriate model, and the result is a powerful combination: automated machine learning (AutoML) straight from the database. Bringing AutoML to those who know the data best can significantly augment their capacity to solve important problems.
How Can We Achieve This?
We can now seamlessly integrate state-of-the-art open-source AutoML tools into the most popular open-source databases, so that their users can create, train, and test machine learning models with the knowledge they already have of Structured Query Language (SQL).
How does it work?
We make use of databases' ability to access external tables as if they were internal tables. This makes the integration of these models painless and transparent, and allows users to:
- Expose machine learning models as tables that can be queried. You simply SELECT what you want to predict and pass the conditions for the prediction in the WHERE clause.
- Automatically build, train, and test machine learning models with a simple INSERT statement, in which you specify what you want to learn and from what query.
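As a rough illustration of the two statements described above, here is a minimal SQL sketch. All names in it are assumptions made for illustration, not a documented interface: the `mindsdb` schema, the `predictors` table and its columns, the `home_rentals` training table, and the `rental_price`, `sqft`, and `number_of_rooms` columns. The exact schema depends on the integration and its version.

```sql
-- Train a model: a single INSERT names the model, the column to predict,
-- and the query that supplies the training data.
-- (Schema, table, and column names are hypothetical.)
INSERT INTO mindsdb.predictors (name, predict, select_data_query)
VALUES ('rental_price_model', 'rental_price', 'SELECT * FROM home_rentals');

-- Query the trained model as if it were a table: SELECT the column to
-- predict and describe the case of interest in the WHERE clause.
SELECT rental_price
FROM mindsdb.rental_price_model
WHERE sqft = 900 AND number_of_rooms = 2;
```

Under these assumptions, no new client, driver, or language is needed: any tool that can already issue SQL against the database can train and query a model.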
In this talk, we present what we have learned in the effort to give existing open-source databases such as MariaDB, Postgres, MySQL, and ClickHouse frictionless ML powers.
Bio: Jorge Torres is the Co-founder & CEO of MindsDB. He is also a visiting scholar at UC Berkeley researching machine learning automation and explainability. Prior to founding MindsDB, he worked for a number of data-intensive start-ups, most recently working with Aneesh Chopra (the first CTO in the US government) building data systems that analyze billions of patient records and lead to the highest savings for millions of patients. He started his work on scaling solutions using machine learning in early 2008 while working as the first full-time engineer at Couchsurfing, where he helped grow the company from a few thousand users to a few million. Jorge holds degrees in electrical engineering & computer science, including a master's degree in computer systems (with a focus on applied machine learning) from the Australian National University.