
Abstract: This talk describes how to extend the GPT paradigm to learning to act by watching (training on) videos. We trained a neural network to play Minecraft by Video PreTraining (VPT) on a massive unlabeled video dataset of human Minecraft play, while using only a small amount of labeled contractor data. With fine-tuning, our model can learn to craft diamond tools, a task that usually takes proficient humans over 20 minutes (24,000 actions). Our model uses the native human interface of keypresses and mouse movements, making it quite general, and represents a step towards general computer-using agents. Ultimately this paradigm could help automate much of the work humans do on computers.
Bio: Jeff Clune is an Associate Professor of Computer Science at the University of British Columbia, a Faculty Member at the Vector Institute, and a Senior Research Advisor at DeepMind.
Previously, he was a Research Team Leader at OpenAI. Before that, he was a Senior Research Manager and founding member of Uber AI Labs, which was formed after Uber acquired our startup. Prior to Uber, he was the Loy and Edith Harris Associate Professor in Computer Science at the University of Wyoming.
He conducts research in three related areas of machine learning (and combinations thereof):
- Deep Learning: Improving our understanding of deep neural networks, harnessing them in novel applications, and advancing deep reinforcement learning
- Evolving Neural Networks: Investigating open questions in evolutionary biology regarding how intelligence evolved and harnessing those discoveries to improve our ability to evolve more complex, intelligent neural networks
- Robotics: Making robots more like animals in being adaptable and resilient
A good way to learn about Jeff's research is to visit his Google Scholar page, which lists all of his publications.

Jeff Clune, PhD
Title: Associate Professor, Computer Science at University of British Columbia | Canada CIFAR AI Chair at Vector Institute | Senior Research Advisor at DeepMind
