Abstract: Nearly all state-of-the-art results in popular deep learning benchmarks rely upon curated hyperparameter schedules, such as for the learning rate or momentum. Interestingly, the best practices for setting hyperparameters are highly problem- and domain-dependent. As such, most deep learning practitioners spend a significant amount of time hand-crafting and tuning hyperparameters to achieve their desired result, leading to increased human labor and compute costs in developing deep learning applications. Despite the widespread use of such schedules and their marked impact on practical deep learning benchmarks, no comprehensive analysis has yet been performed that defines the potential approaches for hyperparameter scheduling and outlines the best approaches for handling common hyperparameters in different applications.
Closing this gap, we explore scheduling options for three common hyperparameters in neural network training: momentum, learning rate, and precision. For each of these hyperparameters, a suite of potential schedules is explored across various experimental settings, forming a comprehensive picture of scheduling behavior for different hyperparameters, models, and datasets. Numerous interesting findings arise from this analysis, such as the fact that momentum decay is a strong alternative to learning rate decay and that certain learning rate decay schedules yield impressive improvements for budgeted training settings. Additionally, by analyzing macro trends in the behavior of hyperparameter schedules, we arrive at various best practices that can be used by practitioners to minimize time spent tuning hyperparameters and maximize performance and efficiency within their deep learning applications.
Bio: Cameron Wolf is a Ph.D. student in Computer Science at Rice University in Houston, TX, advised by Dr. Anastasios Kyrillidis. His interests are loosely related to math and machine/deep learning, including non-convex optimization, theoretically grounded algorithms for deep neural networks, continual learning, and deep learning on video data. Prior to Rice, he was an undergraduate student in Computer Science at UT Austin, where he worked with the Neural Networks Research Group on research related to genetic algorithms and evolutionary computation.
Outside of academia, he is a Research Scientist at Alegion, a software startup based in Austin, TX. At Alegion, he focuses on the development of long-term, practical research projects in several areas, including streaming training of deep neural networks and evaluating the quality of video-based annotations for computer vision applications. Additionally, he produces a lot of technical content, both internally and externally, with the goal of familiarizing those outside of academia with important topics and considerations relevant to artificial intelligence.