|0:00 - 0:45 AEST|
|9:00 am - 9:45 am CDT|
|16:00 - 16:45 CEST|
|14:00 - 14:45 UTC|
Keynote 1 — Optimizing Machine Learning on Any Device, Automatically!
by Matt Welsh, OctoML
Machine learning models are notorious for being computationally expensive. This is especially a problem when running ML models on computationally limited edge devices. The standard way to deal with this is to use vendor-specific, hand-tuned libraries that provide fast implementations of common ML operators, such as convolution and pooling. However, building and maintaining these libraries across a wide range of devices and neural network operators takes an enormous amount of engineering effort. And even a good hand-tuned implementation may not perform optimally in all cases.
Apache TVM takes a different approach — it automatically generates fast binary code for any model, on any device, by exploring a large search space of potential optimizations. TVM itself uses machine learning to guide its code synthesis process, saving months of engineering time. The code generated by TVM can be many times faster than hand-optimized libraries — in some cases exceeding a speedup of 30x over hand-tuned code. TVM is an open source project with hundreds of contributors and an active developer community.
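The core idea of searching an optimization space rather than hand-tuning can be illustrated with a toy sketch. This is not TVM's API (TVM's search space covers schedules such as tiling, vectorization, and operator fusion, and is guided by a learned cost model); the example below simply exhaustively searches tile sizes for a naive matrix multiply and keeps the fastest variant. All names here are hypothetical.

```python
import random
import time

def matmul_tiled(a, b, n, tile):
    """Naive n x n matrix multiply with a configurable tile size."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_c = c[i]
                    for k in range(kk, min(kk + tile, n)):
                        aik = a[i][k]
                        row_b = b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += aik * row_b[j]
    return c

def autotune(n=32, candidates=(4, 8, 16, 32)):
    """Search the (tiny) space of tile sizes and return the fastest one.

    A real auto-tuner like TVM explores a vastly larger space and uses a
    learned cost model to avoid benchmarking every candidate.
    """
    random.seed(0)
    a = [[random.random() for _ in range(n)] for _ in range(n)]
    b = [[random.random() for _ in range(n)] for _ in range(n)]
    best_tile, best_time = None, float("inf")
    for tile in candidates:  # exhaustive search over the candidate space
        start = time.perf_counter()
        matmul_tiled(a, b, n, tile)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_tile, best_time = tile, elapsed
    return best_tile, best_time
```

The key design point carries over to the real system: every candidate computes the same result, so the tuner is free to pick whichever configuration benchmarks fastest on the target hardware.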
In this talk, I will give an overview of how Apache TVM delivers this level of performance, and how we are using it at OctoML to develop the Octomizer, a cloud-based, automatic ML model compiler for edge and server devices.
Matt Welsh is the VP of Engineering at OctoML, a Seattle-based startup founded by a team from the University of Washington and the inventors of Apache TVM. Matt’s research interests focus on machine learning systems, mobile computing, and distributed systems. Prior to OctoML, Matt was a professor at Harvard, an engineering director at Google, and an engineering lead at Apple and Xnor.ai. He received his PhD from UC Berkeley.