Michael Sarahan
Michael started out as a chemist and electron microscopist, but went to the dark side of build infrastructure after experiencing the pain of trying to distribute the tools he wrote. He's been involved with Conda and Conda-forge for quite a while, and now works on the RAPIDS build infrastructure team at NVIDIA. He has a strange obsession with encoding compatibility into package management systems, and dreams of a world where no user ever has to wonder why they are missing symbols. Metadata is his love language.
Sessions
“I like waiting for my build jobs,” said no one ever. CI is an essential part of ensuring quality, helping to highlight new issues before they are merged into the main codebase. CI gives us confidence that the code changes being proposed don’t break things, at least as far as our tests cover. That confidence comes at the cost of time and compute resources.
The RAPIDS team at NVIDIA manages its own operations and compute resources. Those resources are limited, of course, so we wait our turn and put the toys back when we’re done. It is essential to us that we use our resources as efficiently as possible. This is the “Speed of Light” principle at NVIDIA: how close are you to the theoretical optimal limit? For CI, this involves several factors: startup wait time, Docker image setup time, cache utilization, build tool processes, and avoiding unnecessary rebuilds and retests of things that haven’t changed. The RAPIDS team set out to add telemetry to all of our builds, so that we can quantify where we are spending our time and compute resources, and ensure that we are spending them wisely. We’ll demonstrate the telemetry tools that we’re using, and show how you can add them to your build jobs.