Profiling


In this series, we look at how to profile deep learning code with PyTorch. Profiling tools let us pry open the black box of CUDA mechanics, and for most deep learning practitioners, knowing how to interpret the performance metrics these tools report is enough. We will leave the hardcore CUDA optimization to the experts!

Of course, we will touch on PyTorch’s own profiler. Our posts are basic and introductory, but we still want advanced users to know how to incorporate these tools into their workflows, especially when training deep learning models. What better way to learn than to read torchtune, a native library developed by the PyTorch team for training and finetuning LLMs? Its source shows how profiling is done in practice during training.
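To give a flavour of what wrapping a training loop in PyTorch's profiler can look like, here is a minimal sketch. It assumes a CPU-only run, and the toy model, data shapes, and the `train_step` label are all hypothetical, chosen only for illustration. This is not how torchtune configures its profiler.

```python
import torch
from torch import nn
from torch.profiler import ProfilerActivity, profile, record_function

# Hypothetical toy model and optimizer, used only for illustration.
model = nn.Linear(64, 8)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# Profile CPU activity only, so the sketch also runs without a GPU.
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    for _ in range(3):
        inputs = torch.randn(32, 64)
        targets = torch.randn(32, 8)
        # Label each training step so it appears as a named row in the report.
        with record_function("train_step"):
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()

# Aggregate recorded events by name and render a table sorted by total CPU time.
summary = prof.key_averages().table(sort_by="cpu_time_total", row_limit=10)
print(summary)
```

On a machine with a GPU, adding `ProfilerActivity.CUDA` to `activities` should also record CUDA kernel timings, which is where the metrics discussed in this series come from.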

References and Further Readings