TorchScript

Solving Machine Learning Performance Anti-Patterns: a Systematic Approach

June 24, 2021
Machine Learning Productionization
Nsight Systems, NVTX, Optimization, TensorRT, TorchScript, Quantization

This article is a high-level introduction to an efficient workflow for optimizing the runtime performance of machine learning systems running on the GPU. Using traces from Nsight Systems to show real production scenarios, I introduce a set of common utilization patterns and outline effective approaches to improve performance.
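
As a taste of the kind of instrumentation this workflow relies on, here is a minimal sketch (the model and range names are placeholders, not taken from the article) of wrapping pipeline stages in NVTX ranges so they show up as named spans in an Nsight Systems timeline:

    import torch

    # Placeholder model and data standing in for a real inference pipeline.
    model = torch.nn.Linear(1024, 1024).cuda().eval()
    batch = torch.rand(64, 1024, device="cuda")

    with torch.no_grad():
        torch.cuda.nvtx.range_push("preprocess")
        batch = (batch - batch.mean()) / batch.std()  # stand-in preprocessing
        torch.cuda.nvtx.range_pop()

        torch.cuda.nvtx.range_push("inference")
        output = model(batch)
        torch.cuda.nvtx.range_pop()

    # Run under Nsight Systems (e.g. `nsys profile python script.py`) to see
    # these ranges alongside kernel and memcpy activity on the timeline.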

Mastering TorchScript: Tracing vs Scripting, Device Pinning, Direct Graph Modification

October 29, 2020
Machine Learning Productionization
Pytorch, TorchScript, TensorRT, ONNX, Nsight Systems

TorchScript is one of the most important parts of the Pytorch ecosystem, allowing portable, efficient and nearly seamless deployment. With just a few lines of torch.jit code and some simple model changes, you can export an asset that runs anywhere libtorch does. It’s an important toolset to master if you want to run your models outside the lab at high efficiency. This article is a collection of topics going beyond the basics of your first export.
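
As a quick illustration of how small that export surface is, here is a minimal sketch (the ResNet-50 model and file names are illustrative choices, not from the article) showing both tracing and scripting:

    import torch
    import torchvision

    model = torchvision.models.resnet50(pretrained=True).eval()
    example_input = torch.rand(1, 3, 224, 224)

    # Tracing records the operations executed for this example input.
    traced = torch.jit.trace(model, example_input)
    traced.save("resnet50_traced.pt")  # loadable from C++ via libtorch

    # Scripting compiles the model's Python source, preserving control flow.
    scripted = torch.jit.script(model)
    scripted.save("resnet50_scripted.pt")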

Object Detection at 1840 FPS with TorchScript, TensorRT and DeepStream

October 17, 2020
Visual Analytics, Machine Learning Productionization
SSD300, Pytorch, Object Detection, Optimization, DeepStream, TorchScript, TensorRT, ONNX, NVTX, Nsight Systems

In this article we take the performance of the SSD300 model even further, leaving Python behind and moving towards true production deployment technologies: TorchScript, TensorRT and DeepStream. We also identify and understand several limitations in Nvidia’s DeepStream framework, and then remove them by modifying how the nvinfer element works.
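
To give a sense of the first step out of Python, here is a hedged sketch of the kind of ONNX export that TensorRT consumes; the SSD300-specific details are omitted, so a torchvision ResNet stands in, and the opset version and file names are assumptions:

    import torch
    import torchvision

    # Stand-in model; the article works with SSD300, not ResNet-50.
    model = torchvision.models.resnet50(pretrained=True).eval()
    dummy_input = torch.rand(1, 3, 300, 300)

    torch.onnx.export(
        model,
        dummy_input,
        "model.onnx",
        input_names=["input"],
        output_names=["output"],
        opset_version=11,
    )
    # model.onnx can then be built into a TensorRT engine, for example with
    # `trtexec --onnx=model.onnx --saveEngine=model.engine`.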

