December 31, 2020
This article is a deep dive into the techniques needed to get SSD300 object detection throughput to 2530 FPS. We will rewrite Pytorch model code, perform ONNX graph surgery, optimize a TensorRT plugin and finally we’ll quantize the model to an 8-bit representation. We will also examine divergence from the accuracy of the full-precision model.
October 17, 2020
In this article we take performance of the SSD300 model even further, leaving Python behind and moving towards true production deployment technologies: TorchScript, TensorRT and DeepStream. We also identify and understand several limitations in Nvidia’s DeepStream framework, and then remove them by modifying how the nvinfer
element works.
September 30, 2020
Making code run fast on GPUs requires a very different approach to making code run fast on CPUs because the hardware architecture is fundamentally different. Machine learning engineers of all kinds should care about squeezing performance from their models and hardware — not just for production purposes, but also for research and training. In research as in development, a fast iteration loop leads to faster improvement. This article is a practical deep dive into making a specific deep learning model (Nvidia’s SSD300) run fast on a powerful GPU server, but the general principles apply to all GPU programming.
September 23, 2020
Taking machine learning models into production for video analytics doesn’t have to be hard. A pipeline with reasonable efficiency can be created very quickly just by plugging together the right libraries. In this post we’ll create a video pipeline with a focus on flexibility and simplicity using two main libraries: Gstreamer and Pytorch.