Quantization

Object Detection at 2530 FPS with TensorRT and 8-Bit Quantization

December 31, 2020
Visual Analytics, Machine Learning Productionization
SSD300, PyTorch, Object Detection, Optimization, TensorRT, Quantization, ONNX, Nsight Systems

This article is a deep dive into the techniques needed to bring SSD300 object detection throughput to 2530 FPS. We will rewrite PyTorch model code, perform ONNX graph surgery, optimize a TensorRT plugin, and finally quantize the model to an 8-bit representation. We will also examine how far the quantized model’s accuracy diverges from that of the full-precision model.


© Paul Bridger 2020