CUDA 11.3.0
cuBLAS 11.5.1.101
cuDNN 8.2.0.41
NCCL 2.9.6
TensorRT 7.2.3.4
Triton Inference Server 2.9.0
I'm new to deploying inference in production, so I'm not sure whether those are easily portable across such platforms.
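One cheap sanity check before moving between platforms is to diff the pinned library versions against whatever the target environment provides. This is a minimal sketch (the `TARGET` list is a hypothetical example, not a real platform's stack) that flags major.minor mismatches, which are the ones most likely to break binary compatibility:

```python
# Sketch: flag libraries whose major.minor version differs between a
# pinned stack and a target environment. Version strings are of the
# form "Name 1.2.3"; TARGET below is purely illustrative.

def parse(line):
    """Split 'Name 1.2.3' into (name, version tuple of ints)."""
    name, _, ver = line.rpartition(" ")
    return name, tuple(int(p) for p in ver.split("."))

PINNED = [
    "CUDA 11.3.0",
    "cuBLAS 11.5.1.101",
    "cuDNN 8.2.0.41",
    "NCCL 2.9.6",
    "TensorRT 7.2.3.4",
]

def mismatches(pinned, target):
    """Return names present in both lists whose major.minor differ."""
    tgt = dict(parse(line) for line in target)
    bad = []
    for line in pinned:
        name, ver = parse(line)
        if name in tgt and ver[:2] != tgt[name][:2]:
            bad.append(name)
    return bad

# Hypothetical target platform: same CUDA minor, newer TensorRT major.
TARGET = ["CUDA 11.3.1", "TensorRT 8.0.1.6"]
print(mismatches(PINNED, TARGET))  # only TensorRT's major.minor differs
```

Patch-level differences are usually harmless, which is why only the first two version components are compared here.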
https://NN-512.com (open source, free software, no dependencies)
With batch size 1, NN-512 is easily 2x faster than TensorFlow and does 27 ResNet50 inferences per second on a c5.xlarge instance. For more unusual networks, like DenseNet or ResNeXt, the performance gap is wider.
Even if you allow TensorFlow to use a larger ResNet50 batch size, NN-512 is easily 1.3x faster.
If you need a few dozen inferences per second per server, this is the cheapest way. And you're not depending on a proprietary solution whose parent company could go out of business in a year.
If you need Transformers instead of convolutions, Fabrice Bellard's LibNC is a good solution: https://bellard.org/libnc/
> If you need a few dozen inferences per second per server, this is the cheapest way. And you're not depending on a proprietary solution whose parent company could go out of business in a year.
Definitely the cheapest way.
We've been in business for more than a year already actually :)
I’ve never heard of that type before, and I wasn’t able to find anything with Google.
Furthermore, the lack of company information (address, company registration number, etc.) and the fact that it’s not clear where the servers are located geographically make me a bit hesitant.
It helps us figure out what got done and where we are on our roadmap.