The ideal is trilinear sampling instead of bilinear. It's easy to do with GPU APIs: in D3D11, a single ID3D11DeviceContext::GenerateMips call generates the full set of mip levels for the input texture, and then each output pixel is produced with a trilinear sampler. When doing non-uniform downsampling, use an anisotropic sampler instead of a trilinear one.
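A minimal sketch of that flow in C++, assuming `device`, `context` and `srv` (a shader resource view of a texture created with a full mip chain, D3D11_BIND_RENDER_TARGET and D3D11_RESOURCE_MISC_GENERATE_MIPS) already exist; the function and parameter names are illustrative:

    #include <d3d11.h>

    void SetupTrilinearDownsample(ID3D11Device* device,
                                  ID3D11DeviceContext* context,
                                  ID3D11ShaderResourceView* srv,
                                  ID3D11SamplerState** outSampler,
                                  bool nonUniformScale)
    {
        // 1. Fill the full mip chain of the source texture on the GPU.
        context->GenerateMips(srv);

        // 2. Create a trilinear sampler, or an anisotropic one for
        //    non-uniform scaling; the pixel shader then samples the source
        //    with it at the output resolution.
        D3D11_SAMPLER_DESC desc = {};
        desc.Filter = nonUniformScale ? D3D11_FILTER_ANISOTROPIC
                                      : D3D11_FILTER_MIN_MAG_MIP_LINEAR;
        desc.AddressU = D3D11_TEXTURE_ADDRESS_CLAMP;
        desc.AddressV = D3D11_TEXTURE_ADDRESS_CLAMP;
        desc.AddressW = D3D11_TEXTURE_ADDRESS_CLAMP;
        desc.MaxAnisotropy = 16;
        desc.MinLOD = 0.0f;
        desc.MaxLOD = D3D11_FLOAT32_MAX;
        device->CreateSamplerState(&desc, outSampler);

        // 3. Bind sampler and source for the downsampling draw.
        context->PSSetSamplers(0, 1, outSampler);
        context->PSSetShaderResources(0, 1, &srv);
    }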
I'm not sure high-level image processing libraries like the mentioned numpy, PIL or TensorFlow do anything like that, though.
Have you never downscaled and upscaled images in a non-3D-rendering context?
Indeed, and I found that leveraging hardware texture samplers is the best approach even for command-line tools which don't render anything.
A straightforward CPU implementation in C++ is just too slow for large images.
Apart from a few easy cases like the 2x2 downsampling discussed in the article, SIMD-optimized CPU implementations are quite complicated for non-integer or non-uniform scaling factors, and on top of that they often need runtime dispatch to support older computers without AVX2. And despite the SIMD, they are still a couple of orders of magnitude slower than GPU hardware, while delivering only a barely observable quality win.
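For reference, here's a minimal sketch of the easy 2x2 case using only SSE2 (no AVX2 dispatch needed). It assumes a tightly packed RGBA8 image with even height and a width that's a multiple of four pixels; the result is the slightly rounded-up average that pavgb gives:

    #include <emmintrin.h>  // SSE2
    #include <cstdint>
    #include <cstddef>

    // 2x2 box downsample of an RGBA8 image.
    void downsample2x2_rgba8(const uint8_t* src, uint8_t* dst,
                             size_t width, size_t height)
    {
        const size_t srcStride = width * 4;
        for (size_t y = 0; y < height; y += 2)
        {
            const uint8_t* row0 = src + y * srcStride;
            const uint8_t* row1 = row0 + srcStride;
            for (size_t x = 0; x < width; x += 4)  // 4 source pixels -> 2 output pixels
            {
                __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(row0 + x * 4));
                __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(row1 + x * 4));
                // Average the two rows, then average horizontally adjacent pixels.
                __m128i v = _mm_avg_epu8(a, b);
                __m128i h = _mm_avg_epu8(v, _mm_srli_si128(v, 4));
                // Keep lanes 0 and 2: the two averaged output pixels.
                __m128i out = _mm_shuffle_epi32(h, _MM_SHUFFLE(2, 0, 2, 0));
                _mm_storel_epi64(reinterpret_cast<__m128i*>(dst), out);
                dst += 8;
            }
        }
    }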
It has worse quality than something like a Lanczos filter, and it requires computing image pyramids first, i.e., it is also slower for the very common use case of rescaling images just once. And that article isn't really about projected/distorted textures, where trilinear filtering actually makes sense.
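For context, the Lanczos filter mentioned here is just a windowed sinc; a minimal sketch of the kernel (with the common a = 3 "Lanczos3" window), not taken from the article:

    #include <cmath>

    // Lanczos kernel: sinc(x) * sinc(x / a) for |x| < a, zero outside.
    double lanczos(double x, int a = 3)
    {
        const double pi = 3.14159265358979323846;
        if (x == 0.0) return 1.0;
        if (std::fabs(x) >= a) return 0.0;
        const double px = pi * x;
        return a * std::sin(px) * std::sin(px / a) / (px * px);
    }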