Why would you recommend that? It's all way more effort than just writing Cython, especially in a Jupyter Notebook. And Cython code can be just as fast as C/C++ code unless you're doing something really fancy. It's a bunch of work for no benefit.
>Warp jits and runs your code on CUDA or CPU
If someone's writing Cython it's probably because they found something that couldn't be done efficiently in Numpy because it was sequential, not easily vectorisable. Such code is going to get zero benefit from Cuda or running on the GPU.
In general, all your jitted code is not going to be as fast as code compiled with an ahead-of-time compiler like the C compiler that Cython uses. Moreover if you use a JIT then it makes your code a pain in the ass to embed in a C/C++ application, unlike Cython code.
nanobind/pybind11 (co-)author here. The space of python bindings is extremely diverse and on the whole probably looks very different from your use case. nanobind/pybind11 target the 'really fancy' case you mention specifically for codebases that are "at home" in C++, but which want natural Pythonic bindings. There is near-zero overlap with Cython.
A change in Python 3.9.0 introduces undefined behavior in combination with pybind11 (rarely occurring crashes, but could be arbitrarily bad). We will work around it in an upcoming version of pybind11, and Python will separately also fix this problem in 3.9.1 slated for release in December.
Details available here: https://pybind11.readthedocs.io/en/latest and https://github.com/python/cpython/pull/22670.
Deleted Comment