You'd first need to convert it to portable SIMD intrinsics. There are several libraries for that.
Apparently it also needs Clang to achieve the same performance: https://news.ycombinator.com/item?id=40875968
Also, if you were designing for smaller cases, say M = N = K = 16 or 32, how would you approach it differently? I'm implementing neural ODEs and this is one point I've been considering.
The author also says "(...) implementation follows the BLIS design", but then proceeds to compare *only* with OpenBLAS. I'd love to see a more thorough analysis, and using C directly would make it easier to compare multiple BLAS libs.
https://github.com/timescale/timescaledb/blob/main/LICENSE-A...
from typing import TypedDict, Unpack
class Movie(TypedDict):
    name: str
    year: int
def foo(**kwargs: Unpack[Movie]): ...
Maybe now I'll be able to figure out what data to send to libraries without reading their source code.
1. https://docs.python.org/3.12/whatsnew/3.12.html#pep-692-usin...
I'll add that this is the first site that misdetects anything like that.