There is also engrXiv, which has an OAI endpoint. https://engrxiv.org/oai?verb=ListRecords&metadataPrefix=oai_...
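For anyone wanting to harvest from an OAI endpoint like that one, here is a rough sketch of the paging loop's two building blocks. The function names are mine, and the metadata prefix is left as a parameter because the URL above is truncated. It only builds request URLs and parses a response, so it makes no network calls itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords URL (hypothetical helper, not part of any library).

    In OAI-PMH a resumptionToken is an exclusive argument, so follow-up
    requests send only the verb and the token.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.iterfind(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    has_token = token_el is not None and token_el.text
    return ids, (token_el.text.strip() if has_token else None)
```

To harvest everything you would fetch the first URL, call `parse_page` on the body, and keep requesting with the returned token until it comes back as `None`. Passing `oai_dc` as the prefix is an assumption on my part; it is the one format the OAI-PMH spec requires every repository to support.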
Amazing!
ColBERT is a good, googleable example of using more embeddings: one per token instead of one per document.
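For the curious, the core of ColBERT-style scoring ("late interaction") fits in a few lines. This is a toy sketch in plain Python with hand-written vectors, not ColBERT's actual implementation: each query token embedding is matched to its most similar document token embedding, and those maxima are summed. The names `dot` and `maxsim_score` are mine.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best match among document tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy 2-d "token embeddings" (made up for illustration).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # covers both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]  # covers only the first query token

ranked = sorted(
    [("doc_a", doc_a), ("doc_b", doc_b)],
    key=lambda item: maxsim_score(query, item[1]),
    reverse=True,
)
```

Here `doc_a` ranks first because every query token finds a close match in it, which a single pooled embedding per document cannot express as directly.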
Search often ends up being a funnel of techniques: cheap and high-recall for phase 1, then ratchet up the FLOPs and precision in subsequent passes over the previous result set.
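A minimal sketch of that funnel, assuming two scoring functions you supply yourself: a cheap one run over everything and an expensive one run only on the survivors. The scorers below are toy stand-ins (word overlap, then word-bigram overlap), and all names are hypothetical.

```python
import heapq


def funnel_search(query, docs, cheap_score, expensive_score, recall_k, final_k):
    """Two-pass search: cheap scorer picks recall_k candidates,
    expensive scorer reranks only those and keeps final_k."""
    candidates = heapq.nlargest(recall_k, docs, key=lambda d: cheap_score(query, d))
    return heapq.nlargest(final_k, candidates, key=lambda d: expensive_score(query, d))


def word_overlap(query, doc):
    """Cheap pass: number of distinct shared words, order ignored."""
    return len(set(query.split()) & set(doc.split()))


def bigram_overlap(query, doc):
    """Pricier pass (toy stand-in for a reranker): shared word bigrams."""
    def bigrams(text):
        words = text.split()
        return set(zip(words, words[1:]))
    return len(bigrams(query) & bigrams(doc))


docs = [
    "dense retrieval with transformers",
    "sparse retrieval bm25",
    "cooking pasta at home",
    "late interaction retrieval models",
]
top = funnel_search("late interaction retrieval", docs,
                    word_overlap, bigram_overlap, recall_k=3, final_k=1)
```

The point of the shape is that the expensive scorer is called `recall_k` times rather than once per document, so you can afford something much heavier in the second pass.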
It would be cool if "More Like This" had a + button that appended the arXiv ID to the search query.
https://news.ycombinator.com/item?id=42519487
I just did a spot check, and I think searchthearxiv's search results are superior.
Don't forget ChemRxiv!
I'm also maintaining a dataset of all the embeddings on Kaggle, if you want to use them yourself: https://www.kaggle.com/datasets/tomtum/openai-arxiv-embeddin...