That's a major downgrade. For binary embeddings, the top-10 results are the same as fp32, albeit shuffled. After the 10th result, though, quality degrades quite a bit. I was planning to add a reranking strategy for binary embeddings. What do you think?
Try this trick that I learned from Cohere:
- Fetch the top 10*k (i.e. 100) results using Hamming distance
- Rerank by taking the dot product between the query embedding (full precision) and the binary doc embeddings
- Show top-10 results after re-ranking
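The three steps above can be sketched roughly as follows (with synthetic data and a simple sign-based binarization; the actual embedding model and packing scheme are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_docs, k = 64, 1000, 10

# Full-precision query embedding and binarized document embeddings
# (hypothetical data; sign-thresholding to {0, 1} for illustration).
query_fp32 = rng.standard_normal(dim).astype(np.float32)
docs_fp32 = rng.standard_normal((n_docs, dim)).astype(np.float32)
docs_binary = (docs_fp32 > 0).astype(np.float32)

# Step 1: coarse retrieval of the top 10*k candidates by Hamming
# distance between the binarized query and the binary doc embeddings.
query_binary = (query_fp32 > 0).astype(np.float32)
hamming = np.sum(docs_binary != query_binary, axis=1)
candidates = np.argsort(hamming)[: 10 * k]

# Step 2: rerank the candidates by dot product between the
# full-precision query and the binary doc embeddings.
scores = docs_binary[candidates] @ query_fp32

# Step 3: keep the top-k after reranking.
top_k = candidates[np.argsort(-scores)[:k]]
print(top_k)
```

The point of step 2 is that you only pay the full-precision cost for the query (one vector), while the doc side stays binary, so memory stays ~32x smaller than fp32.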
Huh? Sample code, please? This shouldn't be true since Structured Outputs came out: the model is literally prevented from generating invalid JSON.
(more: https://www.latent.space/p/openai-api-and-o1)
You have to set `strict` to true manually to get the same grammar-based sampling they use for Structured Outputs.
https://platform.openai.com/docs/guides/function-calling?api...
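A minimal sketch of what that looks like in a tool definition, per the function-calling docs linked above (the function name and schema here are made up for illustration):

```python
# With "strict": True, the API constrains token sampling to the JSON
# schema, the same mechanism Structured Outputs uses. Strict mode also
# requires "additionalProperties": False and every property listed in
# "required".
tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example function
        "description": "Get the current weather for a city.",
        "strict": True,  # opt in to schema-constrained generation
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
            "additionalProperties": False,
        },
    },
}

# Then pass it along with the request, e.g.:
# client.chat.completions.create(model=..., messages=..., tools=[tool])
```

Without `strict: True`, the arguments are best-effort JSON and can occasionally be malformed; with it, they're guaranteed to match the schema.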