That's a major downgrade. For binary embeddings, the top-10 results are the same as fp32, albeit shuffled. After the 10th result, though, quality degrades quite a bit. I was planning to add a reranking strategy for binary embeddings. What do you think?
Try this trick that I learned from Cohere:
- Fetch the top 10*k (i.e. 100) results using Hamming distance
- Rerank by taking the dot product between the query embedding (full precision) and the binary doc embeddings
- Show top-10 results after re-ranking
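The three steps above can be sketched roughly as follows (with synthetic data and a simple sign-based binarization; the actual embedding model and packing scheme are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_docs, k = 64, 1000, 10

# Full-precision query embedding and binarized document embeddings
# (hypothetical data; sign-thresholding to {0, 1} for illustration).
query_fp32 = rng.standard_normal(dim).astype(np.float32)
docs_fp32 = rng.standard_normal((n_docs, dim)).astype(np.float32)
docs_binary = (docs_fp32 > 0).astype(np.float32)

# Step 1: coarse retrieval of the top 10*k candidates by Hamming
# distance between the binarized query and the binary doc embeddings.
query_binary = (query_fp32 > 0).astype(np.float32)
hamming = np.sum(docs_binary != query_binary, axis=1)
candidates = np.argsort(hamming)[: 10 * k]

# Step 2: rerank the candidates by dot product between the
# full-precision query and the binary doc embeddings.
scores = docs_binary[candidates] @ query_fp32

# Step 3: keep the top-k after reranking.
top_k = candidates[np.argsort(-scores)[:k]]
print(top_k)
```

The point of step 2 is that you only pay the full-precision cost for the query (one vector), while the doc side stays binary, so memory stays ~32x smaller than fp32.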
Huh? Sample code, please? This shouldn't be true since Structured Outputs came out: the model is literally prevented from generating invalid JSON.
(more: https://www.latent.space/p/openai-api-and-o1)
You have to set `strict` to true manually to get the same grammar-based sampling they use for Structured Outputs.
https://platform.openai.com/docs/guides/function-calling?api...
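A minimal sketch of what that looks like in a tool definition, per the function-calling docs linked above (the function name and schema here are made up for illustration):

```python
# With "strict": True, the API constrains token sampling to the JSON
# schema, the same mechanism Structured Outputs uses. Strict mode also
# requires "additionalProperties": False and every property listed in
# "required".
tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example function
        "description": "Get the current weather for a city.",
        "strict": True,  # opt in to schema-constrained generation
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
            "additionalProperties": False,
        },
    },
}

# Then pass it along with the request, e.g.:
# client.chat.completions.create(model=..., messages=..., tools=[tool])
```

Without `strict: True`, the arguments are best-effort JSON and can occasionally be malformed; with it, they're guaranteed to match the schema.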