Readit News
parrt commented on GPT-4o's Memory Breakthrough – Needle in a Needlestack   nian.llmonpy.ai/... · Posted by u/parrt
19h · 2 years ago
Funnily enough, I ran a 980k-token log dump against Gemini Pro 1.5 yesterday to investigate an error scenario. It found a single incident of a 429 error being returned by a third-party API provider, reasoning that "based on the file provided and the information that this log file is aggregated of all instances of the service in question, it seems unlikely that a rate limit would be triggered, and additional investigation may be appropriate". It turned out the service had implemented a block against AWS IPs, breaking a system that loads press data from said API provider and leaving the affected customer without press data. We hadn't even noticed or investigated that, and Gemini just randomly mentioned it without being prompted for it.
parrt · 2 years ago
That definitely makes it seem like it's noticing a great deal of its context window. Impressive.
parrt commented on GPT-4o's Memory Breakthrough – Needle in a Needlestack   nian.llmonpy.ai/... · Posted by u/parrt
19h · 2 years ago
I'd like to see this for Gemini Pro 1.5 -- I threw the entirety of Moby Dick at it last week, and at one point all the books Byung-Chul Han has ever published, and in both cases it was able to return the single part of a sentence that mentioned or answered my question verbatim, every single time, without any hallucinations.
parrt · 2 years ago
Wow. Cool. I have access to that model and have also seen some impressive context extraction. It also gave a really good summary of a large code base that I dumped in. I saw somebody analyze a huge log file, but we really need something like this needle in a needlestack to help identify when models might be missing something. At the very least, this could give model developers something to analyze their proposed models with.
parrt commented on GPT-4o's Memory Breakthrough – Needle in a Needlestack   nian.llmonpy.ai/... · Posted by u/parrt
parrt · 2 years ago
The article shows how much better GPT-4o is at paying attention across its input window compared to GPT-4 Turbo and Claude-3 Sonnet.

We've needed an upgrade to needle in a haystack for a while and this "Needle In A Needlestack" is a good next step! NIAN creates a prompt that includes thousands of limericks and the prompt asks a question about one limerick at a specific location.
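For illustration only, the prompt construction described above could be sketched roughly as follows. This is a minimal sketch under assumptions, not NIAN's actual code: the function name `build_nian_prompt`, its parameters, and the wording of the question suffix are all hypothetical.

```python
def build_nian_prompt(limericks, needle, position, question):
    """Hypothetical helper: place `needle` at index `position` among
    distractor limericks, then append a question about it."""
    if not 0 <= position <= len(limericks):
        raise ValueError("position must be between 0 and len(limericks)")
    # Insert the target limerick at the requested location in the stack.
    stack = limericks[:position] + [needle] + limericks[position:]
    body = "\n\n".join(stack)
    # Assumed suffix wording; the real benchmark's prompt text may differ.
    return f"{body}\n\nQuestion: {question}\nAnswer using only the limericks above."


# Usage with placeholder text standing in for real limericks.
distractors = [f"Placeholder limerick number {i}." for i in range(1000)]
needle = "Placeholder target limerick about a lighthouse keeper."
prompt = build_nian_prompt(distractors, needle, 500, "Who keeps the lighthouse?")
```

Varying `position` across runs is what lets a test like this show where in the input window a model stops paying attention.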

parrt commented on GPT-4o's Memory Breakthrough   github.com/llmonpy/needle... · Posted by u/sftombu
parrt · 2 years ago
BTW, how did you manage all of the throughput to the models and navigate the various throttling strategies for all the models you mentioned?
parrt commented on GPT-4o's Memory Breakthrough   github.com/llmonpy/needle... · Posted by u/sftombu
parrt · 2 years ago
Looks really cool and useful. Seems like GPT-4o is a lot better than GPT-4.
parrt commented on The matrix calculus you need for deep learning (2018)   explained.ai/matrix-calcu... · Posted by u/cpp_frog
cs702 · 2 years ago
Even though it's shockingly common, I never cease to be surprised and delighted when authors who are on HN take the time to reply to comments about their work.

Thank you for doing this with Jeremy and sharing it with the world!

parrt · 2 years ago
Sure thing! Very enjoyable to have people use our work.
parrt commented on The matrix calculus you need for deep learning (2018)   explained.ai/matrix-calcu... · Posted by u/cpp_frog
cs702 · 2 years ago
Please change the link to the original source:

https://arxiv.org/abs/1802.01528

---

EDIT: It turns out explained.ai is the personal website of one of the authors, so there's no need to change the link. See comment below.

parrt · 2 years ago
:) Yeah, I use my own internal markdown to generate really nice html (with fast latex-derived images for equations) and then full-on latex. (tool is https://github.com/parrt/bookish)

I prefer reading on the web unless I'm offline. The LaTeX is super handy for printing a nice document.

parrt commented on The matrix calculus you need for deep learning (2018)   explained.ai/matrix-calcu... · Posted by u/cpp_frog
trolan · 2 years ago
I finished Vector Calculus last year and have no experience in machine learning but this seems exceptionally thorough and would have made my life easier having a practical explanation over a mathematical one, but woe is the life of the engineering student I guess.
parrt · 2 years ago
Glad to be of assistance! Yeah, it really annoyed me that this critical information was not listed in any one particular spot.
parrt commented on How to Visualize Decision Trees   explained.ai/decision-tre... · Posted by u/LewisVerstappen
_l7dh · 4 years ago
And his visualization of constrained optimization is astonishing https://explained.ai/regularization/index.html (I struggled for a long time to get the right intuition of a Lagrangian)
parrt · 4 years ago
Thanks! Took me a year to discover the key nut there. L1 vs L2 regularization is not well described, I found, so I went nuts trying to nail it down.

u/parrt

Karma: 667 · Cake day: October 25, 2014
About
Tech lead at Google. Computer languages guy (the ANTLR creator) retooling as machine learning guy, explainer, ex-professor (CS, data science). Hacking almost every day since 1980. Yes, I have tendinitis.