Readit News
hxypqr commented on GPT-4.5 or GPT-5 being tested on LMSYS?   rentry.co/GPT2... · Posted by u/atemerev
jiggawatts · 2 years ago
I recently tried a Fermi estimation problem on a bunch of LLMs and they all failed spectacularly. It was crossing too many orders of magnitude, all the zeroes muddled them up.

E.g.: the right way to work with numbers like a “trillion trillion” is to concentrate on the powers of ten, not to write the number out in full.
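A minimal sketch of that approach: keep every factor as an exponent of ten and add the exponents, instead of multiplying fully written-out numbers. The scenario and the two factor values below are illustrative assumptions, not from any specific benchmark.

```python
import math

# Hypothetical Fermi estimate: grains of sand on Earth's beaches.
# Work entirely in powers of ten rather than writing the numbers out in full.
factors_log10 = {
    "beach sand volume (m^3)": math.log10(1e12),  # ~10^12 m^3, assumed
    "grains per m^3":          math.log10(1e9),   # ~10^9 grains/m^3, assumed
}

# Multiplying the factors is just adding their exponents.
total_exponent = sum(factors_log10.values())
print(f"estimate ~ 10^{total_exponent:.0f} grains")  # → estimate ~ 10^21 grains
```

Crossing many orders of magnitude this way never produces long digit strings, which is exactly where the models got muddled.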

hxypqr · 2 years ago
Predicting the next character alone cannot achieve this kind of compression. The probability distribution learned in training is tied to the corpus, and multi-scale compression and alignment cannot be fully learned through backpropagation in this kind of model.
hxypqr commented on GPT-4.5 or GPT-5 being tested on LMSYS?   rentry.co/GPT2... · Posted by u/atemerev
kromem · 2 years ago
I certainly hope it's not GPT-5.

This model struggles with reasoning tasks Opus does wonderfully with.

A cheaper GPT-4 that's this good? Neat, I guess.

But if this is stealthily OpenAI's next major release, then it's clear their current alignment and optimization approaches are getting in the way of higher-level reasoning, to the degree that they are about to be unseated at the top of the market for the foreseeable future.

(Though personally, I just think it's not GPT-5.)

hxypqr · 2 years ago
Opus's reasoning ability also has a clear ceiling.
hxypqr commented on AlphaGeometry: An Olympiad-level AI system for geometry   deepmind.google/discover/... · Posted by u/FlawedReformer
tsimionescu · 2 years ago
The key insight in this whole thread is that AlphaGeometry only works because the search space is not a googol of combinations. So it doesn't really generalize to many other fields of math. We shouldn't expect an AlphaCategoryTheory or AlphaNumberTheory anytime soon.
hxypqr · 2 years ago
The key insight is the finiteness of the reasoning components in plane geometry, which lets them be solved quickly by a SAT solver. That finiteness usually does not exist in most first-order and second-order theories, such as number theory, algebra, or functional analysis.
hxypqr commented on AlphaGeometry: An Olympiad-level AI system for geometry   deepmind.google/discover/... · Posted by u/FlawedReformer
nybsjytm · 2 years ago
If I read their paper right, this is legit work (much more legit than DeepMind's AI math paper last month falsely advertised as solving an open math research problem) but it's still pretty striking how far away the structure of it is from the usual idea of automated reasoning/intelligence.

A transformer is trained on millions of elementary geometry theorems and used as brute search for a proof, which because of the elementary geometry context has both a necessarily elementary structure and can be easily symbolically judged as true or false. When the brute search fails, an extra geometric construction is randomly added (like adding a midpoint of a certain line) to see if brute search using that extra raw material might work. [edit: as corrected by Imnimo, I got this backwards - the brute search is just pure brute search, the transformer is used to predict which extra geometric construction to add]

Also (not mentioned in the blog post) the actual problem statements had to be modified/adapted, e.g. the actual problem statement "Let AH1, BH2 and CH3 be the altitudes of a triangle ABC. The incircle W of triangle ABC touches the sides BC, CA and AB at T1, T2 and T3, respectively. Consider the symmetric images of the lines H1H2, H2H3, and H3H1 with respect to the lines T1T2, T2T3, and T3T1. Prove that these images form a triangle whose vertices lie on W." had to be changed to "Let ABC be a triangle. Define point I such that AI is the bisector of angle BAC and CI is the bisector of angle ACB. Define point T1 as the foot of I on line BC. Define T2 as the foot of I on line AC. Define point T3 as the foot of I on line AB. Define point H1 as the foot of A on line BC. Define point H2 as the foot of B on line AC. Define point H3 as the foot of C on line AB. Define point X1 as the intersection of circles (T1,H1) and (T2,H1). Define point X2 as the intersection of circles (T1,H2) and (T2,H2). Define point Y2 as the intersection of circles (T2,H2) and (T3,H2). Define point Y3 as the intersection of circles (T2,H3) and (T3,H3). Define point Z as the intersection of lines X1X2 and Y2Y3. Prove that T1I=IZ."

hxypqr · 2 years ago
The problem is that using an LLM to draw auxiliary lines is too inefficient. It is hard to imagine deploying a large number of machines to solve a single IMO problem. This field must be in an early stage of development, and much work remains unfinished. A reasonable view is that the search component should be replaced by a small neural network; the reasoning component is not the hard part and does not require much improvement. Now is the time to use self-play to improve performance: treat the conclusion to be proved in a plane-geometry problem as one point in the diagram and the given conditions as another point, then have two players try to move toward each other as much as possible while sharing data. The contribution each player makes in this process can then be scored by analogy with wins and losses in Go, and performance improved through self-play.
hxypqr commented on AlphaGeometry: An Olympiad-level AI system for geometry   deepmind.google/discover/... · Posted by u/FlawedReformer
riku_iki · 2 years ago
My understanding is that they encoded domain in several dozens of mechanical rules described in extended data table 1, and then did transformer-guided brute force search for solutions.
hxypqr · 2 years ago
The problem is that using an LLM to draw auxiliary lines is too inefficient. It is hard to imagine deploying a large number of machines to solve a single IMO problem. This field must be in an early stage of development, and much work remains unfinished.
hxypqr commented on Dynamic programming is not black magic   qsantos.fr/2024/01/04/dyn... · Posted by u/qsantos
hxypqr · 2 years ago
Most of dynamic programming is just a method of reducing computational complexity by substituting the noun objects in first-order logic (or, in the advanced version, second-order logic) so that the answers to unfinished tasks are walked through using completed ones. Only in very few cases is it necessary to extract and match the completed parts from the unfinished objects in this process, which often amounts to optimizing a function f(A, B). Most of the time, however, that extraction step is futile.
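The "completed tasks answer unfinished tasks" pattern can be sketched with a standard example, minimum coin change, where each amount is filled in from already-solved smaller amounts:

```python
def min_coins(coins, target):
    """Fewest coins summing to target. Completed subproblems (smaller
    amounts) are reused to answer the unfinished ones (larger amounts)."""
    INF = float("inf")
    best = [0] + [INF] * target          # best[a] = fewest coins for amount a
    for a in range(1, target + 1):       # walk the unfinished tasks in order
        for c in coins:
            if c <= a and best[a - c] + 1 < best[a]:
                best[a] = best[a - c] + 1   # reuse a completed subproblem
    return best[target] if best[target] != INF else -1

print(min_coins([1, 5, 11], 15))  # → 3  (5 + 5 + 5, beating greedy's 11+1+1+1+1)
```

Note that no "extraction and matching" is needed here: `best[a - c]` indexes the completed answers directly, which is the common case the comment describes.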
hxypqr commented on Building a fully local LLM voice assistant to control my smart home   johnthenerd.com/blog/loca... · Posted by u/JohnTheNerd
wokwokwok · 2 years ago
Was I the only one who got to the end and was like, "and then…?"

You installed it and customised your prompts and then… it worked? It didn’t work? You added the hugging face voice model?

I appreciate the prompt, but broadly speaking it feels like there's a fair bit of vague hand-waving here: did it actually work? Is Mixtral good enough to consistently respond in an intelligent manner?

My experience with this stuff has been mixed; broadly speaking, whisper is good and mixtral isn’t.

It's basically quite shit compared to GPT-4; no matter how careful your prompt engineering is, you simply can't use tiny models to do big complicated tasks. Better than Mistral, sure… but on average, generating structured, correct (no hallucination craziness) output is a sort of 1-in-10 kind of deal (for me).

…so, some unfiltered examples of the actual output would be really interesting to see here…

hxypqr · 2 years ago
Mixtral 8x7B does indeed have this characteristic: it tends to disregard the requirement for structured output and often emits unnecessary text in a very casual manner. However, I have found that models like Qwen-72B have better controllability in this respect, at least reaching the level of GPT-3.5.
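One defensive pattern for exactly this failure mode (a sketch, not from the original post): instead of trusting the model to emit bare JSON, slice the outermost `{...}` out of its chatty reply and validate it, retrying on failure. The example reply below is invented for illustration.

```python
import json

def parse_structured(reply: str):
    """Salvage a JSON object from a chatty model reply, or return None.
    Small models often wrap the payload in extra prose, so slice out the
    outermost {...} span before parsing."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(reply[start : end + 1])
    except json.JSONDecodeError:
        return None

# A casual small-model reply with chatter around the payload (made up):
reply = 'Sure! Here is the action: {"device": "light", "state": "on"} Hope that helps!'
print(parse_structured(reply))  # → {'device': 'light', 'state': 'on'}
```

A `None` result would trigger a re-prompt in a real pipeline; with GPT-3.5-level controllability that retry loop fires far less often.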
hxypqr commented on Building a fully local LLM voice assistant to control my smart home   johnthenerd.com/blog/loca... · Posted by u/JohnTheNerd
sprobertson · 2 years ago
I've been working on something like this but it's of course harder than it sounds, mostly due to how few example use cases there are. A dumb false positive for yours might be "you tend to turn off the lights when the outside temperature is 50º"

Anyone know of a database of generic automations to train on?

hxypqr · 2 years ago
Temperature and light readings may induce hallucinations in an LLM. One potential solution is to build a knowledge graph from the sensor signals, use the LLM only to understand the speech commands given by humans, and then interpret those commands as operations on the graph via similarity calculations.
hxypqr commented on Building a fully local LLM voice assistant to control my smart home   johnthenerd.com/blog/loca... · Posted by u/JohnTheNerd
weebull · 2 years ago
Machine learning can tackle this for sure, but that's surely separate to LLMs. A language model deals with language, not logic.
hxypqr · 2 years ago
This is a very insightful viewpoint. In this situation, I believe NER is needed to connect the LLM module and the ML module.
hxypqr commented on Combinatorial Problem    · Posted by u/PringleBoy
hxypqr · 2 years ago
No matter what n is, the answer to this problem on an n×n chessboard is basically cn². Because the problem is finite, the optimal solution repeats in a fixed pattern in both directions. Additionally, c should be slightly smaller than 3/4, between 5/8 and 3/4.

u/hxypqr · November 8, 2023