integralof6y commented on Understanding R1-Zero-Like Training: A Critical Perspective   github.com/sail-sg/unders... · Posted by u/pama
andai · 6 months ago
I heard that even just getting the model to print a bunch of whitespace ("think for longer") improves the quality of the final response, because some kind of processing is still happening internally?
integralof6y · 6 months ago
Printing a bunch of whitespace is a way of entering a new state (I am thinking of a state machine), so the LLM can use that whitespace as a new token to refine the state of the system later. In math terms, whitespace is a tag for a class (or state) in the LLM. I think RL could perhaps take advantage of such tags. For example, whitespace could indicate a point of low gradient (indetermination) or a branching point, and the LLM would in some way learn to raise its learning-rate parameter; the message in the head of the LLM would be: be ready to learn from RL, because in your current state you need to take a branch from a branching point that can enhance your capabilities. This is similar to tossing a coin or a die. The rule could be: on whitespace, increase the learning-rate parameter to escape from zero-gradient points.

Caveat emptor: this is just speculation; I don't have any data to support the hypothesis. It also suggests that whitespace could be a "token that reflects the state of previous layers" and not one contained in the vocabulary used to train the model, so I should rather call whitespace a macro-token or neurotoken. If this hypothesis has any ground, it is also plausible that whitespace is an enumerated neural tag, in the sense that the length of the whitespace reflects, or is related to, the layer in which the zero-gradient or branching point occurs.

Finally, my throwaway user needs whitespace too, so I will change the password to a random one to force myself to stop adding new ideas.
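A toy sketch of the whitespace rule above (base_lr, the boost factor, and the isspace check are arbitrary illustrative choices, nothing measured):

    # Hypothetical rule: when the sampled token is a whitespace "macro-token",
    # temporarily boost the learning rate to escape a zero-gradient point.
    def lr_schedule(tokens, base_lr=1e-5, boost=10.0):
        return [base_lr * boost if tok.isspace() else base_lr
                for tok in tokens]

    print(lr_schedule(["The", " ", " ", "answer"]))
    # [1e-05, 0.0001, 0.0001, 1e-05]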


integralof6y commented on Scallop – A Language for Neurosymbolic Programming   scallop-lang.org/... · Posted by u/andsoitis
versteegen · 6 months ago
Unfortunately it doesn't seem to be available yet. Scallop and Lobster are both from UPenn, and the Scallop website says "We are still in the process of open sourcing Scallop," so I assume it's a matter of time.
integralof6y · 6 months ago
The Scallop source code is on GitHub: https://github.com/scallop-lang/
integralof6y commented on FOSS infrastructure is under attack by AI companies   thelibre.news/foss-infras... · Posted by u/todsacerdoti
dbingham · 6 months ago
That value is only great if it's shared equitably with the rest of the planet.

If it's owned by a few, as it is right now, it's an existential threat to the life, liberty, and pursuit of happiness of everyone else on the planet.

We should be seriously considering what we're going to do in response to that threat if something doesn't change soon.

integralof6y · 6 months ago
> That value is only great if it's shared equitably with the rest of the planet.

I think this should be an axiom respected by any copyright rule.

integralof6y commented on Artificial Intelligence: Foundations of Computational Agents   artint.info/index.html... · Posted by u/rramadass
EGreg · 6 months ago
I am really not sure where agents would ever be better than workflows. Can you give me some examples?

A workflow means some organization signed off on what has to be done: checklists, best practices, etc.

Agents, on the other hand, have a goal, and you have no idea what they're going to do to achieve it. I think of an agent's guardrails as essentially a "blacklist" of actions, while a workflow is a "whitelist".

To me, agents are a gimmick in the same way that real-time chat, or video, is a gimmick: good for entertainment, but with negative value for getting actual work done.

Think of it this way… just as models have a tradeoff between explore and exploit, agents can be considered capable of exploration while workflows exploit best practices. Over time and across many tasks, everything gets standardized into best practices, so agents become worse than fully standardized workflows. They may be useful for tinkering at the edges, but not for making huge decisions. Maybe agents can be used to set up some personalized hooks for users at the edges of a complex system.

https://medium.com/@falkgottlob/many-ai-agents-are-actually-...

integralof6y · 6 months ago
Interesting. I understand that you draw the line separating workflows from agents along the exploration-exploitation trade-off. This could allow a dynamic environment in which a task-dependent parameter controls the workflow-agent planning. So there is no clear cutoff: the difference depends on the task, the priors, and the posterior experience.
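A toy sketch of such a task-dependent parameter (the epsilon values and the two stub functions are illustrative assumptions, not anything proposed in the thread):

    import random

    def run_workflow(task):
        return f"workflow({task})"   # stub: whitelisted, standardized steps

    def run_agent(task):
        return f"agent({task})"      # stub: goal-directed, open-ended

    # epsilon = per-task probability of letting the agent explore;
    # 1 - epsilon = probability of exploiting the vetted workflow.
    def plan(task, epsilon):
        if random.random() < epsilon:
            return run_agent(task)
        return run_workflow(task)

    # A mature, well-understood task gets a small epsilon; a novel one, a larger one.
    print(plan("file quarterly report", epsilon=0.05))
    print(plan("investigate a new market", epsilon=0.6))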
integralof6y commented on Command A: Max performance, minimal compute – 256k context window   cohere.com/blog/command-a... · Posted by u/lastdong
integralof6y · 6 months ago
I just tried the chat and asked the LLM to compute the double integral of 6*y over the interior of a triangle given its vertices. There were many trials, all incorrect; I then asked it to write a Python program to solve the problem, which was again incorrect. I know math computation is a weak point for LLMs, especially in a chat. In one of the programs it used a hardcoded number, 10, to branch, which suggests the generated program was fitted to give a good result for the test (I had given it the correct result beforehand). So be careful when testing generated programs: they may be fitted to pass your simple tests.

Edited: I also tried to compute the integral of 6y over the triangle with vertices A(8, 8), B(15, 29), C(10, 12), and it yielded a wrong result of 2341. I then suggested computing it with the formula based on the barycenter of the triangle, that is, 6 * Area * (mean of the y-coordinates), and it returned the correct value of 686.

To summarize: it seems LLMs are unable to give correct results for simple math problems (here, a double integral over a triangle), so students should not rely on them; as of today they cannot perform such simple tasks without many errors.
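For reference, a short check of those numbers, assuming the standard centroid identity that the integral of y over a triangle equals Area times the centroid's y-coordinate (the helper name is mine):

    # Verify: integral of 6y over the triangle A(8,8), B(15,29), C(10,12).
    def triangle_area(a, b, c):
        # Shoelace formula
        return abs((b[0]-a[0])*(c[1]-a[1]) - (c[0]-a[0])*(b[1]-a[1])) / 2

    A, B, C = (8, 8), (15, 29), (10, 12)
    area = triangle_area(A, B, C)        # 7.0
    y_bar = (A[1] + B[1] + C[1]) / 3     # centroid y-coordinate, 49/3
    print(6 * area * y_bar)              # 686.0, matching the corrected answer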
