Readit News
nodja commented on T5Gemma 2: The next generation of encoder-decoder models   blog.google/technology/de... · Posted by u/milomg
davedx · 12 hours ago
What is an encoder-decoder model, is it some kind of LLM, or a subcomponent of an LLM?
nodja · 12 hours ago
It's an alternative architecture for LLMs, and it actually predates modern LLMs. An encoder-decoder model was the architecture used in the "Attention Is All You Need" paper that introduced the transformer and essentially gave birth to modern LLMs.

An encoder-decoder model splits input and output. This makes sense for translation tasks, summarization, etc. They're good when there's a clear separation between "understand the task" and "complete the task", but you can use them for anything really. An example would be sending "Translate to English: Le chat est noir." to the encoder; the encoder processes everything in a single step, i.e. it understands the task as a whole, then the output of the encoder is fed to the decoder, and the decoder runs one token at a time.
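To make that flow concrete, here's a minimal sketch using the Hugging Face transformers library and a small seq2seq (encoder-decoder) model; the specific model name is just an illustrative pick, not something from the article:

    # Encoder-decoder sketch: the encoder reads the whole prompt in one pass,
    # then the decoder generates the answer one token at a time while
    # attending to the encoder's hidden states.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

    inputs = tokenizer("Translate to English: Le chat est noir.", return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))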

GPT ditches the encoder altogether and just runs the decoder with some slight changes. This makes it more parameter efficient, but it tends to hallucinate more because past tokens contain information that might be wrong. You can think of it as the encoder running on each token as it is read/generated.
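For contrast, a decoder-only (GPT-style) sketch under the same assumptions; there is no separate encoder pass, the prompt and the generated text share one causal stack:

    # Decoder-only sketch: the same prompt is fed straight into a causal LM,
    # which reads and generates one token at a time; earlier hidden states
    # are never revised.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("Translate to English: Le chat est noir.", return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=20,
                                pad_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))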

Edit: On re-read I noticed it might not be clear what I mean by past tokens containing wrong information. I mean that for each token the model generates a hidden state, and those states don't change, so for example an input of 100 tokens will have 100 hidden states. The states are generated all at once in an encoder model, and one token at a time in a decoder model. Since the decoder doesn't have the full information yet, the hidden state will contain extra information that might not have anything to do with the task, or might even confuse it.

For example, give the model the task "Please translate this to chinese: Thanks for the cat, he's cute. I'm trying to send it to my friend in hong kong.". An enc-dec model would read the whole thing at once and understand that you mean Cantonese. A decoder-only model would "read" it one token at a time and could trip up in several places: 1. assume Chinese means Mandarin Chinese, not Cantonese; 2. assume that the text after "cute." is something to also translate and not a clarification. That leaves several tokens' worth of extra information that could confuse the model. Models are trained with this in mind, so they're used to tokens having lots of different meanings embedded in them and to later tokens narrowing those meanings down, but it can still cause models to ignore certain tokens, or hallucinate.
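A tiny illustration of the structural difference (shapes made up for the example): a decoder's causal mask only lets each position see the past, while an encoder attends over the whole input at once.

    import torch

    n = 6                                       # pretend the prompt is 6 tokens long
    causal_mask = torch.tril(torch.ones(n, n))  # decoder: lower-triangular, past only
    full_mask = torch.ones(n, n)                # encoder: every token sees every token

    print(causal_mask)
    print(full_mask)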

nodja commented on Average DRAM price in USD over last 18 months   pcpartpicker.com/trends/p... · Posted by u/zekrioca
somenameforme · 15 days ago
I don't understand this. I'm looking at prices on some Asian store fronts and it's nowhere even remotely near these. I'm looking at DDR5-6000 2x16 for about $130, with no apparent limits.

Even with tariffs, transport, and other fees, you could get this to the US for way less than $400. I doubt the market could be this inefficient - in other words I don't think I just found a get rich quick scheme. So, what gives?

nodja · 15 days ago
The current prices are a response to future stock, not current stock. It's 100% retailers price gouging with the current stock they got for cheap, because they know there will be limited stock in the near future. Asian retailers may be more honest and keep their margins the same, but they'll catch up in a month or two.
nodja commented on OpenAI declares 'code red' as Google catches up in AI race   theverge.com/news/836212/... · Posted by u/goplayoutside
mrweasel · 16 days ago
If pre-training is just training, then how on earth can OpenAI not have "a successful pre-training run"? The word successful indicates that they tried, but failed.

It might be me misunderstanding how this works, but I assumed that the training phase was fairly reproducible. You might get different results on each run, due to changes in the input, but not massively so. If OpenAI can't continuously and reliably train new models, then they are even more overvalued than I previously assumed.

nodja · 16 days ago
Because success for them doesn't mean it works, it means it works much better than what they currently have. If a 1% improvement comes at the cost of spending 10x more on training and 2x more on inference, then the run is a failure. (numbers out of ass)
nodja commented on OpenAI declares 'code red' as Google catches up in AI race   theverge.com/news/836212/... · Posted by u/goplayoutside
MikeTheGreat · 16 days ago
(My apologies if this was already asked - this thread is huge and Find-In-Page-ing for variations of "pre-train", "pretrain", and "train" turned up nothing about this. If this was already asked I'd super-appreciate a pointer to the discussion :) )

Genuine question: How is it possible for OpenAI to NOT successfully pre-train a model?

I understand it's very difficult, but they've already successfully done this and they have a ton of incredibly skilled, knowledgeable, and well-paid employees.

I get that there's some randomness involved but it seems like they should be able to (at a minimum) just re-run the pre-training from 2024, yes?

Maybe the process is more ad-hoc (and less reproducible?) than I'm assuming? Is the newer data causing problems for the process that worked in 2024?

Any thoughts or ideas are appreciated, and apologies again if this was asked already!

nodja · 16 days ago
> Genuine question: How is it possible for OpenAI to NOT successfully pre-train a model?

The same way everyone else fails at it.

Change some hyperparameters to match the new hardware (more params), maybe implement the latest improvements from papers after they've been validated in a smaller model run. Start training the big boy, loss looks good, 2 months and millions of dollars later the loss plateaus, do the whole SFT/RL shebang, run benchmarks.

It's not much better than the previous model, very tiny improvements, oops.

nodja commented on OpenAI declares 'code red' as Google catches up in AI race   theverge.com/news/836212/... · Posted by u/goplayoutside
mr_00ff00 · 16 days ago
What is a pre-training run?
nodja · 16 days ago
Pre-training is just training; it got the name because most models also have a post-training stage, so people call the first stage pre-training to differentiate the two.

Pre-training: You train on a vast amount of data, as varied and high quality as possible. This determines the distribution the model can operate with, so LLMs are usually trained on a curated dataset of the whole internet. The output of pre-training is usually called the base model.
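As a toy sketch of what that looks like in code (the model, dataloader, and hyperparameters are stand-ins for illustration, not anyone's real setup), pre-training is basically a next-token-prediction loop:

    import torch
    import torch.nn.functional as F

    def pretrain(model, dataloader, optimizer, device="cuda"):
        model.train()
        for batch in dataloader:                      # batch: (B, T) token ids
            tokens = batch.to(device)
            inputs, targets = tokens[:, :-1], tokens[:, 1:]
            logits = model(inputs)                    # (B, T-1, vocab_size)
            loss = F.cross_entropy(                   # predict the next token
                logits.reshape(-1, logits.size(-1)),
                targets.reshape(-1),
            )
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()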

Post-training: You narrow the model down to the specific tasks you need. You can do this in several ways:

- Supervised Finetuning (SFT): Training on a strictly curated, high-quality dataset of the task you want. For example, if you wanted a summarization model, you'd finetune the model on high-quality text->summary pairs and it would be able to summarize much better than the base model.

- Reinforcement Learning (RL): You train a separate reward model that ranks outputs, then use it to score the main model's generations, and use those scores to update the model.

- Direct Preference Optimization (DPO): You have pairs of good/bad generations and use them to align the model towards/away from the kinds of responses you want (a rough sketch of the loss is below, after this list).

Post-training is what makes the models easy to actually use. The most common form is instruction tuning, which teaches the model to talk in turns, but post-training can be used for anything. E.g. if you want a translation model that always translates a certain way, or a model that knows how to use tools, you'd achieve all of that through post-training. Post-training is where most of the secret sauce in current models is nowadays.
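As a rough sketch of the DPO objective from the list above (assuming you've already computed summed log-probabilities of the good/bad responses under both the policy being trained and a frozen reference model):

    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # How much more (or less) the policy prefers each response than the
        # frozen reference model does.
        chosen_margin = policy_chosen_logps - ref_chosen_logps
        rejected_margin = policy_rejected_logps - ref_rejected_logps
        # Push the policy to widen the gap between chosen and rejected responses.
        return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()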

nodja commented on Meta Segment Anything Model 3   ai.meta.com/sam3/... · Posted by u/lukeinator42
xfeeefeee · a month ago
I can't wait until it is easy to rotoscope / greenscreen / mask this stuff out accessibly for videos. I had tried Runway ML but it was... lacking, and the webui for fixing parts of it had similar issues.

I'm curious how this works for hair and transparent/translucent things. Probably not the best, but does not seem to be mentioned anywhere? Presumably it's just a straight line or vector rather than alpha etc?

nodja · a month ago
I'm pretty sure DaVinci Resolve does this already; you can even track it. Idk if it's available in the free version.
nodja commented on Fizz Buzz without conditionals or booleans   evanhahn.com/fizz-buzz-wi... · Posted by u/ingve
kiratp · a month ago
A loop either never halts or has a conditional. I guess a compiler could elide a “while True:” to a branch-less jump instruction.

One hack would be to use recursion and let stack exhaustion stop you.

nodja · a month ago
Count down i from 100 to 0 and do 1/i at the end of the loop :)
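A rough Python sketch of that trick (the "while True:" is the branch-less loop mentioned in the parent comment; the dict lookup keeps the FizzBuzz part itself free of explicit conditionals):

    words = {0: "FizzBuzz", 3: "Fizz", 5: "Buzz", 6: "Fizz",
             9: "Fizz", 10: "Buzz", 12: "Fizz"}

    i = 100
    while True:
        print(words.get(i % 15, i))  # counting down, so this prints 100..1
        i -= 1
        1 / i                        # ZeroDivisionError halts the loop at i == 0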
nodja commented on Windhawk Windows classic theme mod for Windows 11   windhawk.net/mods/classic... · Posted by u/znpy
rikafurude21 · a month ago
I've come across Windhawk before, but the mods being just C++ programs seemed a little suspicious to me. How do you make sure the mods don't include malware?
nodja · a month ago
Windhawk mods are distributed as source code and WH itself compiles them. It works the same way userscripts work with Tampermonkey/Violentmonkey in browsers.

If a mod includes malware it'll be very obvious as mods are usually small.

nodja commented on Valve is about to win the console generation   xeiaso.net/blog/2025/valv... · Posted by u/moonleay
NelsonMinar · a month ago
As the article says, "The only way that they could mess this up is with the pricing. ... I'd expect the pricing to be super aggressive." The price to beat is the $400-$500 price point of PS5 and XBox. I'm guessing Valve is going to have a very hard time matching that. We'll know soon enough.
nodja · a month ago
All they have to do is market the fact you don't have to pay for online.

PS5 + 3 years of PS Plus = $740

Steam Machine = $700

Add/remove more years of PS Plus if the SM turns out to be more/less expensive.

If you add the fact that games on PC are usually cheaper and go on sale more often, then it's a no-brainer, but that won't convince the FIFA and COD players.

nodja commented on Kimi K2 Thinking, a SOTA open-source trillion-parameter reasoning model   moonshotai.github.io/Kimi... · Posted by u/nekofneko
NiloCK · a month ago
Maybe a dumb question but: what is a "reasoning model"?

I think I get that "reasoning" in this context refers to dynamically budgeting scratchpad tokens that aren't intended as the main response body. But can't any model do that, and it's just part of the system prompt, or more generally, the conversation scaffold that is being written to.

Or does a "reasoning model" specifically refer to models whose "post training" / "fine tuning" / "rlhf" laps have been run against those sorts of prompts rather than simpler user-assistant-user-assistant back and forths?

EG, a base model becomes "a reasoning model" after so much experience in the reasoning mines.

nodja · a month ago
Any model that does thinking inside <think></think> style tokens before it answers.

This can be done with finetuning/RL on an existing pre-formatted dataset, or with format-based RL where the model is rewarded both for answering correctly and for using the right format.
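As a sketch of what a format-based reward could look like (a hypothetical reward function for illustration, not any particular lab's recipe):

    import re

    THINK_RE = re.compile(r"^<think>(.*?)</think>\s*(.+)$", re.DOTALL)

    def reward(completion: str, expected_answer: str) -> float:
        match = THINK_RE.match(completion.strip())
        if match is None:
            return 0.0                   # wrong format, no reward at all
        answer = match.group(2).strip()
        format_reward = 0.5              # wrapped its reasoning in <think>...</think>
        answer_reward = 0.5 if answer == expected_answer else 0.0
        return format_reward + answer_reward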

u/nodja

Karma: 1174 · Cake day: March 14, 2016