kick_in_the_dor (u/kick_in_the_dor)

kick_in_the_dor commented on Is Sora the beginning of the end for OpenAI? calnewport.com/is-sora-th... · Posted by u/warrenm

bongodongobob · 5 months ago

Lol ok. We'll wait for your game changing technology, keep us posted.

kick_in_the_dor · 5 months ago

OP has a point. Are these types of embeddings the best way to model thought?

kick_in_the_dor commented on Databricks is raising a Series K Investment at >$100B valuation databricks.com/company/ne... · Posted by u/djhu9

vrm · 7 months ago

that is earnings (net income) not revenue (top line) so these are wildly different and incomparable numbers

kick_in_the_dor · 7 months ago

Got it - thanks for the correction.

kick_in_the_dor commented on Databricks is raising a Series K Investment at >$100B valuation databricks.com/company/ne... · Posted by u/djhu9

bix6 · 7 months ago

$2-3B in 2024 revenue based on estimates I can find. That’s a 33-50x revenue multiple lol.

Also announcing the signed term sheet but not the close so this is a PR push to find more investors?
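
The multiple quoted above is just valuation divided by revenue. A minimal Python sketch, assuming the $100B figure from the post title (stated there as a lower bound) and the commenter's $2-3B revenue estimate; the names are illustrative and the revenue figures are estimates, not reported numbers:

```python
# Assumed inputs: headline valuation from the post title, revenue range from the comment.
VALUATION_USD = 100e9
REVENUE_ESTIMATES_USD = (2e9, 3e9)


def revenue_multiple(valuation: float, revenue: float) -> float:
    """Valuation divided by annual revenue (a price-to-sales style multiple)."""
    return valuation / revenue


multiples = [revenue_multiple(VALUATION_USD, r) for r in REVENUE_ESTIMATES_USD]
print([round(m, 1) for m in multiples])  # [50.0, 33.3]
```

Note that this divides by revenue, not net income, so it is not directly comparable to a P/E ratio.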

kick_in_the_dor · 7 months ago

A PE ratio of 40 isn't bad in this market, actually. Mature companies like Google/Meta are hovering around 30.

kick_in_the_dor commented on Training Language Models to Self-Correct via Reinforcement Learning arxiv.org/abs/2409.12917... · Posted by u/weirdcat

fpgaminer · a year ago

I found the paper a tad difficult to understand because it spends a lot of time circling around the main thesis instead of directly describing it. So, to the best of my understanding:

We want to improve LLMs' abilities to give correct answers to hard problems. One theory is that we can do that by training a "Self Correcting" behavior into the models where they can take as input a wrong answer and improve it to a better/correct answer.

This has been explored previously, trying to train this behavior using various Reinforcement techniques where the reward is based on how good the "corrected" answer is. So far it hasn't worked well, and the trained behavior doesn't generalize well.

The thesis of the paper is that this is because when the model is presented with a training example of `Answer 1, Reasoning, Corrected Answer`, and a signal of "Make Corrected Answer Better" it actually has _two_ perfectly viable ways to do that. One is to improve `Reasoning, Corrected Answer`, which would yield a higher reward and is what we want. The other, just as valid solution, is to simply improve `Answer 1` and have `Corrected Answer` = `Answer 1`.

The latter is what existing research has shown happens, and why so far attempts to train the desired behavior have failed. The models just try to improve their answers, not their correcting behaviors. This paper's solution is to change the training regimen slightly to encourage the model to use the former approach. And thus, hopefully, get the model to actually train the desired behavior of correcting previous answers.

This is done by doing two stages of training. In the first stage, the model is forced (by KL divergence loss) to keep its first answers the same, while being rewarded for improving the second answer. This helps keep the model's distribution of initial answers the same, avoiding the issue later where the model doesn't see as many "wrong" answers because wrong answers were trained out of the model. But it helps initialize the "self correcting" behavior into the model.

In the second stage the model is free to change the first answer, but they tweak the reward function to give higher rewards for "flips" (where answer 1 was bad, but answer 2 was good). So in this second stage it can use both strategies, improving its first answer or improving its self correcting, but it gets more rewards for the latter behavior. This seems to be a kind of refinement on the model, to improve things overall, while still keeping the self correcting behavior intact.

Anyway, blah blah blah, metrics showing the technique working better and generalizing better.

Seems reasonable to me. I'd be a bit worried about, in Stage 2, the model learning to write _worse_ answers for Answer 1 so it can maximize the reward for flipping answers. So you'd need some kind of balancing to ensure Answer 1 doesn't get worse. Not sure if that's in their reward function or not, or if it's even a valid concern in practice.
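
To make the Stage 2 worry concrete, here is a toy Python sketch of a "flip bonus". It assumes binary correctness rewards and a bonus proportional to the improvement from answer 1 to answer 2. The function names, the `alpha` weight, and the way the two rewards are summed are all assumptions for illustration, not the paper's actual reward function:

```python
def shaped_second_reward(r1: float, r2: float, alpha: float) -> float:
    """Reward for the second attempt plus a bonus proportional to the improvement.

    r1, r2: correctness rewards (0.0 or 1.0) for the first and second answers.
    alpha:  hypothetical bonus weight; not a value taken from the paper.
    """
    return r2 + alpha * (r2 - r1)


def episode_return(r1: float, r2: float, alpha: float) -> float:
    """Toy total: plain reward for answer 1, shaped reward for answer 2."""
    return r1 + shaped_second_reward(r1, r2, alpha)


for alpha in (0.5, 2.0):
    flip = episode_return(0.0, 1.0, alpha)        # wrong first, then corrected
    both_right = episode_return(1.0, 1.0, alpha)  # correct both times
    print(alpha, flip, both_right)
```

In this toy setup a flip earns `1 + alpha` while two correct answers earn `2`, so deliberately botching Answer 1 only pays off when `alpha > 1`. Whether the paper's real reward has the same property is not something this sketch establishes.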

kick_in_the_dor · a year ago

Can you explain what you mean by: "The other, just as valid solution, is to simply improve `Answer 1` and have `Corrected Answer` = `Answer 1`."

Isn't improving "Answer 1" the whole point?

Your write-up makes it sound like "Answer 1" is an input, but isn't it an output from the LLM?

kick_in_the_dor commented on Google removed 'number of results' default setting for some users support.google.com/websea... · Posted by u/nomilk

renegat0x0 · 2 years ago

Welcome to the great Google bubble. The Internet is full of web pages. There are billions of them, yet Google shows you 10 pages of 10 links each.

kick_in_the_dor · 2 years ago

Yep, you just described a search engine!

kick_in_the_dor commented on AlphaProteo generates novel proteins for biology and health research deepmind.google/discover/... · Posted by u/meetpateltech

ramraj07 · 2 years ago

That channel is just a hype machine. Like half the stuff he says is barely comprehensible if you know the actual science.

kick_in_the_dor · 2 years ago

When he started the video with "I had the honor of having an exclusive look at it" I knew it was all marketing.

kick_in_the_dor commented on Pharma firms stash profits in Europe's tax havens investigate-europe.eu/pos... · Posted by u/obscurette

ok_dad · 2 years ago

I’m an American citizen, and no matter where I go, I owe the IRS taxes on what I make, even if the USA isn’t involved otherwise.

Question: why can’t we do the same for corporations in the USA? The equivalent action might be to only allow a company to expense money spent inside the USA and thus they can’t just license themselves all their own technology and patents which are held in a one person office in Ireland.

I’m sure I’m simplifying things too much, but I’m tired of the two tiered tax system where regular people pay for everything and corps reap the rewards.

kick_in_the_dor · 2 years ago

Counterargument: At the end of the day, the ones reaping corporate profits are themselves people, who should be paying their income taxes, no?

kick_in_the_dor commented on Waymo One is now open to everyone in San Francisco waymo.com/blog/2024/06/wa... · Posted by u/ra7

hackernoteng · 2 years ago

I don't get why anyone cheers shit like this. It just takes jobs away from more people. You will just get more homeless and drug addicted people. And more rich techies. And it will make no difference to the rider - they get where they need to be either way. It's just another big wealth transfer from poor to rich. Yet this is cheered.

kick_in_the_dor · 2 years ago

I love coming to HackerNews to see exactly this type of salt.

"Guys it's the future! Self-driving cars actually exist!"

"Fuck you, here's why this is awful"

kick_in_the_dor commented on Detecting hallucinations in large language models using semantic entropy nature.com/articles/s4158... · Posted by u/Tomte

program_whiz · 2 years ago

Everyone in the comments seems to be arguing over the semantics of the words and anthropomorphization of LLMs. Putting that aside, there is a real problem with this approach that lies at the mathematical level.

For any given input text, there is a corresponding output text distribution (e.g. the probabilities of all words in a sequence which the model draws samples from).

The problem with drawing several samples and evaluating the entropy and/or disagreement between those draws is that it relies on already knowing the properties of the output distribution. It may be legitimate that one distribution is much more uniformly random than another, which has high certainty. It's not clear to me that they have demonstrated the underlying assumption.

Take for example celebrity info, "What is Tom Cruise known for?". The phrases "movie star", "katie holmes", "topgun", and "scientology" are all quite different in terms of their location in the word vector space, and would result in low semantic similarity, but are all accurate outputs.

On the other hand, for "What is Taylor Swift known for?", the answers "standup comedy", "comedian", and "comedy actress" are semantically similar but represent hallucinations. Without knowing the distribution characteristics (e.g. multivariate moments and estimates) we couldn't say for certain these are correct merely by their proximity in vector space.

As some have pointed out in this thread, knowing the correct distribution of word sequences for a given input sequence is the very job the LLM is solving, so there is no way of evaluating the output distribution to determine its correctness.

There are actual statistical models to evaluate the amount of uncertainty in output from ANNs (albeit a bit limited), but they are probably not feasible at the scale of LLMs. Perhaps a layer or two could be used to create a partial estimate of uncertainty (e.g. final 2 layers), but this would be a severe truncation of overall network uncertainty.

Another reason I mention this is most hallucinations I encounter are very plausible and often close to the right thing (swapping a variable name, confabulating a config key), which appear very convincing and "in sample", but are actually incorrect.

kick_in_the_dor · 2 years ago

I think you make a good point, but my guess is that in, e.g., your Taylor Swift example, a well-grounded model would have a low likelihood of outputting multiple consecutive answers about her being a comedian, since that isn't grounded in the training data.

For your Tom Cruise example, since all those phrases are true and grounded in the training data, the technique may fire off a false positive "hallucination decision".

However, the example they give in the paper seems to be for "single-answer" questions, e.g., "What is the receptor that this very specific medication acts on?", or "Where is the Eiffel Tower located?", in which case I think this approach could be helpful. So perhaps this technique is best-suited for those single-answer applications.
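
A rough Python sketch of the idea being debated: sample several answers, group them by meaning, and take the entropy over the groups. The cluster labels below are assigned by hand as a stand-in for whatever semantic-equivalence check a real system would use, and `cluster_entropy` is an illustrative name, not an API from the paper:

```python
import math
from collections import Counter


def cluster_entropy(cluster_labels: list[str]) -> float:
    """Shannon entropy (in nats) over the empirical distribution of meaning clusters."""
    counts = Counter(cluster_labels)
    total = len(cluster_labels)
    return sum((c / total) * math.log(total / c) for c in counts.values())


# Hand-labeled clusters for sampled answers (assumed, for illustration only).
tom_cruise = ["movie star", "katie holmes", "topgun", "scientology"]  # all true, all different
taylor_swift = ["comedy", "comedy", "comedy"]  # "standup comedy", "comedian", "comedy actress"
eiffel_tower = ["paris", "paris", "paris"]     # single-answer question

for name, labels in [("cruise", tom_cruise), ("swift", taylor_swift), ("eiffel", eiffel_tower)]:
    print(name, round(cluster_entropy(labels), 3))
```

With these labels the Tom Cruise samples score high (four distinct clusters) even though all are accurate, while the consistent but wrong Taylor Swift samples score zero, the same as the Eiffel Tower case. That matches both objections above and suggests why the measure may fit single-answer questions best.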

kick_in_the_dor commented on Researchers cracked an 11-year-old password to a $3M crypto wallet wired.com/story/roboform-... · Posted by u/ColinWright

kick_in_the_dor · 2 years ago

"Michael... now has 30 BTC, now worth $3 million, and is waiting for the value to rise to $100,000 per coin."

What the? You presumably go from not a millionaire to having $3,000,000, and you decide to risk it to triple it? That's some next-level greed right there.