If so, does it suggest we could “just” build a Markov Chain using the original training data and get similar performance to the LLM?
> I implemented imperative code that does what I’m proposing the transformer is doing. It produces outputs very similar to the transformer.
This suggests there is probably a way to bypass transformers and get the same results. It would be interesting if that turned out to be more efficient, e.g. take a foundation model, train something simpler from it, and run it on a much smaller device.
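
To make the "just build a Markov Chain from the training data" idea concrete, here is a minimal sketch (not the code referenced in the quote above) of a word-level Markov chain built from a corpus; the toy corpus, the `order` parameter, and the function names are all made up for illustration:

```python
# Hypothetical sketch of a word-level Markov chain over a training corpus.
import random
from collections import defaultdict

def build_chain(corpus: str, order: int = 2) -> dict:
    """Map each tuple of `order` consecutive words to the words observed after it."""
    words = corpus.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain

def generate(chain: dict, order: int = 2, length: int = 30) -> str:
    """Sample text by repeatedly picking a random observed successor of the last `order` words."""
    output = list(random.choice(list(chain.keys())))
    for _ in range(length):
        successors = chain.get(tuple(output[-order:]))
        if not successors:
            break
        output.append(random.choice(successors))
    return " ".join(output)

if __name__ == "__main__":
    corpus = "the cat sat on the mat and the cat ate the fish on the mat"
    chain = build_chain(corpus, order=2)
    print(generate(chain, order=2, length=15))
```

Whether something this simple could actually match an LLM is exactly the open question here: the state space explodes with context length, which is part of why transformers are used instead.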
https://en.wikipedia.org/wiki/Ouija