spott commented on Crimes with Python's Pattern Matching (2022)   hillelwayne.com/post/pyth... · Posted by u/agluszak
mickeyp · 3 days ago
Yes, but they are not equivalent. dict and list are factories; {} and [] are reified when the code is first evaluated and then never reinitialised again. This catches out beginners and LLMs alike:

https://www.inspiredpython.com/article/watch-out-for-mutable...

spott · 3 days ago
They are equivalent. In function signatures (what your article is talking about), using dict() instead of {} has exactly the same effect. The only difference is that {} is a literal for an empty dict, while dict is a name bound to the builtin dict class. So you can reassign dict but not {}, and dict() requires a name lookup followed by a call, which makes {} slightly more efficient.
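
A quick illustration of the point (a minimal sketch; the disassembly comments assume CPython, where exact opcode names vary by version). Both spellings are evaluated exactly once, when the def statement runs, so both share the mutable-default gotcha:

    import dis

    def f(x={}):        # literal default: evaluated once, at def time
        x["hits"] = x.get("hits", 0) + 1
        return x

    def g(x=dict()):    # dict() default: also evaluated once, at def time
        x["hits"] = x.get("hits", 0) + 1
        return x

    print(f())  # {'hits': 1}
    print(f())  # {'hits': 2} -- the shared default mutated
    print(g())  # {'hits': 1}
    print(g())  # {'hits': 2} -- identical gotcha with dict()

    # The only real difference is how each spelling builds its dict:
    dis.dis(compile("{}", "<s>", "eval"))      # one BUILD_MAP instruction
    dis.dis(compile("dict()", "<s>", "eval"))  # a name lookup, then a call
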
spott commented on Ashet Home Computer   ashet.computer/... · Posted by u/todsacerdoti
lysace · 13 days ago
The Parallax P8X32A Propeller (2006) did multi-core processing in a very beginner-friendly way.

It can be done, if you take a holistic approach to hardware + runtime + development environment.

The Propeller probably failed because of the custom language, the custom assembly syntax, the custom ISA, the custom IDE font (!), and so on. It was a very neat system, though.

spott · 12 days ago
The Propeller 2 is going to be used as the southbridge for the Ashet.
spott commented on Why are there so many rationalist cults?   asteriskmag.com/issues/11... · Posted by u/glenstein
naasking · 13 days ago
> We're normally data-limited.

This is a common sentiment, but it's probably not entirely true. A great example is cosmology. Yes, more data would make some work easier, but astrophysicists and cosmologists have shown that you can gather and combine existing data and look at it in novel ways to produce unexpected results, like placing bounds that include or exclude various theories.

I think a philosophy that encourages more analysis, rather than resting on our laurels with the excuse that we need more data, is good, as long as it's done transparently and honestly.

spott · 13 days ago
This depends on what you are trying to figure out.

If you are talking about cosmology? Yeah, you can look at existing data in new ways, because you probably have enough data to do that safely.

If you are looking at human psychology? Looking at existing data in new ways is essentially p-hacking. And you probably won’t ever have enough data to define a “universal theory of the human mind”.
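
A toy sketch of the multiple-comparisons problem behind that worry (all the data here is made-up noise, and the 1.96/sqrt(n) cutoff is the usual large-sample approximation): test one dataset against enough hypotheses and "significant" findings appear on their own.

    import math
    import random

    random.seed(0)
    n, tests = 200, 50
    outcome = [random.gauss(0, 1) for _ in range(n)]  # pure-noise "measurement"

    def corr(xs, ys):
        # Pearson correlation, written out to avoid dependencies.
        mx, my = sum(xs) / n, sum(ys) / n
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

    # Re-analyse the same outcome 50 different ways, one noise "trait" each.
    hits = sum(
        1 for _ in range(tests)
        if abs(corr([random.gauss(0, 1) for _ in range(n)], outcome))
        > 1.96 / math.sqrt(n)  # ~p < 0.05 threshold for r under the null
    )
    print(f"{hits}/{tests} 'significant' findings in pure noise")  # expect ~2-3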

spott commented on Ask HN: How can ChatGPT serve 700M users when I can't run one GPT-4 locally?    · Posted by u/superasn
KaiserPro · 16 days ago
> Quite the opposite.

Unless something has dramatically changed, the model is stateless. The context cache needs to be injected before the new prompt, but from what I understand (and please do correct me if I'm wrong) the context cache isn't that big, on the order of a few tens of kilobytes. Plus the cache saves seconds of GPU time, so having an extra 100ms of latency is nothing compared to a cache miss; a broad cache is much, much better than a narrow local cache.

But! Even if it's larger, your bottleneck isn't the network, it's waiting on the GPUs to be free [1]. So whilst having the cache really close, i.e. in the same rack or the same machine, will give the best performance, it will limit your scale (because the cache is only effective for a small number of users).

[1] 100 MB of data shared over the same datacentre network every 2-3 seconds per node isn't that much, especially if you have a partitioned network (i.e. like AWS, where you have a block network and a “network” network)

spott · 14 days ago
The KV cache for dense models is on the order of 50% of the parameter count. For sparse MoE models it can be significantly smaller, I believe, but I don't think it is measured in kilobytes.
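
For a sense of scale, a back-of-the-envelope calculation (the dimensions are assumptions roughly in line with a Llama-3-70B-class dense model, not figures from the thread):

    # KV cache per token = 2 (K and V) x layers x kv_heads x head_dim x bytes
    layers, kv_heads, head_dim, bytes_each = 80, 8, 128, 2  # fp16, GQA
    per_token = 2 * layers * kv_heads * head_dim * bytes_each
    print(f"{per_token / 1024:.0f} KiB per token of context")   # ~320 KiB
    print(f"{per_token * 8192 / 2**30:.1f} GiB at 8k context")  # ~2.5 GiB
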
spott commented on Does the stock market know something we don't?   theatlantic.com/economy/a... · Posted by u/littlexsparkee
tradertef · 17 days ago
That's the "final theory" given in the article.
spott · 17 days ago
Yeah, I figured that out after I finished the article.

The dangers of posting before finishing the article.

spott commented on Does the stock market know something we don't?   theatlantic.com/economy/a... · Posted by u/littlexsparkee
spott · 18 days ago
Personal theory: the rise of ETFs as a fallback means that money isn't leaving the stock market anymore. Instead of selling off and going to cash, people go to SPY, which doesn't put the downward pressure on the stock market that going to cash does.
spott commented on Open models by OpenAI   openai.com/open-models/... · Posted by u/lackoftactics
shpongled · 20 days ago
Do you know when this was introduced (or which paper)? AFAIK it's not that way in the original transformer paper, or BERT/GPT-2
spott · 20 days ago
All the Llamas have done it (well, 2 and 3, and I believe 1; I don't know about 4). I think they have a citation for it, though it might just be the RoPE paper (https://arxiv.org/abs/2104.09864).

I'm not actually aware of any model that doesn't apply positional embeddings on a per-layer basis (excepting BERT and the original transformer paper; I haven't read the GPT-2 paper in a while, so I'm not sure about that one either).
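
A minimal numpy sketch of what "per-layer" means here, using the interleaved-pair formulation from the RoPE paper linked above (variable names are mine):

    import numpy as np

    def rope(x, positions, base=10000.0):
        # Rotate consecutive channel pairs by a position-dependent angle.
        # Unlike an absolute embedding added once at the input, this is
        # applied to Q and K inside every attention layer.
        d = x.shape[-1]
        inv_freq = base ** (-np.arange(0, d, 2) / d)  # (d/2,)
        ang = positions[:, None] * inv_freq[None, :]  # (seq, d/2)
        cos, sin = np.cos(ang), np.sin(ang)
        x1, x2 = x[..., 0::2], x[..., 1::2]
        out = np.empty_like(x)
        out[..., 0::2] = x1 * cos - x2 * sin
        out[..., 1::2] = x1 * sin + x2 * cos
        return out

    # Toy usage, repeated in each layer's attention:
    q = np.random.randn(16, 64)  # (seq_len, head_dim)
    k = np.random.randn(16, 64)
    pos = np.arange(16, dtype=np.float64)
    q_rot, k_rot = rope(q, pos), rope(k, pos)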

spott commented on Open models by OpenAI   openai.com/open-models/... · Posted by u/lackoftactics
sadiq · 20 days ago
Looks like Groq (at 1k+ tokens/second) and Fireworks are already live on openrouter: https://openrouter.ai/openai/gpt-oss-120b

$0.15 per million tokens in / $0.60-0.75 per million out

edit: Now Cerebras too, at 3,815 tps, for $0.25 per million in / $0.69 per million out.

spott · 20 days ago
It is interesting that OpenAI isn't offering any inference for these models.
spott commented on Open models by OpenAI   openai.com/open-models/... · Posted by u/lackoftactics
artembugara · 20 days ago
Disclaimer: probably dumb questions

so, the 20b model.

Can someone explain what I would need in terms of resources (GPUs, I assume) if I want to run 20 concurrent processes, each needing 1k tokens/second throughput (so 20 x 1k total)?

Also, is this model better than or comparable to gpt-4.1-nano for information extraction, and would it be cheaper to host the 20B myself?

spott · 20 days ago
Groq is offering 1k tokens per second for the 20B model.

You are unlikely to match Groq on off-the-shelf hardware, as far as I'm aware.
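
Why that's plausible: decode is memory-bandwidth bound, so there is a hard ceiling on single-stream speed. A back-of-envelope sketch (the hardware and model numbers are assumptions for illustration, not measurements):

    # Each generated token must read the active weights from memory once,
    # so tokens/sec <= memory bandwidth / active-weight bytes.
    bandwidth = 3.35e12    # bytes/s: H100 SXM HBM3 spec-sheet figure
    active_params = 3.6e9  # gpt-oss-20b active parameters per token (MoE)
    bytes_per_param = 1    # assume ~8-bit weights
    print(f"{bandwidth / (active_params * bytes_per_param):.0f} tok/s ceiling")
    # ~930 tok/s, an ideal-case bound before attention/KV traffic overhead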

spott commented on I wasted weeks hand optimizing assembly because I benchmarked on random data   vidarholen.net/contents/b... · Posted by u/thunderbong
jasonthorsness · a month ago
Identifying a representative usage scenario to optimize for, and then implementing that scenario in a microbenchmark test driver, are both massively difficult to get right; a "failure" here, as the author found, can be hard to detect before you sink a lot of time into it.

Even for seemingly trivial scenarios like searching an array, the contents and length of the array make a massive difference in results and how to optimize (as shown in the last section of this write-up where I tried to benchmark search algorithms correctly https://www.jasonthorsness.com/23).

I've not seen a perfect solution to this that isn't just "thinking carefully about the test setup" (profile-guided optimization, or production profiles replayed for benchmarks, seems like it could be an alternative, but I haven't seen it used much).

spott · a month ago
Why is just capturing real values not the answer here?

Something like: grab 0.01% of the inputs to that function for a day (maybe a lower fraction over a week or month).

Is the cost of capturing this that high?
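
Concretely, something like this (a hypothetical sketch: the function, sample rate, and file name are all made up for illustration):

    import json
    import random

    SAMPLE_RATE = 1e-4  # capture ~0.01% of production calls

    def search(haystack, needle):
        # The hot function we want representative benchmark inputs for.
        if random.random() < SAMPLE_RATE:
            with open("bench_corpus.jsonl", "a") as f:
                f.write(json.dumps({"haystack": haystack,
                                    "needle": needle}) + "\n")
        return needle in haystack

    # The captured corpus can then be replayed as the microbenchmark's
    # input, instead of random data that may not resemble production.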
