Readit News
montebicyclelo commented on Left to Right Programming   graic.net/p/left-to-right... · Posted by u/graic
montebicyclelo · 7 days ago
> the Python code in the previous example is still readable

Yes, I agree with the author, list comprehensions are readable, and I'd add, practical.

> it gets worse as the complexity of the logic increases

    len(list(filter(lambda line: all([abs(x) >= 1 and abs(x) <= 3 for x in line]) and (all([x > 0 for x in line]) or all([x < 0 for x in line])), diffs)))
Ok, well this is something that someone would be unlikely to write... unless they wanted to make a contrived example to prove a point.

It would be written more like:

    result = sum(my_contrived_condition(line) for line in diffs)
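For illustration, a sketch of what that hypothetical `my_contrived_condition` helper could look like, with the logic lifted straight out of the contrived one-liner above:

    def my_contrived_condition(line):
        # every step has magnitude between 1 and 3 (inclusive)...
        in_range = all(1 <= abs(x) <= 3 for x in line)
        # ...and all steps go in the same direction
        same_sign = all(x > 0 for x in line) or all(x < 0 for x in line)
        return in_range and same_sign
Summing booleans works because Python treats True as 1, so the `sum` counts the lines that pass.
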
See also the Google Python style guide, which says not to do the kind of thing in the contrived example above: https://google.github.io/styleguide/pyguide.html

(Surely in any language it's possible to write very bad confusing code, using some feature of the language...)

And note:

    x = [line.split() for line in text.splitlines()]
^- a list comprehension is just convenient shorthand for a `for` loop, i.e.:

    x = []
    for line in text.splitlines():
        x.append(line.split())
It's just the `line.split()` moved to the front, with the empty list creation and `append` removed.

montebicyclelo commented on LLM Embeddings Explained: A Visual and Intuitive Guide   huggingface.co/spaces/hes... · Posted by u/eric-burel
xg15 · a month ago
> While we can use pretrained models such as Word2Vec to generate embeddings for machine learning models, LLMs commonly produce their own embeddings that are part of the input layer and are updated during training.

So out of interest: During inference, the embedding is simply a lookup table "token ID -> embedding vector". Mathematically, you could represent this as encoding the token ID as a (very very long) one-hot vector, then passing that through a linear layer to get the embedding vector. The linear layer would contain exactly the information from the lookup table.

My question: Is this also how the embeddings are trained? I.e. just treat them as a linear layer and include them in the normal backpropagation of the model?

montebicyclelo · a month ago
So, they are included in the normal backpropagation of the model. But there is no one-hot encoding, because, although you are correct that it is equivalent, it would be very inefficient to do it that way. You can make indexing differentiable, i.e. gradient descent flows back to the vectors that were selected, which is more efficient than a one-hot matmul.
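
A minimal sketch of the idea in PyTorch (toy sizes, purely illustrative):

    import torch

    vocab_size, dim = 5, 3
    W = torch.randn(vocab_size, dim, requires_grad=True)  # embedding table
    ids = torch.tensor([2, 4])  # token IDs

    # indexing gives the same result as the one-hot matmul...
    one_hot = torch.nn.functional.one_hot(ids, vocab_size).float()
    assert torch.allclose(one_hot @ W, W[ids])

    # ...and it's differentiable: gradients flow back only to the selected rows
    W[ids].sum().backward()
    print(W.grad)  # rows 2 and 4 are non-zero, the rest are zeros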

(If you're curious about the details, there's an example of making indexing differentiable in my minimal deep learning library here: https://github.com/sradc/SmallPebble/blob/2cd915c4ba72bf2d92...)

montebicyclelo commented on LLM Embeddings Explained: A Visual and Intuitive Guide   huggingface.co/spaces/hes... · Posted by u/eric-burel
montebicyclelo · a month ago
Nice tutorial. The contextual vs static embeddings distinction is the important point; many are familiar with word2vec (static), but contextual embeddings are more powerful for many tasks.

(However, there seems to be some serious back-button / browser history hijacking on this page... Just scrolling down the page appends a ton to my browser history, which is lame.)

montebicyclelo commented on “Dynamic programming” is not referring to “computer programming”   vidarholen.net/contents/b... · Posted by u/r4um
montebicyclelo · a month ago
I always found that badly named things made learning harder / more jarring, especially when no explanation for the incongruous name was provided.
montebicyclelo commented on I have reimplemented Stable Diffusion 3.5 from scratch in pure PyTorch   github.com/yousef-rafat/m... · Posted by u/yousef_g
jatins · 2 months ago
> It's a reimplementation of SD3 by writing the code from scratch again, but the weights are taken from HuggingFace due to hardware constraints on my part.

Could you clarify what you mean by this part -- if the weights are taken from HF then what's the implementation for?

montebicyclelo · 2 months ago
> if the weights are taken from HF then what's the implementation for

The weights are essentially a bunch of floating point numbers (grouped into tensors). The code says what operations to do with the weights. E.g. if you load a matrix W from the weights, you could do `y = W @ x`, or `y = W.T @ x`, or `y = W @ W @ x`, etc.
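
As a toy sketch of that separation (the small array here is a stand-in for a real weight tensor loaded from disk):

    import numpy as np

    # "weights": just numbers, normally loaded from a checkpoint file
    W = np.arange(4.0).reshape(2, 2)
    x = np.ones(2)

    # the "code" decides what computation the weights take part in
    print(W @ x)    # one choice of forward pass
    print(W.T @ x)  # a different choice: same weights, different model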

montebicyclelo commented on Google Cloud Incident Report – 2025-06-13   status.cloud.google.com/i... · Posted by u/denysvitali
montebicyclelo · 2 months ago
TL;DR: unexpected blank fields

> policy change was inserted into the regional Spanner tables

> This policy data contained unintended blank fields

> Service Control... pulled in blank fields... hit null pointer causing the binaries to go into a crash loop

montebicyclelo commented on Frequent reauth doesn't make you more secure   tailscale.com/blog/freque... · Posted by u/ingve
montebicyclelo · 2 months ago
Forced password rotation and expiry seems the bigger problem, given that it causes people to get locked out so often (e.g. if a password expires while they're on holiday), often then requiring a trip to IT, or at least a few hours trying to get IT on the phone for a reset, or chasing up colleagues who aren't locked out to get in touch with IT.

Many (most?) companies still do it, despite it now not being recommended by NIST:

> Verifiers SHOULD NOT require memorized secrets to be changed arbitrarily (e.g., periodically)

https://pages.nist.gov/800-63-3/sp800-63b.html

Or by Microsoft:

> Password expiration requirements do more harm than good...

https://learn.microsoft.com/en-us/microsoft-365/admin/misc/p...

But these don't seem to be authoritative enough for IT / security teams (and there are still guidelines out there that do recommend the practice, IIRC).

montebicyclelo commented on Cloud Run GPUs, now GA, makes running AI workloads easier for everyone   cloud.google.com/blog/pro... · Posted by u/mariuz
mwest217 · 3 months ago
Disclaimer: I work at Google but not on cloud. Opinions my own.

I think the reason this doesn’t get prioritized is that large customers don’t actually want a “stop serving if I pass this limit” option. If there’s a spike in traffic, they probably would rather pay the money to serve it. The customers that would want this feature are small-dollar customers, and from an economic perspective it makes less sense to prioritize it, since they’re not spending very much relative to the customers who wouldn’t want it.

Maybe if there weren’t more feature requests to get prioritized this might happen, but the reality is that there are always more feature requests than time to implement them, and a feature request used almost exclusively by the smallest dollar customers will always lose to a feature for big-dollar customers.

montebicyclelo · 3 months ago
I guess where it could potentially bring value is by:

Removing a major concern that prevents individuals / small customers from using GCP in the first place, so more of them do use it

That could then lead to value in two ways:

- They make small projects that go on to become large projects later (e.g. a small app that grows, becomes successful, and turns into a moneymaker)

- Or, they might then be more inclined to get their big corp to use GCP later on, if they've already been using it as an individual

But that's long term, and hard to measure / put a number on

montebicyclelo commented on Cloud Run GPUs, now GA, makes running AI workloads easier for everyone   cloud.google.com/blog/pro... · Posted by u/mariuz
isoprophlex · 3 months ago
All the cruft of a big cloud provider, AND the joy of uncapped yolo billing that has the potential to drain your credit card overnight. No thanks, I'll personally stick with Modal and vast.ai
montebicyclelo · 3 months ago
Not providing a cap on spending is a major flaw of GCP for individuals / small projects.

With Cloud Run, AFAIK, spending can effectively be capped by limiting concurrency plus limiting the max number of instances it can scale to. (But this is not as good as GCP having a proper cap.)

montebicyclelo commented on Cloud Run GPUs, now GA, makes running AI workloads easier for everyone   cloud.google.com/blog/pro... · Posted by u/mariuz
montebicyclelo · 3 months ago
The reason Cloud Run is so nice compared to other providers is that it has autoscaling, with scaling to 0, meaning it can cost basically nothing if it's not being used. You can also set a cap on the scaling, e.g. 5 instances max, which caps the max cost of the service too. (Note: I only have experience with the CPU version of Cloud Run, which is very reliable / easy.)
