Readit News
oergiR commented on The Case That A.I. Is Thinking   newyorker.com/magazine/20... · Posted by u/ascertain
Symmetry · 2 months ago
"The question of whether a computer can think is no more interesting than the question of whether a submarine can swim." - Edsger Dijkstra
oergiR · 2 months ago
There is more to this quote than you might think.

Grammatically, in English the verb "swim" requires an "animate subject", i.e. a living being, like a human or an animal. So the question of whether a submarine can swim is about grammar. In Russian (IIRC), submarines can swim just fine, because the verb does not have this animacy requirement. Crucially, the question is not about whether or how a submarine propels itself.

Likewise, in English at least, the verb "think" requires an animate subject. The question of whether a machine can think is then about whether you consider it to be alive. Again, whether or how the machine generates its output is not material to the question.

oergiR commented on Memory safety for web fonts   developer.chrome.com/blog... · Posted by u/mmmrk
jasonthorsness · 9 months ago
If fresh code in Rust truly reduces the number or severity of CVE in a massively tested and fuzzed C library like this one it will be a major blow to the “carefully written and tested C/C++ is just as safe” perspective. Just need the resources and Rust performance to rewrite them all.
oergiR · 9 months ago
FreeType was written when fonts were local, trusted resources, and it was written in low-level C to be fast. The TrueType/OpenType format is also made for fast access, e.g. with internal pointers, making validation a pain.

So although FreeType is carefully written with respect to correctness, it was not designed to deal with malicious input, and that kind of robustness is hard to retrofit.

oergiR commented on Bugs in LLM Training – Gradient Accumulation Fix   unsloth.ai/blog/gradient... · Posted by u/apsec112
xcodevn · a year ago
Look from a different point of view: this is a feature, not a bug. With this, every example has equal weight, while with the fix, every token has equal weight.
oergiR · a year ago
That makes it sound like it’s a choice, which it isn’t really. The way to look at it is from a probabilistic perspective: with the fix, you maximise the probability of the data. Without the fix, you fairly arbitrarily raise some probabilities to a power greater than one, and some to a power less than one.
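The difference is easy to see in a few lines. This is a hedged sketch with made-up per-token losses, not the actual unsloth or trainer code: averaging per-example means (the buggy accumulation) weights each example equally, which is equivalent to scaling each token's log-probability by 1/(tokens in its example), i.e. raising some probabilities to a power above one and some below. Dividing the total loss by the total token count weights each token equally and gives the true mean negative log-likelihood.

```python
# Hypothetical per-token losses (negative log-probabilities) for two
# examples of different lengths: 2 tokens vs 6 tokens.
losses = [[2.0, 4.0], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]]

# Buggy accumulation: mean within each example, then mean of the means.
# Each *example* gets equal weight, so tokens in short examples count more.
per_example = [sum(l) / len(l) for l in losses]      # [3.0, 1.0]
buggy = sum(per_example) / len(per_example)          # 2.0

# Fixed: sum all token losses, divide by the total token count.
# Each *token* gets equal weight -- the mean negative log-likelihood.
fixed = sum(sum(l) for l in losses) / sum(len(l) for l in losses)  # 1.5
```

With equal-length examples the two agree; the discrepancy only appears when batches mix lengths, which is exactly the gradient-accumulation case.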
oergiR commented on Heat pumps show how hard decarbonisation will be   economist.com/leaders/202... · Posted by u/Brajeshwar
oergiR · 2 years ago
Isn’t the main problem the cost of heating? In the UK at least, a kWh of electricity costs about four times what a kWh of gas costs. Even if a heat pump is twice as efficient as gas heating, the cost of heating is still twice as high.
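The arithmetic behind this, using the ratios from the comment rather than real tariffs: the running cost of a unit of heat scales with (price per kWh of input energy) divided by (units of heat delivered per kWh of input).

```python
# Illustrative ratios only, taken from the comment -- not actual UK tariffs.
price_ratio = 4.0  # electricity ~4x the price of gas per kWh
cop_ratio = 2.0    # heat pump delivers ~2x the heat per kWh of input

# Relative running cost of heat-pump heat vs gas heat:
running_cost_ratio = price_ratio / cop_ratio  # 2.0: still twice as expensive
```

A modern heat pump can reach a coefficient of performance of 3 or more, which narrows the gap, but with a 4:1 price ratio it never closes it.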
oergiR commented on What are transformer models and how do they work?   txt.cohere.ai/what-are-tr... · Posted by u/tomcam
jaidhyani · 3 years ago
Skimming it, there are a few things about this explanation that rub me just slightly the wrong way.

1. Calling the input token sequence a "command". It probably only makes sense to think of this as a "command" on a model that's been fine-tuned to treat it as such.

2. Skipping over BPE as part of tokenization - but almost every transformer explainer does this, I guess.

3. Describing transformers as using a "word embedding". I'm actually not aware of any transformers that use actual word embeddings, except the ones that incidentally fall out of other tokenization approaches sometimes.

4. Describing positional embeddings as multiplicative. They are generally (and very counterintuitively to me, but nevertheless) additive with token embeddings.

5. "what attention does is it moves the words in a sentence (or piece of text) closer in the word embedding" No, that's just incorrect.

6. You don't actually need a softmax layer at the end, since here they're just picking the top token, and they can do that pre-softmax because softmax doesn't change the argmax. It's also weird to discuss softmax here when its most prominent use in transformers is actually inside the attention component.

7. Really shortchanges the feedforward component. It may be simple, but it's really important to making the whole thing work.

8. Nothing about the residual connections.

oergiR · 3 years ago
I agree except for (6). A language model assigns probabilities to sequences. For that, the model needs normalised distributions, e.g. via a softmax, so that’s the right way of thinking about it.
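Both points in this exchange can be shown in a few lines. This is a minimal sketch, not any particular model's code: softmax is monotone, so greedy decoding gives the same token before and after it, but only the softmax output is a normalised distribution, which is what you need for sequence probabilities, perplexity, sampling, or beam search.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; doesn't change the output.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.1]   # hypothetical next-token scores
probs = softmax(logits)

# Greedy decoding: the argmax is identical pre- and post-softmax.
top_from_logits = max(range(len(logits)), key=lambda i: logits[i])
top_from_probs = max(range(len(probs)), key=lambda i: probs[i])

# But only probs is a normalised distribution over the vocabulary.
total = sum(probs)
```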
oergiR commented on Eating horsemeat in France   lamelonne.substack.com/p/... · Posted by u/WuTangCFO
OJFord · 3 years ago
Similar quirk that irks me in English is 'horseback riding': it's just 'riding' - or 'riding on horseback', 'horse riding' - the whole thing's coming!

'horseback riding' to me sounds like it's supposed to be a humourous phrase along the lines of 'driving a desk' (a sort of blue-collar self-deprecation for having moved up from driving whatever vehicle) - like you're still in training and haven't been given a live horse yet or something.

oergiR · 3 years ago
In English everywhere except the US, it’s “horse riding”; “horseback riding” is US English.

Definitely humorous: https://m.youtube.com/watch?v=5wSw3IWRJa0

oergiR commented on Firefox Voice   voice.mozilla.org/firefox... · Posted by u/makeworld
dreamcompiler · 5 years ago
> if we want open, on-device voice recognition, we'll have to do the work and donate sample data.

We absolutely will not. The only reason people believe this is that they've forgotten how to do speaker-dependent recognition (SDR), which is more accurate and more secure anyway. We were doing SDR in the 80s with 1/1000 the CPU power and 1/1000 the memory.

SDR does require an initial training session, but once that's done any modern computer or smartphone should be able to handle it locally with no cloud server environment.

oergiR · 5 years ago
Training a speaker-specific recogniser that improves over a generic one requires a lot more data nowadays. First, generic systems are much better and are trained on far more data. Second, speaker adaptation worked better for the Gaussian mixture models of the late nineties (I don’t know about the eighties) than it does for neural networks.
oergiR commented on Deep Learning for Siri’s Voice   machinelearning.apple.com... · Posted by u/subset
sarabande · 8 years ago
Also glad to see this. Still curious as to why they wouldn't post it as a research paper on arXiv -- what's the point in reinventing the wheel here? I suppose it's nice for publicity, but would be great if they also played nicely with the ecosystem.
oergiR · 8 years ago
oergiR commented on Jack Ma's theory of how America went wrong over the past 30 years   businessinsider.com/aliba... · Posted by u/taobility
mirimir · 9 years ago
No, Obamacare only pretends to resemble European-style healthcare. In fact, it merely forces everyone to sign up with private health insurance companies. Which still take their huge profits.
oergiR · 9 years ago
This is how healthcare is set up in Switzerland and the Netherlands.

How payments are routed (through tax or otherwise) is just an implementation detail. In a system that involves market forces, the key point is that if everybody is insured, including the healthy, the cost per person comes down.
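The pooling effect is simple arithmetic. These numbers are entirely hypothetical, just to illustrate the mechanism: when only high-cost people insure, premiums must cover their costs alone; spreading the same costs over everyone, healthy included, lowers the per-person cost.

```python
# Toy annual healthcare costs per person (hypothetical numbers).
sick_cost, healthy_cost = 10_000, 500
n_sick, n_healthy = 20, 80

# If only the sick insure, the premium must cover their costs alone.
premium_sick_only = sick_cost  # 10000 per insured person

# If everyone insures, the same costs are spread over the whole pool.
premium_universal = (n_sick * sick_cost + n_healthy * healthy_cost) / (
    n_sick + n_healthy
)  # 2400 per person
```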

oergiR commented on Model-Based Machine Learning   mbmlbook.com/toc.html... · Posted by u/seycombi
oergiR · 9 years ago
The "model" in the title is the model of the world, as a probabilistic model. The good thing about such a model is that it explicitly states your beliefs about the world. Once you've defined it, in theory reasoning about it is straightforward. (In practice a lot of papers get written about how to do approximate inference.) It's also straightforward to do unsupervised learning.

This is a different perspective from (most uses of) neural networks, which do not have this clear separation between the model and how to reason about it. It's funny that Chris Bishop in 1995 wrote the textbook "Neural Networks for Pattern Recognition" and now is effectively arguing against using neural networks.

You can use both by using neural networks as "factors" (the black squares) in probabilistic models.
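A minimal sketch of that idea, with an entirely hypothetical "network": a discrete probabilistic model where one factor is a learned scoring function, and exact inference is just enumeration and normalisation over the latent variable. The function names and weights here are illustrative, not from the book.

```python
import math

def neural_factor(z, x, w=(1.5, -0.5)):
    # Stand-in for a learned network: one linear unit plus tanh,
    # exponentiated so the factor is positive (an unnormalised score).
    return math.exp(math.tanh(w[0] * z + w[1] * x))

def posterior(x, states=(0, 1)):
    # p(z | x) is proportional to prior(z) * factor(z, x);
    # uniform prior here, so just normalise the factor scores.
    scores = [neural_factor(z, x) for z in states]
    total = sum(scores)
    return [s / total for s in scores]

p = posterior(x=1.0)  # a proper, normalised posterior over z
```

The separation the comment describes is visible here: the model (prior times factor) states the beliefs; inference (enumeration) is a separate, mechanical step, and the neural network only ever appears inside a factor.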
