Readit News
karterk commented on Amazon has mostly sat out the AI talent war   businessinsider.com/amazo... · Posted by u/ripe
whatever1 · 12 days ago
The evidence shows that there is no methodological moat for LLMs. The moat of the frontier folks is just compute. xAI went from nothing to competing with the top dogs in months. DeepSeek too. So why bother splurging billions on talent when you can buy GPUs and energy instead and serve the compute needs of everyone?

Also, Amazon is in another capital-intensive business: retail. Spending billions on dubious AWS moonshots, versus just buying more widgets and placing them closer to US customers' houses for even faster deliveries, does not make sense.

karterk · 12 days ago
> The moat of the frontier folks is just compute.

This is not really true. Google has all the compute, but in many dimensions its models lag behind the GPT-5 class (it is catching up, but that has not been a given).

Amazon itself did try to train a model (so did Meta) and had limited success.

karterk commented on Microsoft Is Dead (2007)   paulgraham.com/microsoft.... · Posted by u/aamederen
karterk · 7 months ago
Satya saved Microsoft by doubling down on Azure and the cloud, something Ballmer failed to do with mobile.
karterk commented on I fixed the strawberry problem because OpenAI couldn't   xeiaso.net/blog/2024/stra... · Posted by u/xena
viraptor · a year ago
Or "when trying to answer questions that involve spelling or calculation, use python". No need for extra training really.
karterk · a year ago
There are many different classes of problems that are affected by tokenization. Some of them can be tackled by code.
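For illustration, a minimal sketch (hypothetical, not from the linked post) of what delegating the spelling case to Python looks like:

```python
# Counting letters in code sidesteps tokenization entirely:
# here a string is a sequence of characters, not token IDs.
word, letter = "strawberry", "r"
print(f"{word!r} contains {word.count(letter)} occurrences of {letter!r}")  # 3
```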
karterk commented on I fixed the strawberry problem because OpenAI couldn't   xeiaso.net/blog/2024/stra... · Posted by u/xena
karterk · a year ago
Solving the strawberry problem will probably require a model that works directly with bytes of text. There have been a few attempts at building this [1], but so far it does not work as well as models that consume pre-tokenized strings.

[1]: https://arxiv.org/abs/2106.12672
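To make the tokenization issue concrete, here is a minimal sketch (assuming the tiktoken package and its cl100k_base encoding; purely illustrative) of how a tokenized model sees "strawberry" as a handful of multi-character chunks rather than letters:

```python
# The model operates on integer token IDs, not individual letters,
# which is why character-level questions like "how many r's?" are awkward.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("strawberry")
print(tokens)                              # a short list of integer IDs
print([enc.decode([t]) for t in tokens])   # multi-character chunks, not letters
print("strawberry".count("r"))             # at the byte/character level the answer is 3
```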

karterk commented on Eagle 7B: Soaring past Transformers   blog.rwkv.com/p/eagle-7b-... · Posted by u/guybedo
karterk · 2 years ago
It's interesting how all focus is now primarily on decoder-only next-token-prediction models. Encoders (BERT, the encoder of T5) are still useful for generating embeddings for tasks like retrieval or classification. While there is a lot of work on fine-tuning BERT and T5 for such tasks, it would be nice to see more research on better pre-training architectures for embedding use cases.
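As a concrete illustration of that embedding use case, here is a minimal sketch (assuming the sentence-transformers package and its all-MiniLM-L6-v2 checkpoint, a BERT-style encoder; not part of the original comment):

```python
# Encode a query and some documents with a BERT-style encoder and
# rank the documents by cosine similarity, as in a retrieval task.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
query_emb = model.encode("fast typo-tolerant search engine")
doc_embs = model.encode(["open source search server", "recipe for fluffy pancakes"])
print(util.cos_sim(query_emb, doc_embs))  # higher score for the search-related document
```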
karterk commented on What are farm animals thinking?   science.org/content/artic... · Posted by u/mooreds
theultdev · 2 years ago
> If people spent time with meat animals, there would be a lot more vegetarians.

Having grown up on a ranch, yes I know cows are smart... chickens not so much. It does not change my views on eating beef at all.

Cows are treated very well on all the ranches I've been on. One reason is that it makes them taste better; another is that we care about our livestock.

Your alternative is them not existing at all; they cannot survive in nature.

The cow does not know it's going to be slaughtered.

It lives a nice, happy life roaming the fields, and in its last few days it goes to processing, where it eats one massive last great meal.

It's the nicest form of us eating our prey.

karterk · 2 years ago
> Cows are treated very well in all the ranches I've been on.

Unfortunately, the entire premise of your argument is based on a personal anecdote that's a gross generalization across a vast number of cattle farms.

karterk commented on JAX – NumPy on the CPU, GPU, and TPU   jax.readthedocs.io/en/lat... · Posted by u/peter_d_sherman
albertzeyer · 2 years ago
Does it support arrays of variable length now? Last time I looked, I think this was not supported. That means, for every variable dimension, you need to use an upper bound, mask properly, and hope that it does not waste too much computation on the unused part (e.g. when running a loop over it).

I'm working with sequences, e.g. speech recognition, machine translation, language modeling. Variable-length sequences are a quite fundamental property of this type of model.

In those cases, in some example code I have seen, training also used only fixed-size dimensions. And at inference time, the loop over the sequence was non-JAX code wrapped around JAX code with fixed-size dimensions.

This seems like a quite fundamental issue to me. I'm a bit surprised that it is not an issue for others.

karterk · 2 years ago
For JIT-ing you need to know the sizes upfront. There was an experimental branch for introducing jagged tensors, but as far as I know, it has been abandoned.
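For reference, a minimal sketch of the workaround described above (padding to a fixed upper bound plus masking, so jit only ever sees static shapes); MAX_LEN and masked_mean are hypothetical names, not anything from JAX itself:

```python
import jax
import jax.numpy as jnp

MAX_LEN = 8  # hypothetical upper bound on sequence length

@jax.jit
def masked_mean(x, length):
    # x: (MAX_LEN,) padded sequence; length: number of valid elements
    mask = jnp.arange(MAX_LEN) < length
    return jnp.sum(jnp.where(mask, x, 0.0)) / length

seq = jnp.pad(jnp.array([1.0, 2.0, 3.0]), (0, MAX_LEN - 3))
print(masked_mean(seq, 3))  # 2.0; shapes stay static, so there is no recompilation per length
```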
karterk commented on Building a Cloud Database from Scratch: Why We Moved from C++ to Rust (2022)   risingwave-labs.com/blog/... · Posted by u/mountainview
pjmlp · 3 years ago
That is the issue with most C++ codebases nowadays: I only see modern C++ on conference slides. When I look into codebases, even from ISO C++ members, it is always C++ full of C idioms, no matter what.

I bet most candidates for C++ jobs end up discovering the harsh reality of existing code.

karterk · 3 years ago
Can you give some examples of problematic C idioms in C++?

u/karterk

Karma: 4443 · Cake day: October 20, 2010
About
Co-founder @ typesense.org

kishore at typesense org https://calendly.com/kishorenc/30min
