threeducks commented on Grok Code Fast 1   x.ai/news/grok-code-fast-... · Posted by u/Terretta
hu3 · 2 days ago
Interesting. Available in VSCode Copilot for free.

https://i.imgur.com/qgBq6Vo.png

I'm going to test it. My bottleneck currently is waiting for the agent to scan/think/apply changes.

threeducks · 2 days ago
I have been testing it since yesterday in VS Code and it seems fine so far. But I am also happy with all the GPT-4 variants, so YMMV.
threeducks commented on Ban me at the IP level if you don't like me   boston.conman.org/2025/08... · Posted by u/classichasclass
bob1029 · 7 days ago
I think a lot of really smart people are letting themselves get taken for a ride by the web scraping thing. Unless the bot activity is legitimately hammering your site and causing issues (not saying this isn't happening in some cases), then this mostly amounts to an ideological game of capture the flag. The difference being that you'll never find their flag. The only thing you win by playing is lost time.

The best way to mitigate the load from diffuse, unidentifiable, grey area participants is to have a fast and well engineered web product. This is good news, because your actual human customers would really enjoy this too.

threeducks · 7 days ago
> The best way to mitigate the load from diffuse, unidentifiable, grey area participants is to have a fast and well engineered web product.

I wonder what all those people are doing that their server can't handle the traffic. Wouldn't a simple IP-based rate limit be sufficient? I only pay $1 per month for my VPS, and even that piece of trash can handle 1000s of requests per second.
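A minimal sketch of such an IP-based rate limit, as a per-IP token bucket (class and parameter names are illustrative; a real deployment would more likely use something like nginx's `limit_req` in front of the app):

```python
import time
from collections import defaultdict

class IPRateLimiter:
    """Illustrative token-bucket rate limiter keyed by client IP."""

    def __init__(self, rate=10.0, burst=20.0):
        self.rate = rate    # tokens refilled per second
        self.burst = burst  # maximum bucket size (allowed burst)
        # Each new IP starts with a full bucket.
        self.buckets = defaultdict(lambda: (burst, time.monotonic()))

    def allow(self, ip):
        tokens, last = self.buckets[ip]
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[ip] = (tokens - 1.0, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False
```

Each request spends one token; an IP that exceeds its refill rate simply gets denied until the bucket refills, while everyone else is unaffected.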

threeducks commented on In a first, Google has released data on how much energy an AI prompt uses   technologyreview.com/2025... · Posted by u/jeffbee
threeducks · 10 days ago

    the median prompt [...] consumes 0.24 watt-hours of electricity
In layman's terms, that is approximately:

- one second of running a toaster,

- 1/80th of a phone charge,

- lifting 100 pounds to a height of 6 feet,

- the muzzle energy of a 9mm bullet, or

- driving 6 feet in a Tesla.
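These equivalents can be sanity-checked with unit arithmetic; a quick sketch, assuming an ~870 W toaster, a ~19 Wh phone battery, and a Tesla drawing ~220 Wh per mile (all assumed figures, not from the article):

```python
# 0.24 Wh converted to joules: 1 Wh = 3600 J
joules = 0.24 * 3600                 # 864 J

# Toaster: an ~870 W appliance runs for about one second on 864 J
toaster_seconds = joules / 870

# Phone: a typical ~19 Wh battery makes this ~1/80th of a full charge
phone_fraction = 0.24 / 19

# Lifting 100 lb (45.36 kg) to 6 ft (1.83 m): E = m * g * h
lift_joules = 45.36 * 9.81 * 1.83    # ~814 J

# Tesla at ~220 Wh/mile: 0.24 Wh buys roughly 6 feet of driving
tesla_feet = 0.24 / 220 * 5280
```

The lifting and driving figures land within a few percent of the claims, which is about as good as back-of-the-envelope gets.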

threeducks commented on Analysis of the GFW's Unconditional Port 443 Block on August 20, 2025   gfw.report/blog/gfw_uncon... · Posted by u/kotri
spwa4 · 12 days ago
No, it does not. Against a huge state adversary like China, it does not matter. They have satellites looking down, so they can quickly locate any Starlink users. And then ...

The only thing that could bypass it is GPS + laser links (meaning physically aiming a laser both on the ground AND on a satellite). You cannot detect that without being in the direct path of the laser (though of course you can still see the equipment aiming the laser, so it doesn't just need to work, it needs to be properly disguised). That requires coherent beams (not easy, but well studied), aimed to within 2 wavelengths of distance at 160 km (so your direction needs to be accurate to about 2 billionths of a degree; obviously you'll need stabilization), at a moving target, using camouflaged equipment.
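The pointing figure follows from small-angle geometry; a quick check, assuming a ~1 µm near-infrared laser wavelength (an assumed value, not stated above):

```python
import math

wavelength = 1.0e-6   # ~1 micrometre, typical near-infrared laser
distance = 160e3      # 160 km slant range, in metres

# Small-angle approximation: lateral offset / distance = angle in radians.
angle_rad = 2 * wavelength / distance   # hit within 2 wavelengths
angle_deg = math.degrees(angle_rad)     # on the order of a billionth of a degree
```

The result is roughly 7e-10 degrees, i.e. the same order of magnitude as the "2 billionths of a degree" claim.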

This is not truly beyond current technology, but you can be pretty confident even the military doesn't have this yet.

threeducks · 12 days ago
What makes it so that this kind of precision is required? I have little knowledge of the physics behind it, but a few decades ago, a local university had an open day where they bounced lasers off a retroreflector on the moon to measure the distance: https://en.wikipedia.org/wiki/Lunar_Laser_Ranging_experiment...

The moon is about 700 times farther away than the Starlink satellites (or twice that, if you count the bounce), so I find it hard to imagine that it would be impossible to communicate with much closer satellites over a laser link when both sides can have an active transmitter.

threeducks commented on How to Think About GPUs   jax-ml.github.io/scaling-... · Posted by u/alphabetting
gregorygoc · 12 days ago
It’s mind-boggling that NVIDIA has not provided this resource yet. It has reached the point where third parties reverse engineer and summarize NVIDIA hardware into an actually useful mental model.

What are the actual incentives at NVIDIA? If it’s all about marketing they’re doing great, but I have some doubts about engineering culture.

threeducks · 12 days ago
With mediocre documentation, NVIDIA's closed-source libraries, such as cuBLAS and cuDNN, will remain the fastest way to perform certain tasks, thereby strengthening vendor lock-in. And of course, it makes it more difficult for other companies to reverse engineer.
threeducks commented on Llama-Scan: Convert PDFs to Text W Local LLMs   github.com/ngafar/llama-s... · Posted by u/nawazgafar
nawazgafar · 14 days ago
Author here, that sucks. I'd love to recreate this locally. Would you be willing to share the PDF?
threeducks · 14 days ago
As far as I am aware, the "hanging" issue remains unsolved to this day. The underlying problem is that LLMs sometimes get stuck in a loop where they repeat the same text again and again until they reach the token limit. You can break the loop by setting a repeat penalty, but when your image contains repeated text, such as in tables, the LLM will output incorrect results to prevent repetition.
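The trade-off is visible in how a repeat penalty is typically implemented: it rescales the logits of every token that has already been emitted, so legitimate repetition gets suppressed along with the loops. A simplified sketch of the idea (function and parameter names are illustrative, not any particular engine's API):

```python
def apply_repeat_penalty(logits, generated_ids, penalty=1.2):
    """Penalize tokens that already appeared in the generated output.

    Scaling a positive logit down (or a negative one further down) makes
    repeated tokens less likely -- which breaks loops, but also hurts text
    that legitimately repeats, e.g. identical table cells.
    """
    adjusted = dict(logits)  # token_id -> logit score
    for token_id in set(generated_ids):
        if token_id in adjusted:
            score = adjusted[token_id]
            adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted
```

With penalty > 1, a token the model has already produced needs a noticeably higher raw logit to be sampled again, regardless of whether the repetition was a loop or real content.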

Here is the corresponding GitHub issue for your default model (Qwen2.5-VL):

https://github.com/QwenLM/Qwen2.5-VL/issues/241

You can mitigate the fallout of this repetition issue to some degree by chopping up each page into smaller pieces (paragraphs, tables, images, etc.) with a page layout model. Then at least only part of the text is broken instead of the entire page.

A better solution might be to train a model to estimate a heat map of character density for a page of text. Then, condition the vision-language model on character density by feeding the density to the vision encoder. Also output character coordinates, which can be used with the heat map to adjust token probabilities.

threeducks commented on PYX: The next step in Python packaging   astral.sh/blog/introducin... · Posted by u/the_mitsuhiko
ilvez · 18 days ago
Do I get it right that this issue is on Windows? I've never heard of the issues you describe while working with Linux. I've seen people struggle a bit on macOS due to Homebrew providing different versions of some library or other, mostly when self-compiling Ruby.
threeducks · 18 days ago
There certainly are issues on Linux as well. The Detectron2 library alone has several hundred issues related to incorrect versions of something: https://github.com/facebookresearch/detectron2/issues

The mmdetection library (https://github.com/open-mmlab/mmdetection/issues) also has hundreds of version-related issues. Admittedly, that library has not seen any updates for over a year now, but it is sad that things just break and become basically unusable on modern Linux operating systems because NVIDIA can't stop breaking backwards and forwards compatibility for what is essentially just fancy matrix multiplication.

threeducks commented on GPT-OSS vs. Qwen3 and a detailed look how things evolved since GPT-2   magazine.sebastianraschka... · Posted by u/ModelForge
yunusabd · 21 days ago
GP asked the model to _create_ a riddle, not solve a given one.
threeducks · 21 days ago
Yes, but the odds of getting GPT-OSS to respond with that riddle are pretty low, and it is not necessary to demonstrate whether the LLM can answer the riddle correctly.
threeducks commented on GPT-OSS vs. Qwen3 and a detailed look how things evolved since GPT-2   magazine.sebastianraschka... · Posted by u/ModelForge
unstatusthequo · 21 days ago
Yes. I tried to ask oss-gpt to ask me a riddle. The response was absurd. Came up with a nonsensical question, then told me the answer. The answer was a four letter “word” that wasn’t actually a real word.

“What is the word that starts with S, ends with E, and contains A? → SAEA”

Then when I said that’s not a word and you gave me the answer already, no fun, it said

“I do not have access to confirm that word.”

threeducks · 21 days ago
FWIW, I asked gpt-oss-120b this question 10 times and the answer was always "sauce", "sane" or "sale". I also tried different temperatures (from 0 to 1), which did not seem to have an effect on the correctness of the answer.

EDIT: I now have also questioned the smaller gpt-oss-20b (free) 10 times via OpenRouter (default settings, provider was AtlasCloud) and the answers were: sage, sane, sane, space, sane, sane, sane, sane, space, sane.

You are either very unlucky, your configuration is suboptimal (a weird system prompt, perhaps?), or there is some bug in whichever system you are using for inference.
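For context, temperature only rescales the logits before they are turned into sampling probabilities, which is why varying it from 0 to 1 barely matters when the model is already confident in its answer. A minimal sketch:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits into sampling probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Lowering the temperature concentrates probability mass on the top token; as it approaches 0, sampling approaches greedy decoding. So if the model strongly prefers "sane", every temperature in [0, 1] will mostly produce "sane".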

threeducks commented on Qwen3-4B-Thinking-2507   huggingface.co/Qwen/Qwen3... · Posted by u/IdealeZahlen
cowpig · 25 days ago
Compare these rankings to actual usage: https://openrouter.ai/rankings

Claude is not cheap, why is it far and away the most popular if it's not top 10 in performance?

Qwen3 235b ranks highest on these benchmarks among open models, but I have never met someone who prefers its output over Deepseek R1. It's extremely wordy and often gets caught in thought loops.

My interpretation is that the models at the top of ArtificialAnalysis are focusing the most on public benchmarks in their training. Note I am not saying xAI is necessarily doing this nefariously; it could just be that they decided it's better bang for the buck to rely on public benchmarks than to focus on building their own evaluation systems.

But Grok is not very good compared to the Anthropic, OpenAI, or Google models despite ranking so highly in benchmarks.

threeducks · 25 days ago
OpenRouter rankings conflate many factors, like output quality, popularity, price, and legal concerns. They cannot tell us whether a model is popular because it is genuinely good, or because many people have heard about it, or because it is free, or because the lawyers trust the provider.
