thorax (u/thorax) - Readit News

thorax commented on Show HN: Why write code if the LLM can just do the thing? (web app experiment) github.com/samrolken/noko... · Posted by u/samrolken

thorax · 2 months ago

I like the OP's idea and think it actually has some fun applications, especially with a little more narrowing of scope.

Similar fun concept as the cataclysm library for Python: https://github.com/Mattie/cataclysm

thorax commented on Google is winning on every AI front thealgorithmicbridge.com/... · Posted by u/vinhnx

antirez · 8 months ago

Gemini 2.5 Pro is as powerful as everybody says. I still also use Claude Sonnet 3.7, only because the Gemini web UI has issues... (Imagine creating the best AI and then not allowing users to attach Python or C files unless they're renamed to .txt) but the way the model is better than anyone else is a "that's another league" experience. They have the biggest search engine and YouTube to leverage the power of the AI they are developing. At this point I too believe that they are likely to win the race.

thorax · 8 months ago

In AI Studio, it seemed to let me upload pretty much any file and tokenize it without renaming, FWIW

thorax commented on Sketch-Programming: A Minimalist Paradigm for Code Design github.com/DmitryOlkhovoi... · Posted by u/DmitryO

thorax · 9 months ago

I really like the spirit! This sort of concept was one of the first things I wanted to explore with a spiritually adjacent project (cataclysm). It's been a bit, so I need to take time to update the python JIT generation now that we have much better coding models, but I'm a big fan of this sort of abstraction. https://github.com/Mattie/cataclysm

As most of us here can see, for many tasks now you don't really need to worry that you have the exact right syntax. I think you still need expert precision when it matters immensely, but we all develop tools, scripts, layers and the like where manual precision isn't necessary.

thorax commented on Home Assistant blocked from integrating with Garage Door opener API home-assistant.io/blog/20... · Posted by u/eamonnsullivan

lvh · 2 years ago

Based on my local big box store and garage installer availability, Chamberlain has a de facto monopoly. They also pulled the rug out from under customers: that behavior had been in Home Assistant since 2017, and it's their own recent changes that caused the alleged "DDoS". They say it's to promote official products, but the company previously had a local hub that didn't require their cloud service and discontinued it.

The API breakage coincides pretty well with their brand new CTO, whose objective is apparently "transformation to a smart access software company".

It's unclear if the CTO just doesn't understand that "DDoS" generally implies malice, or if they're intentionally using that language to blame users for using their product.

Good news: ratgdo, an ESP-based local solution works great. I hope the author is making a decent profit on the kits.

thorax · 2 years ago

Replaced my openers in the spring, and 100% wouldn't have chosen them if there wasn't HA MyQ integration. Such a silly move.

I used a local Meross install on my old garage doors, time to break them out, but ugh...

thorax commented on Benchmarking GPT-4 Turbo – A Cautionary Tale blog.mentat.ai/benchmarki... · Posted by u/ja3k

Havoc · 2 years ago

Is it known what exactly OpenAI does in the background when they make these turbo editions?

It seems like they're sacrificing some quality for large gains in speed and cost, but does anyone know more detail?

thorax · 2 years ago

Don't think so, but there were some guesses on 3.5-turbo-- i.e. training a much smaller model on quality questions/answers from GPT-4. Same tactic worked again and again for other LLMs.

I'm definitely curious about the context window increase-- I'm having a hard time telling if it's 'real' vs. a fast, specially trained summarization prework step. That being said, it's been doing a rather solid job not losing info in that context window in my minor anecdotal use cases.

thorax commented on Benchmarking GPT-4 Turbo – A Cautionary Tale blog.mentat.ai/benchmarki... · Posted by u/ja3k

minihat · 2 years ago

GPT-4 Turbo is dramatically worse at one task I often try:

Read the following passage from [new ML article]. Identify their assumptions, and tell me which mathematical operations or procedures they use depend upon these assumptions.

GPT-4: Usually correctly identifies the assumptions, and often quotes the correct mathematics in its reply.

GPT-4 Turbo: Sometimes identifies the assumptions, and is guaranteed to stop trying at that point and then give me a Wikipedia-like summary about the assumptions rather than finish the task. Further prompting will not improve its result.

thorax · 2 years ago

Do you have a link or gist of an example run you tried? I'd be curious to try something similar.

thorax commented on OpenAI's plans according to sama humanloop.com/blog/open_a... · Posted by u/razcle

m3kw9 · 3 years ago

If you look at their API limits, no serious company can use this to scale up beyond say 10k users. 3500 Reqs per min for gpt3.5 turbo. They have a long way to go to make it usable for the rest of the 95%
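To put rough numbers on that limit, here is a back-of-the-envelope sketch. The 3,500 requests per minute and the 10k users come from the comment above; the idea that every user is active at once and the quota is split evenly is an illustrative assumption, not something OpenAI documents.

```python
# Figures from the comment above; the even-sharing model is an assumption.
REQUESTS_PER_MINUTE = 3500
USERS = 10_000


def per_user_budget(requests_per_minute: int, users: int) -> float:
    """Requests per minute each user gets if the quota is split evenly."""
    return requests_per_minute / users


def seconds_between_requests(requests_per_minute: int, users: int) -> float:
    """Average wait between one user's requests under an even split."""
    return 60 / per_user_budget(requests_per_minute, users)


budget = per_user_budget(REQUESTS_PER_MINUTE, USERS)
wait = seconds_between_requests(REQUESTS_PER_MINUTE, USERS)
print(f"{budget:.2f} requests per user per minute")
print(f"about {wait:.0f} seconds between requests per user")
```

Under that assumption each user gets roughly one request every three minutes, which is the kind of ceiling the comment is pointing at.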

thorax · 3 years ago

I've had to move to using Azure OpenAI service during business hours for the API-- much more stable unless the prompts stray into something a little odd and their API censorship blocks the calls.

thorax commented on Show HN: cataclysm v0.1 – the final Python module? github.com/Mattie/catacly... · Posted by u/thorax

JohnBerryman · 3 years ago

Now that it's released, it's time to experiment with what it can do.

```
from cataclysm import doom

def mystery_func():
    while True:
        pass

# predict if specified function halts
print(doom.does_it_halt(mystery_func))
```

thorax · 3 years ago

Ah, really curious if it'll balk at the age-old problem or just answer based on the code provided, since it does inspect the nearby code to understand context for the generated code.

I'll have to try that when I get back to my desk!
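As a rough illustration of the idea in that reply, the sketch below shows one way an object could accept any method name and gather source context for the functions passed to it. This is not cataclysm's actual implementation: the `Doom` class, the `describe` helper, and the returned dictionary are all hypothetical, and only the standard-library `inspect` module is used.

```python
import inspect


def describe(value):
    """Return source text for plain functions when available, else repr()."""
    if inspect.isfunction(value):
        try:
            return inspect.getsource(value)
        except (OSError, TypeError):
            # Source is not always retrievable (e.g. code typed in a REPL).
            return f"<source unavailable for {value.__name__}>"
    return repr(value)


class Doom:
    """Hypothetical stand-in for an object that accepts any method name."""

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so every
        # undefined method name ends up here.
        def stub(*args, **kwargs):
            # A real tool would hand this context to a code generator;
            # this sketch only records what was requested.
            return {
                "requested": name,
                "arg_context": [describe(a) for a in args],
            }

        return stub


doom = Doom()
request = doom.does_it_halt(42)
print(request["requested"])
```

Passing a function instead of `42` would put that function's source text into `arg_context` whenever Python can locate it.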

thorax commented on ChatGPT – The Revolutionary Bullshit Parrot reasonfieldlab.com/post/c... · Posted by u/mgl

zekica · 3 years ago

It returns factually incorrect data, and it returns code with subtle but important errors if you ask it anything that's not regurgitated a thousand times in the training dataset.

Don't get me wrong, it has emergent properties (more than you would expect from a fancy autocomplete), but factual output was never GPT-4's nor any other LLM's design goal.

thorax · 3 years ago

> It returns factually incorrect data, and it returns code with subtle but important errors if you ask it anything that's not regurgitated a thousand times in the training dataset.

To be fair, that's what pretty much every person does. The bar does seem pretty high if we need more than that (especially if not specifically trained on a topic). It's not a universally perfect expert servant, but I've been exploring the code generation of GPT4 in detail (i.e. via the 'cataclysm' module I just posted about). In 1 minute it can write functions as good as the average developer intern most of the time.

We're keeping score in a weird way if our quick response is that it needs to "code without subtle but important errors", because that describes the majority of human developers, too. I've been writing code for 30 years, and if you put a gun to my head, I would still have subtle but important flaws in every first typing of any complex generated code.

I'm not saying you're bashing it, by the way, I get your point, but I do worry a bit when the first response is citing that the SOTA models get things wrong in 0-shot situations without full context. That's describing all of us.

thorax commented on Show HN: cataclysm v0.1 – the final Python module? github.com/Mattie/catacly... · Posted by u/thorax

thorax · 3 years ago

Now that it's released, it's time to experiment with what it can do.

   from cataclysm import doom

   # App gets the img file from the command line and saves it as a new file at half size with _half appended to the name
   doom.resize_app()

Turned out to be all that's needed for a command-line file resize app (with PIL installed).
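For comparison, a hand-written version of the app described in that comment might look roughly like the sketch below. It is not the code cataclysm generates: the helper names `half_name` and `half_size` and this `resize_app` body are hypothetical, and it assumes Pillow is installed for the actual image work.

```python
from pathlib import Path


def half_name(path: str) -> str:
    """Return the output path with _half appended before the extension."""
    p = Path(path)
    return str(p.with_name(f"{p.stem}_half{p.suffix}"))


def half_size(size: tuple[int, int]) -> tuple[int, int]:
    """Halve width and height, never going below one pixel."""
    width, height = size
    return (max(1, width // 2), max(1, height // 2))


def resize_app(argv: list[str]) -> str:
    """Resize the image named in argv[1] and return the new file path."""
    from PIL import Image  # imported lazily; requires Pillow

    src = argv[1]
    dst = half_name(src)
    with Image.open(src) as img:
        # resize() returns a new image; save() infers the format from dst.
        img.resize(half_size(img.size)).save(dst)
    return dst
```

From a script entry point this would be called as `resize_app(sys.argv)`, so running it against `photo.png` would write `photo_half.png` next to the original.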