Readit News logoReadit News
sireat commented on Evaluating LLMs for my personal use case   darkcoding.net/software/p... · Posted by u/goranmoomin
vjerancrnjak · 3 days ago
How do you run so many, I’m constantly exhausting the resources can’t even concurrently call 20 times?
sireat · 3 days ago
While I do have multiple OpenRouter accounts(personal and organizational) I did not even look into concurrent calls - it was sequential.

The job was set on Friday and ready on Monday. On average it was about 5k tokens (documents ranging from 1k to 200k in size) and only about 10 tokens out.

Average response was about 1.5 seconds ~ 40 hours for full set.

I really did some heavy prompt testing to limit output.

Even then every few thousand queries you'd get some double token responses. That is Gemini would respond in duplicate - ie Daisy Daisy.

sireat commented on How to build a coding agent   ghuntley.com/agent/... · Posted by u/ghuntley
simonw · 3 days ago
OK that really is pretty simple, thanks for sharing.

The whole thing runs on these prompts: https://github.com/SWE-agent/mini-swe-agent/blob/7e125e5dd49...

  Your task: {{task}}. Please reply
  with a single shell command in
  triple backticks.
  
  To finish, the first line of the
  output of the shell command must be
  'COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT'.

sireat · 3 days ago
Pretty sure you also need about 120 lines of prompting from default.yaml

https://github.com/SWE-agent/mini-swe-agent/blob/7e125e5dd49...

sireat commented on Evaluating LLMs for my personal use case   darkcoding.net/software/p... · Posted by u/goranmoomin
sireat · 3 days ago
Basically it boils down that for most queries google/gemini-2.5-flash is the workhorse fast/cheap/good enough.

Add in multimodality, 1M context and it is such a Swiss army knife.

It is cheap and performant enough to run 100k queries. (Took a bit over a day and cost around 30 Euros for a major document classification task). Yes in theory this could have been done with fine-tuned BERT or maybe even with some older methods but it saved way too much time.

There is another factor that may explain why Flash is #1 in most categories on OpenRouter - Flash has gotten reasonably decent at less common human languages.

Most cheap (including Flash Lite) and local models mostly have English focused training.

sireat commented on Evaluating LLMs for my personal use case   darkcoding.net/software/p... · Posted by u/goranmoomin
Timwi · 3 days ago
This is definitely something very early LLMs could do that has kind of gotten beat out of them. I used to ask ChatGPT to simulate a text adventure game, but now if you try that you always get exactly the same one.
sireat · 3 days ago
Curious, what kind of prompt gives you the same text adventure game?

Surely it is a question of prompting some context(in UI mode) or with additional kicker of temperature (if using API)?

At the very least some set up prompt such as "Give me 5 scenarios for text adventure game" would break the sameness?

There have always been theories that OpenAI and other LLM providers cache some responses - this could be one hypothesis.

sireat commented on Windows XP Professional   win32.run/... · Posted by u/pentagrama
sireat · 19 days ago
Fun but its Python REPL broke the immersion for me.. Python 3.13.2 (main, Aug 4 2025 20:25:58)

I was expecting Python 2.2 or 2.3 ... not sure what was the earliest version of Python on Pyodide

sireat commented on The untold impact of cancellation   pretty.direct/impact... · Posted by u/cbeach
mzajc · a month ago
> I didn’t know “second base” had another meaning.

Out of curiosity, what's the other meaning? I assume the primary one has to do with baseball bases.

sireat · a month ago
First Base - Kissing Second Base - petting above waist Third Base - petting below waist Home Run - sex

2nd,3rd base can vary a bit, but Home Run analogy has been around for a long time.

sireat commented on LLMs are cheap   snellman.net/blog/archive... · Posted by u/Bogdanp
FergusArgyll · 3 months ago
At that price level you run into serious adverse selection
sireat · 3 months ago
Meaning someone paying $200 monthly is going to use it as much as possible to get their money's worth.

I slightly disagree.

My hypothesis would be that the distribution for $200 users would be bimodal.

That is there would be a one concentration of super heavy power users.

The second concentration would be of people who want the "best AI" but are not power users and feel that most expensive -> the best.

Their actual usage would be just like normal free tier of ChatGPT.

sireat commented on Tell HN: Help restore the tax deduction for software dev in the US (Section 174)    · Posted by u/dang
FirmwareBurner · 3 months ago
Most European countries don't have deductions for SW devs. Romania had it for a long time and removed it due to gov budget deficits. Some other CEE countries might still have them, but in general most socialist western European countries don't have them, which is why they don't have a tech industry.
sireat · 3 months ago
What do you mean? I am in Eastern Europe and as far as know software development costs are fully deductible just like any other employee costs.

Even more so for startups in Estonia and Latvia (probably Lithuania too) you can fully deduct R&D in general - not sure for how long.

That is you have you have 1M in sales 200k in net profit(after paying for everything including software development).

If that 200k in net profit is plowed back into speculative R&D it does not incur income taxes until money is paid out.

Even more so you can invest some 200k pre-tax in assets such as buildings. You only get taxed when you actually take out the money. In a way this is a pretty big loophole provided you are actually cash flow positive.

Basically in Baltics you can follow the early Amazon strategy of not making net profit, but investing in growth.

sireat commented on Trying to teach in the age of the AI homework machine   solarshades.club/p/dispat... · Posted by u/notarobot123
sireat · 3 months ago
In my programming, algorithms and data structures courses the homework assignment completion has gone from roughly 50% before LLMs to 99% this year.

Making assignments harder would be unfair to those few students who would actually try to solve the problem without LLMs.

So what I do is require extensive comments and ahem - chain of thought reasoning in the comments - especially the WHY part.

Then I require oral defense of the code.

Sadly this is unfeasible for some of the large classes of 200, but works quite well when I have the luxury of teaching 20 students.

sireat commented on Trying to teach in the age of the AI homework machine   solarshades.club/p/dispat... · Posted by u/notarobot123
const_cast · 3 months ago
Typical classroom experience works and has worked for thousands of years.

Edutech is pretty new and virtually all of it has been a disaster. Sitting in a lecture and taking notes on paper is tried, tested, and research backed. It works. Not for everyone, but for a lot of people.

sireat · 3 months ago
Actually, before https://en.wikipedia.org/wiki/John_Amos_Comenius in 17th century much of education was route memorization.

Then it was corporal punishment if you did not learn quickly enough.

Comenius idea was of pansophia - knowledge for all. Also his Latin textbook - https://en.wikipedia.org/wiki/Janua_Linguarum_Reserata was quite revolutionary - in using relations to real world knowledge to learn a new language.

Even more ground breaking was his picture book for children - https://en.wikipedia.org/wiki/Orbis_Pictus . We take hybrid approach to learning for granted these days.

Even then Comenius was mostly forgotten in the enlightenment of 18th century - probably ideas of Jean-Jacques Rousseau took over - with insufficient backing.

u/sireat

KarmaCake day2752August 26, 2008
About
This handle is unique to HN. I do not use it anywhere else. You could probably find me by finding similar posts by someone else on Reddit and maybe just maybe on Usenet.
View Original