Readit News
dave1010uk commented on UK House of Lords attempting to ban use of VPNs by anyone under 16   alecmuffett.com/article/1... · Posted by u/nvarsj
dave1010uk · 5 days ago
I know what you're thinking: these restrictions are easy to work around. But don't worry, we can just layer more restrictions on top. Eventually the children will be safe! The government just needs to...

- require proof of age (ID) to install apps from unofficial sources on your phone or PC. Probably best to block this both at the OS level and on popular VPN download sites like github.com and debian.org.

- require proof of age (ID) to unblock DNS provider IP addresses like 8.8.8.8 and 1.1.1.1 at your ISP.

- make sure children aren't using any other "privacy" tools that might be a slippery slope to installing a VPN.

This makes it so much easier for the parents too! The internet will be so safe that they won't even need to talk to their children about internet safety.

dave1010uk commented on Useful patterns for building HTML tools   simonwillison.net/2025/De... · Posted by u/simonw
dave1010uk · 5 days ago
Thanks Simon!

My tool collection [0] is inspired by yours, with a handful of differences. I'm only at 53 tools at the moment.

What I did differently:

Hosted on Cloudflare Pages. This gives you preview URLs for pull requests out of the box. This might be possible with GitHub Pages but I haven't checked. I've used Vercel for similar projects in the past. Cloudflare seems to have the odd failed build that needs a kick from their dashboard.

Some tools make use of Workers/Functions [1] for backend processing and secrets. I try to keep these to a minimum but they're occasionally useful.

I have an AGENTS.md that's updated by a GitHub Action to automatically pull in Claude-style Skills from the .skills directory. I blogged about this pattern and am still waiting for a standard to evolve [2].

I have a base stylesheet that I instruct agents to pull in. This gives a bit of consistency and also lets them use Tailwind, which they seem to love.
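For the AGENTS.md pattern above, here's a minimal sketch of the kind of script a GitHub Action could run to keep AGENTS.md in sync with `.skills`. It assumes each skill lives in `.skills/<name>/SKILL.md` with `name` and `description` frontmatter (the Claude Skills layout) and that the generated section sits between two hypothetical marker comments. It's not the actual action from the repo, and the frontmatter parser is deliberately naive rather than a full YAML parser.

```python
import re
from pathlib import Path
from typing import Tuple

# Hypothetical markers delimiting the generated section, so hand-written
# parts of AGENTS.md are left untouched.
START = "<!-- skills:start -->"
END = "<!-- skills:end -->"
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---", re.DOTALL)
SECTION_RE = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)


def read_skill(skill_md: Path) -> Tuple[str, str]:
    """Return (name, description) from SKILL.md frontmatter.

    Naive `key: value` parsing, not full YAML.
    """
    fields = {}
    match = FRONTMATTER_RE.match(skill_md.read_text(encoding="utf-8"))
    if match:
        for line in match.group(1).splitlines():
            key, sep, value = line.partition(":")  # split on the first colon only
            if sep:
                fields[key.strip()] = value.strip().strip("\"'")
    # Fall back to the directory name if the frontmatter has no name.
    return fields.get("name", skill_md.parent.name), fields.get("description", "")


def render_skills(skills_dir: Path) -> str:
    """Build the generated Markdown section listing every .skills/*/SKILL.md."""
    lines = [START, "## Skills", ""]
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        name, description = read_skill(skill_md)
        path = skill_md.relative_to(skills_dir.parent).as_posix()
        lines.append(f"- **{name}** ({path}): {description}")
    lines.append(END)
    return "\n".join(lines)


def update_agents_md(agents_md: Path, skills_dir: Path) -> None:
    """Replace the generated section in AGENTS.md, or append it if missing."""
    block = render_skills(skills_dir)
    text = agents_md.read_text(encoding="utf-8") if agents_md.exists() else ""
    if SECTION_RE.search(text):
        text = SECTION_RE.sub(lambda _: block, text)
    else:
        text = (text.rstrip("\n") + "\n\n" if text.strip() else "") + block
    if not text.endswith("\n"):
        text += "\n"
    agents_md.write_text(text, encoding="utf-8")
```

A workflow step could call `update_agents_md(Path("AGENTS.md"), Path(".skills"))` and commit the result if the file changed. Because the section is replaced between markers, re-running it is idempotent.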

[0] https://tools.dave.engineer/

[1] https://github.com/dave1010/tools/tree/main/functions

[2] https://dave.engineer/blog/2025/11/skills-to-agents/

dave1010uk commented on Claude Opus 4.5   anthropic.com/news/claude... · Posted by u/adocomplete
aurareturn · 21 days ago

  Pages 22–24 of Opus’s system card provide some evidence for this. Anthropic run a multi-agent search benchmark where Opus acts as an orchestrator and Haiku/Sonnet/Opus act as sub-agents with search access. Using cheap Haiku sub-agents gives a ~12-point boost over Opus alone.
Will this lead to another exponential increase in capabilities and token usage, on the same order as thinking models?

dave1010uk · 21 days ago
Perhaps. Though if that were feasible, I'd expect it would have been exploited already.

I think this is more about the cost and time savings of being able to use cheaper models. Sub-agents are effectively the same as parallelization and temporary context compaction. (The same is true of delegation and organisational structures in human teams.)
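A rough Python sketch of that equivalence, with the model calls left abstract: `sub_agent` and `orchestrator` are any prompt-to-text functions (e.g. wrappers around an API client). Truncating each result is a crude stand-in for a sub-agent writing its own summary. The names and defaults are illustrative and not how Anthropic's multi-agent harness actually works.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# A "model" is any function from prompt text to completion text.
Model = Callable[[str], str]


def orchestrate(question: str, subtasks: List[str], sub_agent: Model,
                orchestrator: Model, max_summary_chars: int = 500,
                max_workers: int = 8) -> str:
    """Fan subtasks out to sub-agents in parallel, then answer from their summaries."""

    def run_subtask(subtask: str) -> str:
        # Each sub-agent starts from a fresh, small context: just its own subtask.
        result = sub_agent(f"{question}\nYour subtask: {subtask}\nReply with a short summary.")
        # Crude stand-in for context compaction: only a bounded summary is kept.
        return result[:max_summary_chars]

    # Parallelization: sub-agents run concurrently (model calls are I/O-bound).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        summaries = list(pool.map(run_subtask, subtasks))  # preserves input order

    # The orchestrator only ever sees the compacted findings, not the full work.
    findings = "\n".join(f"- {s}" for s in summaries)
    return orchestrator(f"{question}\nFindings from sub-agents:\n{findings}\nFinal answer:")
```

The orchestrator's context grows with the number of summaries rather than with the total work done, which is where the cost and time savings come from.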

We're starting to see benchmarks include stats for low/medium/high reasoning effort, showing how newer models can match or beat older ones with fewer reasoning tokens. It would be interesting to see more benchmarks for different sub-agent reasoning combinations too. E.g. does Claude perform better when Opus can use 10,000 tokens of Sonnet or 100,000 tokens of Haiku? What's the best agent response you can get for $1?
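Once you have benchmark scores, the "$1" question is mostly bookkeeping. A sketch, assuming prices are quoted in USD per million tokens; `AgentConfig` is a hypothetical name, and any token counts or prices you plug in are your own inputs, not real model pricing.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AgentConfig:
    """One orchestrator/sub-agent pairing. Prices are USD per million tokens."""
    name: str
    orchestrator_tokens: int
    orchestrator_price: float
    sub_agent_tokens: int
    sub_agent_price: float

    def cost(self) -> float:
        """Total USD cost of one run with this configuration."""
        return (self.orchestrator_tokens * self.orchestrator_price
                + self.sub_agent_tokens * self.sub_agent_price) / 1_000_000


def best_under_budget(results: List[Tuple[AgentConfig, float]],
                      budget_usd: float = 1.0) -> Optional[AgentConfig]:
    """Return the highest-scoring config whose cost fits the budget, if any."""
    affordable = [(config, score) for config, score in results
                  if config.cost() <= budget_usd]
    if not affordable:
        return None
    return max(affordable, key=lambda pair: pair[1])[0]
```

Feed it one (config, score) pair per orchestrator/sub-agent combination from a benchmark run and it returns the best one that fits the budget.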

Where I think we might see gains in _some_ types of tasks is with vast quantities of tiny models, i.e. many LLMs under 4B parameters used as sub-agents. I wonder what GPT-5.1 Pro would be like if it could orchestrate 1000 drone-like workers.

dave1010uk commented on Claude Opus 4.5   anthropic.com/news/claude... · Posted by u/adocomplete
dave1010uk · 22 days ago
The Claude Opus 4.5 system card [0] is much more revealing than the marketing blog post. It's a 150-page PDF with all sorts of info, not just the usual benchmarks.

There's a big section on deception. In one example, Opus is fed news about Anthropic's safety team being disbanded and then hides that info from the user.

The risks are a bit scary, especially around CBRN (chemical, biological, radiological and nuclear) weapons. Opus is still only ASL-3 (systems that substantially increase the risk of catastrophic misuse) and not quite at ASL-4 (uplifting a second-tier state-level bioweapons programme to the sophistication and success of a first-tier one), so I think we're fine...

I've never written a blog post about a model release before but decided to this time [1]. The system card has quite a few surprises, so I've highlighted some bits that stood out to me (and Claude, ChatGPT and Gemini).

[0] https://www.anthropic.com/claude-opus-4-5-system-card

[1] https://dave.engineer/blog/2025/11/claude-opus-4.5-system-ca...

dave1010uk commented on You should write an agent   fly.io/blog/everyone-writ... · Posted by u/tabletcorry
simonw · a month ago
I love hubcap so much. It was a real eye-opener for me at the time, really impressive result for so little code. https://simonwillison.net/2023/Sep/6/hubcap/
dave1010uk · a month ago
Thanks Simon!

It only worked because of your LLM tool. Standing on the shoulders of giants.

dave1010uk commented on You should write an agent   fly.io/blog/everyone-writ... · Posted by u/tabletcorry
dave1010uk · a month ago
Two years ago I wrote an agent in 25 lines of PHP [0]. It was surprisingly effective, even back then before tool calling was a thing and you had to coax the LLM into returning structured output. I think it even worked with GPT-3.5 for trivial things.
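Roughly the shape of such an agent, as a Python sketch rather than a port of hubcap. The model is prompted to wrap one shell command in hypothetical `<cmd>` tags (the kind of structured output you had to coax out of models before tool calling). The command is run and its output is appended to the transcript for the next turn. `run_agent` and `extract_command` are illustrative names.

```python
import re
from typing import Callable, List, Optional

# A "model" is any function from prompt text to completion text.
Model = Callable[[str], str]
# An executor runs a command and returns its output.
Executor = Callable[[str], str]

# Assumed convention: one shell command per turn inside <cmd>...</cmd> tags,
# and a reply without tags when the model thinks it has finished.
COMMAND_RE = re.compile(r"<cmd>(.*?)</cmd>", re.DOTALL)


def extract_command(reply: str) -> Optional[str]:
    """Pull the first tagged command out of a free-text reply, if any."""
    match = COMMAND_RE.search(reply)
    return match.group(1).strip() if match else None


def run_agent(task: str, model: Model, execute: Executor, max_steps: int = 10) -> List[str]:
    """Loop: ask the model for a command, run it, append the output, repeat."""
    transcript = (
        f"Task: {task}\n"
        "Reply with one shell command inside <cmd></cmd> tags, "
        "or reply without tags when the task is done.\n"
    )
    commands: List[str] = []
    for _ in range(max_steps):  # hard cap so a confused model can't loop forever
        command = extract_command(model(transcript))
        if command is None:
            break
        commands.append(command)
        transcript += f"\n$ {command}\n{execute(command)}\n"
    return commands
```

`execute` could be `lambda c: subprocess.run(c, shell=True, capture_output=True, text=True).stdout`, but that runs whatever the model writes, so only do it in a sandbox.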

In my mind LLMs are just UNIX string manipulation tools like `sed` or `awk`: you give them an input and a command and they give you an output. This is especially true if you use something like `llm` [1].

It then seems logical that you can compose calls to LLMs, loop and branch and combine them with other functions.
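Treating a model as a text filter makes that composition literal. In this sketch any prompt-to-text function counts as a model. With the `llm` library that could be `lambda p: model.prompt(p).text()` after `model = llm.get_model(...)` with a model ID you have configured. `pipe`, `llm_filter`, `branch` and `repeat_until` are hypothetical helpers, not part of `llm`.

```python
from functools import reduce
from typing import Callable

# Like a UNIX filter: text in, text out.
Filter = Callable[[str], str]


def pipe(*filters: Filter) -> Filter:
    """Compose filters left to right, like `a | b | c` in a shell."""
    return lambda text: reduce(lambda acc, f: f(acc), filters, text)


def llm_filter(model: Callable[[str], str], instruction: str) -> Filter:
    """Turn a model plus an instruction into a filter, like `sed 's/x/y/'`."""
    return lambda text: model(f"{instruction}\n\n{text}")


def branch(predicate: Callable[[str], bool], if_true: Filter, if_false: Filter) -> Filter:
    """Route text through one of two filters, like if/else in a shell script."""
    return lambda text: if_true(text) if predicate(text) else if_false(text)


def repeat_until(done: Callable[[str], bool], step: Filter, max_iterations: int = 5) -> Filter:
    """Apply a filter repeatedly until a check passes or the cap is hit."""
    def run(text: str) -> str:
        for _ in range(max_iterations):
            if done(text):
                break
            text = step(text)
        return text
    return run
```

Because LLM-backed filters and plain functions like `str.strip` share one type, they mix freely in a pipeline, just as `llm` sits alongside `sed` and `awk` in a shell pipe.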

[0] https://github.com/dave1010/hubcap

[1] https://github.com/simonw/llm

dave1010uk commented on Show HN: MARS – Personal AI robot for builders (< $2k)    · Posted by u/apeytavin
dave1010uk · 2 months ago
Looks awesome!

This isn't so clear though: https://docs.innate.bot/main/software/basic/connecting-to-ba...

> BASIC is accessible for free to all users of Innate robots for 300 cumulative hours - and probably more if you ask us.

Is BASIC used just to create the behaviours or to run them too? It sounds like this is an API you host that turns a behaviour like "pick up socks" into ROS2 motor commands for the robot. Are you open sourcing this too, so anyone can run the (presumably GPU-heavy) backend?

Does the robot need an internet connection to work?

Also, more importantly, what does it look like with googly eyes stuck on?

dave1010uk commented on Open Social   overreacted.io/open-socia... · Posted by u/knowtheory
wilt6269 · 3 months ago
I can’t help but cringe whenever I see a Bluesky fan stubbornly clinging to the past by calling X ‘Twitter.’ This one went even further - using the old logo and even the outdated URL.
dave1010uk · 3 months ago
That was an example of a social media company changing, with users not being able to migrate their data. Scroll a bit further and you'll see X.
dave1010uk commented on CEO Bench – Can AI Replace the C-Suite?   ceo-bench.dave.engineer/... · Posted by u/watermelon0
dave1010uk · 6 months ago
Thanks for submitting this!

Author here. (If you can call me that. GPT-4 and Gemini did the bulk of the work.)

This is a (slightly tongue-in-cheek) benchmark to test some LLMs. All open source and all the data is in the repo.

It makes use of the excellent `llm` Python package from Simon Willison.

I've only benchmarked a couple of local models but want to see what the smallest LLM is that will score above the estimated "human CEO" performance. How long before a sub-1B parameter model performs better than a tech giant CEO?

u/dave1010uk

Karma: 7189 | Cake day: March 8, 2011
About
Dave Hulbert

https://dave.engineer

Work https://passenger.tech

http://github.com/dave1010 @dave1010 dave1010 at gmail dot com
