paradite (u/paradite)

paradite commented on Google will allow only apps from verified developers to be installed on Android 9to5google.com/2025/08/25... · Posted by u/kotaKat

rvnx · 10 hours ago

If this is a thing, then the solution they offer is incorrect. A big giant red screen saying “Warning: the identity of this application's developer has not been verified, and this could be an application stealing your data, etc.” would have worked.

What they want is to get rid of apps like YouTube Vanced that are making them lose money (and other Play Store apps).

paradite · 8 hours ago

It won't work because of too many false positives. People are already trained to ignore warnings, like how they blindly accept T&C without reading.

paradite commented on DeepSeek-v3.1 api-docs.deepseek.com/new... · Posted by u/wertyk

segmondy · 5 days ago

Garbage benchmark, with an inconsistent mix of "agent tools" and models. If you wanted to present a meaningful benchmark, the agent tools should stay the same, and then we could really compare the models.

That said, there are plenty of other benchmarks that disagree with these. From my experience, most of these benchmarks are trash. Use the model yourself, apply your own set of problems, and see how well it fares.
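The point about keeping the agent tool fixed amounts to a controlled comparison: filter results down to a single tool, then rank the models. A minimal sketch, where the `Result` type, the tool and model names, and all scores are hypothetical placeholders rather than real benchmark data:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    tool: str     # agent tool / harness used for the run (hypothetical)
    model: str    # model under test (hypothetical)
    score: float  # benchmark score (hypothetical units)


def compare_models(results: list[Result], tool: str) -> list[tuple[str, float]]:
    """Rank models using only runs from one fixed agent tool.

    Holding the tool constant means score differences can be attributed
    to the model rather than to a particular tool/model pairing.
    """
    scores = [(r.model, r.score) for r in results if r.tool == tool]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical results: the same two models run under two different tools.
results = [
    Result("tool-a", "model-x", 0.71),
    Result("tool-a", "model-y", 0.64),
    Result("tool-b", "model-x", 0.58),
    Result("tool-b", "model-y", 0.69),
]

print(compare_models(results, "tool-a"))  # [('model-x', 0.71), ('model-y', 0.64)]
print(compare_models(results, "tool-b"))  # [('model-y', 0.69), ('model-x', 0.58)]
```

In this made-up data the model ranking flips between the two tools, which is exactly why a table mixing tools and models says little about the models alone.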

paradite · 4 days ago

Hey. I like your roast of benchmarks.

I also publish my own evals on new models (using coding tasks that I curated myself, without tools, rated by a human using rubrics). I'd love for you to check them out and share your thoughts:

A recent example on GPT-5:

https://eval.16x.engineer/blog/gpt-5-coding-evaluation-under...

All results:

https://eval.16x.engineer/evals/coding

paradite commented on Mark Zuckerberg freezes AI hiring amid bubble fears telegraph.co.uk/business/... · Posted by u/pera

Macha · 5 days ago

This link is not paywalled, unlike the WSJ link.

paradite · 5 days ago

It's paywalled for me.

paradite commented on Node.js is able to execute TypeScript files without additional configuration nodejs.org/en/blog/releas... · Posted by u/steren

rovingeye · 9 days ago

I can understand the argument, since npm has no solution for TypeScript packages, unlike JSR:

"You publish TypeScript source, and JSR handles generating API docs, .d.ts files, and transpiling your code for cross-runtime compatibility."

Still would have been nice to have this for private packages.

This makes Deno/Bun much more attractive alternatives.

paradite · 8 days ago

JSR does that? Now that might be a good reason to move my packages over to get rid of tsup.

Posted by u/paradite 11 days ago

GPT-5 Coding Evaluation: Underwhelming Performance Given the Hype eval.16x.engineer/blog/gp...

Posted by u/paradite 16 days ago

The Identity Crisis: Why LLMs Don't Know Who They Are eval.16x.engineer/blog/ll...

paradite commented on · Posted by u/NoScopeNinja

paradite · 17 days ago

Is this from Moonshot AI (the company behind Kimi K2), or from a third party?

Judging from the design, I assume it's not officially related to the model.

Posted by u/paradite 18 days ago

The Pink Elephant Problem: Why "Don't Do That" Fails with LLMs eval.16x.engineer/blog/th...

paradite commented on Crush: Glamourous AI coding agent for your favourite terminal github.com/charmbracelet/... · Posted by u/nateb2022

cristea · a month ago

I would love a comparison between all these new tools, like this one with Claude Code, opencode, aider, and cortex.

I just can’t get an easy overview of how each tool works and how they differ.

paradite · a month ago

Performance depends not only on the tool, but also on the model, the codebase you are working on (context), and the task given (prompt).

And these factors are not independent of each other. Some combinations work better than others. For example:

- Claude Sonnet 4 might work well for feature implementation on backend Python code using Claude Code.

- Gemini 2.5 Pro works better for bug fixes on frontend React codebases.

...

So you can't just test the tools alone while keeping everything else constant. Instead, you get a combinatorial explosion of tool × model × context × prompt combinations to test.
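To make the combinatorial explosion concrete, here is a minimal sketch that enumerates every tool × model × context × prompt combination with `itertools.product`. The factor levels are illustrative assumptions borrowed from names mentioned in this thread, not an actual test plan:

```python
from itertools import product

# Illustrative factor levels (assumptions, not a real test matrix).
tools = ["Claude Code", "opencode", "aider", "Crush"]
models = ["Claude Sonnet 4", "Gemini 2.5 Pro", "GPT-5"]
contexts = ["backend Python codebase", "frontend React codebase"]
prompts = ["feature implementation", "bug fix"]

# The factors interact, so each combination has to be evaluated on its own
# instead of varying one factor while holding the rest fixed.
matrix = list(product(tools, models, contexts, prompts))

print(len(matrix))  # 48  (4 * 3 * 2 * 2)
print(matrix[0])    # ('Claude Code', 'Claude Sonnet 4', 'backend Python codebase', 'feature implementation')
```

Because the total is a product, adding just one more model to this list grows the matrix from 48 to 64 runs.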

16x Eval can tackle parts of the problem, but it doesn't cover factors like tools yet.

https://eval.16x.engineer/

paradite commented on Study mode openai.com/index/chatgpt-... · Posted by u/meetpateltech

kobenni · a month ago

In my experience asking Claude questions, the amount of incorrect information it gives is on a completely different scale compared to traditional sources, and the information often sounds completely plausible too. When using a textbook, I would usually not Google every single new piece of information to verify it independently, but with Claude, doing that is absolutely necessary. At this point I only use Claude as a stepping stone to get ideas on what to Google, because it gives me false information so often. That is the only "effective" use I have found for it, which is obviously much less useful than a good old-fashioned textbook or online course.

Admittedly I have less experience with ChatGPT, but those experiences were equally bad.

paradite · a month ago

What kind of questions / domains were you encountering false information on?

u/paradite

Karma: 2555 · Cake day: March 23, 2014

About

Building products:

16x Eval - Effortlessly evaluate prompts and models

https://eval.16x.engineer/

16x Prompt - Streamline AI Coding Workflow

https://prompt.16x.engineer/

16x Tracker - Track, Filter, and Organize Reddit Keyword Hits

https://tracker.16x.engineer/

Older projects

https://ai-simulator.com/

https://16x.engineer/

[ my public key: https://keybase.io/paradite; my proof: https://keybase.io/paradite/sigs/KmrMtMWIIJSc-46410nPNevLQ4ICFUNP-F2RTCKTVhc ]
