TheTaytay commented on Show HN: MCP Security Suite   github.com/NineSunsInc/mi... · Posted by u/jodoking
tptacek · 9 days ago
So, we've surfaced a disagreement, because I don't think you need something like taint tracking. I think the security boundary between an LLM context that takes untrusted data (from, e.g., tickets) and a sensitive context (that can, e.g., make database queries) is essentially no different than the boundary between the GET/POST args in a web app and a SQL query.

It's not a trivial boundary, but it's one we have a very good handle on.

TheTaytay · 6 days ago
I don't understand the part where you said that you have a very good handle on it. I really want to believe that it's as simple and solvable as you say it is. Or do you mean that it's easily solvable and it's just that no one has done it yet? (In which case I think you and Simonw are saying the same thing?)

You mentioned the boundary between GET/POST args in a web app and a SQL query...but we have a system that is (by nature) mingling all of the parameters and execution together. It would be as if everyone's web server had a first line of their handler function that said something like "params = eval(user_based_params)", and you couldn't remove it...
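
To make the analogy concrete, here's a rough, purely illustrative sketch (the function names are made up for illustration, not anyone's actual API). With SQL we have a placeholder mechanism that keeps untrusted data from ever becoming part of the query; with an LLM, the instructions and the untrusted text end up in the same string:

```python
# Purely illustrative sketch -- function names here are hypothetical.
import sqlite3

def web_handler(user_input: str, db: sqlite3.Connection):
    # The classic boundary: the query structure is fixed up front,
    # and user_input is bound as data via the placeholder. It can
    # never rewrite the SQL itself.
    return db.execute(
        "SELECT * FROM tickets WHERE title = ?", (user_input,)
    ).fetchall()

def agent_handler(ticket_text: str, call_llm):
    # The LLM "boundary" today: instructions and untrusted ticket text
    # get concatenated into one string. There is no placeholder, so the
    # "data" can steer what the model does -- closer in spirit to
    # params = eval(user_based_params) than to a parameterized query.
    prompt = (
        "You can run database queries. Summarize this ticket and "
        "decide what to do next:\n\n" + ticket_text
    )
    return call_llm(prompt)
```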

TheTaytay commented on Claudia – Desktop companion for Claude code   claudiacode.com/... · Posted by u/zerealshadowban
marxism · 6 days ago
I've been contributing to an open source mobile app [1] that takes two swings at offering something that Roo does not have.

1. Real-time sync of CLI coding agent state to your phone. Granted, this doesn't give you any new coding capabilities; you won't be making any different changes from your phone, and I would still choose to make a code change on my computer. But the fact that it's only slightly worse (you just wish you had a bigger screen) is still an innovation. Making Claude Code usable from anywhere changes when you can work, even if it doesn't change what you can do. I wrote a post trying to explain why this matters in practice. https://happy.engineering/docs/features/real-time-sync/

2. Another contributor is experimenting with a separate voice agent in between you and Claude Code. I've found it usable and maybe even nice? The voice agent acts like a buffer to collect and compact half-baked, think-out-loud ideas into slightly better commands for Claude Code. Another contributor wrote a blog post about why voice coding on your phone while out of the house is useful. They explained it better than I can. https://happy.engineering/docs/features/voice-coding-with-cl...

[1] https://github.com/slopus/happy

TheTaytay · 6 days ago
This looks awesome, but I was surprised to see the relay server being necessary. Can I self-host that too?
TheTaytay commented on Claudia – Desktop companion for Claude code   claudiacode.com/... · Posted by u/zerealshadowban
zblevins · 6 days ago
I currently do this with Termius and ssh into the box I’m working on, then launch Claude Code. The only issue I have is the occasional network issue causing the session to drop.
TheTaytay · 6 days ago
You likely know this, but in case you don’t: Termius makes it easy to use “mosh”, which makes your connection resistant to network drops and resumable. I am experimenting with it right now. Once you install mosh on your server, click the “mosh” setting in the connection settings in Termius, and you are good to go.
TheTaytay commented on Claudia – Desktop companion for Claude code   claudiacode.com/... · Posted by u/zerealshadowban
chis · 7 days ago
Really good idea, I’ll have to try it out. The thing I really want is to have the ability to give a recipe for a new Claude code instance - spin up a docker image with code, data, and a running server and then let Claude work against that.
TheTaytay · 7 days ago
I’ve been wanting the same thing. I am currently experimenting with a container that adds what I want. I keep wanting my sandboxed dev machine that is preconfigured with Claude (and almost nothing else)…but then I also want my customized Claude. I saw someone here on HN the other day who just spins up a Docker container and shares their home-directory Claude files with it to get that customized Claude inside the container:

https://news.ycombinator.com/item?id=44549802

TheTaytay commented on Show HN: Omnara – Run Claude Code from anywhere   github.com/omnara-ai/omna... · Posted by u/kmansm27
bingdig · 11 days ago
Very cool! Would love an integration with Twilio / phone and text-to-voice and voice-to-text.

Start an agent, receive a call when a response from the user is needed, provide instruction, repeat.

Use case would be to continue the work hands-free while biking or driving.

TheTaytay · 11 days ago
A fellow posted his project called “talkito” that is close to this.
TheTaytay commented on Show HN: Omnara – Run Claude Code from anywhere   github.com/omnara-ai/omna... · Posted by u/kmansm27
robbomacrae · 12 days ago
Like others (smithclay, sst/opencode) who have mentioned aiming for a similar feature, I had plans to make a mobile app for Talkito[0][1], which primarily adds voice (TTS/ASR) and WhatsApp/Slack interactions to Claude Code.

This looks like exactly what I was envisioning so congrats on getting out there first! LMK if you want to add voice controls to this.

[0]: https://github.com/robdmac/talkito

[1]: https://talkito.com

TheTaytay · 11 days ago
Talkito looks very slick! You added voice, SMS, and WhatsApp, which is exactly what I’ve been wishing for. I’ll have to give this a shot!
TheTaytay commented on Show HN: Omnara – Run Claude Code from anywhere   github.com/omnara-ai/omna... · Posted by u/kmansm27
smithclay · 11 days ago
similar: blink + tailscale + zellij + devcontainers
TheTaytay · 11 days ago
smithclay is being polite because this is someone else’s thread, but he wrote this (which I’m literally playing with right now): https://clay.fyi/blog/iphone-claude-code-context-coding/
TheTaytay commented on GEPA: Reflective prompt evolution can outperform reinforcement learning   arxiviq.substack.com/p/ge... · Posted by u/che_shr_cat
viksit · 24 days ago
They’ve already written one! See Omar’s X account for details!
TheTaytay · 24 days ago
Here's a link to a repost Omar made referencing it: https://x.com/DSPyOSS/status/1950733300420510006
TheTaytay commented on Supervised fine tuning on curated data is reinforcement learning   arxiv.org/abs/2507.12856... · Posted by u/GabrielBianconi
anndvision · 25 days ago
We recently ran similar experiments and saw that fine-tuning small models on automatically curated high-quality outputs from a large model can beat large-model performance while reducing inference costs by up to 30x and inference time by up to 4x.

We benchmarked closed-source (OpenAI, Google) and open-source (Qwen) models on multi-turn maze navigation (BabyAI), agentic RAG (Multi-Hop), and agentic tool use (τ-bench).

We're still running a few experiments and plan to update the post with additional results in a few days.

Looking forward to trying out importance weighting soon!

Curated Behavior Cloning: Small LLMs Can Beat Large Ones at 5-30x Lower Cost: https://www.tensorzero.com/blog/curated-behavior-cloning-sma...

TheTaytay · 25 days ago
Thanks for this - I’ve spent the last hour reading your docs and blog. I like the primitives you’ve exposed in your API, and I particularly like the decision to separate the structured inputs from the prompt when you record an LLM call, so I can finally perform optimizations and evals on past calls.

Quick question: you mentioned Unsloth in the blog post. Which of the fine-tuning providers mentioned uses Unsloth under the hood?

TheTaytay commented on Batch Mode in the Gemini API: Process More for Less   developers.googleblog.com... · Posted by u/xnx
pugio · a month ago
Hah, I've been wrestling with this ALL DAY. Another example of Phenomenal Cosmic Powers (AI) combined with itty bitty docs (typical of Google). The main endpoint ("https://generativelanguage.googleapis.com/v1beta/models/gemi...") doesn't even have actual REST documentation in the API. The Python API has 3 different versions of the same types. One of the main ones (`GenerateContentRequest`) isn't available in the newest path (`google.genai.types`) so you need to find it in an older version, but then you start getting version mismatch errors, and then pydantic errors, until you finally decide to just cross your fingers and submit raw JSON, only to get opaque API errors.

So, if anybody else is frustrated and not finding anything online about this, here are a few things I learned, specifically for structured output generation (which is a main use case for batching) - the individual request JSON should resolve to this:

```json { "request": { "contents": [ { "parts": [ { "text": "Give me the main output please" } ] } ], "system_instruction": { "parts": [ { "text": "You are a main output maker." } ] }, "generation_config": { "response_mime_type": "application/json", "response_json_schema": { "type": "object", "properties": { "output1": { "type": "string" }, "output2": { "type": "string" } }, "required": [ "output1", "output2" ] } } }, "metadata": { "key": "my_id" } } ```

To get actual structured output, don't just set `generation_config.response_schema`: you need to include the mime type, and the key should be `response_json_schema`. Any other combination will either throw opaque errors or won't trigger Structured Output (and the response will contain the usual LLM intros: "I'm happy to do this for you...").

So you upload a .jsonl file with the above JSON, and then you try to submit it for a batch job. If something is wrong with your file, you'll get a "400" and no other info. If something is wrong with the request submission you'll get a 400 with "Invalid JSON payload received. Unknown name \"file_name\" at 'batch.input_config.requests': Cannot find field."

I got the above error endless times when trying their exact sample code:

```
BATCH_INPUT_FILE='files/123456' # File ID

curl https://generativelanguage.googleapis.com/v1beta/models/gemi... \
  -X POST \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type:application/json" \
  -d "{
        'batch': {
          'display_name': 'my-batch-requests',
          'input_config': {
            'requests': {
              'file_name': ${BATCH_INPUT_FILE}
            }
          }
        }
      }"
```

Finally got the job submission working via the Python API (`file_batch_job = client.batches.create()`), but remember: if something is wrong with the file you're submitting, they won't tell you what, or how.
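
In case it saves someone else the curl pain, here's roughly the Python path that finally worked for me. Treat it as a sketch: the exact parameter names (`src`, `config`), the model string, and the file name are from memory and may differ across google-genai versions.

```python
# Rough sketch of what eventually worked -- parameter names and the model
# string are placeholders and may differ across google-genai versions.
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# The .jsonl file contains one request object per line, shaped like the
# JSON example above.
batch_file = client.files.upload(file="batch_requests.jsonl")

file_batch_job = client.batches.create(
    model="models/gemini-2.5-flash",   # placeholder model name
    src=batch_file.name,
    config={"display_name": "my-batch-requests"},
)

print(file_batch_job.name)  # poll this batch job later to fetch results
```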

TheTaytay · a month ago
Thank you for posting this! (When I run into errors with posted sample code, I spend WAY too long assuming it’s my fault.)

u/TheTaytay · Karma: 660 · Cake day: May 12, 2012