Readit News
nsypteras commented on Ask HN: What Are You Working On? (December 2025)    · Posted by u/david927
nsypteras · a day ago
Analyzing frontier LLM performance on my favorite daily puzzle game (https://www.nicksypteras.com/blog/cbs-benchmark.html). Next step is to assess how well the LLMs can create their own new, logically satisfiable puzzles in the same style. Then I'll have them battle it out, with one model creating a puzzle and the other attempting to solve it!
nsypteras commented on Open Source and Local Code Mode MCP in Deno Sandboxes   portofcontext.com... · Posted by u/pmkelly4444
nsypteras · 25 days ago
Congrats on launching! One immediate thought is that people will always be wary of running LLM-generated code on their machines even if it's sandboxed. Is one of the future business cases for this to host a remote execution environment that pctx can call out to rather than running the code locally?
nsypteras commented on Can GPT-5 Beat My Favorite Daily Puzzle Game?   nicksypteras.com/blog/cbs... · Posted by u/nsypteras
pyankoff · a month ago
Very cool! The massive outperformance of GPT-5 looks like there is something different in their training data indeed. Considering their previous work on games, wouldn't be surprising if they generated some synthetic game data.
nsypteras · a month ago
Ya interesting thought - would be fascinating if generating games w/solutions is part of the training data pipeline. There's been previous work done on testing LLMs on logic puzzles[1][2][3] so they could possibly be building off those ideas to improve performance.

[1] https://huggingface.co/papers/2504.00043 [2] https://huggingface.co/blog/yuchenlin/zebra-logic [3] https://arxiv.org/pdf/2403.12094

nsypteras commented on Show HN: I scraped 3B Goodreads reviews to train a better recommendation model   book.sv... · Posted by u/costco
nsypteras · a month ago
I'm impressed it recommended so many books I've already read and liked! I have a big reading backlog but once it's whittled down I will likely come back to this. One feature request would be to also show a "why this is recommended" for each recommendation so I can further narrow down the list for what I'm looking for.
nsypteras commented on US AI Action Plan   ai.gov/action-plan... · Posted by u/joelburget
nsypteras · 5 months ago
"Counter Chinese Influence in International Governance Bodies" and grouping them in with US "adversaries" and "rivals" is quite undiplomatic language to throw in under "Lead in International AI Diplomacy and Security" section. Diplomacy with China should be an important part of this initiative but will inevitably be bungled.
nsypteras commented on The United States withdraws from UNESCO   state.gov/releases/office... · Posted by u/layer8
nsypteras · 5 months ago
1984: U.S. withdraws. 2003: U.S. rejoins. 2011: U.S. stops paying dues after Palestine joins. 2017: U.S. announces withdrawal (effective end of 2018). 2023: U.S. rejoins, pledges to repay dues. 2025: U.S. announces withdrawal.

Seems to be a revolving door

nsypteras commented on Local LLMs versus offline Wikipedia   evanhahn.com/local-llms-v... · Posted by u/EvanHahn
beaugunderson · 5 months ago
I've had a full Kiwix Wikipedia export on my phone for the last ~5 years... I have used it many times when I didn't have service and needed to answer a question or needed something to read (I travel a lot).
nsypteras · 5 months ago
Same here! Kiwix comes in clutch on flights. I've used it so many times to get background knowledge on topics mid-read. Plus free and open source. Such a great service.
nsypteras commented on The current hype around autonomous agents, and what actually works in production   utkarshkanwat.com/writing... · Posted by u/Dachande663
johndhi · 5 months ago
From what I understand, customer support chatbots have had some pretty good outcomes from AI agents. Or does that not count?
nsypteras · 5 months ago
I think that would be one of the success cases described in the article because HITL is an integral part of good customer support chatbots. Support chats can be escalated to a human whenever the agent is unable to provide a satisfactory answer to the user.
nsypteras commented on TSA to end shoes-off policy for airport security screening   abcnews.go.com/US/tsa-end... · Posted by u/avonmach
nsypteras · 5 months ago
> The transportation agency has spent years looking for an innovative way to allow passengers to move faster through the security checkpoints.

I think the writer had some fun with this one

u/nsypteras

Karma: 311 · Cake day: September 23, 2016
About
nicksypteras.com