nsypteras (u/nsypteras)

nsypteras commented on Ask HN: What Are You Working On? (December 2025) · Posted by u/david927

nsypteras · a day ago

Analyzing frontier LLM performance on my favorite daily puzzle game (https://www.nicksypteras.com/blog/cbs-benchmark.html). Next step is to assess how well the LLMs can create their own new, logically satisfiable puzzles in the same style. Then I'll have them battle it out, with one model creating a puzzle and the other attempting to solve it!

nsypteras commented on Open Source and Local Code Mode MCP in Deno Sandboxes portofcontext.com... · Posted by u/pmkelly4444

nsypteras · 25 days ago

Congrats on launching! One immediate thought is that people will always be wary of running LLM-generated code on their machines even if it's sandboxed. Is one of the future business cases for this to host a remote execution environment that pctx can call out to rather than running the code locally?

nsypteras commented on Can GPT-5 Beat My Favorite Daily Puzzle Game? nicksypteras.com/blog/cbs... · Posted by u/nsypteras

pyankoff · a month ago

Very cool! GPT-5's massive outperformance suggests there is indeed something different in its training data. Given their previous work on games, it wouldn't be surprising if they generated some synthetic game data.

nsypteras · a month ago

Ya, interesting thought. It would be fascinating if generating games with solutions were part of the training data pipeline. There's been previous work on testing LLMs on logic puzzles [1][2][3], so they could be building off those ideas to improve performance.

[1] https://huggingface.co/papers/2504.00043
[2] https://huggingface.co/blog/yuchenlin/zebra-logic
[3] https://arxiv.org/pdf/2403.12094

Posted by u/nsypteras a month ago

Can GPT-5 Beat My Favorite Daily Puzzle Game? nicksypteras.com/blog/cbs...

nsypteras commented on Show HN: I scraped 3B Goodreads reviews to train a better recommendation model book.sv... · Posted by u/costco

nsypteras · a month ago

I'm impressed it recommended so many books I've already read and liked! I have a big reading backlog, but once it's whittled down I will likely come back to this. One feature request would be to also show a "why this is recommended" note for each recommendation so I can further narrow down the list to what I'm looking for.

nsypteras commented on US AI Action Plan ai.gov/action-plan... · Posted by u/joelburget

nsypteras · 5 months ago

Listing "Counter Chinese Influence in International Governance Bodies" and grouping China in with US "adversaries" and "rivals" is quite undiplomatic language to include under the "Lead in International AI Diplomacy and Security" section. Diplomacy with China should be an important part of this initiative but will inevitably be bungled.

nsypteras commented on The United States withdraws from UNESCO state.gov/releases/office... · Posted by u/layer8

nsypteras · 5 months ago

1984: U.S. withdraws.
2003: U.S. rejoins.
2011: U.S. stops paying dues after Palestine joins.
2017: U.S. announces withdrawal (effective end of 2018).
2023: U.S. rejoins, pledges to repay dues.
2025: U.S. announces withdrawal.

Seems to be a revolving door.

nsypteras commented on Local LLMs versus offline Wikipedia evanhahn.com/local-llms-v... · Posted by u/EvanHahn

beaugunderson · 5 months ago

I've had a full Kiwix Wikipedia export on my phone for the last ~5 years... I have used it many times when I didn't have service and needed to answer a question or needed something to read (I travel a lot).

nsypteras · 5 months ago

Same here! Kiwix comes in clutch on flights. I've used it so many times to get background knowledge on topics mid-read. Plus free and open source. Such a great service.

nsypteras commented on The current hype around autonomous agents, and what actually works in production utkarshkanwat.com/writing... · Posted by u/Dachande663

johndhi · 5 months ago

From what I understand, customer support chatbots have had some pretty good outcomes with AI agents. Or does that not count?

nsypteras · 5 months ago

I think that would be one of the success cases described in the article, because human-in-the-loop (HITL) handling is an integral part of good customer support chatbots. Support chats can be escalated to a human whenever the agent is unable to provide a satisfactory answer to the user.

nsypteras commented on TSA to end shoes-off policy for airport security screening abcnews.go.com/US/tsa-end... · Posted by u/avonmach

nsypteras · 5 months ago

> The transportation agency has spent years looking for an innovative way to allow passengers to move faster through the security checkpoints.

I think the writer had some fun with this one.

u/nsypteras

Karma: 311 · Cake day: September 23, 2016

About

nicksypteras.com
