I work in healthcare RCM. I have no trouble believing the staff here that nothing in their system works.
I know the authors of Skyvern are around here sometimes -- how do you think about code generation with vision-based approaches to agentic browser use like OpenAI's Operator, Claude Computer Use, and Magnitude?
From my POV, the vision-based approaches are superior, but they're less amenable to codegen.
CV and direct mouse/keyboard interactions are the “base” interface, so if you solve this problem, you unlock just about every automation use case.
(I agree that if you can get good, unambiguous, actionable context from accessibility/automation trees, that’s going to be superior)
All of the solutions are already available on the internet that the various models were trained on, albeit in different proportions.
Any variance is likely due to that mix of training data.
If you care about understanding relative performance between models at solving known problems and producing output in the correct format, it's pretty useful.
- Even for well-known problems, we see a large spread in quality between models (5% to 75% correctness)
- We also see a large spread in models' ability to produce responses in the formats they were instructed to use
At the end of the day, benchmarks are pretty fuzzy, but I always welcome a formalized benchmark as a way to understand model performance beyond vibe checking.
I had multiple full-body DEXA scans during the programme.
I didn’t change my exercise routine at all. I wasn’t hitting the gym or doing weights, just my usual basic cardio.
And I gained muscle and lost ~10 kilos in weight.
It wasn’t much muscle, but it was more than before.
MRI is the gold standard; everything else is pretty loosey-goosey.
Sorry, no references, but this comes up pretty often in the science-based lifting communities on Reddit and YouTube if you want to learn more.
I guess you mean its "Computer use" API that can (if I understand correctly) send mouse clicks at specific coordinates?
I got excited thinking Claude can finally do accurate object detection, but alas no. Here's its output:
> Looking at the image directly, the SPACE key appears near the bottom left of the keyboard interface, but I cannot determine its exact pixel coordinates just by looking at the image. I can see it's positioned below the letter grid and appears wider than the regular letter keys, but I apologize - I cannot reliably extract specific pixel coordinates from just viewing the screenshot.
This is 3.5 Sonnet (their most current model).
And they explicitly call out spatial reasoning as a limitation:
> Claude’s spatial reasoning abilities are limited. It may struggle with tasks requiring precise localization or layouts, like reading an analog clock face or describing exact positions of chess pieces.
--https://docs.anthropic.com/en/docs/build-with-claude/vision#...
Since 2022 I've occasionally dipped in and tested this use case with the latest models, but I haven't seen much progress on spatial reasoning. The multi-modality has been a neat addition, though.
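If anyone wants to repeat this kind of check, here's a minimal sketch using the Anthropic Messages API with an image block; the file name and prompt are just placeholders for whatever screenshot you're testing against:

```python
# Minimal sketch: ask the model for pixel coordinates of a UI element in a
# screenshot. "keyboard.png" and the prompt are placeholders.
import base64
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("keyboard.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode()

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=256,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": image_data,
                },
            },
            {
                "type": "text",
                "text": "What are the pixel coordinates of the SPACE key? Answer as (x, y).",
            },
        ],
    }],
)

print(message.content[0].text)
```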
> When a developer tasks Claude with using a piece of computer software and gives it the necessary access, Claude looks at screenshots of what’s visible to the user, then counts how many pixels vertically or horizontally it needs to move a cursor in order to click in the correct place. Training Claude to count pixels accurately was critical.
Existing approaches tend to involve drawing marked bounding boxes around interactive elements and then asking the LLM to provide a tool call like `click('A12')` where A12 remaps to the underlying HTML element and we perform some sort of Selenium/JS action. Using heuristics to draw those bounding boxes is tricky. Even performing the correct action can be tricky as it might be that click handlers are attached to a different DOM element.
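A rough sketch of that label-to-element remapping, assuming Selenium with Chrome (the selector list and the A-prefixed label scheme are just illustrative):

```python
# Rough sketch of the label -> element remapping described above, assuming
# Selenium with Chrome; the selector list and label scheme are illustrative.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")

# Collect visible interactive elements and assign the short labels that get
# drawn as bounding boxes on the screenshot shown to the LLM.
candidates = driver.find_elements(
    By.CSS_SELECTOR, "a, button, input, select, textarea, [role='button'], [onclick]"
)
label_to_element = {
    f"A{i}": el for i, el in enumerate(candidates) if el.is_displayed()
}

def click(label: str) -> None:
    """Execute the LLM's tool call, e.g. click('A12')."""
    el = label_to_element[label]
    # The click handler may live on a different DOM element, so a JS click
    # (which bubbles) is often more reliable than a raw WebDriver click.
    driver.execute_script("arguments[0].click();", el)
```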
Avoiding this remapping from a visual element to an HTML element and instead working with high-level operations like `click(x, y)` or `type("foo")` directly on the screen will probably be more effective at automating use cases.
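For contrast, here's roughly what the coordinate-based flow looks like with Anthropic's computer-use beta; the tool type and beta flag are from memory of their docs and may have changed, and pyautogui is just my choice for executing the returned actions locally:

```python
# Sketch of the screenshot -> click(x, y) style of interaction; tool type and
# beta flag are from memory of Anthropic's docs and may have changed.
import anthropic
import pyautogui

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],
    tools=[{
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user", "content": "Open the settings menu."}],
)

# The model replies with tool_use blocks like
# {"action": "left_click", "coordinate": [x, y]}; execute them on the screen.
for block in response.content:
    if block.type == "tool_use" and block.input.get("action") == "left_click":
        x, y = block.input["coordinate"]
        pyautogui.click(x, y)
```

A real agent loop would also take fresh screenshots and return them as tool results so the model can chain actions, but the shape of the interaction is the same: no DOM, just pixels and coordinates.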
That being said, providing HTML to the LLM as context does tend to improve performance on top of just visual inference right now.
So I dunno... I'm more optimistic about Claude's approach and am very excited about it... especially if visual inference continues to improve.
Using their multi-tool, they removed the fender liners (wheel well liners) from all 4 wheels and the trunk side trim (luggage compartment side trim) from both sides, all of which are held in by just plastic push-pin scrivets (retainer clips). They broke 5 of them.
They folded down my back seats (after moving all my personal items out to the shoulder in the rain), then unbolted and removed the back seat.
I do a LOT of interstate driving, and it is not at all uncommon to see this happen.
This is not the only time I have been in situations where authority has been exceeded. My attitude is to generally be cooperative (without giving consent) as my experience has taught me that is the most painless way to go.
Sorry about all the broken plastic on the trim -- That's also very familiar...