Some questions:
[Tech]
1. How deep does the modification go? If I request a tweek to the YouTube homepage, do I need to re-specify or reload the tweek for it to persist across the entire site (deeply nested pages, iframes, etc.)?
2. What is your test and eval setup? How confident are you that the model is performing the requested change without being overly aggressive and eliminating important content?
3. What is your upkeep strategy? How will you ensure that your system continues to work as intended (WAI) after site owners update their content in potentially adversarial ways? In my experience, LLMs do a fairly poor job of understanding websites when the original author is intentionally trying to mess with the model, or has overly complex CSS and JS.
4. Can I prompt changes that I want to see globally applied across all sites (or a category of sites)? For example, I may want a persistent toolbar for quick actions across all pages -- essentially becoming a generic extension builder.
[Privacy]
5. Where and how are results being cached? For example, if I apply tweeks to a banking website, what content is being scraped and sent to an LLM? When I reload a site, is content being pulled purely from a local cache on my machine?
[Business]
6. Is this (or will it be) open source? IMO a large component of empowering the user against enshittification is open source. As compute commoditizes it will likely be open source that is the best hope for protection against the overlords.
7. What is your revenue model? If your product essentially wrests control from site owners and reduces their optionality for revenue, your arbitrage is likely to be equal to or less than the sum of site owners' losses (a potentially massive amount, to be sure). It's unclear to me how you'd capture this value, though, if open source.
8. I'm interested in the cost and latency. If this essentially requires an LLM call for every website I visit, that will start to add up. I'm also curious whether my cost will scale with the weight of the sites I visit (i.e., do my costs grow with the size of a site's content)?
Very cool.
Cheers
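P.S. The "generic extension builder" idea in question 4 could be sketched as a userscript-style injector. Everything here is hypothetical (the element id, the action names, and the buildToolbarHtml helper are all made up for illustration):

```javascript
// Hypothetical sketch for question 4: one quick-actions toolbar injected
// on every site, regardless of the page being visited.
function buildToolbarHtml(actions) {
  const buttons = actions
    .map((name) => `<button data-action="${name}">${name}</button>`)
    .join("");
  return `<div id="global-quick-actions">${buttons}</div>`;
}

// In a real userscript or extension content script this would run on
// every page load, e.g.:
//   document.body.insertAdjacentHTML("beforeend",
//     buildToolbarHtml(["summarize", "translate", "save"]));
console.log(buildToolbarHtml(["summarize", "translate"]));
```

The interesting product question is whether users can express something like this once in natural language and have it applied globally.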
Sounds more like they were losing market share to other retailers.
Can we also get the ability to filter by seller entity country of origin?
Amazon also needs to offer far better tools for buyers to effectively find and stick with brands.
Buying a house is often an emotionally motivated decision with many important risk factors... Were the inspections comprehensive and thorough, or did they overlook an issue with the foundation, wiring, plumbing, etc.? Did the buyer understand the required disclosures, and specifically understand what the seller is not obligated to disclose in their jurisdiction? (e.g., in many areas the seller is not obligated to disclose if a child sex offender lives next door). Is the neighborhood up-and-coming or struggling?
Getting these wrong can easily negate the potential upsides of ownership.
Renting can also be great, but as the article points out, if it mostly just results in more disposable cash, then you may be better off owning (forced savings). Rented homes often cannot be sublet, and they cannot be used as collateral or passed on to family members with a step-up in basis. Rents also fluctuate with the market, so it's not uncommon in big cities for renters to be paying nearly as much as the homeowners next door.
Anecdotally, I know many folks who have rented all the way through, invested wisely, and done well. I also know folks who have moved around the US and always purchased, did their homework, capitalized on tax incentives, and now have a stable of rental properties that helped them reach FIRE.
We should be educating children at a young age about the benefits and risks of social media. We haven't adapted the way we educate society in light of massive tech changes.
This will likely be a topic that future humans look back on and wonder why we did this to ourselves.
Microsoft had three personas for software engineers that were eventually retired in favor of a much more complex persona framework called People in Context (the irony in relation to this article isn't lost on me).
But those original personas still stick with me and have been incredibly valuable in my career to understand and work effectively with other engineers.
Mort - the pragmatic engineer who cares most about the business outcome. If a "pile of if statements" gets the job done quickly and meets the requirements, Mort ships it. Unfortunately, Mort became a pejorative term at Microsoft: VB developers were often Morts, Access developers were often Morts.
Elvis - the rockstar engineer who cares most about doing something new and exciting: being the first to use the latest framework or technology, getting visibility and accolades for innovation. The code might be a little unstable - but move fast and break things, right? Elvis also cares a lot about the perceived brilliance of their code. Four layers of abstraction? That must take a genius to understand, and Elvis understands it because they wrote it; now everyone will know they're a genius. For many engineers at Microsoft (especially early in career) the assumption was (and largely still is) that Elvis gets promoted, because Elvis gets visibility and is always innovating.
Einstein - the engineer who cares about the algorithm. Einstein wants to write the most performant, the most elegant, the most technically correct code possible. Einstein cares more if they are writing “pythonic” code than if the output actually solves the business problem. Einstein will refactor 200 lines of code to add a single new conditional to keep the codebase consistent. Einsteins love love love functional languages.
None of these personas represent a real engineer - every engineer is a mix, and a human with complex motivations and perspectives - but I can usually pin one of these 3 as the primary within a few days of PRs and a single design review.
I spent time at Microsoft as well, and one of the things I noticed was folks who spent time in different disciplines (e.g. dev, test, pgm) seemed to be especially great at tailoring these qualities to their needs. If you're working on optimizing a compiler, you probably need a bit more Einstein and Mort than Elvis. If you're working on a game engine you may need a different combination.
The quantities of each (or whether these are the correct archetypes) is certainly debatable, but understanding that you need all of them in different proportions over time is important, IMHO.
A few things to consider:
1. This is one example. How many other attempts did the person make that failed to be useful, accurate, or coherent? The author is an OpenAI employee IIUC, which raises the question. Sora's demos were amazing until you tried it yourself and realized it took 50 attempts to get a usable clip.
2. The author noted that humans had updated their own research in April 2025 with an improved solution. For cases where we detect signs of superior model behavior, we need to start publishing the full thought process (reasoning steps, inference cycles, tools used, etc.). Otherwise it's impossible to know whether this used a specialty model, had access to the more recent paper, or simply got lucky. Without detailed evidence it's becoming harder to separate legitimate findings from marketing posts (not suggesting this specific case was a pure marketing post).
3. Points 1 and 2 would help with reproducibility, which is important for scientific rigor. If we give Claude the same tools and inputs, will it perform just as well? This would help the community understand whether GPT-5 itself is novel, or whether the novelty is in how the user prompted it.
At the same time, I've realized that "let me just try to squeeze out the last of my career" is a really unhealthy mindset for me to hold. It locks me into feeling like my best days are behind me.
So I'm dabbling in AI-assisted coding and trying to stay open-minded about learning new things. I don't want to feel like a dinosaur.
There are many perspectives on coding agents because there are many different types of engineers, with different levels of experience.
In my interactions I've found that junior engineers overestimate or overuse the capabilities of these agents, while more senior engineers are better calibrated.
The biggest challenge I see is what to do in 5 years, once a generation of fresh engineers never learned how compilers, operating systems, hardware, memory, etc. actually work. Innovation almost always requires deep understanding of the fundamentals, and AI may erode our interest in learning these critical bits of knowledge.
What I see as a hiring manager is senior (perhaps older) engineers commanding higher comp, while junior engineers become increasingly less in demand.
Agents are here to stay, but I'd estimate your best engineering days are still ahead.
Not to mention Christmas trees, moving, helping friends out, etc.