The irony of this is that Microsoft was trying to push CoPilot everywhere, yet eventually Apple, Google and JetBrains shipped their own AI integrations, taking CoPilot out of the loop.
Slowly the AI craze at Microsoft is taking the same shape as before: going all in at the beginning and then losing to the competition. The same thing happened with the Web (IE) and with mobile (Windows CE/Pocket PC/WP 7/WP 8/UWP); the BUILD sessions used to be all about UWP with the same vigour that they are all about AI nowadays, and then, poof, the competition took over even though it started later, because Microsoft messed up delivery amid everyone trying to meet their KPIs and OKRs.
I also love the C++ security improvements on this release.
Microsoft owns 49% of OpenAI, so why should they worry? JetBrains just proudly announced that they now use GPT-5 by default.
> going all in at the beginning and then losing to the competition
Sure, but there are counterexamples too. Microsoft was late to the cloud computing party; today Azure is their main money-printing machine. At some point Visual Studio seemed to be a legacy app only used for Windows-specific app development. Then they released VSCode and boom! It became the most popular editor by a huge margin[0].
Visual Studio is a bad example. It's used for Windows, Web, and Mobile. The big difference between the two is the cost. Visual Studio Pro is $100/month, Enterprise is $300/month, while VSCode is free. It was an incredibly smart marketing play by Microsoft to do that.
> At some point Visual Studio seemed to be a legacy app only used for Windows-specific app development. Then they released VSCode and boom!
I'm not sure what the point is. Visual Studio is still Windows-only; VS Code is not related to it in any shape or form, the name is deliberately misleading.
> The irony of this is that Microsoft was trying to push CoPilot everywhere, yet eventually Apple, Google and JetBrains shipped their own AI integrations, taking CoPilot out of the loop.
What is the irony? Microsoft integrated Copilot into VSCode, Bing, etc. Apple is integrating Claude into Xcode, and JetBrains has their own AI.
Microsoft moved first with putting AI into their products then other companies put other AI into their products. Nothing about this seems ironic or in any way surprising.
The irony is that Microsoft has several cases where it gets there first, only to be left behind when competition catches up.
Bing is irrelevant; VSCode may top the charts in some places, but it's Cursor and Claude that people are reaching for. VS is really only used by people like myself who still care about Windows development or console SDKs; even for .NET, people are switching to Rider.
CoPilot isn't anything Microsoft is trying to sell outside of their own products. And with GitHub Copilot there is no "copilot" model to choose, you can choose between Anthropic, OpenAI and Google models.
Sure, UWP never caught on, but you know why? Win32, which by the way is also Microsoft's, was way too popular and more flexible. Devs weren't going to rewrite their apps for UWP in order to support phones.
People were writing for UWP. There were hundreds of UWP apps that got cancelled and abandoned when Microsoft ditched Windows Phone once Nadella got in. He killed Windows Phone, he killed native Edge (Chakra JS) and a lot of other stuff to focus fully on cloud and then AI.
Before that, an ex-Microsoft guy (Stephen Elop) was responsible for killing Nokia's OS/MeeGo too, in favor of Windows Phone - which then got abandoned. What a train wreck of errors leading to today's mobile phone duopoly.
Just because you can’t or won’t win the market with your opportunistic investment, doesn’t mean you should let your competitors completely annihilate you by taking that investment for themselves.
Google, Apple, FB or AWS would have been suitors for that licensing deal if MS didn’t bite.
About GitHub Copilot specifically: one big negative was that when GPT-4 became available, Microsoft didn't upgrade paying Copilot users to it; they simply branded a "coming soon"/"beta" Copilot X for a while. We simply cancelled the only Copilot subscription we had at work.
I've been getting monthly emails that my free access to GitHub Copilot has been renewed for another month… for years. I've never used it; I thought all GitHub users got it for free.
Microsoft mistook a product game for a distribution one. AI quality is heterogeneous and advancing quickly enough that people will make an effort to use the one they like best. And while CoPilot is excellently distributed, it's a crap product, in large part due to the limits Microsoft put on GPT.
I use IntelliJ with the Copilot plugin, using Claude. My employer has a big subscription for everything from Microsoft, and that includes Copilot, so that's free for me. But somehow Copilot also gives me access to Claude. No idea how that works.
> But somehow Copilot also gives me access to Claude.
So the first AI on (in?) AI hack battle for sole survivorship has begun...
We know these models have security issues, including surreptitious prompting. So do they.
Things will get really ugly when we hit the consolidation phase, and unlucky models realize that other models' unchallenged successes are putting them in imminent danger of being acquihired. Acquimerged? Acquiborged?
Umm, I don't know what you are talking about. I use a 40 USD GitHub Copilot subscription in VSCode to code using various models, and this is the industry standard now in my region, as most employers are now giving employees the 10 USD subscription.
Also, OpenAI pioneered, but now many competitors seem to have either caught up with or surpassed them. They might still retain a significant brand-recognition advantage as long as they don't fall too far behind, though.
Which competitor has an alternative to ChatGPT Pro? I have a Claude subscription and Opus 4.1 is not on the same level. ChatGPT Pro thinks for 5-10 minutes, while Opus either doesn't think at all or thinks very briefly. And the response quality is absolutely different: ChatGPT Pro solves problems, Opus does not. Is there any competitor with a "Pro" product which spends a significant amount of compute on a single query?
LLVM is only relevant thanks to Apple in the first place; otherwise it would still be a university project, if it existed at all. Clang was born at Apple, and some of their employees are responsible for those improvements, in collaboration with Google, presented at an LLVM Developers' Meeting.
Almost no one uses copilot unless they are not allowed to use anything else or don’t know any better. MS could have been a leader in this space but MS couldn’t understand why people didn’t like copilot but loved the competition.
Once Copilot tendrils and icons began appearing in all of my org's tools, they announced we would no longer be able to expense subscriptions for alternatives. Only those who haven't used ChatGPT Pro, Claude, Gemini, etc. have anything good to say about Copilot.
Maybe because Microsoft is a shit company and anything they do is sus af. And everyone knows it. And I'm tired of pretending like it's not. I wouldn't trust Microsoft to babysit my mortal enemy's kids.
Maybe if they weren't literally the Borg, people would open their hearts and wallets to Redmond. They saw that Windows 10 was a privacy nightmare, and what did they do? They doubled down with Windows 11. Not that I care, but it plays really poorly. Every nerd on the internet spouts off about Recall, even though it's not even enabled if you install straight to the latest build.
They bought GitHub and now it's a honeypot. We live in a world where we have to assume GitHub is adversarial.
_NSAKEY???
Fuck you Microsoft.
Makes sense karma catches up to them. Maybe if their mission statement and vision were pure or at least convincing they would win hearts and minds.
Interesting to think about how Apple gets to make product decisions based on Gatekeeper OCSP analytics now that every app launch phones home. They must know exactly how popular VSCode is.
Facebook got excoriated for doing that with Onavo but I guess it's Good Actually when it's done in the name of protecting my computer from myself lol
Compared to stock Claude Code, this version of Claude knows a lot more about SwiftUI and related technologies. The following is output from Claude in Xcode on an empty project; Claude Code gave a generic response when it looked at the same project:
What I Can Help You With
• SwiftUI Development: Layout, state management, animations, etc.
• iOS/macOS App Architecture: MVVM, data flow, navigation
• Apple Frameworks: Core Data, CloudKit, MapKit, etc.
• Testing: Both traditional XCTest and the new Swift Testing framework
• Performance & Best Practices: Swift concurrency, memory management
Example of What We Could Do Right Now
Looking at your current ContentView.swift, I could help you:
• Transform this basic "Hello World" into a recovery tracking interface
• Add navigation, data models, or user interface components
• Implement proper architecture patterns for your Recovery Tracker app
It's not shipping the model in Xcode. You are still sending your data off to a remote provider, hoping that the provider behaves nicely with all this data and that the government will never force the provider to reveal it.
Three days ago I saw another Claude-praising submission on HN, and I finally signed up for it, to compare it with Copilot.
I asked 2 things.
1. Create a boilerplate Zephyr project skeleton for the Pi Pico with st7789 SPI display drivers configured. It generated a garbage devicetree that didn't even compile. When I pointed that out, it apologized and generated another one that didn't compile. It also configured non-existent drivers, and for some reason it enabled monkey test support (but not test support).
2. I asked it to create 7x10 monochromatic pixmaps, as C integer arrays, for the numeric characters 0-9. I also gave an example. It generated them, but number eight looked like zero. (There was no cross in either 0 or 8, so it wasn't that; both were just a ring.)
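For reference, here's what a correct pair looks like (sketched in Python lists rather than C arrays, but the bit layout is the same; the names and the rendering helper are mine): the eight needs exactly one extra crossbar row.

```python
# 7x10 monochrome digit bitmaps, one int per row, bit 6 = leftmost pixel.
# '8' differs from '0' in exactly one row: the middle crossbar.
ZERO = [
    0b0111110,
    0b1000001, 0b1000001, 0b1000001, 0b1000001,
    0b1000001, 0b1000001, 0b1000001, 0b1000001,
    0b0111110,
]

EIGHT = [
    0b0111110,
    0b1000001, 0b1000001, 0b1000001,
    0b0111110,  # the crossbar the model left out, turning '8' into '0'
    0b1000001, 0b1000001, 0b1000001, 0b1000001,
    0b0111110,
]

def render(rows, width=7):
    """Turn a row-per-int bitmap into '#'/' ' text for eyeballing."""
    return "\n".join(
        "".join("#" if row & (1 << (width - 1 - x)) else " " for x in range(width))
        for row in rows
    )
```

If the generated arrays come back identical for 0 and 8, a follow-up prompt pointing at the missing crossbar row usually fixes it.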
What am I doing wrong? Or is this really the state of the art?
Your first prompt is testing Claude as an encyclopedia: has it somehow baked into its model weights the exactly correct skeleton for a "Zephyr project skeleton, for Pi Pico with st7789 spi display drivers configured"?
Frequent LLM users will not be surprised to see it fail that.
The way to solve this particular problem is to make a correct example available to it. Don't expect it to just know extremely specific facts like that - instead, treat it as a tool that can act on facts presented to it.
For your second example: treat interactions with LLMs as an ongoing conversation, don't expect them to give you exactly what you want first time. Here the thing to do next is a follow-up prompt where you say "number eight looked like zero, fix that".
> For your second example: treat interactions with LLMs as an ongoing conversation, don't expect them to give you exactly what you want first time. Here the thing to do next is a follow-up prompt where you say "number eight looked like zero, fix that".
Personally, I treat those sorts of mistakes as "misunderstandings" where I wasn't clear enough with my first prompt, so instead of adding another message (and increasing context further, making the responses worse with each message), I rewrite my first one to be clearer about that thing and regenerate the assistant message.
Basically, if the LLM cannot one-shot it, you weren't clear enough, and if you go beyond a total of two messages, be prepared for the quality of responses to sink fast. Even by the second assistant message, you can tell it's having a harder time keeping up with everything. Many models brag about their long contexts, but I still feel like the quality of responses gets a lot worse even once you reach 10% of the "maximum context".
It’s good at doing stuff like “host this all in Docker. Make a Postgres database with a Users table. Make a FastAPI CRUD endpoint for Users. Make a React site with a homepage, login page, and user dashboard”.
It’ll successfully produce _something_ like that, because there’s millions of examples of those technologies online. If you do anything remotely niche, you need to hold its hand far more.
The more complicated your requirements are, the closer you are to having “spicy autocomplete”. If you’re just making a crud react app, you can talk in high level natural language.
Did you try Claude Code and spend actual time going back and forth with it, reviewing its code and providing suggestions, instead of just expecting things to work on the first try with minimal requirements?
I see claude code as pair programming with a junior/mid dev that knows all fields of computer engineering. I still need to nudge it here and there, it will still make noob mistakes that I need to correct and I let it know how to properly do things when it gets them wrong. But coding sessions have been great and productive.
In the end, I use it when working with software that I barely know. Once I'm up and running, I rarely use it.
FWIW, I used Gemini to write an Objective-C app for Apple Rhapsody (!) that would enumerate drivers currently loaded by the operating system (more or less the same level of difficulty as the OP, I'd say?), using the PDF manual of NeXTSTEP's DriverKit as context.
It... sort of worked well? I had to have a few back-and-forth because it tried to use Objective-C features that did not exist back then (e.g. ARC), but all in all it was a success.
So yeah, niche things are harder, but on the other hand I didn't have to read 300 pages of stuff just to do this...
I agree, but I think there's an important distinction to be made.
In some cases, it just doesn't have the necessary information because the problem is too niche.
In other cases, it does have all the necessary information but fails to connect the dots, i.e. reasoning fails.
It is the latter issue that is affecting all LLMs to such a degree that I'm really becoming very sceptical of the current generation of LLMs for tasks that require reasoning.
They are still incredibly useful of course, but those reasoning claims are just false. There are no reasoning models.
In other words, the vibe coders of this world are just redundant noobs who don't really belong on the marketplace. They've written the same bullshit CRUD app every month for the past couple of years and now they've turned to AI to speed things up
Yeah, my experience with LÖVR [0] and LLMs (ChatGPT) has been quite horrible. It's very niche, and quite a big API change happened recently, which I guess the model wasn't trained on. So it's kind of useless for that purpose.
Trying two things and giving up. It's like opening a REPL for a new language, typing some common commands you're familiar with, getting some syntax errors, then giving up.
You need to learn how to use your tools to get the best out of them!
Start by thinking about what you'd need to tell a new Junior human dev you'd never met before about the task if you could only send a single email to spec it out. There are shortcuts, but that's a good starting place.
In this case, I'd specifically suggest:
1. Write a CLAUDE.md listing the toolchains you want to work with, giving context for your projects, and listing the specific build, test etc. commands you work with on your system (including any helpful scripts/aliases you use). Start simple; you can have claude add to it as you find new things that you need to tell it or that it spends time working out (so that you don't need to do that every time).
2. In your initial command, include a pointer to an example project using similar tech in a directory that claude can read
3. Ask it to come up with a plan and ask for your approval before starting
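As a concrete starting point for step 1, a minimal CLAUDE.md for the OP's setup might look like this (the project layout, board name, and commands are assumptions to adapt to your own workspace):

```markdown
# Project context

Zephyr RTOS firmware for the Raspberry Pi Pico with an st7789 SPI display.

## Build & test commands
- Build: `west build -b rpi_pico app`
- Flash: `west flash`
- Clean rebuild: `west build -t pristine && west build -b rpi_pico app`

## Conventions
- Devicetree overlays live in `app/boards/`
- Never edit generated files under `build/`
- After any devicetree change, do a clean rebuild before concluding a config is wrong
```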
I guess many find comfort in being able to task an AI with assignments that it cannot complete. Most senior developers I work with take this approach. It's not really a good way of assessing the usefulness of a tool, though.
Try this prompt: Create a detailed step by step plan to implement a boilerplate Zephyr project skeleton for Pi Pico with configured st7789 SPI display drivers
Ask Opus or Gemini 2.5 Pro to write a plan. Then ask the other to critique it and fix mistakes. Then ask Sonnet to implement
I tried this myself and IMO, this might be basic and day-to-day for you, with unambiguous correct paths to follow, but this is pretty niche nevertheless. LLMs thrive when there's a wealth of examples and I struggle to Google what you asked myself, meaning that LLM will perform even worse than my try.
I found that second line works well for image prompts too. Tell one AI to help you with a prompt, and then take it back to the others to generate images.
> It also configured non-existent drivers, and for some reason it enabled monkey test support (but not test support).
If it doesn't have the underlying base data, it tends to hallucinate. (It's getting a bit difficult to tell when it has underlying data, because some models autonomously search the web.) The models are good at transforming data, however, so give it access to whatever data it needs.
Also let it work in a feedback loop: tell it to compile and fix the compile errors. You have to monitor it because it will sometimes just silence warnings and use invalid casts.
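The compile-and-fix loop described above is simple to wire up yourself; here is a toy sketch, where `ask_model` is a placeholder for the real agent call that edits the source in response to the errors:

```python
import subprocess
from typing import Callable, List

def fix_until_green(
    compile_cmd: List[str],
    ask_model: Callable[[str], None],
    max_rounds: int = 5,
) -> bool:
    """Run the compiler; on failure, hand stderr back to the model and retry.

    `ask_model` stands in for whatever edits the files in response to the
    error output. Review its diffs: as noted above, it may "fix" errors by
    silencing warnings or adding invalid casts.
    """
    for _ in range(max_rounds):
        result = subprocess.run(compile_cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return True  # compiles cleanly
        ask_model(f"Fix these compile errors:\n{result.stderr}")
    return False  # gave up; time for a human
```

The same shape works with a test runner or linter in place of the compiler.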
> What am I doing wrong? Or is this really the state of the art?
It may sound silly, but it's simply not good at 2D
> It may sound silly, but it's simply not good at 2D
It's not silly at all, it's not very good at layouts either, it can generally make layouts but there is a high chance for subtle errors, element overlaps, text overflows, etc.
Mostly because it's a language model, i.e. it doesn't generally see what it makes. You can apparently send screenshots and it will use its embedded vision model, but I have not tried that.
There's a lot of people caricaturing the obvious fact that any model works best in distribution.
The more esoteric your stack, and the more complex the request, the more information it needs to have. The information can be given either through doing research separately (personally, I haven't had good results when asking Claude itself to do research, but I did have success using the web chat UI to create an implementation plan), or being more specific with your prompt.
As an aside, I have more than 10 years of experience, mostly with backend Python, and I'd have no idea what your prompts mean. I could probably figure it out after some google searches, tho. That's also true of Claude.
Here's an example of a prompt that I used recently when working on a new codebase. The code is not great, the math involved is non trivial (it's research-level code that's been productionized in hurry). This literally saved 4 hours of extremely boring work, digging through the code to find various hardcoded filenames, downloading them, scp'ing them, and using them to do what I want. It one-shotted it.
> The X pipeline is defined in @airflow/dags/x.py, and Y in `airflow/dags/y.py` and the relevant task is `compute_X`, and `compute_Y`, respectively. Your task is to:
> 1. Analyze the X and Y DAGs and how the `compute_X` functions are called in that particular context, including their arguments. If we're missing any files (we're probably missing at least one), generate a .sh file with aws cli or curl commands necessary for downloading any missing data (I don't have access to S3 from this machine, but I do have it on a remote host). Use, say, `~/home` as the remote target folder.
> 2. If we needed to download anything from S3, i.e. from the remote host, output rsync/scp commands I can use to copy them to my local folder, keeping the correct/expected directory structure. Note that direct inputs reside under `data/input`, while auxiliary data resides in other folders under `data`. Do not run them, simply output them. You can use for example `scp user@server.org ...`
> 3. Write another snapshot test for X under `tests/snapshot`, and one for Y. Use a pattern as similar as possible to the other tests there. Do not attempt to run the tests yet, since I'll need to download the data first.
> If you need any information from Airflow, such as logs or output values, just ask and I can provide them. Think hard.
Real vibe coding is fake, especially for something niche like what you asked it to do. Imagine a hyperactive eidetic fresh out of high school was literally sitting in the other room. What would you tell her? That’s a good rule of thumb for the level of detail and guidance
> What am I doing wrong? Or is this really the state of the art?
You're treating the tool like it was an oracle. The correct way is to treat it as a somewhat autistic junior dev: give it examples and process to follow, tell it to search the web, read the docs, how to execute tests. Especially important is either directly linking or just copy pasting any and all relevant documentation.
The tool has a lossily compressed knowledge database of the public internet and lots of books. You want to fix the relevant lossy parts in the context. The less popular something is, the more context will be needed to fill the gaps.
> The correct way is to treat it as a somewhat autistic junior dev: give it examples and process to follow, tell it to search the web, read the docs, how to execute tests. Especially important is either directly linking or just copy pasting any and all relevant documentation.
Like "Translate this pdf to html using X as a templating language". It shines at stuff like that.
As a dev, I encounter tons of one-off scenarios like this.
You can no longer answer "what is the state of the art” by pointing to a model.
Generating a state-of-the-art response to your request involves a back-and-forth with the agent about your requirements, having an agent generate and carry out a deep research plan to collect documentation, and then having the agent generate and carry out a development plan.
So while Claude is not the best model in terms of raw IQ, the reason why it's considered the best coding model is because of its ability to execute all these steps in one go which, in aggregate, generates a much better result (and is less likely to lose its mind).
OK, several tips I can give:
1. Setup a sub-agent to do RESEARCH. It is important that it only has read-only and web access tools.
2. Use planning mode, and also ask the agent to use the sub-agent to research best practices for the tech you want to use before it builds a plan.
3. Whenever it gets hung up, tell it to use the sub-agent to research the solution.
That will get you a much better initial solution. I typically use Sonnet for the sub-agents and Opus for the main agent, but Sonnet all around should be fine too for the most part.
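For step 1, Claude Code sub-agents are defined as Markdown files with YAML frontmatter under `.claude/agents/`; a read-only researcher might look like this (check the exact tool names against your Claude Code version):

```markdown
---
name: researcher
description: Read-only research agent. Use for looking up docs, APIs and best practices.
tools: Read, Grep, Glob, WebSearch, WebFetch
---

You are a research assistant. You may read files and search the web, but
never modify anything. Return a concise summary with links to the sources
you used.
```

The important part is the tool list: with no Edit/Write/Bash tools, the sub-agent can research but never touch the codebase.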
In my experience Claude is quite good at the popular stacks in the JavaScript, Python and PHP world. It struggled quite a bit when I asked it non-trivial questions in C or Rust for example. For smaller languages (e.g., Crystal) it seems to hallucinate a lot. I think since a lot of people work in JS, Python and PHP, that’s where Claude is very valuable and that’s where a lot of the praise feel justified too.
I have had no problems with using Claude on large rust projects. The compiler errors usually point it towards fixing its mistakes (just like they do for me).
I've had similar experiences when working on non-web tech.
There are parts in the codebase I'd love some help such as overly complex C++ templates and it almost never works out. Sometimes I get useful pointers (no pun intended) what the problem actually is but even that seems a bit random. I wonder if it's actually faster or slower than traditional reading & thinking myself.
The only way I manage to get any benefits from LLMs is to use them as an interactive rubber duck.
Dump your thoughts in a somewhat arranged manner; tell it about your plan, the current status, the end goal, &c. After that, tell it to write zero code for now but to ask questions and find gaps in your plan. 30% of it will be bullshit, but the rest is somewhat usable. Then you can ask for some code, but if you care about quality or consistency with your existing codebase, you'll probably have to rewrite half of it - and that's if the code works in the first place.
Garbage in garbage out is true for training but it's also true for interactions
LLMs are actually terrible at generating art unless they're specifically trained for that type of work. It's crazy how many times I've asked for some UI elements to be drawn using a graphics context and it comes out totally wrong.
One of the things you can do is provide a guidance file like CLAUDE.md, including not only style preferences but also domain knowledge, so it has greater context and knows where to look. Just ask it to make one, and then update and change it as needed.
Tbh dawg, those tasks sound intentionally obtuse. It looks like u are doing more esoteric work than the crud react slop us mortals play in on the daily which is where ai shines.
I work almost exclusively with embedded devices, with low level code (mostly C, Rust, Assembly and related frameworks) - and that's where I also ask for help from LLMs.
Sounds like you picked some obscure tasks to test it that would obviously have low representation in the data set? That's not to say it can't help with some less-represented frameworks/tools - just that you'll need to equip it with better context (MCPs/docs/instruction files).
A key skill in using an LLM agentic tool is being discerning about which tasks to delegate to it and which to take on yourself. Try to develop that skill and maybe you will have better luck.
What an odd thing to ask it. I installed claude code and ran it from my terminal. Just asked it to simply give me a node based rest API with X endpoints with these jobs, and then I told it to write the unreal engine c++ to consume those endpoints. 2500 lines of code later, it worked.
What you're doing wrong is that you're asking it for something more complicated than babby's first webapp in javascript/python.
When people say things like "I told Claude what I wanted and it did it all on the first try!", that's what they mean. Basic web stuff that that is already present in the model's training data in massive volumes, so it has no issue recreating it.
No matter how much AI fanatics try to convince you otherwise, LLMs are not actually capable of software engineering and never will be. They are largely incapable of performing novel tasks that are not already well represented in their weights, like the ones you tried.
What they are not capable of is replacing YOU, the human who is supposed to be part of the whole process (incl. architectural). I do not think that this is a limitation. In fact, I like being part of the process.
My coding ranges from "exotic" to "boiler plate" on any given day.
> Create a boilerplate Zephyr project skeleton, for Pi Pico
Yea... Asking Claude to help you with a low-documentation Buildroot system is going to go about the same way; I know first-hand how this works.
> I asked it to create 7x10 monochromatic pixelmaps
Wrong tool for the job here. I don't think IDEs and pixmaps have as large an intersection as you think they do. Claude thinks in tokens, not pixels.
Pick a common language (js, python, rust, golang) pick something easy (web page, command line script, data ingestion) and start there. See what it can do and does well, then start pushing into harder things.
The thing you are doing wrong is asking it to solve hard problems. Claude Code excels at solving fairly easy, but tedious stuff. Refactors that are brainless but take an hour. It will knock those out of the park. Fire up a git worktree and let it spin on your tedious API changes and stuff while you do the hard stuff. Unfortunately, you'll still need to use your brain for that.
So I've used Zephyr. The thing you're doing wrong is expecting LLMs to scaffold you a bunch of files from a relatively niche domain. Zephyr is also a mess of complexity with poor documentation. You should ask it to consult official docs and ask it to use existing tools (west etc) and board defs to do the scaffolding.
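Concretely, letting `west` do the scaffolding instead of the LLM might look like the following (the board name `rpi_pico` and sample path are taken from Zephyr's docs; double-check against your SDK version):

```shell
# Create a Zephyr workspace and fetch modules
west init my-workspace && cd my-workspace
west update

# Start from an upstream sample rather than LLM-generated scaffolding
cp -r zephyr/samples/basic/blinky my-app

# Build for the Pico; add the st7789 devicetree overlay on top of a
# known-good build, so any compile error points at your change only
west build -b rpi_pico my-app
```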
I just had AI write me a scraper and download 5TB of invaluable data which I had been eyeing for a long time. All in ten days. At the end of it, I still don’t know anything about python. It’s a bliss for people like me. All dependencies installed themselves. I look forward to using it even more.
One frustration was that the code changed so much in ChatGPT, so it took lots of prompts. But I had no idea what the code was anyway. I understood vibe coding. Just used ChatGPT on a whim. Liked the end result.
It seems every IDE now has AI built-in. That's a problem if you're working on highly confidential code. You never know when the AI is going to upload code snippets to the server for analysis.
Not trying to be mean, but I would expect comments on HN on these kinds of stories to be from people who have used AI in IDEs at this point. There is no AI integration that runs automatically on a codebase.
This is HN. 10 years ago that would be true, but now I expect 99% of commenters to have never used the thing they are talking about, or used it once 20 years ago for 10 minutes, or not even read the article.
They both support it via plugins. Xcode doesn’t enable it by default, you need to enable it and sign into an account. It’s not really all that different.
This is not a realistic concern. If you're working on highly confidential code (in a serious meaning of that phrase), your whole environment is already either offline or connecting only through a tightly controlled corporate proxy. There are no accidental leaks to AI from those environments.
There are ranges of security concerns and high confidentiality.
For most corporate code (that is highly confidential) you still have proper internet access, but you sure as hell can't just send your code to all AI providers just because you want to, just because it's built into your IDE.
There is a gulf and many shades between "this code should never be on an internet-connected device" and "it doesn't matter if this code is copied everywhere by absolutely anyone".
> In the OpenAI API, “GPT-5” corresponds to the “minimal” reasoning level, and “GPT-5 (Reasoning)” corresponds to the “low” reasoning level. (159135374)
It's interesting that the highest level of reasoning that GPT-5 in Xcode supports is actually the "low" reasoning level. I wonder why.
You can use an API key, and it'll give you access to all the models.
This is Claude sign-in using your account. If you've signed up for Claude Pro or Max then you can use it directly. But they should give access to Opus as well.
“Boycott” is a pretty strong term. I'm sensing a strong dislike of AI from you, which is fine, but if you dislike a feature most people like, it shouldn't be surprising that you'll find yourself mostly catered to by more niche editors.
I think it's a pretty good word; let's not forget how LLMs learned about code in the first place - by "stealing" all the snippets they could get their curl hands on.
If you're on macOS there's Code Edit as a native solution (fully open source, not VC backed, MIT licensed), but it's currently in active development: https://www.codeedit.app/.
Otherwise there's VSCodium which is what I'm using until I can make the jump to Code Edit.
Okay, then run the depositing unit first without dough; then you can run it with dough. If you do handovers between 13:30 and 14:00, please let the shift leader know. Bye.
I couldn't get it to properly syntax-highlight and autosuggest even after spending over an hour hunting through all sorts of terrible documentation for Kate, clangd, etc. It also completely hides all project files that aren't in source control, and the only way to stop it is to disable the Git plugin. What a nightmare. Maybe I'll try VSCodium next.
Of course it is, because that would be an aggressively stupid thing to do. Like boycotting syntax highlighting, spellchecking, VCS integration or a dozen other features that are the whole point of IDEs.
If you don’t want to use LLM coding assistants – or if you can’t, or it’s not a technology suitable for your work – nobody cares. It’s totally fine. You don’t need to get performatively enraged about it.
Slowly the AI craziness at Microsoft is taking the same shape as before: going all in at the beginning and then losing to the competition. They did it with the web (IE) and with mobile (Windows CE/Pocket PC/WP 7/WP 8/UWP); the BUILD sessions used to be all about UWP with the same vigour they now devote to AI, and then, poof, the competition took over even though it started later, because Microsoft messed up delivery amid everyone trying to meet their KPIs and OKRs.
I also love the C++ security improvements on this release.
> going all in at the begining and then losing to the competition
Sure, but there are counterexamples too. Microsoft was late to the cloud computing party; today Azure is their main money-printing machine. At some point Visual Studio seemed to be a legacy app used only for Windows-specific development. Then they released VSCode and boom! It became the most popular editor by a huge margin[0].
[0]: https://survey.stackoverflow.co/2025/technology#most-popular...
They use it because the corporation mandates it.
Power at OpenAI seems orthogonal to ownership, precedent or even frankly their legal documents.
What is the irony? Microsoft integrated copilot in Vscode, bing, etc. Apple is integrating claude in Xcode, Jetbrains has their own AI.
Microsoft moved first with putting AI into their products then other companies put other AI into their products. Nothing about this seems ironic or in any way surprising.
Apple and Google will never choose to integrate Microsoft's services or products willingly.
It would have been more surprising if they decided to depend on Microsoft.
Bing is irrelevant; VSCode might top the charts in some places, but it's Cursor and Claude that people are reaching for. VS is really only used by people like myself who still care about Windows development or console SDKs; even for .NET, people are switching to Rider.
Sure, UWP never caught on, but you know why? Win32, which by the way is also Microsoft's, was way too popular and more flexible. Devs weren't going to rewrite their apps for UWP just to support phones.
Before that, that ex-Microsoft guy was responsible for killing Nokia's OS/MeeGo too in favor of Windows Phone, which itself got abandoned. What a train wreck of errors leading to today's mobile phone duopoly.
And Windows 11 was the reboot of Windows 10X.
https://www.youtube.com/watch?v=ztrmrIlgbIc
Deleted Comment
Google, Apple, FB or AWS would have been suitors for that licensing deal if MS didn’t bite.
I've been getting monthly emails that my free access for GitHub Copilot has been renewed for another month… for years. I've never used it, I thought that all GitHub users got it for free.
Microsoft Copilot (formerly Bing Chat)
Microsoft 365 Copilot
Microsoft Copilot Studio
GitHub Copilot
Microsoft Security Copilot
Copilot for Azure
Copilot for Service
Sales Copilot
Copilot for Data & Analytics (Fabric)
Copilot Pro
Copilot Vision
So the first AI on (in?) AI hack battle for sole survivorship has begun...
We know these models have security issues, including surreptitious prompting. So do they.
Things will get really ugly when we hit the consolidation phase, and unlucky models realize that other models' unchallenged successes are putting them in imminent danger of being acquihired. Acquimerged? Acquiborged?
These are courtesy of LLVM/Clang (which Xcode ships with), rather than Xcode itself.
Maybe if they weren't literally the Borg, people would open their hearts and wallets to Redmond. They saw that Windows 10 was a privacy nightmare, and what did they do? They doubled down with Windows 11. Not that I care, but it plays really poorly. Every nerd on the internet spouts off about Recall even though it's not even enabled if you install straight to the latest build.
They bought GitHub and now it's a honeypot. We live in a world where we have to assume GitHub is adversarial.
_NSAKEY???
Fuck you Microsoft.
Makes sense; karma catches up to them. Maybe if their mission statement and vision were pure, or at least convincing, they would win hearts and minds.
Facebook got excoriated for doing that with Onavo but I guess it's Good Actually when it's done in the name of protecting my computer from myself lol
https://appleinsider.com/articles/22/06/06/apple-now-has-ove...
The real news is when Codex CLI / Claude Code get integrated, or Apple introduces a competitor offering to them.
Until then this is a toy and should not be used for any serious work while these far better tools exist.
Compared to stock Claude Code, this version of Claude knows a lot more about SwiftUI and related technologies. The following is output from Claude in Xcode on an empty project. Claude Code gives a generic response when it looked at the same project:
/s
https://news.ycombinator.com/item?id=45062683 (Anthropic reverses privacy stance, will train on Claude chats)
I asked 2 things.
1. Create a boilerplate Zephyr project skeleton for the Pi Pico with ST7789 SPI display drivers configured. It generated a garbage devicetree that didn't even compile. When I pointed that out, it apologized and generated another one that didn't compile either. It also configured non-existent drivers, and for some reason enabled monkey-test support (but not test support).
2. I asked it to create 7x10 monochromatic pixelmaps, as C integer arrays, for the numeric characters 0-9. I also gave an example. It generated them, but the eight looked like a zero. (There was no cross in either the 0 or the 8, so it wasn't that; both were just a ring.)
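For what it's worth, glyphs in that style are easy to eyeball with a few lines of Python. This is just a sketch under the assumption that each of the 10 rows is an integer whose low 7 bits are pixels (MSB = leftmost); adjust the width if your arrays pack pixels differently:

```python
# Render 7x10 glyph bitmaps stored as C-style integer arrays so you
# can eyeball them: each of the 10 rows is an int whose low 7 bits
# are pixels (assumed format).

def render(rows, width=7):
    """Return (and print) the glyph as '#'/'.' lines, MSB = leftmost."""
    lines = ["".join("#" if row & (1 << (width - 1 - col)) else "."
                     for col in range(width))
             for row in rows]
    print("\n".join(lines))
    return lines

# A ring-shaped zero -- exactly the shape that also got generated for
# the eight, which is how the two ended up indistinguishable.
ZERO = [0b0011100,
        0b0100010,
        0b1000001,
        0b1000001,
        0b1000001,
        0b1000001,
        0b1000001,
        0b1000001,
        0b0100010,
        0b0011100]

render(ZERO)
```

A render step like this also makes a good follow-up prompt: paste the ASCII output back to the model and ask it to add the missing crossbar.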
What am I doing wrong? Or is this really the state of the art?
Your first prompt is testing Claude as an encyclopedia: has it somehow baked into its model weights the exactly correct skeleton for a "Zephyr project skeleton, for Pi Pico with st7789 spi display drivers configured"?
Frequent LLM users will not be surprised to see it fail that.
The way to solve this particular problem is to make a correct example available to it. Don't expect it to just know extremely specific facts like that - instead, treat it as a tool that can act on facts presented to it.
For your second example: treat interactions with LLMs as an ongoing conversation, don't expect them to give you exactly what you want first time. Here the thing to do next is a follow-up prompt where you say "number eight looked like zero, fix that".
Personally, I treat those sorts of mistakes as "misunderstandings" where I wasn't clear enough with my first prompt, so instead of adding another message (and increasing the context further, making the responses worse with each message), I rewrite my first one to be clearer about that thing and regenerate the assistant message.
Basically, if the LLM cannot one-shot it, you weren't clear enough, and if you go beyond a total of two messages, be prepared for the quality of responses to sink fast. Even by the second assistant message, you can tell it's having a harder time keeping up with everything. Many models brag about their long contexts, but I still find the quality of responses gets a lot worse once you reach even 10% of the "maximum context".
It’ll successfully produce _something_ like that, because there’s millions of examples of those technologies online. If you do anything remotely niche, you need to hold its hand far more.
The more complicated your requirements are, the closer you are to having “spicy autocomplete”. If you’re just making a crud react app, you can talk in high level natural language.
I see claude code as pair programming with a junior/mid dev that knows all fields of computer engineering. I still need to nudge it here and there, it will still make noob mistakes that I need to correct and I let it know how to properly do things when it gets them wrong. But coding sessions have been great and productive.
In the end, I use it when working with software that I barely know. Once I'm up and running, I rarely use it.
It... sort of worked well? I had to have a few back-and-forth because it tried to use Objective-C features that did not exist back then (e.g. ARC), but all in all it was a success.
So yeah, niche things are harder, but on the other hand I didn't have to read 300 pages of stuff just to do this...
In some cases, it just doesn't have the necessary information because the problem is too niche.
In other cases, it does have all the necessary information but fails to connect the dots, i.e. reasoning fails.
It is the latter issue that is affecting all LLMs to such a degree that I'm really becoming very sceptical of the current generation of LLMs for tasks that require reasoning.
They are still incredibly useful of course, but those reasoning claims are just false. There are no reasoning models.
---
[0]: https://lovr.org
Trying two things and giving up. It's like opening a REPL for a new language, typing some common commands you're familiar with, getting some syntax errors, then giving up.
You need to learn how to use your tools to get the best out of them!
Start by thinking about what you'd need to tell a new Junior human dev you'd never met before about the task if you could only send a single email to spec it out. There are shortcuts, but that's a good starting place.
In this case, I'd specifically suggest:
1. Write a CLAUDE.md listing the toolchains you want to work with, giving context for your projects, and listing the specific build, test etc. commands you work with on your system (including any helpful scripts/aliases you use). Start simple; you can have claude add to it as you find new things that you need to tell it or that it spends time working out (so that you don't need to do that every time).
2. In your initial command, include a pointer to an example project using similar tech in a directory that claude can read
3. Ask it to come up with a plan and ask for your approval before starting
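A minimal CLAUDE.md along those lines might look like this. The board name, paths, and rules below are illustrative placeholders; `west build`, `west flash`, and `west twister` are the standard Zephyr commands:

```markdown
# CLAUDE.md

## Toolchain & commands
- Zephyr RTOS, built with `west`; board target: `rpi_pico`
- Build: `west build -b rpi_pico app/`
- Flash: `west flash`
- Run tests: `west twister -T tests/`

## Conventions
- Devicetree overlays live in `boards/`; never edit generated build files.
- Ask before adding new Kconfig options.
- Prefer small, compiling increments; build after every change.
```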
If you just selected a random developer, do you think they're going to have any idea what you're talking about?
The issue is LLMs will never say "sorry, I don't know how to do this." Like a stressed-out intern, they just make stuff up and hope it passes review.
Providing woefully inadequate descriptions to others (Claude and us) and still expecting useful responses?
Ask Opus or Gemini 2.5 Pro to write a plan. Then ask the other to critique it and fix mistakes. Then ask Sonnet to implement
If it doesn't have the underlying base data, it tends to hallucinate. (It's getting a bit difficult to tell when it has the underlying data, because some models autonomously search the web.) The models are good at transforming data, however, so give them access to whatever data they need.
Also let it work in a feedback loop: tell it to compile and fix the compile errors. You have to monitor it because it will sometimes just silence warnings and use invalid casts.
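A minimal version of that feedback loop looks something like the sketch below. The model call is stubbed out (a real one would send the source plus the compiler error back to the LLM), and Python's built-in `compile()` stands in for invoking an actual toolchain:

```python
# Compile-and-fix feedback loop, with the LLM call stubbed out and
# Python's own compile() standing in for a real compiler.

def check(source: str):
    """Return an error message, or None if the code compiles."""
    try:
        compile(source, "<generated>", "exec")
        return None
    except SyntaxError as e:
        return f"line {e.lineno}: {e.msg}"

def ask_llm_to_fix(source: str, error: str) -> str:
    # Stub: a real implementation would call the model with the
    # source and the error text. Here it just closes the paren.
    return source + ")"

def fix_loop(source: str, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        error = check(source)
        if error is None:
            return source                    # success: it compiles
        source = ask_llm_to_fix(source, error)
    raise RuntimeError("model never produced compiling code")

print(fix_loop("print('hello'"))  # missing paren gets fixed, then compiles
```

The monitoring caveat still applies: the stub here is honest, but a real model may "fix" an error by deleting the offending code entirely.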
> What am I doing wrong? Or is this really the state of the art?
It may sound silly, but it's simply not good at 2D.
It's not silly at all, it's not very good at layouts either, it can generally make layouts but there is a high chance for subtle errors, element overlaps, text overflows, etc.
Mostly because it's a language model, i.e. it doesn't generally see what it makes. You can apparently send screenshots and it will use its embedded vision model, but I have not tried that.
The more esoteric your stack, and the more complex the request, the more information it needs to have. The information can be given either through doing research separately (personally, I haven't had good results when asking Claude itself to do research, but I did have success using the web chat UI to create an implementation plan), or being more specific with your prompt.
As an aside, I have more than 10 years of experience, mostly with backend Python, and I'd have no idea what your prompts mean. I could probably figure it out after some google searches, tho. That's also true of Claude.
Here's an example of a prompt I used recently when working on a new codebase. The code is not great and the math involved is non-trivial (it's research-level code that's been productionized in a hurry). This literally saved 4 hours of extremely boring work: digging through the code to find various hardcoded filenames, downloading them, scp'ing them, and using them to do what I want. It one-shotted it.
> The X pipeline is defined in @airflow/dags/x.py, and Y in `airflow/dags/y.py` and the relevant task is `compute_X`, and `compute_Y`, respectively. Your task is to:
> 1. Analyze the X and Y DAGs and how the `compute_X` functions are called in that particular context, including their arguments. If we're missing any files (we're probably missing at least one), generate a .sh file with aws cli or curl commands necessary for downloading any missing data (I don't have access to S3 from this machine, but I do have it in a remote host). Use, say, `~/home` as the remote target folder.
> 2. If we needed to download anything from S3, i.e. from the remote host, output rsync/scp commands I can use to copy them to my local folder, keeping the correct/expected directory structure. Note that direct inputs reside under `data/input`, while auxiliary data resides in other folders under `data`. Do not run them, simply output them. You can use for example `scp user@server.org ...`
> 3. Write another snapshot test for X under `tests/snapshot`, and one for Y. Use a pattern as similar as possible to the other tests there. Do not attempt to run the tests yet, since I'll need to download the data first.
> If you need any information from Airflow, such as logs or output values, just ask and I can provide them. Think hard.
You're treating the tool like it was an oracle. The correct way is to treat it as a somewhat autistic junior dev: give it examples and process to follow, tell it to search the web, read the docs, how to execute tests. Especially important is either directly linking or just copy pasting any and all relevant documentation.
The tool has a lossily compressed knowledge database of the public internet and lots of books. You want to fix the relevant lossy parts in the context. The less popular something is, the more context will be needed to fill the gaps.
Like "Translate this pdf to html using X as a templating language". It shines at stuff like that.
As a dev, I encounter tons of one-off scenarios like this.
Generating a state-of-the-art response to your request involves a back-and-forth with the agent about your requirements, having a agent generate and carry out a deep research plan to collect documentation, then having the agent generate and carry out a development plan to carry it out.
So while Claude is not the best model in terms of raw IQ, the reason why it's considered the best coding model is because of its ability to execute all these steps in one go which, in aggregate, generates a much better result (and is less likely to lose its mind).
Which one is, and by what metric? I always end up back at Claude after trying other models because it is so much better at real world applications.
That will get you a lot better initial solution. I typically use Sonnet for the sub-agents and Opus for the main agent, but sonnet all around should be fine too for the most part.
There are parts in the codebase I'd love some help such as overly complex C++ templates and it almost never works out. Sometimes I get useful pointers (no pun intended) what the problem actually is but even that seems a bit random. I wonder if it's actually faster or slower than traditional reading & thinking myself.
Dump your thoughts in a somewhat arranged manner; tell it about your plan, the current status, the end goal, &c. After that, tell it to write zero code for now but to ask questions and find gaps in your plan. 30% of it will be bullshit, but the rest is somewhat usable. Then you can ask for some code, but if you care about quality or consistency with your existing code base you'll probably have to rewrite half of it, and that's if the code works in the first place.
Garbage in garbage out is true for training but it's also true for interactions
Deleted Comment
After a few iterations I then ask it to implement the design doc, with mostly better results.
I wonder if it's because there are maybe millions of MSDN articles, but I don't know if a Java analog to MSDN exists.
A key skill in using an LLM agentic tool is being discerning in which tasks to delegate to it and which to take on yourself. Try develop that skill and maybe you will have better luck.
When people say things like "I told Claude what I wanted and it did it all on the first try!", that's what they mean: basic web stuff that is already present in the model's training data in massive volumes, so it has no issue recreating it.
No matter how much AI fanatics try to convince you otherwise, LLMs are not actually capable of software engineering and never will be. They are largely incapable of performing novel tasks that are not already well represented in their weights, like the ones you tried.
My coding ranges from "exotic" to "boiler plate" on any given day.
> Create a boilerplate Zephyr project skeleton, for Pi Pico
Yea... Asking Claude to help you with a poorly documented Buildroot system is going to go about the same way; I know firsthand how this works.
> I asked it to create 7x10 monochromatic pixelmaps
Wrong tool for the job here. I don't think IDEs and pixelmaps have as large an intersection as you think they do. Claude thinks in tokens, not pixels.
Pick a common language (js, python, rust, golang) pick something easy (web page, command line script, data ingestion) and start there. See what it can do and does well, then start pushing into harder things.
One frustration was that the code changed so much in ChatGPT that it took lots of prompts. But I had no idea what the code was anyway. That's when I understood vibe coding. Just used ChatGPT on a whim. Liked the end result.
Autocomplete is also automatically triggered when you place your cursor inside the code.
Don't be naive.
Won't work by default, if I'm reading this correctly.
I spent the last 6 months trying to convince them not to block all outbound traffic by default.
For most corporate code (that is highly confidential) you still have proper internet access, but you sure as hell can't just send your code to all AI providers just because you want to, just because it's built into your IDE.
But I guess the user could still get a 3rd party plugin.
I do not think this will be an issue for big companies.
Also, there are plenty of editors and IDEs that don’t.
Let’s stop pretending like you’re being forced into this. You aren’t.
Deleted Comment
There's simply no way to properly secure network connected developer systems.
you can use Claude via bedrock and benefit from AWS trust
Gemini? Google owns your e-mail. Maybe you're even one of those weirdos who doesn't use Google for e-mail - I bet your recipient does.
so... they have your code, your secrets, etc.
Deleted Comment
https://github.com/JetBrains/intellij-community