Ask HN: Go deep into AI/LLMs or just use them as tools?

antirez · 9 months ago

My 2 centes:

1. Learn basic NNs at a simple level, build from scratch (no frameworks) a feed forward neural network with back propagation to train against MNIST or something as simple. Understand every part of it. Just use your favorite programming language.

2. Learn (without having to implement with the code, or to understand the finer parts of the implementations) how the NN architectures work and why they work. What is an encoder-decoder? Why the first part produces an embedding? How a transformer works? What are the logits in the output of an LLM, and how sampling works? Why is attention of quadratic? What is Reinforcement Learning, Resnets, how do they work? Basically: you need a solid qualitative understanding of all that.

3. Learn the higher level layer, both from the POV of the open source models, so how to interface to llama.cpp / ollama / ..., how to set the context window, what is quantization and how it will affect performances/quality of output, and also, how to use popular provider APIs like DeepSeek, OpenAI, Anthropic, ... and what model is good for what.

4. Learn prompt engineering techniques that influence the qualtily of the output when using LLMs programmatically (as a bag of algorithms). This takes patience and practice.

5. Learn how to use AI effectively for coding. This is absolutely non-trivial, and a lot of good programmers are terrible LLMs users (and end believing LLMs are not useful for coding).

6. Don't get trapped into the idea that the news of the day (RAG, MCP, ...) is what you should spend all your energy. This is just some useful technology surrounded by a lot of hype of all the people that want to get rich with AI and understand they can't compete with the LLMs themselves. So they pump the part that can be kinda "productized". Never forget that the product is the neural network itself, for the most part.

gpjt · 9 months ago

This, 100%. A full-stack engineer will likely have at least a solid understanding of the HTTP protocol, HTTPS, WebSockets, the interface layer between the frontend server and their chosen Web webdev stack, and so on. Then a more general understanding of networking protocols, TCP vs UDP, DNS, routing, etc. In general, you need to have a solid understanding of the layer below where you're working, some understanding of the layer below that, and so on, less and less detail needed for each layer down.

(That's not to say that you shouldn't bother with learning more -- more knowledge is always good -- or that the OP specifically only knows that. It's more a sensible minimum.)

My own "curriculum" for that has been Jeremy Howard's Fast AI course and Sebastian Raschka's book "build an LLM from scratch". Still working through it, but once I'm done I think I'll be solid on your point 2 above. My guess is that I'll want to learn more, but that's out of interest more than because I think its necessary.

losvedir · 9 months ago

As someone who I both respect a lot and know is really knowledgeable about the latest with AI and LLMs: can you clarify one thing for me? Are all these points based on preparing for a future where LLMs are even better? Or do you think they're good enough now that they will transform the way software is built and software engineers work, with just better tooling?

I've tried to keep up with them somewhat, and dabble with Claude Code and have personal subscriptions to Gemini and ChatGPT as well. They're impressive and almost magical, but I can't help but feel they're not quite there yet. My company is making a big AI push, as are so many companies, and it feels like no one wants to be "left behind" when they "really take off". Or is that people think what we have is already enough for the revolution?

antirez · 9 months ago

I think that LLMs already changed the way we code, mostly, but I believe that agentic coding (vibe coding) is right now able to produce only bad results, and that the better approach is to use LLMs only to augment the programmer work (however it should be noted that I'm all for vibe coding for people that can't code, or that can't find the right motivation. I just believe that the excellence in the field is human+LLM). So failing to learn LLMs right now is yet not catastrophic, but creates a disadvantage because certain things become more explorable / faster with the help of 200 yet-not-so-smart PHDs in all the human disciplines. However other than that, there is the fact that this is the biggest technology emerging to date, so I can't find a good reason for not learning it.

apwell23 · 9 months ago

> Learn how to use AI effectively for coding. This is absolutely non-trivial, and a lot of good programmers are terrible LLMs users (and end believing LLMs are not useful for coding).

I've been asking this on every AI coding thread. Are there good youtube videos of ppl using AI on complex codebases. I see tons of build tic-tac-to in 5 minutes type videos but not on bigger established codebases.

antirez · 9 months ago

You may want to check my channel perhaps. There are videos of coding with LLMs doing real world things. Just search for "Salvatore Sanfilippo" on YouTube. The videos about coding+LLM are mostly in English.

becquerel · 9 months ago

IIRC the guy who makes Aider (Paul Gauthier) has some videos along these lines, of him working on Aider while using Aider (how meta).

manmal · 9 months ago

My problem with 5. is that there are many unknowns, especially when it comes to agents. They have wildly different system prompts that are optimized on a daily basis. I’ve noticed that Gemini 2.5 Pro seems way dumber when used in the Copilot agent, vs me just running all the required context through OpenRouter in Continue.dev. The former doesn’t produce usable iOS tests, while the latter was almost perfect. On the surface, it looks like those should be doing the same thing; but internally, it seems that they are not. And I guess that means I should just use Continue, but they broke something and my workflow doesn’t work anymore.

And people keep saying you need to make a plan first, and then let the agent implement it. Well I did, and had a few markdown files that described the task well. But Copilot‘s Agent didn’t manage to write this Swift code in a way that actually works - everything was subtly off and wrong, and untangling would have taken longer than rewriting it.

Is Copilot just bad, and I need to use Claude Code and/or Cursor?

diggan · 9 months ago

> Is Copilot just bad, and I need to use Claude Code and/or Cursor?

I haven't used Claude Code much, so cannot really speak of it. But Copilot and Cursor tends to make me waste more time than I get out of it. Aider running locally with a mix-and-match of models depending on the problem (lots of DeepSeek Reasoner/Chat since it's so cheap), and Codex, are both miles ahead of at least Copilot and Cursor.

Also, most of these things seems to run with temperature>0.0, so doing multiple runs, even better with multiple different models, tend to give you better results. My own homegrow agent that runs Aider multiple times with a combination of models tend to give me a list of things to chose between, then I either straight up merge the best one, or iterate on the best one sometimes inspired by the others.

antirez · 9 months ago

I never ever use agents for coding. Just the web interface of Gemini, Claude, ..., you are perfectly right that agentic coding just creates a layer of indetermination and chaos.

prohobo · 9 months ago

Agreed with most of this except the last point. You are never going to make a foundational model, although you may contribute to one. Those foundational models are the product, yes, but if I could use an analogy: foundational models are like the state of the art 3D renderers in games. You still need to build the game. Some 3D renderers are used/licensed for many games.

Even the basic chat UI is a structure built around a foundational model; the model itself has no capability to maintain a chat thread. The model takes context and outputs a response, every time.

For more complex processes, you need to carefully curate what context to give the model and when. There are many applications where you can say "oh, chatgpt can analyze your business data and tell you how to optimize different processes", but good luck actually doing that. That requires complex prompts and sequences of LLM calls (or other ML models), mixed with well-defined tools that enable the AI to return a useful result.

This forms the basis of AI engineering - which is different from developing AI models - and this is what most software engineers will be doing in the next 5-10 years. This isn't some kind of hype that will die down as soon as the money gets spent, a la crypto. People will create agents that automate many processes, even within software development itself. This kind of utility is a no-brainer for anyone running a business, and hits deeply in consumer markets as well. Much of what OpenAI is currently working on is building agents around their own models to break into consumer markets.

I recommend anyone interested in this to read this book: https://www.amazon.com/AI-Engineering-Building-Applications-...

antirez · 9 months ago

I agree that instrumenting the model is useful in many contexts, but I don't believe it is something so unique to value Cursor such valuation, or all the attention RAG, memory, MCP get. If people say LLMs are going to be commodities (we will see) imagine the layer about RAG, tool usage, memory...

The progresses we are seeing in agents are 99% due to new LLMs being semantically more powerful.

mafro · 9 months ago

Thanks for this breakdown, I guess I'm largely in the window of points 3-6.

Any suggestion on where to start with point 1? (Also a SWE).

mikedelfino · 9 months ago

Thank you for sharing. Do you recommend any courses or books for following that path?

namnnumbr · 9 months ago

For SWEs interested in "AI Engineering" (either getting involved in how models work, or building applications on them), there's a critical paradigm shift in that using "AI" requires more of an experimental mindset than software engineering typically does.

- I strongly recommend Chip Huyen's books ("Designing Machine Learning Systems" and "AI Engineering") and blog (https://huyenchip.com/blog/).

- Andreessen Horowitz' "AI Cannon" is a good reference listicle (https://a16z.com/ai-canon/)

- "12 factor agents" (https://github.com/humanlayer/12-factor-agents)

loveparade · 9 months ago

I come from a more traditional (PhD) ML/DL background. I wouldn't recommend getting into (1) because the field is incredibly saturated. We have hundreds of new, mostly low quality, papers each day. If you want to get into AI/ML on a more fundamental level now is probably the worst time in terms of competition. There are probably 100x more people in this field than there are jobs, and most of them have a stronger background than you if you are just starting out.

sMarsIntruder · 9 months ago

Looks like OP’s curiosity isn’t just about deep diving LLMs —he’s probably itching to dig into adjacent topics like RAG, AI pipelines, and all the other adjacent LLM rabbit holes.

So in that case I don’t see why not?

drdunce · 9 months ago

I just wanted to second the previous comment, and this is even for adjacent fields. Also a PhD AI/ML grad, and so many of us are out of work at the moment that we'll happily settle for prompt engineering roles, let alone RAG etc., just to maintain appearances on CVs/eligibilty for possible future roles.

risyachka · 9 months ago

If there indeed were 100x people more than jobs the salaries would tank. And this is not the case at all with AI/ML salaries being much higher than regular devs

apwell23 · 9 months ago

> 100x people more than jobs

there aren't 100x 'top shelf' ml engineers.

There aren't a lot of jobs self taught ml programmers like there are for self taught python programmers.

karmasimida · 9 months ago

I don't think so, unless you work for Top AI companies or teams in big tech.

drdunce · 9 months ago

Really?

NitpickLawyer · 9 months ago

> Is this another bubble that might burst

I see this a lot, but I think it's irrelevant. Even if this is a bubble, and even if (when?) it bursts, the underlying tech is not going anywhere. Just like the last dotcom bubble gave us FAANG+, so will this give us the next letters. Sure, agentsdotcom or flowsdotcom or ragdotcom might fail (likely IMO), but the stack is here to stay, and it's only gonna get better, cheaper, more integrated.

What is becoming increasingly clear, IMO, is that you have to spend some time with this. Prompting an LLM is like the old google-fu. You need to gain experience with it, to make the most out of it. Same with coding stacks. There are plenty of ways to use what's available now, as "tools". Play around, see what they can do for you now, see where it might lead. You don't need to buy into the hype, and some skepticism is warranted, but you shouldn't ignore the entire field either.

Nullabillity · 9 months ago

This is just survivorship bias speaking. Some bubbles have a useful core somewhere, but that doesn't mean that every (or even most) bubble does.

fendy3002 · 9 months ago

agree on this point, like web3 and blockchain is not essential as of today (CMIIW).

However not in the case of AI (agentic AI / LLM), because simply they already have a use case, and a valid one. Contextual query and document searching / knowledge digging will be there to stay, either in form of current agentic model or different one.

wmeredith · 9 months ago

This is the most reasonable stance, and I see a lot of smart people take it. The bubble will burst and some winners will remain. Those companies will make bank because the tools are useful to those that know how to use them effectively will excel.

jillesvangurp · 9 months ago

Depends on what you want to do. But my 2 cents are that like all new technology, LLMs will become a commodity. Which means that everybody uses them but few people are able to develop them from scratch. It's not different from other things like databases, GPU drivers, 3D engines for games, etc. That all involves a lot of hardcore computer science and math. But lots of people use these things without being hindered by such skills.

It probably helps a little to understand some of the internals and math. Just to get a feel for what the limitations are.

But your job as a software engineer is probably to stick things together and bang on them until they work. I sometimes describe what I do as being a glorified plumber. It requires skills but surprisingly few skills related to math and algorithms. That stuff comes in library form mostly.

So, get good at using LLMs and integrating what they do into agentic systems. Figure out APIs, limitations, and learn about different use cases. Because we'll all be doing a lot of work related to that in the next few years.

fxtentacle · 9 months ago

LLMs already are a commodity. Google has already kicked off the competitive price wars. Plus I’ve already seen some local companies just buy a beefy GPU server and deploy an open LLM model. While OpenAI is still trying to push quality, their competitors have already positioned themselves to offer the lowest possible prices. And since Nvidia has no easy path for scaling up compute anymore, I also wouldn’t bet on much larger LLMs anytime soon.

That means, if you learn more about the internals of LLMs, your market angle is going to be artisanal customised models. Fashion is commoditised, but people still pay for a custom tailored suit. In the same way companies will continue to pay for finetunes optimised for their business usecase.

If you decide to focus more on the application of LLMs, you should really invest into high-level architectural skills. Good “code completion” models can already do what an outsourced 10 bucks per hour developer used to do. Your job in the future is going to be to decide the structure of which fuse and against the towel and or which type of state is being stored and managed. But the actual coding of the UI forms and the glue code to synchronise from an SQL query to the client state, that part is probably going to be fully outsourced to LLMs.

matt_s · 9 months ago

I think this is the key point - LLMs will go through a commoditization phase and I think you left out a key example from a technology and business context: search engines. There was a huge trend where everyone needed search and was building search, etc. and a couple decades later there are lots of companies that evaporated and a few left standing.

There was also a dot com bubble, mostly bursting not because of search but because there were a lot of what today would be "AI startup" but is just a web app calling AI Api's. So there's likely to be some bubble burst but it should be smaller maybe hitting more of these small tools that eventually become features.

roncesvalles · 9 months ago

>It's not different from other things like databases, GPU drivers, 3D engines for games, etc.

Not quite the same. E.g. databases are a part of the system itself. It's actually pretty helpful for a SWE to understand them reasonably deeply, especially when they're so leaky as an abstraction (arguably, even the more nuanced characteristics of your database of choice will influence the design of your whole application). AI/LLMs are more like dev tooling. You don't really need to know how a text editor, compiler or IDE works.

nnadams · 9 months ago

We have a service at work which categorizes internal documents and logs, then triggers some automation depending on the category. It processes maybe 100 per day. Previously we only used some combination of metadata, regex, and NLP to categorize. Now a call to a LLM is part of that service. We save a lot of manual time where we used to have to resolve unknown documents. The LLM can help fill out missing data, too. It's all stored as annotations so it's clear who/what edited the data.

Granted this is a pretty simple task and a low stakes scenario, but I don't think we should limit ourselves to assuming AI will always only be dev tooling.

Abimelex · 9 months ago

That said, I think there is this thing in between of developing LLMs and using LLMs via APIs and the lines are of cause blurry: Training LLMs (or other neural networks). So best I think is to start digging on the surface and going deeper as long as you feel comfortable. Maybe at a certain point you will have the wish for more power full hardware. Thats the point where you need to decide how much to get invested or to rent a cluster.

amelius · 9 months ago

But the question is what mindset will allow you to put yourself ahead of the rest. Because I suppose the OP doesn't want to end up as just another mediocre programmer.

jbs789 · 9 months ago

Do what interests you.

zwnow · 9 months ago

Every programmer really is just mediocre. There is no perfect software yet. Hence people who built it are mediocre.

nssnsjsjsjs · 9 months ago

There are a lot of paths to become T shaped.

joshdavham · 9 months ago

I’d recommend you simply follow your curiosity and not take this choice too seriously. If you’re simply doing this for career purposes, then the honest answer is that absolutely no one knows where these fields will go in the next couple years so I wouldn’t take anyone’s advice too seriously.

But as for my 2 cents, knowing machine learning has been valuable to me, but not anywhere near as valuable as knowing software dev. Machine learning problems are much more rare and often don’t have a high return on investment.

janalsncm · 9 months ago

As an MLE I get a decent amount of LinkedIn messages. I think I got on someone’s list or something. I would bucket the companies into two groups:

1) Established companies (meta/google/uber) with lots of data and who want MLEs to make 0.1% improvements because each of those is worth millions.

2) Startups mostly proxying OpenAI calls.

The first group is definitely not hype. Their core business relies on ML and they don’t need hype for that to be true.

For the second group, it depends on the business model. The fact that you can make an API call doesn’t mean anything. What matters is solving a customer problem.

I also (selfishly) believe a lot of the second group will hire folks to train faster and more personalized models once their business models are proven.

ednite · 9 months ago

I think about this a lot. If you're early in your career, it must feel like you're staring at a technological fork in the road, with AI standing there ominously, waving both paths like it's the final boss in a RPG game.

Between your two options, I’d lean toward continuing to build what you’re good at and using AI as a powerful tool, unless you genuinely feel pulled toward the internals and research side.

I’ve been lucky to build a fun career in IT, where the biggest threats used to be Y2K, the dot-com bubble, and predictions that mobile phones would kill off PCs. (Spoiler: PCs are still here, and so am I.)

The real question is: what are you passionate enough about to dive into with energy and persistence? That’s what will make the learning worth it. Everything else is noise in my opinion.

If I had to start over today, I'd definitely be in the same uncertain position, but I know I'd still just pick a direction and adapt to the challenges that come with it. That’s the nature of the field.

Definitely learn the fundamentals of how these AI tools work (like understanding how AI tools process context or what transformers actually do). But don’t feel like you need to dive head-first into gradient descent to be part of the future. Focus on building real-world solutions, where AI is a tool, not the objective. And if a cheese grater gets the job done, don’t get bogged down reverse-engineering its rotational torque curves. Just grate the cheese and keep cooking.

That’s my 2 cents, shredded, not sliced.

mdp2021 · 9 months ago

Building AIs has always been there - it's a (fuzzy, continuous to its complement) way to engineer things. Now we have a boom over the development of some technologies (some next-layer NN implementations).

If you are considering whether the future will boost the demand to build AIs (i.e. for clients), we could say: probably so, given regained awareness. It may not be about LLMs - and it should not, at this stage (it can hit reputation - they can hardly be made reliable).

Follow the Classical Artificial Intelligence course, MIT 6.034, from Prof. Patrick Winston - as a first step.