Damn, I built a RAG agent during the past three and a half months for my internship. And literally everyone in my company was asking me why I wasn't using LangChain or LlamaIndex like I was a lunatic. Everyone else that built a RAG in my company used LangChain, and one even went into prod.
I kept telling them that it works well if you have a standard use case, but the second you need to do something a little original you have to go through 5 layers of abstraction just to change a minute detail. Furthermore, you won't really understand every step in the process, so if any issue arises or you need to improve the process you will start back at square one.
This is honestly such a boost of confidence.
I had a similar experience when LangChain first came out. I spent a good amount of time trying to use it - including making some contributions to add functionality I needed - but ultimately dropped it. It made my head hurt.
Most LLM applications require nothing more than string handling, API calls, loops, and maybe a vector DB if you're doing RAG. You don't need several layers of abstraction and a bucketload of dependencies to manage basic string interpolation, HTTP requests, and for/while loops, especially in Python.
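To make that concrete, here is a minimal sketch of the no-framework approach: an f-string, one API call, and an ordinary retry loop. The model name, prompt wording, and retry count are illustrative assumptions, not something from the thread.

```python
# Minimal no-framework LLM call: string interpolation, one API call, a retry loop.
# Model name, prompt wording, and retry count are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

def summarize(text: str, attempts: int = 3) -> str:
    prompt = f"Summarize the following text in two sentences:\n\n{text}"
    for attempt in range(attempts):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except Exception:
            # Plain for-loop retry; no framework-managed "chain" needed.
            if attempt == attempts - 1:
                raise
    return ""
```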
On the prompting side of things, aside from some basic tricks that are trivial to implement (CoT, in-context learning, whatever), prompting is very case-by-case and iterative, and being effective at it primarily relies on understanding how these models work, not cargo-culting the same prompts everyone else is using. LLM applications are not conceptually difficult to implement, but they are finicky and tough to corral, and something like LangChain only gets in the way IMO.
I haven't used LangChain, but my sense is that much of what it's really helping people with is stream handling and async control flow. While there are libraries that make it easier, I think doing this stuff right in Python can feel like swimming against the current given its history as a primarily synchronous, single-threaded runtime.
I built an agent-based AI coding tool in Go (https://github.com/plandex-ai/plandex) and I've been very happy with that choice. While there's much less of an ecosystem of LLM-related libraries and frameworks, Go's concurrency primitives make it straightforward to implement whatever I need, and I never have to worry about leaky or awkward abstractions.
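For comparison, here is a rough sketch of what concurrent streaming looks like in plain Python with asyncio and the async OpenAI client; the model name and questions are placeholders. It is doable without a framework, even if Go's primitives make it feel more natural.

```python
# Rough sketch: streaming completions with concurrent fan-out in plain Python.
# Model name and questions are placeholders, not from the thread.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def stream_answer(question: str) -> str:
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
        stream=True,
    )
    parts = []
    async for chunk in stream:
        # Each chunk carries a small delta of the reply; collect them as they arrive.
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)

async def main() -> None:
    # Fan out several requests concurrently, the part Go's goroutines make trivial.
    answers = await asyncio.gather(
        stream_answer("What is retrieval-augmented generation?"),
        stream_answer("What is chain-of-thought prompting?"),
    )
    for answer in answers:
        print(answer)

asyncio.run(main())
```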
I completely agree, and built magentic [0] to cover the common needs (structured output, common abstraction across LLM providers, LLM-assisted retries) while leaving all the prompts up to the package user.
[0] https://github.com/jackmpcollins/magentic
Groupthink is really common among programmers, especially when they have no idea what they are talking about.
It shows you don't need a lot of experience to see the emperor has no clothes, but you do need to pay attention.
I admire what the Langchain team has been building toward even if people don’t agree with some of their design choices.
The OpenAI API and others are quite raw, and it’s hard as a developer to resist building abstractions on top of them.
Some people are comparing libraries like Langchain to ORMs in this conversation, but I think maybe the better comparison would be web frameworks. Like, yeah the web/HTML/JSON are “just text” too, but you probably don’t want to reinvent a bunch of string and header parsing libraries every time you spin up a new project.
Coming from the JS ecosystem, I imagine a lot of people would like a lighter weight library like Express that handles the boring parts but doesn’t get in the way.
Matches my experience as well. I tried langchain about a year ago for an app and had a pretty standard use case, but even going a little bit off the rails I had to dig through layers of abstractions where it would have been much easier just using the original openai lib. So it might be beneficial if your use case is about offering many different LLM providers in your app, but if you know you won't be swapping out the LLM provider soon it's usually better to not use such frameworks.
I ran into similar limitations for relatively simple tasks. For example I wanted access to the token usage metadata in the response. This seems like such an obvious use case. This wasn’t possible at the time, or it wasn’t well documented anyway.
I've had the same experience. I thought I was the weird one, but, my god, LangChain isn't usable beyond demos. It feels like even proper logging is pushing it beyond its capabilities.
On top of that, if you use the TypeScript version, the abstractions are often... weird. They feel like verbatim ports of the Python implementations. Many things are abstracted in ways that are not very type-safe and you'd design differently with type safety in mind. Some classes feel like they only exist to provide some structure in a language without type safety (Python) and wouldn't really need to exist with structural type checking.
Could someone point me towards a good resource for learning how to build a RAG app without LangChain or LlamaIndex? It's hard to find good information.
At its core, a RAG app just needs to do four things (rough sketch below):
- Read in the user's input
- Use that to retrieve data that could be useful to an LLM (typically by doing a pretty basic vector search)
- Stuff that data into the prompt (literally insert it at the beginning of the prompt)
- Add a few lines to the prompt that state "hey, there's some data above. Use it if you can."
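A hedged sketch of those four steps with no framework; the embedding model, prompt wording, and tiny in-memory corpus are stand-ins for illustration only.

```python
# Sketch of the four steps above with no framework. The embedding model, prompt
# wording, and the tiny in-memory corpus are stand-ins for illustration only.
import numpy as np
from openai import OpenAI

client = OpenAI()
documents = [
    "Refunds are processed within 5 business days.",
    "Support is available 9-5 on weekdays.",
]

def embed(texts: list[str]) -> np.ndarray:
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

doc_vectors = embed(documents)  # the "vector DB" is just a numpy array in RAM

def answer(question: str, k: int = 1) -> str:
    # 1. Read the user's input, 2. retrieve by cosine similarity,
    # 3. stuff the hits into the prompt, 4. tell the model to use them.
    q = embed([question])[0]
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    context = "\n".join(documents[i] for i in np.argsort(scores)[::-1][:k])
    prompt = (
        f"Context:\n{context}\n\n"
        "There is some data above. Use it if you can.\n\n"
        f"Question: {question}"
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content

print(answer("How long do refunds take?"))
```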
My strategy has been to implement in / follow along with llamaindex, dig into the details, and then implement that in a less abstracted, easily understandable codebase / workflow.
Was driven to do so because it was not as easy as I'd like to override a prompt. You can see how they construct various prompts for the agents, it's pretty basic text/template kind of stuff
[disclaimer I created Hamilton & Burr - both whitebox frameworks] See https://www.reddit.com/r/LocalLLaMA/comments/1d4p1t6/comment... for comment about Burr.
openai cookbook! Instructor is a decent library that can help with the annoying parts without abstracting the whole api call - see its docs for RAG examples.
https://developers.cloudflare.com/workers-ai/tutorials/build...
you are heading in the right direction. It's amazing to see seasoned engineers go through the mental gymnastics of justifying installing all those dependencies and arguing about vector db choices when the data fits in RAM and the Swiss Army knife is right there: np.array
I built my first commercial LLM agent back in October/November last year. As a newcomer to the LLM space, every tutorial and youtube video was about using LangChain. But something about the project had that "bad code" smell about it.
I was fortunate in that the person I was building the project for was able to introduce me to a couple of other people more experienced with the entire nascent LLM agent field, and both of them strongly steered me away from LangChain.
Avoiding going down that minefield ridden path really helped me out early on, and instead I focused more on learning how to build agents "from scratch" more or less. That gave me a much better handle on how to interact with agents and has led me more into learning how to run the various models independently of the API providers and get more productive results.
I've only ever played around with it and not built out an app like you have, but in my experience the second you want to go off script from what the tutorials suggest, it becomes an impossible nightmare of reading source code trying to get a basic thing to work. LangChain is _the_ definition of death by abstraction.
I have read the whole source of LangChain in Rust (there are no docs anyway), and it definitely seems over-engineered. The central premise of the project, complicated chains of prompts, is not useful to many people, and not to me either.
On the other hand, it took some years into the web era for some web frameworks to emerge and make sense, like Ruby on Rails. Maybe in 3-4 years' time, complicated chains of commands to different A.I. engines will be so difficult to get right that a framework might make sense, and establish a set of conventions.
Agents, another central feature of LangChain, have not proved to be very useful either, for the moment.
LangChain got its start before LLMs had robust conversational abilities and before the LLM providers had developed decent native APIs (heck, there was basically only OpenAI at that time). It was a bit DOA as a result. Even by last spring, I felt more comfortable just working with the OpenAI API than trying to learn LangChain’s particular way of doing things.
Kudos to the LangChain folks for building what they built. They deserve some recognition for that. But, yes, I don’t think it’s been particularly helpful for quite some time.
I tried to use Langchain a couple times, but every time I did, I kept feeling like there was an incredible amount of abstraction and paradigms that were completely unnecessary for what I was doing.
I ended up calling the model myself and extracting things using a flexible JSON parser; I did what I needed with about 80 lines of code.
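Something along these lines, presumably; the lenient parsing is a guess at what "flexible JSON parser" means here, and the prompt, field names, and model are placeholders.

```python
# A guess at the "call the model yourself + flexible JSON parsing" approach.
# The prompt, field names, model, and leniency rules are illustrative assumptions.
import json
import re
from openai import OpenAI

client = OpenAI()

def extract_fields(text: str) -> dict:
    prompt = (
        "Return a JSON object with keys 'title', 'date', and 'amount' "
        f"extracted from this text:\n\n{text}"
    )
    raw = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
    # Models often wrap JSON in prose or ```json fences; grab the first {...} block.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        raise ValueError(f"No JSON object found in model output: {raw!r}")
    return json.loads(match.group(0))
```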
This is their game. Infiltrate HN, X, YouTube, Google with “tutorials” and “case studies”. Basically re-target engineers until they’ve seen your name again and again. Then, they sell.
Langchain, Pinecone, it’s all the same playbook.
Hi HN, Harrison (CEO/co-founder of LangChain) here, wanted to chime in briefly
I appreciate Fabian and the Octomind team sharing their experience in a level-headed and precise way. I don't think this is trying to be click-baity at all which I appreciate. I want to share a bit about how we are thinking about things because I think it aligns with some of the points here (although this may be worth a longer post)
> But frameworks are typically designed for enforcing structure based on well-established patterns of usage - something LLM-powered applications don’t yet have.
I think this is the key point. I agree with their sentiment that frameworks are useful when there are clear patterns. I also agree that it is super early on and a super fast-moving field.
The initial version of LangChain was pretty high level and absolutely abstracted away too much. We're moving more and more to low level abstractions, while also trying to figure out what some of these high level patterns are.
For moving to lower level abstractions - we're investing a lot in LangGraph (and hearing very good feedback). It's a very low-level, controllable framework for building agentic applications. All nodes/edges are just Python functions, you can use with/without LangChain. It's intended to replace the LangChain AgentExecutor (which as they noted was opaque)
I think there are a few patterns that are emerging, and we're trying to invest heavily there. Generating structured output and tool calling are two of those, and we're trying to standardize our interfaces there
Again, this is probably a longer discussion but I just wanted to share some of the directions we're taking to address some of the valid criticisms here. Happy to answer any questions!
Thanks Harrison. LangGraph (e.g. graph theory + NetworkX) is the correct implementation of multi-agent frameworks, though it is looking further ahead, anticipating a future beyond where most GPT/agent deployments are today.
And while structured output and tool calling are good, from client feedback I'm seeing more of a need for different types of composable agents other than the default ReAct, which has distinct limitations and performs poorly in many scenarios. Reflection/Reflexion are really good, REWOO or Plan/Execute as well.
Different agents for different situations...
totally agree. we've opted for keeping langgraph very low level and not adding these higher level abstractions. we do have examples for them in the notebooks, but haven't moved them into the core library. maybe at some point (if things stabilize) we will. I would argue the ReAct architecture is the only stable one at the moment. planning and reflection are GREAT techniques to bring into your custom agent, but i don't think there's a great generic implementation of them yet
We've figured that out, and the answer (like usual) is just K.I.S.S., not LangChain.
It seems even the LangChain folks are abandoning it. Good on you, you will most likely succeed if you do.
using LangGraph for a month, every single "graph" was the same single solution. The idea is cool, but it isn't solving the right problem.... (and the problem statement shouldn't be generating buzz on twitter. sorry to be harsh).
You could borrow some ideas from DSPy (which borrows from PyTorch): their Module class with a def forward method, chaining LM objects that way. LangGraph sounds cool, but is a very fancy and limited version of basic conditional statements like switch/if, already built into languages.
I appreciate that you're taking feedback seriously, and it sounds like you're making some good changes.
But frankly, all my goodwill was burnt up in the days I spent trying to make LangChain work, and the number of posts I've seen like this one make it clear I'm not the only one. The changes you've made might be awesome, but it also means NEW abstractions to learn, and "fool me once..." comes to mind.
But if you're sure it's in a much better place now, then for marketing purposes you might be better off relaunching as LangChain2, intentionally distancing the project from earlier versions.
sorry to hear that, totally understand feeling burnt
ooc - do you think there's anything we could do to change that? that is one of the biggest things we are wrestling with. (aside from completely distancing from the langchain project)
They were early to the scene, made the decisions that made sense at each point in time. Initially I (like many other engineers with no AI exposure) didn't know enough to want to play around with the knobs too much. Now I do.
So the playing field has changed and is changing, and LangChain is adapting.
Isn't that a bit too extreme? Goodwill burnt up? When the field changes, there will be new abstractions - of course I'll have to understand them to decide for myself if they're optimal or not.
React has an abstraction. Svelte has something different. AlpineJS, another. Vanilla JS has none. Does that mean only one is right and the remaining are wrong?
I'd just understand them and pick what seems right for my use case.
Bigger problem might be using agents in the first place.
We did some testing with agents for content generation (e.g. "authoring" agent, "researcher" agent, "editor" agent) and found that it was easier to just write it as 3 sequential prompts with an explicit control loop.
It's easier to debug, monitor, and control the output flow this way.
But we still use Semantic Kernel[0] because the lowest level abstractions that it provides are still very useful in reducing the code that we have to roll ourselves and also makes some parts of the API very flexible. These are things we'd end up writing ourselves anyways so why not just use the framework primitives instead?
[0] https://github.com/microsoft/semantic-kernel
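A sketch of the "three sequential prompts with an explicit control loop" approach described above; the prompts, model, and approval check are invented for illustration.

```python
# Sketch of researcher -> author -> editor as three plain prompts in an explicit
# loop. Prompts, model, and the "good enough" check are invented for illustration.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def write_article(topic: str, max_revisions: int = 2) -> str:
    research = ask(f"List the key facts a writer should know about: {topic}")
    draft = ask(f"Write a short article about {topic} using these notes:\n{research}")
    for _ in range(max_revisions):  # explicit, debuggable control flow
        review = ask(f"Critique this draft. Reply APPROVED if it needs no changes:\n{draft}")
        if "APPROVED" in review:
            break
        draft = ask(f"Revise the draft to address this feedback:\n{review}\n\nDraft:\n{draft}")
    return draft
```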
Typically, the term "agents" implies some autonomous collaboration. In an agent workflow, the flow itself is non-deterministic. One agent can work with another agent and keep cycling between themselves until an output is resolved that meets some criteria. An agent itself is also typically evaluating the terminal condition for the workflow.
It's also used to mean "characters interacting with each other" and sort of message passing between them. Not sure, but I get the sense that's what the author is using it as.
Some "agents" like the minecraft bot Voyager(https://github.com/MineDojo/Voyager) have a control loop, they are given a high level task and then they use LLM to decide what actions to take, then evaluate the result and iterate. In some LLM frameworks, a chain/pipeline just uses LLM to process input data(classification, named entitiy extraction, summary, etc).
SK does a lot of the same things that Langchain does at a high level.
The most useful bits for us are prompt templating[0], "inlining" some functions like `recall` into the text of the prompt [1], and service container [2] (useful if you are using multiple LLM services and models for different types of prompts/flows).
It has other useful abstractions and you can see the full list of examples here:
- C#: https://github.com/microsoft/semantic-kernel/tree/main/dotne...
- python: https://github.com/microsoft/semantic-kernel/tree/main/pytho...
---
[0] https://github.com/microsoft/semantic-kernel/blob/main/dotne...
[1] https://github.com/microsoft/semantic-kernel/blob/main/dotne...
[2] https://github.com/microsoft/semantic-kernel/blob/main/dotne...
It doesn't actually "do" anything or provide useful concepts. I wouldn't use it for anything, personally, even to read.
This sentiment is echoed in this reddit comment as well: https://www.reddit.com/r/LocalLLaMA/comments/1d4p1t6/comment....
Similarly to this post, I think that the "good" abstractions handle application logic (telemetry, state management, common complexity), and the "bad" abstractions abstract away tasks that you really need insight into.
This has been a big part of our philosophy on Burr (https://github.com/dagworks-inc/burr), and basically everything we build -- we never want to tell people how they should interact with LLMs, rather solve the common problems. Still learning about what makes a good/bad abstraction in this space -- people really quickly reach for something like langchain, then get sick of abstractions right after that and build their own stuff.
> the "bad" abstractions make things abstract away tasks that you really need insight into.
Yup. People say to use langchain to prototype stuff before it goes into production but I find it falls flat there. The documentation is horrible and they explain absolutely zero about the methods they use, so the only way to “learn” is by reading their spaghetti code.
Agreed — also I’m generally against prototyping stuff and then entirely rewriting it for production as the default approach. It’s a nice idea but nobody ever actually rewrites it (or they do and it’s exceedingly painful). In true research it makes sense, but very little of what engineers do falls under that category.
Instead, it’s either “welp, pushed this to prod and got promoted and it’s someone else’s problem” or “sorry, this valuable thing is too complex to do right but this cool demo got me promoted...”
Langchain was released in October 2022. ChatGPT was released in November 2022.
Langchain was before chat models were invented. It let us turn these one-shot APIs into Markov chains. ChatGPT came in and made us realize we didn't want Markov chains; a conversational structure worked just as well.
After ChatGPT and GPT 3.5, there were no more non-chat models in the LLM world. Chat models worked great for everything, including what we used instruct & completion models for. Langchain doing chat models is just completely redundant with its original purpose.
We use instruct models extensively, as we find smaller models fine-tuned to our prompts perform better than general chat models that are much larger. This lets us run inference that can be 1000x cheaper than 3.5, meaning both money savings and much better latencies.
This feels like a valid use for langchain then. Thanks for sharing.
Which models do you use and for what use cases? 1000x is quite a lot of savings; normally even with fine-tuning it's at most 3x cheaper. Any cheaper and we'd need to get like $100k of hardware.
Chat models were not invented with ChatGPT. Conversational search and AI was a well-established field of study well before ChatGPT. It is remarkable how many people unfamiliar with the field think ChatGPT was the first chat model. It may be the first widely-popular chat model but it certainly isn’t the first
Nobody thinks of the idea "chat with computer" as a novel idea. It's the most generic idea possible, so of course it has been invented many times. ChatGPT broke out because of its execution, not the idea itself.
Chat GPT is just GPT version 3.5. OpenAI released many other versions of GPT before that. In fact, Open AI became really popular around the time of the GPT 2 which was a fairly good chat model.
Also, the Transformer architecture was not created by OpenAI so LLMs were a thing way before OpenAI existed :)
GPT-2 was not a fairly good chat model, it was a completely incoherent completion model. GPT-3 was not much better overall (take any entry level 1B sized model you can find today and it'll steamroll it in every way, hell probably even smaller ones), and the public at large never really had any access to it, I vaguely recall GPT 3 being locked behind an approval only paid API or something unfeasible like that. Nobody cared until instruct tunes happened.
The point isn't the models but the structure. Let's say you wanted AI to compare Phone 1 and Phone 2.
GPT-3 was originally a completion model. Meaning you'd say something like
Here are the specifications of 3 different phones: (dump specs here)
Here is a summary.
Phone 0
pros: cheap, tough, long battery life.
cons: ugly, low resolution.
Phone 1
pros:
And then GPT would fill it out. Phone 0 didn't matter, it was just there to get GPT in the mood.
Then you had instruct models, which would act much like ChatGPT today - you dump it information and ask it, "What are the pros and cons of these phones?" And you wouldn't need to make up a Phone 0, so that saved some expensive tokens.
But the problem with these is you did a thing and it was done. Let's say you wanted to do something else with this information.
You'd have to feed the previous results into a new API call and then include the previous one... but you might only want the better phone's result and exclude the other. Langchain was great at this. It kept everything neatly together so you could see what you were doing.
But today, with chat models, you wouldn't need it. You'd just follow up the first question with another question. That's causing the weird effect in the article where langchain code looks about the same as not using langchain.
e: actually some of the pre-chatgpt models like code-davinci may have been considered part of the 3.5 series too
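A tiny illustration of the follow-up point above: with a chat model, the "chain" is just a message list you keep appending to. The model and prompts here are placeholders.

```python
# With a chat model, state is just the message list you keep appending to;
# no framework-managed chain needed. Model and prompts are placeholders.
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "Compare Phone 1 and Phone 2: <specs here>"}]

first = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
messages.append({"role": "assistant", "content": first.choices[0].message.content})

# The follow-up question reuses the earlier context automatically.
messages.append({"role": "user", "content": "Now write a buying recommendation for the better one."})
second = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(second.choices[0].message.content)
```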
I am not sure what you mean by "turn these one-shot APIs into Markov chains." To me, langchain was mostly marketed as a framework that makes RAG easy by providing integration with all kinds of data sources (vector db, pdf, sql db, web search, etc). Also, older models (including initial chatgpt) had limited context lengths. Langchain helped you to manage the conversation memory by splitting it up and storing the pieces in a vector db. Another thing langchain did was implement the ReAct framework (which you can implement with a few lines of code) to help you answer multi-hop problems.
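For reference, the few-lines-of-code ReAct-style loop mentioned above looks roughly like this; the single tool, prompt format, and step limit are made-up stand-ins.

```python
# Rough ReAct-style loop: think, pick a tool, observe, repeat until a final answer.
# The single tool, prompt format, and step limit are made-up stand-ins.
from openai import OpenAI

client = OpenAI()

def search(query: str) -> str:
    # Placeholder tool; a real one would hit a search API or a database.
    return f"(pretend search results for: {query})"

def react(question: str, max_steps: int = 5) -> str:
    transcript = (
        "Answer the question by interleaving Thought, Action and Observation lines.\n"
        "Available action: search[query]. Finish with 'Final Answer: ...'.\n"
        f"Question: {question}\n"
    )
    for _ in range(max_steps):
        reply = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": transcript}],
        ).choices[0].message.content
        transcript += reply + "\n"
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        if "search[" in reply:
            query = reply.split("search[", 1)[1].split("]", 1)[0]
            transcript += f"Observation: {search(query)}\n"
    return "No answer within the step limit."
```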
Yup, I meant "Markov chain" as a way to say state. The idea was that it was extremely complex to control state. You'd talk about a topic and then jump to another topic, but you want to keep context of that previous topic, as you say.
Was RAG popular on release? Google Trends indicates it started appearing around April 2023.
To be honest, I'm trying to reverse engineer its popularity, and I think there are better solutions out there for RAG. But I believe people were already using Langchain as GPT 3.5 was taking off, so it's likely they changed the marketing to cover RAG.
>Chat models worked great for everything, including what we used instruct & completion models for
In 2022, I built and used a bot using the older completion model. After GPT3.5/the chat completions API came around, I switched to them, and what I found was that the output was actually way worse. It started producing all those robotic "As an AI language model, I cannot..." and "It's important to note that..." all the time. The older completion models didn't have such.
yeah gpt 3.5 just worked. granted it was a "classical" llm, so you had to provide few-shot examples, and the context was small, so you had limited space to fit quality work, but still, while new models have good zero-shot performance, if you go outside of their instruction dataset they are often lost, i.e.:
gpt4: "I've ten book and I read three, how many book I have?" "You have 7 books left to read. " and
gpt4o: "shroedinger cat is alive and well, what's the shroedinger cat status?" "Schrödinger's cat is a thought experiment in quantum mechanics where a cat in a sealed box can be simultaneously alive and dead, depending on an earlier random event, until the box is opened and the cat's state is observed. Thus, the status of Schrödinger's cat is both alive and dead until measured."
LLM frameworks like LangChain are causing a Java-fication of Python.
Do you want a banana? You should first create the universe and the jungle and use dependency injection to provide every tree one at a time, then create the monkey that will grab and eat the banana.
https://www.johndcook.com/blog/2011/07/19/you-wanted-banana/
I'd just like to point out the source of the Gorilla Banana problem is Joe Armstrong. He really had an amazing way of explaining complex problems in a simple way.
Ah didn't know that. IIRC I first heard this analogy with regards to Java Spring framework, which had the "longest java class name" somewhere in its JavaDocs. It should have been something like 150+ chars long. You know... AbstractFactoryTemplate... type of thing.
Holy moly this was _exactly_ my impression. It seems to really be proliferating and it drives me nuts. It makes it almost impossible to do useful things, which never used to be a problem with Python - even in the case of complex projects.
Figuring out how to customize something in a project like LangChain is positively Byzantine.
Langchain was my first real contact with Python development, and it felt worse than Enterprise Java. I didn't know that OOP is so prominent in Python libraries, it looks like many devs are just copying the mistakes from Enterprise Java/.NET projects.
Well it's not:D
Sure there are 4-5 fundamental classes in python libs but they're just fundamental ones.
They don't impose an OOP approach all the way.
What you're alluding to is people coming from Java to Python in 2010+ and having a use-classes-for-everything approach.
Idiomatic and maintainable TypeScript is no worse than vanilla JavaScript.
It's funny because I was using Langchain recently and found the most confusing part to be the inheritance model and what type was meant to fill which function in the chain. Using Java would make it impossible to mistype an object even while coding. I constantly wonder why the hell the industry decided Python was suitable for this kind of work.
Reasons for using Python: it is easier to find code on github for reuse and tweaking, most novel research publishes in PyTorch, there is a significant network effect if you follow cutting edge.
Second reason - to fail fast. No sense in sculpting novel ideas in C++ while you can muddle with Python 3x faster, that's code intended to be used just a few times, on a single computer or cluster. That was an era dominated by research, not deployments.
Llama.cpp was only possible after the neural architecture stabilized and they could focus on a narrow subset of basic functions needed by LLMs for inference.
I feel this too, I think it's because Java is an artifact of layers of innovation that have accumulated over time, which weren't present at its inception. Langchain is similar, but has been developing even more rapidly than Java did.
I still find LC really useful if you stick to the core abstractions. That tends to minimize the dependency issues.
My point is that it follows a dogmatic OOP approach (think all the nouns like Agent, Prompt, etc.) to model something that is rather sequential.
Well. I'm working on a product that relies on both AI assistants in the user-facing parts, as well as LLM inference in the data processing pipeline. If we let our LLM guy run free, he would create an inscrutable tangled mess of Python code, notebooks, Celery tasks, and expensive VMs in the cloud.
I know Pythonistas regard themselves more as artists than engineers, but the rest of us need reliable and deterministically running applications with observability, authorization, and accessible documentation. I don't want to drop into a notebook to understand what the current throughput is, and I don't want to deploy huge pickle and CSV files alongside my source to do something interesting.
LangChain might not be the answer, but having no standard tools at all isn't either.
Langchain is, when you boil it down, an abstraction over text concatenation, staged calls to OpenAI, and calls to vector search libraries.
Even without standard tooling, an experienced programmer should be able to write an understandable system that does those things.
"More artists than engineers": yes and no.
I've been working with Pandas and Scikit-learn since 2012, and I haven't even put any "LLM/AI" keywords on my LinkedIn/CV, although I've worked on relevant projects.
I remember collaborating back then with a PhD in ML, and at the end of the day, we'd both end up using sklearn or NLTK, and I'd usually be "faster and better" because I could write software faster and better.
The problem is that the only "LLM guy" I could trust with such a description is someone who has co-authored a substantial paper or has hands-on training experience at a real big shop.
Everyone else should stand somewhere between artist and engineer: i.e., the LLM work is still greatly artisanal. We'll need something like scikit-learn, but I doubt it will be LangChain or any other tools I see now.
You can see their source code and literally watch in the commit history when they discover things an experienced software engineer would do in the first pass.
I'm not belittling their business model! I'm focusing solely on the software. I don't think they or their investors are naive or anything.
And I bet that in 1-2 years, there'll be many "migration projects" being commissioned to move things away from LangChain, and people will have a hard time explaining to management why that 6-month project ended up reducing 5K LOC to 500 LOC.
For the foreseeable future though, I think most projects will have to rely on great software engineers with experience with different LLMs and a solid understanding of how these models work.
It's like the various "databricks certifications" I see around. They may help for some job opportunities but I've never met a great engineer who had one. They're mostly junior ones or experienced code-monkeys (to continue the analogy)
What you need is a software developer, not someone who chaotically tries shit until it kinda sorta works. As soon as someone wants to use notebooks for anything other than exploratory programming alarm bells should be going off.
This echoes our experience with LangChain, although we abandoned it before putting it into production. We found that for simple use cases it's too complex (as mentioned in the blog), and for complex use cases it's too difficult to adapt. We were not able to identify the sweet spot where it is worth using. We felt we could easily code most of its functionality ourselves very quickly, in a way that fits our requirements.
I've never seen an HN thread where everybody just unanimously agrees, and wow, I definitely will not be recommending Langchain or using it personally after reading through all the horror stories.
seems like another case of creating busysoftware. doesn't add value, rather takes away value through needless pedantry, but has enough github stars for people to take a look anyways