Best case scenario, you can come up with a chunking strategy specific to your use case that makes it work: things like grouping all the paragraphs and tables about a register together, keeping tables of physical properties in a datasheet attached to their titles, or merging the paragraphs of a PCB layout guideline into a single unit. You also have to figure out how much overlap to allow between the different types of chunks and how many dimensions you need in the output vectors. Then you have to link chunks together (rough sketch below), so that when your RAG pipeline matches a register's description, it knows to also pull in the chunk containing the actual documentation, and the LLM gets something it can use rather than just the description. I've had to train many a classifier to get this part even remotely usable in nontrivial use cases like caselaw.
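To make the linking concrete, here's a minimal sketch in plain Python (all names and data are hypothetical, not any particular library's API): each chunk carries the IDs of its companion chunks, and retrieval expands every hit to include them.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    chunk_id: str
    text: str
    kind: str                      # e.g. "description" or "documentation"
    linked_ids: list[str] = field(default_factory=list)

# Hypothetical example: a register's prose description linked to the
# table that actually documents its bit fields.
chunks = {
    "reg42-desc": Chunk("reg42-desc", "The CTRL register controls ...",
                        "description", linked_ids=["reg42-table"]),
    "reg42-table": Chunk("reg42-table", "| bit | name | reset | ... |",
                         "documentation"),
}

def expand_hits(hit_ids: list[str]) -> list[Chunk]:
    """Follow links so the LLM sees the documentation chunk,
    not just the description chunk that matched the query."""
    seen, out = set(), []
    for cid in hit_ids:
        for linked in [cid, *chunks[cid].linked_ids]:
            if linked not in seen:
                seen.add(linked)
                out.append(chunks[linked])
    return out

# A vector search that only matched the description still returns both:
print([c.chunk_id for c in expand_hits(["reg42-desc"])])
```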
Worst case scenario, you have to finetune your own embedding model, because the colloquialisms the general-purpose ones are trained on have little overlap with the terms of art and jargon used in the documents (this is especially bad for legal and highly technical texts IME). That generally requires thousands of examples created by an expert in the field.
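If you do go down that road, the mechanics are the easy part; building the dataset is the hard part. A rough sketch using sentence-transformers (the `fit`/`InputExample` API is version-dependent, and the two pairs here are stand-ins for the thousands of expert-written examples you'd actually need):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")

# Stand-in data: in practice, thousands of expert-curated pairs mapping
# how practitioners phrase queries to the jargon in the documents.
train_examples = [
    InputExample(texts=["summary judgment standard",
                        "no genuine dispute as to any material fact"]),
    InputExample(texts=["register reset value",
                        "POR default of the CTRL register is 0x0000"]),
]

loader = DataLoader(train_examples, shuffle=True, batch_size=2)
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
model.save("domain-tuned-embedder")
```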
> Every time I've tried to apply general purpose RAG tools to specific types of documents like medical records, internal knowledge base, case law, datasheets, and legislation, it's been a mess.
Would it be fair to paraphrase you as saying that people should avoid using _any_ library's ready-made components for a RAG pipeline, or do you think there's something specific to LangChain that is making it harder for people to achieve their goals when they use it? Either way, is there more detail that you can share on this? Even if it's _any_ library - what are we all getting wrong?
Not trying to correct you here - rather stating my perspective in hopes that you'll correct it (pretty please) - but my take as someone who was a user before joining the company is that LangChain is a good starting point because of the _structure_ it provides, rather than the specific components.
I don't know what the specific design intent was (again, new to the team!), but candidly, as a user I tend to look at the components as stand-ins that'll help me get something up and running super quickly so I can start building out evals. I might be unusual in this, but I tend to think that until I have evals, I don't really have any idea whether my changes are actually improvements or not. Once I have evals running against something that does _roughly_ what I want it to do, I can start optimizing the end-to-end workflow (a tiny example of what I mean by evals below). I suspect in 99.9% of cases that'll involve replacing some (many?) of our prebuilt components with custom ones more tailored to your specific task.
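For what it's worth, the eval doesn't have to be fancy to be useful. Something as small as this, a plain hit-rate over golden question/document pairs (all names and data here are hypothetical), is enough to tell you whether a chunking change or component swap actually helped:

```python
# Hypothetical minimal retrieval eval: golden (question -> expected doc id)
# pairs, scored by whether the expected doc appears in the top-k hits.
golden = [
    ("What is the reset value of CTRL?", "reg42-table"),
    ("Minimum trace clearance for 48V?", "pcb-clearance-3"),
]

def hit_rate(retrieve, k: int = 5) -> float:
    hits = 0
    for question, expected_id in golden:
        top_ids = [doc_id for doc_id, _ in retrieve(question)[:k]]
        hits += expected_id in top_ids
    return hits / len(golden)

def dummy_retrieve(question: str):
    # Stand-in for your real retriever: query -> ranked (doc_id, score) pairs.
    return [("reg42-table", 0.9), ("pcb-clearance-3", 0.7)]

print(hit_rate(dummy_retrieve))  # swap components, rerun, compare
```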
Complete side note, but for anyone looking at LangChain to build out RAG stuff today, I'd advise using LangGraph for structuring your end-to-end process. You can still pull in components for individual process steps from LangChain (or any other library you prefer) as needed, and you can still use LangChain pipelines as individual workflow steps if you want to, but I think you'll find that LangGraph is a more flexible foundation to build upon when it comes to defining the structure of your overall workflow.
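A minimal skeleton of what that looks like, with placeholder node bodies you'd replace with real retriever and LLM calls (exact imports can vary by langgraph version):

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    docs: list[str]
    answer: str

def retrieve(state: State) -> dict:
    # Plug in any retriever here: a LangChain component or your own code.
    return {"docs": ["...chunks relevant to " + state["question"]]}

def generate(state: State) -> dict:
    # Plug in any LLM call here; placeholder answer for the sketch.
    return {"answer": f"Answer based on {len(state['docs'])} chunks"}

builder = StateGraph(State)
builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)

graph = builder.compile()
print(graph.invoke({"question": "What does the CTRL register do?"}))
```

The nice part is that each node is just a function over shared state, so swapping a prebuilt component for a custom one is a one-line change to the graph.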
If you ever see one of those contracts here, it's usually for a very reasonable situation and a well-paid position.
It’s not at all a ridiculous ask, either. I’ve made a career out of going after high-impact roles in whatever is the fastest growing area of technology at the time. The non-compete isn’t just asking me to sacrifice the income from my next role, it’s asking me to sacrifice the experience as well. It also limits my ability to renegotiate comp while on the job, because they know your BATNA isn’t to just go get a better offer from a competitor.
If a company wants me to give all of that up, I’m sure as shit not doing it just for the privilege of working for them.