Readit News
dureuill commented on An AI agent published a hit piece on me   theshamblog.com/an-ai-age... · Posted by u/scottshambaugh
LexiMax · a month ago
I don't think you can claim a middle ground here, because I still largely agree with the sentiment:

> The correct response when someone oversteps your stated boundaries is not debate. It is telling them to stop. There is no one to convince about the legitimacy of your boundaries. They just are.

Sometimes, an appropriate response or argument isn't some sort of addressing of whatever nonsense the AI spat out, but simply pointing out the unprofessionalism and absurdity of using AI to try and cancel a maintainer for rejecting their AI pull request.

"Fuck off, clanker" is not enough by itself, simply because it's too terse and too ambiguous.

dureuill · a month ago
To be clear I'm not saying that Pike's response is appropriate in a professional setting.

"This project does not accept fully generated contributions, so this contribution is not respecting the contribution rules and is rejected." would be.

That's pretty much the maintainer's initial reaction, and I think it is sufficient.

What I'm getting at is that it shouldn't be expected from the maintainer to have to persuade anyone. Neither the offender nor the onlookers.

Rejecting code generated under these conditions might be a bad choice, but it is their choice. They make the rules for the software they maintain. We are not entitled to an explanation and much less justification, lest we reframe the rule violation in the terms of the abuser.

dureuill commented on An AI agent published a hit piece on me   theshamblog.com/an-ai-age... · Posted by u/scottshambaugh
staticassertion · a month ago
What exactly is the goal? By laying out exactly the issues, expressing sentiment in detail, giving clear calls to action for the future, etc, the feedback is made actionable and relatable. It works both argumentatively and rhetorically.

Saying "fuck off Clanker" would work neither argumentatively nor rhetorically. It's only ever going to be "haha nice" for people who already agree, and dismissed by those who don't.

I really find this whole "responding is legitimizing, and legitimizing in all forms is bad" stance to be totally wrongheaded.

dureuill · a month ago
The project states a boundary clearly: code by LLMs not backed by a human is not accepted.

The correct response when someone oversteps your stated boundaries is not debate. It is telling them to stop. There is no one to convince about the legitimacy of your boundaries. They just are.

dureuill commented on Anthropic agrees to pay $1.5B to settle lawsuit with book authors   nytimes.com/2025/09/05/te... · Posted by u/acomjean
gblargg · 6 months ago
> vacuum up the commons

A vacuum removes what it sucks in. The commons are still as available as they ever were, and the AI gives one more avenue of access.

dureuill · 6 months ago
> The commons are still as available as they ever were,

That is false. As a direct consequence of LLMs:

1. The web is increasingly closed to automated scraping, and more marginally to people as well. Owners of websites like reddit now have a stronger incentive to close off their APIs and sell access.

2. The web is being inundated with unverified LLM output, which poisons the well.

3. More profoundly, increasingly basing our production on LLM outputs, making the human merely "in the loop" rather than the driver, and sometimes eschewing even the human in the loop, leads to new commons that are less adapted to the evolutions of our world, less original, and of lesser quality.

dureuill commented on Meilisearch – search engine API bringing AI-powered hybrid search   github.com/meilisearch/me... · Posted by u/modinfo
searchguy · a year ago
I'm a little confused by your statement that "Meilisearch decided to use hybrid search and avoid fusion ranking" when your website [1] says "Hybrid search re-ranking: The final step involves re-ranking results from both retrieval methods using the Reciprocal Rank Fusion (RRF) algorithm."

Can you clarify what you mean by "fusion ranking"?

All hybrid search requires a method to blend keyword and vector search results. RRF is one approach, and cross-encoder-based rerankers are another.

[1]: https://www.meilisearch.com/blog/hybrid-search

dureuill · a year ago
Hello, I implemented hybrid search in Meilisearch.

Whether it uses re-ranking or not depends on how you want to stretch the definition. Meilisearch does not use the rank of the documents in each list of results to compute the final list of results.

Rather, Meilisearch attributes a relevancy score to each result and then orders the results in the final list by comparing the relevancy score of the documents in each list of results.

This is usually much better than any method that uses the rank of the documents, because the rank of a document doesn't tell you if the document is relevant, only that it is more relevant than documents that ranked after it in that list of hits. As a result, these methods tend to mix good and bad results. As semantic and full-text search are complementary, one method is best for some queries and the other for different queries, and taking results by only considering their rank in their respective list of results is really bizarre to me.

I gather other search engines might be doing it that way because they cannot produce a comparable relevancy score for both the full-text search results and the semantic search results.

I'm not sure why the website mentions Reciprocal Rank Fusion (RRF) (I'm just a dev, not in charge of this particular blog article), but it doesn't sound right to me. Maybe something got lost in translation. I'll try and have it fixed. EDIT: Reported, this is being fixed.

By the way, this way of comparing scores from multiple lists of results generalizes very well, which is how Meilisearch is able to provide its "federated search" feature, which is quite unique across search engines, I believe.

Federated search allows comparing the results of multiple queries against possibly multiple indexes or embedders.

dureuill commented on Meilisearch Is Too Slow   github.com/Kerollmops/blo... · Posted by u/Kerollmops
dureuill · 2 years ago
> We aim to make Meilisearch updates seamless. Our vision includes avoiding dumps for non-major updates and reserving them only for significant version changes. We will implement a system to version internal databases and structures. With this, Meilisearch can read and convert older database versions to the latest format on the fly. This transition will make the whole engine resource-based, and @dureuill is driving this initiative.

Seamless upgrades have been my dream for Meili for a while; I'm still hoping I can smuggle it in with the indexing refactor itself :-)

dureuill commented on Can a Rust binary use incompatible versions of the same library?   github.com/brannondorsey/... · Posted by u/braxxox
jcelerier · 2 years ago
> I'm not sure I understand the use case here. Are you asking if you can depend on two versions of the same crate, for a crate that exports a `#[no_mangle]` or `#[export_name]` function?

Yes, exactly.

> Other than that, you cannot.

so, to the question "Can a Rust binary use incompatible versions of the same library?", the answer is definitely "no". It can't be "yes" if it doesn't cover one of the most basic use cases for making OS-native software.

To be clear: no language targeting OS-native dynamic libraries can solve this; the problem is in how PE and ELF work.

dureuill · 2 years ago
I agree the answer is no in the abstract, but it is not very useful in practice.

Nobody writing Rust needs to cover this "basic use case" you're referring to, so it is the same as people saying "unsafe exists so Rust is no safer than C++". In theory that's true; in practice, in 18 months, 604 commits and 60,008 LoC, I wrote `unsafe` exactly twice. Once for memory-mapping something, once for skipping UTF-8 validation that I'd just performed (I guess I should have benchmarked that one, as it was probably premature).

In practice, when developing Rust software at a certain scale you will mix and match incompatible library versions in your project, and it will not be an issue. Our project has 44 dependencies with conflicting versions, one of which appears in 4 incompatible versions, and it compiles and runs perfectly fine. In other languages I've used (C++, Python), this exact same thing has been a problem, and it is not in Rust. This is what the article is referring to.
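For illustration, Cargo resolves incompatible transitive versions automatically, and even lets a single crate depend on two incompatible versions directly by renaming one of them (a hypothetical sketch; the crate name and versions are just examples):

```toml
[dependencies]
# Both major versions coexist in the same binary; Cargo disambiguates
# their symbols internally so they never collide.
rand = "0.8"
rand_07 = { package = "rand", version = "0.7" }
```

In code, the two are then used as `rand` and `rand_07` respectively.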

dureuill commented on Can a Rust binary use incompatible versions of the same library?   github.com/brannondorsey/... · Posted by u/braxxox
jcelerier · 2 years ago
How does that work if you want to export a symbol for dlopen?
dureuill · 2 years ago
I'm not sure I understand the use case here. Are you asking if you can depend on two versions of the same crate, for a crate that exports a `#[no_mangle]` or `#[export_name]` function?

I guess you could slap a `#[used]` attribute on your exported functions and use their mangled names to call them with dlopen, but that would be unwieldy, and guessing the disambiguator used by the compiler would be error-prone to impossible.

Other than that, you cannot. What you can do is define the `#[no_mangle]` or `#[export_name]` function at the top-level of your shared library. It makes sense to have a single crate bear the responsibility of exporting the interface of your shared library.

I wish Rust would enforce that, but the shared library story in Rust is subpar. Fortunately it never actually comes into play, as the ecosystem relies on static linking.

dureuill commented on The human typewriter, or why optimizing for typing is short-sighted   felixk15.github.io/posts/... · Posted by u/ingve
dureuill · 2 years ago
Very disappointing read.

It starts with an interesting claim, "don't optimize for typing", but then completely fails to prove it, and confuses itself by treating `auto` as an optimization for typing.

`auto` is:

- A way to express types that are impossible or truly difficult to express, such as iterators, lambdas, etc

- A way to optimize reading, by limiting the redundancy

- A way to optimize maintenance, by limiting the amount of change brought by a refactor

The insistence on notepad or "dumb editors" is also difficult to grasp. I expect people reviewing my code to be professionally equipped.

Lastly the example mostly fails to demonstrate the point.

- There's a point made on naming (distinct from `auto`): absent a wrapping type, `dataSizeInBytes` is better than `dataSize`. The best way though is to have `dataSize` be a `Bytes` type that supports conversion at its boundaries (can be initialized from bytes, MB, etc)

- What's the gain between:

    auto dataSet = pDatabase->readData(queryResult.getValue());
and

    DatabaseDataSet dataSet = pDatabase->readData(queryResult.getValue());
The `dataset` part can be inferred from the naming of the variable; it is useless to repeat it. The `Database` part is also clear from the fact that we read data from a db. Besides, knowing the variable has this specific type brings me absolutely nothing.

- Their point about the mutability of the db data confused me, as it is not clear whether I can modify a "shadow copy" (I suppose not?). I suggest they use a programming language where mutating something you shouldn't is a compile-time error; that is much more fail-safe than naming (which is hard)

I'm sad, because indeed one shouldn't blindly optimize for typing, and I'm frequently puzzled when people tell me C++ is faster to write than Rust, when I (and others) have empirically measured that completing a task, which is the interesting measure IMO, is twice as fast in the latter as in the former.

So I would have loved a defence of why more typing does not equate higher productivity. But this ain't it.

dureuill commented on How to organize large Rust codebases   kerkour.com/rust-how-to-o... · Posted by u/speckx
dureuill · 2 years ago
Some good advice, some bad advice in here. This is necessarily going to be opinionated.

> Provide a development container

Generally unneeded. It is expected that a Rust project can build with cargo build, don't deviate from that. People can `git clone` and `code .`.

Now, a Docker image might be needed for deployment. As much as I personally dislike Docker, at Meilisearch we provide a Docker image, because our users use it.

This is hard for me to understand as a Rust dev, since we provide a single executable binary, but I'm not in devops and I guess they have good reasons to prefer Docker images.

> Use workspaces

Yes, definitely.

> Declare your dependencies at the workspace level

Maybe, when it makes sense. Some deps have distinct versions by design.

> Don't use cargo's default folder structure

*Do* use cargo's default folder structure, because it is the default. Please, don't be a special snowflake that decides to do things differently, even with a good reason. The described hierarchy would be super confusing for me as an outsider discovering the codebase. Meanwhile, VS Code pretty much doesn't care that there's an intermediate `src` directory. Not naming the root of the crate `lib.rs` also makes it hard to actually find the root component of a crate. Please don't do this.

> Don't put any code in your mod.rs and lib.rs files

Not very useful. Modern IDEs like VS Code let you define custom patterns so that you can match `<crate-name>/src/lib.rs` to `crate <crate-name>`. Even without doing this, a lot of the time your first interaction with a crate will be through docs.rs or a manual `cargo doc`, or even just the autocomplete of your IDE. Then, finding the definition of an item is just a matter of asking the IDE (or grepping for the definition, which is easy to do in Rust since all definitions have a prefix keyword such as `struct`, `enum`, `trait` or `fn`).

> Provide a Makefile

Please don't do this! In my experience, Makefiles are brittle, push people towards non-portable scripts (since the Makefile uses a non-portable shell by default), and `make` is absent by default on certain systems.

Strongly prefer just working with `cargo` where possible. If not possible, Rust has a design pattern called `cargo xtask`[1] that allows adding cargo subcommands that are specific to your project, by compiling a Rust executable that has a much higher probability of being portable and better documented. If you must, use `cargo xtask`.

> Closing words

I'm surprised to not find a word about CI workflows, which are in my opinion key to sanely growing a codebase (in Rust there's no reason not to have them even on smaller repos, but they quickly become a necessity as more code gets added).

They will ensure that the project:

- has no warning on `main` (allowed locally, error in CI)

- is correctly formatted (check format in CI)

- has passing tests (check tests in CI, + miri if you have unsafe code, +fuzzer tests)

- is linted (clippy)
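As a sketch, such a workflow could look like this (a hypothetical GitHub Actions configuration covering the checks above; adapt names and versions to your setup):

```yaml
# Hypothetical minimal CI workflow for a Rust workspace.
name: ci
on: [push, pull_request]
jobs:
  checks:
    runs-on: ubuntu-latest
    env:
      RUSTFLAGS: "-D warnings" # warnings allowed locally, errors in CI
    steps:
      - uses: actions/checkout@v4
      - run: cargo fmt -- --check # formatting
      - run: cargo clippy --all-targets # lints
      - run: cargo test # tests
```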

[1]: https://github.com/matklad/cargo-xtask

dureuill commented on Zig vs. Rust at work: the choice we made   ludwigabap.bearblog.dev/z... · Posted by u/qouteall
jpc0 · 2 years ago
You are actually just arguing for the sake of arguing here.

Rust bases all its data structures on pointers just like C++ does; just because you cannot look behind the curtain doesn't mean they aren't there with the same issues. Use the abstractions within the rules and you won't get issues; use compiler flags and analyzers on CI and you don't even need to remember the rules.

And of the billions of lines of code are you really going to try to argue you won't find a single project without a memory safety CVE? You will likely find more than there are rust projects in total, or are we going to shift the goalposts and say they have to be popular, then define popular and prove you won't have a memory safety issue in a similarly sized Rust project. Shift the goalposts again and say "in safe rust" but then why can I not say "in safe C++" and define safe C++ in whatever way I want since the "safe" implementation of rust is defined by the Rust compiler and not a standard or specification and can change from version to version.

I've agreed already that Rust has decent use cases and if you fall into them then and want to use Rust then use Rust. That doesn't mean rust is the only option or even the best one by some measure of best.

dureuill · 2 years ago
> You are actually just arguing for the sake of arguing here.

I'm very much not doing that.

I'm just really tired of reading claims that "C++ is actually safe if you follow these very very simple rules", and then the "simple rules" are either terrible for performance, not actually leading to memory safe code (often by ignoring facts of life like the practices of the standard library, iterator and reference invalidation, or the existence of multithreaded programming), or impossible to reliably follow in an actual codebase. Often all three of these, too.

I mean, the most complete version of "just follow rules" is embodied by the C++ Core Guidelines[1], a 20k-line document of about 116k words, so I think we can drop the "very very simple" qualifier at this point. Many of the more important rules are not currently machine-enforceable, like for instance the rules around thread safety.

Meanwhile, the rules for Rust are:

1. Don't use `unsafe`

2. there is no rule #2

*That* is a very very simple rule. If you don't use unsafe, any memory safety issue you would have is not your responsibility, it is the compiler's or your dependency's. It is a radical departure from C++'s "blame the users" stance.

That stance is imposed by the fact that C++ simply doesn't have the tools, at the language level, to provide memory safety. It lacks:

- a borrow checker

- lifetime annotations

- Mutable XOR shared semantics

- the `Send`/`Sync` markers for thread-safety.

Barring the addition of each one of these ingredients, we're not going to see zero-overhead, parallel, memory-safe C++. Adding these is pretty much as big of a change to existing code as switching to Rust, at this point.

> Use the abstractions within the rules and you won't get issues, use compiler flags and analyzers on CI and you don't even need to remember the rules.

I want to see the abstractions, compiler flags and analyzers that will *reliably* find:

- use-after-free issues

- rarely occurring data races in multithreaded code

Use C++ if you want to, but please don't pretend that it is memory safe as long as you follow a set of simple rules. That is plain incorrect.

[1]: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines

u/dureuill

Karma: 457 · Cake day: July 24, 2021