Show HN: I open-sourced the in-memory PostgreSQL I built at work for E2E tests

n2d4 · 2 years ago

Hey HN! For a few months, I've been building an in-memory version of Postgres at work. It has full feature parity with production databases.

The cool thing about it is that you don't need any external processes or proxies. If your platform can run WASM (Node.js, browser, etc.), it can probably run pgmock. Creating a new database with mock data is as simple as creating a JavaScript object.

It's a bit different from the amazing pglite [1] (which inspired me to open-source pgmock in the first place). pgmock runs an x86 emulator with the original Postgres inside, while pglite compiles a Postgres fork to native WASM directly and is hence much faster and more lightweight. However, it only supports single-user mode and a select few extensions, so you can't connect to it with normal Postgres clients (which is quite crucial for E2E testing).

Theoretically, it could be modified to run any Docker image on WebAssembly platforms. Anything specific you'd like to see?

Happy hacking!

[1] https://github.com/electric-sql/pglite

samwillis · 2 years ago

This looks really cool, awesome work!

Correct on PGlite only being single user at the moment, and that certainly is a problem for using it for integration tests in some environments. But I'm hopeful we can bring a multi-connection mode to it, I have a few ideas how, but it will be a while before we do.

There are a few other limitations with PGlite at the moment (related to it being single user mode), such as lacking support for pg_notify (have plans to fix this too). Whereas with this it should "just work" as it's much closer to a real Postgres.

I think there is a big future for these in-memory Postgres projects for testing, it's looks like test run times can be brought down to less than a 1/4 with them.

(I work on PGlite)

waldrews · 2 years ago

Ooh! The 'docker image on WASM' thing sounds promising for a wide range of problems. Recently I wanted to run a FFMPEG/SoX pipeline on the client - too many dependencies to easily recompile with Emscripten; could your approach help there?

jasonjmcghee · 2 years ago

There's already ffmpeg wasm. I've used it in projects. Works great.

https://github.com/ffmpegwasm/ffmpeg.wasm

n2d4 · 2 years ago

Yeah, that should be possible! Though, for audio processing, the performance will probably be terrible (because it's all emulated).

nborwankar · 2 years ago

If it could support the pgvector extension it would be a super fast vector database with all the power of Pg - the relational aspect brings the ability to add and query using rich domain specific metadata usually contained in relational databases.

J_Shelby_J · 2 years ago

I spent last week trying to do that with some of the other pg embeded libs.

And then lancedb released their embedded client for rust, so I went towards that. But it's still lacking FTS. So I fell back to sqlite. have some notes here https://shelbyjenkins.github.io/blog/retrieval-is-all-you-ne...

justinclift · 2 years ago

As a data point, the online demo seems broken for queries it doesn't like:

    select foo();
    Error.captureStackTrace is not a function

That's when using Firefox 124.0.2 on Linux.

rezonant · 2 years ago

Yes, that's a nonstandard function provided by v8, so it wouldn't work on Firefox. [1]

This can be worked around by just constructing an Error and taking it's stack property, captureStackTrace is just a convenience function, so hopefully they can fix that.

[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

n2d4 · 2 years ago

My bad, gotta fix that one!

negus · 2 years ago

That's great. But doesn't the whole concept of E2E test mean that you use real environment without mocking the components?

refulgentis · 2 years ago

Explicitly mentioned in the comment as a drawback. In practice E2E means "E2E as much as humanly possible", and I'm glad to see any work that can help.

MuffinFlavored · 2 years ago

> pgmock runs an x86 emulator with the original Postgres inside

Why can't Postgres compile to WASM instead of x86?

anarazel · 2 years ago

Postgres uses multiple processes, shared memory etc. The single user thing that OP referenced is single user because of that...

rsyring · 2 years ago

Why not just run Postgres with it's files on a ramdisk?

Update: this can apparently run in a browser/Node environment so can be created/updated/destroyed by the tests. I guess I'm too much of a backend dev to understand the advantage over a more typical dev setup. Can someone elaborate on where/when/how this is better?

n2d4 · 2 years ago

That's more or less what happens inside the emulator (the emulated disk is an in-memory 9P file system). It's in WebAssembly because that makes it more portable (same behaviour across platforms, architectures, and even in the browser or edge environments), and there are no external dependencies (not even Docker).

Because the emulator lets us boot an "already launched" state directly, it's also faster to boot up the emulated database than spinning up a real one (or Docker container), but this was more of a happy accident than a design goal.

peter_l_downs · 2 years ago

Can you give a specific / concrete example of why I would want to use this instead of running a postgres server a different way (docker, binary, whatever) and having the tests connect to that server? I really don't understand when this would be useful.

gchamonlive · 2 years ago

You could also use memory state dump from a microvm manager like firecracker and have the state replicated

orphea · 2 years ago

I don't get it either. I feel like this is so much unnecessary code, an emulator, a network stack...

Why not use something like https://testcontainers.com/? Is a container engine as an external dependency that bad?

medellin · 2 years ago

It is annoying is you want to run your teat inside a container for ci and now you are running a container in a container and all the issues that come with it.

bastawhiz · 2 years ago

It's the same amount of code and on Mac you still run a full VM to load containers (with a network stack), so I'm not really sure what your point is. If anything it's less code because the notion of the container is entirely abstracted away, and the whole thing is entirely a wasm dependency that you load as a normal import.

hamandcheese · 2 years ago

The fact that this can run in-process is a big deal, as it means you don't have to worry about cleanup.

As soon as you have external processes that your tests depend on, your tests need some sort of wrapper or orchestrator to set everything up before starting tests, and ideally tear it down after.

In 90% of cases I see, that orchestration is done in an extremely non-portable way (like leveraging tools built in to your CI system) which can make reproducing test failures a huge pain in the ass.

zer00eyz · 2 years ago

The whole purpose of End to End testing is that your testing the system in a real state. It's an emulation of your live environment. Because of that you can do interesting things like find out what happens if you pull the plug or run out of disk or ....

The moment that you shove a mock in there, your unit testing. Effective but not the same. One of the critical points of E2E is that without mocks you know that your tests are accurate. Because this isnt Postgres I'm testing it every time and not that system.

>> Can someone elaborate on where/when/how this is better?

If your building PG for an embedded, light weight, or under powered system then this would make sense for verification testing before real E2E testing that would be much slower. (a use case I have)

Other than that its just a cool project and if you ever need a PG shim it's there.

wpietri · 2 years ago

I think you're being a little absolutist about this. Swapping out a possibly equivalent database engine does not turn anything into a unit test, which is defined by testing individual units of code in relative isolation. You can argue that it's not true end to end testing. But almost every E2E test I've seen involves some compromises compared with the true production environment to save money, time, or effort.

peter_l_downs · 2 years ago

> If your building PG for an embedded, light weight, or under powered system then this would make sense for verification testing before real E2E testing that would be much slower. (a use case I have)

If this is actually just Postgres running in an x86 emulator (*edit: originally this said "compiled to wasm"), then how could this be faster than Postgres in any given environment? I don't understand — if it were faster, wouldn't you just want to deploy this in prod in your weird environment rather than Postgres? Why limit this to mocking?

lelandbatey · 2 years ago

Nah, by having in-memory versions of your dependencies, in-memory versions which fulfill the same interfaces as those used in your E2E tests (or the majority of your E2E tests) you unlock running your entire E2E tests suite in milliseconds-to-seconds instead of minutes-to-seconds. And because they're E2E tests that work with any implementation, you can still run your exact same test suite against your "real" E2E dependencies in a CI step to be super sure both implementations behave the same.

I've done this across multiple jobs, and it's amazing to be able to run your "mostly-E2E" tests in 1-2 seconds while developing and the same suite in the full E2E env in CI. It makes developing with confidence so fast and mostly stress free (diverging behavior is admittedly annoying, but usually rare).

I highly recommend using these if feasible.

majikandy · 2 years ago

Until you trust every part of the mock behaves the same as every part of the real database you use… most often the db is your boundary with nothing further downstream. At that point it really is just a faster disposable database, and totally is valid acceptance tests for the e2e system.

Also nothing stops you from using a mock for some tests and a real database for others. It just comes down to trust.

aeyes · 2 years ago

It could be useful for test isolation, moving the Redis backend to FakeRedis in tests fixed quite a bit of noise in our test suite. With Postgres we use savepoints which is not very fast, even on a ramdisk.

LtWorf · 2 years ago

Why make things fast and easy when they could be slower and more complicated?

rickette · 2 years ago

I used to run all kinds of (custom) fake in-memory servers in my tests. Nowadays I just run the real thing using Testcontainers (https://testcontainers.com)

theage · 2 years ago

For prisma/nodejs devs who just want postgres-in-a-can for local dev you are better off using the recently released serverizing of pglite, pglite-server: https://github.com/kamilogorek/pglite-server

It's faster, can persist data to fs, though less stable under heavy use than the full x86 emu e2e test server. I found pglite-server uses only 150MB ram compared to 830MB for pgmock-server. You can then use dotenv to checkout a new .env.local with updated DATABASE_URL for all your nextjs/prisma package.json run scripts

  DATABASE_URL="postgresql://postgres@localhost:5432/awesomeproject"
  "db:pushlocal": "dotenv -e .env.local -- pnpm prisma db push"

Very easy to add to any project, No wonder neon is sponsoring this space.

xlii · 2 years ago

Hate to be w downer but I’d never consider this for use.

For trivial applications maybe it’d work, but with more complexity like anything that has risk of deadlocking or depends on the database shape and such solution subtracts from value as even small shift in behavior can snowball into critical problems.

Today I lean towards resource constrained E2E environment so that local test runners have opportunity to break if someone write anything grossly underperforming.

Not to mention that snapshotting DB after second and distributing this snapshot to test partitions is super fast and many times shaved multiple minutes from test suites.

It’s an interesting idea and definitely great learning experience but I think that target audience is limited.

tumidpandora · 2 years ago

Off-topic, but the title confused me a bit - "...I built at work." Doesn't this imply that the intellectual property for this project belongs to your employer, assuming you used resources from work? If so, are you technically allowed to open-source it?

n2d4 · 2 years ago

We're a startup, open-sourcing was as easy as getting the rest of the team's approval.

Gracana · 2 years ago

drzaiusx11 · 2 years ago

How does this compare to H2 in postgres compatibility mode?

noisy_boy · 2 years ago

I may be wrong but I don't think you can use postgresql stored procs with H2.

crakhamster01 · 2 years ago

This is pretty neat! Some questions, if you're able to answer:

* What was the inspiration for developing this project at work? Was running Postgres in a Docker container too slow?

* What did your CI setup for E2E tests look like before and after integrating pgmock into the flow?

* Was migrating over to this solution difficult?

Thanks!