I did not come in here expecting to read such effusive praise for testcontainers. If you’re coming from a place where docker wasn’t really a thing I can see how it looks beautiful. And in a fair number of use cases it can be really nice. But if you want it to play well with any other containerized workflow, good freaking luck.
Testcontainers is the library that convinced me that shelling out to docker as an abstraction via bash calls embedded in a library is a bad idea. Not because containerization as an abstraction is a bad idea. Rather it’s that having a library that makes custom shell calls to the docker CLI as part of its core functionality creates problems and complexity as soon as one introduces other containerized workflows. The library has the nasty habit of assuming it’s running on a host machine and nothing else docker related is running, and footguns itself with limitations accordingly. This makes it not much better than some non-dockerized library in most cases and oftentimes much much worse.
Testcontainers is not shelling out to the docker CLI; at least on Java, it is using a Java implementation of the docker network protocol, and I believe that’s the case also for the other platforms.
Not sure this matters for the core argument you are making, just thought I’d point it out.
The comment you are replying to makes so many mistakes about how Testcontainers works on Java that I'm not sure what source code the commenter is looking at.
I'm interested to hear what you would do instead! I'm using Testcontainers in a very basic scenario: A web app with a PostgreSQL database. There are different database backends available (like SQLite) but I use PostgreSQL-specific features.
Currently, in my integration testing project, I use Testcontainers to spin up a PostgreSQL database in Docker and then use that for testing. I can control the database lifecycle from my test code. It works perfectly, both on my local PC and in the CI pipeline. To date it also hasn't interfered or conflicted with other Docker containers I have running locally (like the development database).
From what I gather, that is exactly the use case for Testcontainers. How would you solve this instead? I'm on Windows by the way, the CI pipeline is on Linux.
There is no alternative if you want Postgres "embedded" within your test. I researched that for a long time, since a full PostgreSQL instance as a Docker image sounded like overkill, but nothing else exists.
Never had any issues. We have 100+ build jobs running on Jenkins and most of them have some Testcontainer tests. These never collide if implemented correctly (e.g. randomised ports with a check), even when run in parallel.
On my machine running several docker dev environments it was also never an issue.
Can you specify what issues you had? Also I am pretty sure the library does not work as you describe. Isn't it using the Docker Engine API? I could be mistaken, never checked the source code.
Edit: Just checked the documentation. According to the docs it is using the Docker API.
> We have 100+ build jobs running on Jenkins and most of them have some Testcontainer tests.
That's probably why you can and do use it: Jenkins. Jenkins lets you install whatever you want on the hosts, whereas more modern systems' default context is a docker container, or at least they speak it natively.
> Can you specify what issues you had?
Some of my devs have coded test containers into their ITs. These are the only pipelines we can't containerize, because test containers don't seem to work inside docker, and won't work in k8s either.
I have had a similar intuition from when trying out testcontainers some years ago.
I do not know how the project has developed, but at the time I tried it, it felt very orthogonal or even incompatible with more complex (as in multi-language monorepo) projects, CDEs and containerized CI approaches.
I do not know how this has developed since; the emergence of CDE standards like devcontainer and devfile might have improved the situation. Yet all projects I have started in the past 5 years were plain multilingual CDE projects based on (mostly) a compose.yml file and not much more, so I have no idea how widespread their usage really is.
I would guess that this speaks to an unaddressed (developer) user story related to other workflows, or perhaps the container-adjacent ecosystem overall. Testing with any workflow is always tricky to get just right, and tools that make it easy (like, "install a package and go" easy) are underrated.
Came here with exactly this on my mind. Thanks for confirming my suspicion.
That being said, having specific requirements for the environment of your integration tests is not necessarily bad IMO. It's just a question of checking these requirements and reporting any mismatches.
Seems like the "community-maintained" ones they endorse, like the Rust implementation, do
I did not realize Rust wasn’t officially supported until I went to their GitHub and saw in the readme that it’s a community project, and not their "official" one
Could you elaborate on what limitations it has?
How does this not play nice with remote docker / other docker containers?
I don't know this library, but it looks like something that I started writing myself for exactly the same reasons, so it would be great to know what's wrong with this implementation or why I shouldn't migrate to it. Thanks!
A few examples of the difficulties I've had with testcontainers:
- Testcontainers running in a DinD configuration is complex and harder to get right
- Testcontainers needing to network or otherwise talk with other containers not orchestrated by testcontainers
- general flakiness of tests which are harder to debug because of the library abstraction around Docker
In general if anything else in your workflow other than testcontainers also spawns and manages container lifecycle, getting it to work together with testcontainers is basically trying to reconcile two different configuration sets of containers being spawned within Docker. I think the crux of the issue is that testcontainers inverts the control of tooling. Typically containers encapsulate applications, and in this case it's the other way around. Which is not necessarily a bad thing (indeed, I am a huge proponent of using code to control containers like this), but when you introduce a level of "container-ception" by having two different methodologies like this it creates a lot of complexity and subsequent pain.
Compose is much more straightforward in terms of playing well with other stuff and being simple but obviously isn't great for this kind of unit test thing that testcontainers excels at
Test containers are such a game changer for integration testing; they have language-specific Docker APIs that make it trivial to bring up containers and verify that they are fully initialized and ready to accept connections.
Pretty much every project I create now has testcontainers for integration testing :)
I set up CI so it lints, builds, unit tests then integration tests (using testcontainers)
https://github.com/turbolytics/latte/blob/main/.github/workf...
Their language bindings provide nice helper functions for common database operations (like generating a connection uri from a container user)
https://github.com/turbolytics/latte/blob/main/internal/sour...
I use them in $day job, use them in side projects, use them everywhere :)
If you are testing a microservices "ball of mud", you can (and probably should) setup a testing environment and do your integration tests right there, against real dependencies. The tool seems nice for simple dependencies and local testing but I fail to see it as a game changer.
You mention this as an afterthought but that's the critical feature. Giving developers the ability to run integration tests locally is a massive win in a "ball of mud" environment. There are other ways to accomplish this locally, but the test-infrastructure-as-test-code approach is a powerful and conceptually elegant abstraction, especially when used as a tool to design testcontainers for your own services that can be imported as packages into dependent services.
Test containers provide a middle ground. For example, we have pure unit tests, but also some tests that boot up Postgres, test the db migration, and give you a db to play with for your specific "unit" test case.
No need for a complete environment with Kafka etc. It provides a cost effective stepping stone to what you describe.
What would be nice is if test containers could create a complete environment on the test machine and delete it again.
Still, a deploy with some smoke tests on a real env is nice.
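(For illustration, a rough Go sketch of that "boot up Postgres and test the db migration" kind of test, assuming the containerized Postgres hands its connection string to the test via a hypothetical TEST_PG_DSN variable and that migrations are plain .sql files under ../migrations; the helper names are made up.)

```go
package migrations_test

import (
	"database/sql"
	"os"
	"path/filepath"
	"sort"
	"testing"

	_ "github.com/lib/pq" // Postgres driver; pgx would work just as well
)

// applyMigrations runs every *.sql file in dir, in lexical order, against db.
// In this setup the DSN would point at a Postgres started by testcontainers.
func applyMigrations(t *testing.T, db *sql.DB, dir string) {
	t.Helper()
	entries, err := os.ReadDir(dir)
	if err != nil {
		t.Fatal(err)
	}
	var files []string
	for _, e := range entries {
		if filepath.Ext(e.Name()) == ".sql" {
			files = append(files, filepath.Join(dir, e.Name()))
		}
	}
	sort.Strings(files)
	for _, f := range files {
		stmts, err := os.ReadFile(f)
		if err != nil {
			t.Fatal(err)
		}
		if _, err := db.Exec(string(stmts)); err != nil {
			t.Fatalf("migration %s failed: %v", f, err)
		}
	}
}

func TestMigrationsApplyCleanly(t *testing.T) {
	dsn := os.Getenv("TEST_PG_DSN") // hypothetical: filled in by the container setup
	if dsn == "" {
		t.Skip("TEST_PG_DSN not set")
	}
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		t.Fatal(err)
	}
	defer db.Close()

	applyMigrations(t, db, filepath.Join("..", "migrations"))

	// The "db to play with": check the schema actually came up.
	var n int
	err = db.QueryRow(
		`SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public'`).Scan(&n)
	if err != nil {
		t.Fatal(err)
	}
	if n == 0 {
		t.Fatal("expected migrated tables in schema public")
	}
}
```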
I agree with this. At work we use both approaches but at different levels of the test pyramid.
To test integration with 1 dependency at class level we can use test containers.
But to test the integration of the whole microservice with other microservices + dependencies we use a test environment and some test code.
It's a bit like an E2E test for an API.
I would argue that the test environment is more useful if I had to choose between the two, as it can test the service contract fully, unlike lower-level testing, which requires a lot of mocking.
I very strongly disagree. Having "integration" tests is super powerful: you should be able to test against your interfaces/contracts and not have to spin up an entire environment with all of your hundreds of microservices, building "integrated" tests that are then dependent on the current correctness of the other microservices.
I advocate for not having any integrated environment for automated testing at all. The aim should be to be able to run all tests locally and get a quicker feedback loop.
Can you explain in more detail why this is a game changer if I already have an in-house framework that is similar in using docker for integration tests? Does it start docker up faster than you could do normally? Is it just the out-of-the-box APIs it provides?
I don't know why integration testing like this is considered a game changer. The testing pyramid is a pyramid for a reason, and it has always considered these tests important. Sometimes starting with integration tests in your project is right, because you don't waste time doing manual point-and-clicks. Instead you design your system around being able to integration test it, and this includes when you choose dependencies. You think to yourself "how easily will that be able to be stood up on its own from a command?" If the answer is "not very easily", then you move on.
If you have an existing in-house framework for anything, maybe it's not worth switching over. It does help though when a best practice bubbles to the top and makes this in reach for those who don't have an existing in-house framework and who wouldn't know how to get started on one. It also helps for more people to have a shared understanding about a subject like this thanks to a popular implementation.
Meanwhile, Testcontainers is done quite well. It's not perfect, but it's sure better than the in-house stuff I built in the past (for the same basic concept).
No, it does not start faster than other Docker containers.
I do challenge the testing pyramid, though. At the risk of repeating my other comment on a different branch of the discussion: the value of integration tests is high, and as the cost of integration tests has decreased, it makes sense to do more integration testing at the expense of unit testing. The cost has decreased exactly due to Docker and mature application frameworks (like, in Java, Spring). (See: Testing Trophy.)
FYI Docker already has a RESTful API, and programming container start/stop is trivial to do in any language. I haven't used Testcontainers before, and can kinda see the utility, but IMO it really isn't worth it in the long term to take on a new external dependency for a bit of code that (1) is a critical part of the team's development and release process and (2) can be written in-house in maybe an hour.
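For illustration of that point (and explicitly not a replacement for what Testcontainers does), a minimal Go sketch that drives the Engine API over the local unix socket; readiness checks, port discovery and log capture, which the replies point out are the actually hard parts, are left out:

```go
package dockerapi

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
)

// apiClient talks to the local Docker daemon over its unix socket, using the
// plain Engine REST API instead of any SDK or wrapper library.
func apiClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return (&net.Dialer{}).DialContext(ctx, "unix", "/var/run/docker.sock")
			},
		},
	}
}

// runContainer creates and starts a container and returns its ID.
func runContainer(ctx context.Context, image string, env []string) (string, error) {
	c := apiClient()

	body, _ := json.Marshal(map[string]any{
		"Image":      image,
		"Env":        env, // e.g. []string{"POSTGRES_PASSWORD=postgres"}
		"HostConfig": map[string]any{"PublishAllPorts": true},
	})
	req, _ := http.NewRequestWithContext(ctx, "POST",
		"http://docker/containers/create", bytes.NewReader(body))
	req.Header.Set("Content-Type", "application/json")
	resp, err := c.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusCreated {
		return "", fmt.Errorf("create: unexpected status %s", resp.Status)
	}
	var created struct{ Id string }
	if err := json.NewDecoder(resp.Body).Decode(&created); err != nil {
		return "", err
	}

	startReq, _ := http.NewRequestWithContext(ctx, "POST",
		"http://docker/containers/"+created.Id+"/start", nil)
	startResp, err := c.Do(startReq)
	if err != nil {
		return "", err
	}
	startResp.Body.Close()
	if startResp.StatusCode >= 300 {
		return "", fmt.Errorf("start: unexpected status %s", startResp.Status)
	}
	return created.Id, nil
}

// removeContainer force-removes the container (stop + delete in one call).
func removeContainer(ctx context.Context, id string) error {
	req, _ := http.NewRequestWithContext(ctx, "DELETE",
		"http://docker/containers/"+id+"?force=true", nil)
	resp, err := apiClient().Do(req)
	if err != nil {
		return err
	}
	resp.Body.Close()
	return nil
}
```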
> it really isn't worth it in the long term to take on a new external dependency for a bit of code that (1) is a critical part of the team's development and release process and (2) can be written in-house in maybe an hour.
This seems to be quite a contradiction. If it's so easy to just write from scratch, then why would it be scary to depend on? Of course, it's not that easy to write from scratch. You could make a proof-of-concept in maybe an hour... Maybe. But they already took the proof of concept to a complete phase. Made it work with Podman. Added tons of integration code to make it easy to use with many common services. Ported it to several different languages. And, built a community around it.
If you do this from scratch, you have to go through most of the effort and problems they already did, except if you write your own solution, you have to maintain it right from the get-go, whereas if you choose Testcontainers, you'll only wind up having to maintain it if the project is left for dead and starts to bitrot. The Docker API is pretty stable though, so honestly, this doesn't seem likely to be a huge issue.
Testcontainers is exactly the sort of thing open source is great for; it's something where everyone gets to benefit from the wisdom and battle-testing of everyone else. For most of the problems you might run into, there is a pretty decent chance someone already did, so there's a pretty decent chance it's already been fixed.
Most people have GitHub and Dockerhub dependencies in their critical dependency path for builds and deployment. Services go down, change their policies, deprecate APIs, and go under, but code continues to work if you replicate the environment it originally worked in. The biggest risk with code dependencies (for non-production code like test code) is usually that it blocks you from updating some other software. The biggest risk with services is that they completely disappear and you are completely blocked until you fully remove the dependency.
I think people depending on Testcontainers are fine and doing very well with their risk analysis.
I think they make the biggest difference when testing data pipelines (which have historically been difficult to test). You can now easily test out compatibility between different versions of databases, verify data types, embed as part of your build, etc.
I believe the next step, once using test containers, would be automating data generation and validation. Then you will have an automated pipeline of integration tests that are independent, fast and reliable.
You can automate data validation with snapshot tests. I do it this way with a data pipeline: I have a function that queries the destination DBs and puts the results into JSON, which is written out and validated against a snapshot.
Not sure how I hadn't encountered this before, I LOVE this pattern.
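A minimal sketch of that snapshot approach in Go, assuming the pipeline's destination database is reachable through a made-up openDestinationDB helper; the query, table, and file names are placeholders, and snapshots live under testdata/:

```go
package pipeline_test

import (
	"database/sql"
	"encoding/json"
	"flag"
	"os"
	"testing"
)

var update = flag.Bool("update", false, "rewrite snapshot files instead of comparing")

// snapshotJSON marshals got and compares it byte-for-byte with the snapshot at
// path; with -update it rewrites the snapshot instead of failing.
func snapshotJSON(t *testing.T, path string, got any) {
	t.Helper()
	b, err := json.MarshalIndent(got, "", "  ")
	if err != nil {
		t.Fatal(err)
	}
	if *update {
		if err := os.WriteFile(path, b, 0o644); err != nil {
			t.Fatal(err)
		}
		return
	}
	want, err := os.ReadFile(path)
	if err != nil {
		t.Fatal(err)
	}
	if string(want) != string(b) {
		t.Errorf("snapshot %s differs from pipeline output:\n%s", path, b)
	}
}

func TestPipelineOutput(t *testing.T) {
	db := openDestinationDB(t) // hypothetical helper: destination DB in a test container

	rows, err := db.Query(`SELECT id, total FROM daily_totals ORDER BY id`)
	if err != nil {
		t.Fatal(err)
	}
	defer rows.Close()

	type rec struct {
		ID    string  `json:"id"`
		Total float64 `json:"total"`
	}
	var out []rec
	for rows.Next() {
		var r rec
		if err := rows.Scan(&r.ID, &r.Total); err != nil {
			t.Fatal(err)
		}
		out = append(out, r)
	}
	snapshotJSON(t, "testdata/daily_totals.golden.json", out)
}

// Stub so the sketch is self-contained; wire this up to the real test database.
func openDestinationDB(t *testing.T) *sql.DB {
	t.Helper()
	t.Skip("connect to the pipeline's destination database here")
	return nil
}
```

Running go test ./... -update regenerates the golden files after an intentional change.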
I find integration tests that exercise actual databases/Elasticsearch/Redis/Varnish etc to be massively more valuable than traditional unit tests. In the past I've gone to pretty deep lengths to do things like spin up a new Elasticsearch index for the duration of a test suite and spin it down again at the end.
It looks like Testcontainers does all of that work for me.
My testing strategy is to have as much of my application's functionality covered by proper end-to-end integration-style tests as possible - think tests that simulate an incoming HTTP request and then run assertions against the response (and increasingly Playwright-powered browser automation tests for anything with heavy JavaScript).
I'll use unit tests sparingly, just for the bits of my code that have very clear input/output pairs that afford unit testing.
I only use mocks for things that I don't have any chance of controlling - calls to external APIs for example, where I can't control if the API provider will be flaky or not.
I love integration tests. You know why? Because I can safely refactor all I want!
Unit tests are great, but if you significantly refactor how several classes talk to each other, and each of those classes had their own, isolated unit tests that mocked out all of the others, you're suddenly refactoring with no tests. But a black box integration test? Refactor all your code, replace your databases, do whatever you want, the integration test still passes.
Unit test speed is a huge win, and they're incredibly useful for quickly testing weird little edge cases that are annoying to write integration tests for, but if I can write an integration test for it, I prefer the integration test.
Legit. Probably an unpopular opinion, but if I had to choose only one type of test (cue a long discussion with no resolution over defining exact taxonomic boundaries), I'd go with integration over unit. Especially if you're a new contributor to a project. I think it comes down to exercising the flow between... well, integrations across components.
Even better? Take your integration test, put it on a cronjob in your VPN/VPC, use real endpoints and make bespoke auth credentials + namespace, and now you have canaries. Canaries are IMHO God tier for whole system observability.
Then take your canary, clean it up, and now you have examples for documentation.
Unit tests are for me mostly testing domain+codomain of functions and adherence to business logic, but a good type system along with discipline for actually making schemas/POJOs etc instead of just tossing around maps strings and ints everywhere already accomplishes a lot of that (still absolutely needed though!)
Thanks for saying this out loud. I’m a solo dev and in my project I’m doing exactly this: 90% black box integration tests and 10% unit tests for edge cases I cannot trigger otherwise. It buys me precious time to not adjust tests after refactoring. Yet it made me feel like a heretic: everyone knows the testing pyramid and it comes from Google so I must be very wrong.
I've heard this aversion to unit tests a few times in my career, and I'm unable to make sense of it.
Sure, integration tests "save" you from writing pesky unit tests, and changing them frequently after every refactor.
But how do you quickly locate the reason that integration test failed? There could be hundreds of moving parts involved, and any one of them malfunctioning, or any unexpected interaction between them, could cause it to fail. The error itself would likely not be clear enough, if it's covered by layers of indirection.
Unit tests give you that ability. If written correctly, they should be the first to fail (which is a good thing!), and if an integration test fails, it should ideally also be accompanied by at least one unit test failure. This way it immediately pinpoints the root cause.
The higher up the stack you test, the harder it is to debug. With E2E tests you're essentially debugging the entire system, which is why we don't exclusively write E2E tests, even though they're very useful.
To me the traditional test pyramid is still the best way to think about tests. Tests shouldn't be an afterthought or a chore. Maintaining a comprehensive and effective test suite takes as much hard work as, if not more than, maintaining the application itself, and it should test all layers of the system. But if you do have that, it gives you superpowers to safely and reliably work on any part of the system.
how do you handle resetting a sql database after every integration test? Testcontainers may help here by spinning up a new instance for every test but that seems very slow
I once failed a take home assignment because of this. It was writing a couple of api endpoints and for testing, I focused on integration over unit. I even explained my reasoning in the writeup. There was no indication that the company preferred unit tests, but the feedback was "didn't have enough unit tests". What a dumb company.
Another technique I've found very useful is generative integration tests (kind of like fuzzing), especially for idempotent API endpoints (GETs).
For example, assuming you have a test database with realistic data (or scrubbed production data), write tests that are based on generalizable business rules, e.g.: the total line of an 'invoice' GET response should be the sum of all the 'sections' endpoint responses tied to that invoice id. Then, just have a process that runs before the tests to create a bunch of test cases (invoice IDs to try), randomly selected from all the IDs in the database. Limit the number of cases to something reasonable for total test duration.
As one would expect, overly tight assertions can often lead to many false positives, but really tough edge cases hidden in diverse/unexpected data (null refs) can be found that usually escape the artificial or 'happy path' pre-selected cases.
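A sketch of that kind of generative test in Go; allInvoiceIDs, getInvoice, getSections, the sample size of 200 and the rounding tolerance are all placeholders for whatever client code and business rules the real suite has:

```go
package api_test

import (
	"math"
	"math/rand"
	"testing"
)

type Invoice struct {
	ID    string
	Total float64
}

type Section struct {
	Amount float64
}

func TestInvoiceTotalsMatchSections(t *testing.T) {
	ids := allInvoiceIDs(t) // every invoice ID present in the test database
	rand.Shuffle(len(ids), func(i, j int) { ids[i], ids[j] = ids[j], ids[i] })
	if len(ids) > 200 { // cap the random sample to keep total test duration sane
		ids = ids[:200]
	}

	for _, id := range ids {
		inv := getInvoice(t, id)   // e.g. GET /invoices/{id}
		secs := getSections(t, id) // e.g. GET /invoices/{id}/sections

		var sum float64
		for _, s := range secs {
			sum += s.Amount
		}
		// Business rule under test: an invoice total is the sum of its sections.
		if math.Abs(inv.Total-sum) > 0.005 {
			t.Errorf("invoice %s: total %.2f != sum of sections %.2f", id, inv.Total, sum)
		}
	}
}

// Stubs so the sketch is self-contained; the real versions would call the API
// (or the database) that the test environment exposes.
func allInvoiceIDs(t *testing.T) []string { t.Helper(); t.Skip("wire up the test DB"); return nil }

func getInvoice(t *testing.T, id string) Invoice { t.Helper(); return Invoice{} }

func getSections(t *testing.T, id string) []Section { t.Helper(); return nil }
```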
Running unit tests as integration tests will explode in your face. In any decently complex code base, testing time will go through the roof and you will have a hard time getting the genie back in the bottle.
Testing that you actually run "sum()" is a unit test.
This is exactly the strategy I have discovered to bring the most value as well. And honestly, something that simplifies the setup of those containers is pretty great.
Yes, you just focus on a few high level behaviors that you want to validate, instead of the units. It’s more difficult to pull these tests off, as there are more chances for them to become flaky tests, but if they work they provide much more value.
I’d prefer a dozen well written integration tests over a hundred unit tests.
Having said that, both solve different problems, ideally you have both. But when time-constrained, I always focus on integration tests with actual services underneath.
Yeah - I find that sticking to tests like this means I don't have hundreds of tiny unit tests that rely on mocks, and it's still very supportive of refactoring - I can make some pretty big changes and be confident that I've not broken anything because a given request continues to return the expected response.
I didn't quite understand why this was made. We create our local test environments using docker-compose, and so I read:
> Creating reliable and fully-initialized service dependencies using raw Docker commands or using Docker Compose requires good knowledge of Docker internals and how to best run specific technologies in a container
This sounds like a <your programming language> abstraction over docker-compose, which lets you define your docker environment without learning the syntax of docker-compose itself. But then
> port conflicts, containers not being fully initialized or ready for interactions when the tests start, etc.
means you'd still need a good understanding of docker networking, dependencies, healthchecks to know if your test environment is ready to be used.
Am I missing something? Is this basically just changing what starts your docker test containers?
Shows how you can embed the declaration of a db for testing in a unit test:
> pgContainer, err := postgres.RunContainer(ctx,
> testcontainers.WithImage("postgres:15.3-alpine"),
> postgres.WithInitScripts(filepath.Join("..", "testdata", "init-db.sql")),
> postgres.WithDatabase("test-db"),
> postgres.WithUsername("postgres"),
> postgres.WithPassword("postgres"),
> testcontainers.WithWaitStrategy(
> wait.ForLog("database system is ready to accept connections").
This does look quite neat for setting up test specific database instances instead of spawning one outside of the test context with docker(compose). It should also make it possible to run tests that require their own instance in parallel.
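Since the quoted snippet is cut off mid-expression, this is roughly how the full call could look with the testcontainers-go postgres module it uses, including teardown and fetching a connection string; the wait options and timeout here are illustrative, not taken from the article:

```go
package db_test

import (
	"context"
	"path/filepath"
	"testing"
	"time"

	"github.com/testcontainers/testcontainers-go"
	"github.com/testcontainers/testcontainers-go/modules/postgres"
	"github.com/testcontainers/testcontainers-go/wait"
)

func TestWithPostgres(t *testing.T) {
	ctx := context.Background()

	// Start a throwaway Postgres seeded by an init script, and wait until the
	// server logs that it is ready (the line appears twice during startup).
	pgContainer, err := postgres.RunContainer(ctx,
		testcontainers.WithImage("postgres:15.3-alpine"),
		postgres.WithInitScripts(filepath.Join("..", "testdata", "init-db.sql")),
		postgres.WithDatabase("test-db"),
		postgres.WithUsername("postgres"),
		postgres.WithPassword("postgres"),
		testcontainers.WithWaitStrategy(
			wait.ForLog("database system is ready to accept connections").
				WithOccurrence(2).
				WithStartupTimeout(30*time.Second)),
	)
	if err != nil {
		t.Fatal(err)
	}
	// Tear the container down when this test (and its subtests) finish.
	t.Cleanup(func() { _ = pgContainer.Terminate(ctx) })

	// DSN for database/sql, pgx, etc.
	dsn, err := pgContainer.ConnectionString(ctx, "sslmode=disable")
	if err != nil {
		t.Fatal(err)
	}
	_ = dsn // open a *sql.DB here and run the actual assertions
}
```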
This seems great but is actually quite slow. This will create a new container, with a new postgres server, and a new database in that server, for each test. You'll then need to run migrations in that database. This ends up being a huge pain in the ass.
A better approach is to create a single postgres server one-time before running all of your tests. Then, create a template database on that server, and run your migrations on that template. Now, for each unit test, you can connect to the same server and create a new database from that template. This is not a pain in the ass and it is very fast: you run your migrations one time, and pay a ~20ms cost for each test to get its own database.
I've implemented this for golang here — considering also implementing this for Django and for Typescript if there is enough interest. https://github.com/peterldowns/pgtestdb
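The linked pgtestdb packages this up; as a rough sketch of the underlying idea, assuming one shared Postgres server (started once, e.g. from TestMain), an admin DSN in key=value form without a dbname, and a pre-migrated template database named app_template:

```go
package dbtest

import (
	"database/sql"
	"fmt"
	"sync/atomic"
	"testing"

	_ "github.com/lib/pq"
)

var dbCounter atomic.Int64

// newTestDB clones a pre-migrated template database into a fresh database for
// this test and returns a connection to it. adminDSN points at the one shared
// Postgres server; "app_template" is the database the migrations were run
// against, once, before the tests started.
func newTestDB(t *testing.T, adminDSN string) *sql.DB {
	t.Helper()

	admin, err := sql.Open("postgres", adminDSN)
	if err != nil {
		t.Fatal(err)
	}
	defer admin.Close()

	name := fmt.Sprintf("test_%d", dbCounter.Add(1))
	// Cloning from a template copies schema and seed data in tens of
	// milliseconds, instead of re-running migrations for every test. Note that
	// CREATE DATABASE fails if anything is still connected to the template.
	if _, err := admin.Exec("CREATE DATABASE " + name + " TEMPLATE app_template"); err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() {
		if a, err := sql.Open("postgres", adminDSN); err == nil {
			defer a.Close()
			_, _ = a.Exec("DROP DATABASE IF EXISTS " + name)
		}
	})

	db, err := sql.Open("postgres", adminDSN+" dbname="+name)
	if err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() { db.Close() }) // runs before the DROP above (LIFO)
	return db
}
```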
As a user of testcontainers I can tell you they are very powerful yet simple.
Indeed all they do is provide an abstraction for your language, but this is soo useful for unit/integration tests.
At my work we have many microservices in both Java and Python, all of which use testcontainers to set up the local env or integration tests. The integration with localstack and the ability to programmatically set it up without fighting with compose files is something I find very useful.
Testcontainers is great. It's got seamless junit integration and really Just Works. I've never once had to even think about any of the docker aspects of it. There's really not much to it.
It’s not coming across in your comment, but Testcontainers can work with unit tests to start a container, run the unit tests and shut down. For example, to verify database operations against the actual database, the unit test can start an instance of Postgres, run tests and then shut it down. If running tests in parallel, each test can start its own container and shut it down at the end.
Wouldn't that just massively, _massively_ slow down your tests, if each test was spinning up its own Postgres container?
I ask because I really like this and would love to use it, but I'm concerned that that would add just an insane amount of overhead to the point where the convenience isn't worth the immense amount of extra time it would take.
Testcontainers are for testing individual components, apart from the application.
I built a new service registry recently; its unit tests spin up a zookeeper instance for the duration of the test, and then kill it.
Also very nice with databases. Spin up a clean db, run migrations, then test db code with zero worries about accidentally leaving stuff in a table that poisons other tests.
I guess the killer feature is how well it works.
> Also very nice with databases. Spin up a clean db, run migrations, then test db code with zero worries about accidentally leaving stuff in a table that poisons other tests.
Are you spinning up a new instance between every test case? Because that sounds painfully slow.
I would just define a function which DELETEs all the data and call it between every test.
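A tiny Go helper in that spirit; it uses TRUNCATE rather than DELETE, which on Postgres is usually the cheaper way to wipe tables between cases:

```go
package dbtest

import (
	"database/sql"
	"strings"
	"testing"
)

// resetTables wipes the listed tables between test cases. On Postgres,
// TRUNCATE ... RESTART IDENTITY CASCADE is usually much cheaper than DELETE
// and far cheaper than restarting the database container.
func resetTables(t *testing.T, db *sql.DB, tables ...string) {
	t.Helper()
	_, err := db.Exec(
		"TRUNCATE TABLE " + strings.Join(tables, ", ") + " RESTART IDENTITY CASCADE")
	if err != nil {
		t.Fatal(err)
	}
}
```

Call it at the start of each test, e.g. resetTables(t, db, "orders", "customers") (the table names are placeholders).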
This looks to be just language-specific bindings over the docker compose syntax. You're right that docker compose handles all of the situations they describe.
The major issue I had with docker compose in my CI environment is flaky tests when a port is already used by another job I don't control. With testcontainers, I haven't seen any false positive as I can use whatever port is available and not a hardcoded one hoping it won't conflict with what other people are doing.
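That is the effect of not binding a fixed host port at all and asking for the mapped one afterwards; a sketch with testcontainers-go's generic container API, with Redis standing in as an arbitrary example image:

```go
package cache_test

import (
	"context"
	"testing"

	"github.com/testcontainers/testcontainers-go"
	"github.com/testcontainers/testcontainers-go/wait"
)

func TestWithRandomHostPort(t *testing.T) {
	ctx := context.Background()

	// No host port is specified, so Docker picks a free one; parallel CI jobs
	// on the same machine can't collide on a hardcoded port.
	c, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: testcontainers.ContainerRequest{
			Image:        "redis:7-alpine",
			ExposedPorts: []string{"6379/tcp"},
			WaitingFor:   wait.ForListeningPort("6379/tcp"),
		},
		Started: true,
	})
	if err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() { _ = c.Terminate(ctx) })

	host, err := c.Host(ctx)
	if err != nil {
		t.Fatal(err)
	}
	port, err := c.MappedPort(ctx, "6379/tcp")
	if err != nil {
		t.Fatal(err)
	}
	addr := host + ":" + port.Port() // ask the library where Docker put it
	_ = addr                         // hand this to the client under test
}
```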
I looked at testcontainers and ended up rolling my own version. One issue I had is that Docker is a very leaky abstraction. I needed to write one test and have it run in all these scenarios:
- on a Mac
- on a Linux VM
- in a Docker container on a Linux VM, with a Docker socket mounted
The networking for each of these is completely different. I had to make some opinionated choices to get code that could run in all cases. And running inside Docker prevented the test from being able to mount arbitrary files into the test containers, which turns out to be a requirement often. I ended up writing code to build a new image for each container, using ADD to inject files.
I also wanted all the tests to run in parallel and spit out readable logs from every container (properly associated with the correct test).
Not sure if any of these things have changed in testcontainers since I last looked, but these are the things I ran into. It took maybe a month of off and on tweaking, contrary to some people here claiming it can be done in an hour. As always, the devil is in the details.
edit: I did end up stealing ryuk. That thing can’t really be improved upon.
Many of the Mac networking specifics have become less of a problem. I use Rancher Desktop, which uses the correct virtualization framework based on OSX versions and allows you to customize the lima and virtual machine provisioning scripts so that you don't have cross-platform headaches like this (for the most part). Also runs kubernetes locally out of the box so you can test app deployments without waiting around for resources to free up on your shared dev cluster. Newest versions have almost everything Docker Desktop has. Highly recommend if you are on mac.
What does that mean in this case? What does a hand rolled version of this look like?
My version is more opinionated than testcontainers and can really only be used inside Go tests (relies on a testing.TB)
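Not the commenter's actual code, but a hand-rolled helper along those lines can be little more than a testing.TB wrapper around the docker CLI; what this sketch deliberately skips (readiness waiting, log capture, file mounts) is where the month of tweaking mentioned above tends to go:

```go
package containertest

import (
	"os/exec"
	"strings"
	"testing"
)

// Run starts an image via the docker CLI with all exposed ports published to
// random host ports (-P) and registers cleanup on the test. It returns the
// "host:port" mapping for the given container port.
func Run(tb testing.TB, image, containerPort string, extra ...string) string {
	tb.Helper()

	args := append([]string{"run", "-d", "-P"}, extra...)
	args = append(args, image)
	out, err := exec.Command("docker", args...).Output()
	if err != nil {
		tb.Fatalf("docker run %s: %v", image, err)
	}
	id := strings.TrimSpace(string(out))
	tb.Cleanup(func() {
		_ = exec.Command("docker", "rm", "-f", "-v", id).Run()
	})

	// `docker port <id> <port>` prints e.g. "0.0.0.0:49153" (first line wins).
	out, err = exec.Command("docker", "port", id, containerPort).Output()
	if err != nil {
		tb.Fatalf("docker port: %v", err)
	}
	return strings.TrimSpace(strings.Split(string(out), "\n")[0])
}
```

Used like addr := containertest.Run(t, "postgres:16-alpine", "5432/tcp", "-e", "POSTGRES_PASSWORD=postgres"); unlike Testcontainers, nothing here waits for the service to actually be ready, which is exactly the gap a real version has to fill.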
> No more need for mocks or complicated environment configurations. Define your test dependencies as code, then simply run your tests and containers will be created and then deleted.
Wait what? They think you don't need unit tests because you can run integration tests with containers?
It's trivial to set up a docker container with one of your dependencies, but starting containers is painful and slow.
1) At least in the Java world, the term "unit testing" is often confused with "things you do in JUnit", which covers both "pure" unit tests and project-level integration tests, i.e. spinning up an application context (like Spring) and testing against real REST endpoints etc.
2) While unit tests are cheaper and quicker than (project-level) integration tests, they also in many cases don't provide as good a result and level of confidence, because a lot of run-time aspects (serialization, HTTP responses, database responses, etc.) are not as straightforward to mock. There's been some noise about the Testing Trophy, as opposed to the Testing Pyramid, where, in short, there are still unit tests where they make sense, but a lot of testing has moved to the (project-level) integration tests. These are slower, but only by so much that the trade-off is often worth it. Whether it's worth it depends heavily on what you're testing. If it's a CRUD API: I use integration tests. If it's something algorithmic, or string manipulation, etc.: I use unit tests.
When I saw the Testing Trophy presented, it came with the asterisk that (project-level) integration testing has gotten easier and cheaper over time, thus allowing a shift in trade-off. Testcontainers is one of the primary reasons why this shift has happened. (And... I respect that it's not for everyone.)
Some references: https://kentcdodds.com/blog/the-testing-trophy-and-testing-c... and https://www.youtube.com/watch?v=t-HmXomntCA
Yeah, certainly I see the value of those kinds of tests. And clearly as you say the simpler tests don't provide as realistic a simulation as the more expensive tests.
But on the test philosophy angle, my take on what's happening is just that developers traditionally look for any reason to skip tests. I've seen this in a few different forms.
- right now containers make it trivial to run all of your dependencies. That's much easier than creating a mock or a fake, so we do that and don't bother creating a mock/fake.
- compiler folks have created great static analysis tools. That's easier than writing a bunch of tests, so we'll just assume static analysis will catch our bugs for us.
- <my language>'s type system does a bunch of work type checking, so I don't need tests. Or maybe I just need randomly generated property tests.
- no tests can sufficiently emulate our production environment, so tests are noise and we'll work out issues in dev and prod.
What I've noticed, though, looking across a wide number of software projects, is that there's a clear difference in quality between projects that have a strong testing discipline and those that convince themselves they don't need tests because of <containers, or types, or whatever else>.
Sure it's possible that tests don't cause the quality difference (maybe there's a third factor for example that causes both). And of course if you have limited resources you have to make a decision about which quality assurance steps to cut.
But personally I respect a project more if they just say they don't have the bandwidth to test properly so they're skipping to the integration stage (or whatever), rather than convincing themselves that those tests weren't important anyway. Because I've seen so many projects that would have been much better with even a small number of unit tests where they only had integration tests.
I have been doing E2E testing exclusively for close to a decade on several apps and it works great.
Note, not integration, E2E. I can go from bare vm to fully tested system in under 15 minutes. I can re-run that test in 1-5 (depending on project) ...
I'm creating 100s of records in that time, and fuzzing a lot of data entry. I could get it to go "even faster" if I went in and removed some of the stepwise testing... A->B->C->D could be broken out to a->b, a->c, a->d.
Because my tests are external, they would be durable across a system re-write (if I need to change language, platform etc). They can also be re-used/tweaked to test system perf under load (something unit tests could never do).
No mocks doesn't mean no tests. It means running tests against the full code path which includes requests to running instances of the services you might otherwise mock. For many apps and use cases, the overhead in managing container state is worth it.
> It means running tests against the full code path which includes requests to running instances of the services you might otherwise mock.
Yeah, those are called end to end tests and you run them after integration tests which you run after unit tests. It sounds to me like they're saying just skip to the end to end tests.
> For many apps and use cases, the overhead in managing container state is worth it.
Yeah, and typically you'd run them after you run unit and integration tests. If I have 10 libraries to test that have database access, I have to run 10 database containers simultaneously every few minutes as part of the development process? That's overkill.
Testcontainers is awesome and all the hate it gets here is undeserved.
Custom shell scripts definitely can't compete.
For example one feature those don't have is "Ryuk": A container that testcontainers starts which monitors the lifetime of the parent application and stops all containers when the parent process exits.
It allows the application to define dependencies for development, testing, and CI itself, without needing to manually run some command to bring up docker compose beforehand.
One cool use case for us is also having an ephemeral database container that is started in a Gradle build to generate jOOQ code from tables defined in a Liquibase schema.
I don't understand how this is better than a docker-compose.yml with your dependencies, which plays nicer with all other tooling.
Especially if there are complex dependencies between required containers it seems to be pretty weak in comparison. But I also only used it like 5 years ago, so maybe things are significantly better now.
One specific case that I encountered recently was implementing "integration" tests, where I needed to test some behavior that relies on the global state of a database. All other tests before were easily parallelized, and this meant our whole service could be fully tested within 10-30 seconds (dev machine vs. pipeline).
However, the new tests could not be run in parallel with the existing ones, as the changes in global state in the database caused flaky failures. I know there will be other tests like them in the future, so I want a robust way of writing these kinds of "global" tests without too much manual labor.
Spinning up a new postgres instance for each of these specific tests would be one solution.
I would like to instead go for running the tests inside of transactions, but that comes with its own sorts of issues.
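For reference, the transaction-per-test variant is only a few lines of Go, and the issues it comes with are mostly captured in the comment: the code under test has to accept the test's transaction and must not manage transactions itself:

```go
package dbtest

import (
	"database/sql"
	"testing"
)

// txForTest opens a transaction that is always rolled back when the test ends,
// so nothing the test writes ever becomes visible to tests running in
// parallel. The known catch: the code under test must run against this *sql.Tx
// (or an interface satisfied by both *sql.DB and *sql.Tx) and cannot commit or
// manage transactions itself.
func txForTest(t *testing.T, db *sql.DB) *sql.Tx {
	t.Helper()
	tx, err := db.Begin()
	if err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() { _ = tx.Rollback() })
	return tx
}
```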
Because you may want to spin up a new postgres database to test a specific scenario in an automated way. Testcontainers allows you to do that from code, for example you could write a pytest fixture to provide a fresh database for each test.
I don't know about Postgres but a MySQL container can take at least a few seconds to start up and report back as healthy on a Macbook Pro. You can just purge tables on the existing database in between tests, no need to start up a whole new database server each time.
Testcontainers integrates with docker-compose [1]. Now you can run your tests through a single build-tool task without having to run something else alongside your build. You can even run separate compose files for just the parts your test touches.
[1] https://java.testcontainers.org/modules/docker_compose/