- Marking and sweeping cause latency spikes, which may be unacceptable if your program needs millisecond responsiveness.
- GC happens intermittently, which means garbage accumulates between collections, so your program is less memory-efficient overall.
With modern concurrent collectors like Java's ZGC, that's no longer the case: they run concurrently with the application and show sub-millisecond pause times. The trade-off is higher CPU utilization and thus reduced overall throughput, which, when it actually becomes a problem, can often be mitigated by scaling out to more compute nodes.
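If you want to see what the collectors are actually doing in your own workload, the JDK exposes cumulative collection counts and times through the standard `GarbageCollectorMXBean` API. A minimal sketch (the bean names and exactly what the reported time means vary by collector and JDK version; with ZGC you'll typically see separate "Pauses" and "Cycles" beans):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcReport {
    public static void main(String[] args) {
        // Churn through short-lived allocations so the collector has work to do.
        long sink = 0;
        for (int i = 0; i < 5_000_000; i++) {
            sink += new byte[128].length;
        }

        // Print cumulative collection counts and accumulated time per collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("bytes allocated (approx): " + sink);
    }
}
```

Running it once with the default collector and once with `java -XX:+UseZGC GcReport` makes the pause-time/throughput trade-off easy to eyeball on your own allocation pattern.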
Iceberg is optimized for fact data in very large tables with relatively rare changes to the data and likewise rare changes to the schema. It does that well and will continue to do so for the foreseeable future.
PostgreSQL databases typically don't generate huge amounts of data, and that data can be highly mutable in many cases. On top of that, the schema can change substantially. Both kinds of change are hard to manage in replication, especially if the target is a system, like Iceberg, that does not handle change very well in the first place.
So that leaves the case where you have a lot of data in PostgreSQL and it's creating bad economics. In that case, why not skip PostgreSQL and put the data in an analytic database to begin with?
P.S. I'm pretty familiar with trading systems that archive transaction data to data lakes as Parquet for long-term analytics and compliance. That is a different problem: the data is, for all intents and purposes, immutable.
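That kind of append-only archiving stays simple precisely because nothing changes after the fact. A rough sketch of what one archive step might look like with the parquet-avro writer; the `Trade` schema and file name are made up for illustration, and newer Parquet releases prefer an `OutputFile` over the deprecated `Path`-based builder used here:

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class TradeArchiver {
    public static void main(String[] args) throws Exception {
        // Hypothetical schema for an immutable trade record.
        Schema schema = SchemaBuilder.record("Trade").fields()
                .requiredLong("tradeId")
                .requiredString("symbol")
                .requiredDouble("price")
                .requiredLong("timestampMillis")
                .endRecord();

        // Write a daily batch file; once written, it is never modified.
        try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
                .<GenericRecord>builder(new Path("trades-2024-06-01.parquet"))
                .withSchema(schema)
                .withCompressionCodec(CompressionCodecName.SNAPPY)
                .build()) {
            GenericRecord trade = new GenericData.Record(schema);
            trade.put("tradeId", 1L);
            trade.put("symbol", "ACME");
            trade.put("price", 101.25);
            trade.put("timestampMillis", System.currentTimeMillis());
            writer.write(trade);
        }
    }
}
```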
Edit: clarity
The live data set may not be huge, but the full trail of changes to all current and previously existing data can easily exceed the volume you can reasonably process with Postgres.
In addition, its row-based storage format isn't an ideal fit for typical analytical queries over large amounts of data.
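To make that row-versus-column point concrete, here's a toy sketch (not how either system is implemented, just the access-pattern difference): an aggregate over one attribute only needs that attribute, and a columnar layout keeps exactly those values contiguous, while a row layout drags every other field of every row along.

```java
import java.util.ArrayList;
import java.util.List;

public class RowVsColumn {
    // Row-oriented record: one object per row, all attributes stored together.
    record Order(long id, String customer, double amount, String note) {}

    public static void main(String[] args) {
        int n = 1_000_000;

        // Row layout, roughly what a heap table gives you.
        List<Order> rows = new ArrayList<>(n);
        // Column layout: one contiguous primitive array per attribute.
        double[] amounts = new double[n];

        for (int i = 0; i < n; i++) {
            rows.add(new Order(i, "cust-" + i, i * 0.01, "..."));
            amounts[i] = i * 0.01;
        }

        // Same aggregate over both layouts. The columnar scan reads only the
        // bytes it needs, sequentially, which is what analytical engines and
        // formats like Parquet under Iceberg exploit.
        double rowSum = 0;
        for (Order o : rows) rowSum += o.amount();

        double colSum = 0;
        for (double a : amounts) colSum += a;

        System.out.printf("row sum=%.2f, column sum=%.2f%n", rowSum, colSum);
    }
}
```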
Replicating the data from Postgres to Iceberg addresses these issues. But, of course, it's not without its own challenges, as demonstrated by the article.