Agree with others saying HN needs more content like this!
After reading I don’t get how locks held in memory affect WAL shipping. The WAL reader reads it in a single thread and updates in-memory data structures, periodically dumping them to disk. Perhaps you want to read one big instruction from the WAL and apply it to many buffers using multiple threads?
>Adapting algorithms to work atomically at the block level is table stakes for physical replication
Why? To me the only thing you have to do atomically is the WAL write. WAL readers can read and write however they want, given that they can detect partial writes and replay the WAL.
>If a VACUUM is running on the primary at the same time that a query hits a read replica, it's possible for Postgres to abort the read.
The situation you're referring to is:
1. Record inserted
2. Standby long query started
3. Record removed
4. Primary vacuum started
5. Vacuum replicated
6. Vacuum on standby cannot remove record because it is being read by the long query.
7. PG cancels the query to let the vacuum proceed (the standby-side settings sketched below govern this behavior).
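For reference, the standby-side settings that govern this trade-off look roughly like this (assuming streaming physical replication; the values are purely illustrative):

```sql
-- Illustrative values; set on the standby (via ALTER SYSTEM or
-- postgresql.conf), then reload.

-- Let a conflicting query hold off WAL replay for up to 5 minutes
-- before it gets cancelled:
ALTER SYSTEM SET max_standby_streaming_delay = '5min';

-- Or report the standby's oldest snapshot back to the primary, so VACUUM
-- keeps the rows the standby still needs (at the cost of bloat there):
ALTER SYSTEM SET hot_standby_feedback = on;

SELECT pg_reload_conf();
```

The first trades replay lag on the standby; the second trades bloat on the primary.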
I guess your implementation generates a lot of dead tuples during compaction. You're clearly fighting PG here. Could a custom storage engine be a better option?
> After reading I don’t get how locks held in memory affect WAL shipping. The WAL reader reads it in a single thread and updates in-memory data structures, periodically dumping them to disk. Perhaps you want to read one big instruction from the WAL and apply it to many buffers using multiple threads?
We currently use an un-modified/generic WAL entry, and don't implement our own replay. That means we don't control the order of locks acquired/released during replay; the default is to acquire exactly one lock to update a buffer.
But as far as I know, even with a custom WAL entry implementation, the maximum in one entry would still be ~8k, which might not be sufficient for a multi-block atomic operation. So the data structure needs to support block-at-a-time atomic updates.
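If it helps, the per-record, per-block structure is visible with pg_walinspect (PostgreSQL 15+); a rough sketch, with a placeholder start LSN:

```sql
-- Rough sketch: requires the pg_walinspect extension (PostgreSQL 15+),
-- run on a primary; the start LSN is a placeholder.
CREATE EXTENSION IF NOT EXISTS pg_walinspect;

-- One row per WAL record; block_ref lists the buffers that record touches,
-- and generic replay locks and updates those buffers one record at a time.
SELECT start_lsn, resource_manager, record_length, block_ref
FROM pg_get_wal_records_info('0/1000000', pg_current_wal_lsn());
```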
> I guess your implementation generates a lot of dead tuples during compaction. You're clearly fighting PG here. Could a custom storage engine be a better option?
`pg_search`'s LSM tree is effectively a custom storage engine, but it is an index (Index Access Method and Custom Scan) rather than a table. See more on it here: https://www.paradedb.com/blog/block_storage_part_one
LSM compaction does not generate any dead tuples on its own: what counts as "dead" is controlled by the heap/table, via deletes/updates. Instead, the LSM is cycling blocks into and out of a custom free space map (which we implemented to reduce WAL traffic).
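To make the "index, not a table" distinction concrete, usage looks roughly like this (illustrative table and column names; see the docs for the full syntax):

```sql
-- Hypothetical table; `bm25` is the index access method provided by pg_search.
CREATE TABLE items (id serial PRIMARY KEY, description text);

CREATE INDEX items_search_idx ON items
    USING bm25 (id, description)
    WITH (key_field = 'id');

-- The query is answered through the index, but row visibility (and thus
-- what is "dead") is still owned by the heap table.
SELECT id, description
FROM items
WHERE description @@@ 'kitty pictures';
```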
Why not just use OpenSearch or ElasticSearch? The tool is already in the inventory; why use a screwdriver when a chisel is needed and available?

Because it's a whole other architectural component for data that is already there, in a tool that is made for exactly this and is missing only a `SELECT full_text_search('kitty pictures');`.
Running elasticsearch is a miserable experience. If you can run one tool that you already have with slightly more effort, amazing. And you never need to think about rebuilding indexes or tuning the garbage collector or planning an ES major version migration.
There can be multiple reasons; one I can think of right away would be to keep the stack as simple as possible for as long as you can. Realistically speaking, most companies do not operate at a scale where they would need the specialized tools.
There is a cost associated with adopting and integrating another tool like ElasticSearch. For some orgs, the ROI might not be there. And if their existing database provides some additional capabilities in this space, that might be preferable.
> This is another one of those “when you have a hammer, everything looks like your thumb” stories.
Are you referring to people who think that every reporting problem must be solved by a dedicated OLAP database?