Readit News
neal commented on A write-ahead log is not a universal part of durability   notes.eatonphil.com/2024-... · Posted by u/todsacerdoti
pjdesno · a year ago
Unfortunately this post skips over the "atomicity" part of a write-ahead log.

Assume you start with data on disk AAAAAAAA, read it into memory, and update it to BBBBBBBB, then write it back. If you crash in the middle, you might end up with BBBAAAAA, BBBBBBAA, or even some crazy interleaving. (at least for reasonable file sizes - note that the largest atomic write to many NVMe drives is 128K)
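To make the torn-write case concrete, here is a minimal sketch in Python. It assumes the simplest failure model, where a crash leaves a prefix of the new data followed by the rest of the old data; `torn_write` is a hypothetical helper, and real devices can also produce the interleavings mentioned above, which this does not model.

```python
def torn_write(old: bytes, new: bytes, bytes_written: int) -> bytes:
    """Return the on-disk contents if an in-place overwrite of `old` with
    `new` is interrupted after `bytes_written` bytes have reached the disk."""
    if len(old) != len(new):
        raise ValueError("in-place overwrite requires equal lengths")
    return new[:bytes_written] + old[bytes_written:]


old, new = b"AAAAAAAA", b"BBBBBBBB"

# Every state a crash could leave behind under this simple prefix model.
states = [torn_write(old, new, n) for n in range(len(old) + 1)]

for state in states:
    print(state.decode())
```

Only the first and last of those states are ones the application ever intended to exist; everything in between is what recovery has to cope with.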

If you ditch the in-memory BTree and write a direct-to-disk one, with a lot of care (and maybe a bit of copy-on-write) you can make sure that each disk write leaves the database in a crash-consistent state, but that will cost multiple writes and fsyncs for any database modification that splits or merges BTree nodes.

(for those of you old enough to remember ext2, it had the same problem. If you mounted it async and had a bad crash, the data on disk would be inconsistent - you'd lose data, so you'd vow to always mount your filesystem with synchronous writes so you'd never lose data again, then a few weeks later you'd get tired of the crappy performance and go back to async writes, until the next crash happened, etc. etc.)

The advantage of a log is that it allows you to combine multiple writes to different parts of the database file into a single record, guaranteeing (after crash recovery if necessary) that either all changes happen or none of them do. It serves the same purpose as a mutex in multi-threaded code - if your invariants hold when you get the mutex, and you reestablish them before you drop it, everything will be fine. We'd all love to have a mutex that keeps the system from crashing, but failing that we can use a WAL record to ensure that we move atomically from one valid state to another, without worrying about the order of intermediate changes to the data structure.
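A rough sketch of that all-or-nothing property, in Python. The record framing (length, CRC32, JSON payload) and the names `encode_record` and `recover` are made up for illustration and are not any particular database's format; the database file is stood in for by a `bytearray`.

```python
import json
import struct
import zlib


def encode_record(changes):
    """Pack several (offset, data) changes into ONE log record.

    Frame layout (an assumption for this sketch): 4-byte length,
    4-byte CRC32 of the payload, then the payload itself.
    """
    payload = json.dumps([[off, data.hex()] for off, data in changes]).encode()
    return struct.pack("<II", len(payload), zlib.crc32(payload)) + payload


def recover(log: bytes, db: bytearray) -> int:
    """Replay every complete record in `log` onto `db`.

    A record that is truncated or fails its checksum is treated as never
    written, so none of its changes are applied. Returns records applied.
    """
    pos = applied = 0
    while pos + 8 <= len(log):
        length, crc = struct.unpack_from("<II", log, pos)
        payload = log[pos + 8 : pos + 8 + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn tail of the log: stop, apply nothing from it
        for off, hexdata in json.loads(payload):
            data = bytes.fromhex(hexdata)
            db[off : off + len(data)] = data
        pos += 8 + length
        applied += 1
    return applied


record = encode_record([(0, b"BB"), (6, b"BB")])

db = bytearray(b"AAAAAAAA")
recover(record, db)            # complete record: both changes land

torn_db = bytearray(b"AAAAAAAA")
recover(record[:-1], torn_db)  # record cut short by a crash: neither lands
```

Because each record stores the new bytes for each offset, replaying it twice gives the same result, so recovery can safely re-run after a second crash.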

neal · a year ago
Good point!

I'm not sure if there are any databases that do your 'with a lot of care' option, but for anyone curious about what that might look like in practice there are file systems that forgo write-ahead logging and maintain metadata consistency using techniques like soft updates[0] or copy-on-write up to a root[1].

[0]: https://www.usenix.org/conference/1999-usenix-annual-technic...

[1]: https://www.cs.hmc.edu/~rhodes/courses/cs134/fa20/readings/T... (yes, ZFS can be configured to use a WAL too for durability)
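For a feel of what "copy-on-write up to a root" means, here is a small in-memory sketch in Python. It is only an illustration of the idea using a plain binary search tree, not how ZFS or any real file system lays out blocks; `Node`, `cow_insert`, and `lookup` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable tree node: an update never modifies an existing node."""
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def cow_insert(node: Optional[Node], key: int, value: str) -> Node:
    """Return a NEW root with `key` set; only nodes on the path are copied."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        return Node(node.key, node.value, cow_insert(node.left, key, value), node.right)
    if key > node.key:
        return Node(node.key, node.value, node.left, cow_insert(node.right, key, value))
    return Node(key, value, node.left, node.right)


def lookup(node: Optional[Node], key: int) -> Optional[str]:
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None


old_root = None
for k, v in [(5, "a"), (3, "b"), (8, "c")]:
    old_root = cow_insert(old_root, k, v)

# The update builds a new path next to the old tree. "Committing" is then a
# single pointer swap from old_root to new_root; until that swap, a crash
# leaves old_root fully intact.
new_root = cow_insert(old_root, 3, "B")
```

On disk, the equivalent of the pointer swap is one small write of the root location, which is why this approach can stay consistent without a log.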

neal commented on Ask HN: Who needs a job?    · Posted by u/d4ft
neal · 15 years ago
Graduating in September with a MS in computer science.

I'm looking for a software development position. My research area was computer networks, but I'm interested in other areas as well. I'm more interested in working on back-end type of stuff than front-end/UI. I'm in the Boston area, but willing to relocate.

http://nealc.com

u/neal

Karma: 3 · Cake day: February 1, 2009