pzmarzly · a year ago

  -g kB       Remove old log lines when the in-memory database crosses x kB
Seems like garbage collection is only implemented for the in-memory database (by reading SQLITE_DBSTATUS_CACHE_USED). Maybe logrotate could be set up to do it instead, but nothing in the documentation indicates so.

Otherwise looks like a great project.

thebeardisred · a year ago
What's the maximum write speed? At what point do you start losing log messages?
cryptonector · a year ago
You can amortize the write speed significantly by not committing often, either in the sense of a SQL `COMMIT` or in the sense of doing a _synchronous_ `COMMIT`. You could commit every N seconds, say, for some sufficiently large N, or you could commit after N seconds of idle time and no more than M seconds since the last commit. You can also disable `fsync()`, commit often, and re-enable `fsync()` once every N seconds. There are many tactics you can use for data where some loss due to power failure is tolerable.
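A minimal sketch of the commit-every-N-seconds idea, using Python's stdlib sqlite3 (the class, table, and interval here are illustrative, not from the project):

```python
import sqlite3
import time

# Hypothetical batching writer (not the project's actual code): inserts
# arrive continuously, but we only COMMIT once `interval` seconds have
# elapsed, trading a bounded durability window for throughput.
class BatchedLogWriter:
    def __init__(self, path, interval=5.0):
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS logs (ts REAL, msg TEXT)")
        self.conn.commit()
        self.interval = interval
        self.last_commit = time.monotonic()

    def write(self, msg):
        self.conn.execute("INSERT INTO logs VALUES (?, ?)", (time.time(), msg))
        # Commit only when the interval has elapsed; a crash loses at
        # most `interval` seconds of log lines.
        if time.monotonic() - self.last_commit >= self.interval:
            self.conn.commit()
            self.last_commit = time.monotonic()

    def flush(self):
        self.conn.commit()
        self.last_commit = time.monotonic()

w = BatchedLogWriter(":memory:", interval=5.0)
for i in range(1000):
    w.write(f"line {i}")
w.flush()
count = w.conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
```

The "idle timeout plus maximum age" variant would just track two timestamps instead of one; the structure is the same.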

I.e., you can probably get pretty close to the storage device's max sustained write throughput, though with some losses due to write amplification, e.g., from the B-tree itself and from any indices you might want to maintain.

B-tree write amplification can be amortized by committing infrequently (which is why I listed that _first_ above). Though there should be no need, because SQLite3 already amortizes B-tree write amplification by using a write-ahead log (WAL), so be sure to enable the WAL for this sort of application.

Write amplification due to indices can be amortized by partitioning your tables by time ranges, using a VIEW to unify the partitions, and then creating an index on a partition only once it is closed to new log entries. This approach makes searches over the newest log entries slower, but those will probably all fit in memory, so it's not a problem if you have a large enough page cache.
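The partition-plus-VIEW scheme can be sketched like this (table and index names are made up for illustration; an on-disk database would be used in practice):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA journal_mode=WAL")  # effectively a no-op for :memory:, shown for illustration

# One table per time partition (names are illustrative).
conn.execute("CREATE TABLE logs_2024_09_13 (ts REAL, msg TEXT)")
conn.execute("CREATE TABLE logs_2024_09_14 (ts REAL, msg TEXT)")

# Unify partitions behind a single VIEW for querying.
conn.execute("""
    CREATE VIEW logs AS
    SELECT * FROM logs_2024_09_13
    UNION ALL
    SELECT * FROM logs_2024_09_14
""")

# Writes go only to the newest, index-free partition...
conn.execute("INSERT INTO logs_2024_09_14 VALUES (1726300800.0, 'hello')")

# ...and a partition gets its index only once it stops receiving entries.
conn.execute("CREATE INDEX idx_2024_09_13_ts ON logs_2024_09_13 (ts)")

rows = conn.execute("SELECT msg FROM logs").fetchall()
```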

Now I've not built anything like this so I can't say for sure, but I suspect that one could get very aggressive with these techniques and reach a sustained write rate of around 75% of the storage device(s)' sustained write rate.

singron · a year ago
Turning off fsync is pretty dangerous since a crash could corrupt the database. You might think you would just lose a couple seconds of data, but that's only true if writes are applied in order.

E.g., if some data is moved from page A to page B, you normally write B with the new data, fsync, and then write A without the data. Without fsync, you might only write page A and you would lose that data. This might happen on an internal data structure and corrupt the whole database.

bubblesnort · a year ago
I don't think this is going to be an issue as Linux has a built-in rate limiter.
thebeardisred · a year ago
This is a core design challenge for all logging systems. It's why there are mechanisms for intentionally dropping messages to relieve queue pressure, and optimizations around the use of io_uring. Conversely, the fact that logging systems can drop messages is one of the primary reasons for "MARK"-type mechanisms (https://lists.debian.org/debian-user/1998/09/msg00915.html).
cryptonector · a year ago
That's not GP's question. GP wants to know how high the write rate can be regardless of the systemd log rate limiter, likely so as to be able to increase that rate limit!
Spivak · a year ago
I'm actually kinda surprised they went with SQLite here. Log messages are the most trivial of data formats, and there's no way you can't beat SQLite's speed by just not having database logic in the middle at all. Just being able to BYOAllocator for the logs themselves, with such predictable linear memory usage, would make this thing scream.
WhyNotHugo · a year ago
The advantage of SQLite is being able to perform queries like “logs for service X and Y yesterday between 15hs and 16hs”.
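That kind of query maps directly to SQL. A sketch (the schema and column names here are hypothetical; the project's actual schema may differ):

```python
import sqlite3

# Hypothetical log schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (unit TEXT, ts TEXT, msg TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", [
    ("ssh.service",   "2024-09-14 15:12:00", "Accepted publickey"),
    ("exim4.service", "2024-09-14 15:45:00", "Message delivered"),
    ("cron.service",  "2024-09-14 15:30:00", "Job started"),
])

# "Logs for service X and Y yesterday between 15:00 and 16:00" as SQL:
rows = conn.execute("""
    SELECT unit, msg FROM logs
    WHERE unit IN ('ssh.service', 'exim4.service')
      AND ts BETWEEN '2024-09-14 15:00' AND '2024-09-14 16:00'
""").fetchall()
```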
nolist_policy · a year ago

  journalctl -u ssh.service -u exim4.service --since='2024-09-14 15:00' --until='2024-09-14 16:00'
(Systemd accepts e.g. 'yesterday' as a timestamp but not together with a time)

cryptonector · a year ago
You're sort of right, in that a B-tree is not a good data structure for logs given that append-only files are perfect for them. But the point of using an RDBMS for logs is to be able to a) index the logs and b) provide a great search facility over them. Perhaps a better design would be a virtual table plugin for SQLite3 that lets one use log files as tables, then index and search them with SQLite3, but if one lacks the time to investigate that approach then one can't be faulted for using SQLite3 directly.
ivzhh · a year ago
Agreed. /var/log/messages has been around for a long time; writing logs has never been the problem. Digesting the logs is the niche market, and it's profitable enough that we have a lot of tools in this space (rotation, transmission, parsing, etc.).
simscitizen · a year ago
How does this work exactly? Is every log line a separate transaction in autocommit mode? Because I don't see any begin/commit statements in this codebase so far...
nine_k · a year ago
Maybe autocommit mode is set? I'd expect that.
simscitizen · a year ago
Inserting a new row for each log line in autocommit mode would be absurdly inefficient compared to just appending a log line to a file.
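The difference is easy to see in a sketch (not from this project's codebase): per-line autocommit versus one transaction around the whole batch. On a real on-disk database, the autocommit version pays a journal write and fsync per line, while the batched version pays them once.

```python
import sqlite3

def insert_autocommit(path, lines):
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode
    conn.execute("CREATE TABLE logs (msg TEXT)")
    for line in lines:
        conn.execute("INSERT INTO logs VALUES (?)", (line,))  # commits each line
    count = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
    conn.close()
    return count

def insert_batched(path, lines):
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE logs (msg TEXT)")
    conn.executemany("INSERT INTO logs VALUES (?)", ((l,) for l in lines))
    conn.commit()  # one commit (one fsync) for the whole batch
    count = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
    conn.close()
    return count

lines = [f"line {i}" for i in range(100)]
a = insert_autocommit(":memory:", lines)
b = insert_batched(":memory:", lines)
```

Both end with the same rows; only the number of durable transaction boundaries differs.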
marcrosoft · a year ago
I did something similar, but not open source: centrallogging.com. It's surprising how well SQLite can scale for smallish amounts of logs (~1 TB).
juvenn · a year ago
Why not use DuckDB? It's a columnar database, and (it seems to me) better suited for log entries.
righthand · a year ago
This looks right up my alley. I'm experimenting to see how much systemd I can strip from my everyday laptop, as an exercise in futility and to understand how deeply embedded it has become in a distribution like Debian.
UI_at_80x24 · a year ago
I stopped trying to swim against the current and switched to OpenBSD/FreeBSD. You might be surprised how viable it really is.
koeng · a year ago
It’s really a shame that OpenBSD doesn’t have a good file system; otherwise I’d use it for my production systems. (I could put it inside Proxmox, with a ZFS container outside for stability inside, but I like to run bare metal, no VMs.)
nine_k · a year ago
I run Void Linux on my laptop, which seems to exhibit many BSD-esque approaches (and lacks systemd), while also being a rolling release distro with fresh stuff usually appearing in a few days after an upstream release.

Pretty fine so far (about 6 years).

righthand · a year ago
That’s my concern: the Debian experience is usually pretty lovely and I’d hate to leave it behind, but maybe there’s no point in fiddling with something stuck chasing trends.
pwg · a year ago
Slackware is also systemd free.

http://www.slackware.com/

yjftsjthsd-h · a year ago
I think Alpine would be a smaller jump?
LargoLasskhyfv · a year ago
That's the spirit!

https://www.antixforum.com/forums/topic/antix-23-1_init-dive...

Seems that it even saves power, in spite of using that gaming/latency-optimized kernel from Pika-OS.
