Readit News
dmw_ng commented on Restic: Backups done right   restic.net/... · Posted by u/fanf2
dmw_ng · a year ago
I'm a restic user, but have resisted the urge to attempt a bikeshed for a long time, mostly due to perf. Its index format seems to be slow and terrible, and the chunking algorithm it uses (Rabin fingerprints) is very slow compared to more recent alternatives (like FastCDC). Drives me nuts to watch it chugging along backing up or listing snapshots at nowhere close to the IO rate of the system while still making the fans run. Despite that, it still seems to be the best free software option around.
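For anyone unfamiliar with content-defined chunking, here is a minimal Python sketch of the gear-hash style of boundary detection that FastCDC builds on. To be clear about the assumptions: this is neither restic's Rabin chunker nor FastCDC proper (it has no normalized chunking and uses a single mask), and the names `GEAR`, `chunk_boundaries` and the size parameters are made up for illustration.

```python
import random

# One pseudo-random 64-bit value per possible byte. The seed is fixed so
# that chunk boundaries are reproducible between runs.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def chunk_boundaries(data, min_size=2048, mask_bits=13, max_size=65536):
    """Yield (start, end) offsets of content-defined chunks in `data`.

    A cut is placed where the low `mask_bits` bits of the rolling hash are
    all zero, but never before `min_size` bytes and never after `max_size`.
    """
    mask = (1 << mask_bits) - 1
    start = 0
    n = len(data)
    while start < n:
        limit = min(start + max_size, n)
        cut = limit  # fall back to a forced cut at max_size or end of data
        h = 0
        for i in range(start, limit):
            # Gear hash update: one shift, one add, one table lookup per byte.
            h = ((h << 1) + GEAR[data[i]]) & MASK64
            if i + 1 - start >= min_size and (h & mask) == 0:
                cut = i + 1
                break
        yield start, cut
        start = cut
```

The per-byte work is a shift, an add and a table lookup, which is roughly where the speed advantage over polynomial fingerprinting is claimed to come from. A pure-Python loop says nothing about real-world throughput, of course.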
dmw_ng commented on Hetzner Object Storage   docs.hetzner.com/storage/... · Posted by u/polyrand
twic · a year ago
Since we're talking object storage, a question for the collective brain: are there any object storage solutions, cloud or on-prem, which support any sort of "operator pushdown"?

By "operator pushdown", i mean any ability to filter or map over the contents of the object on the server side in some way, sending only the results over the network to the client.

For example, say you have a huge CSV file of customer orders in a bucket. You might want to find the timestamp of all the orders which included a particular product. If all you can do is stream the whole file, then you need to do that, just to pick out a few timestamps. But you could imagine a kind of request where you say "only give me lines where the product ID is P01234, and only send the timestamp column". Perhaps you would express that as a pair of regular expressions, or a sed program, or a Lua script, or maybe the server would understand CSV and let you write something a bit like SQL. There are all sorts of ways it could be done. Providing a fully general way might be tricky, but it wouldn't need to be fully general to be useful.

I appreciate that if you want to do this sort of access frequently, you should probably be using a database, not object storage. But it seems like a very useful feature to layer on top of object storage, and one that feels like it should be fairly cheap to execute - the server has to do a small extra amount of computation, but then needs a lot less network bandwidth.

dmw_ng · a year ago
That's been a feature of S3 for quite a long time now, called S3 Select https://docs.aws.amazon.com/AmazonS3/latest/userguide/select...

Despite it being an awesome feature I've been itching to use, I've never actually found a use for it beyond messing around. Most places where S3 Select might make sense seem to be subsumed (for my uses) by Athena. Athena has a rather large amount of conceptual and actual boilerplate to get up and running with, though, whereas S3 Select requires no upfront planning beyond building a fancy query string (or using their SDK wrappers).

Where S3 Select is likely to become fiddly is anywhere multiple files are involved. Athena makes querying large collections of CSVs (etc) straightforward, and handles all the scheduling and results merging for you.
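To make the "fancy query string" concrete, here is a rough Python sketch of the customer-orders example from the parent comment, using boto3's `select_object_content` call. The column names `order_ts` and `product_id`, the helper names, and the bucket/key in the usage comment are all assumptions for illustration; the CSV is assumed to have a header row.

```python
def build_select_request(bucket, key, product_id):
    """Build keyword arguments for an S3 Select query over one CSV object."""
    # Double any single quotes so the value stays inside the SQL string literal.
    quoted = product_id.replace("'", "''")
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        # Hypothetical column names; FileHeaderInfo=USE lets us refer to
        # columns by the names in the CSV header row.
        "Expression": (
            "SELECT s.order_ts FROM S3Object s "
            f"WHERE s.product_id = '{quoted}'"
        ),
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


def select_lines(client, request):
    """Run the query and return the result as a list of text lines."""
    response = client.select_object_content(**request)
    # The payload is an event stream; only "Records" events carry data, and
    # a single row may be split across two events, so join before splitting.
    data = b"".join(
        event["Records"]["Payload"]
        for event in response["Payload"]
        if "Records" in event
    )
    return data.decode("utf-8").splitlines()


# Usage (needs boto3 and AWS credentials; bucket and key are made up):
#   import boto3
#   request = build_select_request("my-bucket", "orders.csv", "P01234")
#   for line in select_lines(boto3.client("s3"), request):
#       print(line)
```

Note that this handles exactly one object per request, which is the multiple-files fiddliness mentioned above: looping over keys and merging the results is left to the caller.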

dmw_ng commented on Microsoft donates the Mono Project to the Wine team   mono-project.com/... · Posted by u/itherseed
pdmccormick · a year ago
I'm genuinely curious, for someone who develops web application backends and larger distributed systems & infrastructure, predominantly using Go and Python, exclusively targeting Linux, is there anything in the .NET ecosystem that anyone would recommend I take a look at? Many thanks.
dmw_ng · a year ago
Modern .net on Linux is lovely: you can initialize a project, pull in the S3 client and write a 1-3 line C# program that AOT compiles to a single binary with none of the perf issues or GIL hand-wringing that plague life in Python.

Given modern Python means type annotations everywhere, the convenience edge between it and modern C# (which dispenses with much of the javaesque boilerplate) is surprisingly thin, and the capabilities of the .net runtime are far superior in many ways, making it quite an appealing alternative, especially for perf-sensitive stuff.

dmw_ng commented on Intel N100 Radxa X4 First Thoughts   bret.dk/intel-n100-radxa-... · Posted by u/geerlingguy
3np · a year ago
Noticed the same thing and I hope we see the N305 from the same generation take over or more vendors offering both options. Considering the rest of the platform package, it can really benefit from 8 cores instead of just 4.

The N100 can be a fair step up compared to Rpi5 but even RK3588 is already 8 cores. Would be a shame if many of the current generation of exciting hackable x86 mini-platforms lock in at the N100 as it will feel obsolete years earlier than the N305.

I run/ran stuff on both, as well as various ARM SBCs and previous generations like J4125/N5XXX. Considering the core-count, RK3588 is still a better pick for many use-cases unless single-thread performance is that important. Benchmark comparison: https://bret.dk/intel-n100-a-challenge-to-arm/

dmw_ng · a year ago
I recently bought an n100 and within a matter of days got buyer's remorse and impulse-purchased an n305 to go right beside it, which is currently sitting with a wildly overpriced 48 GB stick installed and a 2TB SN850X. It's an absolute joy, both perfwise and in how little heat it generates.

The only thing I'd reserve judgement on is the tendency to throttle. I haven't got far enough to characterize it, but it's not clear how much value those extra cores will add over the n100 with TDP settings tweaked down in the BIOS, and if leaving the n305 to run at max TDP, heat/noise/cost/temperature-related instability may start to become an issue, especially when packing other hot components like a decent SSD into the tiny cases they come in.

dmw_ng commented on Proton launches its own version of Google Docs   engadget.com/proton-launc... · Posted by u/prng2021
dmw_ng · a year ago
Seems like a massive distraction from their offering for a small company. I wonder why they didn't consider something like tight integration with OnlyOffice or similar. Setting out to build a new office suite feels about as sensible as building a new web browser from scratch. Except at least with a browser, you have open specs helping you through most of the endless supply of compatibility problems.
dmw_ng commented on 120ms to 30ms: Python to Rust   old.reddit.com/r/rust/com... · Posted by u/0xedb
galkk · a year ago
Almost always when I start prototyping something in python, I wish that I stopped half-way where I am now and switched to something else.

Most recent example - converting huge amount of xml files to parquet. I started very fast with python + pyarrow, but when I realized that parallelizing execution would help enormously, I hit the GIL or pickling/unpickling/multiprocessing costs.

It did work in python, in the end, but I feel that writing that in Rust/C# (even if I don't know Rust besides tutorials) in the end would be much more performant.

dmw_ng · a year ago
> converting huge amount of xml files

> pickling

Sounds like if this is the tooling and the task at hand, about the most complex things that should be passing through the pickler are partitioned lists of filenames rather than raw data. E.g., you can have each partition generate a parquet for combining in a final step (pyarrow.concat_tables() looks useful), or if it were some other format you were working with, potentially sending flat arrays back to the parent process as giant bytestrings or similar.

This is not to say the limitations don't suck, just that very often there are simple approaches to avoid most of the pain
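A rough sketch of the filenames-only approach, assuming pyarrow is installed and assuming a made-up XML layout in which each file holds `<record>` elements whose child elements are the fields. The helper names (`partition`, `convert_partition`, `convert_all`) are hypothetical too. Only short lists of paths go to the workers and only one output path comes back from each.

```python
import xml.etree.ElementTree as ET
from multiprocessing import Pool
from pathlib import Path


def partition(items, n):
    """Split `items` round-robin into at most `n` non-empty lists."""
    return [part for part in (items[i::n] for i in range(n)) if part]


def convert_partition(job):
    """Worker: parse one partition of XML files, write one Parquet file."""
    # Third-party imports are kept inside the functions that need them so
    # that partition() stays usable without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    index, filenames, out_dir = job
    rows = []
    for name in filenames:
        # Hypothetical layout: <record> elements whose children are fields.
        for record in ET.parse(name).getroot().iter("record"):
            rows.append({child.tag: child.text for child in record})
    out_path = str(Path(out_dir) / f"part-{index:04d}.parquet")
    pq.write_table(pa.Table.from_pylist(rows), out_path)
    return out_path  # only this short string is pickled back to the parent


def convert_all(filenames, out_dir, final_path, workers=4):
    """Parent: fan out filename lists, then combine the per-partition files."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    parts = partition(filenames, workers)
    jobs = [(i, part, out_dir) for i, part in enumerate(parts)]
    with Pool(workers) as pool:
        part_paths = pool.map(convert_partition, jobs)
    # concat_tables expects every part to share the same schema.
    combined = pa.concat_tables([pq.read_table(p) for p in part_paths])
    pq.write_table(combined, final_path)
```

Call `convert_all(...)` from under an `if __name__ == "__main__":` guard so that worker processes can import the module cleanly. If the combined table is too big for memory, the final step could be dropped and the part files left as a multi-file dataset instead.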

dmw_ng commented on SSH as a Sudo Replacement   whynothugo.nl/journal/202... · Posted by u/legobeet
op00to · a year ago
A big part of sudo is that you should be running individual commands using sudo to increase auditability rather than simply running sudo bash or whatever.
dmw_ng · a year ago
It's comical to see the sudo codebase mentioned in the same breath as increasing auditability here

u/dmw_ng

Karma: 1381 · Cake day: October 28, 2019