If you've been running your runners on your own infra for cost reasons, you're not really that interesting to the GitHub business.
At least before we had zero-cost exceptions. These days, I suspect the HFT crowd is back to counting microseconds or milliseconds as trades are being done smarter, not faster.
It's like GET <namespace>/object, PUT <namespace>/object. To me it's the most obvious mapping of HTTP to immutable object key-value storage you could imagine.
It is bad that the control-plane responses can be malformed XML (e.g. keys are not escaped correctly if you put XML control characters in object paths), but that can be forgiven as an oversight.
It's not perfect, but I don't think it's a strange API at all.
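For what it's worth, the core of that mapping fits in a few lines of Go. A minimal sketch, with a placeholder bucket URL and ignoring the SigV4 authentication headers a real S3 request would also need:

    package main

    import (
        "bytes"
        "fmt"
        "io"
        "net/http"
    )

    func main() {
        base := "https://example-bucket.s3.amazonaws.com" // <namespace>; placeholder

        // PUT <namespace>/object: store a blob under a key.
        req, err := http.NewRequest(http.MethodPut, base+"/some/object", bytes.NewReader([]byte("hello")))
        if err != nil {
            panic(err)
        }
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        resp.Body.Close()

        // GET <namespace>/object: fetch the blob back by the same key.
        resp, err = http.Get(base + "/some/object")
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        body, _ := io.ReadAll(resp.Body)
        fmt.Println(resp.Status, len(body))
    }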
3500 pages to describe upload and download, basically. That is pretty strange in my book.
Might have been interesting to actually include this in the article, do you not think so? ;-)
The way the article is written made it seem that you used CDB on edge nodes to store metadata, with no information as to what you're storing/accessing, how, or why ... This is part of the reason we have these discussions here.
It's also quite hard to debug in Go, because mmapped files are not visible in pprof; when you run out of memory, mmap starts behaving really suboptimally. And it's hard to see which file takes how much memory (again, it doesn't show in pprof).
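One workaround, as a sketch rather than anything from the article: on Linux you can sidestep pprof and read /proc/self/smaps, which lists every mapping with its resident size. The ".cdb" suffix below is an assumption about how the mapped data files are named.

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "strings"
    )

    func main() {
        // pprof only tracks the Go heap, not file-backed mappings, so
        // inspect /proc/self/smaps directly (Linux only).
        f, err := os.Open("/proc/self/smaps")
        if err != nil {
            panic(err)
        }
        defer f.Close()

        var mapping string
        sc := bufio.NewScanner(f)
        for sc.Scan() {
            line := sc.Text()
            switch {
            case strings.Contains(line, ".cdb"): // assumed suffix of the mmapped files
                mapping = line // header line of a file-backed mapping
            case mapping != "" && strings.HasPrefix(line, "Rss:"):
                fmt.Println(mapping)
                fmt.Println("   ", line) // resident memory of that mapping
                mapping = ""
            }
        }
        if err := sc.Err(); err != nil {
            panic(err)
        }
    }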
But eventually phk left, and you came into conflict with him over the name, which was resolved by him choosing a different name for his version of Varnish?
We've been funding phk's work on Varnish and Vinyl Cache for 20 years. Do you think phk can write, maintain, and release something on his own? Vinyl Cache cannot be a one-man show, be real.
> The last couple of weeks I've been working on an HTTP-backed filesystem.
It feels like these are micro-optimizations that are going to be swamped by the whole HTTP cycle anyway.
There is also the benchmark issue:
The enhanced CDB format seems to be focused on read-only benefits, as writes introduce a lot of latency and issues with mmap. In other words, there is a need to freeze for the mmap, then unfreeze, write the updates, freeze for mmap again ...
This cycle introduces overhead, does it not? Has this been benchmarked? Because from what I am seeing, the benefits are mostly in the frozen state (aka read-only).
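For reference, the cycle I mean usually looks something like the sketch below, assuming a Go CDB library in the style of github.com/colinmarc/cdb (the file names and the atomic-rename step are my assumptions, not something stated upthread). Since a CDB file is write-once, an "update" is a full rebuild plus a swap.

    package main

    import (
        "os"

        "github.com/colinmarc/cdb"
    )

    // rebuild writes a fresh CDB to a temp file, freezes it, and atomically
    // swaps it into place; readers re-open/re-mmap the new file.
    func rebuild(path string, entries map[string][]byte) error {
        tmp := path + ".tmp"
        w, err := cdb.Create(tmp) // write phase: sequential appends
        if err != nil {
            return err
        }
        for k, v := range entries {
            if err := w.Put([]byte(k), v); err != nil {
                return err
            }
        }
        db, err := w.Freeze() // freeze phase: finalize the hash tables
        if err != nil {
            return err
        }
        _ = db // a real server would hand this reader to the read path
        return os.Rename(tmp, path)
    }

    func main() {
        if err := rebuild("listing.cdb", map[string][]byte{
            "dir/a.txt": []byte(`{"size":10}`),
        }); err != nil {
            panic(err)
        }
    }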
If the data is changed infrequently, why not just use JSON? No matter how slow it is, if you're just going to do HTTP requests for the directory listing, your overhead is not the actual file format.
If this enhanced file format were used as file storage, and you want to be able to read files quickly, that is a different matter. Then there are ways around it, such as keeping "part" files where files 1 ... 1000 are in file.01 and 1001 ... 2000 in file.02 (thus reducing overhead from the file system), and those are memory-mapped for fast reading, with updates handled as invalidations/rewrites (as I do not see any delete/vacuum ability in the file format).
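That part-file approach, as a rough sketch (the names and the in-memory index are made up for illustration; a real index would itself have to live somewhere on disk):

    package main

    import (
        "fmt"
        "os"

        "golang.org/x/exp/mmap"
    )

    type span struct{ off, length int64 }

    func main() {
        // Write phase: append blobs into one part file and remember where
        // each one landed.
        f, err := os.Create("file.01")
        if err != nil {
            panic(err)
        }
        index := map[string]span{}
        var off int64
        for _, blob := range []struct {
            name string
            data []byte
        }{
            {"a.txt", []byte("first blob")},
            {"b.txt", []byte("second blob")},
        } {
            if _, err := f.Write(blob.data); err != nil {
                panic(err)
            }
            index[blob.name] = span{off, int64(len(blob.data))}
            off += int64(len(blob.data))
        }
        f.Close()

        // Read phase: mmap the part file and serve any blob by offset.
        r, err := mmap.Open("file.01")
        if err != nil {
            panic(err)
        }
        defer r.Close()
        s := index["b.txt"]
        buf := make([]byte, s.length)
        if _, err := r.ReadAt(buf, s.off); err != nil {
            panic(err)
        }
        fmt.Println(string(buf))
    }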
So, the actual benefit just for a directory-listing DB escapes me.
CDB is only a transport medium. The data originates in PostgreSQL and, upon request, is stored in CDB and transferred. Writing/freezing to CDB is faster than encoding JSON.
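Here's roughly how you could measure that for your own data; a benchmark sketch (the listing shape and size and the github.com/colinmarc/cdb library are assumptions on my part here, and the CDB side includes disk writes while the JSON side doesn't, so treat the comparison as rough):

    package listing

    import (
        "encoding/json"
        "fmt"
        "path/filepath"
        "testing"

        "github.com/colinmarc/cdb"
    )

    // A synthetic directory listing; shape and size are made up.
    var entries = func() map[string][]byte {
        m := make(map[string][]byte, 10000)
        for i := 0; i < 10000; i++ {
            m[fmt.Sprintf("dir/file-%06d", i)] = []byte(`{"size":123,"mtime":1700000000}`)
        }
        return m
    }()

    func BenchmarkJSONEncode(b *testing.B) {
        for i := 0; i < b.N; i++ {
            if _, err := json.Marshal(entries); err != nil {
                b.Fatal(err)
            }
        }
    }

    func BenchmarkCDBFreeze(b *testing.B) {
        for i := 0; i < b.N; i++ {
            w, err := cdb.Create(filepath.Join(b.TempDir(), "listing.cdb"))
            if err != nil {
                b.Fatal(err)
            }
            for k, v := range entries {
                if err := w.Put([]byte(k), v); err != nil {
                    b.Fatal(err)
                }
            }
            if _, err := w.Freeze(); err != nil {
                b.Fatal(err)
            }
        }
    }

Run it with go test -bench=. in the package directory.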
CDB also makes it possible to access it directly, with ranged HTTP requests. It isn't something I've implemented, but having the option to do so is nice.
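For anyone curious what that direct access would look like from the client side, a minimal sketch (the URL is a placeholder; 2048 bytes is the fixed pointer-table header of the classic CDB format, after which further ranged reads would walk the hash table to the record):

    package main

    import (
        "fmt"
        "io"
        "net/http"
    )

    // fetchRange asks the server for only the bytes [off, off+length) of a
    // remote file via an HTTP Range header.
    func fetchRange(url string, off, length int64) ([]byte, error) {
        req, err := http.NewRequest(http.MethodGet, url, nil)
        if err != nil {
            return nil, err
        }
        req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", off, off+length-1))
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusPartialContent {
            return nil, fmt.Errorf("server ignored Range: %s", resp.Status)
        }
        return io.ReadAll(resp.Body)
    }

    func main() {
        hdr, err := fetchRange("https://example.com/listing.cdb", 0, 2048)
        if err != nil {
            panic(err)
        }
        fmt.Println(len(hdr), "bytes of header")
    }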
Hardware keys would be better, but I think this is a decent balance of security vs convenience for my needs ATM.