2. It can be configured to store snapshots, and Raft logs other than the latest log, in S3. It cannot run as a stateless Kubernetes pod, because the latest log has to live on the local filesystem.
Although I see you can make a multi-region setup with multiple independent Kubernetes clusters and store logs in tmpfs (which is not 100% wrong from a theoretical standpoint), it is too risky to be practical.
3. Only the snapshots and the previous logs can be stored on S3, so PUT requests happen only on log rotation.
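To make that concrete, here is a rough sketch of the kind of Keeper disk configuration this implies. The endpoint and credentials are placeholders, and the exact setting names are from memory, so verify them against the ClickHouse Keeper docs before relying on this:

    <clickhouse>
      <storage_configuration>
        <disks>
          <!-- S3 disk for snapshots and rotated (non-latest) Raft logs -->
          <keeper_s3>
            <type>s3_plain</type>
            <endpoint>https://my-bucket.s3.amazonaws.com/keeper/</endpoint>
            <access_key_id>YOUR_ACCESS_KEY</access_key_id>
            <secret_access_key>YOUR_SECRET_KEY</secret_access_key>
          </keeper_s3>
          <!-- Local disk for the latest log, which must stay on the filesystem -->
          <keeper_local>
            <type>local</type>
            <path>/var/lib/clickhouse/coordination/</path>
          </keeper_local>
        </disks>
      </storage_configuration>
      <keeper_server>
        <log_storage_disk>keeper_s3</log_storage_disk>
        <latest_log_storage_disk>keeper_local</latest_log_storage_disk>
        <snapshot_storage_disk>keeper_s3</snapshot_storage_disk>
      </keeper_server>
    </clickhouse>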
If anyone has any questions, I'll do my best to get them answered.
(Disclaimer: I work at ClickHouse)
1. Yeah, we mention at the end of the post that the P99 produce latency is ~400ms. 2. MSK still charges you for networking to produce into the cluster and consume out of it if follower fetch is not properly configured. Also, you still have to more or less manage a Kafka cluster (hot spotting, partition rebalancing, etc.). In practice we think WarpStream will be much cheaper to use than MSK for almost all use cases, and significantly easier to manage.
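(For context on the follower fetch point above: that's KIP-392 rack-aware fetching, which lets consumers read from a replica in their own AZ instead of crossing AZs to the leader; produces still have to go to the leader. A minimal sketch of the relevant settings, with placeholder AZ names:

    # broker side (each broker advertises the AZ it runs in)
    broker.rack=us-east-1a
    replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector

    # consumer side (must match the consumer's own AZ)
    client.rack=us-east-1a

Each consumer needs its own client.rack value matching its AZ for this to kick in.)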
If you have any questions about WarpStream, my co-founder (richieartoul) and I will be here to answer them.
2. If the '5-10x cheaper' is mostly due to cross-AZ savings, isn't that offered by AWS MSK too?
This article hits most of the nails exactly on the head, at least from the perspective of large corporate environments; I doubt any of these apply to small startups (and if they do to yours, something has probably gone very wrong). Nuclide handles these issues in fairly interesting ways:
* FB built their own incremental, caching compilers for several languages that Nuclide is aware of, so that typechecking and rebuilds are effectively instantaneous even across massive codebases.
* Nuclide integrates with FB's massive cross-repo search service, BigGrep, so that you can find references to anything.
* Nuclide's version control UI is phenomenal: it is aware of multiple repositories, knows which ones you have checked out, and integrates with their in-house version of Phabricator (a code repository/review tool similar to GitHub). I literally never learned to use Mercurial while I was there. I just used Nuclide.
However, there's one major difference between Nuclide and the vision that this article lays out: remote vs local code checkouts. Nuclide ran locally, but it used remote code checkouts, and your code ran on the remote, not on your local machine. I think the reasons and benefits were compelling:
* Massive repositories take up massive amounts of disk space.
* Massive repositories take massive amounts of time to download to your laptop, and if you're working in a large corporate environment, they also take massive amounts of time to download recent changes since there are many engineers shipping commits.
* If your remote machine gets hosed, you can spin up a new one quickly; in FB's case, it took seconds to get one of the "OnDemand" servers. If your local machine gets hosed... a trip to IT is not going to be as easy.
* If you run into issues, tools engineers can SSH directly into your instance and see what the problem is, fix it, and also ship preemptive fixes to everyone else. That would feel quite privacy-invasive for someone's local machine.
* Many employees enjoy being able to take their laptop with them so that they can work remotely e.g. to extend a trip to another country without taking more PTO. Laptops aren't great at running large distributed systems.
* Even some individual services (ignoring the rest of the constellation of services) are impractical to run on constrained devices like laptops, because they either operate on "big data" or require large amounts of RAM.
* Remote servers can integrate securely with other remote services more conveniently than laptops or desktops can.
* Having the entire codebase stored locally on a laptop or desktop is a relative security risk: if the code lives on a centrally managed server and a laptop gets stolen, you revoke the key. If it's on the laptop's disk, well, hope the thief didn't get the password? Or in FB's case, hope the thief isn't an APT/the NSA and hasn't backdoored the laptop's full-disk encryption? And it's not just laptop theft: a disgruntled employee could intentionally leak the code -- or in Google and Anthony Levandowski's case, sell it to a competitor through a shell company. If everything is stored centrally and local checkouts are extremely uncommon, you have a lot more room to automate defenses against that scenario.
Overall I'm a big fan of running your code on remote devservers rather than on local machines once you get to scale.