First, MFEs solve organizational issues by cheaply offering release independence. One example is teams whose working hours don't overlap: triaging and resolving binary release blockers across time zones is hard to do correctly and onerous on oncall work-life balance. Another is when new products want to move quickly without triggering global binary rollbacks or forcing too fast a release cadence on mature products that have a lower tolerance for outages or older test pyramids.
Second, MFEs are a pragmatic choice because they can proceed independently of the mono/microrepo decision and of any modularization investment, both of which are several times more costly in the cases I've seen. Most infra teams are not ivory towers, and MFEs are high bang-for-buck.
Finally, MFEs are a tool for solving fundamental scaling issues with continuous development. At a certain commit volume, races between in-flight changes cause bugs, build breakages, or test failures, and flaky tests make it impossible to (cheaply) certify a last known good commit at cut time. You can push both of these scaling limits out considerably with good feature flag/health-mediated releases and mature CI, but having an additional tool in the kit lets you pick which to invest in based on ROI.
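To make "feature flag/health-mediated releases" concrete, here's a minimal sketch, not any particular vendor's API; fetchFlagConfig, currentErrorRate, and the 1% error budget are hypothetical stand-ins:

    // Hypothetical health-mediated flag check (TypeScript).
    interface FlagConfig {
      rolloutPercent: number; // 0..100
      killSwitch: boolean;    // flipped by automated health checks or an oncall human
    }

    // Assumed helpers -- illustrative names, not a real library.
    declare function fetchFlagConfig(flag: string): Promise<FlagConfig>;
    declare function currentErrorRate(flag: string): Promise<number>;

    // Stable per-user bucketing so a user stays in or out of the rollout (FNV-1a).
    function bucket(userId: string): number {
      let h = 2166136261;
      for (let i = 0; i < userId.length; i++) {
        h ^= userId.charCodeAt(i);
        h = Math.imul(h, 16777619);
      }
      return Math.abs(h) % 100;
    }

    async function isEnabled(flag: string, userId: string): Promise<boolean> {
      const cfg = await fetchFlagConfig(flag);
      if (cfg.killSwitch) return false;                         // health check tripped
      if ((await currentErrorRate(flag)) > 0.01) return false;  // assumed 1% error budget
      return bucket(userId) < cfg.rolloutPercent;
    }

The point is that a bad change gets turned off via config rather than by rolling back the whole binary, which is part of what lets a single deployable absorb more commit volume.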
Advocating for modularity is nice, but I've never met an MFE advocate who didn't also want a more tree-shakeable, modular codebase. We should not jump to the conclusion that MFEs are bad or a "last resort" just because another solution better solves a partially overlapping set of problems, especially if that other solution doesn't solve many of the problems MFEs do, or requires significantly more work to solve them.
[1] Runtime JS isolation (e.g. of globals set by third-party libraries) is hard, and existing methods are either leaky abstractions like module federation or require significant infra work like iframing with DOM shims. CSS encapsulation is very hard in complex systems, and workarounds like shadow DOM have a11y and library/tooling interop issues. Runtime state sharing (so that not every MFE makes its own fetches/subscriptions for common data) is hard and prone to binary-skew bugs. Runtime dynamic linking of shared common code is hard to reason about, and static linking of common code can make the same lazy-loaded module go from 10kB to 1MB+ over the wire.
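For a flavor of the shared-code problem, here's a minimal sketch of a webpack 5 Module Federation remote config (names like checkoutMfe and ./src/Checkout are hypothetical); the shared block is exactly where the leaky abstraction and skew risk live:

    // webpack.config.ts -- hypothetical remote ("checkoutMfe") exposing one module
    import webpack, { type Configuration } from 'webpack';

    const config: Configuration = {
      plugins: [
        new webpack.container.ModuleFederationPlugin({
          name: 'checkoutMfe',
          filename: 'remoteEntry.js',
          exposes: { './Checkout': './src/Checkout' },
          // Shared deps are linked at runtime across MFEs. If the host and remotes
          // disagree on requiredVersion, you get either duplicate copies over the
          // wire or, with singleton: true, a version-mismatch failure at load time.
          shared: {
            react: { singleton: true, requiredVersion: '^18.2.0' },
            'react-dom': { singleton: true, requiredVersion: '^18.2.0' },
          },
        }),
      ],
    };

    export default config;

Anything not listed under shared is statically bundled into each MFE, which is where the 10kB-to-1MB+ inflation of a lazy-loaded module comes from.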
This is one of the biggest areas where Bazel falls short of non-Bazel tooling for us in the web development/Node.js world. The Node ecosystem has characteristics that strain Bazel's scaling (node_modules is 100k+ files and 1GB+), and Bazel's insistence on correct/reproducible builds is in a way its own enemy: Bazel needs to set up hundreds of thousands of file watchers to be correct, whereas the Node.js ecosystem's "let's just assume node_modules hasn't changed" is good enough most of the time. For us, many non-Bazel inner-devloop steps inflate from <5s to >60s once Bazel overhead is added, even after significant infra tuning.