After this change, the TypeScript compiler will be compiled with esbuild. I feel like that's probably the best endorsement esbuild could get, hah.
I wish they had broken down that survey question further to find out _how many_ spaces the highest paid developers use. Then I could finally have a data-driven answer to put in my prettier config!
This was actually a significant issue in a large PHP codebase I used to work on. Client hired a new guy who insisted that we convert everything to spaces, and suddenly it took about twice as long to check the thing out from Subversion.
Reminds me of Silicon Valley (HBO) where Richard uses “we are a compression company” to justify using tabs over spaces. Ironically once gzip compressed I doubt it would make any difference.
Even on an absolutely gigantic codebase using tabs or spaces will make almost no difference to build or type-checking times. Building an AST is much more overhead than white space considerations and once it’s an AST tabs or spaces are not included in the running of the code.
Does that mean they are not using type checking? That’s the really, really slow part of writing TS, and esbuild doesn’t include it, which is why I’ve never seen the point of using esbuild as a compiler.
We are still type checking, it's just not needed as a dependency for our JS outputs. Type checking still happens in tests, and I have CI tasks and VS Code watch tasks which will make sure we are still type checking.
> Finally, as a result of both of the previous performance improvements (faster code and less of it), tsc.js is 30% faster to start and typescript.js (our public API) is 10% faster to import. As we improve performance and code size, these numbers are likely to improve. We now include these metrics in our benchmarks to track over time and in relevant PRs.
> [...]
> The TypeScript package now targets ES2018. Prior to 5.0, our package targeted ES5 syntax and the ES2015 library, however, esbuild has a hard minimum syntax target of ES2015 (aka ES6). ES2018 was chosen as a balance between compatibility with older environments and access to more modern syntax and library features
I'd be curious as to what percentage of the improvement comes from modules vs comes from a different target.
From my superficial knowledge of compilers, "modularization" itself should not make code faster, if anything slower. There'll always be some overhead of loading modules and communicating between them, not?
I presume, from my own experience building software (not compilers), that modules make code much easier to reason about and much better isolated (cohesion, loose coupling), and therefore much easier to improve inside each module. I would presume that, here too, modules made it easier to improve the inner workings, which allowed for the performance increase. Or am I completely misunderstanding this feature?
There are some key things here that maybe weren't clearly stated in my writeup.
Firstly, the old codebase is TS namespaces, which compile down to IIFEs that push properties onto objects. Each file that declares that namespace is its own IIFE, and so every access to other files incurs the overhead of a property access.
With modules, tooling like esbuild or rollup can now actually see those dependencies (they are now standard ES module imports) and optimize access to them. In this PR's case, the main boost comes from scope hoisting.
For example, in one file, we may declare the helper `isIdentifier`. In namespaces, we would write `isIdentifier` in another file, but this would at emit time turn into `ts.isIdentifier`, which is slower. Now, we import that helper, and then esbuild (or rollup) can see that exact symbol. All of the helpers get pulled to the top of the output bundle, and calls to those helpers are direct.
That's why modules give us a boost. There's also more (modules mean we can use tooling to tree shake the output, and smaller bundles are faster to load), but the hoisting is the big thing.
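To make the difference concrete, here is a minimal sketch of the two output shapes described above. It is an illustration only, not the actual TypeScript emit or esbuild output: `tsNs`, `NodeLike`, and `countIdentifiers` are made-up names, and only `isIdentifier` comes from the comment.

```typescript
interface NodeLike {
  kind: string;
}

interface TsNamespace {
  isIdentifier(node: NodeLike): boolean;
  countIdentifiers(nodes: NodeLike[]): number;
}

// Namespace-style output (hypothetical shape): each "file" is an IIFE that
// pushes its helpers onto one shared object.
const tsNs = {} as TsNamespace;

(function (ns: TsNamespace) {
  ns.isIdentifier = (node) => node.kind === "Identifier";
})(tsNs);

(function (ns: TsNamespace) {
  // Every use of a helper from another "file" is a property lookup on `ns`.
  ns.countIdentifiers = (nodes) =>
    nodes.filter((node) => ns.isIdentifier(node)).length;
})(tsNs);

// Module-style output after bundling with scope hoisting (again a sketch):
// helpers are plain top-level functions and are called directly.
function isIdentifier(node: NodeLike): boolean {
  return node.kind === "Identifier";
}

function countIdentifiers(nodes: NodeLike[]): number {
  return nodes.filter(isIdentifier).length;
}
```

Both versions compute the same thing. The only difference is whether each cross-file call goes through a property lookup on a shared object or refers to the function directly.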
There are two performance implications of "modularization": initialization-time and run-time.
You are correct that initializing many modules is usually slower than initializing one module [1]. However, bundling puts all modules into one file, so this PR doesn't actually change anything here. Both before and after this PR, the TypeScript compiler will be published as a single file.
At run-time, switching to ES modules from another JavaScript module system can be a significant performance improvement because it removes the overhead of communicating between them. Other module systems (e.g. TypeScript namespaces, CommonJS modules) use dynamic property accesses to reference identifiers in other modules while ES modules use static binding to reference the identifiers in other modules directly. Dynamic property access can be a big performance penalty in a large code base. Here's an example of the performance improvement that switching to ES modules alone can bring: https://github.com/microsoft/TypeScript/issues/39247.
[1] This is almost always true. A random exception to this is that some buggy compilers have O(n^2) behavior with respect to the number of certain kinds of symbols in a scope, so having too many of those symbols in a single scope can get really slow (and thus splitting your code into separate modules may actually improve initialization time). This issue is most severe in old versions of JavaScriptCore: https://github.com/evanw/esbuild/issues/478. When bundling, esbuild deliberately modifies the code to avoid the JavaScript features that cause this behavior.
> From my superficial knowledge of compilers, "modularization" itself should not make code faster, if anything slower. There'll always be some overhead of loading modules and communicating between them, not?
I think this is a misunderstanding of what actually happened.
TypeScript has a thing called “namespaces” and a thing called “modules”. Both provide modularization. The TS repo is not being modularized, instead, the namespaces are getting converted to modules.
Namespaces are an old-school approach to writing a module in JavaScript. You pack all of your exports into a JS object, and then access the object from somewhere else. This works, but JS is dynamic, and the runtime has no way to guarantee that you won’t mess with this object (replace functions or whatnot).
Modules don’t have this object. You just call the function, instantiate the class, or do whatever else with the names you imported. They are resolved statically, so certain optimizations become more “obvious”, like inlining.
For ES6 modules, the exports object is frozen (made read-only) so the JIT can make some extra assumptions and optimizations. With bundles, unless the bundler inserts `Object.freeze` around `module.exports`, they have to be treated as dynamic objects.
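A small sketch of the "you can mess with this object" point, assuming a hypothetical `Helpers` shape. It only shows the language-level behaviour (a plain object can be patched at run time, a frozen one cannot) and says nothing about what any particular JIT actually does with that information.

```typescript
type Helpers = { isIdentifier: (kind: string) => boolean };

// A namespace-like export object: any code holding a reference can replace
// its functions at run time.
const mutableHelpers: Helpers = {
  isIdentifier: (kind) => kind === "Identifier",
};
mutableHelpers.isIdentifier = () => true; // monkey-patched
const patchedResult = mutableHelpers.isIdentifier("StringLiteral");

// The same shape wrapped in Object.freeze: the property can no longer be
// reassigned.
const frozenHelpers: Readonly<Helpers> = Object.freeze({
  isIdentifier: (kind: string) => kind === "Identifier",
});
try {
  // Throws a TypeError in strict mode and is silently ignored otherwise;
  // either way the original function stays in place.
  (frozenHelpers as Helpers).isIdentifier = () => true;
} catch {
  // Ignored on purpose: the point is only that the write has no effect.
}
const frozenResult = frozenHelpers.isIdentifier("StringLiteral");
```

Here `patchedResult` reflects the replaced function, while `frozenResult` still comes from the original one.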
I’m curious, how many people are using TSC only for type-checking, and a different system (eg esbuild or ts-node) to actually compile/bundle/execute their code?
Looks like my suspicion was correct; not even tsc uses tsc!
The default configurations for Create-React-App and others use babel for type stripping today.
This seems to me like a great "win" for Typescript that so many tools just natively handle TS type stripping and that so much Typescript today only needs type stripping and doesn't need other parts of TS emit processes (or tslib).
> a change in the indentation used in our bundle files (4 spaces -> 2 spaces)
I find it interesting that one of the reasons given for the reduction in package size is due to such a simple indentation change from 4 spaces to 2 spaces.
Not interesting that 2 bytes are less than 4 bytes, rather, TypeScript is a large project and it would be interesting to know how much size was saved from this one specific change? Seems like a trivial change, so why not do it sooner? And assuming readability isn't required in the bundle output why not bundle with no indentation at all and put everything on a single line, would this not be even smaller again?
No, you were using non-standard ESM modules (compiled to CommonJS as defined by babel).
TypeScript recently added support for ESM compatible with node.js; see "module": "node16"[1][2].
The whole ESM saga is a clusterfuck, not much better than the python 2 -> 3 migration. Large node.js codebases have no viable path to migrate, and most tools still cannot support ESM properly[3]. Stuff is already breaking because prolific library authors are switching to ESM.
As someone who maintains a large part of the TS/JS tooling in my day job, I absolutely despise the decisions made by the node.js module team. My side projects are now in Elixir and zig because these communities care about DX.
And that's how it should work imo. But if you enable esm (which you might need in the future because of packages being esm only) you can't use those, only .js.
That's because typescript developers are dead set that they don't want to transpile the imports, they just want to copy paste them into the resulting file when running tsc.
Have you worked on Typescript projects using ES modules? What you’ve described is the status quo for CommonJS modules, but doesn’t work when you switch to ESM (afaik, at least)
No, it’s new, to comply with new stuff from nodejs.
You can likely change it with a config, but do note that importing using .js will be the new standard way of doing things and by changing it through configuration you’re choosing to not follow the new standard.
It’s a complete mess IMO. Every project uses a different way to handle modules and there are a lot of rough edges.
But that doesn't seem like it would fix the typical flow, there still would be no transpiling of imports. So yes you could use .ts in ts-node, but you would have to use .js in tsc. Which is pretty awful (you want your code to work with both).
A thousand times this. It's not only the dumbest thing I've seen a programming language do, it's also the dumbest thing I've seen in the JS ecosystem. Ended up having to implement an AST-based post-processor to fix packages before publishing them.
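For a rough idea of what such a post-processing step does, here is a deliberately simplified sketch. It is regex-based rather than AST-based, so it is not the commenter's tool and it will misfire on things like `from "./x"` appearing inside string literals or comments. The function name `addJsExtensions` is made up, and directory imports (e.g. `"./utils"` meaning `"./utils/index.js"`) are not handled.

```typescript
// Append ".js" to relative import/export specifiers that have no extension.
// Bare package specifiers such as "lodash" are left untouched.
function addJsExtensions(code: string): string {
  // Matches: from "<./ or ../><path>" with either quote style.
  const pattern = /(\bfrom\s+["'])(\.{1,2}\/[^"']+)(["'])/g;
  return code.replace(
    pattern,
    (match: string, prefix: string, specifier: string, quote: string) => {
      const lastSegment = specifier.slice(specifier.lastIndexOf("/") + 1);
      // Treat any dot in the last path segment as an existing extension.
      const hasExtension = lastSegment.includes(".");
      return hasExtension ? match : `${prefix}${specifier}.js${quote}`;
    }
  );
}
```

Run over emitted JavaScript, this would turn `import Foo from "./utils/foo"` into `import Foo from "./utils/foo.js"` while leaving `import _ from "lodash"` alone. A real tool would want a proper parser for exactly the edge cases this sketch ignores.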
Their complaints are unrelated to this specific PR.
See https://www.typescriptlang.org/docs/handbook/esm-node.html for details about how import paths work in CommonJS vs ESM. In both cases the import path you write in your source code is the same import path that is used in the emitted JavaScript. What's different is that NodeJS's ESM implementation doesn't allow extensionless import paths (but its CommonJS implementation does).
Does anyone have any insight about how to coordinate this kind of change to a large project? This kind of change touches literally every file, so every branch will have merge conflicts. The best idea I can think of is to announce the date ahead of time and make every contributor rebase their branches on the day of the merge. But there has to be a better way.
Surprisingly, at least for this PR, solving merge conflicts turns out to not be too hard. By not squash merging it, we can have a single commit that unindents the codebase all in one go (and the commit is in the tree), which means that every line has a clear path back to the current state of the main branch. (And crucially, we can make git blame not point every line to me...)
Potentially, an approach like this might be applicable to other changes; I have a commit in my stack which moves the old build system config to the new build system config's path (even though it's wrong), as git does a much better job understanding where the code is going if you help it.
Surprising they call out the 2 space indent level that esbuild is hardcoded[1] to use as a benefit. Why not save even more bytes and re-format the output to single tab indentation? I wrote a simple script to replace the indentation with tabs. 2 indent size: 29.2MB, tabbed size: 27.3MB. 2MB more of indentation saved! Not significant after compression, but parsing time over billions of starts? Definitely worth it.
[1] https://github.com/evanw/esbuild/issues/1126
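The commenter's script isn't shown, so the following is only a guess at what such a conversion might look like. `spacesToTabs` and its `indentWidth` parameter are hypothetical names. It also naively rewrites leading spaces inside multi-line template literals, which a real tool would need to avoid.

```typescript
// Convert each line's leading spaces into tabs, one tab per `indentWidth`
// spaces. Leftover spaces (fewer than one full level) are kept as spaces.
function spacesToTabs(source: string, indentWidth = 2): string {
  return source
    .split("\n")
    .map((line) => {
      const leading = line.match(/^ */)?.[0].length ?? 0; // count of leading spaces
      const levels = Math.floor(leading / indentWidth);
      const remainder = leading % indentWidth;
      return "\t".repeat(levels) + " ".repeat(remainder) + line.slice(leading);
    })
    .join("\n");
}

// Each indentation level shrinks from `indentWidth` characters to one.
const spaceIndented = "if (a) {\n  if (b) {\n    run();\n  }\n}";
const tabIndented = spacesToTabs(spaceIndented);
const savedChars = spaceIndented.length - tabIndented.length;
```

With 2-space indentation the saving is one character per indentation level per line, which is where a difference like the one quoted above would come from on a large bundle.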
To be honest, that's why I love auto formatting cause I never need to think about that stuff, I can just write code.
The real problem is that using spaces for indentation is an accessibility issue.
The solution is to use tabs for indentation, and spaces for alignment.
(if you accidentally use a tab in a file that otherwise uses spaces, you get a runtime exception, or vice versa)
For your answer search "programmers who use spaces make more money"
Spaces are simply inferior to tabs since the latter conveys the meaning of "one level of indentation" while the former does not. It's also better for accessibility and file size. There is not one single logical reason to ever use spaces for indentation, not one.
For some very fucking stupid historical reason someone in the 80s made the idiotic decision of spaces being the default in editors and people just went with it. The people earning more are doing so because those are the seniors who have given up on common sense and just go with the flow of the masses who are unable to grasp "tabs for indentation, spaces for alignment" yet insist on keeping alignment so the (terrible) compromise is just using spaces. And I strongly question whether "alignment" is worth anything, in almost all cases it's just useless and in the rest you're drawing ASCII diagrams in the comments which doesn't affect your code at all.
Also see the top answer at https://www.reddit.com/r/programming/comments/8tyg4l/why_do_...
Re: indentation: Literally, no one thought of it, as far as anyone can tell. Linus's law appears to have its limits.
I think it's probably correct to laugh this off though. Why would you care about the non-minified/gzipped size this much?
utils/foo.ts
you have to import it as
import Foo from "utils/foo.js"
Even though there is no .js file on disk, and you might be running ts-node or whatever that doesn't build a .js file.
Importing a file that "doesn't exist" is so counterintuitive.
In addition all code breaks because you have to change all your imports, and /index.ts or /index.js won't work either.
1) enforces no extension, e.g. “utils/foo”, or
2) allows TS extensions, e.g. “utils/foo.ts”
I have never imported a TS file using a JS extension. Maybe your woes could be fixed with a configuration change?
See https://github.com/microsoft/TypeScript/issues/37582 which is referenced in the 4.9 Iteration Plan as "Support .ts as a Module Specifier for Bundler/Loader Scenarios": https://github.com/microsoft/TypeScript/issues/50457
Thank you for the kind words!