It's because the points on that specific dot are defined in the opposite winding direction compared to the rest of the curves in that font. So the renderer interprets them as an inner contour and not an outer one.
As far as I could tell, orientation is defined either as TrueType (outer contours are clockwise) or as PostScript (outer contours are counter-clockwise). If I find a PostScript-style orientation I just flip it so the renderer sees consistent input. But for some reason that dot either avoids the flip through FreeType or is defined the other way in the source font file itself.
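Roughly, the normalization looks like this (a sketch with made-up names, not the renderer's actual code; treating the contour's on-curve points as a polygon is enough to decide the winding):

```rust
#[derive(Clone, Copy)]
struct Point { x: f32, y: f32 }

// Shoelace formula: in a y-up coordinate system, > 0 means counter-clockwise
// (PostScript-style outer contour), < 0 means clockwise (TrueType-style).
fn signed_area(contour: &[Point]) -> f32 {
    let n = contour.len();
    let mut twice_area = 0.0;
    for i in 0..n {
        let a = contour[i];
        let b = contour[(i + 1) % n];
        twice_area += a.x * b.y - b.x * a.y;
    }
    twice_area * 0.5
}

// Flip an outer contour to TrueType winding (clockwise) before rendering.
fn normalize_outer_to_truetype(contour: &mut [Point]) {
    if signed_area(contour) > 0.0 {
        contour.reverse();
    }
}

fn main() {
    // A counter-clockwise square, i.e. a PostScript-style outer contour.
    let mut dot = [
        Point { x: 0.0, y: 0.0 },
        Point { x: 1.0, y: 0.0 },
        Point { x: 1.0, y: 1.0 },
        Point { x: 0.0, y: 1.0 },
    ];
    normalize_outer_to_truetype(&mut dot);
    assert!(signed_area(&dot) < 0.0); // now TrueType-style
}
```

(For inner contours the check inverts, which is exactly why a stray flipped dot turns into a hole.)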
A lot of renderers are agnostic to the orientation of the curves, but that can lead to its own issues and undesired filled/unfilled areas. So it's pick your poison here. IIRC Sebastian Lague also has to deal with it in his amazing video about rendering text.
I might go check the source font file and see what's up. This is Libre Baskerville, which is open source, so maybe we can even fix it if it's an inconsistency at that level.
Subpixel font rendering is critical for readability but, as the author points out, it's a tragedy that we can't get pixel layout specs from the existing display standards.
Only on standard resolution displays. And it's not even "critical" then, it's just a nice-to-have.
But the world has increasingly moved to Retina-type displays, and there's very little reason for subpixel rendering there.
Plus it just has so many headaches, like screenshots get tied to one subpixel layout, you can't scale bitmaps, etc.
It was a temporary innovation for the LCD era between CRT and Retina, but at this point it's backwards-looking. There's a good reason Apple removed it from macOS years ago.
Even on standard resolution displays with standard subpixel layout, I see color fringing with subpixel rendering. I don't actually have hidpi displays anywhere but my phone, but I still don't want subpixel text rendering. People act like it's a panacea, but honestly the history of how we ended up with it is pretty specific and kind of weird.
> the world has increasingly moved to Retina-type displays
Not my world. Even the display hooked up to the crispy work MacBook is still 1080p (which looks really funky on macOS for some reason).
Even in tech circles, almost everyone I know still has a 1080p laptop. Maybe some funky 1200p resolution to make the screen a bit bigger, but the world is not as retina as you may think it is.
For some reason, there's actually quite a price jump from 1080p to 4K unless you're buying a television. I know the panels are more expensive, but I doubt the manufacturer is actually paying twice the price for them.
Because Apple controls all their hardware and can assume that everyone has a particular set of features, and doesn't have to care about those without. The rest of the industry doesn't have that luxury.
> like screenshots get tied to one subpixel layout
we could do with a better image format for screenshots - something that preserves vectors and text instead of rasterizing. HDR screenshots on Windows are busted for similar reasons.
It looks like the DisplayID standard (the modern successor to EDID) is at least intended to allow for this, per https://en.wikipedia.org/wiki/DisplayID#0x0C_Display_device_... . Do display manufacturers not implement this? Either way, it's information that could be easily derived and stored in a hardware-info database, at least for the most common display models.
I don't think any OS exposes an API for this. There's a Linux tool I sometimes use to control the brightness of my screen that works by basically talking directly to the hardware over the GPU.
Unfortunately, EDID isn't always reliable, either: you need to know the screen's orientation as well or rotated screens are going to look awful. You're probably going to need administrator access on computers to even access the hardware to get the necessary data, which can also be a problem for security and ease-of-use reasons.
Plus, some vendors just seem to lie in the EDID. Like with other information tables (ACPI comes to mind), it looks almost like they just copy the config from another product and adjust whatever metadata they remember to update before shipping.
I don't understand why not; this has been a thing for decades :(. The article is excellent and links to this "subpixel zoo" highlighting the variety: https://geometrian.com/resources/subpixelzoo/
“Tragedy” is a bit overstating it. Each OS could provide the equivalent of Windows' former ClearType tuner for that purpose, and remember the results per screen or monitor model. You'd also want that in the inevitable case where monitors report the wrong layout.
Subpixel rendering isn't necessary in most languages. Bitmap fonts or hinted vector fonts without antialiasing give excellent readability. Only if the language uses characters with very intricate details such as Chinese or Japanese is subpixel rendering important.
GTK4 moved rendering to the GPU and gave up on RGB subpixel rendering. I've heard that this GPU-centric decision made it impractical to continue with RGB subpixel rendering. The article shows it is possible, so perhaps the reason for GTK was a different one, or the presented solution would have disadvantages or just wouldn't integrate into the stack...
This looks great. I have some interest in WGPU (Rust's WebGPU implementation), and your tutorial here appears to be an advanced course on it--though it doesn't advertise itself as such. I've translated JavaScript examples to Rust before, and it's ideal for learning, because I can't just copy/paste code, but the APIs are close enough that it's easy to port the code and it gives you an excuse to get used to using the WGPU docs.
Looks like a repurposed VitePress docs template, which is a perfectly fine solution for text-heavy content. The site appears to be open-source, there are links to the repo at the bottom of each page: https://github.com/xiaoiver/infinite-canvas-tutorial
They describe a fair amount of their algorithm directly on their website. Do they have patents for it? It would be fun to make an open source wgpu version, maybe using some stuff from cosmic-text for font parsing and layout. But if at the end of that I'd get sued by Slug, that would be no fun.
I also created glyphon (https://github.com/grovesNL/glyphon) which renders 2D text using wgpu and cosmic-text. It uses a dynamic glyph texture atlas, which works fine in practice for most 2D use cases (I use it in production).
I still don't understand why we need text rendered offline and stored in an atlas alongside tricks like SDFs, when GPUs have like infinite vertex/pixel drawing capabilities... Even the article mentions writing glyph curves to an atlas. Why can't the shaders render text directly? There has to be a way to convert Bézier curves to triangle meshes. I'm about to embark on a GPU text renderer for a CAD app and I hope to figure out why soon.
It's simply less expensive in most cases to cache the results of rendering when you render the same glyph over and over. GPUs are fast but not infinitely fast, and they are extremely good at sampling from prerendered textures.
Also it's not just about speed, but power consumption. Once you are fast enough to hit the monitor frame rate then further performance improvements won't improve responsiveness, but you may notice your battery lasting longer. So there's no such thing as "fast enough" when it comes to rendering. You can always benefit from going faster.
> Once you are fast enough to hit the monitor frame rate then further performance improvements won't improve responsiveness
This is not true: if your rendering is faster, you can delay the start of rendering and input processing to be closer to the frame display time, thus reducing input latency.
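A sketch of that idea (illustrative numbers only; real code would get the vblank time from the presentation API rather than guessing):

```rust
use std::time::{Duration, Instant};

fn main() {
    let frame_budget = Duration::from_micros(16_667); // 60 Hz
    let render_budget = Duration::from_millis(3);     // measured cost + margin

    let mut next_vblank = Instant::now() + frame_budget;
    for _ in 0..3 {
        // Sleep until just before the deadline instead of rendering early,
        // so input is sampled as late as possible.
        let wake = next_vblank - render_budget;
        std::thread::sleep(wake.saturating_duration_since(Instant::now()));

        // sample_input(); render(); present(); // placeholders
        println!(
            "slack before vblank: {:?}",
            next_vblank.saturating_duration_since(Instant::now())
        );

        next_vblank += frame_budget;
    }
}
```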
The triangle density of even a basic font is crazy high at typical display sizes. All modern GPU architectures are very bad at handling high density geometry. It's very inefficient to just blast triangles at the GPU for these cases compared to using an atlas or some other scheme.
Most GPUs dispatch pixel shaders in groups of 4. If all your triangles are only 1 pixel big then you end up with 3 of those shader threads not contributing to the output visually. It's called 'quad overdraw'. You also spend a lot of time processing vertices for no real reason too.
GPUs don't have infinite vertex/pixel drawing capabilities. Rendering text directly is simply more expensive. Yes, you can do it, but you'll be giving up a portion of your frame budget and increasing power usage for no real benefit.
To expand on this, GPUs cannot rasterize text directly because they only work with triangles. You need to either implement the rasterization in shaders or convert the smooth curves in the font into enough triangles that the result doesn't look different (number of triangles required increases with font pixel size).
Triangles are the wrong choice, but otherwise you make a good point. This guy uses an atlas because he renders fonts by supersampling Bézier curves using up to 512 samples per pixel, which is very expensive. However, you could e.g. compute the integral of the intersection of the Bézier curve area with the subpixel area much faster, which I think could run in real time without a need for an atlas and would be more accurate than supersampling.
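For reference, the identity behind that is just Green's theorem (nothing specific to this renderer):

$$
\text{coverage}(P) = \frac{|\Omega \cap P|}{|P|}, \qquad |\Omega \cap P| = \frac{1}{2} \oint_{\partial(\Omega \cap P)} (x\,dy - y\,dx),
$$

where $\Omega$ is the glyph region and $P$ the (sub)pixel. For a quadratic segment $B(t) = (1-t)^2 P_0 + 2t(1-t) P_1 + t^2 P_2$ the boundary integral is a closed-form polynomial in the control points, so no per-pixel sampling is needed.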
GPUs are very fast, but not quite infinite. If you spend your GPU time on text, you can't spend it on something else. And almost always you would like to spend it on something else.
Also the more GPU time you require, the faster the minimum required hardware needs to be. Text is cool and important, but maybe not important enough to lose users or customers.
Adaptive distance fields were an interesting older technology used in the Saffron font rendering library. Still very viable and ripe for an open-source implementation. ADF was the term coined, but it never really caught on. It was all locked down by patents, but most of them have expired.
GPU rasterizers don't do sub-pixel rendering. This is OK for most 3D geometry but for small text you want to take advantage of any additional resolution you can squeeze out.
On the other hand, if you are rendering to an atlas anyway then you don't really need to bother with a GPU implementation for that and can just use an existing software font rasterizer like FreeType to generate that atlas for you.
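E.g. a sketch with the freetype crate (freetype-rs), going from my memory of its API; "font.ttf" is a placeholder path:

```rust
use freetype::{face::LoadFlag, Library};

// Rasterize a glyph with FreeType on the CPU; the resulting coverage bitmap
// would be copied into an atlas buffer and uploaded as a GPU texture.
fn main() -> Result<(), freetype::Error> {
    let library = Library::init()?;
    let face = library.new_face("font.ttf", 0)?;
    face.set_pixel_sizes(0, 32)?; // 32 px tall

    face.load_char('g' as usize, LoadFlag::RENDER)?;
    let glyph = face.glyph();
    let bitmap = glyph.bitmap();

    // A real atlas would pick a free rectangle here; we just report the size.
    println!(
        "{}x{} coverage bitmap, pitch {}",
        bitmap.width(),
        bitmap.rows(),
        bitmap.pitch()
    );
    Ok(())
}
```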
> One of those new OLEDs that look so nice, but that have fringing issues because of their non-standard subpixel structure
From what I understood, it's even worse: not just non-standard, but multiple incompatible subpixel layouts across OLEDs. That's the reason FreeType didn't implement subpixel rendering for OLEDs, and it's a reason to avoid OLEDs when you need to work with text. But it's also not limited to FreeType; a lot of things like GUI toolkits (Qt, GTK, etc.) need to play along too.
Not really sure if there is any progress on solving this.
> I really wish that having access to arbitrary subpixel structures of monitors was possible, perhaps given via the common display protocols.
Yeah, this is a good point. Maybe this should be communicated in EDIDs.
There are OLEDs with somewhat standard subpixel layouts. E.g. my laptop has a vertical(!) BGR layout that FreeType and KDE support just fine.
I think the weird layouts are mostly due to needing different sizes for the different colors in HDR displays in order to not burn out one color (blue) too fast.
Maybe, but I've seen bug reports with a bunch of layouts and nothing looks standard there. The Steam Deck OLED is one example, plus Lenovo laptops, LG UltraGear OLEDs, etc. I don't really see any commonality. For example:

* https://bugs.kde.org/show_bug.cgi?id=472340

* https://gitlab.freedesktop.org/freetype/freetype/-/issues/11...
Very impressive work. For those who aren't familiar with this field, Valve invented SDF text rendering for their games. They published a groundbreaking paper on the subject in 2007. It remains a very popular technique in video games with few changes.
In 2012, Behdad Esfahbod wrote Glyphy, an implementation of SDF that runs on the GPU using OpenGL ES. It has been widely admired for its performance and enabling new capabilities like rapidly transforming text. However it has not been widely used.
Modern operating systems and web browsers do not use either of these techniques, preferring to rely on 1990s-style TrueType rasterization. This is a lightweight and effective approach, but it lacks many capabilities. It can't do subpixel alignment or arbitrary subpixel layout, as demonstrated in the article. Zooming carries a heavy performance penalty, and more complex transforms like skew, rotation, or 3d transforms can't be done in the text rendering engine. If you must have rotated or transformed text you are stuck resampling bitmaps, which looks terrible as it destroys all the small features that make text legible.
Why the lack of advancement? Maybe it's just too much work and too much risk for too little gain. Can you imagine rewriting a modern web browser engine to use GPU-accelerated text rendering? It would be a daunting task. Rendering glyphs is one thing but how about handling line breaking? Seems like it would require a lot of communication between CPU and GPU, which is slow, and deep integration between the software and the GPU, which is difficult.
> Can you imagine rewriting a modern web browser engine to use GPU-accelerated text rendering? […] Rendering glyphs is one thing but how about handling line breaking?
I’m not sure why you’re saying this: text shaping and layout (including line breaking) are almost completely unrelated to rendering.
> Can you imagine rewriting a modern web browser engine to use GPU-accelerated text rendering?
https://github.com/servo/pathfinder uses GPU compute shaders to do this, which has way better performance than trying to fit this task into the hardware 3D rendering pipeline (the SDF approach).
Just for the record, text rendering - including with subpixel antialiasing - has been GPU accelerated on Windows for ages and in Chrome/Firefox for ages. Probably Safari too but I can't testify to that personally.
The idea that the state of the art or what's being shipped to customers haven't advanced is false.
SDF works by encoding a localized _D_istance from a given pixel to the edge of a character as a _F_ield, i.e. a 2D array of data, using a _S_ign bit to indicate whether that distance is inside or outside of the character. Each character has its own little map of data that gets packed together into an image file of some GPU-friendly type (generically called a "map" when it does not represent an image meant for human consumption), along with a descriptor file of where to find the sub-image of each character in that image, to work with the SDF rendering shader.
This definition of a character turns out to be very robust against linear interpolation between field values, enabling near-perfect zoom capability for relatively low resolution maps. And GPUs are pretty good at interpolating pixel values in a map.
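Concretely, the per-pixel math is tiny — here's a CPU-side sketch of the usual fragment-shader logic (made-up names):

```rust
// Distances are assumed remapped to [0, 1] with the glyph edge at 0.5,
// as is typical when an SDF is baked into a texture.
fn smoothstep(e0: f32, e1: f32, x: f32) -> f32 {
    let t = ((x - e0) / (e1 - e0)).clamp(0.0, 1.0);
    t * t * (3.0 - 2.0 * t)
}

// `d`: bilinearly interpolated sample from the SDF map.
// `w`: roughly one screen pixel in SDF units (what fwidth() gives a shader).
fn sdf_coverage(d: f32, w: f32) -> f32 {
    smoothstep(0.5 - w, 0.5 + w, d)
}

fn main() {
    println!("{}", sdf_coverage(0.5, 0.05)); // right on the edge -> 0.5
    println!("{}", sdf_coverage(0.7, 0.05)); // well inside -> 1.0
}
```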
But most significantly, those maps have to be pre-processed during development from existing font systems for every character you care to render. Every. Character. Your. Font. Supports. It's significantly less data than rendering every character at high resolution to a bitmap font. But, it's also significantly more data than the font contour definition itself.
Anything that wants to support all the potential text of the world--like an OS or a browser--cannot use SDF as the text rendering system because it would require the SDF maps for the entire Unicode character set. That would be far too large for consumption. It really only works for games because games can (generally) get away with not being localized very well, not displaying completely arbitrary text, etc.
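Back-of-the-envelope, with illustrative numbers (not from any particular engine):

$$
64 \times 64 \ \text{texels} \times 1 \ \text{byte} = 4\,\text{KiB per glyph}, \qquad 4\,\text{KiB} \times 150{,}000 \ \text{glyphs} \approx 585\,\text{MiB}.
$$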
The original SDF also cannot support Emoji, because it only encodes distance to the edges of a glyph and not anything about color inside the glyph. Though there are enhancements to the algorithm to support multiple colors (Multichannel SDF), the total number of colors is limited.
Indeed, if you look closely at games that A) utilize SDF for in-game text and B) have chat systems in which global communities interact, you'll very likely see differences in the text rendering for the in-game text and the chat system.
If I understand correctly, the author's approach doesn't really have this problem since they only upload the glyphs being used to the GPU (at runtime). Yes, you still have to pre-compute them for your font, but that should be fine.
Why not prepare SDFs on-demand, as the text comes in? Realistically, even for CJK fonts you only need a couple thousand characters. Ditto for languages with complex characters.
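Something like this sketch (hypothetical types; a real version would key on font + glyph id and handle eviction):

```rust
use std::collections::HashMap;

// Where a glyph's SDF lives inside the atlas texture.
#[derive(Clone, Copy, Debug)]
struct AtlasSlot { x: u32, y: u32, size: u32 }

struct OnDemandSdfAtlas {
    slots: HashMap<char, AtlasSlot>,
    cursor_x: u32,
    cursor_y: u32,
    width: u32,
    cell: u32, // fixed cell size keeps allocation trivial for a sketch
}

impl OnDemandSdfAtlas {
    fn new(width: u32, cell: u32) -> Self {
        Self { slots: HashMap::new(), cursor_x: 0, cursor_y: 0, width, cell }
    }

    // Returns the cached slot, or generates the SDF on first use.
    fn get_or_insert(&mut self, ch: char) -> AtlasSlot {
        if let Some(&slot) = self.slots.get(&ch) {
            return slot;
        }
        // generate_sdf(ch) + GPU texture upload would happen here.
        let slot = AtlasSlot { x: self.cursor_x, y: self.cursor_y, size: self.cell };
        self.cursor_x += self.cell;
        if self.cursor_x + self.cell > self.width {
            self.cursor_x = 0;
            self.cursor_y += self.cell;
        }
        self.slots.insert(ch, slot);
        slot
    }
}

fn main() {
    let mut atlas = OnDemandSdfAtlas::new(1024, 64);
    for ch in "你好, world".chars() {
        let slot = atlas.get_or_insert(ch);
        println!("{ch:?} -> {slot:?}");
    }
}
```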
It is tricky, but I thought they already (partly) do that. https://keithclark.co.uk/articles/gpu-text-rendering-in-webk... (2014):

“If an element is promoted to the GPU in current versions of Chrome, Safari or Opera then you lose subpixel antialiasing and text is rendered using the greyscale method”
So, what’s missing? Given that comment, at least part of the step from UTF-8 string to bitmap can be done on the GPU, can’t it?
The issue is not subpixel rendering per se (at least if you're willing to go with the GPU compute shader approach, for a pixel-perfect result), it's just that you lose the complex software hinting that TrueType and OpenType fonts have. But then the whole point of rendering fonts on the GPU is to support smooth animation, whereas a software-hinted font is statically "snapped" to the pixel/subpixel grid. The two use cases are inherently incompatible.
> complex transforms like skew, rotation, or 3d transforms can't be done
Good. My text document viewer only needs to render text in straight lines left to right. I assume right to left is almost as easy. Do the Chinese still want top to bottom?

Yes, inconceivable that somebody might ever want to render text in anything but a "text document viewer"!
What happened to the dot of the italic "j" in the first video?
That was very perceptive :D
Good catch!
Can you tell me more about it? I love making tutorials about GPU stuff and I would love to structure them like yours.
Is it an existing template? Is it part of some sort of course?
Of course, for e.g. games, that breaks if the font size changes, letters rotate and/or become skewed, etc.
https://sluglibrary.com/ implements Dynamic GPU Font Rendering and Advanced Text Layout