Syntax highlighting is a waste of an information channel (2020)

I guess I may be an outlier here but I don't find any of those examples motivating at all. They make it harder for me to see the structure of the code at a glance.

Also I think it's a bit off the mark to think of it as being a wasted information channel. Redundancy is a feature of human languages, because our languages are not optimizing solely for density. A bit of redundancy helps our brains pattern match more consistently, almost like a form of forward error correction. Syntax highlighting is like that, at least for me, where it makes a big difference in seeing the structure at a glance, and overly complex coloring rules thwart that for me. Like I don't want to be trying to match up rainbow shades of parens.

jcranmer · 5 months ago

In my experience, the biggest wins in syntax highlighting come from just a few things: make comments a different color, make strings a different color, and if you've got something like shell where strings can contain embedded variable references, pop those variable references into a different color.

One of the big problems with a lot of the examples here is that, well, I spend most of my time on multimillion line codebases. If you want to pop stuff out to me, showing it in a different color is useless because it's not on the same screen; no, the way you give it to me is a macro that takes me to the next location of the thing of interest. And with a macro that lets me move to points of interest... the use of color is entirely redundant.

skydhash · 5 months ago

I've just installed a plugin in my emacs config the other day called dumb-jump[0]. The way it works is by running a grep tool (grep, git-grep, the_silver_searcher (ag), ripgrep) inside the current project with some patterns according to the file type. It hooks into emacs xref (dynamic links and history).
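
To make the grep-with-patterns idea concrete, here is a rough Python sketch of the same approach. It is not dumb-jump's code: the pattern table and the `find_definitions` helper are made up for illustration, and it scans files with Python's `re` instead of shelling out to a grep tool.

```python
import re
from pathlib import Path

# Hypothetical per-file-type definition patterns; "{name}" is replaced by the
# symbol being looked up. The real plugin ships its own, much larger rule set.
DEFINITION_PATTERNS = {
    ".py": [r"^\s*def\s+{name}\b", r"^\s*class\s+{name}\b", r"^\s*{name}\s*="],
    ".js": [r"\bfunction\s+{name}\b", r"\b(?:const|let|var)\s+{name}\b"],
}


def find_definitions(root, name):
    """Return (path, line_number, line) for lines that look like definitions of `name`."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        patterns = DEFINITION_PATTERNS.get(path.suffix)
        if patterns is None or not path.is_file():
            continue
        # Substitute the escaped symbol into each pattern for this file type.
        regexes = [re.compile(p.replace("{name}", re.escape(name))) for p in patterns]
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if any(r.search(line) for r in regexes):
                hits.append((str(path), number, line.strip()))
    return hits
```

Because it is only pattern matching, it can return false positives (for example a reassignment that looks like a definition), which is the trade-off for needing no language server or index.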

These days, I'm using very minimal highlighting (doric-themes [1] which is basically shades of one color and font-weight). I prefer to separate semantic units of code by whitespace. Then scanning becomes quick.

[0]: https://melpa.org/#/dumb-jump

[1]: https://elpa.gnu.org/packages/doric-themes.html

teo_zero · 5 months ago

Moving to the next point of interest and highlighting all the occurrences are not mutually exclusive. In fact searches work like this in many apps.

tptacek · 5 months ago

I 80% agree with this, but I've also gotten a lot of mileage out of Emacs hi-lock mode. The simplest thing I always want: whatever symbol I'm pointing at right now, highlight everywhere (so I can trace it through the code quickly); more generally, I want to highlight an arbitrary regex, so for instance when I'm auditing a codebase for authorization, I quickly spot the functions that aren't following the same authorization patterns as all the others.
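
For anyone outside Emacs, a rough terminal analogue of both cases (symbol under the cursor, arbitrary regex) might look like the Python sketch below. This is not how hi-lock is implemented; the function names are invented here and the only "highlight" is ANSI reverse video.

```python
import re

HIGHLIGHT_ON = "\x1b[7m"   # ANSI reverse video
HIGHLIGHT_OFF = "\x1b[0m"  # ANSI reset


def highlight(text, pattern):
    """Wrap every match of the regex `pattern` in reverse-video codes."""
    return re.sub(pattern, lambda m: HIGHLIGHT_ON + m.group(0) + HIGHLIGHT_OFF, text)


def highlight_symbol(text, symbol):
    """Highlight whole-word occurrences of one symbol, as if it were under the cursor."""
    return highlight(text, r"\b" + re.escape(symbol) + r"\b")
```

The word-boundary anchors are what keep `user` from lighting up inside `user_id`; for the auditing case you would pass a hand-written regex to `highlight` directly.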

jrochkind1 · 5 months ago

Oh I read it as modes that you'd toggle for specific tasks. I don't want the default coloring of my code to be paren matching, but I do when I'm getting confused and trying to match parens.

babypuncher · 5 months ago

I like rainbow parentheses and this is already a common feature in code editors.

Every example past that was just worse for readability. I think you're right about density not being the only important metric here.

TuringTest · 5 months ago

I think one important point in the article is going unnoticed: these special highlights would not be the default for reading code, but specific tools that the developer can turn on and off for when their use case is needed.

So, not much different than a search for regular expressions or a "show definition" tooltip.

James_K · 5 months ago

The rainbow braces thing is quite useful. It can save you a bit of bother counting. A lot of his other suggestions just seem like more general search things that should highlight matches the same way Ctrl+f highlights text in a web page.
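
For what it's worth, the core of rainbow braces is small. Here is a minimal Python sketch, assuming a four-color ANSI palette picked arbitrarily and ignoring strings and comments, which a real editor would have to skip:

```python
PALETTE = ["\x1b[31m", "\x1b[33m", "\x1b[32m", "\x1b[36m"]  # red, yellow, green, cyan
RESET = "\x1b[0m"
OPENERS = "([{"
CLOSERS = ")]}"


def rainbow(text):
    """Color each bracket by its nesting depth; matching pairs get the same color."""
    out = []
    depth = 0
    for ch in text:
        if ch in OPENERS:
            out.append(PALETTE[depth % len(PALETTE)] + ch + RESET)
            depth += 1
        elif ch in CLOSERS:
            # Step back out first so the closer uses its opener's color.
            depth = max(depth - 1, 0)
            out.append(PALETTE[depth % len(PALETTE)] + ch + RESET)
        else:
            out.append(ch)
    return "".join(out)
```

Colors repeat once nesting goes deeper than the palette, which is usually where the counting bother starts again.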

vidarh · 5 months ago

Yeah, if that was my only option, I'd turn off highlighting entirely.

For the last five years I've been working on this problem!

To solve it we need to be able to describe the structured content of a document without rendering it, and that means we need an embedding language for code documents.

I hope this doesn't sound overly technical: I'm just borrowing ideas from web browsers. I think of my project as being the creation of a DOM for code documents. The DOM serves a similar function. A semantic HTML document has meaning independent of its rendered presentation and so it can be rendered many ways.

CSTML is my novel embedding language for code. You could think of it like a safe way to hold or serialize an arbitrary parse tree. Like HTML, a CSTML document has "inner text", which in this case is the source text of the program the parser saw. E.g. a tiny document might be `<Boolean> 'true' </>`. The parser injects node tags into the source text, creating what is essentially the perfect data stream to feed a syntax highlighter. To do the highlighting you print the string content of the document and use the control tags to decide on color. This is actually already how we syntax highlight the output from our own CLI as it happens. We use our streaming parser technology to parse our log output into a CSTML tag stream (in real time) and then we just swap out open and close node tags for ANSI escape codes, print the strings, and send that stream to stdout.

Here's a more complicated document generated from a real parse: https://gist.github.com/conartist6/412920886d52cb3f4fdcb90e3...
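
The tag-swapping step described above can be sketched roughly as follows. This is only an illustration in Python, not BABLR's API or real CSTML parsing: the `(kind, value)` tuples stand in for an already-parsed tag stream, and the color table is hypothetical.

```python
RESET = "\x1b[0m"
# Hypothetical node-type-to-color table; not part of CSTML or BABLR.
COLORS = {"Keyword": "\x1b[35m", "String": "\x1b[32m", "Boolean": "\x1b[33m"}


def render(tags):
    """Turn a stream of ("open", type) / ("text", str) / ("close", None) into ANSI text."""
    out = []
    stack = []  # color of each currently open node, innermost last
    for kind, value in tags:
        if kind == "open":
            # A node type with no color of its own inherits its parent's color.
            inherited = stack[-1] if stack else ""
            stack.append(COLORS.get(value, inherited))
        elif kind == "close":
            stack.pop()
        elif kind == "text":
            color = stack[-1] if stack else ""
            out.append((color + value + RESET) if color else value)
    return "".join(out)
```

With this stand-in stream, the tiny document `<Boolean> 'true' </>` becomes `[("open", "Boolean"), ("text", "true"), ("close", None)]`, and only the text tokens ever reach the output.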

tempfile · 5 months ago

I think this is a good idea, but your language seems extremely similar to XML - why not just write an XML schema? Seems like you can leverage a lot of existing tools and abstractions that way.

conartist6 · 5 months ago

SrcML is a piece of prior art that we studied that does use XML and I think it limits them. For example because literal content isn't quoted in XML all content inside tags is inner text. So they can't pretty print their documents, because indentation added to the embedding document would become indentation in the embedded document instead. Oops! We also support named references and named namespaces, which XML does not.

Noumenon72 · 5 months ago

How many of the ideas he proposes would this support? For example, classifying something as a <Keyword> lets you highlight it in the traditional way, but doesn't do much for "highlight different levels of nesting" or "highlight if imported from a different file". Seems like the parallel to HTML means CSTML mostly supports different rendering like screen reading or styling.

conartist6 · 5 months ago

Yeah I would say we support all of them, but right now the support is low-level rather than high-level. We're not stopping you but we're not (yet) making it trivially easy either.

Technically right now BABLR turns text into parse trees but it doesn't render the trees, so it doesn't have any firsthand concept of styling. If you print the content of a CSTML document to the terminal, you'll have to style it with ANSI codes. If you want to print the document to a web page, you'll have to style it with CSS. Right now we leave that part as an exercise to the user. The tree has the data needed to achieve any of the results you suggest, and as time goes on we will do better at providing higher level APIs that make it really easy to implement those kinds of code-semantic styling rules.

jcgl · 5 months ago

How does this approach cohere/compete/disagree with the treesitter ecosystem?

conartist6 · 5 months ago

Yeah it kinda does all three. We think Tree-sitter could adopt CSTML as a way of communicating its parse results with relative ease.

We also think that at some point in the future we could run Tree-sitter grammars without first compiling them from JS to C or wasm.

Our major innovations over Tree-sitter are scripted grammars (no compile step), streaming parsing, and the idea that we are a standalone complete source of truth for an IDE, where Tree-sitter only wants to be half the story: it expects to sync with a text buffer where the text buffer is the source of truth.

tschumacher · 5 months ago

My coworker recently showed me this plugin [1] that fades out all Rust code that is unrelated to the variable under the cursor. Think of it as a more powerful version of the "click to highlight all appearances" you can do in most IDEs but it actually does information flow analysis on the code.

[1]: https://github.com/willcrichton/flowistry

billyjmc · 5 months ago

Will Crichton gave a talk at Jane Street about his research that led to the development of this plugin. It’s a pretty good talk:

https://www.janestreet.com/tech-talks/rust-for-everyone/

As I recall, he also explains how Rust is uniquely positioned to enable this kind of syntax formatting.

ivape · 5 months ago

Need this for every language …

kccqzy · 5 months ago

It can't. It uses ownership types in Rust to do the analysis. It can be fooled by Rust's interior mutability. Other languages don't have a type system that enables this.

yen223 · 5 months ago

Oh wow, this is actually useful

kazinator · 5 months ago

And, so, does syntax highlighting interfere in that? No.

The information channel is shareable.

CGamesPlay · 5 months ago

That’s basically the first paragraph of the article.

wakawaka28 · 5 months ago

This sounds good but if you need it, you probably can and should refactor the code.

mcphage · 5 months ago

You know what would be great for refactoring the code? Some way to easily see what parts reference the variable you’re trying to refactor.

thomascountz · 5 months ago

I like using syntax highlighting when it breaks. For example, if all code below a particular line is a single color, then I probably forgot an end-quote or something. But this has become less uniquely useful due to the broader integration of parser-driven linters (e.g. tree-sitter), which—besides being able to drive highlighting—can explicitly deliver inline hints or parsing errors.

All that said, I'm one who appreciates information density! How about coloring branching code paths/call stacks?

My keyboard has a concept of "layers," which allows each key to map differently depending on the layer. I've seen this used to make a numpad or to have a QWERTY and DVORAK layer. What if highlighting was the same? Instead of competing for priority over the color channels, developers could explicitly swap layers?
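
A very rough Python sketch of what I mean by layers, with everything in it (layer names, patterns, colors) invented for illustration: each layer owns the whole color channel while it is active, and switching layers just means calling the renderer with a different name.

```python
import re

RESET = "\x1b[0m"
# Hypothetical layers: each one maps a single regex to a single ANSI style.
LAYERS = {
    "syntax": (r"\b(?:if|else|for|while|return)\b", "\x1b[35m"),  # magenta keywords
    "todo": (r"\b(?:TODO|FIXME)\b", "\x1b[30;43m"),               # black on yellow
}


def apply_layer(text, layer_name):
    """Render `text` with only the named layer's highlighting applied."""
    pattern, style = LAYERS[layer_name]
    return re.sub(pattern, lambda m: style + m.group(0) + RESET, text)
```

So `apply_layer(source, "syntax")` and `apply_layer(source, "todo")` give two different views of the same text, and neither has to compete with the other for colors.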

AceJohnny2 · 5 months ago

Great point! Similarly, I sometimes use Emacs' excellent (and near-unique) electric-indent as a hint of syntax brokenness. "What do you mean this is getting indented at that lev-- oooh"

The downside with broken syntax highlighting (and electric-indent!) is when the editor's parser is insufficient, as is often the case with basic online editors, and breaks with legitimate constructs (Emacs with certain C macros). Then I can't trust the highlighter and also I have less-legible code.

gpderetta · 5 months ago

I got so used to electric-indent and the immediate feedback it gives, that for a very long time it prevented me from even considering any other editors.

These days I rely on clangd driven autoindent (which is fast enough to do every line), but I still use emacs because it is so easy to tweak the interaction to clangd to work exactly as I prefer.

btreecat · 5 months ago

> All that said, I'm one who appreciates information density! How about coloring branching code paths/call stacks?

> My keyboard has a concept of "layers," which allows each key to map differently depending on the layer. I've seen this used to make a numpad or to have a QWERTY and DVORAK layer. What if highlighting was the same? Instead of competing for priority over the color channels, developers could explicitly swap layers?

I was thinking about coloring logic/scope blocks as a way to help visualize scope and flow. Even if it required static analysis and a simple script, it could be useful when I need to debug.

jasonwatkinspdx · 5 months ago

hyperrail · 5 months ago

Font sizes and families are also a relatively little-used way to visually distinguish things in a code editor.

Years ago I used to work on C++ code in a commercial editor called Source Insight [1]. At its default settings, it would do things like:

1. Show function and class names in HUGE fonts in declarations and definitions, so you always knew what was a declaration as opposed to a use

2. Show nested parentheses with the outermost parens being biggest and getting smaller as you got further in

3. Show comments in a proportional sans-serif font instead of a monospaced one so that you could tell where the comments were even if you have color blindness

Those features, along with having a C++ parser and code relationship visualizer much faster than the Visual Studio of the day without having to parse ahead of time (a la ctags), made Source Insight a near standard in my company. I still miss it on occasion.

[1] https://www.sourceinsight.com/feature-details/

kstenerud · 5 months ago

You could also use color intensity (faded or bolded or even fully desaturated), animated shrinking/growing font sizes as your cursor moves between code blocks to emphasize the important vs less important data, background coloring, colored box outlines surrounding related code, etc.

So many ways to focus attention and highlight related areas, but so few IDEs that do anything about it...

int_19h · 5 months ago

I remember Source Insight. It was one of those things that some people loved and others hated.

Re: comments in proportional font; while it's an interesting way to highlight them, the problem is that it then precludes you from using ASCII art to diagram things in comments (or from reading such diagrams in existing codebases where they exist).

aragonite · 5 months ago

Still possible in VSCode through somewhat hackish methods (esp. arbitrary CSS injection via createTextEditorDecorationType). Here are some quick screenshots of random JS/Rust examples in my installation: https://imgur.com/a/LUZN5bl

fouronnes3 · 5 months ago

This is one of those things where both extremes of madness and genius wrap around to infinity and meet again.

Honestly, it looks like a ransom request letter! :D

I have independently discovered the benefit of (3), having comments (prose) in a proportional font. I think I'll also enjoy (2). I'll see how to configure it for my editor.

stevage · 5 months ago

I don't know if it's just because the article is 5 years old, but many of those things do exist in my editor (VS Code) with the language extensions and linting I use. Rainbow parentheses, long lines, variables not assigned to, etc etc.

The colour channel is being well and truly used close to its maximum.

jryb · 5 months ago

Many (most?) of these are achievable with neovim and tree-sitter (a plugin that gives you access to the AST), and surely many other editors. I have plugins installed right now that do several of the things that are mocked up here. Many more are done with virtual text and not color, but I don’t see why you couldn’t use highlighting instead.

I agree with the broader point of the article that color is underused, but the state of the art has moved way past what the author’s tools are currently configured to provide.

jasonjmcghee · 5 months ago

To be fair, the article is over 5 years old.

The author seemed to be unfamiliar with tree-sitter (first appeared in 2018) and incorrectly assumed Atom used TextMate.

Since then it's gotten much more popular and adopted by other editors.

Good catch, thanks!

It's some time since I've worked with (and contributed to) tree-sitter, but it could exclusively detect syntax, not semantics, back then. For example it could not tell variables from types in C, nor already-defined variables from new/mistyped ones.

Have I missed an important development in tree-sitter? Can it now do such things?

themafia · 5 months ago

You have two color channels. Foreground and background. If you use emacs then many of the suggested alternative modes are available if you point at an item and the background color will be used to show them.

For example, in a loop body, putting the cursor on a "continue" statement will highlight all other loop control statements along with the "for" statement that the pointed-at "continue" is associated with. This helps massively.

Shame the author missed this.