I wish there were even more of this sort of thing, even though I'm an anti-braces zealot - it's crazy to me that, here in 2024, code is still so tightly linked to specific textual representations. Maybe there's an alternate reality where IDEs and development tooling got better and better post-smalltalk/lisp, instead of ... whatever we have now. Maybe we'd have editors and viewers where you could configure the syntax however you wanted. And sure, it would be persisted to disk in some canonical representation, maybe text, maybe a DB ... but you'd never need to worry about it, because all your tooling (VCS, diffs, refactoring tools, etc) would work with the AST, and you'd never need to worry about tabs or spaces ever again.
I think having a single syntax is the best way. I don't want to have to look at python and have to parse spaces, curly braces, square braces, etc. It all being standard is really helpful. With a standard autoformatter like black or gofmt, even though I might not choose all the options, the uniformity is super valuable.
I agree. To me ESLint in the JavaScript domain is an unsung hero.
Code is so incredibly hard to grasp mentally that any extra mental load should be removed so you can focus on the logic.
There is a reason why there is one codebase, and it should always be curated by a linter to uniformly enforce a standard.
There is still plenty of room for style and code organization.
I witnessed first hand many trench wars around seemingly small things, like whether curly brackets in IF statements get one white space or none, because it appealed to personal preferences. Before ESLint, people would go to great lengths reformatting hundreds of LoCs just to get their right feeling of code syntax.
Weird. And the git diff was massive, as were the code reviews.
Your justification for a single syntax is "I don't want to have to look at python and have to parse spaces, curly braces, square braces, etc." But the comment you just replied to said:
>Maybe we'd have editors and viewers where you could configure the syntax however you wanted
I took this to mean that, in this fantasy universe, you could make any source file look however you want. Like tabs vs spaces and pure html vs html-with-css, this is about separating meaning from presentation. Is there a good reason to force the same visual representation on everyone?
This may be controversial, but I find 2-character indented Javascript, coming from my 4-character indented Java, for the projects I work on, as much of a shock as moving between braces and Python. Particularly if it has been subjected to an opinionated formatter that just doesn't match my personal style. Completely unreadable, I mumble under my breath. And then my eyes adjust to it for a bit, and after an... hour? (maybe) of coding in the (ahum) "hostile" style, it's familiar (again) and we're on our way.
+1 for ruff for python as a formatter & linter that allows single quotes without shaming you.
I’m all for keeping consistent flat utf-8 files. I’d hate for my ultra-simpleminded Python and SQL code, which you could pen test with pen and paper, to be wrapped in a god-awful XML or JSON or proprietary DB markup and a hierarchical object model of what code should look like.
Like I imagine trying to check in a jupyter notebook but worse.
For instance tabs vs spaces was decided and text editors accommodated this and despite what may be someone’s personal preference a uniform decision was made.
Humans can learn. We should use accessible formats and push for standards and keep those readable.
I like to draw the comparison that in an alternate universe where the ASCII .txt file format had simply included a color byte with every letter byte, we'd now be having coding format guidelines that also go into great detail prescribing manual syntax highlighting. The fact we don't and avoided endless bike shedding over syntax highlighting, instead leaving it to automatic tools and the presentation layer not the code itself, is purely an historical accident. Well, what does that say about other code formatting details which programmers currently do waste time manually doing and worrying about?
I had never considered the bikeshedding that would occur if there was a color byte… We struggle with spaces versus tabs, how many spaces, braces (or not), semi-colon (or not).
It would definitely be the norm that there were languages that were dark or light mode, and both sides would be convinced they were right.
I had never considered that, but you're absolutely right. I would totally be the sort of person to manually idiosyncratically syntax highlight my code if it were feasible. I'm sure glad I don't feel compelled to do that...
Grouping code by a single blank line can make a big difference. This is a bit of a silly example, but there's tons of not-so-silly examples. You can't really represent that in an AST.
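For what it's worth, Python's own `ast` module is one place to see this loss in practice. The snippet below is only a sketch with a made-up three-statement program: it parses the source and prints the tree back out, and the blank line and the comment that grouped the last statement are gone.

```python
import ast

# Hypothetical source: a blank line and a comment set off the last statement.
source = (
    "width = 3\n"
    "height = 4\n"
    "\n"
    "# area of the rectangle\n"
    "area = width * height\n"
)

# Parse to an AST and print it back (ast.unparse needs Python 3.9+).
round_tripped = ast.unparse(ast.parse(source))
print(round_tripped)
```

The printed program is semantically the same, but the visual grouping did not survive the trip through the tree.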
Line length is another. infinite line length doesn't work as screens aren't infinitely long. Automatic wrapping doesn't work because you want to break at specific points. An example is something like:
> Grouping code … You can't really represent that in an AST.
<visual-group>…</visual-group>
Also, I wish that at least in current editors there would be a way to render \n\n as a half-height line. Full-height empty lines are too bold.
> Automatic wrapping doesn't work because you want to break at specific points
You really want to have a set of buttons that switch between:
- a line of arguments
- a block of arguments
- a line/block of only non-default valued arguments
- arguments sorted by name
- …
And a hint on a default representation, with some default heuristics.
How does it know that x, y, width, and height are semantically grouped and best put on the same line? It doesn't. (from the other comment)
Look higher, parameters can be grouped at the declaration level. <params><related-params name=“coords”>…</>…</>. Now you can render a nice frame around these in block mode. Or not, depending on local renderer settings.
Everyone will hate it.
There’s always a way to make everyone hate something, especially if the solution is clueless about its problem. It doesn’t mean it should be done this way. Experiment and evolution could make it work, we just have to let people try instead of dismissing it so confidently.
1. Breaking at specific points is something that can be specified by the pretty-printer of the _viewer_ you are using. Think of existing auto-formatters and imagine that they're working over the view instead of the persisted form.
2. The AST can have pointers into advisory data (or the other way around, if desired, the "program data" can include the AST, but also other things as well) to note that there is an anonymous region here (C# and friends already have conventions for this _for the source code_ that Visual Studio understands - look at `#region` directives). This would let viewers choose their preferred representation for a region.
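A rough illustration of point 1, using Python's standard `ast` module as a stand-in for the "persisted form" (the two formatting styles are invented for the example): both inputs parse to the same tree, and the viewer's pretty-printer decides how it looks.

```python
import ast

# Two hypothetical developers format the same function differently.
style_a = "def area(w,h):\n  return w*h\n"
style_b = "def area( w, h ):\n\n        return w * h\n"

tree_a = ast.parse(style_a)
tree_b = ast.parse(style_b)

# ast.dump leaves out line/column attributes by default, so layout is ignored.
same_program = ast.dump(tree_a) == ast.dump(tree_b)
print(same_program)

# A "viewer" is then just a pretty-printer over the tree (Python 3.9+).
print(ast.unparse(tree_a))
```

This only covers whitespace; the advisory data from point 2 (regions, grouping hints) would have to live alongside the tree, which `ast` does not do.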
I've long dreamt of a system that compiles to native code, but stores a compressed SSA form (similar to SafeTSA or LLVM bitcode) in the binary for efficient runtime re-optimization based on profiling, somewhat similar to current Android Runtimes. One could then have several levels of debugging symbols, one that gives names to local variables represented by CFG nodes, and another that adds a compressed diff between some standardized decompiler output and the original source.
You could then decompile to some alternative syntax, but you'd lose any idiosyncratic formatting represented by the compressed diff.
Last time I looked, [Unison code] -> [entry in the AST DB] was a one-way process. Adding a function means writing it (with whatever style you like) and seeing if what you wrote constitutes a new function or an existing one. You can't fluff db entries back up into human friendly code.
I don't see why it couldn't be done though, I think it just hasn't been a priority. Heck, you could have 100 different users collaborating in 100 different "languages", and so long as they serialized to the same AST and back, none of them would ever have to see the atrocious syntax which the other users prefer. Their editors and browsers could just render everything according to their users' preferences.
I can understand braces (to an extent...). What really confuses me are new languages that still require semicolons at the end of expressions/statements.
Speaking from a Rust-perspective, having semicolons at the end of statements makes perfect sense and is a brilliant design decision.
Note that I said 'statements', not 'expressions'.
A lot of the confusion here (and maybe yours, too) stems from this difference. In Rust, (almost) everything is an expression by default, and you turn it into a statement by adding a semicolon. This allows you (and the type checker) to very neatly distinguish between expressions and statements, which is great. It's a very nice and elegant approach imo.
What confuses me is languages that can split statements over multiple lines as long as certain conditions are met (such as the break occurring inside braces). I'd rather have a semicolon at the end of the statement to make it more explicit.
I like semicolons, because they let me use the bracket style I prefer (Allman) instead of whatever the language devs think I should use. This is a really big issue for me trying to use Go: it automatically inserts a semicolon at the end of any line whose last token could end a statement (an identifier, a literal, a closing parenthesis, and so on), so an opening { has to stay on the same line as its statement. The language forces you to use K&R, which I really dislike reading.
In general, what helps parsers for disambiguation also tends to help human readers for disambiguation. In addition, some amount of redundancy helps to prevent errors when (for example) two statements were intended but they are parsed as one, or vice versa. Furthermore, consistency helps avoid errors, such as always ending a statement with a semicolon and not only when it would otherwise be ambiguous.
Non-human-readable primary representation may be not the best idea though. Would create a lot of friction when trying to process the data. Text is largely simple and obvious (yes I know there are complications but in 99% of cases you can ignore them without too much trouble) but DBs and ASTs are not. People read texts, but they can't read ASTs, at least not without assistance. So it'd be hard to use.
> Non-human-readable primary representation may be not the best idea though.
What if the AST is persisted as S-expressions, but then you have a different syntax to edit it? Algol-ish, Pascal-ish, C-ish, Python-ish: choose your poison (or even support multiple poisons and let the developer pick the one they prefer?)
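A toy sketch of what that could look like, with nested Python tuples standing in for persisted S-expressions. The node shapes (`"if"`, `"call"`, `"op"`) and both renderers are made up for illustration, not taken from any real system.

```python
# Canonical form: nested tuples standing in for S-expressions.
# Hypothetical node shapes: ("if", cond, then, else),
# ("call", name, *args), ("op", symbol, left, right).
program = ("if", ("op", ">", "x", "0"),
           ("call", "print", "x"),
           ("call", "print", ("op", "-", "0", "x")))


def to_sexpr(node):
    """Render the tree as a Lisp-style S-expression."""
    if isinstance(node, str):
        return node
    head, *rest = node
    if head in ("op", "call"):
        head, *rest = rest  # use the operator or function name as the head
    return "(" + " ".join([head, *map(to_sexpr, rest)]) + ")"


def to_c_like(node):
    """Render the same tree with infix operators and braces."""
    if isinstance(node, str):
        return node
    head, *rest = node
    if head == "op":
        symbol, left, right = rest
        return f"({to_c_like(left)} {symbol} {to_c_like(right)})"
    if head == "call":
        name, *args = rest
        return f"{name}({', '.join(map(to_c_like, args))})"
    if head == "if":
        cond, then, otherwise = rest
        return (f"if {to_c_like(cond)} {{ {to_c_like(then)}; }} "
                f"else {{ {to_c_like(otherwise)}; }}")
    raise ValueError(f"unknown node type: {head}")


print(to_sexpr(program))   # (if (> x 0) (print x) (print (- 0 x)))
print(to_c_like(program))  # if (x > 0) { print(x); } else { print((0 - x)); }
```

Rendering is the easy half; going from edited C-ish text back to the tree needs a parser per syntax, which is where most of the work would be.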
This was actually the original plan with Lisp. Lisp was originally supposed to have two syntaxes, S-expressions and M-expressions, with M-expressions being Algol-like. However, the implementation of M-expressions was delayed, and people got so used to using S-expressions directly, they decided M-expressions were unnecessary and they were never implemented in mainstream Lisp. They were implemented in the Lisp 2 project, but that ended up being an evolutionary dead-end; various attempts at the idea have happened since but none of them really took off.
Lisp purists will argue M-expressions are unnecessary and S-expressions are all you need. However, S-expressions can make the language more forbidding to complete beginners, and even among experienced programmers, a decent percentage find them seriously off-putting. Maybe if the M-expression idea had been pursued more seriously, Lisp might be more popular today.
It’s all still bytes under the hood, albeit standard bytes. Personally I’ve not found the ability to open my css files in Word beneficial!
Many editors are already altering what is stored on disc before presenting it to you - type annotations, code folding, git info. I think we could do a lot if our default storage was the semantic representation of the code.
So much opportunity for anti-competitive behaviour. The "canonical representation" of C# could just be a memory dump of Visual Studio. Your code is held hostage forever. Everything you create belongs to Microsoft. You have to drink a verification can to keep using your IDE. Basically Unity/Adobe/etc, but for code. No thanks.
It was only ever true for "clr safe" code, or a subset of C#. In particular, since VB.NET didn't/doesn't have unsigned types, not all C# could be expressed in VB.NET, even after decompiling from IL. (Not sure what happens if you try and decompile to VB.NET code that uses unsigned, for example).
Seems like the biggest downside would be that sharing code becomes much harder. Currently it is easy to share code in as small of a portion as you'd like. If the canonical representation is an AST, it opens up a lot of problems around sharing pieces of a program. This seems like a very substantial downside.
Even simply "sharing" within your own systems, like copying blocks into notes or another program, would be a lot harder. Maybe I'm not knowledgeable enough here and this wouldn't be as thorny as it seems.
You can (sort of) do all of that with the Eclipse Modeling Framework[0].
Your AST is what EMF calls a "model".
By default the "backend" and ecosystem surrounding EMF is skewed towards Java for historical reasons, but there have been some prototypes with other languages as well.
You can serialize your AST in any way you like, although by default it relies on XMI files. You can implement your own textual concrete syntax, or rely on a database.
The EMF ecosystem has tools for implementing textual or "graphical" concrete syntaxes. You can combine them (e.g. usually a specific subset of your AST gets edited in a certain way that's best for your targeted end users).
The ecosystem also has tools for performing comparisons and plugging them into your editing means.
Of course all of this tooling requires a lot more work than an LSP server.
I believe this was an idea in ALGOL, not sure which iteration.
I think the reason it was never implemented was that more translation = more complicated debugging. It also means that programmers have a more distorted and incomplete model of the program they are writing, i.e. more bugs.
NB. Lisp, as originally envisioned by McCarthy, had one more translation layer (the translated version had square brackets instead of parentheses), but it didn't take off for basically the same reason.
So... while I understand the benefits you see from doing what you suggest, I think that at the same time the downside makes this not worth pursuing.
In the .NET universe, you can mechanically convert between C# and Visual Basic. This is more or less done by going through CIL (Common Intermediate Language, essentially .NET assembly). So, it is more or less what you are saying.
.NET decompilers are common. I have built a few toy languages and compilers on .NET. For one of them, I could decompile CIL into my language. So, I could view .NET libraries from other sources in my language.
I think this is essentially the same idea you are proposing.
It only works if the languages are similar though. Going between F# and C# does not always work as well for example.
> you could configure the syntax however you wanted. And sure, it would be persisted to disk in some canonical representation, maybe text, maybe a DB ... but you'd never need to worry about it, because all your tooling (VCS, diffs, refactoring tools, etc) would work with the AST, and you'd never need to worry about tabs or spaces ever again.
You are describing an entire industry of IDE Smell with an IDE monoculture.
Programs are tightly coupled to textual representations because a compiler is a textual representation transformer. If you deviate from the accepted textual form then the compiler is generally clueless to do anything - that is, it can’t read your intent.
People should recognize that tabs vs. spaces is more a question of editing (e.g.: What happens when you press Backspace behind an indentation? Does the caret move at uniform speed on key repeat or does it suddenly jump in places?) than of stored representation. You could even have an editor/viewer that lets you choose the display-width of spaces at the start of lines.
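The last idea is small enough to sketch. Assuming the stored file uses a fixed number of spaces per indentation level (`stored_width`, a hypothetical project setting), a viewer could rescale only the leading spaces for display:

```python
def render_indent(source: str, stored_width: int, display_width: int) -> str:
    """Re-render leading spaces at the reader's preferred indentation width.

    Assumes indentation is stored as `stored_width` spaces per level; any
    leftover spaces (e.g. alignment) are kept as they are.
    """
    rendered = []
    for line in source.splitlines():
        body = line.lstrip(" ")
        levels, extra = divmod(len(line) - len(body), stored_width)
        rendered.append(" " * (levels * display_width + extra) + body)
    return "\n".join(rendered)


stored = "def f():\n    if x:\n        return 1"
print(render_indent(stored, stored_width=4, display_width=2))
```

The file on disk never changes; the editing questions (what Backspace does, how the caret moves) are still up to the editor.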
This is what Unison is supposed to do. As far as I know there is currently only one (Haskell-like) textual representation, but the program is stored in a binary representation of the AST, and once you've entered your code, it's all operations on that AST from then on, and adding different frontends should not be difficult.
Code is linked to textual representations because code... is text? The entire point is that there are algorithms to go from "print()" to something that actually makes the computer work. How else could it even possibly work?
I think we don't have any sort of flexible AST sort of thing because they're mostly not necessary. The hard problems of programming don't usually have much to do with syntax.
And if you're going to downvote, kindly explain why, thanks. I just want to know exactly how this thing is supposed to work...
I may recall incorrectly but AppleScript may be an example: some file formats are serialized ASTs. The editor displays it as textual code. A downside of this is that you can’t save a syntactically invalid file.
Sure it's an intuitive way of representing your data. Is it the most appropriate though?
See an example [0] about using Projectional Editing in order to use mathematical notations for formulas.
There are visual programming languages that chain together blocks, instead of raw text.
> The hard problems of programming don't usually have much to do with syntax.
I guess it depends on how you define hard. You are clearly talking about "a singular issue that needs to be solved", which really only affects a single developer / team and, to a lesser extent, those that use that solution. But if you consider something like syntax, you're now talking about something that has a much smaller impact _per developer_, but has that impact on _every_ developer. The syntax issue may have a much larger impact overall.
Thanks to LLMs, we are quite close to achieving this. I can write code in Python (sometimes even plain English), and GPT can convert it to Go or even Haskell if I like. The conversion is accurate 95% of the time on the first attempt in my use cases, and I expect this to improve further with more powerful models in the near future.
You can take it a step further, LLMs can already "execute" arbitrary non-existent languages, with non-existent data. Here are a couple of examples using a tool I wrote[1]:
% echo "nums 1 10 | filter even | to_words | map uppercase" | refab imagine
TWO
FOUR
SIX
EIGHT
TEN
% echo "with file '/tmp/top-ten-most-populous-cities.txt' do; cities = read; cities.each { |city| (city.name, city.utc_offset) }" | refab imagine
Tokyo, 9
Delhi, 5.5
Shanghai, 8
São Paulo, -3
Mumbai, 5.5
Mexico City, -6
Beijing, 8
Osaka, 9
Cairo, 2
New York, -5
For what it's worth, the tool isn't specialized for this, 'imagine' is just one of many prompts it can execute.
Of course the execution is non deterministic and at the moment only works for simple things, but you can imagine as LLMs get more capable and more integrated with tools this will matter less and less.
I largely agree, but I don't think the current experience is the right one.
I recently started writing a game in Godot. I don't know GDScript, and I've found I don't like it very much in trying to learn. I turned to aider.chat to see if I could describe the functions, data structures, and systems I wanted and have it write them. I also tried writing in a more familiar language (...one with braces...) and having it translate those files.
It does pretty well, but it doesn't feel like software engineering. It's too hands-off and doesn't activate the same neurons. All the problem-solving and puzzle-solving is gone, and the successes are quite boring, and the failure modes are more irritating even if they're necessarily quicker to solve.
It's a weird experience. I'm moving so, so much faster than I would have on my own, but I don't enjoy it. It feels like cheating - I'm not actually ashamed of what I'm doing but I also won't take credit for writing the code.
However, what I'm getting at is this: If I could write the code in a syntax or even language that I prefer and have copilot or whatever translate it in near-real-time (without active prompting), that would be the best of both worlds. I'd still be a little sad at myself if I didn't learn the new language, but I also think this method would facilitate learning better than what I'm doing with aider (because I could see what my code turns into as I'm writing it, and learn that "translation").
I can confirm that it is a suitable use case for GPTs. I do GPT-assisted programming language design and experimentation. In some cases, GPT-4 can even generate a basic interpreter that allows me to test my new language.
Here is an example of GPT's output for Python with braces that was generated after just spending 10 seconds for the prompt:
def preprocess_braces(code: str) -> str:
    lines = code.split('\n')
    processed_lines = []
    indent_level = 0
    indent_str = '    '  # 4 spaces for indentation
    for line in lines:
        stripped_line = line.strip()
        # Check for opening brace
        if stripped_line.endswith('{'):
            processed_lines.append(indent_str * indent_level + stripped_line[:-1].strip() + ':')
            indent_level += 1
        # Check for closing brace
        elif stripped_line == '}':
            indent_level -= 1
        else:
            processed_lines.append(indent_str * indent_level + stripped_line)
    return '\n'.join(processed_lines)
# Example usage:
code_with_braces = """
def example_function() {
    if True {
        print("Hello, world!")
    }
    for i in range(5) {
        print(i)
    }
}
"""
processed_code = preprocess_braces(code_with_braces)
exec(processed_code)  # This will execute the transformed Python code
print("Processed Code:\n", processed_code)
This is a great discussion with many a differing point of view.
To some, significant indentation is better.
Others — too used to braces — miss them dearly in Python.
Still others vie for non-text source code, something to get us past these discussions altogether (editors working on .pyc files directly?).
For programs to be maintained, they need to be read, understood and improved. One often undervalued skill in programming is writing beautiful code, because that is more art than craft. And unfortunately, tools like Black prohibit the true artists from expressing themselves clearly with code formatting too. And to those, white-space or braces matter on a different level, and everything else is attempting to make up excuses for why one is better than the other.
And while conceptual operations we do on the code seem simple on the surface, devising an editing tool that would do semantic operations on the AST is fricking hard and likely to be very non-ergonomic. Look at all the attempts to make code refactoring tooling: it's so complex and confusing that it's simpler to just go and grep for a string and fix anything you find.
As long as it's faster to use regular editing operations to shuffle code around, indent or unindent it (or wrap it with braces), tweak one thing here or there, simple text editors will mostly rule the world of programming.
Python encodes structure in only one way, using indentation. There is no redundant signal that can be checked for a discrepancy that could indicate a problem. GCC and Clang can diagnose when indentation is "misleading" because it doesn't match what the braces are saying.
Python's choice of representation is such that two different Python programs show zero differences under a white-space-suppressed diff.
The white-space suppressed diff is a useful tool for comparing programs when some sections of code have changed indentation but are otherwise the same; yet we cannot rely on it if we are using Python.
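To make the claim concrete, here is a minimal sketch with an invented two-version loop. Stripping leading whitespace, which is roughly what a whitespace-suppressed diff compares, makes the two versions look identical, while Python's `ast` module sees two different programs.

```python
import ast

# Hypothetical before/after: the last line was dedented out of the loop.
before = (
    "for item in items:\n"
    "    process(item)\n"
    "    log(item)\n"
)
after = (
    "for item in items:\n"
    "    process(item)\n"
    "log(item)\n"
)


def strip_indentation(source: str) -> list:
    """Mimic a diff that ignores leading whitespace."""
    return [line.lstrip() for line in source.splitlines()]


looks_identical = strip_indentation(before) == strip_indentation(after)
same_structure = ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))
print(looks_identical, same_structure)  # True False
```

Whether that is a flaw in the language or a sign the tool is the wrong one for the job is the disagreement in the replies.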
Python's syntax design is objectively poor on several purely technical points. In its favor, there are only handwaving pop psych arguments.
> There is no redundant signal that can be checked for a discrepancy that could indicate a problem.
What’s the problem with having no redundancy? By the same logic, we should require numerals to be spelled out in words so the compiler can check if the programmer didn’t accidentally write a different number.
>The white-space suppressed diff is a useful tool for comparing programs when some sections of code have changed indentation but are otherwise the same; yet we cannot rely on it if we are using Python.
Yes, because in Python two programs with different indentation are not the same. What's the problem with not being able to use tools intended for code without significant whitespace on Python code?
> Python encodes structure in only one way, using indentation. There is no redundant signal that can be checked for a discrepancy that could indicate a problem. GCC and Clang can diagnose when indentation is "misleading" because it doesn't match what the braces are saying.
I've seen both misplaced braces (akin to mis-indenting blocks in Python), indentation not matching braces and other problems of the same sort in non-Python code. Readers of the code would misunderstand the code when badly indented, and might introduce bad braces as well (not everybody uses automatic formatters and linters either, esp as they will sometimes "quickly" edit code in their non-usual dev environment). Not to mention that some "braced" languages allow having single-line blocks without braces.
It's also not true that this is the only way Python encodes structure: new blocks generally only start with a ":", and you've got control flow keywords that allow for new blocks to start. One could argue that's a great feature disallowing you from introducing confusing spacing without actually having a new block started.
While I am fond of applying many of double-entry-accounting principles in programming to increase trust in what we write, I believe that's much better done with unit-tests, which can more clearly demonstrate the expectations for any code and read more like documentation.
Do you think all syntax in programming languages should have some sort of extra validation built-in? Eg. you should type in a constant twice (declare it as `const int a = 5` and then you have to set the value later as `a = 5` or you get an error?)?
As I said above, people will always find an "objective" excuse why their preference is better. But I've seen bad-block-boundaries in Python as much as I've seen it in other languages which use explicit block boundaries like braces (and I've done more Python over the last ~20 years). I've heard this argument a gazillion times, but hundreds of bugs due to that have simply failed to materialize while working on large projects with tens and hundreds of people.
Getting indentation right is _really_ not that hard, just like getting braces right is not that hard. I've yet to find someone who prefers their code to not be indented at all, and only rely on braces — at least not in a team setting where you can't simply reformat all of it.
I strongly agree with your main point that Python's syntax design is objectively poor, but contend that even the grace you seek to extend to it is wrong.
>Python encodes structure in only one way, using indentation
Except when you write a multiple line string using """ notation.
Or when you put things inside parentheses, which I have seen as the preferred solution for method chaining.
Point is, python claims indentation is all that is needed, and then very quickly breaks its own rule.
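Both cases are easy to check. In this sketch (the function and its contents are invented), the lines inside the triple-quoted string and inside the parentheses are indented arbitrarily, and Python still accepts and runs the code:

```python
# Hypothetical function whose body ignores block indentation in two places.
source = '''
def describe(values):
    text = """
not indented at all
        or indented a lot
"""
    result = (
values
            .copy()
    )
    return text, result
'''

namespace = {}
exec(source, namespace)  # no IndentationError is raised
text, result = namespace["describe"]([1, 2])
print(result)
```

Inside brackets the tokenizer joins lines implicitly, and inside a string literal the whitespace is simply data, so neither follows the block indentation rule.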
Most diff tools default to not suppressing leading whitespace. Trailing whitespace should be automatically removed and enforced by your code formatting tools, leaving no reason to suppress it.
As someone who has been diffing endless amounts of python through various diff tools, this is totally a non issue.
go fmt is really nice, in that there is A standard for the language which is just included. No more arguments, just do what the included tooling desires.
Python... I'd love if it shipped with a formatter that converted indents to braces, and then had an option for expressing indent as spaces (with number of spaces per indent) OR tabs (same, default 1); then still kept the braces.
I don't understand why we don't make the aesthetic aspects of syntax (e.g. block delimitation) to be a feature of the editor rather than the source of truth for the code. For all unix profited from text I think we have the tooling necessary to move our storage and editors beyond it, and it's been obvious the entire time it comes with non-trivial liability converting to and from more reliably structured representations.
I think it boils down to “text is good _enough_” (80% of the value, 10% of the complexity if even that), and it’s a format that’s incredibly interoperable. You can use tools that aren’t just language agnostic, but are not even programming-specific: notepads, grep, git, sed…
I agree completely; I just think on a gut-level a solution would be validated with use despite the small marginal gain over text in complexity of tooling. Among RAGs, TreeSitter, and the success of LSPs, I think there's room here to synthesize some improvements.
While we're on the topic, if we store only syntactically valid programs, we can express diffs in terms of semantic refactoring rather than textual changes. This would enable stuff like preserving refactoring across merges, thereby bypassing conflicts that would arise under text merges. There are limits to this of course as you can still come up with conflicts, but anything to ameliorate the nightmare of manually fixing a textual merge.
"Code-as-Text" is way too universal and deeply ingrained to change in the short to medium term. I've worked a bit with low code / no code platforms and everything becomes a mess: little to no version control, search is bad, no import or possibility to generate components programmatically,...
However, I'm totally with you on having the editor show the code as you'd like. As much as I don't like tabs, at least the user could choose their preferred width for indentation. (A less disruptive Python-with-braces could be the editor showing braces but converting to spaces behind the scene.)
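The parenthetical idea can be roughed out with the standard `tokenize` module, which already reports where blocks open and close. This is only a sketch: `braces_view` is a made-up name, and it assumes each block header sits on the line directly above its body and ends with a bare `:` (no trailing comment, no blank line in between).

```python
import io
import tokenize


def braces_view(source: str, indent: str = "    ") -> str:
    """Render indentation-based Python with braces, for display only."""
    # Count how many blocks open (+) or close (-) at the start of each line.
    depth_change = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            depth_change[tok.start[0]] = depth_change.get(tok.start[0], 0) + 1
        elif tok.type == tokenize.DEDENT:
            depth_change[tok.start[0]] = depth_change.get(tok.start[0], 0) - 1

    out, depth = [], 0
    for number, line in enumerate(source.splitlines(), start=1):
        change = depth_change.get(number, 0)
        if change > 0:
            # The previous line is the block header: swap its ':' for ' {'.
            out[-1] = out[-1].rstrip()[:-1] + " {"
            depth += change
        while change < 0:
            depth -= 1
            out.append(indent * depth + "}")
            change += 1
        out.append(line)
    while depth > 0:  # close blocks still open at end of file
        depth -= 1
        out.append(indent * depth + "}")
    return "\n".join(out)


stored = "def f(x):\n    if x:\n        return 1\n    return 0\n"
print(braces_view(stored))
```

The file on disk stays plain Python; only the displayed text gains braces. Turning edits made in the braces view back into indentation would be the job of something like the `preprocess_braces` function posted elsewhere in this thread.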
> However, I'm totally with you on having the editor show the code as you'd like.
Such editors would remove some categories of bikeshedding, but would add brand new categories of bikeshedding.
* Why did/didn't you add an empty line (EDIT: or whatever no-op visual equivalent) after that `if` block? My editor needs blocks to be separated a certain way to be able to display them nicely grouped in logical blocks! This representation of AST is so limited that we can't store such differences in style, we need editors that work at some super-AST level!
* What do you mean my code is an unreadable mess with hundreds of operations? All I see in my editor is a nice single operation "copy fields from class X to struct Y". If your editor can't detect such an obvious thing from the AST and display it nicely, then find a better one.
* Some codebases not even bothering with functions because the founding engineers use a specific IDE that just lets them group code arbitrarily (no no that's not reinventing functions, that's what progress looks like!), and you can't use your favorite editor because that's not on the scope of that editor, so the editor's author politely tells you to use that other editor you dislike if you really need such a thing.
* I can probably come up with more scenarios if I spend more time thinking about it. And there's probably also more petty scenarios that I can't even imagine.
I'm not saying such editors wouldn't be nice. Everyone has their own preferences, and some people might work better this way. Just, don't expect them to reduce the amount of petty bikeshedding in projects, much less eliminate it.
Lispy languages have structural editing tools that make it a lot like working directly on an AST. It's a delight when you get used to it.
The spacing/linebreaks are all just auto formatted and mostly an afterthought. It would only be one step further to present the code with the user's choice of block start/end sequences.
Working with paredit was certainly one of the main ingredients for structural editing (as opposed to textual editing) to click with me. I've been watching TreeSitter with great interest.
The freedom that text provides is both a blessing and a curse. Even small things like adding "paragraph break" lines in a run of code can make it more readable. Of course you can encode a limited number of these "style" features into your source AST, but much like with code formatters there is always a line to draw between helpful style that is preserved and irrelevant detail that is rendered to each programmer's preference.
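To make the "paragraph break" point concrete, here is a minimal sketch using Python's standard `ast` module (assuming Python 3.9+ for `ast.unparse`). The sample program and the `normalize` helper are made up for illustration. It round-trips source through the AST and shows that blank lines and comments do not survive, because the stock AST has nowhere to store them.

```python
import ast

# Hypothetical sample: two "paragraphs" separated by a blank line and a comment.
SOURCE = '''\
total = 0

# second "paragraph": the loop
for n in (1, 2, 3):
    total += n
'''

def normalize(source: str) -> str:
    """Round-trip source text through the AST and back to text."""
    return ast.unparse(ast.parse(source))

round_tripped = normalize(SOURCE)
print(round_tripped)

# The program is unchanged as far as the AST is concerned...
same_tree = ast.dump(ast.parse(round_tripped)) == ast.dump(ast.parse(SOURCE))
# ...but the blank line and the comment are gone.
lost_comment = "#" not in round_tripped
lost_blank_line = "\n\n" not in round_tripped
```

An AST-based storage format would need an extra channel for this kind of advisory information if it wanted to keep it.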
Would the JVM ecosystem almost be a working example of this? Since there are a variety of languages with editor integration that all compile down to the same byte code, it feels pretty close to what you’re describing.
I believe that JVM bytecode is too low level for a source format. At the very least you would need to preserve some form of comments. Also I think JVM locals lose their names during compilation.
I think text-based programming languages are a local minimum that's hard to climb out of. Partly because most languages aren't designed around being represented by structured data. And no one has any modern experience with that.
But do imagine a world where changing the name of a field in a struct results in a single diff message, 'struct foo field bar changed to baz'. And where a change set to a library can be mechanically applied to code that depends on it and it just works.
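A rough sketch of what such a diff message could look like, built on Python's `ast` module (Python 3.9+). The `fields` and `describe_changes` helpers are hypothetical, and the rename detection is a deliberately naive assumption: one removed plus one added field with the same annotation counts as a rename.

```python
import ast

def fields(source: str) -> dict[str, dict[str, str]]:
    """Map each top-level class name to {field name: annotation text}."""
    result = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            result[node.name] = {
                stmt.target.id: ast.unparse(stmt.annotation)
                for stmt in node.body
                if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name)
            }
    return result

def describe_changes(old: str, new: str) -> list[str]:
    """Report field renames between two versions of the same source."""
    messages = []
    old_fields, new_fields = fields(old), fields(new)
    for cls in sorted(old_fields.keys() & new_fields.keys()):
        removed = old_fields[cls].keys() - new_fields[cls].keys()
        added = new_fields[cls].keys() - old_fields[cls].keys()
        # Naive heuristic (an assumption): exactly one field removed and one
        # added, with identical annotations, is treated as a rename.
        if len(removed) == 1 and len(added) == 1:
            (before,), (after,) = removed, added
            if old_fields[cls][before] == new_fields[cls][after]:
                messages.append(f"struct {cls} field {before} changed to {after}")
    return messages

OLD = "class foo:\n    bar: int\n    size: int\n"
# Same class, renamed field, and differently formatted on purpose.
NEW = "class foo:\n\n        baz:   int\n        size: int\n"

print(describe_changes(OLD, NEW))  # ['struct foo field bar changed to baz']
```

Because the comparison runs on the parsed tree, the reformatting in `NEW` produces no noise in the result.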
Braces, maybe. But I've always felt that semicolons at the end of lines were just noise. I can tell it's the end of the line because it's the end of the line. The few cases of ambiguity that happen can be solved by common sense (in the programmer or in the parser).
> But I've always felt that semicolons at the end of lines were just noise.
I felt this way until I engaged in code with very horizontal coding styles. Semicolons make it extremely easy to visually break up sequential statements from expressions consuming multiple lines.
The end of the line isn't always the end of the line, such as when there is a chain of methods and you are limited to 80 columns. Then you can just wrap on a `.` and tell the language where you actually want the line to end.
Haskell also inserts semicolons, but it has a different rule, which I think is a bit interesting.
If the current line starts on the same column that the previous expression started on, then a semicolon is inserted at the beginning of the current line. There are also some messy rules about opening and closing braces being inserted.
So, this:
do expr1
   expr2
     do expr3
        expr4
Ends up being parsed as though it were written like this:
do { expr1
   ; expr2
     do { expr3
        ; expr4
        }
   }
If the next line is indented, it's just taken as a continuation of the previous line's expression, so nothing needs to happen.
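The rule described above can be sketched in a few lines of Python. This is a toy model, not Haskell's actual layout algorithm: it assumes that only `do` opens a block, that tokens are separated by whitespace, and it ignores the real algorithm's other keywords and error-recovery rules. The `layout` function and `EXAMPLE` input are made up for illustration.

```python
def layout(src: str) -> str:
    """Insert explicit braces and semicolons for a toy, `do`-only layout rule."""
    out: list[str] = []      # output tokens
    blocks: list[int] = []   # layout column of each open implicit block
    open_next = False        # True right after a `do`: the next token opens a block
    for line in src.splitlines():
        pos = 0
        for i, tok in enumerate(line.split()):
            col = line.index(tok, pos)   # column where this token starts
            pos = col + len(tok)
            if open_next:
                # First token after `do` fixes the block's layout column.
                blocks.append(col)
                out.append("{")
            elif i == 0:
                # A line starting left of the block column closes the block.
                while blocks and col < blocks[-1]:
                    blocks.pop()
                    out.append("}")
                # A line starting exactly on the block column is a new statement.
                if blocks and col == blocks[-1]:
                    out.append(";")
                # A line starting further right is a continuation: nothing to do.
            out.append(tok)
            open_next = tok == "do"
    out.extend("}" for _ in blocks)      # close whatever is still open
    return " ".join(out)

EXAMPLE = """\
do expr1
   expr2
     do expr3
        expr4
"""

print(layout(EXAMPLE))  # do { expr1 ; expr2 do { expr3 ; expr4 } }
```

Note that the second `do` line is indented past the outer block's column, so it is treated as a continuation of `expr2` and gets no semicolon.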
The rules are a little different. We have two-space indentation, like what is favored in Lisp, but there is a space after the opening brace. For symmetry, we put spaces between the closing ones. It works best if we don't "cuddle" the opening brace but put it on a new line.
There is a rhyme and reason to it, and consistency.
I've written some C programs that way, though not recently and can't point to any online examples.
However, I also wrote, and maintain, the Yacc grammar file in TXR Lisp in this style (just the actions in the grammar portion). For whatever reason, it works well in grammar files.
Swahili looks unfamiliar to me, because I didn't grow up in East Africa. Don't conflate subjective familiarity with objective simplicity - any "unfamiliar" concept only reflects on you, not your subject.
Glad that auto formatting tools like black and ruff (at least in Python world) are increasingly becoming the norm. It’s really nice to not think about whitespace or humor these silly arguments.
“Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.”
Antoine de Saint-Exupéry
Unfortunately, we still live in an era where humans have to adapt to technology rather than the other way around. From my perspective, this is particularly true for programming.
For me, Python is a good (though not ideal) mix of simple syntax and power. The language is characterized by low redundancy, meaning it uses fewer unnecessary characters like semicolons or curly braces to mark the end of a line. If a human can recognize the end of a line without special characters, then the compiler should be able to as well.
As someone with ADHD, I find it particularly difficult not to get distracted by these and other superfluous details. These small distractions add up and can become very burdensome. Interestingly, I found it easier to program in Assembler and Modula than in languages like C++ (MSVC), PHP, or JavaScript – at least as long as the projects were small.
Even a brief look at Rust’s syntax causes me almost physical discomfort, no matter how great, powerful, and useful the language may be.
For this reason, I almost exclusively use the terminal for emails, calendar, and programming, even though complex GUIs can simplify some tasks.
Although Python is not perfect in terms of syntax, it offers a good balance. Perhaps one day, before the perfect programming language exists, we will be able to use AI and ML to explain to the computer what a program should do with simple language (better than ChatGPT right now), just like Captain Picard. In fantasy, a few letters, punctuation marks, and some grammar is all that is needed. This may lead to inaccuracies in human-to-human communication, but that does not mean the same problems must occur in communication with an intelligent compiler.
Making syntax as “human-readable” as possible should always be the highest priority. We could unlock so much potential this way.
> For me, Python is a good (though not ideal) mix of simple syntax and power. The language is characterized by low redundancy, meaning it uses fewer unnecessary characters like semicolons or curly braces to mark the end of a line. If a human can recognize the end of a line without special characters, then the compiler should be able to as well. […]
> Even a brief look at Rust’s syntax
That’s a shame because you’re missing that these sigils actually have meaning there, because the semantics of the language are completely different: Python is statements-based with very limited scoping (global and function), Rust is expression based with block scoping.
As a result, blocks (paired braces) are a way to pack multiple statements into an expression e.g.
let v = {
    let a = thing1();
    let b = thing2();
    thing3(a, b)
};
And `;` is not an alias for end-of-line, it’s a separator for statements.
Not aliasing end-of-line to end-of-statement matters because Rust is expression-oriented: it’s very common for expressions to span multiple lines. In that case Python requires either wrapping the entire thing in parentheses or escaping the EOL with `\`.
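For readers less familiar with the Python side of this, a small sketch of the two continuation styles mentioned above. The `words` data and the `parses` helper are made up for illustration.

```python
words = ["  Braces ", "semicolons", " Indentation"]

# Style 1: parentheses. Newlines inside brackets do not end the statement.
cleaned = (
    " ".join(words)
    .lower()
    .split()
)

# Style 2: backslash. Each end-of-line is escaped explicitly.
cleaned_again = " ".join(words) \
    .lower() \
    .split()

def parses(source: str) -> bool:
    """Return True if the source compiles as a Python module."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

# Without either device, the wrapped method chain is rejected.
bare_chain_ok = parses("x = 'a'\n    .upper()\n")
wrapped_chain_ok = parses("x = ('a'\n    .upper())\n")

print(cleaned)                          # ['braces', 'semicolons', 'indentation']
print(bare_chain_ok, wrapped_chain_ok)  # False True
```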
Could you find other ways to do this? Sure, but then you have to make other tradeoffs e.g. wrap everything in matching symbols à la lisp, or make statements into special cases à la Haskell.
> the semantics of the language are completely different: Python is statements-based with very limited scoping (global and function), Rust is expression based with block scoping.
This is a red herring. Haskell, CoffeeScript, Nim, Lean, etc. are expression-oriented and use indentation like Python, while C(++), Java, JavaScript, etc. are statement-oriented and use braces.
> Making syntax as “human-readable” as possible should always be the highest priority. We could unlock so much potential this way.
I think there is a balance to strike here.
I often like to work with code by cutting and pasting sections around and then hitting the format hotkey to align everything.
I enjoy the guarantee that as long as the syntax is correct, it doesn't matter how I type out the code, because I'll just hit the formatter hotkey immediately afterwards and it will apply the correct indentation and lay it all out nicely.
Obviously that's impossible in Python and it makes working with it really frustrating to me, it feels so delicate, I almost don't want to touch the code because I'm always accidentally changing the indentation, it's a real limit of this "human-readable" syntax in my opinion.
That’s definitely possible in Python. Maybe not if you’re using the simplest text editor. But regardless of whether it’s NVIM or Visual Studio Code, you can easily jump to the end of a line, press Return, and the code will be inserted with the correct indentation. RUFF or BLACK do the rest, and they do it quite intelligently.
> Obviously that's impossible in Python and it makes working with it really frustrating to me, it feels so delicate, I almost don't want to touch the code because I'm always accidentally changing the indentation
Skill issue. I just press my “paste with correct indentation” hotkey and move on.
> Unfortunately, we still live in an era where humans have to adapt to technology rather than the other way around. From my perspective, this is particularly true for programming.
I think it's interesting that you say this, and point it towards Python.
Personally, with Python's significant whitespace, I feel more constrained writing code in a style that I prefer, with the computer requiring me to adapt to it, compared to other languages. I see code with braces and semi-colons more freeing because I get more control over the line structure.
At the end of the day, it's all stylistic personal preference. Python isn't the evolutionary ideal form for programming languages, it's just what some people prefer.
The Saint-Exupéry quote applies to Lisp, may in some manner apply to Python 2.7, but certainly not to Python 3.12.
Many people here argue that braces do provide some structure that helps in understanding and navigating the program. Python files over 100 lines with a lot of if-statements become syntactically unreadable.
So taking them away does not help.
Generally, minimalism (except for the Lisp-style one) is not always good. Python is called executable pseudo-code. Do academics use it to specify algorithms?
No, most still use some form of Pascal/Algol style syntax, which conveys the meaning much better.
> The Saint-Exupéry quote applies to Lisp, may in some manner apply to Python 2.7, but certainly not to Python 3.12.
Is the syntax not about 80% the same? And isn't most of that "same simple syntax" commonly used daily?
> Many people here argue that braces do provide some structure that helps in understanding and navigating the program. Python files over 100 lines with a lot of if-statements become syntactically unreadable.
That's probably True. But poor programming discipline and syntax that eases reading problematic code also contribute to this issue.
> So taking them away does not help.
For me it actually promotes better coding style and hygiene.
> Generally, minimalism (except for the Lisp-style one) is not always good.
The quote was about perfectionism, not minimalism.
Code is incredibly hard to grasp mentally, so every bit of mental overhead should be removed to leave room for reflecting on the logic.
There is a reason why there is one code base, and it should always be curated by a linter that uniformly enforces a standard.
There is still plenty of room for style and code organization.
I witnessed first-hand many trench wars over seemingly small things, like whether curly brackets in `if` statements get one space or none. These appealed to personal preference, and before ESLint people would go to great lengths reformatting hundreds of lines of code just to get the syntax feeling right to them.
Weird. And the git diffs were massive, as were the code reviews.
(Un)Happy times. :D
>Maybe we'd have editors and viewers where you could configure the syntax however you wanted
I took this to mean that, in this fantasy universe, you could make any source file look however you want. Like tabs vs spaces and pure html vs html-with-css, this is about separating meaning from presentation. Is there a good reason to force the same visual representation on everyone?
But if the syntax was separate from the underlying representation, couldn't you just have your editor open it the way you want?
I’m all for keeping consistent flat UTF-8 files. I’d hate for my ultra-simpleminded Python and SQL code, which can be checked with pen and paper, to be wrapped in god-awful XML, JSON, or some proprietary DB markup and hierarchical object model of what code should look like.
Like I imagine trying to check in a jupyter notebook but worse.
For instance tabs vs spaces was decided and text editors accommodated this and despite what may be someone’s personal preference a uniform decision was made.
Humans can learn. We should use accessible formats and push for standards and keep those readable.
It would definitely be the norm that there were languages that were dark or light mode, and both sides would be convinced they were right.
Do you group things like:
Or:
Or:
Grouping code by a single blank line can make a big difference. This is a bit of a silly example, but there's tons of not-so-silly examples. You can't really represent that in an AST.
Line length is another. Infinite line length doesn't work as screens aren't infinitely long. Automatic wrapping doesn't work because you want to break at specific points. An example is something like:
Cramming as much as possible on as few lines as possible is just not going to work well.
There's tons of cases.
That's why no one does it. Because it just won't work. Everyone will hate it.
<visual-group>…</visual-group>
Also, I wish that at least in current editors there would be a way to render \n\n as a half-height line. Full-height empty lines are too bold.
> Automatic wrapping doesn't work because you want to break at specific points
You really want to have a set of buttons that switch between:
And a hint on a default representation, with some default heuristics.
> How does it know that x, y, width, and height are semantically grouped and best put on the same line? It doesn't. (from the other comment)
Look higher, parameters can be grouped at the declaration level. <params><related-params name=“coords”>…</>…</>. Now you can render a nice frame around these in block mode. Or not, depending on local renderer settings.
> Everyone will hate it.
There’s always a way to make everyone hate something, especially if the solution is clueless about its problem. It doesn’t mean it should be done this way. Experiment and evolution could make it work, we just have to let people try instead of dismissing it so confidently.
That said, if you use classes, descriptive names, and appropriate comments, it won't matter how you group because your code will be self explanatory.
Finally, with today's wide-screen monitors on desktop, line length is less of a worry. Problem only arises when reading code on mobile devices.
The last workplace I was at had a soft wrap around 80 characters, but we upped that to 100 when functions and methods became almost vertical.
1. Breaking at specific points is something that can be specified by the pretty-printer of the _viewer_ you are using. Think of existing auto-formatters and imagine that they're working over the view instead of the persisted form.
2. The AST can have pointers into advisory data (or the other way around, if desired, the "program data" can include the AST, but also other things as well) to note that there is an anonymous region here (C# and friends already have conventions for this _for the source code_ that Visual Studio understands - look at `#region` comments). This would let viewers choose their preferred representation for a region.
https://news.ycombinator.com/item?id=40882133
You could then decompile to some alternative syntax, but you'd lose any idiosyncratic formatting represented by the compressed diff.
I don't see why it couldn't be done though, I think it just hasn't been a priority. Heck, you could have 100 different users collaborating in 100 different "languages", and so long as they serialized to the same AST and back, none of them would ever have to see the atrocious syntax which the other users prefer. Their editors and browsers could just render everything according to their users' preferences.
Edit: it appears that Unison has an issue for this feature: https://github.com/unisonweb/unison/issues/499
Note that I said 'statements', not 'expressions'.
A lot of the confusion here (and maybe yours, too) stems from this difference. In Rust, (almost) everything is an expression by default, and you turn it into a statement by adding a semicolon. This allows you (and the type checker) to very neatly distinguish between expressions and statements, which is great. It's a very nice and elegant approach imo.
What if the AST is persisted as S-expressions, but then you have a different syntax to edit it? Algol-ish, Pascal-ish, C-ish, Python-ish: choose your poison (or even support multiple poisons and let the developer pick the one they prefer?)
This was actually the original plan with Lisp. Lisp was originally supposed to have two syntaxes, S-expressions and M-expressions, with M-expressions being Algol-like. However, the implementation of M-expressions was delayed, and people got so used to using S-expressions directly, they decided M-expressions were unnecessary and they were never implemented in mainstream Lisp. They were implemented in the Lisp 2 project, but that ended up being an evolutionary dead-end; various attempts at the idea have happened since but none of them really took off.
Lisp purists will argue M-expressions are unnecessary and S-expressions are all you need. However, S-expressions can make the language more forbidding to complete beginners, and even among experienced programmers, a decent percentage find them seriously off-putting. Maybe if the M-expression idea had been pursued more seriously, Lisp might be more popular today.
Many editors are already altering what is stored on disc before presenting it to you - type annotations, code folding, git info. I think we could do a lot if our default storage was the semantic representation of the code.
And you will have easier time getting your changes merged into dotnet/runtime than into Python.
Even simply "sharing" within your own systems, like copying blocks into notes or another program, would be a lot harder. Maybe I'm not knowledgeable enough here and this wouldn't be as thorny as it seems.
Your AST is what EMF calls a "model". By default the "backend" and ecosystem surrounding EMF is skewed towards Java for historical reasons, but there have been some prototypes with other languages as well. You can serialize your AST in any way you like, although by default it relies on XMI files. You can implement your own textual concrete syntax, or rely on a database. The EMF ecosystem has tools for implementing textual or "graphical" concrete syntaxes. You can combine them (e.g. usually a specific subset of your AST gets edited in a certain way that's best for your targetted end users). The ecosystem also has tools for performing comparisons and plugging them into your editing means.
Of course all of this tooling requires a lot more work than an LSP server.
[0]: https://eclipse.dev/modeling/emf/
I think, the reason it was never implemented was that more translation = more complicated debugging. It also means that programmers have a more distorted and incomplete model of the program they are writing, i.e. more bugs.
NB. Lisp, as originally envisioned by McCarthy, had one more translation layer (the translated version had square brackets instead of the parentheses), but it didn't take off for, basically, the same reason.
So... while I understand the benefits you see from doing what you suggest, I think that at the same time the downside makes this not worth pursuing.
.NET decompilers are common. I have built a few toy languages and compilers on .NET. For one of them, I could decompile CIL into my language. So, I could view .NET libraries from other sources in my language.
I think this is essentially the same idea you are proposing.
It only works if the languages are similar though. Going between F# and C# does not always work as well for example.
You are describing an entire industry of IDE Smell with an IDE monoculture.
https://en.wikipedia.org/wiki/Indent_(Unix)
Edit: I do agree and find your AST suggestion profound though!
I think we don't have any sort of flexible AST sort of thing because they're mostly not necessary. The hard problems of programming don't usually have much to do with syntax.
And if you're going to downvote, kindly explain why, thanks. I just want to know exactly how this thing is supposed to work...
I may recall incorrectly but AppleScript may be an example: some file formats are serialized ASTs. The editor displays it as textual code. A downside of this is that you can’t save a syntactically invalid file.
Sure it's an intuitive way of representing your data. Is it the most appropriate though? See an example [0] about using Projectional Editing in order to use mathematical notations for formulas.
[0]: http://voelter.de/data/pub/gemoc2014-voelterLisson-MPSNotati...
There are visual programming languages that chain together blocks, instead of raw text.
> The hard problems of programming don't usually have much to do with syntax.
I guess it depends on how you define hard. You are clearly talking about "a singular issue that needs to be solved", which really only affects a single developer / team and, to a lesser extent, those that use that solution. But if you consider something like syntax, you're now talking about something that has a much smaller impact _per developer_, but has that impact on _every_ developer. The syntax issue may have a much larger impact overall.
Of course the execution is non deterministic and at the moment only works for simple things, but you can imagine as LLMs get more capable and more integrated with tools this will matter less and less.
[1] https://github.com/lgastako/refab
I recently started writing a game in Godot. I don't know GodotScript, and I've found I don't like it very much in trying to learn. I turned to aider.chat to see if I could describe the functions, data structures, and systems I wanted and have it write them. I also tried writing in a more familiar language (...one with braces...) and having it translate those files.
It does pretty well, but it doesn't feel like software engineering. It's too hands-off and doesn't activate the same neurons. All the problem-solving and puzzle-solving is gone, and the successes are quite boring, and the failure modes are more irritating even if they're necessarily quicker to solve.
It's a weird experience. I'm moving so, so much faster than I would have on my own, but I don't enjoy it. It feels like cheating - I'm not actually ashamed of what I'm doing but I also won't take credit for writing the code.
However, what I'm getting at is this: If I could write the code in a syntax or even language that I prefer and have copilot or whatever translate it in near-real-time (without active prompting), that would be the best of both worlds. I'd still be a little sad at myself if I didn't learn the new language, but I also think this method would facilitate learning better than what I'm doing with aider (because I could see what my code turns into as I'm writing it, and learn that "translation").
Here is an example of GPT's output for Python with braces that was generated after just spending 10 seconds for the prompt:
To some, significant indentation is better.
Others — too used to braces — miss them dearly in Python.
Next ones, vie for the non-text source code, something to get us past these discussions altogether (editors working on .pyc files directly?).
For programs to be maintained, they need to be read, understood and improved. One—often undervalued—skill in programming is to write beautiful code, because that is more art than craft. And unfortunately, tools like Black prohibit the true artists from expressing themselves clearly with code formatting too. And to those, white-space or braces matters on a different level, and everything else is attempting to make up excuses for why one is better than other.
And while the conceptual operations we do on the code seem simple on the surface, devising an editing tool that would do semantic operations on the AST is fricking hard and likely to be very non-ergonomic. Look at all the attempts to make code refactoring tooling: it's so crazily complex and confusing that it's simpler to just go and grep for a string and fix anything you find.
As long as it's faster to use regular editing operations to shuffle code around, indent or unindent it (or wrap it with braces), tweak one thing here or there, simple text editors will mostly rule the world of programming.
Python's choice of representation is such that two different Python programs show zero differences under a white-space-suppressed diff.
The white-space suppressed diff is a useful tool for comparing programs when some sections of code have changed indentation but are otherwise the same; yet we cannot rely on it if we are using Python.
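A small sketch of that claim. The two sample programs, the `normalize` helper (a rough stand-in for what `diff -w` ignores) and the `run` helper are made up for illustration.

```python
# Two programs that differ only in the indentation of the last line.
PROGRAM_A = '''\
for ok in items:
    if ok:
        calls.append("process")
    calls.append("log")
'''

PROGRAM_B = '''\
for ok in items:
    if ok:
        calls.append("process")
        calls.append("log")
'''

def normalize(source: str) -> list[str]:
    """Drop all whitespace from every line, roughly what `diff -w` ignores."""
    return ["".join(line.split()) for line in source.splitlines()]

def run(source: str) -> list[str]:
    """Execute a program against fixed input and return what it recorded."""
    calls: list[str] = []
    exec(source, {"items": [True, False], "calls": calls})
    return calls

# A whitespace-suppressed comparison sees no difference at all...
looks_identical = normalize(PROGRAM_A) == normalize(PROGRAM_B)
# ...yet the programs behave differently.
result_a = run(PROGRAM_A)   # ['process', 'log', 'log']
result_b = run(PROGRAM_B)   # ['process', 'log']

print(looks_identical, result_a != result_b)  # True True
```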
Python's syntax design is objectively poor on several purely technical points. In its favor, there are only handwaving pop psych arguments.
What’s the problem with having no redundancy? By the same logic, we should require numerals to be spelled out in words so the compiler can check if the programmer didn’t accidentally write a different number.
yes, because in Python two programs with different indentation are not the same. what's the problem that you can't use tools intended for code without significant whitespace with code in Python?
I've seen both misplaced braces (akin to mis-indenting blocks in Python), indentation not matching braces and other problems of the same sort in non-Python code. Readers of the code would misunderstand the code when badly indented, and might introduce bad braces as well (not everybody uses automatic formatters and linters either, esp as they will sometimes "quickly" edit code in their non-usual dev environment). Not to mention that some "braced" languages allow having single-line blocks without braces.
It's also not true that this is the only way Python encodes structure: new blocks generally only start with a ":", and you've got control flow keywords that allow for new blocks to start. One could argue that's a great feature disallowing you from introducing confusing spacing without actually having a new block started.
While I am fond of applying many of double-entry-accounting principles in programming to increase trust in what we write, I believe that's much better done with unit-tests, which can more clearly demonstrate the expectations for any code and read more like documentation.
Do you think all syntax in programming languages should have some sort of extra validation built-in? Eg. you should type in a constant twice (declare it as `const int a = 5` and then you have to set the value later as `a = 5` or you get an error?)?
As I said above, people will always find an "objective" excuse why their preference is better. But I've seen bad-block-boundaries in Python as much as I've seen it in other languages which use explicit block boundaries like braces (and I've done more Python over the last ~20 years). I've heard this argument a gazillion times, but hundreds of bugs due to that have simply failed to materialize while working on large projects with tens and hundreds of people.
Getting indentation right is _really_ not that hard, just like getting braces right is not that hard. I've yet to find someone who prefers their code to not be indented at all, and only rely on braces — at least not in a team setting where you can't simply reformat all of it.
This must surely be a bad thing? How would this signal an unfortunate indentation change between:
And:
???
> Python encodes structure in only one way, using indentation
Except when you write a multi-line string using """ notation.
Or when you put things inside parentheses, which I have seen as the preferred solution for method chaining.
Point is, Python claims indentation is all that is needed, and then very quickly breaks its own rule.
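A sketch of the two exceptions named above, using the standard `tokenize` module to count how many indentation tokens Python actually sees. The `report` function inside `SOURCE` and the `indent_tokens` helper are made up for illustration.

```python
import io
import tokenize

# The body of `report` contains lines at column 0 and at column 12, yet it is
# valid: indentation is not significant inside triple-quoted strings or brackets.
SOURCE = '''\
def report(values):
    text = """first line
no indentation here, yet still inside the function
"""
    total = (sum(values)
+ len(values)
            - 1)
    return text, total
'''

def indent_tokens(source: str) -> int:
    """Count the INDENT tokens the tokenizer produces for this source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sum(1 for tok in tokens if tok.type == tokenize.INDENT)

namespace: dict = {}
exec(SOURCE, namespace)
text, total = namespace["report"]([1, 2, 3])

print(indent_tokens(SOURCE))  # 1 (only the function body itself)
print(total)                  # 8
```

So the oddly placed lines compile fine, and the tokenizer reports a single indentation level for the whole function.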
As someone who has been diffing endless amounts of python through various diff tools, this is totally a non issue.
Python... I'd love if it shipped with a formatter that converted indents to braces, and then had an option for expressing indent as spaces (with number of spaces per indent) OR tabs (same, default 1); then still kept the braces.
While we're on the topic, if we store only syntactically valid programs, we can express diffs in terms of semantic refactoring rather than textual changes. This would enable stuff like preserving refactoring across merges, thereby bypassing conflicts that would arise under text merges. There are limits to this of course as you can still come up with conflicts, but anything to ameliorate the nightmare of manually fixing a textual merge.
I felt this way until I engaged in code with very horizontal coding styles. Semicolons make it extremely easy to visually break up sequential statements from expressions consuming multiple lines.
I'm not sure what you mean. Could you give an example?
https://go.dev/doc/effective_go#semicolons
If the current line starts on the same column that the previous expression started on, then a semicolon is inserted at the beginning of the current line. There are also some messy rules about opening and closing braces being inserted.
So, this:
Ends up being parsed as though it were written like this:
If the next line is indented, it's just taken as a continuation of the previous line's expression, so nothing needs to happen.
[0]: https://amelia.how/posts/parsing-layout.html
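As a rough illustration of the rule described above (not the full algorithm from the linked post), here is a small Python sketch. It assumes the rule compares indentation columns, handles only a single block, and ignores the brace-insertion part entirely; `insert_separators` is a made-up name.

```python
def insert_separators(lines):
    """Join physical lines into statements with a simplified layout rule.

    A line at the block's column starts a new statement (a ';' is inserted
    before it); a line indented further continues the previous statement.
    """
    statements = []
    block_column = None
    for line in lines:
        if not line.strip():
            continue  # blank lines carry no layout information
        column = len(line) - len(line.lstrip(" "))
        text = line.strip()
        if block_column is None:
            # The first non-blank line fixes the column of the block.
            block_column = column
            statements.append(text)
        elif column > block_column:
            # Deeper indentation: continuation of the previous expression.
            statements[-1] += " " + text
        else:
            # Same column: new statement. A real implementation would also
            # close the block when the column is smaller; this sketch doesn't.
            statements.append(text)
    return "; ".join(statements)


source = [
    "x = 1",
    "y = x +",
    "    2",
    "z = y",
]
result = insert_separators(source)
print(result)  # x = 1; y = x + 2; z = y
```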
Make python a lisp - indentation is just the number of brackets.
Hy - http://hylang.org
Try it like this:
The rules are a little different. We have two-space indentation, like what is favored in Lisp, but there is a space after the opening brace. For symmetry, we put spaces between the closing ones. It works best if we don't "cuddle" the opening brace but put it on a new line. There is a rhyme and reason to it, and consistency.
I've written some C programs that way, though not recently and can't point to any online examples.
However, I also wrote, and maintain, the Yacc grammar file in TXR Lisp in this style (just the actions in the grammar portion). For whatever reason, it works well in grammar files.
https://www.kylheku.com/cgit/txr/tree/parser.y
{} are braces
This is English English
Swahili looks unfamiliar to me, because I didn't grow up in East Africa. Don't conflate subjective familiarity with objective simplicity - any "unfamiliar" concept only reflects on you, not your subject.
Antoine de Saint-Exupéry
Unfortunately, we still live in an era where humans have to adapt to technology rather than the other way around. From my perspective, this is particularly true for programming.
For me, Python is a good (though not ideal) mix of simple syntax and power. The language is characterized by low redundancy, meaning it avoids unnecessary characters such as semicolons to mark the end of a line or curly braces to mark a block. If a human can recognize the end of a line without special characters, then the compiler should be able to as well.
As someone with ADHD, I find it particularly difficult not to get distracted by these and other superfluous details. These small distractions add up and can become very burdensome. Interestingly, I found it easier to program in Assembler and Modula than in languages like C++ (MSVC), PHP, or JavaScript – at least as long as the projects were small.
Even a brief look at Rust’s syntax causes me almost physical discomfort, no matter how great, powerful, and useful the language may be.
For this reason, I almost exclusively use the terminal for emails, calendar, and programming, even though complex GUIs can simplify some tasks.
Although Python is not perfect in terms of syntax, it offers a good balance. Perhaps one day, before the perfect programming language exists, we will be able to use AI and ML to explain to the computer what a program should do with simple language (better than ChatGPT right now), just like Captain Picard. In fantasy, a few letters, punctuation marks, and some grammar is all that is needed. This may lead to inaccuracies in human-to-human communication, but that does not mean the same problems must occur in communication with an intelligent compiler.
Making syntax as “human-readable” as possible should always be the highest priority. We could unlock so much potential this way.
That’s a shame because you’re missing that these sigils actually have meaning there, because the semantics of the language are completely different: Python is statements-based with very limited scoping (global and function), Rust is expression based with block scoping.
As a result, blocks (paired braces) are a way to pack multiple statements into an expression e.g.
And `;` is not an alias for end-of-line, it’s a separator for statements. Not aliasing end-of-line to end-of-statement matters because Rust is expression-oriented: it’s very common for expressions to span multiple lines, and in that case Python requires either wrapping the entire thing in parentheses or escaping the EOL with `\`.
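For reference, the two Python workarounds mentioned above look roughly like this. The variable names are made up for the example, and the last part uses the built-in `compile` only to show that the unwrapped version is rejected.

```python
prices = [3, 4, 5]

# Option 1: wrap the expression in parentheses; newlines inside are ignored.
total_parens = (
    prices[0]
    + prices[1]
    + prices[2]
)

# Option 2: escape each end-of-line with a backslash.
total_backslash = prices[0] \
    + prices[1] \
    + prices[2]

# With neither, the statement ends at the first newline and the indented
# continuation line is rejected by the parser.
broken = "total = prices[0]\n    + prices[1]\n"
try:
    compile(broken, "<example>", "exec")
    parsed = True
except SyntaxError:  # IndentationError is a subclass of SyntaxError
    parsed = False

print(total_parens, total_backslash, parsed)  # 12 12 False
```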
Could you find other ways to do this? Sure, but then you have to make other tradeoffs e.g. wrap everything in matching symbols à la lisp, or make statements into special cases à la Haskell.
This is a red herring. Haskell, CoffeeScript, Nim, Lean, etc. are expression-oriented and use indentation like Python, while C(++), Java, JavaScript, etc. are statement-oriented and use braces.
And in Python \n is not an alias for end-of-line, it’s a separator for statements.
I think there is a balance to strike here.
I often like to work with code by cutting and pasting sections around and then hitting the format hotkey to align everything.
I enjoy the guarantee that as long as the syntax is correct, it doesn't matter how I type out the code, because I'll just hit the formatter hotkey immediately afterwards and it will apply the correct indentation and lay it all out nicely.
Obviously that's impossible in Python, and it makes working with it really frustrating to me. It feels so delicate that I almost don't want to touch the code, because I'm always accidentally changing the indentation. It's a real limit of this "human-readable" syntax in my opinion.
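A minimal sketch of why a formatter can't recover from this in Python: the two snippets below differ only in the indentation of one line, both are valid, and they mean different things. The snippet contents and the `run` helper are made up for illustration.

```python
import ast

inside_loop = """
total = 0
for n in [1, 2, 3]:
    total += n
    total *= 2
"""

# Identical except that the last line lost its indentation, e.g. after a paste.
after_loop = """
total = 0
for n in [1, 2, 3]:
    total += n
total *= 2
"""


def run(source):
    """Execute a snippet and return the final value of `total`."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"]


# Both versions parse, so no tool can tell which one was intended.
same_tree = ast.dump(ast.parse(inside_loop)) == ast.dump(ast.parse(after_loop))

print(run(inside_loop), run(after_loop), same_tree)  # 22 12 False
```

In a brace-delimited language the braces would pin down which block the line belongs to, so re-indenting is purely cosmetic; here the indentation is the only record of that decision.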
Skill issue. I just press my “paste with correct indentation” hotkey and move on.
I think it's interesting that you say this, and point it towards Python.
Personally, with Python's significant whitespace, I feel more constrained writing code in a style that I prefer, with the computer requiring me to adapt to it, compared to other languages. I see code with braces and semi-colons more freeing because I get more control over the line structure.
At the end of the day, it's all stylistic personal preference. Python isn't the evolutionary ideal form for programming languages, it's just what some people prefer.
Many people here argue that braces do provide some structure that helps in understanding and navigating the program. Python files over 100 lines with a lot of if-statements become syntactically unreadable.
So taking them away does not help.
Generally, minimalism (except for the Lisp-style one) is not always good. Python is called executable pseudo-code. Do academics use it to specify algorithms?
No, most still use some form of Pascal/Algol style syntax, which conveys the meaning much better.
Is the syntax not about 80% the same? And isn't most of that "same simple syntax" commonly used daily?
> Many people here argue that braces do provide some structure that helps in understanding and navigating the program. Python files over 100 lines with a lot of if-statements become syntactically unreadable.
That's probably true. But poor programming discipline and syntax that eases reading problematic code also contribute to this issue.
> So taking them away does not help.
For me it actually promotes better coding style and hygiene.
> Generally, minimalism (except for the Lisp-style one) is not always good.
The quote was about perfectionism, not minimalism.