Stop Requiring CRLF Line Endings

LF won over CRLF, little endian over big, UTF-8 over UTF-16, spaces over tabs - the world converges but at the same time new frontiers arise and new battles will be fought.

below43 · a year ago

Spaces over tabs is the one I don't understand - tabs are the most democratic way to format code (and they save redundant bytes).

dlivingston · a year ago

For me, there is one single reason spaces are superior to tabs, and that reason is the most important: it ensures that you, the reader, are viewing my code exactly as I, the author, wrote it.

Consider this snippet, where tabs (⇥) equal four spaces (·) in my editor:

    if (someCondition) {
    ⇥   Foo foo = SomeClass::SomeFunctionCall(myParameter1,
    ⇥   ⇥   ⇥   ⇥   ⇥   ⇥   ⇥   ⇥   ⇥   ··myParameter2,
    ⇥   ⇥   ⇥   ⇥   ⇥   ⇥   ⇥   ⇥   ⇥   ··myParameterN);
    }

Looks good.

Now you open this snippet in your IDE, where tabs are set to two spaces:

    if (someCondition) {
    ⇥ Foo foo = SomeClass::SomeFunctionCall(myParameter1,
    ⇥ ⇥ ⇥ ⇥ ⇥ ⇥ ⇥ ⇥ ⇥ ··myParameter2,
    ⇥ ⇥ ⇥ ⇥ ⇥ ⇥ ⇥ ⇥ ⇥ ··myParameterN);
    }

This matters more when the presentation is important: I might write ASCII diagrams in my code, or have a large matrix with its elements aligned, or similar.
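A minimal sketch of the effect, in Python. It does not reproduce the snippet above character for character: the continuation line is built by a hypothetical helper (`continuation_line`) that indents with as many tabs as fit and pads with spaces, which is an assumption about how an editor set to four-column tabs would align it.

```python
FIRST = "\tFoo foo = SomeClass::SomeFunctionCall(myParameter1,"


def display_column(line, needle, tab_width):
    """Column where `needle` starts once tabs are expanded to `tab_width`."""
    return line.expandtabs(tab_width).index(needle)


def continuation_line(target_column, tab_width):
    """Hypothetical editor behaviour: fill with tabs, then pad with spaces."""
    tabs, spaces = divmod(target_column, tab_width)
    return "\t" * tabs + " " * spaces + "myParameter2,"


AUTHOR_TAB_WIDTH = 4
target = display_column(FIRST, "myParameter1", AUTHOR_TAB_WIDTH)
SECOND = continuation_line(target, AUTHOR_TAB_WIDTH)

for width in (4, 2):
    first_col = display_column(FIRST, "myParameter1", width)
    second_col = display_column(SECOND, "myParameter2", width)
    state = "aligned" if first_col == second_col else "misaligned"
    print(f"tab width {width}: columns {first_col} and {second_col} ({state})")
```

The bytes in the file never change; only the reader's tab width does, and that alone is enough to break the alignment.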

bheadmaster · a year ago

The widespread adoption of code formatting tools has rendered the very argument somewhat pointless. I don't even know which indentation most of my code uses - my editor's autoformat takes care of that.

zamalek · a year ago

Momentum wins over superior solutions all the time. Just like SMTP, I feel like tabs vs. spaces is a lost war.

jgalt212 · a year ago

In a world where tabs win, you then have a battle over how far to move the cursor with each tab.

> Nobody ever wants to be in the middle of a line, then move down to the next line and continue writing in the next column from where you left off. No real-world program ever wants to do that.

I routinely do this while drafting text and tables, and isn't this akin to what multi-cursor input achieves in many text editors?

johannes1234321 · a year ago

You are missing a subtle issue: let's say you want to type the numbers 1-3 above each other. If you send 1NL2NL3 the result will be a set of stairs

  1
   2
    3

That is because, after typing the digit, the writing head sits just past that character. You have to add a backspace to go back one column: 1NLBS2NLBS3, which pushes the cursor back to the right column.

Once you add more than one character per row it's all over anyways.
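A toy print-head simulator makes the staircase easy to reproduce. This is only a sketch under stated assumptions: `render` is a made-up helper, LF moves down without changing the column, CR returns to column 0, and BS moves one column left and never wraps to the previous line (real terminals and printers may differ on that last point).

```python
def render(stream):
    """Simulate a print head and return the resulting rows as strings."""
    rows = [[]]
    row = col = 0
    for ch in stream:
        if ch == "\n":        # line feed: next row, same column
            row += 1
            if row == len(rows):
                rows.append([])
        elif ch == "\r":      # carriage return: column 0, same row
            col = 0
        elif ch == "\b":      # backspace: one column left, stop at the margin
            col = max(0, col - 1)
        else:
            line = rows[row]
            line.extend(" " * (col + 1 - len(line)))  # pad out to the head
            line[col] = ch
            col += 1
    return ["".join(r) for r in rows]


print(render("1\n2\n3"))        # bare LF: the staircase
print(render("1\n\b2\n\b3"))    # LF then BS: a single column
print(render("ab\n\rcd"))       # LF then CR: works for any line length
```

In this model the backspace trick only works for one character per row, while LF followed by CR (or CR followed by LF) resets the column no matter how long the row was.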

speerer · a year ago

You are absolutely right that I didn't consider that. Thank you!

The reason is that I have just been typing into a table in a WYSIWYG editor, which undoes the problem for me by returning to the beginning of each cell. So I tricked myself, because that is of course not what's being discussed in the article.

22c · a year ago

> Once you add more than one character per row it's all over anyways.

Ah, but that can be solved by returning the carriage to the start of the line :)

Thus giving you the perfectly valid control sequence: LFCR!

extraduder_ire · a year ago

Wouldn't the backspace go back up to the previous line, since the only character between each of those numbers is the newline?

anotherhue · a year ago

Typing in tables, etc., is a reasonable use case, especially on a line-feed printer or typewriter, from which these movements descended.

mr_person · a year ago

True, if you assume a table here has 1 column. If you are printing multiple columns you print a full row at a time.

vasilvv · a year ago

> HTTP → RFC-2616 says in section 19.3 "we recommend that applications ... recognize a single LF as a line terminator...." In other words it is perfectly OK for an HTTP client or server to accept CR-less HTTP requests or replies. It is not a violation of the HTTP standard to do so. Therefore they should.

The most up-to-date version of HTTP/1.1 spec is RFC 9112, which says:

> Although the line terminator for the start-line and fields is the sequence CRLF, a recipient MAY recognize a single LF as a line terminator and ignore any preceding CR.

"MAY", of course, is different from "MUST" or "SHOULD", so I feel like the author's claim that implementations rejecting bare NLs are broken is at odds with the specification.

erik_seaberg · a year ago

This comes down to Postel's Law; they recommend liberally receiving what you conservatively cannot send. Also from RFC 2616 but not cited by the author:

> This flexibility regarding line breaks applies only to text media in the entity-body; a bare CR or LF MUST NOT be substituted for CRLF within any of the HTTP control structures (such as header fields and multipart boundaries).

They aren't going to allow sending LF until at least one bump to a higher protocol version where every server MUST accept it.
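For illustration, the receive-side leniency being debated (CRLF is the terminator, but a recipient MAY accept a bare LF) could look roughly like this in Python. `split_lines` and its `accept_bare_lf` flag are hypothetical names, not taken from any real HTTP library, and the sketch only splits lines; it does no header validation.

```python
def split_lines(data, accept_bare_lf):
    """Split a header block into lines, optionally tolerating bare LF."""
    lines = []
    start = 0
    while (end := data.find(b"\n", start)) != -1:
        line = data[start:end]
        if line.endswith(b"\r"):
            line = line[:-1]              # CRLF: drop the CR before the LF
        elif not accept_bare_lf:
            raise ValueError(f"bare LF at byte offset {end}")
        lines.append(line)
        start = end + 1
    return lines                          # unterminated trailing bytes are ignored


strict_input = b"GET / HTTP/1.1\r\nHost: example.test\r\n\r\n"
bare_lf_input = b"GET / HTTP/1.1\nHost: example.test\n\n"

print(split_lines(strict_input, accept_bare_lf=False))
print(split_lines(bare_lf_input, accept_bare_lf=True))
```

With `accept_bare_lf=False` the second input raises `ValueError`, which is the behaviour the spec's "MAY" still permits.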

treve · a year ago

Alternatively, don't introduce incompatibilities in decades-old formats. Even if you can somehow convince everyone to accept \n, older software exists and this will only serve to further erode how well things work together. If there was a good way to go back to these formats and fix things, the line ending would probably be pretty low on my list.

RagingCactus · a year ago

The article doesn't mention possible security implications. However, we already get lots of vulnerabilities exactly _because_ implementations disagree on delimiters. Examples of this are HTTP request smuggling[1, 2, 3] and SMTP smuggling[4].

As the references show, this is already a big source of vulnerabilities - trying to push for a change in standards would likely make the situation much worse. At the very least, old unmaintained servers will not change their behavior.

I think we should accept that this ship has sailed and leave existing protocols alone. Mandate LF and disallow CRLF in new protocols, that's fine, but I don't think we should open this particular Pandora's Box.

[1] Simple example that doesn't use CRLF/LF disagreement: https://portswigger.net/web-security/request-smuggling

[2] Complex example that uses CRLF/LF disagreement: https://portswigger.net/web-security/request-smuggling/advan... (see heading 'Request smuggling via CRLF injection')

[3] Random report on HackerOne I found where allowing LF created a vulnerability in NodeJS: https://hackerone.com/reports/2001873

[4] https://sec-consult.com/blog/detail/smtp-smuggling-spoofing-...
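A deliberately simplified sketch of the kind of disagreement described above, in Python. Both parsers are hypothetical and far cruder than any real server, and this is not the technique from the linked write-ups; it only shows that the same bytes can yield different header sets depending on whether a bare LF counts as a line terminator.

```python
def parse(lines):
    """Turn 'Name: value' lines into a dict keyed by lower-cased name."""
    headers = {}
    for line in lines:
        if line:
            name, _, value = line.partition(b":")
            headers[name.strip().lower()] = value.strip()
    return headers


def headers_crlf_only(block):
    """Hypothetical front end: only CRLF ends a header line."""
    return parse(block.split(b"\r\n"))


def headers_lf_tolerant(block):
    """Hypothetical back end: a bare LF also ends a header line."""
    return parse(block.replace(b"\r\n", b"\n").split(b"\n"))


block = b"Host: example.test\r\nX-Note: hi\nTransfer-Encoding: chunked\r\n"

front = headers_crlf_only(block)
back = headers_lf_tolerant(block)
print(sorted(front))   # no transfer-encoding header seen here
print(sorted(back))    # transfer-encoding appears as its own header
```

If the two hops then frame the message body differently, they no longer agree on where one request ends and the next begins.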

gwynforthewyn · a year ago

The author makes this assertion:

> CR by itself is occasionally useful for when you want to overwrite a line of text you have just written. LF, on the other hand, is completely useless. Nobody ever wants to be in the middle of a line, then move down to the next line and continue writing in the next column from where you left off. No real-world program ever wants to do that.

Just for curiosity, isn't that what Whisper does, the data file format for Graphite? I remember reading about it in https://aosabook.org/en/v1/graphite.html

I realise this isn't core to the author's point, and I mean the question to learn something rather than trying to be a pedant. I think it uses python byte offset semantics rather than CR, but for all I know the python implementation uses CR under the hood.

The article overall is a fun read! I love the idea of trying to identify simple pieces of legacy ideas and clear them out.

aftbit · a year ago

> Once all the software has been fixed (except for SMTP, which is unfixable, in a multitude of ways even beyond its choice of line endings), we can all stop worrying over "\r\n" and just send simple "\n" line terminators.

IMO the day when we don't program computers anymore will come before the day everyone gets to forget what \r means.

zzo38computer · a year ago

Most software I make treats the CR as optional (although some of it disallows CR), and I do not usually use it in my own files, except DOS files; I still use CRLF in DOS files, as well as in existing files that already use CRLF for some reason.

There may be some situations where it is helpful to have them separate, although for most uses (including most internet protocols) it isn't.

But, I agree with the part about Windows that O_TEXT should not be used; you should use O_BINARY.
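As a rough analogue in Python (not the C runtime's O_TEXT/O_BINARY flags themselves), the `newline` argument of `open` controls the same kind of translation. In this sketch `written_bytes` is a hypothetical helper, and `newline="\r\n"` is used to imitate what Windows text mode does on write, so it behaves the same on any platform.

```python
import os
import tempfile


def written_bytes(text, newline):
    """Write `text` in text mode with the given newline policy, return raw bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", encoding="ascii", newline=newline) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


print(written_bytes("a\nb\n", newline="\r\n"))  # each "\n" becomes CRLF on disk
print(written_bytes("a\nb\n", newline=""))      # no translation: bytes match the text
```

Opening with `newline=""` (or in binary mode) keeps the bytes on disk identical to what the program wrote, which is the property being asked for here.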