> HTTP → RFC-2616 says in section 19.3 says "we recommend that applications ... recognize a single LF as a line terminator...." In other words it is perfectly OK for an HTTP client or server to accept CR-less HTTP requests or replies. It is not a violation of the HTTP standard to do so. Therefore they should.
The most up-to-date version of HTTP/1.1 spec is RFC 9112, which says:
> Although the line terminator for the start-line and fields is the sequence CRLF, a recipient MAY recognize a single LF as a line terminator and ignore any preceding CR.
"MAY", of course, is different from "MUST" or "SHOULD", so I feel like the author's claim that implementations rejecting bare NLs are broken is at odds with the specification.
This comes down to Postel's Law: be liberal in what you accept and conservative in what you send. Also from RFC 2616, but not cited by the author:
> This flexibility regarding line breaks applies only to text media in the entity-body; a bare CR or LF MUST NOT be substituted for CRLF within any of the HTTP control structures (such as header fields and multipart boundaries).
They aren't going to allow sending LF until at least one bump to a higher protocol version where every server MUST accept it.
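A minimal sketch of that asymmetry, assuming a header block already read into memory: the receiver tolerates a bare LF (the RFC 9112 "MAY"), while the sender only ever emits CRLF. The function names are made up for illustration and do not come from any real HTTP library.

```python
CRLF = b"\r\n"


def split_header_lines(raw: bytes) -> list[bytes]:
    """Split a header block into lines, accepting LF or CRLF terminators."""
    lines = []
    for line in raw.split(b"\n"):
        # Treat LF as the terminator and ignore a CR directly before it.
        if line.endswith(b"\r"):
            line = line[:-1]
        lines.append(line)
    # Drop the empty pieces left behind by the final terminator(s).
    while lines and lines[-1] == b"":
        lines.pop()
    return lines


def serialize_header_lines(lines: list[bytes]) -> bytes:
    """Always emit CRLF, plus the blank line that ends the header section."""
    return b"".join(line + CRLF for line in lines) + CRLF
```

Feeding it a block that mixes both terminators yields the same lines either way, and whatever goes back out on the wire is CRLF only.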
LF won over CRLF, little endian over big, UTF-8 over UTF-16, spaces over tabs - the world converges but at the same time new frontiers arise and new battles will be fought.
For me, there is one single reason spaces are superior to tabs, and that reason is the most important: it ensures that you, the reader, are viewing my code exactly as I, the author, wrote it.
Consider this snippet, where tabs (⇥) equal four spaces (·) in my editor:
This matters more when the presentation is important, like when I write ASCII diagrams in my code, or have a large matrix with elements aligned, or similar.
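To make the alignment point concrete, here is a small sketch using Python's `str.expandtabs`. The two-line snippet is hypothetical (it is not the commenter's original example); it uses tabs to line up trailing comments, and the helper reports the column where each comment lands at a given tab width.

```python
# Hypothetical source text: tabs are used to align the trailing comments.
SNIPPET = "x = 1\t\t# short name\ntotal = 2\t# longer name\n"


def render(source: str, tab_width: int) -> str:
    """Show the text the way an editor with this tab width would."""
    return source.expandtabs(tab_width)


def comment_columns(source: str, tab_width: int) -> list[int]:
    """Column of the '#' on each rendered line."""
    return [line.index("#") for line in render(source, tab_width).splitlines()]


print(comment_columns(SNIPPET, 4))  # [12, 12]: comments line up
print(comment_columns(SNIPPET, 2))  # [8, 10]: alignment is lost
```

With spaces instead of tabs, the columns would be the same in every editor.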
The widespread adoption of code formatting tools has rendered the very argument somewhat pointless. I don't even know which indentation most of my code uses - my editor's autoformatter takes care of that.
Alternatively, don't introduce incompatibilities in decades-old formats. Even if you can somehow convince everyone to accept \n, older software exists and this will only serve to further erode how well things work together. If there were a good way to go back to these formats and fix things, the line ending would probably be pretty low on my list.
> Nobody ever wants to be in the middle of a line, then move down to the next line and continue writing in the next column from where you left off. No real-world program ever wants to do that.
I routinely do this while drafting text and tables, and isn't this akin to what multi-cursor input achieves in many text editors?
You are missing a subtle issue: let's say you want to type the numbers 1-3 above each other. If you send 1NL2NL3, the result will be a set of stairs:
1
 2
  3
This is because after a digit is typed, the writing head sits one column past that character. You have to add a backspace to go back one column: 1NLBS2NLBS3, which puts the cursor back in the right column.
Once you add more than one character per row, it's all over anyway.
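The staircase is easy to reproduce with a toy model of the print head. This is only a sketch, assuming a teletype-like device where LF moves down without touching the column, CR returns to column 0, and BS steps one column left; `teletype` is a made-up helper, not an emulation of any real terminal.

```python
def teletype(stream: str) -> list[str]:
    """Render a control-character stream on a simple grid, teletype style."""
    rows: dict[int, dict[int, str]] = {}
    row = col = 0
    for ch in stream:
        if ch == "\n":      # LF: move down one row, keep the column
            row += 1
        elif ch == "\r":    # CR: back to column 0, keep the row
            col = 0
        elif ch == "\b":    # BS: one column to the left
            col = max(col - 1, 0)
        else:               # printable: write it, then advance
            rows.setdefault(row, {})[col] = ch
            col += 1
    out = []
    for r in range(max(rows, default=-1) + 1):
        cells = rows.get(r, {})
        width = max(cells, default=-1) + 1
        out.append("".join(cells.get(c, " ") for c in range(width)))
    return out


print(teletype("1\n2\n3"))      # ['1', ' 2', '  3']  (stairs)
print(teletype("1\n\b2\n\b3"))  # ['1', '2', '3']     (backspace fix)
```

In this model the backspace trick only works for single-character rows; with longer rows you need CR (or several backspaces) to get back to the start of the line.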
You are absolutely right that I didn't consider that. Thank you!
The reason is that I have just been typing into a table in a WYSIWYG editor, which undoes the problem for me by returning to the beginning of each cell. So I tricked myself, because that is of course not what's being discussed in the article.
The article doesn't mention possible security implications. However, we already get lots of vulnerabilities exactly _because_ implementations disagree on delimiters. Examples of this are HTTP request smuggling[1, 2, 3] and SMTP smuggling[4].
As the references show, this is already a big source of vulnerabilities - trying to push for a change in standards would likely make the situation much worse. At the very least, old unmaintained servers will not change their behavior.
I think we should accept that this ship has sailed and leave existing protocols alone. Mandate LF and disallow CRLF in new protocols, that's fine, but I don't think we should open this particular Pandora's Box.
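A toy illustration of the kind of disagreement meant here, not a working exploit: two hypothetical parsers read the same header bytes, one ending lines only on CRLF and the other also on a bare LF. All names and the sample bytes are invented for this sketch.

```python
def _parse(lines) -> dict[bytes, bytes]:
    """Collect 'Name: value' lines into a dict keyed by lowercased name."""
    headers = {}
    for line in lines:
        if b":" in line:
            name, _, value = line.partition(b":")
            headers[name.strip().lower()] = value.strip()
    return headers


def parse_strict(raw: bytes) -> dict[bytes, bytes]:
    """Hypothetical front end: only CRLF ends a header line."""
    return _parse(raw.split(b"\r\n"))


def parse_lenient(raw: bytes) -> dict[bytes, bytes]:
    """Hypothetical back end: a bare LF also ends a header line."""
    return _parse(line.rstrip(b"\r") for line in raw.split(b"\n"))


# Note the bare LF after "hi": one parser sees two headers there, the other one.
HEADERS = b"Host: example.com\r\nX-Note: hi\nTransfer-Encoding: chunked\r\n\r\n"

print(b"transfer-encoding" in parse_strict(HEADERS))   # False
print(b"transfer-encoding" in parse_lenient(HEADERS))  # True
```

The strict parser folds `Transfer-Encoding` into the value of `X-Note`, while the lenient one treats it as a real header. Once two hops disagree like that about how a message is framed, they can disagree about where it ends.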
>Once all the software has been fixed (except for SMTP, which is unfixable, in a multitude of ways even beyond its choice of line endings), we can all stop worrying over "\r\n" and just send simple "\n" line terminators.
IMO the day when we don't program computers anymore will come before the day everyone gets to forget what \r means.
Most software I write makes the CR optional (although some of it disallows CR), and I do not usually use it in my own files. The exceptions are DOS files, where I still use CRLF, and existing files that already use CRLF for some reason.
There may be some situations where it is helpful to have them separate, although for most uses (including most internet protocols) it isn't.
But, I agree with the part about Windows that O_TEXT should not be used; you should use O_BINARY.
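As a rough analogy in Python rather than the C runtime: `O_TEXT` and `O_BINARY` are Windows CRT flags, and the closest portable equivalent is the difference between text mode with newline translation and binary mode. The sketch below forces the translation with `newline="\r\n"` so it behaves the same on any platform; `write_and_read_back` is a made-up helper.

```python
import os
import tempfile


def write_and_read_back(text: str, binary: bool) -> bytes:
    """Write text to a temp file in the chosen mode and return the raw bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        if binary:
            # Binary mode: bytes go to disk exactly as given.
            with open(path, "wb") as f:
                f.write(text.encode("ascii"))
        else:
            # Text mode with newline="\r\n": every "\n" written becomes "\r\n",
            # which is the translation Windows text mode applies.
            with open(path, "w", newline="\r\n", encoding="ascii") as f:
                f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


print(write_and_read_back("a\nb\n", binary=True))   # b'a\nb\n'
print(write_and_read_back("a\nb\n", binary=False))  # b'a\r\nb\r\n'
```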
Consider this snippet, where tabs (⇥) equal four spaces (·) in my editor:
Looks good. Now you open this snippet in your IDE, where tabs are set to two spaces:
Ah, but that can be solved by returning the carriage to the start of the line :)
Thus giving you the perfectly valid control sequence: LFCR!
[1] Simple example that doesn't use CRLF/LF disagreement: https://portswigger.net/web-security/request-smuggling
[2] Complex example that uses CRLF/LF disagreement: https://portswigger.net/web-security/request-smuggling/advan... (see heading 'Request smuggling via CRLF injection')
[3] Random report on HackerOne I found where allowing LF created a vulnerability in NodeJS: https://hackerone.com/reports/2001873
[4] https://sec-consult.com/blog/detail/smtp-smuggling-spoofing-...
> CR by itself is occasionally useful for when you want to overwrite a line of text you have just written. LF, on the other hand, is completely useless. Nobody ever wants to be in the middle of a line, then move down to the next line and continue writing in the next column from where you left off. No real-world program ever wants to do that.
Just out of curiosity, isn't that what Whisper does, the data file format for Graphite? I remember reading about it in https://aosabook.org/en/v1/graphite.html
I realise this isn't core to the author's point, and I mean the question to learn something rather than trying to be a pedant. I think it uses Python byte offset semantics rather than CR, but for all I know the Python implementation uses CR under the hood.
The article overall is a fun read! I love the idea of trying to identify simple pieces of legacy ideas and clear them out.