FreeBSD's implementation of FILE is a nice object-oriented structure which anyone could derive from. Super-easy to make FILE point to a memory buffer or some other user code. I used that a bunch a long time ago.
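For anyone who hasn't played with it: the supported face of that machinery is funopen(3), which builds a FILE whose I/O goes through your callbacks (glibc spells the same idea fopencookie, with a different signature). A minimal sketch, compiling on FreeBSD or macOS; the byte-counting sink is purely illustrative:

    #include <stdio.h>

    /* Write callback: count the bytes instead of storing them. */
    static int count_write(void *cookie, const char *buf, int n)
    {
        (void)buf;
        *(long *)cookie += n;
        return n;                /* report everything as consumed */
    }

    int main(void)
    {
        long total = 0;
        /* Write-only stream: no read/seek/close callbacks needed. */
        FILE *fp = funopen(&total, NULL, count_write, NULL, NULL);
        if (fp == NULL)
            return 1;
        fprintf(fp, "hello, %s\n", "world");
        fclose(fp);              /* flushes buffered output into the callback */
        printf("%ld bytes went through the stream\n", total);
        return 0;
    }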
Obviously making FILE opaque completely breaks every program that used this feature, so no surprise it was reverted.
In the FreeBSD case, far from breaking "every program", it broke very little at all. In fact it broke exactly one thing at the time. Unfortunately, that one thing happened to be sysinstall(8).
Several things limited the damage:

* stdin, stdout, and stderr were already pointers rather than array element addresses, and the external symbol references to __stdinp, __stdoutp, and __stderrp did not change.
* Compiled code using the old macros continued to work, as the actual structure layout was not changed.
* Compiled code using FILE* would have continued to work, as the pointer implementation didn't change.
* Compiled C++ code with C++ function parameter overloading would have continued to link, as the underlying struct type did not change.
* Source code using the ferror_unlocked() and suchlike function-like macros would not have needed changing, as there were already ferror_unlocked() and suchlike functions, and those remained.
Looking at things like https://reviews.freebsd.org/D4488 from 2015, there was definitely stuff in the ports tree that would have broken back in 2008. But that wouldn't break now should this change be made again, and in any case the ports tree isn't base.
What actually broke was libftpio, a library that was in base up until 2011. It definitely won't break now, nearly 14 years after it was removed as orphaned once sysinstall(8) itself went away.

* https://cgit.freebsd.org/src/commit/lib/libftpio?id=430f2c87...
If you've ever done this to a C library, the first thing that you'll look at when someone else does it is not the FILE type, but how stdin, stdout, and stderr have changed.
The big breaking change is usually the historical implementation of the standard streams as addresses of elements of an array rather than as named pointers. (Plauger's example implementation had them as elements 0, 1, and 2 of a _Files[] array, for example.) It's possible to retain binary compatibility with unrecompiled code that uses the old getc/putc/feof/ferror/clearerr/&c. macros by preserving structure layouts, but changing stdin, stdout, and stderr can make things not link.

And indeed that has happened here.
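To make the array-versus-pointer distinction concrete, a minimal sketch (the names follow Plauger's book and FreeBSD respectively; the struct body is a placeholder, not any real libc's layout):

    typedef struct { int fd; /* buffer pointers, flags, ... */ } FILE_sketch;

    /* Plauger-style: streams are addresses of array elements, so the
     * array symbol _Files is what every compiled caller imports. */
    extern FILE_sketch _Files[3];
    #define Stdin_v1  (&_Files[0])
    #define Stdout_v1 (&_Files[1])
    #define Stderr_v1 (&_Files[2])

    /* FreeBSD-style: streams are named pointers, so callers import the
     * pointer symbols instead.  Moving a libc from one style to the
     * other (or renaming the symbols) leaves old binaries referencing
     * symbols that no longer exist: they fail to link or load even if
     * the structure layout never changed. */
    extern FILE_sketch *__stdinp, *__stdoutp, *__stderrp;
    #define Stdin_v2  __stdinp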
You have this backwards. Source compatibility is not broken by such a change; almost all applications need zero source changes to accommodate it. And far from OpenBSD "never, ever, ever" maintaining binary compatibility, Masahiko Yasuoka and Philip Guenther took deliberate steps in this very case to ensure as much binary compatibility as they could, retaining structure layouts as-is and retaining several symbols for library internals that macros used to reference, even ones that won't be used by freshly recompiled applications.
The warning, and the bumping of several shared library major version numbers, is most definitely about the standard streams breaking binary compatibility, not source compatibility as you have it. Any newly compiled binary that uses the C standard streams won't run on old shared libraries, because of the new symbol references for __stdin, __stdout, and __stderr.

Does anyone know why this change was done? Security reasons? Preparing for future changes?
The MH and nmh mail clients used to directly look into FILE internals. If you look for LINUX_STDIO in this old version of the relevant file you can see the kind of ugliness that resulted:

https://cgit.git.savannah.gnu.org/cgit/nmh.git/tree/sbr/m_ge...
It's basically searching an email file to find the contents of either a given header or the mail body. These days there is no need to go under the hood of libc for this (and this code got ripped out over a decade ago), but back when the mail client was running on elderly VAXen, this ate up significant time. Sneaking in and reading directly from the internal stdio buffer lets you avoid copying all the data the way an fread would. The same function also used to have a bit of inline VAX assembly for string searching...
The only reason this "works" is that traditionally the FILE struct is declared in a public header, so libc can have some of its own functions implemented as macros for speed, and that there was not yet (when this hack was first added, in the 1980s) much divergence in libc implementations.
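For the curious, the moral equivalent of that hack against glibc's historically public field names would look something like this (a sketch; glibc-specific, non-portable, and exactly the kind of thing an opaque FILE forbids):

    #include <stdio.h>

    int main(void)
    {
        FILE *fp = fopen("/etc/passwd", "r");
        if (fp == NULL)
            return 1;

        int c = getc(fp);        /* force stdio to fill its buffer */
        if (c != EOF) {
            /* On glibc, data already read from the kernel but not yet
             * consumed through stdio sits in [_IO_read_ptr, _IO_read_end).
             * Scanning it in place avoids the copy an fread() would make. */
            size_t pending = (size_t)(fp->_IO_read_end - fp->_IO_read_ptr);
            printf("first byte '%c', %zu bytes waiting in the buffer\n",
                   c, pending);
        }
        fclose(fp);
        return 0;
    }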
https://cgit.git.savannah.gnu.org/cgit/gnulib.git/tree/lib/s...

Yes, it's not a good idea to do this. There are more questionable pieces in gnulib, like closing stdin/stdout/stderr (because fflush and fsync are deemed too slow, and regular close reports some errors on NFS on some systems that would otherwise go unreported).
Yes, that part of Gnulib has caused some problems previously. It is mostly used to implement <stdio_ext.h> functions on non-glibc systems. However, it is also needed for some buggy implementations of ftello, fseeko, and fflush.

P.S. Hi Florian :)
> Yes, it's not a good idea to do this. There are more questionable pieces in gnulib, like closing stdin/stdout/stderr (because fflush and fsync are deemed too slow, and regular close reports some errors on NFS on some systems that would otherwise go unreported).
Hyrum's law strikes again. People cast dl_info and poke at internal bits all the time too.
glibc and others should be using kernel-style compiler-driven struct layout randomization to fight it.
Hyrum's Law applies: the API of any software component is the entire exposed surface, not just what you've documented. Hence, if you have FILE well-defined somewhere in a programmer-accessible header, somebody somewhere can and will poke at the internal bits in order to achieve some hack or optimization.
OTOH, when coding, I consider FILE to be effectively opaque in the sense that it probably is not portable, and that the implementers might change it at any time.

I am reminded of this fine article by Raymond Chen, which covers a similar situation on Windows way back when: https://devblogs.microsoft.com/oldnewthing/20031015-00/?p=42...
The OpenBSD answer to this is: fuck them they should've known better. The few pieces of software that do this and have an active port maintainer will get patched. The rest will stay broken until somebody cares to deal with the change.
Or functionality. It happens to me all the time: I have some Java class that's marked final, so instead of just extending the class and moving on, I have to copy/paste the entire class wholesale to accomplish my goal.
Personally I hate "nanny" languages that block you from accessing things. It's my computer, and my code, and my compiler. Please don't do things "for my own good", I can decide that for myself.
(And yes, I am aware of the argument that this lets the original programmer change the internals; in practice it's not such a big problem. Or the cure is worse than the disease - see my copy/paste example.)

Another example is a private constant. Instead of allowing me to reference it, I have to copy it. How is that any better? If the programmer has to change how the constant works then they can do so, and at that point my code will break and I'll... copy the constant. But until then I can just use the constant.
The standard doesn't specify any serviceable parts, and I don't think there are any internals of the struct defined in musl libc on Linux (glibc may be a different story). However, on OpenBSD, it did seem to have some user-visible bits:

https://github.com/openbsd/src/commit/b7f6c2eb760a2da367dd51...
If you expose it, someone will probably sooner or later use it, but probably not in any sane / portable code. On the face of it, it doesn't seem like a consequential change, but maybe they're mopping up after some vulnerability in that one weird package that did touch this.
Historically some FILE designs exposed the structure somewhere so that some of the f* methods could be implemented as macros or inline functions (e.g., `fileno()`).
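The classic shape of that, roughly V7-flavoured (a sketch with made-up names, not any real header):

    /* A public layout lets the hot path be a macro: no function call
     * while buffered bytes remain. */
    typedef struct {
        int            cnt;   /* bytes remaining in the buffer */
        unsigned char *ptr;   /* next byte to hand out */
        int            fd;    /* underlying file descriptor */
    } SFILE;

    int s_fillbuf(SFILE *p);  /* slow path: refill from the kernel */

    #define s_getc(p)   (--(p)->cnt >= 0 ? (int)*(p)->ptr++ : s_fillbuf(p))
    #define s_fileno(p) ((p)->fd)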
I've seen old code do this over the years. Consider, for example, that snprintf() wasn't standardized until the late 1990s. People would mock up a fake FILE* and use fprintf.
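These days the supported replacement for that particular mock-up is POSIX fmemopen, which hands you a real FILE over a memory buffer:

    #include <stdio.h>

    int main(void)
    {
        char buf[64];
        FILE *fp = fmemopen(buf, sizeof buf, "w");
        if (fp == NULL)
            return 1;
        fprintf(fp, "formatted into memory: %d %s", 42, "ok");
        fclose(fp);              /* flushes and NUL-terminates the buffer */
        puts(buf);
        return 0;
    }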
One issue – ISO C defines setvbuf (configure buffer mode and size of a stdio stream) but not getvbuf (get current buffer mode and size). On many platforms with non-opaque FILE, you can implement your own getvbuf by peering inside its undocumented fields.
I guess part of why it is not in the standard is that it is rarely requested functionality, but there are rare use cases where it may have value. And I think it is an unfortunate lack of orthogonality to have a setter but no corresponding getter.
stdio_ext.h offers some functionality like a "getvbuf", but not quite – e.g. __fbufsize tells you a stream's buffer size, and __flbf whether it is line-buffered – but it isn't clear how to distinguish fully buffered and unbuffered streams. And stdio_ext.h has never been standardised; it is an extension invented on Solaris and copied by Linux (and a few other platforms too, e.g. IBM z/OS).

So it wouldn't surprise me that a few folks would do some tricks with FILE internals.
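A "getvbuf"-ish probe using those interfaces, for illustration (glibc/Solaris-specific, since <stdio_ext.h> is non-standard):

    #include <stdio.h>
    #include <stdio_ext.h>

    static void describe(FILE *fp, const char *name)
    {
        /* __fbufsize: current buffer size; __flbf: nonzero if line-buffered.
         * There is no direct way here to tell fully buffered from
         * unbuffered -- the gap complained about above. */
        printf("%s: bufsize=%zu line-buffered=%d\n",
               name, __fbufsize(fp), __flbf(fp) != 0);
    }

    int main(void)
    {
        describe(stdout, "stdout");
        describe(stderr, "stderr");
        return 0;
    }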
In addition to "some code frobs internals", non-opaque FILE also allows for compatibility with code which puts FILE into a structure, since an opaque FILE doesn't have a size.
But code outside the standard library can’t do that, can it? fopen returns a pointer to a FILE, and you can’t know how a struct FILE should be copied.
You can’t just memcpy the bits and then mix calls to fread using pointers to the old and the new FILE struct, for example. I think the standard library need not even support calls using a pointer to a FILE struct it didn’t create.

You certainly shouldn't, but sadly this is something which people do.
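As a sketch of that antipattern (it compiles where FILE is a complete type, e.g. on glibc; ISO C explicitly says a copy of a FILE object need not serve in place of the original):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *fp = fopen("/etc/passwd", "r");
        if (fp == NULL)
            return 1;

        FILE snapshot;
        memcpy(&snapshot, fp, sizeof(FILE));   /* don't do this */

        /* The copies now share a buffer and an fd but not state updates;
         * interleaving reads through both desynchronizes them. */
        fgetc(fp);
        fgetc(&snapshot);                      /* undefined in practice */

        fclose(fp);
        return 0;
    }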
Are there any POSIX or ISO guarantees on "FILE"? I think it's safe to assume that it isn't an incomplete type, but all functions that use it operate on pointers anyway. Storing a copy of a "FILE" object might result in each copy pointing to the same underlying file handle but having different internal state.
In SunOS 4.x `FILE` was not opaque, `int fileno(FILE *)` was a macro, not a function, and the field of the struct that held the fd number was a `char`. Yeah, that sucked for ages, especially since it bled into the Solaris 2.x 32-bit ABI.
It was a then-important optimization to do the most common operations with macros since calling a function for every getc()/putc() would have slowed I/O down too much.
That's why there is also fgetc()/fputc() -- they're the same as getc()/putc() but they're always defined as functions so calling them generated less code at the callsite at the expense of always requiring a function call. A classic speed-vs-space tradeoff.
But, yeah, it was a mistake that it originally used a "char" to store the file descriptor. Back then it was typical to limit processes to 20 open files ( https://github.com/dspinellis/unix-history-repo/blob/Researc... ) so a "char" I'm sure felt like plenty.
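From memory, the layout in question looked roughly like this (illustrative, not the exact header):

    struct _iobuf_sketch {
        int   _cnt;      /* bytes left in the buffer */
        char *_ptr;      /* next character */
        char *_base;     /* buffer base */
        int   _bufsiz;
        short _flag;
        char  _file;     /* the fd -- and the mistake */
    };

    /* fileno() as a macro bakes the field's type into every caller: */
    #define fileno_sketch(p) ((p)->_file)

Since char is often signed, descriptors above 127 went negative, and because the macro was compiled into applications, widening the field meant an ABI break.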
In general, it is a bad practice. However, it can be useful for some low-level libraries. For example, https://github.com/fmtlib/fmt provides a type-safe replacement for `printf` that can write directly to the FILE buffer, with performance comparable to or better than native stdio.
I'm curious to take a closer look at fmtlib/fmt, which APIs treat FILE as non-opaque?

Edit: ah, found some of the magic, I think: https://github.com/fmtlib/fmt/blob/35dcc58263d6b55419a5932bd...

I'm curious how much speedup is gained from this.

I don't know if I agree, but this is one shining example of what makes *BSDs great: not being afraid of change. Linux should take note. So much of Windows' headaches stem from not wanting to break things and needing to support old client code.
>FILE Encapsulation: In previous versions, the FILE type was completely defined in <stdio.h>, so it was possible for user code to reach into a FILE and muck with its internals. We have refactored the stdio library to improve encapsulation of the library implementation details. As part of this, FILE as defined in <stdio.h> is now an opaque type and its members are inaccessible from outside of the CRT itself.

https://devblogs.microsoft.com/cppblog/c-runtime-crt-feature...

Quite acceptable in exchange for not having the headache of things breaking.
I am not disagreeing, but it is a design fault if you continually end up having to choose between breaking and improving things.
When you have major platform updates like the Windows NT rewrite back in 2000, Windows 8, and now Windows 11, they're opportunities to shed legacy things. The choice should have been to keep supporting a long-term-stable version of Windows for security fixes (like XP or Win7) and to get rid of tech designed to support old software.

Their problem now is that they want everyone to be on Win10, then Win11, then whatever they come up with next, or else.
You can carry legacy dependencies with you to new major versions or you can support old versions for security fixes long-term.
There isn't really much of "Linux" here: this code is in libc, so glibc, but that was built for portability and isn't very Linux-specific. Linux doesn't have an all-encompassing community for userspace.
I see. I thought OpenBSD maintained their own downstream fork of glibc or something since the title/link are for their site/lists.
It may not be all-encompassing, but I was referring to GNU/Linux. You can swap out bits and pieces, but what mainstream distros include by default is what I meant.
Ugh, no, it should not. As a user I prefer my existing programs to keep working whenever I update my OS, and as a developer I prefer to work on new code rather than playing nanny with existing, previously working code (working code here means code that did the task it was supposed to do) because some dependency broke itself.
It's already not working that way in Linux; it's just that there's no rhyme or reason to it. Even on stable Debian I have to have pipx/pyenv/venv for Python and juggle multiple versions of Rust and Go because of mismatches between software and distro versions.
So many words in the commit message and the announcement article, yet not a single mention of the rationale? I have a bad feeling about their practice.
[1]: https://github.com/freebsd/freebsd-src/commit/c17bf9a9a5a3b5...
[2]: https://github.com/freebsd/freebsd-src/commit/19e03ca8038019...
[3]: https://github.com/freebsd/freebsd-src/blob/main/include/std...