Hopefully this isn't too hard of a rant, but file descriptors are... neither files nor descriptors. They literally don't "describe" anything about anything, and whatever they refer to doesn't need to be a file either. They're literally the exact opposite - 100% opaque integers, as opaque as anything in a program could possibly be, that could refer to pretty much any kernel object.
Why they were ever called file descriptors in the first place (esp. given the one thing they lack is any description), and why they couldn't just start being called "handles" (like on Windows) or "object IDs" or something else that at least makes some modicum of sense, is beyond me.
> file descriptors are... neither files nor descriptors
This is a pretty modern and userspace-centric view of file descriptors IMHO. The "description" part is in the file description table on the kernel side of the userspace/kernel boundary (see struct file[1]) for what are extremely obvious reasons.
Up to V7 UNIX (1973-1979[2,3]), an entry in the file description table could literally only reference an inode (a file on disk, a device special file, or a pipe); UNIX domain/TCP/UDP sockets weren't introduced until 4.2BSD.
Your gentle rant makes sense from a modern point of view of course, but we need to keep in mind that the UNIX design is over 50 years old now. You could argue that the 4.2BSD people should have used different tables/names rather than overload the file description table but that ship has sailed and here we are.
> Why they were ever called file descriptors in the first place
Human laziness, no doubt. What you refer to as "file descriptor" is really a pointer to a file descriptor. It is likely that over time "file descriptor pointer" or "file descriptor handle" was shortened, and then ultimately accepted into the lexicon.
My understanding is that the descriptor is the pointer, the description is the pointee (i.e. the entry in the table). The distinction is sometimes quite relevant, for things like dup(2).
edit: there must actually be a double indirection: the fd indexes a per-process table of pointers to file descriptions, since the table is process-private while a description can be shared (and multiple fds can refer to the same description).
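To make the dup(2) point concrete, here is a rough C sketch (not taken from the article). It assumes a POSIX system; the function name `dup_shares_offset` and the scratch path are made up for illustration. A dup()'d descriptor is a second number for the same description, so it sees offset changes made through the original, while a separate open() of the same file gets its own description and its own offset.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns 1 if a dup()'d descriptor shares its offset with the original
 * while an independent open() of the same file does not, 0 if that is
 * not what we observe, and -1 if the setup itself fails. */
int dup_shares_offset(void)
{
    char path[] = "/tmp/fd_demo_XXXXXX";  /* hypothetical scratch file */
    int fd = mkstemp(path);               /* description A */
    if (fd < 0)
        return -1;

    int fd_dup = dup(fd);                 /* new number, still description A */
    int fd_open = open(path, O_RDONLY);   /* new number, new description B */
    int result = -1;

    if (fd_dup >= 0 && fd_open >= 0 && write(fd, "hello world", 11) == 11) {
        assert(fd_dup != fd && fd_open != fd);  /* three distinct numbers */

        /* Moving the offset through fd is visible through fd_dup only. */
        off_t moved = lseek(fd, 5, SEEK_SET);
        off_t via_dup = lseek(fd_dup, 0, SEEK_CUR);
        off_t via_open = lseek(fd_open, 0, SEEK_CUR);
        result = (moved == 5 && via_dup == 5 && via_open == 0);
    }

    if (fd_dup >= 0)
        close(fd_dup);
    if (fd_open >= 0)
        close(fd_open);
    close(fd);
    unlink(path);
    return result;
}
```

Calling `dup_shares_offset()` from a small `main` should return 1 on a typical POSIX system, which is the descriptor/description split in action.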
They describe the state of a stream: flags and a current offset, at a minimum. On Linux you can examine /proc/<pid>/fdinfo/<fd> to see precisely what it 'describes'.
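For anyone who wants to poke at fdinfo from code rather than the shell, here's a small Linux-only sketch. It assumes /proc is mounted; `fdinfo_pos` and `fdinfo_demo` are hypothetical helper names, and the only field it parses is the `pos:` line.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Prints /proc/self/fdinfo/<fd> and returns the "pos:" value from it,
 * or -1 if the file cannot be read (e.g. not on Linux). */
long long fdinfo_pos(int fd)
{
    assert(fd >= 0);
    char path[64];
    snprintf(path, sizeof path, "/proc/self/fdinfo/%d", fd);

    FILE *info = fopen(path, "r");
    if (info == NULL)
        return -1;

    char line[256];
    long long pos = -1;
    while (fgets(line, sizeof line, info) != NULL) {
        fputs(line, stdout);              /* show every field as-is */
        long long value;
        if (sscanf(line, "pos: %lld", &value) == 1)
            pos = value;
    }
    fclose(info);
    return pos;
}

/* Writes 7 bytes to a scratch file and asks fdinfo where the offset is. */
long long fdinfo_demo(void)
{
    char path[] = "/tmp/fdinfo_demo_XXXXXX";  /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    long long pos = -1;
    if (write(fd, "1234567", 7) == 7)
        pos = fdinfo_pos(fd);

    close(fd);
    unlink(path);
    return pos;
}
```

The offset reported there belongs to the open file description, which is why a dup()'d fd would show the same `pos`.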
The file descriptor doesn't describe anything. It's like your social security number, it's merely a unique identifier for something else. Does your SSN describe you as a human? What do I know about you as a human after I hear your SSN? Absolutely nothing.
This whole article is terribly confusing. Take this paragraph for example:
> Now, your process might depend on other system resources like input and output; as this event is also a process, it also has a file descriptor, which will be attached to your process in the file descriptor table.
What event? Are input and output an event? Why is this event its own process? Input and output are not a process are they?
Also, does a process have its own file descriptor table? That was never mentioned before and this reads like it is already known.
This sort of stuff goes on in my head throughout the entire article...
It's also still unclear to me what happens if multiple processes try to access the same file. Do file descriptors help to lock files during writing?
The writing style just sucks and it reads like a LinkedIn post with every sentence in its own paragraph. It tries to be approachable, but it uses blurry undefined terms and overly-simplistic analogies.
Starting with 101, 102 in the first example, for some reason.
> When a process or I/O device makes a successful request, the kernel returns a file descriptor to that process
I/O device?
> By default, three types of standard POSIX file descriptors exist in the file descriptor table
Types?
> Apart from them, every other process has its own set of file descriptors, but few of them (except for some daemons) also utilize the above-mentioned file descriptors to handle input, output, and errors for the process.
What?
It gives the impression of a poor translation of a pretty low-effort article, tbh. You’re better off just reading the corresponding APUE section, which you must have read anyway.
* All of the following exist and are distinct:
  * File descriptor (for as long as it's open)
  * File descriptor number (can be replaced by close+open or dup\*; there are also special values like `AT_FDCWD` and `OPEN_MAX`, which is not necessarily equal to `FD_SETSIZE` or what `ulimit` limits you to)
  * Open file description (unchanged across dup and fork ...)
  * Files (e.g. separate open() calls that happen to refer to the same file, whether by the same name or not)
  * File names
* file descriptors to special files, to things that are not files, to files outside of the current chroot, to files from a directory that has been mounted over, to files that have been moved or deleted
* what exactly does mmap hold on to?
* recvmsg cmsg can make file descriptors appear. Fortunately this is the most common use of cmsg, but remember you can get more than you request (but IIRC there's no clean way to get the number given, the API is underspecified)
* There's really nothing special about file descriptors 0, 1, and 2; they're just a strong convention that processes tend to have open at fork time. In practice, you can live without stdin and stdout, but stderr can be written to by all sorts of library functions.
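A quick sketch of why the numbers themselves carry no meaning: POSIX open() returns the lowest-numbered descriptor that is currently free, so close+open can put a completely different object behind the same number. That is also all there is to 0, 1 and 2: they are simply the lowest numbers. The function name `reuses_lowest_fd` is made up, and the sketch assumes a single-threaded process with /dev/null available.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if open() hands back the number we just closed, 0 if not,
 * and -1 on setup failure. Assumes no other thread opens files meanwhile. */
int reuses_lowest_fd(void)
{
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    if (a < 0 || b < 0) {
        if (a >= 0)
            close(a);
        if (b >= 0)
            close(b);
        return -1;
    }
    assert(a < b);                 /* each open() took the lowest free slot */

    close(a);                      /* slot a is free again */
    int c = open("/dev/null", O_RDONLY);
    int result = (c == a);         /* same number, brand-new description */

    if (c >= 0)
        close(c);
    close(b);
    return result;
}
```

The same mechanism is how the classic close(0)-then-open() trick redirects stdin without any special-casing in the kernel.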
> The lsof command is used to list the information about running processes in the system and can also be used to list the file descriptor under a specific PID.
> For that, use the “-d” flag to specify a range of file descriptors, with the “-p” option specifying the PID. To combine this selection, use the “-a” flag.
> $ lsof -a -d 0-2147483647 -p 11472
That works. A nicer way to do it:
htop > F3 (search by name to find the process) > Enter (to select) > L (open the list of file descriptors) > F4 (filter by resource name)
It lists all open files and other resources of the process.
Get the pid of the process you want to inspect, and while it's running, execute `ls -lh /proc/<pid>/fd/`. It will list the open file descriptors for that process.
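The same listing can be done from inside a program. A minimal Linux-only sketch, assuming /proc is mounted; `list_own_fds` and `fd_count_grows_on_open` are hypothetical names, and it inspects the calling process via /proc/self/fd rather than an arbitrary pid:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Prints "fd -> target" for every entry in /proc/self/fd and returns
 * the number of entries, or -1 if the directory cannot be opened.
 * The count includes the descriptor opendir() itself uses. */
int list_own_fds(void)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.')
            continue;                         /* skip "." and ".." */

        char link[512], target[4096];
        snprintf(link, sizeof link, "/proc/self/fd/%s", entry->d_name);
        ssize_t len = readlink(link, target, sizeof target - 1);
        if (len < 0)
            continue;                         /* entry vanished; ignore it */
        target[len] = '\0';

        printf("%s -> %s\n", entry->d_name, target);
        count++;
    }
    closedir(dir);
    return count;
}

/* Returns 1 if opening one more file adds exactly one entry. */
int fd_count_grows_on_open(void)
{
    int before = list_own_fds();
    if (before < 0)
        return -1;
    int fd = open("/dev/null", O_RDONLY);
    assert(fd >= 0);
    int after = list_own_fds();
    close(fd);
    return after == before + 1;
}
```

Sockets, pipes and anonymous inodes show up with targets like `socket:[...]` or `pipe:[...]`, which is a decent reminder that not everything behind an fd is a file.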
You're welcome! I found that after KDE replaced ksysguard with system monitor. The former could list all open files of the process, but the latter can't. So I was digging for some good tool and found this obscure htop feature :)
> For example, if you open a “example_file1.txt” file (which is nothing but a process)
I’m very confused by this use of “process”. This isn’t one? There’s a note further down talking about how closing it will make it available to other processes that doesn’t make sense either.
I think of file descriptors as a void* across address spaces
In C you often use void* for opaque handles.
But you can't have a user space pointer into the kernel, since it's in a different address space. So you instead have an integer that's unique within a particular process, and then a per-process table in the kernel that points to the real data structures.
DOS and Windows call them file handles, but they serve the same function. DOS even has the same 0, 1, 2 for in, out, err. Of course they all inherited this design from Unix.
> Of course they all inherited this design from Unix.
As far as Microsoft is concerned, Windows and MSDOS inherited this design from XENIX, which was Microsoft's clone of V7 UNIX. That XENIX was later sold to the Santa Cruz Operation, then later again SCO XENIX was renamed to SCO UNIX.
It's fascinating to wonder how different the computing world would be today if Microsoft had used their XENIX as the underlying base for Windows, instead of the way they did do it.
Linux would probably have never got off the ground, and remained a curiosity like the MINIX it was based upon initially. And Microsoft would completely own the UNIX world. Thank God they didn't.
no, not a clone, xenix was licensed unix v7, licensed from AT&T. They did not have a license to use the trademark unix, so they called their version xenix.
unix was already widely available when Linus started tinkering; he wanted to play with source, and the BSDs were still tangled up in copyright. Had closed source Windows been based on unix, linux's open source hegemony would have toppled Windows too
[1]https://chenshuo.com/notes/kernel/file-descriptor-table/
[2]https://www.tuhs.org/cgi-bin/utree.pl?file=V4/nsys/file.h
[3]https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/include/sy...
In what sense is (say) a signalfd a "file"? How about a pidfd?
* seriously, use `O_CLOEXEC` by default people!
* representing FD ownership can be tricky
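On the `O_CLOEXEC` point, a short sketch of what the flag changes. It assumes a POSIX.1-2008 system; `opens_cloexec` is a made-up helper that opens a path with the given flags and reports whether the new descriptor carries `FD_CLOEXEC`, i.e. whether it would be closed automatically across exec instead of leaking into the child program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Opens path with the given flags and reports whether the resulting
 * descriptor has FD_CLOEXEC set: 1 yes, 0 no, -1 on error. */
int opens_cloexec(const char *path, int flags)
{
    assert(path != NULL);

    int fd = open(path, flags);
    if (fd < 0)
        return -1;

    int fd_flags = fcntl(fd, F_GETFD);   /* per-descriptor flags */
    close(fd);
    if (fd_flags < 0)
        return -1;

    return (fd_flags & FD_CLOEXEC) != 0;
}
```

With `O_RDONLY | O_CLOEXEC` this should report 1, and with plain `O_RDONLY` it should report 0. Setting the flag atomically at open() time avoids the window that a later `fcntl(fd, F_SETFD, FD_CLOEXEC)` leaves open if another thread forks and execs in between.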
It's documented in man htop btw.