In my opinion, the fact that procfs is the only API for so many things is one of the biggest problems with Linux. BSDs have sysctl(), macOS has mach_* functions and, of course, Windows has a real API too.
Plain-text interfaces lead to complicated, potentially insecure code (especially in C!); they're prone to race conditions, and they're slow.
I wish it were possible to retrieve that information using real syscalls. I think it's a better approach than, for example, inventing a faster way to read procfs: https://lwn.net/Articles/813827/
Even if they insist on a file based interface (it is a UNIX, so fair enough), in modern times it would be nice if they used a "real" data format. Yeah, it's not like JSON parsers have never had bugs, but on average they'll be MUCH better than everyone and their mother hand rolling a C based bespoke parser. Obviously you'd need a new name to not break backwards compatibility.
> Yeah, it's not like JSON parsers have never had bugs, but on average they'll be MUCH better than everyone and their mother hand rolling a C based bespoke parser.
Currently, yes, but if this idea had started when Linux was becoming popular, the "real" data format would have been XML. It might have seemed nice at the time, but today we would probably laugh at it and say how outdated and silly it looks.
Indeed. Parsing free-form files is less robust than calling an API, or at least parsing files with a schema (e.g. JSON or XML). For example, uptime(1) on Linux:
Many of the APIs on Windows are pretty trivial to interact with using PowerShell commandlets. Similarly, many SaaS based tools have CLIs to interact with their arbitrarily complex APIs.
You can still have easy abstractions while providing a way around them (acquiring structured data) for the times they don't work well.
macOS's KERN_PROCARGS2 sysctl is an exception to this: it is very unintuitive to parse, and every single piece of code I've found on the internet that tries to parse its results has been wrong, including code from Apple, Google, and Microsoft. I wound up making a library to do it (https://getargv.narzt.cam/) because apparently people need help.
Linux does have libproc, which is meant (IIUC) to mirror the BSD-style libproc. It wouldn't surprise me if it's just parsing the same files under the hood, however, and correspondingly has the same bugs. But then again, bugs in one place is potentially a better state of affairs than bugs in many places (?).
Having C functions isn't all that much better. You have replaced a crufty text format with crufty data structures full of padding, unions, bitfields, VLAs, unaligned nested structs, and other crazy stuff. Look at ioctls or cmsg. With C structs + 3rd-party kernel drivers you can even get UB, because the driver returns data that is invalid under the struct definition (e.g. incorrect alignment, invalid bools).
eBPF recently added the ability to walk internal data structures via iterators [0], so instead of parsing text we can run a program that traverses all the task_structs and pushes exactly the information we want to userspace, in whatever form the developer wants.
So, alongside other tradeoffs, it's more flexible than syscalls.
I wish /proc and /sys would just agree on a serialization format and serialize the data into some defined shape, instead of having a bunch of files that each need their own parser.
While procfs has a lot of historical baggage, sysfs is rather strict about its layout: only a single value per file, as plain ASCII, rather than anything complex that has to be parsed. Structure is expressed via the filesystem hierarchy.
In return, the kernel-side API for sysfs is also a lot cleaner and allows a driver to more or less expose individual variables as tuning knobs.
Of course there are edge cases, and there are e.g. some binary interfaces as well (e.g. for providing direct register access, or implementing a firmware upload interface for a device).
ABI compat issues aside, I think that implementing "a standardized [structured] record format" as suggested in the comments here is a rather bad idea, going into exactly the wrong direction by adding complexity rather than reducing it, which would definitely cause even more parsing related issues in the long run.
>While procfs has a lot of historical baggage, sysfs is rather specific about the layout and providing only a single value per file, as plain ASCII, rather than using anything complex that has to be parsed. Structure is implemented via the filesystem.
I'd rather have a structured file than have to open 30k files (for, say, conntrack).
Hell, just take the example from the article: /proc/<PID>/stat has 52 fields. That would be 52 opens and reads with a single value per file.
> ABI compat issues aside, I think that implementing "a standardized [structured] record format" as suggested in the comments here is a rather bad idea, going into exactly the wrong direction by adding complexity rather than reducing it, which would definitely cause even more parsing related issues in the long run.
It's literally the opposite. You have to implement it once on the kernel side and once in userspace, versus every bespoke format that currently needs its own parser on both sides.
I've been working on a library[1] that aims to have fairly complete support for the procfs filesystem, so that you can hide away these annoying parsing quirks. But for some casual usage of /proc/ where you only need one tiny bit of information, it's often better to just roll your own parser instead of bringing in a 3rd party library. It's these small one-off cases that would really benefit from a standardized serialization format like you propose.
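To make the "small one-off case" concrete: /proc/uptime is just two space-separated floats (seconds since boot, and cumulative idle time across all CPUs), so a hand-rolled parse is genuinely defensible there. A sketch:

```python
def parse_uptime(text: str) -> tuple[float, float]:
    # /proc/uptime contains two space-separated floats:
    # seconds since boot, and cumulative idle seconds across all CPUs.
    up, idle = text.split()
    return float(up), float(idle)

# Typical usage: parse_uptime(open("/proc/uptime").read())
```

It's the files like /proc/<pid>/stat, with dozens of positionally-defined fields and embedded free text, where the one-off approach goes wrong.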
It would be great if the kernel itself provided a header-only definition of such a format, so you could focus on the data and not the parsing. It could also be integrated into the kernel's extensive testing infrastructure.
FWIW: sysfs tried to do this already. In general each node corresponds to one "thing", with a reasonably standard set of stringification schemes, and with a path that acts as a self-describing schema. Obviously, in practice every driver or subsystem ends up doing funny nonsense (e.g. uevent nodes have their own sub-schema with shell-style variables, etc.).
You can't really prevent that. People do funny nonsense in other self-describing data formats like JSON and XML all the time too. There's only so much you can do with a framework.
But /proc is... extremely old, and very heavily used by userspace. In practice it's never going to change.
> You can't really prevent that. People do funny nonsense in other self-describing data formats like JSON and XML all the time too. There's only so much you can do with a framework.
Sure, but you will get more of that if the convention is too simplistic. "One file per value" breaks really fast; just cat /proc/net/nf_conntrack or even /proc/<pid>/stat and see how many values a single entry (file/connection) has.
It doesn't need to be some ASN.1 monstrosity; it could be simple conventions like "this is how a key/value /proc/sys file should look, this is how a tabular file should look," etc.
Make all escaping use the same syntax, make every table separator be \t, and so on.
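A sketch of what such a convention might cost to implement, once, in userspace. The format here is hypothetical (not anything the kernel actually ships): tab-separated records, one per line, with `\t`, `\n`, and `\\` as the only escapes inside a field:

```python
# Hypothetical standardized tabular format: one record per line, fields
# separated by a literal tab, and only three escapes inside a field.
_ESCAPES = {"t": "\t", "n": "\n", "\\": "\\"}

def unescape(field: str) -> str:
    out, i = [], 0
    while i < len(field):
        if field[i] == "\\" and i + 1 < len(field):
            out.append(_ESCAPES.get(field[i + 1], field[i + 1]))
            i += 2
        else:
            out.append(field[i])
            i += 1
    return "".join(out)

def parse_table(text: str) -> list[list[str]]:
    # One generic parser covers every file that follows the convention.
    return [[unescape(f) for f in line.split("\t")]
            for line in text.splitlines() if line]
```

One escaping rule plus one separator rule is enough to make a single parser work for every conforming file.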
> But /proc is... extremely old, and very heavily used by userspace. In practice it's never going to change.
While we're wishing in one hand: how is it that our programs still take as input an array of strings that get escaped, unescaped, and split randomly by our shell scripts?
> This allows any sudoers user to obtain full root privileges
The way most sudoers files are set up, if you're in the wheel or sudo group, you're only a "sudo -i" from a root command prompt, so I'm not sure I see why this is a vulnerability. Can anyone elaborate?
The /proc/<pid>/* hierarchy has always been a bit of a mess to parse.
/proc/<pid>/maps is similarly frustrating: there's no clear distinction between "special" maps (like the stack) and a file that might just happen to be named `[stack]`. Likewise, the handling for a mapped region whose backing file was deleted is simply to append " (deleted)"[1].
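To show the ambiguity concretely, here are two hypothetical (made-up) maps lines: the first is the real stack pseudo-map, the second is an ordinary mapped file whose name literally is `[stack]`. A parser that only looks at the pathname column cannot tell them apart:

```python
# Hypothetical /proc/<pid>/maps lines: address perms offset dev inode pathname.
real_stack = "7ffd7000-7ffe8000 rw-p 00000000 00:00 0          [stack]"
fake_stack = "7f120000-7f121000 r--p 00000000 08:01 123456     [stack]"

def naive_is_stack(maps_line: str) -> bool:
    # The check most parsers use: look only at the pathname column.
    return maps_line.split()[-1] == "[stack]"

def looks_special(maps_line: str) -> bool:
    # A partial disambiguation heuristic: anonymous pseudo-maps like the
    # stack have device 00:00 and inode 0, while file-backed maps don't.
    fields = maps_line.split()
    return fields[3] == "00:00" and fields[4] == "0"
```

The device/inode heuristic helps, but it's still inference from side channels rather than an unambiguous field, which is exactly the complaint.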
The system-level fix is to create a structured record format. That could mean quoting all the fields, or maybe Linux should finally adopt a standardized format like JSON.
Fortunately `jc`[0] does parse `/proc/<pid>/stat` correctly. I, of course, originally implemented it the naive/incorrect way until a contributor fixed it. :)
$ cat /proc/2001/stat | jc --proc
{"pid":2001,"comm":"my program with\nsp","state":"S","ppid":1888,"pgrp":2001,"session":1888,"tty_nr":34816,"tpg_id":2001,"flags":4202496,"minflt":428,"cminflt":0,"majflt":0,"cmajflt":0,"utime":0,"stime":0,"cutime":0,"cstime":0,"priority":20,"nice":0,"num_threads":1,"itrealvalue":0,"starttime":75513,"vsize":115900416,"rss":297,"rsslim":18446744073709551615,"startcode":4194304,"endcode":5100612,"startstack":140737020052256,"kstkeep":140737020050904,"kstkeip":140096699233308,"signal":0,"blocked":65536,"sigignore":4,"sigcatch":65538,"wchan":18446744072034584486,"nswap":0,"cnswap":0,"exit_signal":17,"processor":0,"rt_priority":0,"policy":0,"delayacct_blkio_ticks":0,"guest_time":0,"cguest_time":0,"start_data":7200240,"end_data":7236240,"start_brk":35389440,"arg_start":140737020057179,"arg_end":140737020057223,"env_start":140737020057223,"env_end":140737020059606,"exit_code":0,"state_pretty":"Sleeping in an interruptible wait"}
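For reference, the trap jc's contributor fixed: the comm field can contain spaces, newlines, and even parentheses, so splitting the whole line on whitespace is wrong. The usual correct approach is to split on the *last* `)` in the line. A minimal Python sketch with a made-up adversarial line (only the first few fields shown):

```python
def parse_stat(line: str) -> dict:
    # pid is everything before the first '('; comm is everything between
    # the first '(' and the LAST ')', because comm itself may contain
    # spaces, parentheses, and newlines. Fields after that are safe to
    # split on whitespace.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen].strip())
    comm = line[lparen + 1:rparen]
    rest = line[rparen + 1:].split()
    return {"pid": pid, "comm": comm, "state": rest[0], "ppid": int(rest[1])}

# A process named 'evil) S 1 (' defeats any parser that splits naively:
sample = "2001 (evil) S 1 () S 1888 2001 1888"
```

Splitting on the first `)` instead of the last one is the naive/incorrect version mentioned above.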
[0] https://kellyjonbrazil.github.io/jc/docs/parsers/proc
% strace uptime 2> /tmp/strace && grep proc /tmp/strace
17:35:24 up 3 days, 7:47, 1 user, load average: 2.29, 1.85, 1.56
openat(AT_FDCWD, "/usr/lib/libprocps.so.8", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/proc/self/auxv", O_RDONLY) = 3
openat(AT_FDCWD, "/proc/sys/kernel/osrelease", O_RDONLY) = 3
openat(AT_FDCWD, "/proc/self/auxv", O_RDONLY) = 3
openat(AT_FDCWD, "/proc/uptime", O_RDONLY) = 3
openat(AT_FDCWD, "/proc/loadavg", O_RDONLY) = 4
It sounds fishy, but just because sysctl is a mess doesn't necessarily imply that structured kernel interfaces are a bad idea.
[0] https://developers.facebook.com/blog/post/2022/03/31/bpf-ite...
[1] https://github.com/eminence/procfs
eh, just mount it in /proc2
All sensible ones allow you to just pass an array of parameters to command execution and not worry about spaces in them
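Right; e.g. in Python, passing argv as a list means no shell is ever involved, so nothing gets re-split or needs escaping. A sketch (the filename is an arbitrary example):

```python
import subprocess
import sys

# Run a child process with argv passed as a list: no shell is involved,
# so an argument containing spaces (or quotes, or newlines) arrives in the
# child's sys.argv completely unchanged.
arg = 'a file with "spaces".txt'
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", arg],
    capture_output=True, text=True, check=True,
)
```

The failure mode being complained about only appears when the argv array is flattened into a single string and handed to a shell to re-split.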
> https://www.openwall.com/lists/oss-security/2017/05/30/16
https://www.openwall.com/lists/oss-security/2022/12/22/5
[1]: https://github.com/woodruffw/procmaps.rs/blob/79bd474104e9b3...
Time to let go of "everything is a stream of unorganized characters".
/s