I'm getting pretty tired of this false advertisement. This is not a C compiler. It lacks many crucial features required to compile most larger C programs. I do have to say it's impressive what they have squeezed into 512 bytes.
This is technically true! What is posted here is not a C compiler. It is an implementation of a subset of C.
I'd prefer that the title say so honestly, something like: "A compiler for a subset of C that fits in the 512-byte boot sector."
It is still a remarkable feat. But honestly, when I read the original title I was in complete disbelief that someone could implement a whole C compiler in 512 bytes.
But with the new context that it is a subset of C (not the whole C), the initial great surprise is gone. It is still very impressive though.
Or even an interpreter. It compiles and executes on the fly, in RAM, function by function. It doesn't compile the whole input; it compiles a bit, immediately executes that bit, then moves on to the next bit, and it doesn't save the compilation result anywhere. To me, that's an interpreter.
So it's a C subset interpreter.
And a very cool thing. This is not a denigration or critique at all, just terminology.
I think it's perfectly fine for a bootstrapper to be a drastic subset. They are all already drastically limited in countless other ways anyway, like not knowing how to use any of the crazy hardware, networking, etc. A Forth bootloader is a full Turing-complete language that can eventually do anything, but initially it can do almost nothing besides use BIOS-provided features and start interpreting code, which then provides more functionality.
To be fair, the first sentence states that it supports a subset of C.
>SectorC is a C compiler written in x86-16 assembly that fits within the 512 byte boot sector of an x86 machine. It supports a subset of C that is large enough to write real and interesting programs.
The post title could include this, but perhaps it's a little verbose.
In any case, agreed it's impressive to fit it in 512 bytes!
I'm sorry to nitpick, but it's the second sentence that mentions it only supports a subset, not the first. And the first sentence calls it a "C compiler" without qualification.
Yes, it's a great topic but since it had significant attention in the last year, it counts as a dupe for now. This is in the FAQ: https://news.ycombinator.com/newsfaq.html.
When we talk about these sector$LANG implementations, I am guessing we are talking about the boot sector that BIOS recognizes, right?
Does the 512 byte limit for a boot sector exist in UEFI too? I don't know much about UEFI so if someone could educate me about how the boot sector and its size limit differs in UEFI, I'd love to know.
No, UEFI loads PE executables from a special partition called the EFI System Partition or ESP. There's no real size restriction there as far as I know.
Before the ESP is accessed, there is no standardized way to customize the boot process. You could put these kinds of sectorX toys into the firmware directly, which would come with more constraints, but it would be vendor-specific.
There is a platform-independent VM running a special EFI byte code that is part of the EFI specification, which allows you to extend the UEFI system with things like additional drivers, but those are also loaded from the ESP.
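To make "PE executable" a bit more concrete, here is a minimal Python sketch of how the format can be recognized from its two signatures: the `MZ` DOS header at offset 0 and the `PE\0\0` marker found at the offset stored at 0x3C. The function names `looks_like_pe` and `make_fake_pe` are made up for illustration, and this only checks signatures; it says nothing about whether a given firmware would actually load the image.

```python
import struct


def looks_like_pe(image: bytes) -> bool:
    """Return True if `image` has an MZ header that points at a PE signature."""
    if len(image) < 0x40 or image[:2] != b"MZ":
        return False
    # e_lfanew: 32-bit little-endian offset of the PE header, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    return image[pe_offset:pe_offset + 4] == b"PE\0\0"


def make_fake_pe() -> bytes:
    """Build a byte string with only the two signatures filled in (not loadable)."""
    image = bytearray(0x44)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x40)  # PE header starts right after
    image[0x40:0x44] = b"PE\0\0"
    return bytes(image)
```

Calling `looks_like_pe` on the contents of a real `.efi` file read in binary mode should return `True`, while a raw 512-byte boot sector would not pass the `MZ` check.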
Thanks for the answer! I've got some more questions now. Sorry, but if anyone is willing to take a stab at these questions, it'd be helpful to me.
1. IIUC PE executables are Windows executables. So a Linux system that targets UEFI ends up writing a PE executable to the EFI System Partition?
2. I know that some UEFIs (or is it all?) support the BIOS boot sector as a backward-compatibility feature. How does that work? If I write a "hello world" program in pure machine code in the first sector of the boot disk, would UEFI read and execute it? How would it even know whether what's in the first sector is valid code or garbage? By checking the magic 0x55 0xaa at the end of the boot sector?
A classical PC master boot record does not actually have 512 bytes for code: it also contains the partition table and a signature, so you have 446 bytes for code. I'm not sure what exactly the BIOS validates; you might be able to get away with an invalid partition table. In general there is not really any limit unless you want to be compatible with something existing; you can define whatever disk layout you like. At worst you will have to load additional sectors yourself, because the BIOS has no clue where you put them. I no longer remember what a floppy boot sector looks like or how much room you have there.
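The arithmetic behind the 446 figure is 446 bytes of code + 4 partition entries of 16 bytes each + the 2-byte 0x55 0xaa signature = 512. Here is a rough Python sketch that splits a sector along those lines. The helper names `split_mbr` and `has_boot_signature` are invented for this example, and checking only the last two bytes is an assumption about what a typical BIOS looks at, not a statement about any particular firmware.

```python
SECTOR_SIZE = 512
CODE_SIZE = 446              # bootstrap code area
PARTITION_ENTRY_SIZE = 16
PARTITION_ENTRIES = 4        # 4 * 16 = 64 bytes of partition table
SIGNATURE = b"\x55\xaa"      # stored at offsets 510 and 511


def has_boot_signature(sector: bytes) -> bool:
    """Check only the trailing magic bytes of a 512-byte sector."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == SIGNATURE


def split_mbr(sector: bytes):
    """Split a classical MBR into (code, partition entries, signature)."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected exactly 512 bytes")
    code = sector[:CODE_SIZE]
    table_end = CODE_SIZE + PARTITION_ENTRY_SIZE * PARTITION_ENTRIES
    table = sector[CODE_SIZE:table_end]
    entries = [
        table[i * PARTITION_ENTRY_SIZE:(i + 1) * PARTITION_ENTRY_SIZE]
        for i in range(PARTITION_ENTRIES)
    ]
    return code, entries, sector[table_end:]
```

A sector that ignores the partition table layout, as these 512-byte toys do, can use all 510 bytes before the signature for code, at the cost of not being a valid MBR for anything that reads the table.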
It's been a long time since I've done ASM but do I understand it right that this implementation compiles each function and then executes it immediately? Or does it really compile the whole source code and then execute the binary generated?
And where is the compiled binary saved? Is it kept temporarily in memory itself for immediate execution? Or is the compiled binary saved back to the disk?
If someone could point me to the right sections of the code that answer these questions, it'd be of great help! Thanks!
Looks like a recursive-descent parser that emits instructions in memory as it parses. Then it executes them immediately (sectorc.s):
  ;; done compiling, execute the binary
  execute:
    push es           ; push the codegen segment
    push word [bx]    ; push the offset to "_start()"
    push 0x4000       ; load new segment for variable data
    pop ds
    retf              ; jump into it via "retf"
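For anyone who wants to see the general shape of "emit instructions in memory while parsing, then run them" without reading x86, here is a toy Python sketch. It is not how SectorC works internally: it handles only integer arithmetic, and it emits made-up stack-machine tuples (`push`, `add`, `sub`, `mul`) instead of machine code. All names in it are hypothetical.

```python
import re


def tokenize(src):
    """Split the source into integer literals and single-character operators."""
    return re.findall(r"\d+|[-+*()]", src)


class Compiler:
    """Recursive-descent parser that appends instructions as it parses."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.code = []  # emitted instructions, kept in memory only

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):  # expr := term (('+' | '-') term)*
        self.term()
        while self.peek() in ("+", "-"):
            op = self.next()
            self.term()
            self.code.append(("add",) if op == "+" else ("sub",))

    def term(self):  # term := atom ('*' atom)*
        self.atom()
        while self.peek() == "*":
            self.next()
            self.atom()
            self.code.append(("mul",))

    def atom(self):  # atom := NUMBER | '(' expr ')'
        tok = self.next()
        if tok == "(":
            self.expr()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
        elif tok is not None and tok.isdigit():
            self.code.append(("push", int(tok)))
        else:
            raise SyntaxError(f"unexpected token {tok!r}")


def execute(code):
    """Run the emitted instructions on a small stack machine."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append({"add": a + b, "sub": a - b, "mul": a * b}[instr[0]])
    return stack.pop()


def compile_and_run(src):
    """Compile into an in-memory list, then execute it straight away."""
    compiler = Compiler(tokenize(src))
    compiler.expr()
    if compiler.peek() is not None:
        raise SyntaxError("trailing input")
    return execute(compiler.code)
```

For example, `compile_and_run("1 + 2 * 3")` emits `push 1, push 2, push 3, mul, add` and evaluates to 7. Nothing is written to disk at any point, which is the property the compiler-versus-interpreter discussion above is about.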
tccboot https://bellard.org/tcc/tccboot.html clocks in at a ginormous 138kB by comparison.
'Can we boot to linux from source in 512b' is the wrong question to ask ;)
https://github.com/Mati365/ts-c-compiler