One of my best hacks back in the day was an 8080 emulator, for the 8088/8086 back in the time when what you were really trying to do was run the 8080 code as fast as possible (because the first PCs were slow, and emulating 8080 code was much slower than running 8080 code on a native machine).
Invariably at the time the 8080 emulators would have a dispatch loop, fetching the next opcode and dispatching through a jump table to some code to emulate each of the 256 opcodes. One problem with this was that you couldn't dispatch this way without altering the CPU flags, so you'd also have to save and restore those with LAHF/SAHF (fast) or PUSHF/POPF (slow). In general you'd need about 10 8086 instructions to emulate 1 8080 instruction, and in the early 1980s this meant your emulated CP/M program would run much slower than on your old CP/M computer.
My emulator would emulate 1 8080 opcode with 4 8086 opcodes as follows; An 8080 instruction, say 0x94 = sub h would be emulated with code loaded at address 0x9494. That code would be;
sub bh,dl ;bh = bh-dl, emulate h with bh, a with dl
lodsb ;al = *si++, get next opcode, increment emulated pc
mov ah,al ;eg 0x94 -> 0x9494
jmp ax ;jump to next instruction
This is a classic example of trading space for time, your emulator is sparsely distributed through 64K of RAM. So you needed 128K to emulate your 64K 8080.
Shower thought the next day, OMG I got the subtraction round the wrong way! Should be (of course);
sub dl,bh ;dl = dl-bh, emulate A with dl, H with bh
Not all opcodes are one to one like this one. For example 0x3e = MVI A,imm (move an immediate value to the A register) would be;
lodsb ;al = *si++, get the immediate value
mov dl,al ;dl = al, A = imm
This time I've omitted the 3 instruction postscript (lodsb; mov ah,al; jmp ax) to all opcodes.
While I'm here adding extra information, note another strength of my approach; The si, al and ah registers are the only resources consumed for emulation machinery. The di, bx, cx, dl and f registers are available to emulate the DE, HL, BC, A and F registers without any memory accesse (the dh register was wasted sadly). Remember the host CPU was the 16 bit 8088/8086 machine which also had 'segment' registers to allow access to 20 bits of addressable memory rather than 16 (so 1M not 64K). The segment registers had a role in emulation, basically setting up a 64K 'code segment' to run the emulator and another 64K shared 'stack and 'data' segment to emulate 64K of memory for the emulated 8080. Amusingly, getting this emulator going today would involve running another emulator to emulate this first iteration of the now venerable x86 architecture.
I was wondering about the registers, why not use the "standard" mapping for BC/DE/HL => CX/DX/BX, and DI for A?
One disadvantage of this would be you have to "xchg ax,di" before and after every accumulator op, but that is a single byte opcode (and on the 8088, instruction fetch is the main bottleneck). Just from intuition, it seems like for common code sequences like "MOV D,H ! MOV E,L", having the byte registers directly addressable would be a win. But maybe you profiled how common the various instructions are in real code?
>Amusingly, getting this emulator going today would involve running another emulator to emulate this first iteration of the now venerable x86 architecture.
Not necessarily. Modern x86 CPUs running in 64-bit mode still support 16-bit code/data segments (the only thing not supported is 64 bit OS + V86 mode, and you don't need that for 8080 emulation). At least on Linux, this is made available to userspace via the modify_ldt syscall.
Of course, when you have 16-bit code and a CP/M emulator, might as well also make it run CP/M-86 programs (most of which do not use the kind of segment arithmetic that would require V86 mode). Then the 8080 CPU emulator would be just another code segment, and probably the easier part to write, compared to the code necessary to emulate the CP/M filesystem on Linux.
Yes, pretty close so full translation is easy if you have the source code. If not, emulation is the normal way. My emulator was faster than native 8080 code on everything but vanilla 8088 PCs, so it's more than sufficient.
I address translation in depth in my Retro Sargon project, an exercise in getting a vintage Z80 (8080 superset) assembly language program working on modern PCs. https://github.com/billforsternz/retro-sargon See the "Yet More Details" section in the README.
I think the 8086 was designed such that 8080 assembly code could be assembled as 8086 machine code. So I'd guess the instruction emulation would probably be reasonably efficient.
They seem to be re-organizing the website structure [1] as well as updating the submitting system [2], and lately they were making some progress [3], but still need a lot of formatting work.
Seems like they are stretched a bit thin, and no doubt this is a hobby project.
This is a great steelman for where dogfooding does not apply - imagine if their website was actually an obfuscated C webserver with the content imbedded in it, and therefore impossible to update lol.
I've been waiting for the competition to open up again, I think I have an entry good enough. There does appear to be some activity on their GitHub repo, but I have no idea when the next one will be. Does anyone here have any gossip about when the next one might be? Wild rumor would also suffice.
The biggest factor I think is that the judging process cannot be easily parallelized after the initial filtering and traditionally needed multiple in-person judging sessions---there are several photos in the official Twitter account. (2020 judges didn't meet in person for the obvious reason, but still used conference calls.)
Related, 8086tiny, a 2013 IOCCC winner. About twice as much source code, and if not cheating then certainly leaning heavily on the rules, but very impressive as it's a full x86 emulator with graphics and several peripherals.
`a[b]` is semantically equivalent to `*(a + b)`, so it is unexpectedly commutative. Let's swap them all and add some spacing. Ah, also some assignment expressions have been reused as a part of other expressions, which I will also pull out:
X = *Y & 7,
L = X & 1,
o = X / 2 & 1,
O = l[32] = 0,
c = y,
t = c & 7,
a = c / 8 & 7,
T = Y[1] >> 6,
g = ~-T ? y : (n) y,
d = BX = y,
--l[64]
Now it should be much more clear that a large chunk of this code is instruction decoding. At this point we need to expand macros to make more sense out of it:
#define Z short
#define a(_)*(_*)&
#define y a(Z)Y[++O]
#define n char
It would need more reading to determine the exact meaning of `Y`, but given that its type is `unsigned char*`, it should be some sort of pointer, most likely the instruction pointer. The macro `y`, which expands to `*(short*)&Y[++O]`, increases `O` so it looks like reading from IP+0, IP+1, ... while not actually advancing the IP.
So the mysterious code can be now mostly decoded:
X = *Y & 7, // [IP+0] = ?????XXX
L = X & 1, // = ???????L
o = X / 2 & 1, // = ??????o?
O = l[32] = 0, // Reset `O` for further uses of `y`
c = y, // [IP+1] = yyyyyyyy
t = c & 7, // = ?????ttt
a = c / 8 & 7, // = ??aaa???
T = Y[1] >> 6, // = TT??????
g = ~-T ? y : (char) y, // Read [IP+2] and do something
d = BX = Y[++O], // [IP+3] = dddddddd
--l[64]
[IP+1] surely looks like the ModR/M byte, so you can guess that `g` is a displacement even if you don't know what the heck is `~-T` (equals to `T - 1` in twos' complement). [IP+0] is too generic to pick the exact instruction, but my guess is that this code is shared for a lot of similar opcodes to save the code footprint.
The description implies they found some clever way to trick iocccsize at the time. Using the current version of iocccsize, it weighs in at 3842, well over the max of 2053.
The BIOS file doesn't just contain what you'd expect from a normal PC BIOS, but also contains several tables used by the emulator for decoding of x86 instructions. Normally those tables would be in code as they're necessary regardless of BIOS implementation; they're presumably in the BIOS to save space.
Is there a competition where you win by showing something both compact and extraordinary but not obfuscated ? Like zlib compression in 100 lines or some song recognition using FFT in X lines, or a working ssh client in Y lines ?
Some of the winners of the last few years have been extraordinary - complete arcade games, with physics, in just 10 lines of BASIC. This competition definitely highlights the issue we have with software development, which is that we, as a species, don't generally get really good at it for decades...
Wow, that sent me down a rabbit hole -- but this line from the homepage makes it sound like he is _not_ part of that family?
> If you're looking for information about the Mexican scientist Oscar Toledo Esteva, then please go to the History page of the Familia Toledo organization site.
The simplest emulator in the world would be like this one:
unsigned char memory[65536];
int reg[8];
int main(void)
{
int pc = 0;
/* In this point you read something into memory */
while (1) {
instruction = memory[pc];
if (instruction == 0x76) break; /* HLT */
pc = (pc + 1) & 0xffff;
if ((instruction & 0xc0) == 0x40) /* Handle MOV r,r (opcodes 0x40-0x7f) */
reg[(instruction >> 3) & 7] = reg[instruction & 7];
}
return 0;
}
What it does? It simply reads each instruction from memory (PC is the Program Counter), and increments PC continuously limiting it to 16-bit.
It does a comparison for each possible instruction, for this example I only handled the most basic one of 8080 (MOV r,r) and HLT, but this is already 64 cases of the 256 possible!
Then I added iteratively chunks of opcodes to emulate and used language C macros to reduce the code size. For example, there is #define d(e) to join two 8-bit registers into a 16-bit value, and #define Q to read the accumulator.
In the end, to stay inside the size limit of the IOCCC, I removed the tree of 'if' statements and replaced it with a tree of trinary operators ?: (a way of doing 'if' inside an expression) and this is what the macros C and S does.
The 8080 instruction set opcodes have logical bit patterns, so you can identify the operation and operands through bit masking instead of requiring a huge LUT. Beyond that you can simplify a lot by sacrificing timing accuracy.
you can also write one using a 256-byte switch, which ain't so big. i wrote an 8080 emulator in fortran77 on a dec10 that way, back in the 1980s. but of course size wasn't an issue. it did prove to be quite portable - to vax and ibm vm/cms. we used it for teaching assembler programming.
After simplifying and reducing the code as far as possible, it is rewritten in a compressed format using macros and without whitespace. Look at the #define statements.
Not sure how to use brew without touching the llvm or Xcode etc., I switch to lima (search it under this news site for more details). It works (after doing sudo ... build essentials etc.).
For the BASIC, not sure how to quit the environment after learning it in 1979! Seems Ctrl-Z give you a seg-fault and it quit ... :-) But not sure where is the core-dump generated ;-p
For the CPM, must do the chmod +w A B as said (BUT NOT chmod +w A then chmod +w B ?!!). Still cannot exit the CPM until I dir and find the program HALT.COM (wow .COM) and A>HALT ... all good. :-)
I have seen a program, I think for IOCCC, where the code was aligned in 4 character columns that seemed to naturally fit the code, so it would have code arranged like:
for( long i=0; i<n; i++)
{int a=1, b=2, c=3, d=4;
for the whole program.
I lost it though and have been completely unable to find it, despite searching very hard.
Has anyone seen something like this before? I would be forever grateful for any pointer towards it.
Invariably at the time the 8080 emulators would have a dispatch loop, fetching the next opcode and dispatching through a jump table to some code to emulate each of the 256 opcodes. One problem with this was that you couldn't dispatch this way without altering the CPU flags, so you'd also have to save and restore those with LAHF/SAHF (fast) or PUSHF/POPF (slow). In general you'd need about 10 8086 instructions to emulate 1 8080 instruction, and in the early 1980s this meant your emulated CP/M program would run much slower than on your old CP/M computer.
My emulator would emulate 1 8080 opcode with 4 8086 opcodes as follows; An 8080 instruction, say 0x94 = sub h would be emulated with code loaded at address 0x9494. That code would be;
This is a classic example of trading space for time, your emulator is sparsely distributed through 64K of RAM. So you needed 128K to emulate your 64K 8080.While I'm here adding extra information, note another strength of my approach; The si, al and ah registers are the only resources consumed for emulation machinery. The di, bx, cx, dl and f registers are available to emulate the DE, HL, BC, A and F registers without any memory accesse (the dh register was wasted sadly). Remember the host CPU was the 16 bit 8088/8086 machine which also had 'segment' registers to allow access to 20 bits of addressable memory rather than 16 (so 1M not 64K). The segment registers had a role in emulation, basically setting up a 64K 'code segment' to run the emulator and another 64K shared 'stack and 'data' segment to emulate 64K of memory for the emulated 8080. Amusingly, getting this emulator going today would involve running another emulator to emulate this first iteration of the now venerable x86 architecture.
One disadvantage of this would be you have to "xchg ax,di" before and after every accumulator op, but that is a single byte opcode (and on the 8088, instruction fetch is the main bottleneck). Just from intuition, it seems like for common code sequences like "MOV D,H ! MOV E,L", having the byte registers directly addressable would be a win. But maybe you profiled how common the various instructions are in real code?
>Amusingly, getting this emulator going today would involve running another emulator to emulate this first iteration of the now venerable x86 architecture.
Not necessarily. Modern x86 CPUs running in 64-bit mode still support 16-bit code/data segments (the only thing not supported is 64 bit OS + V86 mode, and you don't need that for 8080 emulation). At least on Linux, this is made available to userspace via the modify_ldt syscall.
Of course, when you have 16-bit code and a CP/M emulator, might as well also make it run CP/M-86 programs (most of which do not use the kind of segment arithmetic that would require V86 mode). Then the 8080 CPU emulator would be just another code segment, and probably the easier part to write, compared to the code necessary to emulate the CP/M filesystem on Linux.
I address translation in depth in my Retro Sargon project, an exercise in getting a vintage Z80 (8080 superset) assembly language program working on modern PCs. https://github.com/billforsternz/retro-sargon See the "Yet More Details" section in the README.
Looking forward to seeing more editions in the future, and submitting another entry myself...
[1] https://www.ioccc.org/years.html#2020
[1] https://github.com/ioccc-src/temp-test-ioccc [2] https://github.com/ioccc-src/mkiocccentry [3] https://fosstodon.org/@ioccc/112023138011006567
There are currently only two active maintainers. I guess any help is welcome.
This is a great steelman for where dogfooding does not apply - imagine if their website was actually an obfuscated C webserver with the content imbedded in it, and therefore impossible to update lol.
Original entry: https://www.ioccc.org/years.html#2013_cable3
"Deobfuscated": https://github.com/adriancable/8086tiny
(Not mine, just a fan)
--64[T=1[O=32[L=(X=*Y&7)&1,o=X/2&1,l]=0,t=(c=y)&7,a=c/8&7,Y]>>6,g=~-T?y:(n)y,d=BX=y,l]
The difference between bf and c sometimes is small
So the mysterious code can be now mostly decoded:
[IP+1] surely looks like the ModR/M byte, so you can guess that `g` is a displacement even if you don't know what the heck is `~-T` (equals to `T - 1` in twos' complement). [IP+0] is too generic to pick the exact instruction, but my guess is that this code is shared for a lot of similar opcodes to save the code footprint.How so?
https://www.ioccc.org/2013/cable3/hint.html
I think once you try to get something small enough the code will naturally tend towards obfuscation.
https://gkanold.wixsite.com/homeputerium/copy-of-rules
Some of the winners of the last few years have been extraordinary - complete arcade games, with physics, in just 10 lines of BASIC. This competition definitely highlights the issue we have with software development, which is that we, as a species, don't generally get really good at it for decades...
Fantastic
https://news.ycombinator.com/item?id=28337064
> If you're looking for information about the Mexican scientist Oscar Toledo Esteva, then please go to the History page of the Familia Toledo organization site.
https://nanochess.org/index.html
Most of the instructions are changing registers in various ways, which is pretty simple. The Appendix A has all of them: https://altairclone.com/downloads/manuals/8080%20Programmers...
[1] https://www.ioccc.org/2019/lynn/hint.html
unsigned char memory[65536];
int reg[8];
int main(void)
{
}What it does? It simply reads each instruction from memory (PC is the Program Counter), and increments PC continuously limiting it to 16-bit.
It does a comparison for each possible instruction, for this example I only handled the most basic one of 8080 (MOV r,r) and HLT, but this is already 64 cases of the 256 possible!
Then I added iteratively chunks of opcodes to emulate and used language C macros to reduce the code size. For example, there is #define d(e) to join two 8-bit registers into a 16-bit value, and #define Q to read the accumulator.
In the end, to stay inside the size limit of the IOCCC, I removed the tree of 'if' statements and replaced it with a tree of trinary operators ?: (a way of doing 'if' inside an expression) and this is what the macros C and S does.
Not sure how to use brew without touching the llvm or Xcode etc., I switch to lima (search it under this news site for more details). It works (after doing sudo ... build essentials etc.).
For the BASIC, not sure how to quit the environment after learning it in 1979! Seems Ctrl-Z give you a seg-fault and it quit ... :-) But not sure where is the core-dump generated ;-p
For the CPM, must do the chmod +w A B as said (BUT NOT chmod +w A then chmod +w B ?!!). Still cannot exit the CPM until I dir and find the program HALT.COM (wow .COM) and A>HALT ... all good. :-)
I have seen a program, I think for IOCCC, where the code was aligned in 4 character columns that seemed to naturally fit the code, so it would have code arranged like:
for the whole program.I lost it though and have been completely unable to find it, despite searching very hard.
Has anyone seen something like this before? I would be forever grateful for any pointer towards it.
https://www.ioccc.org/years.html#1995_heathbar
https://www.ioccc.org/years.html#2014_endoh1
https://www.ioccc.org/years.html#2006_sloane
May be related, I have an index that lists winning entries with some form of ASCII art, one of those could be what you were looking for.
https://docs.google.com/spreadsheets/d/e/2PACX-1vRg3T5z-QN66...