Why fastDOOM is fast - Readit News

> The MVP patch of v0.1 is without a doubt build 36 (e16bab8). The "Crispy optimization" turns status bar percentage rendering into a noop if they have not changed. This prevents rendering to a scrap buffer and blitting to the screen for a total of 2 fps boost. At first I could not believe it. I assumed my toolchain had a bug. But cherry-picking this patch on PCDOOMv2 confirmed the tremendous speed gain.

Good example of how the bottlenecks are often not where you think they are, and why you have to profile & measure (which I assume Viti95 did in order to find that speedup so early on). The status bar percentage?! Maybe there's something about the Doom arch which makes that relatively obvious to experts, but I certainly would've never guessed that was a bottleneck a priori.
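The pattern described in the quote is a cheap comparison guarding an expensive draw. A minimal Python sketch of that idea follows; the `StatusPercent` class and its `draw_calls` counter are hypothetical illustrations, not fastDOOM's actual code (which is C).

```python
class StatusPercent:
    """Hypothetical status bar widget that redraws only when its value changes."""

    def __init__(self):
        self.last_drawn = None  # value currently on screen; None means nothing drawn yet
        self.draw_calls = 0     # counts the expensive render + blit operations

    def _render_and_blit(self, value):
        # Stand-in for rendering to a scrap buffer and blitting to the screen.
        self.draw_calls += 1
        self.last_drawn = value

    def update(self, value):
        # The whole optimization: skip the expensive path if nothing changed.
        if value == self.last_drawn:
            return False
        self._render_and_blit(value)
        return True


widget = StatusPercent()
for health in [100, 100, 100, 95, 95, 100]:
    widget.update(health)
print(widget.draw_calls)
```

Six updates arrive but only the three that change the displayed value reach the draw path.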

robocat · 6 months ago

Example: "Our app was mysteriously using 60% CPU and 25% GPU. It turned out this was due to a tiny CSS animation [of an equaliser icon]"

https://www.granola.ai/blog/dont-animate-height

rplnt · 6 months ago

I remember Slack eating up my CPU when it had to display several animated emojis at once. Use some 20+ emojis and the Intel MacBook Pro couldn't handle it. Luckily they knew and there was an option to disable the animation. Now I have no idea if they fixed it since or it is one of those things that was "fixed" by M1.

Would like to see a write-up on how it's even possible to achieve that when PCs from 20-30 years ago had no issue with such a task.

grrowl · 6 months ago

Or more recently, NPM's infamous "Progress bar noticeably slows down npm install #11283" issue

[1]: https://github.com/npm/npm/issues/11283

ericmcer · 6 months ago

Why was the solution to optimize the animation instead of... using a static asset?

ortsa · 6 months ago

This reminds me of having to go into spotify's package files to track down their version of this animation (an animated svg of a bar chart) and kill it, because it would destroy performance on my PC so badly that it was affecting other programs, causing hitches and freezes.

The animation's still there — and my PC is better now, so it doesn't stutter — but I'm willing to bet it's still burning waaay too many watts, for something so trivial.

emmanueloga_ · 6 months ago

This is why it pays out to turn on "Paint Flashing" [1] in the web console, even if just from time to time.

1: https://web.dev/articles/simplify-paint-complexity-and-reduc...

Dwedit · 6 months ago

CSS animations make Flash look efficient by comparison.

inDigiNeous · 6 months ago

Reminds me of the performance optimization somebody discovered in Super Mario World for SNES, where displaying the player score in the status bar was very inefficient, taking about 1/6 of the frametime allocated.

"SMW is incredibly inefficient when it displays the player score in the status bar. In the worst case (playing as Luigi, both players with max score), it can take about a full 1/6 of the entire frame to do so, lowering the threshold for slowdown. Normally, the actual amount of processing time is roughly proportional to the sum of the digits in Mario's score when playing as Mario, and to the sum of the digits in both players' scores when playing as Luigi. This patch optimizes the way the score is stored and displayed to make it roughly constant, slightly faster than even in the best case without."

https://www.smwcentral.net/?p=section&a=details&id=35746
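One way a routine ends up costing time proportional to the sum of the digits is converting a binary number to decimal by repeated subtraction of powers of ten. Whether SMW does exactly this is an assumption here; the sketch below only shows why such a loop scales that way. The function name and the six-digit width are made up for illustration.

```python
def digits_by_subtraction(score, num_digits=6):
    """Convert score to decimal digits by repeated subtraction.

    Returns (digits, iterations); iterations equals the sum of the digits.
    """
    digits = []
    iterations = 0
    for place in range(num_digits - 1, -1, -1):
        power = 10 ** place
        digit = 0
        while score >= power:  # one pass per unit of this digit
            score -= power
            digit += 1
            iterations += 1
        digits.append(digit)
    return digits, iterations


print(digits_by_subtraction(100000))  # cheap case: digits sum to 1
print(digits_by_subtraction(999999))  # expensive case: digits sum to 54
```

Storing the score as separate decimal digits in the first place would avoid the conversion entirely, which is one way to get the "roughly constant" cost the patch description mentions.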

barbariangrunge · 6 months ago

As a gamedev, those slowdowns are common. UI rendering, due to transparency, layering and having to redraw things, and especially from triggering allocations, can be a real killer. Comparing old vs new before allowing it to redraw is really helpful. I found layers and transparency were a killer in CSS as well in one project, but that was more about reducing layers there.

lomase · 6 months ago

Drawing to the screen is an I/O operation; it is going to be slow.

RankingMember · 6 months ago

My favorite example of this is the GTA Online insane loading time issue that ended up being due to poor handling of a 10MB json file (and was finally tracked down by someone outside their org). Took a 6 minute load time down to just under 2 minutes:

https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times...
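As far as I understand the linked write-up, part of the cost came from a duplicate check done by linear scan on every insert. The sketch below shows that general pattern and the usual fix; the function names and item names are made up, and this is not the game's code.

```python
def dedupe_linear(items):
    """Quadratic: every insert scans the whole list of items kept so far."""
    kept = []
    comparisons = 0
    for item in items:
        found = False
        for existing in kept:
            comparisons += 1
            if existing == item:
                found = True
                break
        if not found:
            kept.append(item)
    return kept, comparisons


def dedupe_hashed(items):
    """Linear: a set answers 'seen before?' in roughly constant time."""
    seen = set()
    kept = []
    for item in items:
        if item not in seen:
            seen.add(item)
            kept.append(item)
    return kept


items = [f"item_{i}" for i in range(2000)]  # hypothetical unique entries
kept, comparisons = dedupe_linear(items)
print(len(kept), comparisons)
```

With n unique entries the linear version does n(n-1)/2 comparisons, so the cost grows with the square of the file size while the hashed version grows linearly.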

pjs_ · 6 months ago

Reminds me of the incident where npm was 2X slower than it should have been because of the fancy terminal progress bar:

https://news.ycombinator.com/item?id=10974929
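One common mitigation for this kind of problem is to decouple progress redraws from the work loop, so the terminal is written at most once per time interval. A minimal sketch follows; `ThrottledProgress` is a hypothetical class, not npm's actual fix.

```python
import time


class ThrottledProgress:
    """Hypothetical progress reporter that redraws at most once per interval."""

    def __init__(self, total, min_interval=0.1, clock=time.monotonic):
        self.total = total
        self.min_interval = min_interval  # minimum clock units between redraws
        self.clock = clock                # injectable so it can be tested
        self.last_draw = None
        self.redraws = 0

    def update(self, done):
        now = self.clock()
        finished = done >= self.total
        due = self.last_draw is None or now - self.last_draw >= self.min_interval
        if not (due or finished):
            return  # skip the terminal write; the work itself continues
        self.last_draw = now
        self.redraws += 1
        print(f"\r{done}/{self.total}", end="", flush=True)


progress = ThrottledProgress(total=100_000)
for i in range(1, 100_001):
    progress.update(i)
print()
print(progress.redraws)
```

The loop calls `update` 100,000 times, but the number of terminal writes is bounded by elapsed time rather than by the amount of work.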

ognarb · 6 months ago

I had a similar case when I worked on a Matrix client (NeoChat) and I and the other devs were wondering why loading an account was so slow. Removing the loading spinner made it so much faster, because the animation to render the loading spinner used 100% CPU.

jiggawatts · 6 months ago

A common one for server apps is logging, especially to the console.

It’s far more expensive than people assume and in many cases is single-threaded. This can make logging the scalability bottleneck!

qingcharles · 6 months ago

The original Doom must have been heavily profiled by id though, surely? Obviously there is a bunch of things that were missed, but I was in game dev at Doom time and profiling was half the job back then.

pjc50 · 6 months ago

Well yes - it was an incredible feat of performance engineering. It's just that since then someone managed to make an extra three thousand commits worth of micro optimizations.

ramses0 · 6 months ago

Until razer comes in and lands a patch authored by an intern that tanks framerates to 1/3rd doing glowy USB shit every frame...

https://www.reddit.com/r/Doom/comments/8a1m9s/psa_deactivate...

fabiensanglard · 6 months ago

What tools did you have in 1993? From what I understood, id Software used NeXT because the tooling was not up to the task.

rasz · 6 months ago

There is no evidence of any profiling in the source code. It wasn't a thing you did in 1993. The best you could do was compile the whole game with new changes, run a benchmark loop and compare results.

on_the_train · 6 months ago

I'm currently at the second job in my life where updating the progress bar takes up a tremendous percentage of the overall performance. Because our "engineers" have never used a profiler. At a large international tech giant :(

Cthulhu_ · 6 months ago

I do understand it though, at bigger companies you're less likely to want / need to worry about smaller things. Of course, I'm willing to bet money that they implemented a progress bar themselves instead of using an off-the-shelf one.

aeyes · 6 months ago

He ported the optimization from the Crispy Doom fork. Since this is one of the first changes in the repo I bet that this was a known issue at the time.

PlunderBunny · 6 months ago

Back at the turn of the century we found that a performance sensitive part of our WIN32 app was adversely affected by reading a setting from an ini file - in Windows 2000, it was significantly slower than on earlier versions of Windows. The setting was just to determine whether to enable logging for that particular part of the app.

Cthulhu_ · 6 months ago

Ironically, it was rumoured Monster Hunter Wilds gets a performance boost if you fix a typo in an .ini file (https://steamcommunity.com/app/2246340/discussions/0/5962682...?), but that was debunked (the typo is restored when reopening the game, the .exe uses the same typo so it's not that the wrong value is being read).

smat · 6 months ago

While this gives an impressive boost in performance, it also means that frametimes are around 10% longer when the status bar has to be updated.

Overall this can mean that in some situations the game feels not as smooth as before due to these variations.

Essentially when considering real time rendering the slowest path is the most critical to optimize.

flykespice · 6 months ago

Yep, in a real situation the player would be constantly moving around collecting hp/ammo/weapon, losing health/ammo to monsters... all these would cause the status bar to be frequently updated.

I don't think the benchmark accounts for that.

slavboj · 6 months ago

At one point the bottleneck to the Siri iOS client was rendering the animated glowy ball.

> To get the big picture of performance evolution over time, I downloaded all 52 releases of fastDOOM, PCDOOMv2, and the original DOOM.EXE, wrote a go program to generate a RUN.BAT running -timedemo demo1 on all of them, and mounted it all with mTCP's NETDRIVE.

I'm probably not the real target audience here, but that looked interesting; I didn't think there were good storage-over-network options that far back. A little searching turns up https://www.brutman.com/mTCP/mTCP_NetDrive.html - that's really cool:)

> NetDrive is a DOS device driver that allows you to access a remote disk image hosted by another machine as though it was a local device with an assigned drive letter. The remote disk image can be a floppy disk image or a hard drive image.
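The batch-file generation step mentioned in the first quote can be sketched in a few lines. The article's author used Go; this is a Python stand-in, and the function name and the executable paths are hypothetical placeholders rather than the real layout. Only `-timedemo demo1` comes from the quote.

```python
def make_run_bat(executables, demo="demo1"):
    """Build the text of a DOS batch file that runs -timedemo on each executable.

    `executables` is a list of paths as DOS will see them. A real setup may
    also need to change directory so each build finds its data files.
    """
    lines = ["@ECHO OFF"]
    for exe in executables:
        lines.append(f"ECHO Running {exe}")
        lines.append(f"{exe} -timedemo {demo}")
    # DOS batch files conventionally use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"


builds = [r"C:\BUILDS\V010\FDOOM.EXE", r"C:\BUILDS\V011\FDOOM.EXE"]  # hypothetical
bat_text = make_run_bat(builds)
print(bat_text)
```

Writing `bat_text` to a file named `RUN.BAT` on the mounted image (opened with `newline=""` so the CRLF endings are preserved) would complete the step.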

jandrese · 6 months ago

> I didn't think there were good storage-over-network options that far back.

Back in school in the early 90s we had one computer lab where around 25 Mac Plus machines were daisy chained via AppleTalk to a Mac II. All of the Plus machines mounted their filesystem from the Mac II. It was painfully slow, students lost 5-10 minutes at the start of class trying to get the word processor started. Heck, the Xerox Altos also used network mounts for their drives.

If you have networking the first thing someone wants to do is copy files, and the most ergonomic way is to make it look just like a local filesystem.

DOS was a bit behind the curve because there was no networking built-in, so you had to do a lot of the legwork yourself.

ben7799 · 6 months ago

There was already relatively deep penetration of this stuff in the corporate world and universities way back in the early 1990s.

Where I went to school we had AFS. You could sit down at any Unix workstation and log in and it looked like your personal machine. Your entire desktop & file environment was there and the environment automatically pointed all your paths at the correct binaries for that machine. (While we were there I remember using Sun, IBM, and SGI workstations in this environment.)

When Windows came on campus it felt like the stone ages as none of this stuff worked and SMB was horrible in comparison.

These days it feels like distributed file systems are used less and less in lieu of having to upload everything to various web based cloud systems.

In some ways it feels like everything has become less and less friendly with the loss of desktop apps in favor of everything in the browser.

I guess I do use OneDrive, but it doesn't seem particularly good, even compared to 1990s options.

bluGill · 6 months ago

AppleTalk was horribly slow: 230.4 kbit/s. Ethernet was already 10 Mbit/s at the time (but a lot more expensive). General best practice would have been having the word processor installed on each machine and only saving files across the network, which would have made performance acceptable, but at the cost of needing a hard drive in all those Plus machines (I don't recall the price of a hard drive at the time, but I'm guessing closer to $1000 for 20 MB, compared to multi-TB drives for around $100 today).
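A back-of-the-envelope calculation shows how those link speeds translate into waiting time. The 500 KiB application size is an assumption picked for illustration, not a figure from the thread, and the model ignores protocol overhead and disk speed.

```python
def transfer_seconds(size_bytes, link_bits_per_second, clients=1):
    """Ideal transfer time when `clients` machines share one link equally.

    Ignores protocol overhead and disk speed, so real times would be worse.
    """
    return size_bytes * 8 * clients / link_bits_per_second


APP_SIZE = 500 * 1024   # assumed 500 KiB word processor (hypothetical)
APPLETALK = 230_400     # bits per second, as quoted above
ETHERNET = 10_000_000   # bits per second

for name, rate in [("AppleTalk", APPLETALK), ("Ethernet", ETHERNET)]:
    alone = transfer_seconds(APP_SIZE, rate)
    shared = transfer_seconds(APP_SIZE, rate, clients=25)
    print(f"{name}: {alone:.1f} s alone, {shared / 60:.1f} min with 25 clients")
```

Under these assumptions a single machine waits roughly 18 seconds over AppleTalk, and 25 machines loading at once wait a bit over 7 minutes, which is in line with the 5-10 minutes described upthread.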

somat · 6 months ago

There is a neat trick where iPXE can netboot DOS from an iSCSI target: with no drivers or config, DOS gets read-write access to a network share (well, not a share, since sharing it corrupts it fast, but a network block device). It feels magical, but I think iPXE is patching the BIOS to make disk access go over iSCSI.

leshokunin · 6 months ago

I’m curious: were there NASes or WebDAV mounts in the DOS era? Obviously there was FTP and telnet and such. Just curious if remote mounts were a thing, or if the low bandwidth made it impossible.

sedatk · 6 months ago

Yes, there was Novell Netware that let you mount remote drives, and there were even file locking APIs in DOS to organize simultaneous access to files. In fact, DOOM's multiplayer code relied on part of Novell Netware stack (IPXODI and LSL). The remote mounts were mainly used on LANs though, not over Internet.

bombcar · 6 months ago

Yes, it's basically what Netware was, and Novell was a HUGE company.

SMB (samba) is also from the DOS era. Most people only know of it from Windows, though.

There were various other ways to make network "drives" as the DOS drive interface was very simplistic and easy to grab onto.

It was rare to find this stuff until Win95 made network connections "free" (before then, you had to buy the networking hardware and often the software, separately!).

diet_mtn_dew · 6 months ago

A network redirector interface (for 'redirecting' local resource access over a network) was added at least by DOS 3.1 in 1985, possibly earlier in 3.0 (1984)

[1] https://www.os2museum.com/wp/redirectors-and-dos-3-0/

bitwize · 6 months ago

WebDAV didn't come out until the back half of the 90s, and it was slow to be adopted at first.

Back in the day, you could author a web page directly in GruntPage, and publish it straight to your web server provided said server had the FPSE (FrontPage Server Extensions), a proprietary Microsoft add-on, installed. WebDAV was like the open-standards response to that. Eventually in later versions of FrontPage the FPSE was deprecated and support for WebDAV was provided.

pjc50 · 6 months ago

WebDAV itself dates from 1999, well into the Windows 95 era. The Novell system pre dates it by a lot.

ndegruchy · 6 months ago

The linked GitHub thread with Ken Silverman is gold. Watching the FastDOOM author and Ken work through the finer points of arcane 486 register and clock cycle efficiencies is amazing.

Glad to see someone making sure that Doom still gets performance improvements :D

kridsdale1 · 6 months ago

I haven’t thought of KenS in ages but back in the 90s I was super active in the Duke3D modding scene. Scripting it was literally my first “coding”.

So in a way, I owe my whole career and fortune to KenS. Cool.

vunderba · 6 months ago

I feel like Duke 3D was probably the first mainstream accessible moddable FPS. Doom of course had plenty of level editors, but Duke Nukem brought the ability to alter and script AI as editable plaintext CON files, and of course any skills you learned on the BUILD engine were transferrable to any number of other games (Shadow Warrior, Blood, etc.)

Also shout out to anyone who remembers "wackplayer" - Duke's equivalent of the BEEP keyword.

nurettin · 6 months ago

His blog was the first page I "surfed". Talking about duke3d map editor and his big project using voxels.

badsectoracula · 6 months ago

AFAIK the CON scripting language (used in the *.CON files in DN3D) wasn't made by Ken Silverman but by the Duke Nukem 3D team at 3D Realms. I think it was Todd Replogle who wrote the CON stuff.

ehaliewicz2 · 6 months ago

Last year I emailed Ken Silverman about an obscure aspect of the Build Engine while working on a similar 2.5D rendering engine. He answered the question like he worked on it yesterday.

phire · 6 months ago

There are some real gems in there.

I especially liked the idea of CR2 and CR3 as scratchpad registers when memory access is really slow (386SX and cacheless 386DXs). And the trick of using ESP as a loop counter without disabling interrupts (by making sure it always points to a valid stack location) is just genius.

Yes! I know nothing about low level programming, but the idea of using a register that you don't need for a fast 'memory' location is particularly clever.

unleaded · 6 months ago

One feature of FastDOOM I haven't seen mentioned here is all the weird video modes. Some interesting examples:

- IBM MDA text mode: https://www.youtube.com/watch?v=Op2tr2lGK6Y

- EGA & Plantronics ColorPlus: https://www.youtube.com/watch?v=gxx6lJvrITk

- Classic blue & pink CGA: https://youtu.be/rD0UteHi2qM

- CGA, 320x200x16 with 'ANSI from Hell' hack: https://www.youtube.com/watch?v=ut0V1nGcTf8

- Hercules: https://www.youtube.com/watch?v=EEumutuyBBo

Most of these run worse than with VGA, presumably because of all the color remapping etc

toast0 · 6 months ago

> - EGA & Plantronics ColorPlus: https://www.youtube.com/watch?v=gxx6lJvrITk

Any love for Tandy Graphics Adapter? I'd hate to have to run in CGA :( would need a 286 build for my Tandy 1000 TL/2, if it was still alive.

That's awesome, just a great demonstration why these aspects of the game should be separated. It reminds me of the "modern" Clean Architecture for back-end applications.

tecleandor · 6 months ago

The IBM MDA text mode is terrible... Love it!

jakedata · 6 months ago

"IBM PS/1 486-DX2 66Mhz, "Mini-Tower", model 2168. It was the computer I always wanted as a teenager but could never afford"

Wow - by 1992 I was on my fourth homebuilt PC. The KCS computer shows in Marlborough MA were an amazing resource for tinkerers. Buy parts, build PC and use for a while, sell PC, buy more parts - repeat.

By the end of 1992 I was running a 486-DX3 100 with a ULSI 487 math coprocessor.

For a short period of time I arguably had the fastest PC - and maybe computer on campus. It outran several models of Pentium and didn't make math mistakes.

I justified the last build because I was simulating a gas/diesel thermal-electric co-generation plant in a 21 page Excel spreadsheet for my honors thesis. The recalculation times were killing me.

Degree was in environmental science. Career is all computers.

wk_end · 6 months ago

"Wow"? Is it really necessary to give this guy a hard time for being unable to afford the kind of computers you had in 1992?

Anyway, there's no such thing as a "DX3". And the first 100MHz 486 (the DX4) came out in March of 1994, so I don't see how you were running one at the end of 1992.

My family's first computer - not counting a hand-me-down XT that was impossibly out-of-date when we got it in 1992 or so - was a 66MHz 486-DX2, purchased in early 1995.

I can't quite explain why, but as a matter of pride it's still upsetting - decades later - to see someone weirdly bragging about an impossible computer that supposedly outran mine despite a three year handicap.

thereticent · 6 months ago

Is that really what "wow" means here? I took it more as "wow, I've been around forever / I must be old now" or something similarly tame.

bpoyner · 6 months ago

That definitely brought back memories. Around '92, being a poor college student I took out a loan from my credit union for about $2,000 to buy a 486 DX2-50. For you younger people, that's about $4,000+ in today's money for a pretty basic computer. I dual booted DOS and Linux on that bad boy.

antod · 6 months ago

A 486DX and a 487? I thought the 487 was only useful for the SX chips?

...looked it up, apparently the standard 487 was a full 486DX that disabled and replaced the original 486SX. Was this some sort of other unusually awesome coprocessor I hadn't heard of?

486sx33 · 6 months ago

Doubled throughput of certain calculations in certain tasks if motherboard supported it

Possibly something software like maple could take advantage of

cantrecallmypwd · 6 months ago

Doesn't make any sense, perhaps it's AI-generated nonsense. There was a DX4 100 but no such thing as a "DX3". The 486 included an FPU so there'd be no reason to have a "487" which was a complete replacement for the 486SX chips. There were Pentium Overdrives but those were CPU replacements on the 486DX.

The 486sx had a 16 bit external bus interface so it could work with 386 chipsets. The DX processors had a full 32 bit bus and correspondingly better throughput. The 486 never included an integrated FPU, you had to add a separate co-processor for that. I could go on about clock multipliers and base frequencies but I'll spare you.

ForOldHack · 6 months ago

"It outran several models of Pentium and didn't make math mistakes." Total bragging rights. Total. You owned them. Good job.

mmphosis · 6 months ago

On top of releasing often, Viti95 displayed outstanding git discipline where one commit does one thing and each release was tagged.

https://fabiensanglard.net/fastdoom/#:~:text=one%20commit%20...

kingds · 6 months ago

> I was resigned to playing under Ibuprofen until I heard of fastDOOM

i don't get the ibuprofen reference ?

kencausey · 6 months ago

Guess: headache from low frame rate?

Indeed.

If the author reads this: John Carmack's last name was mistyped as "Carnmack" throughout the document.

Thank you for taking the time to report it. It has now been fixed.

mkl · 6 months ago

Another typo s/game/gave/: "Another reason John game me".

CamperBob2 · 6 months ago

Speaking of Carmack, can you (or someone) elaborate on this quote?

>DOOM cycles between three display pages. If only two were used, it would have to sync to the VBL to avoid possible display flicker.

How does triple buffering eliminate VBL waits, exactly? There was no VBL interrupt on a standard VGA, was there?