From a cursory look at the code, I don't see any use of fused multiply-add (FMA), which would likely help with precision in a number of places in the float version. The "problem in more detail"[1] readme specifically calls out the computation of `x^2 - y^2` as a source of error, and there are methods that dramatically reduce that error using FMAs[2].
[1] https://github.com/ProfJski/FloatCompMandelbrot/blob/master/...
[2] https://pharr.org/matt/blog/2019/11/03/difference-of-floats
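For reference, the trick in [2] is Kahan's "difference of products". A minimal sketch (the helper name is mine, not from the article's code):

    #include <cmath>

    // a*b - c*d, with the rounding error of c*d recovered via FMA
    double diff_of_products(double a, double b, double c, double d) {
        double cd  = c * d;
        double err = std::fma(-c, d, cd);   // err = cd - c*d, exact
        double dop = std::fma(a, b, -cd);   // a*b - cd with a single rounding
        return dop + err;                   // compensated result
    }

For the Mandelbrot step, the real part x^2 - y^2 would then be diff_of_products(x, x, y, y), accurate to a couple of ulps even when the squares nearly cancel.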
The author needs to learn about techniques for rendering deep zooms into the Mandelbrot set. Since 2013 it has been possible to render images that are 2^-hundreds across using mostly double precision arithmetic, apart from a few anchor points calculated with multiprecision (hundreds of bits) arithmetic.
The deep zoom mathematics includes techniques for introspecting the iterations to detect glitches, which need extra multiprecision anchor points to be calculated.
https://mathr.co.uk/blog/2021-05-14_deep_zoom_theory_and_pra...
https://dirkwhoffmann.github.io/DeepDrill/docs/Theory/Mandel...
I've always wondered how those "Ten minute Mandelbrot zoom" videos worked, because there's no way double would last that long at a tolerable zoom rate.
The perturbation technique is interesting. Calculating just a few points with super high precision and then filling the pixels in between by adding an offset and continuing with lower precision halfway through the calculation seems plausible at a glance, but I'll have to read that more carefully later.
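If it helps anyone else: with a high-precision reference orbit Z_n and z_n = Z_n + delta_n, c = C + dc, the recurrence for the offset is delta_{n+1} = 2*Z_n*delta_n + delta_n^2 + dc, and the per-pixel work stays in plain doubles. A rough sketch of the idea (my own names, glitch detection omitted):

    #include <complex>
    #include <vector>

    // ref_orbit: Z_0, Z_1, ... computed once in multiprecision, stored as doubles.
    // dc: this pixel's offset from the reference point's c.
    int iterate_perturbed(const std::vector<std::complex<double>>& ref_orbit,
                          std::complex<double> dc, int max_iter) {
        std::complex<double> delta = 0.0;
        for (int n = 0; n + 1 < (int)ref_orbit.size() && n < max_iter; ++n) {
            delta = 2.0 * ref_orbit[n] * delta + delta * delta + dc;
            std::complex<double> z = ref_orbit[n + 1] + delta;  // this pixel's z_{n+1}
            if (std::norm(z) > 4.0) return n + 1;               // escaped
        }
        return max_iter;
    }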
I wish someone would come up with a better explicit FMA syntax than std::fma(-c, d, cd) ... maybe something along the lines of ((-c * d + cd)) with special brackets, or (-c) ⊠ d + cd with a special multiplication symbol.
And if only we could effectively use FMAs from javascript...
I wouldn't really say that's a universally held notion, especially in the modern world. Historically, yes, but these days it stands more on its own. In similar vein, you could say biology is a subfield of physics, but most people don't think of them that way.
To a librarian, Computer Science and Library Science are just different aspects of information systems, which are (a librarian need hardly tell HN) humanity's highest calling, thus justifying their shared Dewey classification of 0.
It's not really, philosophically, though. As a pure mathematician, I would not consider CS a subfield of math. The spirit of math is really studying abstract axiomatic systems, and while CS in theory is sort of like that, in actual practice CS is too tied to, and too focused on, the actual instantiation of computers to really have the spirit of math.
I think that statement held true before widespread availability of computing; when most computer science was really theoretical. I once skimmed a few chapters of a graduate-level textbook on Category Theory and realized that it was the foundation of object-oriented programming.
The biggest issue is that a lot of "Computer Science" is really applied software engineering; much like confusing physics and mechanical engineering.
Or, a different way to say it: Most students studying "Computer Science" really should be studying "Software Engineering."
More practically, I have a degree in Computer Science. When I was in school, it was clear that most of my professors couldn't program their way out of a paper bag, nor did they understand how to structure a large software program. It was the blind leading the blind.
Of course these terms aren’t well-defined, but some (me) would say both math and CS are subfields of “formal systems” (but “formal systems” could be named CS, which would put it at the top of the hierarchy)
…and I thought math branched off from computing science sometime in the 19th century. Before that, what was called "math" was mostly algorithms to compute stuff.
This post prompted me to look into the theoretical question of the computability of the Mandelbrot set. And I found this excellent answer: https://math.stackexchange.com/q/4611850
Another way to explore floating point representation is to compare rounding modes, as these can have a big impact on what is generated for floating point operations. Most systems use round-to-nearest-even, but round toward zero, round toward positive infinity, and round toward negative infinity also exist.
Bonus points if you also check results for x87 floats.
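A quick way to play with this in C/C++ is <cfenv> (a sketch, not from the article; note that many compilers need -frounding-math or #pragma STDC FENV_ACCESS ON before they actually respect the runtime mode):

    #include <cfenv>
    #include <cstdio>

    static int mandel_iters(double cx, double cy, int max_iter) {
        double x = 0.0, y = 0.0;
        for (int i = 0; i < max_iter; ++i) {
            double nx = x * x - y * y + cx;   // one Mandelbrot step
            y = 2.0 * x * y + cy;
            x = nx;
            if (x * x + y * y > 4.0) return i;
        }
        return max_iter;
    }

    int main() {
        const int   modes[] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD };
        const char* names[] = { "nearest", "toward zero", "upward", "downward" };
        for (int m = 0; m < 4; ++m) {
            std::fesetround(modes[m]);
            std::printf("%-12s %d iterations\n", names[m],
                        mandel_iters(-0.7453, 0.1127, 100000));
        }
        std::fesetround(FE_TONEAREST);
    }

For points near the boundary the iteration counts can differ between modes, which makes the sensitivity easy to see.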
For Mandelbrot, presumably you can do exact rational arithmetic (all it needs is multiplications and additions, and you presumably start with a floating-point coordinate, which is eminently rational). It will be very slow with many iterations, of course, since the fractions will rapidly spin out of control.
Edit: I see the supplementary material addresses this: “One could attempt to avoid the problem by using arbitrary-precision floating point representation, or rational representation (using two integers, for numerator and denominator, i.e., keep everything as fractions), but that quickly leads to so many significant digits to multiply in every iteration that calculations become enormously slow. For many points in a Mandelbrot Set Image, the number of significant digits in a trajectory tends to double with each iteration, leading to O(n^2) time complexity for evaluating one point with n iterations. Iterations must then be kept very low (e.g., 20-30 on a modern PC!), which means that many higher-iteration points cannot be evaluated. So accumulated round-off error is hard to avoid.” 20-30 sounds crazy low even for O(n²), though.
Edit 2: This part seems wrong: “Cycle detection would work if floating point quantities could be compared exactly, but they cannot.” They certainly can. If you detect a cycle, you will never escape _given your working precision_. Floats can cycle just as well as fixed-point numbers can. But I suppose true cycles (those that would be cycles even in rationals) are extremely unlikely.
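Right - and with exact comparison, Brent-style periodicity checking works: if z ever exactly repeats a previously saved value at the working precision, the orbit is trapped and the point can be classified as interior without burning the whole iteration budget. A sketch of the idea (not the article's code):

    #include <complex>

    int mandel_with_cycle_check(std::complex<double> c, int max_iter) {
        std::complex<double> z = 0.0, saved = 0.0;
        int window = 8, since_saved = 0;
        for (int i = 0; i < max_iter; ++i) {
            z = z * z + c;
            if (std::norm(z) > 4.0) return i;   // escaped
            if (z == saved) return max_iter;    // exact repeat at this precision: interior
            if (++since_saved == window) {      // move the snapshot, doubling the window
                saved = z; since_saved = 0; window *= 2;
            }
        }
        return max_iter;
    }

Whether that is a "true" cycle of the exact dynamics is a separate question, as you say - but for "this point will never escape at this precision" the exact float comparison is enough.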
Hm, what about intervals of rationals, and then approximating the intervals with a slightly wider interval in order to make the denominators smaller? I guess the intervals for the real and imaginary components might grow too quickly?
I mean, 30 iterations means something like 2^30 significant digits. Any operation on numbers that size is already expensive in pure memory throughput. For the squaring, let's use an FFT, so one multiplication is O(n log n) in the number of digits: roughly 30 * 2^30 ops, call it 2^35. That's over 30 billion ops for a single point, and that's only counting the last iteration. For a modest 1280*768 image we're at roughly 3*10^16 ops.
Even pretending all memory access is perfectly sequential (it's not) and we sustain a very healthy 100 GFLOPS, that's several days of compute for one image.
Each additional iteration more than doubles the time spent, so a handful more iterations and we're into weeks. Nobody waits that long for anything. (Unless it's a web page downloading JS.)
Now add the fact that I lied: memory access can't be fully sequential here, and a 2^30-digit number doesn't even fit in cache, so we're constantly spilling to L3 and beyond, with a hefty latency penalty every time. Multiply the time by 10 or so (and I'm still being hopelessly optimistic about efficiencies).
And so, yes, it's really expensive even for modern PCs.
If you have a GPU, you can shuffle it off to the GPU, they're pretty good at FFTs - but even that's only going to buy you, if we're really really optimistic, a handful more iterations before it bogs down. Doubling digits every iteration will make any hardware melt quickly.
The second[1] C program I read in my life was Fractint[2]. It is called that way because it calculated fractals with integer math[3].
At that time integer math was not only faster than floating point, but the only thing available to most of us. Processors like the 80386 did not come with any floating point facilities whatsoever. The FPU was a separate unit (aptly called a coprocessor at the time) which was expensive and hard to obtain where I lived.
[1] The first one was DKBTrace, which lives on as POVRay. Thank you David K. Buck, wherever you are now. I learned a lot from you.
[2] https://fractint.org/
[3] Technically we should call it fixed point I guess, but in the end it was all done with the integer instructions of the CPU. So integer math is not completely wrong.
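For the curious, the fixed-point inner loop looks something like this (my own illustration of the technique, not Fractint's actual code - here 16.16 format in 32-bit ints, with 64-bit intermediates for the products):

    #include <cstdint>

    static const int FRAC = 16;                           // 16.16 fixed point
    static int32_t fmul(int32_t a, int32_t b) {           // fixed-point multiply
        return (int32_t)(((int64_t)a * b) >> FRAC);
    }

    int mandel_fixed(int32_t cx, int32_t cy, int max_iter) {  // cx, cy already in 16.16
        int32_t x = 0, y = 0;
        const int32_t four = 4 << FRAC;
        for (int i = 0; i < max_iter; ++i) {
            int32_t x2 = fmul(x, x), y2 = fmul(y, y);
            if (x2 + y2 > four) return i;                 // escaped
            int32_t xy = fmul(x, y);
            x = x2 - y2 + cx;
            y = 2 * xy + cy;
        }
        return max_iter;
    }

Everything is integer multiplies, shifts and adds, which is why it flew on FPU-less machines - at the cost of precision running out after fairly shallow zooms.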
I still have the entire manual that I printed and bound circa 1996. I saw a program on PBS about fractals and we had just got dial up internet. I found fractint and cut my programming teeth with that and a copy of Borland that I convinced my parents to buy for me. Running on a 75MHz Compaq. Being a teenager at that time was just magical.
Edit: Just noticed you mentioned Povray too. Are you me? Wow, lol.
"Processors like the 80386 did not come with any floating point facilities whatsoever."
The 80286 needed an 80287 - I bought one to run AutoCAD (1MB RAM!)
I can't quite remember the exact order of events but either or both 80386 and 80486 had a co-pro initially. Then Intel re-invented the IBM magic screwdriver. They dropped the 80 prefix and added SX or DX and an i prefix for a laugh. SX had no co-pro or it was disabled and DX was the full fat model.
They really started to screw the punter with the 586 ... Pentium. The trend has continued - Core iBollocks.
ref https://en.wikipedia.org/wiki/X87#80387
The 8086/8088/80186/80188, 80286 & 80386 all required an external chip for floating point.
The 80386DX was the full 32-bit data bus version. The 80386SX was a cost reduced version with a 16-bit data bus (akin to the difference between the 8086 & 8088).
The 80486DX had an FPU; the 80486SX had a disabled or missing FPU (depending on vintage). You could pair the 80486SX with an 80487 FPU, which was just a full 80486DX in a different pinout that took over from the crippled part.
While some Intel doco did start calling them Intel386 & Intel486, they didn't drop the 80 until Pentium. Intel lost a court case trying to prevent clone makers from calling their chips '80486' or whatever because the judge ruled you can't keep your competitor from using a number.
Weitek made floating point units that paired with the 386 and 486. Some motherboards supported this, and a small amount of software did (early AutoCAD).
Correcting a comment downthread: the 486 is more than just a 386+387. There's a cache, it's pipelined, and there are some changes to the ISA. 4x the transistors.
Yeah, Fractint was great. The VGA color palette cycling was trippy.
In the mid-1980s I typed in a Mandelbrot set program from a magazine. It was for the Apple II, written in BASIC. It took around 12 hours to generate a 280x192 monochrome Mandelbrot.
If you need to use 128-bit floats for something, note that your processor probably supports them natively and that there are standard C/C++ types for them.
(The boost thing the article links says their quad precision type is “functionally equivalent” to standard types, but uses a different bit layout.)
[1] https://en.wikipedia.org/wiki/Quadruple-precision_floating-p...
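For concreteness, the types in question are (as far as I know) C23's _Float128 / C++23's std::float128_t, plus GCC/Clang's older __float128 extension, all IEEE binary128. A sketch - these types are optional and not available everywhere:

    #if __has_include(<stdfloat>)
    #include <stdfloat>   // C++23 fixed-width floating-point types (optional)
    #endif

    int main() {
    #if defined(__STDCPP_FLOAT128_T__)
        std::float128_t x = 1.0f128;   // 113-bit significand
        x /= 3.0f128;
        (void)x;
    #elif defined(__SIZEOF_FLOAT128__)
        __float128 x = 1.0;            // GCC/Clang extension, same binary128 format
        x /= 3.0;
        (void)x;
    #endif
    }

On x86 the arithmetic is done in software (libquadmath and friends) rather than by the FPU, which is the "hefty runtime support functions" point raised below.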
Which prompts the question: what is quad precision floating point arithmetic needed for, that IBM considered it prudent to implement it in the hardware of a mainframe? Who are the customers paying for such? Naively perhaps, I would expect that financial calculations (ought to) rely on fixed-point arithmetic, and afaiu simulations of physical systems get away with double precision, if one is careful.
I've read this a few places recently but I can't figure out how? https://godbolt.org/z/Mbjs73qar -- using doubles I get AVX instructions but using 128-bit floats calls some very hefty runtime support functions.
A related question I've wondered is, can you rearrange the formulae to reduce precision loss, without having to actually use more bits of precision in the data type?
Mandelbrot specifically, or computations in general? For the latter, absolutely yes. See Herbie for an automated tool that does this: https://herbie.uwplse.org/
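The classic rearrangement for this exact expression: compute x*x - y*y as (x - y) * (x + y). When x and y are close, the naive form cancels two already-rounded squares, while the factored form subtracts the exact inputs first. This is the kind of rewrite Herbie proposes (my example, not from the article):

    double diff_sq_naive(double x, double y)    { return x * x - y * y; }
    double diff_sq_factored(double x, double y) { return (x - y) * (x + y); }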
In my experience of non-deep-zoom rendering, and contrary to the author's arguments, period detection works well for speeding up renders. It appeared to be fairly safe from false positives. https://dotat.at/@/2010-11-16-interior-iteration-with-less-p... https://dotat.at/prog/mandelbrot/cyclic.png
Thanks for sharing!
Naive implementation is easy, but there are many ways to improve performance.
Systems software has to deal with a lot of physics and engineering stuff (speed of light, power, heat, mean time to component failure, etc, etc.)
https://www.fractint.org/
From wikipedia:
> The name is a portmanteau of fractal and integer, since the first versions of Fractint used only integer arithmetic (also known as fixed-point arithmetic), for faster rendering on computers without math coprocessors. Since then, floating-point arithmetic and arbitrary-precision arithmetic modes have been added.
if 0 < a_L < a < a_R , 0 < b_L < b < b_R , then (a + b i)^2 = (a^2 - b^2) + 2 a b i , and a_L^2 - b_R^2 < a^2 - b^2 < a_R^2 - b_L^2 and 2 a_L b_L < 2 a b < 2 a_R b_R ,
uh,
(a_R^2 - b_L^2) - (a_L^2 - b_R^2) = (a_R^2 - a_L^2) + (b_R^2 - b_L^2) = 2 (a_R - a_L) ((a_L + a_R)/2) + 2 (b_R - b_L) ((b_L + b_R)/2)
And, a_R b_R - a_L b_L = a_R b_R - a_R b_L + a_R b_L - a_L b_L = a_R (b_R - b_L) + (a_R - a_L) b_L
So, looks like the size of the intervals are like, (size of the actual coordinate) * (size of interval in previous step), times like, 2?
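Translating the bounds above into code, one squaring step would look something like this toy sketch (only valid for the all-positive case assumed above; real interval arithmetic also has to handle sign changes and round endpoints outward):

    struct Interval { double lo, hi; };

    // (a + b i)^2 = (a^2 - b^2) + 2ab i, assuming 0 < a.lo and 0 < b.lo
    void square_step(Interval a, Interval b, Interval& re, Interval& im) {
        re = { a.lo * a.lo - b.hi * b.hi,  a.hi * a.hi - b.lo * b.lo };
        im = { 2.0 * a.lo * b.lo,          2.0 * a.hi * b.hi };
    }

And as worked out above, the widths grow roughly like 2 * |coordinate| * (previous width) per step, so the intervals blow up geometrically along the orbit.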
I thought that 0.5 ought to be rounded to 1, mirroring 0.0 rounded to 0!
But I can now see how this only works for reals, while anything non-continuous (even rationals?) could end up with compounding errors...
https://mathworld.wolfram.com/Henon-HeilesEquation.html
and see these chaos zones that look like a field of random dots that probably miss a lot of the structure. The reason why we know those areas are dense with unstable periodic orbits is because they are also dense with periodic orbits. It doesn’t seem feasible to accurately track a system for long enough to believe those.
https://mathr.co.uk/blog/2021-05-14_deep_zoom_theory_and_pra...
IIRC it is calculating a few points at high precision and then estimating the rest.