I found the following note for -ffinite-math-only and -fno-signed-zeros quite worrying:
> The program may behave in strange ways (such as not evaluating either the true or false part of an if-statement) if calculations produce Inf, NaN, or -0.0 when these flags are used.
I always thought that -ffast-math was telling the compiler: "I do not care about floating-point standards compliance, and I do not rely on it, so optimize in ways that break the standard."
But instead it seems that this also implies a promise to the compiler: a promise that you will not produce Inf, NaN, or -0.0. And Inf and NaN especially can be hard to exclude.
This changes the flag from saying "I don't care about the standard, make it fast" to "I hereby guarantee this code meets an even stricter standard", and it is quite hard to verify that you actually meet that stricter standard, especially if you want to keep your performance. If you have to start wrapping every division in an if statement to prevent Inf or NaN, that is a huge performance penalty.
The whole rationale of the floating-point standard is to specify FP operations with properties that allow even a naive programmer to write programs that behave as expected.
If you choose an option that is not compliant with the standard, that means, exactly as you have noticed, that you claim to be an expert in FP computations who knows how to write FP programs that give the desired results even when the FP operations can behave in strange ways.
So yes, that means that you become responsible either to guarantee that erroneous results do not matter, or to always check the ranges of input operands, as "okl" has already posted, to ensure that no overflows, underflows, or undefined operations will happen.
> So yes, that means that you become responsible to either guarantee that erroneous results do not matter or that you will take care to always check the ranges of input operands, as "okl" has already posted, to ensure that no overflows, underflows or undefined operations will happen.
This is why I dislike that it removes simple methods for checking whether values are NaN, for example. I do not find it ergonomic to have these checks disabled.
Generally, if you're seeing a NaN/Inf, something has gone wrong. It's very difficult to recover from gracefully, and if you tried, I think you would lose both sanity and performance!
Regarding performance: the cost of a real division is about 3-4 orders of magnitude worse than a consistently predicted if statement. The usual approach is to have fast and safe versions of functions: where you need performance and can deduce that something is always or never true, create/use the fast function, but by default everything uses the slower, safer one.
An example use of Inf is bounds on a value. Say you want to express that x sometimes has upper and/or lower bounds. With Inf you can simply write
l <= x <= u
accepting that l is sometimes -Inf and u is sometimes +Inf. Without Inf, you get four different cases to handle. This is particularly handy when operations get slightly more complex, like transforming
l' <= x + b <= u'
into the canonical form above, for some finite b. With Inf you can simply write
l = l' - b
u = u' - b
and things will work as one expects. Again, without Inf, multiple cases to handle correctly.
I often use NaNs on purpose to designate “unknown”, and its properties of “infecting” computations mean that I don’t have to check every step - just at the end of the computation.
R goes even farther, and uses one specific NaN (out of the million-billions) to signal “not available”, while other NaNs are just NaNs.
Its properties, used properly, make code much more simple and sane.
NaN usually indicates a bug or algorithmic problem, but not always. I have had cases where I need to detect NaN and present a useful message to the user, even though everything is working as designed.
The checks could be expensive though, more expensive than the gains from this optimization. Often it is better to let the calculation proceed and use isnan to check the final result.
That's where more advanced type systems shine. You get to encode things like "this number is non-zero", and a safe division function will only take non-zero numbers as input.
The workflow would then be:
1. prove that the input is non-zero (check input == 0), yielding a "non-zero number"
2. run your calculation (no check is needed for multiplication or division; if you add something, you have to re-prove that the number is non-zero)
"Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs."
The obvious implication is that arguments and results are always finite, and as a programmer you are responsible for guaranteeing the correct preconditions.
Certainly do not use finite-math-only when dealing with external data.
> I always thought that -ffast-math was telling the compiler. "I do not care about floating point standards compliance, ... So optimize things that break the standard". But instead it seems like this also implies a promise to the compiler [that] you will not produce Inf, NaN, or -0.0.
I see it as both, as with many such flags¹, though as you say it does put the onus on the developer to avoid inputs the optimisation may cause problems for. You are saying “I promise that I'm being careful not to do things that will be affected by these parts of the standard, or won't blame you for undefined behaviour if I do, so you can skip those rules if it helps you speed things up”.
[1] like those that enable optimisations which can cause significant breakage if pointer aliasing is happening
That's the way a number of those special flags work. They are a contract to the compiler saying "act like this can't happen, I promise it won't".
Picking and choosing which specific flags to enable might be better in general. The flag `-funsafe-math-optimizations` seems useful as a general-purpose choice, although, as this article mentions, some of those optimizations need the more unsafe ones to become active.
I would say that `-ffast-math` is a dangerously safe-sounding flag, `-funsafe-fast-math` would be better IMO.
Math optimizations are not the only ones like this. Strict aliasing is a promise to the compiler not to make two pointers of different types point to the same address.
You technically make that promise by writing in C (except a narrow set of allowed conversions) but most compilers on standard settings do let you get away with it.
I imagine it means that the compiler could turn an if(x<6)-else into two free-standing if statements, if(x<6) and if(x>=6), which will logically work under these assumptions.
In Julia you can do `@fastmath ...`, which is basically just a find-and-replace of math operations and does not propagate into functions called in that code block, even when they are ultimately inlined. So what it does is:
julia> @macroexpand @fastmath x + y
:(Base.FastMath.add_fast(x, y))
And maybe that's a good thing, because the scope of @fastmath is as limited as it gets.
Yeah, I really like that Julia and LLVM allow applying it on a per-operation basis.
Because most of LLVM's backends don't allow for the same level of granularity, they do end up propagating some information more than I would like.
For example, marking an operation as fast lets LLVM assume that it does not result in NaNs, letting NaN checks get compiled away even though they themselves are not marked fast:
julia> add_isnan(a,b) = isnan(@fastmath(a+b))
add_isnan (generic function with 1 method)
julia> @code_llvm debuginfo=:none add_isnan(1.2,2.3)
define i8 @julia_add_isnan_597(double %0, double %1) #0 {
top:
ret i8 0
}
meaning it remains more dangerous to use than it IMO should be.
For this reason, LoopVectorization.jl does not apply "nonans".
With GCC you can also use `#pragma GCC optimize` or `__attribute__((optimize(...)))` for a similar effect.
It is not 100% bug-free (at least it didn't use to be), and it often prevents inlining a function into another that has a different optimization level (so in practice its use has to be coarse-grained).
Try it with and without the pragma, and adding -ffast-math to the compiler command line. It seems that with the pragma sqrt(x) * sqrt(x) becomes sqrt(x*x), but with the command line version it is simplified to just x.
Fascinating to learn that something labelled 'unsafe' stands a reasonable chance of making a superior mathematical/arithmetic choice (even if it doesn't exactly match the unoptimized FP result).
If you're taking the ratio of sine and cosine, I'll bet that most of the time you're better off with tangent....
Excellent post, I agree, but isn't it more about how they can't optimize? Compilers can seem magical in how they optimize integer math, but this is a great explanation of why not to rely on the compiler if you want fast floating-point code.
Better support for fast math in Rust dies in an endless bikeshed of:

• People imagine enabling fast float by "scope", but there's no coherent way to specify that when it can mix with closures (even across crates) and math operators expand to std functions defined in a different scope.
• Type-based float config could work, but any proposal of just "fastfloat32" grows into a HomerMobile of "Float<NaN=false, Inf=Maybe, NegZeroEqualsZero=OnFullMoonOnly, etc.>"
• Rust doesn't want to allow UB in safe code, and LLVM will cry UB if it sees Inf or NaN, but nobody wants the compiler inserting div != 0 checks.
• You could wrap existing fast intrinsics in a newtype, except newtypes don't support literals, and <insert user-defined literals bikeshed here>
It must be clearly understood which of the flags are entirely safe and which need to be under 'unsafe', as a prerequisite.
Blanket flags for the whole program do not fit very well with Rust, while point use of these flags is inconvenient or needs new syntax, but there are discussions about these topics.
Also, maybe you would like more granular control over which parts of your program prioritize speed and which favour accuracy. Maybe this could be done with a separate type (e.g. ff64) or a decorator (which would be useful if you want to enable this for someone else's library).
The whole point of ALGOL-derived languages for systems programming is not being fast & loose with anything, unless the seatbelt and helmet are on as well.
I don't know if most Rust programmers would be happy with any fast and loose features making it into the official Rust compiler.
Besides, algorithms that benefit from -ffast-math can be implemented in C and with Rust bindings automatically generated.
This solution isn't exactly "simple", but it could help projects keep track of the expectations of correctness between different algorithm implementations.
Rust should have native ways to express this stuff. You don't want the answer to be "write this part of your problem domain in C because we can't do it in Rust".
Denormal floats are not a goal in themselves; they are not used intentionally.
When the CPU generates denormal floats on underflow, that ensures that underflow does not matter, because the errors remain the same as at any other floating-point operation.
Without denormal floats, underflow must be an exception condition that must be handled somehow by the program, because otherwise the computation errors can be much higher than expected.
Enabling flush-to-zero instead is an optimization of the same kind as ignoring integer overflow.
It can be harmless in a game or graphic application where a human will not care if the displayed image has some errors, but it can be catastrophic in a simulation program that is expected to provide reliable results.
Providing a flush-to-zero option is a lazy solution from CPU designers: there have been many past CPU designs where denormal numbers were handled without penalty for the user (though of course at a slightly higher cost for the manufacturer), so there was no need for a flush-to-zero option.
Flushing denormals to zero only matters if your calculations are already running into the lower end of floating-point exponents (and even with denormals, if they're doing that, they're going to run into lost precision anyway sooner or later).
The useful thing denormals do is make the loss of precision at that point gradual, instead of sudden. But you're still losing precision, and a few orders of magnitude later you're going to wind up at 0 either way.
If your formulas are producing intermediate results with values that run into the lower end of FP range, and it matters that those values retain precision there, then you're either using the wrong FP type or you're using the wrong formulas. Your code is likely already broken, the breakage is just rarer than in flush-to-zero mode.
So just enable FTZ mode, and if you run into issues, you need to fix that code (e.g. don't divide by values that can be too close to 0), not switch denormals on.
Thanks for this explanation. But yeah, I meant: are there applications where the use of denormals vs. flush-to-zero is actually useful? If your variable is nearing zero and you use it, e.g., as a divisor, don't you need to handle a special case anyway?
Just like you should handle your integer overflows.
I'm seeing denormals as an extension of the floating-point range, but with a tradeoff that's not worth it. Maybe I got it wrong?
If you want maximum possible accuracy for FP computation, you need to pay the performance price that denormals incur. In many cases, that will likely be a long running computation where responsiveness from a user-perspective and/or meeting realtime deadlines is not an issue. Any time "I want the best possible answer that we can compute", you should leave denormals enabled.
By contrast, in media software (audio and/or video) denormals getting flushed to zero is frequently necessary to meet performance goals, and ultimately has no impact on the result. It's quite common for several audio FX (notably reverb) to generate denormals as the audio fades out, but those values are so far below the physical noise floor that it makes no difference to flush them to zero.
It's not that you decide specifically to use them, but sometimes they sort of happen very naturally. For example, if you evaluate a Gaussian function f(x)=e^(-x^2), for moderate values of x the value f(x) is denormal.
> the cost of a real division is about 3-4 orders worse performance than an if statement

That’s a bold claim.
> Whilst especially Inf and NaN can be hard to exclude.

Cumbersome but not that difficult: range-check all input values.
> The workflow would then be: prove that the input is not null, then run your calculation.

Easy.
> Certainly do not use finite-math-only when dealing with external data.

so... never?
> the compiler could turn an if(x<6)-else into two free standing if statements

Do you mean `if-else if` or `if-else`? Because the second would work and would always short-circuit to else.
Compiler Explorer example of the pragma-vs-flag difference: https://gcc.godbolt.org/z/voMK7x7hG
At least it doesn't need the errno-related flags, since Rust doesn't use errno.
From one form to the other: what happens when x or y is NaN? My read of this is that comparisons involving NaN on either side always evaluate to false.
In the first one if X or Y is NaN then you'll get do_something_else, and in the second one you'll get do_something.
As far as why one order would be more optimal than the other, I'm not sure. Maybe something to do with branch prediction?
[0] https://www.gnu.org/software/libc/manual/html_node/Infinity-...
Wait, what? What is x + -0.0 then? What are the special cases? The only case I can think of would be 0.0 + -0.0.
https://en.wikipedia.org/wiki/Signed_zero#Arithmetic