What I am big on is forcing developers to make deliberate choices. That's why I like React's policy of naming functionality "dangerouslySetInnerHTML" or "__SECRET_DOM_DO_NOT_USE_OR_YOU_WILL_BE_FIRED".
If you add usages for these in a PR I'm reviewing without justification, it's not getting merged.
So why not make cryptographically unsafe random unsafeRandom() or shittyRandom() or iCopyPastedThisFromStackOverflowRandom()?
This assumes writing crypto code is the most common use case for random numbers.
How often do you write crypto code?
vs
How often do people use random numbers + threshold for A/B tests? How often do game developers use random numbers for gameplay variety? How often is random used for animation variety? Do these use cases need the overhead of a cryptography RNG?
A former employer had the same issue as in the article: the security team implemented an automated vulnerability scanner in our GitHub Enterprise instance, and it spammed comments on any merge request that touched a file using java.util.Random and marked the review as requiring changes. It lasted a day before the security team was made to turn it off, as on our team (and many others) literally zero uses of random numbers required a secure random.
I genuinely don't see the reason why non-cryptographic random number generators exist outside of niche applications.
The main arguments I've seen are speed and determinism.
However, a cryptographically secure, deterministic PRNG can be built from hash or block cipher primitives that have hardware acceleration, making them quite fast. Seed (and potentially periodically re-seed) it from a strong source of randomness, and you've got a fast and cryptographically secure non-deterministic PRNG.
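A minimal sketch of that construction, assuming SHA-256 in counter mode over a 32-byte seed (illustrative only; real designs such as NIST's Hash_DRBG or fast-key-erasure generators handle reseeding and key erasure much more carefully):

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

// Deterministic PRNG built from a hash primitive (SHA-256 in counter
// mode). Seed from SecureRandom for a non-deterministic generator, or
// pass a fixed seed for a reproducible one.
class HashDrbgSketch {
    private final byte[] seed;
    private long counter = 0;

    HashDrbgSketch() {                  // strong, non-deterministic seed
        seed = new byte[32];
        new SecureRandom().nextBytes(seed);
    }

    HashDrbgSketch(byte[] fixedSeed) {  // deterministic variant
        seed = fixedSeed.clone();
    }

    byte[] nextBlock() throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(seed);                                              // key
        md.update(ByteBuffer.allocate(8).putLong(counter++).array()); // counter
        return md.digest(); // 32 pseudorandom bytes per call
    }
}
```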
I thought that "classic" PRNGs like the widespread Mersenne Twister even had issues that can cause practical problems when used in certain kinds of simulations (Monte Carlo, possibly) that rely on large amounts of random numbers, but I haven't been able to find a clear source for this.
I'm certainly defaulting to secure ones, and I'm surprised modern languages and libraries don't do this by default for their standard randomness functions.
> It lasted a day before the security team was made to turn it off, as on our team (and many others) literally zero uses of random numbers required a secure random.
Can concur: we're currently approaching 400k SLOC of C++ in the repo. A few dozen different places crop up where random is needed (found with quick and dirty grepping). Literally 0% is for secure stuff. Most of it has to be as fast as possible (and can be very low quality, as it just needs to be random/noisy enough to look random to human perception).
This just kind of proves GP's point. Random APIs usually tell you what the RNG is, but not the why/how. Most people don't care if it's /dev/(u)random, Mersenne Twister, PCG, LFSR, LCG, RDRAND, etc. They care about roughly four attributes:
- Is it good for crypto
- Is it fast
- Is it reproducible
- Is it portable
But fundamentally, it's about the use case and interface:
- I need secure random (strong, slow, secure)
- I need Monte Carlo (good enough, fast, reproducible)
- I need chaotic behavior for my game/stress test/back-off protocol (usually can be barely random, fast, reproducible)
I think calling the last case InsecureRandom or RandomEnough is reasonable to convey "don't use me for secure purposes".
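For what it's worth, those three use cases already map onto distinct standard APIs; a Java sketch using the JDK's actual class names rather than the hypothetical InsecureRandom/RandomEnough:

```java
import java.security.SecureRandom;
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

public class UseCases {
    public static void main(String[] args) {
        // Secure random (strong, slow, secure): tokens, keys, session IDs.
        byte[] token = new byte[32];
        new SecureRandom().nextBytes(token);

        // Monte Carlo (good enough, fast, reproducible): fixed seed.
        Random mc = new Random(42L);
        double sample = mc.nextDouble();

        // Game / stress test / back-off (barely random, fast): no
        // security claims whatsoever.
        int jitter = ThreadLocalRandom.current().nextInt(100);
        System.out.println(sample + " " + jitter);
    }
}
```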
Interestingly, a major aspect of video game speedrunning is figuring out how the game generates random numbers and then exploiting that knowledge. For example, speedrunners avoid all random battles in an RPG with this tactic. I'm not arguing games need true random, for the record.
What does that have to do with being verbose and letting developers know they are using an insecure method when it would apply to them?
If I'm writing code and using an RNG for gameplay variety, and then I notice that I have to use a function called "insecureRandom", at the very least I'm going to read up on an interesting aspect of computing and be a little more informed at the end of the day.
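What is appropriate to use comes down to context.
I don't disagree with anything you say, but it also doesn't refer to my comment.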
The other thing is, true randomness doesn't seem random to humans, which is why Spotify and others had to modify shuffle. So true random might not be appropriate for the use case.
The creation of a trueRandom function certainly seems to solve this problem more than taking away a useful tool for cases where pseudo-random is good enough.
It's also a good idea to give safer things shorter names.
So make random() a CSPRNG (and an alias for SecureRandom() for people who want to be explicit) while InsecureFastRandom() is just what it says and has no other name. Then if you really need performance over unpredictability, it's there, but nobody is confused about what they're getting. And lazy people who don't like to type or pay close attention get the safe one.
random() should be the most universally applicable random which includes making it as secure as possible. Non-universally applicable randoms should be named accordingly.
Then we’ll end up with a CSPRNG getting used in a tight loop iterating over every pixel in a raytracer...
“Lazy people who don’t want to type” are not the sort of people I want writing the code I might use or interact with that requires cryptographically secure random numbers...
Yeah, and if I remember correctly one neat technique for exploiting security vulnerabilities in Firefox was to use them in order to set turn_off_all_security_so_that_viruses_can_take_over_this_computer to true, with obvious results.
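https://searchfox.org/mozilla-central/rev/3ff133d19f87da2ba0...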
Rust's "unsafe" is a pretty bad name and completely different reasoning behind using it. It doesn't mark something as "dangerously unsafe, don't use", to a consumer it indicates "exercise caution" and to a compiler it just allows 5 things:
Dereference a raw pointer
Call an unsafe function or method
Access or modify a mutable static variable
Implement an unsafe trait
Access fields of unions
The point of "unsafe" in rust is to highlight which area requires more human attention... not to discourage its usage.
`dangerouslySetInnerHTML` is literally dangerous and allows XSS if used with outside input.
It also is faster than the other variant. The same is true for `random()`. Both can be used to gain some performance when you know what you are doing.
Meanwhile, `unsafe` Rust by itself is no different from safe Rust in terms of speed. You have no choice but to use it in the places it's supposed to be used.
I agree, with the caveat that use of random for cryptography is actually a domain specific use case.
It's probably okay to leave the function as it is and just drill into people that if you're doing cryptography, you either need to know exactly what you're doing all the way down to the hardware or you need to leave it a task for somebody else more specialized than you. I, for one, never assume random() is cryptographically secure, but it might be because I grew up programming during the era where random was computed off of clock cycles since CPU startup because there wasn't much other cheap entropy to lay a hand on ("battery-backed onboard date clock?! Oh, look who has AKERS money!").
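https://hackage.haskell.org/package/bytestring-0.11.0.0/docs...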
Beginner friendliness is something to remember, too. There are half a dozen words you could use to describe pseudoRandom(). random() is easy for a first-year or non-professional to remember.
Most of the time the people who write and name the functions don't know it's not secure or safe. So you would still need to ban random when the new name is implemented.
The "ban" can be evaded by telling semgrep to ignore it for one line. https://semgrep.dev/docs/ignoring-findings/ This doesn't really scale though - if someone bans it with a different tool, you'd have to tell each tool to ignore this line.
I have a required parameter to push our app to production called: YES_I_HAVE_ALREADY_MERGED_THE_LIB_REPOS_AND_WAITED_FOR_THEM_TO_COMPLETE_BEFORE_MERGING_THE_APP_REPOS
Gets the point across and will still work when I'm long gone.
So you’re not big on bans but if you use dangerouslySetInnerHTML then it’s definitely not getting merged? Is that not a ban? Do you just not like when tooling enforces it?
No, as I said, it would raise a red flag. That flag can be lowered by justification, e.g. if you add types or constraints to only allow safe-enough parameters, etc.
The root problem here is the notion that you need to choose between "strong and slow" randomness vs. "weak and fast" randomness. If every language's random() was strong and fast, most developers would never have to think about it.
"Strong" randomness is often too slow because every time you ask for new entropy, you make a syscall. The solution is to use 32 bytes of strong randomness to seed a userspace CSPRNG. You can generate gigabytes of secure entropy per second in userspace. If you need deterministic entropy, just use the same seed.
This isn't a one-size-fits-all solution, of course. If you only need to generate a few keys now and then, it's marginally safer to make a separate syscall for each of them. If you're targeting some tiny SoC, then sure, use xorshift instead. But what we care about is the common case, and right now the common case is a developer choosing the weak, deterministic RNG because it's faster and has a more convenient API and the secure RNG says "for cryptographic purposes" and well, this use case doesn't seem like cryptography, it's just a simple load balancer...
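A sketch of that pattern in Java, assuming AES-256 in CTR mode as the userspace generator (illustrative, not production code; encrypting zeros yields the raw keystream, and AES-NI makes it very fast):

```java
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class UserspaceCsprng {
    public static void main(String[] args) throws Exception {
        // The one expensive step: 32 bytes of strong OS randomness.
        byte[] seed = new byte[32];
        new SecureRandom().nextBytes(seed);
        // Reuse a fixed seed here instead for deterministic output.

        Cipher aes = Cipher.getInstance("AES/CTR/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE,
                 new SecretKeySpec(seed, "AES"),
                 new IvParameterSpec(new byte[16]));

        // Bulk secure pseudorandom bytes, generated entirely in userspace.
        byte[] out = aes.update(new byte[1 << 20]); // 1 MiB per call
        System.out.println(out.length);
    }
}
```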
> The solution is to use 32 bytes of strong randomness to seed a userspace CSPRNG
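That decision should never need to be made.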
All cryptographic randomness generation should be performed by the kernel.
You always have to think about security because if you don't think about security you're going to get hacked. By all means, name the insecure randomness generation function ‘insecure_random’. It does help. But secure-by-default helps you only marginally because when building secure software you don't get to just use the defaults; you have to think about what they're doing.
You have to (for example) know and think about timing attacks even if you're using a cryptographic primitives library that's hardened against them, because it's really easy to introduce timing dependence into your own code and none of Daniel Bernstein or Tanja Lange’s careful designs will save you.
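A concrete Java example of how easily that happens when checking a secret token: the obvious comparison leaks timing, and the JDK already ships a constant-time alternative.

```java
import java.security.MessageDigest;
import java.util.Arrays;

public class TokenCheck {
    // Arrays.equals() returns at the first mismatching byte, so the
    // response time reveals how long a prefix the attacker got right.
    static boolean leaky(byte[] expected, byte[] given) {
        return Arrays.equals(expected, given);
    }

    // MessageDigest.isEqual() compares in constant time.
    static boolean safer(byte[] expected, byte[] given) {
        return MessageDigest.isEqual(expected, given);
    }

    public static void main(String[] args) {
        byte[] secret = {1, 2, 3, 4};
        System.out.println(leaky(secret, secret) + " " + safer(secret, secret));
    }
}
```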
That's fair. There's no silver bullet for security. But we should not let the perfect be the enemy of the good. Everyone writing non-trivial systems should have some understanding of security; but the more components we make secure-by-default, the less those developers need to learn.
> You can generate gigabytes of secure entropy per second in userspace.
I haven't thought about this before, so please have patience:
I guess the "secure" qualifier does a lot of work in this sentence? That there's 32 bytes of "true entropy", but "secure entropy" is theoretically weaker but practically just as strong with reasonable assumptions about an attacker's computing resources.
So I'd guess the "secure" qualifier must mean something like "given any quantity of derived pseudorandom information, the seed bytes can't be efficiently deduced"? Pretty neat. (I had a knee-jerk disagreement until I re-read your post and saw that you said "32 bytes", not "32 bits". Quite plausible -- and cool -- that we have a good solution with just a small amount more seed randomness though.)
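djb has a good article on the subject: https://blog.cr.yp.to/20170723-random.html
I've implemented his "fast-key-erasure CSPRNG" in Go: https://github.com/lukechampine/frand
Eg. for randomised algorithms you need a fast source of randomness. Cryptographic random() is an extremely niche use case that you shouldn't be using unless you're writing your own crypto libraries. (Don't do that.)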
To answer the question as to when you should use cryptographic random(), ask yourself "What is the worst that could happen if someone guesses the result of random()?"
If the answer is "I don't know," go cryptrographic. You'll save your butt if you didn't know it was important.
If the answer is along the lines of "someone could impersonate a user, or leak information they shouldn't see," for the love of all that is holy, use cryptographic. This is basically every scenario where you are using random to generate an ID of some kind, and while it's only truly critical if that ID is all you need for validation, it does provide another layer of security even if you also require other information to match before giving out elevated access.
If the answer is "it defeats the algorithm I'm trying to do" (think something like ASLR, where you're randomizing the offsets of addresses so that attackers don't know where things are located), well, the reason why you need to use cryptographic should be blindingly obvious.
If the answer is instead "they can reproduce my results," well, you shouldn't use cryptographic in this case. And that's not a lot of cases: Monte Carlo simulations, testing, and fuzzing are the obvious poster children for this category, and indeed reproducibility in these cases tends to be a highly valuable feature rather than an anti-feature.
Cryptographic random is almost never harmful to your application, and almost always provides some benefit in reducing guessability of your system. You should err on the side of using cryptographic random(), and only not use it when you are sure that guessability will not harm security in any way and you know that the cryptographic nature actively harms your application.
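For the ID-generation case above, a minimal Java example: an unguessable, URL-safe 128-bit token straight from the platform CSPRNG.

```java
import java.security.SecureRandom;
import java.util.Base64;

public class Token {
    public static void main(String[] args) {
        byte[] raw = new byte[16]; // 128 bits of CSPRNG output
        new SecureRandom().nextBytes(raw);
        // URL-safe encoding, suitable for a session or reset token.
        String id = Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        System.out.println(id);
    }
}
```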
Server-side folks generate random identifiers and shared secrets all the time. Yes, it's niche, but not "extremely" and you don't use a crypto library for this (you use secure random!)
Say you're making an online game, and you need an RNG on your server. Above all, this RNG needs to be unpredictable, or someone will easily game it. Most non-cryptographic PRNGs are very predictable, so it's dangerous to use them.
I think this is a scenario that (a) isn't "extremely niche," and (b) warrants CSPRNGs.
It's not that you shouldn't be using it necessarily, it's just that for many cases (games, procedural generation, graphics, many kinds of simulations) it's unnecessary and slow. In my experience, if someone doesn't know whether they need a cryptographically secure random(), or whether a given random() implementation is secure, then they (a) don't need it or (b) are trying to implement something they shouldn't be.
Cryptographic randomness has practically no downside if you use it for non-cryptographic purposes. Not true the other way round. And given how many misconceptions there are around randomness, I don't think people are good at knowing whether they need secure randomness.
The only possible justification for insecure randomness would be performance, but you'd need to generate a lot of random numbers to even be able to measure that.
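A crude way to check that claim in Java (not a rigorous benchmark; JIT warm-up is ignored, and the gap depends heavily on which SecureRandom implementation the platform picks):

```java
import java.security.SecureRandom;
import java.util.Random;

public class RandomSpeed {
    public static void main(String[] args) {
        Random fast = new Random();
        SecureRandom secure = new SecureRandom();
        long t0 = System.nanoTime();
        for (int i = 0; i < 10_000_000; i++) fast.nextLong();
        long t1 = System.nanoTime();
        for (int i = 0; i < 10_000_000; i++) secure.nextLong();
        long t2 = System.nanoTime();
        System.out.printf("Random: %d ms, SecureRandom: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```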
> Cryptographic randomness has practically no downside if you use it for non-cryptographic purposes
Cryptographic randomness is typically slower than other forms of randomness.
In all of the programming I've done in my career, I've only needed cryptographic randomness a few times. For the rest, a fast pseudorandom number generator seeded by the clock was the correct choice.
My counter would be that if someone "doesn't know whether they need secure randomness" then the problem is not that random() is not secure, it's the fact that someone is doing something they really should not be doing in the first place.
Obligatory mention here for the fine folks of systemd, who have made a properly seeded CSPRNG a requirement for merely booting a system and then kept bricking people's systems when it turns out finding that seed at boot time is a non-trivial problem. All for what, avoiding collisions in some hash table implementation?
I don't really care for the browser application, if you made a TLS connection in the first place obviously you better have the randomness and might as well make random() use that, but someone explicitly using a CSPRNG in a native application is a huge code smell on the level of implementing your own crypto.
As much as it's overkill for most people, I'm a fan of safe defaults so I say let random() be slow and good. It's better to find out your code is slow due to a slow random() than to find out it's broken because you didn't know and thought random() was really random.
If you need a fast source of randomness, for some Monte Carlo algorithm for example, then you know this and can pick a deliberate pseudo-random generator that fits your needs.
I worked on a Monte Carlo path tracer. Early on we swapped out the random number generator from the standard random(). Initially not for speed, but due to the poor distribution.
After optimizing other areas it became a bottleneck and we swapped it out again for a faster one.
It is. The question was how often that's the case. If 50% of the uses of random() are bad, then getting those fixed may be worth the cost of annoying the authors of the legitimate 50%.
It turned out to be much less useful than that. So they got rid of it.
Indeed, I use something like it from a vendor supplied C math library for a noise generator on an embedded app, where I really just care about its crude statistical behavior.
But short of saying "banned," any review of security critical code should include an explanation of where the random numbers are coming from and why they're trusted. Or in general for any code review: Why do you believe your numbers?
> Eg. for randomised algorithms you need a fast source of randomness.
Though typical random() implementations are LCGs, which have poor distributions when you look only at the least significant bits or project them into multiple dimensions.
As a result they may make some randomized algorithms perform poorly!
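The least-significant-bit problem is easy to demonstrate with a power-of-two-modulus LCG (the constants here are the classic Numerical Recipes ones; java.util.Random, itself a 48-bit LCG, returns only the high bits of its state for exactly this reason):

```java
public class LcgLowBits {
    public static void main(String[] args) {
        long state = 12345;
        for (int i = 0; i < 16; i++) {
            // LCG with modulus 2^32: state = (a*state + c) mod 2^32.
            state = (1664525 * state + 1013904223) & 0xFFFFFFFFL;
            System.out.print(state & 1); // prints 0101010101010101
        }
        System.out.println(); // the LSB has period 2 -- hardly random
    }
}
```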
Honestly, the one thing that got me into graphics (from physics and math) was just the incredible amount of: "you can literally do anything so long as you make it pretty in the end."
I took that as a life philosophy and it's been pretty great so far.
random() typically works by storing and modifying some state, so every call returns a different number even with the same arguments.
In shaders, the same code is executed in parallel for potentially every pixel. Storing state would mean pixels could only be calculated serially, slowing things down.
Hence you need a random-ish function that depends only on its input. Very low RNG quality is not a blocker, as long as things look good.
So in shaders you see a lot of random generators which simply take the pixel coordinate or something else that distinguishes 2 pixels and do some nonsense operations on them.
It's written in GLSL, which is a C-like language for shaders designed to be executed on the GPU.
The 2D vector used for input represents pixel coordinates mapped from 0 to 1 or -1 to 1 on both axes.
The magic numbers are nothing special; they are just large numbers to make the result unpredictable for the given input.
The top-level function is fract, which takes the fractional part of a number. So if the result of the inner computation is twisted enough, it will be hard to trace it back to the original values. There are lots of variations of these one-liners; most of them do a great job of producing noise.
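The snippet being described isn't quoted, but it sounds like the widely circulated fract(sin(dot(uv, vec2(12.9898, 78.233))) * 43758.5453) one-liner. A direct port to Java, purely for illustration:

```java
public class ShaderHash {
    // Stateless "random": the same (x, y) always maps to the same value
    // in [0, 1), so every pixel can be evaluated in parallel.
    static double rand(double x, double y) {
        double d = Math.sin(x * 12.9898 + y * 78.233) * 43758.5453;
        return d - Math.floor(d); // fract(): keep the fractional part
    }

    public static void main(String[] args) {
        System.out.println(rand(0.25, 0.75)); // noisy-looking, fully repeatable
    }
}
```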
Just make Math.random() cryptographically secure, now all your apps are fixed, and no existing code broken. I can't imagine anything relying on Math.random() being "less" random than a CSRNG.
Why must CSRNGs always have alternative, obtuse APIs? We're still stuck on C-style srand() + rand().
Cryptography is so ubiquitous now that failure to provide cryptographically secure random numbers should be viewed as a hardware flaw.
Some folks purposely want random-ish results. When OpenBSD was changing the behaviour of its legacy POSIX random functions it was observed:
This API is used in two patterns:
1. Under the assumption it provides good random numbers. This is the primary usage case by most developers. This is their expectation.
2. A 'seed' can be re-provided at a later time, allowing replay of a previous "random sequence", oh wait, I mean a deterministic sequence...
They went through the code, especially the third-party packages/ports, to identify uses:
> Differentiating pattern 1 from pattern 2 involved looking at the seed being given to the subsystem. If the software tried to supply a "good seed", and had no framework for re-submitting a seed for reuse, then it was clear it wanted good random numbers. Those ports could be eliminated from consideration, since they indicated they wanted good random numbers.
> This left only 41 ports for consideration. Generally, these are doing reseeding for reproduceable effects during benchmarking. Further analysis may show some of these ports do not need determinism, but if there is any doubt they can be mindlessly modified as described below.
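* https://lwn.net/Articles/625562/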
This is exactly what I mean about being stuck in the C mindset. You're looking at the problem through the lens of what this giant pile of ancient C software does.
Why should newer languages take the same approach to APIs that these old code bases did? It's not like we're porting all those programs to JS. Those C APIs were written long before hardware could provide fast good randomness, heck even before cryptography was standard practice instead of a special use case.
Not to mention in JS you can't even seed the random number generator. If you want predictable "random" numbers, you should have to jump through additional hoops. By default random numbers should be cryptographically secure.
EDIT: It's also worth mentioning that from your reported dataset, 41 of 8800 programs analyzed used srand to get a repeatable set of "random" numbers. That's 0.47%. I'm happy to break less than half a percent of software if it helps prevent the far more ubiquitous failures of software using insecure random numbers.
Unless I'm wrong, the only valid reason to not use a better random number generator is for performance / simplicity, which then demands benchmarks and evaluation.
Seeded random is a glorious thing in the right circumstances. As an example, I've used it for 'random' testing sequences (jumbling up a list of inputs) but in a way I can later re-run EXACTLY the same test.
It's also useful for other data generation tasks where the output can basically be saved as a seed, making it lightweight and easy to store - it could be written on a scrap of paper in seconds.
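A minimal Java sketch of that replayable-shuffle pattern (seed value arbitrary here; in practice you'd pick it randomly but log it):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SeededShuffle {
    public static void main(String[] args) {
        long seed = 987654321L; // logged alongside the test run
        List<Integer> inputs = new ArrayList<>(List.of(1, 2, 3, 4, 5));
        Collections.shuffle(inputs, new Random(seed));
        System.out.println(inputs); // same seed => EXACTLY the same order
    }
}
```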
Maybe it's a bad name though - it should be called seededRandom() or semiRandom() or deterministicRandom(). Or perhaps it should be true random if no seed is set. Hard to know. Maybe the true random only needs to be the seed to a deterministic random and reset on a frequent basis in some cases.
Then there's the category of casual random that doesn't matter, like random colours just for the sake of it. It doesn't need to be a secure safe random.
And... assuming that any random function is truly random is a mistake anyway. Base it on hardware, and it may fail. Base it on software, and where's the source of entropy? Add to that the possibility of bugs/defects in the implementation, and it's possible that it might not be as random as it needs to be. It's better to assume ALL RNGs are PRNGs, with the caveat that some are decidedly better than others.
So no I wouldn't support a ban on it, nor would I support removing it from any language/runtime where it might be useful.
Came here to say this. I have spent a lot of time in hardware validation. Pseudo-random (explicitly NOT random) sequences are hugely useful.
I once had a lights out server room of 60 servers whose entire purpose was to take skeletonized tests and a seed for a pseudo-random function and generate a test instance. That test instance went to one of a dozen test jigs. What was recorded was: pass/fail, the git sha of the template, and the seed. Any failing test could be reproduced at any time from just the git sha and the seed. True random would have killed that whole methodology.
That sounds awesome, and the use of a repeatable random vital. Another example of random in a non-cryptography context where it's unpredictable under normal operation, but completely predictable when needed. If you wanted you could run the same test on all the test jigs with different seeds, safe in the knowledge that you could re-run all of them exactly again and again if required. Or you could add problematic seeds to a list for repeated retest with future versions. So much power and freedom!
There are many uses of random() that do not require cryptographic security: simulation, simulated annealing, sound synthesis, digital signal processing and the like. It would be a nuisance if developers of those kinds of software have to fight warnings because developers of completely different applications can't get it right.
Further, such users usually want to be able to repeat a test case: start from the same seed, get the same sequence. They don't want true randomness, they want a repeatable sequence with good statistical properties.
It's all about the discipline of the team in the end... You can ban things all day, but it just takes 2 developers deciding they don't give a shit to code, review & merge that use-fast-random-for-session-token PR. There is more than 1 way to get something that is "random", so basic string matching for methods you don't like is certainly not a guarantee.
In our organization the policy is very simple. We have a static method available throughout called CryptographyService.GenerateCsprngBytes(count = 64). All developers are aware that any security-sensitive requirements around entropy must use this method. It wraps the OS-level offering and encourages a minimum reasonable level of entropy with a default count.
I don't see any reason to make it more complicated than this. Communication with your team is more important than writing check-in rules to prevent bad things from happening.
As for other uses of Math.Random et al., we don't have any official policy. Because we have clearly communicated that security-sensitive applications should always use the secure method, we don't need to add a bunch of additional bandaids on top. Enrich the team before the process.
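For illustration, a Java analogue of the wrapper described (the original reads like .NET; the names here just mirror it and are otherwise hypothetical):

```java
import java.security.SecureRandom;

public final class CryptographyService {
    private static final SecureRandom RNG = new SecureRandom();

    // Default count nudges callers toward a healthy amount of entropy.
    public static byte[] generateCsprngBytes() {
        return generateCsprngBytes(64);
    }

    public static byte[] generateCsprngBytes(int count) {
        byte[] out = new byte[count];
        RNG.nextBytes(out); // wraps the OS-level CSPRNG
        return out;
    }
}
```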
> Communication with your team is more important than writing check-in rules to prevent bad things from happening.
There's some subtlety here. This is sort of a security vs safety issue.
Some people are just reckless, and that's a human problem that is best dealt with through a stern talking-to (or, ultimately, termination) rather than technical measures. You'd need an oppressive number of check-in rules to be even remotely effective at stopping this behaviour, and those would just make life miserable for everybody else.
Some people are new to the team and/or just plain inexperienced, and it takes time for them to absorb all the standard practices; until then they can innocently cause trouble. Even veterans will make mistakes. Low-friction guard rails can help keep those people from getting into too much trouble without being too onerous.