ThePadawan · 5 years ago
I'm not big on bans.

What I am big on is forcing developers to make deliberate choices. That's why I like React's policy of naming functionality "dangerouslySetInnerHTML" or "__SECRET_DOM_DO_NOT_USE_OR_YOU_WILL_BE_FIRED".

If you add usages for these in a PR I'm reviewing without justification, it's not getting merged.

So why not make cryptographically unsafe random unsafeRandom() or shittyRandom() or iCopyPastedThisFromStackOverflowRandom()?

Macha · 5 years ago
This assumes writing crypto code is the most common use case for random numbers.

How often do you write crypto code?

vs

How often do people use random numbers + threshold for A/B tests? How often do game developers use random numbers for gameplay variety? How often is random used for animation variety? Do these use cases need the overhead of a cryptography RNG?

A former employer had the same issue as in the article: the security team implemented an automated vulnerability scanner in our GitHub Enterprise instance, and it spammed comments and marked the review as requiring changes on any merge request that touched a file using java.util.Random. It lasted a day before the security team was made to turn it off, as on our team (and many others) literally zero uses of random numbers required a secure random.

FalconSensei · 5 years ago
Also: if someone is working on crypto and doesn't know that random() isn't true random, should they be working on that?
tgsovlerkhgsel · 5 years ago
I genuinely don't see the reason why non-cryptographic random number generators exist outside of niche applications.

The main arguments I've seen are speed and determinism.

However, a cryptographically secure, deterministic PRNG can be built from hash or block cipher primitives that have hardware acceleration, making them quite fast. Seed (and potentially periodically re-seed) it from a strong source of randomness, and you've got a fast and cryptographically secure non-deterministic PRNG.
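
Roughly what I have in mind, as a toy sketch (Java, names mine; a real design should follow something like NIST SP 800-90A or djb's fast-key-erasure construction rather than this bare hash-counter):

    import java.nio.ByteBuffer;
    import java.security.MessageDigest;
    import java.security.SecureRandom;

    // Toy deterministic generator: each output block is hash(seed || counter).
    // Seed once from the OS (or from a fixed value if you want determinism),
    // then stay in userspace - no syscall per random number.
    final class HashCounterPrng {
        private final byte[] seed = new byte[32];
        private long counter;

        HashCounterPrng() { new SecureRandom().nextBytes(seed); }   // strong, kernel-backed seed
        HashCounterPrng(byte[] fixedSeed) {                         // deterministic variant
            System.arraycopy(fixedSeed, 0, seed, 0, seed.length);
        }

        byte[] nextBlock() throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(seed);
            md.update(ByteBuffer.allocate(8).putLong(counter++).array());
            return md.digest();                                      // 32 pseudorandom bytes
        }
    }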

I thought that "classic" PRNGs like the widespread Mersenne Twister even had issues that can cause practical problems when used in certain kinds of simulations (Monte Carlo, possibly) that rely on large amounts of random numbers, but I haven't been able to find a clear source for this.

I'm certainly defaulting to secure ones, and I'm surprised modern languages and libraries don't do this by default for their standard randomness functions.

jcelerier · 5 years ago
> It lasted a day before the security team was made to turn it off, as on our team (and many others) literally zero uses of random numbers required a secure random.

Can concur: currently approaching 400k SLOC of C++ in the repo. A few dozen different places crop up where random is needed (with a quick and dirty grep). Literally 0% is for secure stuff. Most of it has to be as fast as possible (and can be very low quality, as it just needs to be random/noisy enough to look random to human perception).

kortex · 5 years ago
This just kind of proves GP's point. Random APIs usually tell you what the RNG is, but not the why/how. Most people don't care if it's /dev/(u)random, Mersenne twister, PCG, LFSR, LCG, RDRAND, etc. They care about roughly 4 attributes:

- Is it good for crypto

- Is it fast

- Is it reproducible

- Is it portable

But fundamentally, it's about the use case and interface:

- I need secure random (strong, slow, secure)

- I need Monte Carlo (good enough, fast, reproducible)

- I need chaotic behavior for my game/stress test/back off protocol (usually can be barely random, fast, reproducible)

I think calling the last case InsecureRandom or RandomEnough is reasonable to convey "don't use me for secure purposes".

city41 · 5 years ago
Interestingly, a major aspect of video game speedrunning is figuring out how the game generates random numbers and then exploiting that knowledge. For example, speedrunners avoid all random battles in an RPG with this tactic. I'm not arguing games need true random, for the record.
mmazing · 5 years ago
That's all great and makes sense, but ...

What does that have to do with being verbose and letting developers know they are using an insecure method when it would apply to them?

If I'm writing code and using rng for gameplay variety, and then I notice that I have to use a function called "insecureRandom", at the very least I'm going to read up on an interesting aspect of computing and be a little more informed at the end of the day.

Khaine · 5 years ago
The other thing is, true randomness doesn't seem random to humans, which is why Spotify and others had to modify shuffle. So true random might not be appropriate for the use case.

What is appropriate to use comes down to context.

1MoreThing · 5 years ago
The creation of a trueRandom function certainly seems to solve this problem more than taking away a useful tool for cases where pseudo-random is good enough.
wbl · 5 years ago
How many of those applications are ones where a single AES encryption is too slow?
ThePadawan · 5 years ago
Did you mean to reply to my comment?

I don't disagree with anything you say, but it also doesn't refer to my comment.

zrm · 5 years ago
It's also a good idea to give safer things shorter names.

So make random() a CSPRNG (and an alias for SecureRandom() for people who want to be explicit) while InsecureFastRandom() is just what it says and has no other name. Then if you really need performance over unpredictability, it's there, but nobody is confused about what they're getting. And lazy people who don't like to type or pay close attention get the safe one.
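
Roughly the naming I mean, as a Java-flavoured sketch (names follow the comment, the xorshift body is Marsaglia's standard one, and none of this is any particular library's API):

    import java.security.SecureRandom;

    // The short, default name is the safe one; the fast, predictable generator
    // has to be asked for by its explicit name.
    final class Rng {
        private static final SecureRandom CSPRNG = new SecureRandom();
        private static long insecureState = System.nanoTime() | 1;   // non-zero, not thread-safe

        static long random()       { return secureRandom(); }        // default = safe
        static long secureRandom() { return CSPRNG.nextLong(); }     // explicit alias

        // xorshift64: fine for noise, jitter or shuffling test data; never for secrets.
        static long insecureFastRandom() {
            long x = insecureState;
            x ^= x << 13;
            x ^= x >>> 7;
            x ^= x << 17;
            insecureState = x;
            return x;
        }
    }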

cogman10 · 5 years ago
That'd be my preference.

random() should be the most universally applicable random which includes making it as secure as possible. Non-universally applicable randoms should be named accordingly.

bigiain · 5 years ago
Then we’ll end up with a csprng getting used in a tight loop iterating over every pixel in a raytracer...

“Lazy people who don’t want to type” are not the sort of people I want writing the code I might use or interact with that requires cryptographically secure random numbers...

cpeterso · 5 years ago
Firefox has a secret setting used in test automation called “turn_off_all_security_so_that_viruses_can_take_over_this_computer”.

https://searchfox.org/mozilla-central/rev/3ff133d19f87da2ba0...

makomk · 5 years ago
Yeah, and if I remember correctly one neat technique for exploiting security vulnerabilities in Firefox was to use them in order to set turn_off_all_security_so_that_viruses_can_take_over_this_computer to true, with obvious results.
wnevets · 5 years ago
hmm I wonder what would happen if I enable this setting...
nicoburns · 5 years ago
React's dangerouslySetInnerHTML is indeed a good way of handling this kind of thing. Rust's `unsafe` is another example of the same approach.
andoriyu · 5 years ago
uhm no.

Rust's "unsafe" is a pretty bad name and completely different reasoning behind using it. It doesn't mark something as "dangerously unsafe, don't use", to a consumer it indicates "exercise caution" and to a compiler it just allows 5 things:

    Dereference a raw pointer
    Call an unsafe function or method
    Access or modify a mutable static variable
    Implement an unsafe trait
    Access fields of unions
The point of "unsafe" in rust is to highlight which area requires more human attention... not to discourage its usage.

`dangerouslySetInnerHTML` is literally dangerous and allows XSS if used with outside input.

It also is faster than the other variant. The same is true for `random()`. Both can be used, when you know what you are doing, to gain some performance.

Meanwhile, `unsafe` Rust by itself is no different from safe Rust in terms of speed. You have no choice but to use it in the places it's supposed to be used.

setr · 5 years ago
I don't think rust's unsafe says anything about cryptographic security, beyond pointer-safety implications.
shadowgovt · 5 years ago
I agree, with the caveat that use of random for cryptography is actually a domain specific use case.

It's probably okay to leave the function as it is and just drill into people that if you're doing cryptography, you either need to know exactly what you're doing all the way down to the hardware or you need to leave it a task for somebody else more specialized than you. I, for one, never assume random() is cryptographically secure, but that might be because I grew up programming in the era when random was computed off of clock cycles since CPU startup, since there wasn't much other cheap entropy to lay a hand on ("battery-backed onboard date clock?! Oh, look who has AKERS money!").

iratewizard · 5 years ago
Beginner friendliness is something to remember, too. There are half a dozen words you could use to describe pseudoRandom(). Random() is easy for a first year or non-professional to remember.
dariusj18 · 5 years ago
Most of the time, the people who write and name the functions don't know it isn't secure or safe. So you would still need to ban random() even once the new name is implemented.
sp332 · 5 years ago
The "ban" can be evaded by telling semgrep to ignore it for one line. https://semgrep.dev/docs/ignoring-findings/ This doesn't really scale though - if someone bans it with a different tool, you'd have to tell each tool to ignore this line.
suzzer99 · 5 years ago
I have a required parameter to push our app to production, called: YES_I_HAVE_ALREADY_MERGED_THE_LIB_REPOS_AND_WAITED_FOR_THEM_TO_COMPLETE_BEFORE_MERGING_THE_APP_REPOS

Gets the point across and will still work when I'm long gone.

ddlsmurf · 5 years ago
Indeed, this is a stupid debate. Knives help us in the kitchen but also sometimes stab people - should we ban knives?
Spivak · 5 years ago
So you’re not big on bans but if you use dangerouslySetInnerHTML then it’s definitely not getting merged? Is that not a ban? Do you just not like when tooling enforces it?
ThePadawan · 5 years ago
No, as I said, it would raise a red flag. That flag can be lowered by justification, e.g. if you add types or constraints to only allow safe-enough parameters, etc.
dariusj18 · 5 years ago
They said "without justification"

nemo1618 · 5 years ago
The root problem here is the notion that you need to choose between "strong and slow" randomness vs. "weak and fast" randomness. If every language's random() was strong and fast, most developers would never have to think about it.

"Strong" randomness is often too slow because every time you ask for new entropy, you make a syscall. The solution is to use 32 bytes of strong randomness to seed a userspace CSPRNG. You can generate gigabytes of secure entropy per second in userspace. If you need deterministic entropy, just use the same seed.

This isn't a one-size-fits-all solution, of course. If you only need to generate a few keys now and then, it's marginally safer to make a separate syscall for each of them. If you're targeting some tiny SoC, then sure, use xorshift instead. But what we care about is the common case, and right now the common case is a developer choosing the weak, deterministic RNG because it's faster and has a more convenient API, and the secure RNG says "for cryptographic purposes", and well, this use case doesn't seem like cryptography, it's just a simple load balancer...

That decision should never need to be made.

moonchild · 5 years ago
> The solution is to use 32 bytes of strong randomness to seed a userspace CSPRNG

All cryptographic randomness generation should be performed by the kernel.

You always have to think about security because if you don't think about security you're going to get hacked. By all means, name the insecure randomness generation function ‘insecure_random’. It does help. But secure-by-default helps you only marginally because when building secure software you don't get to just use the defaults; you have to think about what they're doing.

You have to (for example) know and think about timing attacks even if you're using a cryptographic primitives library that's hardened against them, because it's really easy to introduce timing dependence into your own code and none of Daniel Bernstein or Tanja Lange’s careful designs will save you.

nemo1618 · 5 years ago
That's fair. There's no silver bullet for security. But we should not let the perfect be the enemy of the good. Everyone writing non-trivial systems should have some understanding of security; but the more components we make secure-by-default, the less those developers need to learn.
repsilat · 5 years ago
> You can generate gigabytes of secure entropy per second in userspace.

I haven't thought about this before, so please have patience:

I guess the "secure" qualifier does a lot of work in this sentence? That there's 32 bytes of "true entropy", but "secure entropy" is theoretically weaker but practically just as strong with reasonable assumptions about an attacker's computing resources.

So I'd guess the "secure" qualifier must mean something like "given any quantity of derived pseudorandom information, the seed bytes can't be efficiently deduced"? Pretty neat. (I had a knee-jerk disagreement until I re-read your post and saw that you said "32 bytes", not "32 bits". Quite plausible -- and cool -- that we have a good solution with just a small amount more seed randomness though.)

nemo1618 · 5 years ago
Correct -- the RNG output does not reveal anything about its input, just like the output of a strong cipher does not reveal anything about its input.

djb has a good article on the subject: https://blog.cr.yp.to/20170723-random.html

I've implemented his "fast-key-erasure CSPRNG" in Go: https://github.com/lukechampine/frand

smcameron · 5 years ago
There is no such thing as fast enough.
madsbuch · 5 years ago
Is there something I am overlooking here? Randomness is used for many things other than cryptographic purposes.

E.g. for randomised algorithms you need a fast source of randomness.

otabdeveloper4 · 5 years ago
No, you are not.

Cryptographic random() is an extremely niche use case that you shouldn't be using unless you're writing your own crypto libraries. (Don't do that.)

jcranmer · 5 years ago
That's not true.

To answer the question as to when you should use cryptographic random(), ask yourself "What is the worst that could happen if someone guesses the result of random()?"

If the answer is "I don't know," go cryptrographic. You'll save your butt if you didn't know it was important.

If the answer is along the lines of "someone could impersonate a user, or leak information they shouldn't see," for the love of all that is holy, use cryptographic. This is basically every scenario where you are using random to generate an ID of some kind, and while it's only truly critical if that ID is all you need for validation, it does provide another layer of security even if you also require other information to match before giving out elevated access.
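
For the ID case the secure version is barely any more code; a minimal Java-style sketch (helper names mine):

    import java.security.SecureRandom;
    import java.util.Base64;

    // Unguessable identifier: 16 random bytes from the OS-backed generator,
    // URL-safe encoded. If this token is all it takes to act as a user,
    // guessability is the whole ballgame.
    final class Tokens {
        private static final SecureRandom RNG = new SecureRandom();

        static String newId() {
            byte[] raw = new byte[16];
            RNG.nextBytes(raw);
            return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        }
    }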

If the answer is "it defeats the algorithm I'm trying to do" (think something like ASLR, where you're randomizing the offsets of addresses so that attackers don't know where things are located), well, the reason why you need to use cryptographic should be blindingly obvious.

If the answer is instead "they can reproduce my results," well, you shouldn't use cryptographic in this case. And that's not a lot of cases: Monte Carlo simulations, testing, fuzzing are the obvious poster children for this category, and indeed reproducibility in these cases tends to be a highly valuable feature rather than an anti-feature.

Cryptographic random is almost never harmful to your application, and almost always provides some benefit in reducing guessability of your system. You should err on the side of using cryptographic random(), and only not use it when you are sure that guessability will not harm security in any way and you know that the cryptographic nature actively harms your application.

stickfigure · 5 years ago
Server-side folks generate random identifiers and shared secrets all the time. Yes, it's niche, but not "extremely" and you don't use a crypto library for this (you use secure random!)
creata · 5 years ago
Say you're making an online game, and you need an RNG on your server. Above all, this RNG needs to be unpredictable, or someone will easily game it. Most non-cryptographic PRNGs are very predictable, so it's dangerous to use them.

I think this is a scenario that (a) isn't "extremely niche," and (b) warrants CSPRNGs.

mreome · 5 years ago
It's not that you shouldn't be using it necessarily, it's just that for many cases (games, procedural generation, graphics, many kinds of simulations) it's unnecessary and slow. In my experience, if someone doesn't know whether they need a cryptographically secure random(), or whether a given random() implementation is secure, then they (a) don't need it or (b) are trying to implement something they shouldn't be.
tedunangst · 5 years ago
What bad things will happen if I use cryptographic random for a non niche use case?
hannob · 5 years ago
The cryptographic randomness has practically no downside if you use it for non-cryptographic purposes. Not true the other way around. And I'm inclined to say, given how many misconceptions there are around randomness, that people aren't good at knowing whether they need secure randomness.

The only possible justification for insecure randomness would be performance, but you'd need to generate a lot of random numbers to even be able to measure that.

gwbas1c · 5 years ago
> The cryptographic randomness has practically no downside if you use it for non-cryptographic purposes

Cryptographic randomness is typically slower than other forms of randomness.

In all of the programming I've done in my career, I've only needed cryptographic randomness a few times. For the rest, a fast pseudorandom number generator seeded by the clock was the correct choice.

mreome · 5 years ago
My counter would be that if someone "doesn't know whether they need secure randomness" then the problem is not that random() is not secure, it's the fact that someone is doing something they really should not be doing in the first place.
stefan_ · 5 years ago
Obligatory mention here for the fine folks of systemd, who have made a properly seeded CSPRNG a requirement for merely booting a system and then kept bricking people's systems when it turns out finding that seed at boot time is a non-trivial problem. All for what, avoiding collisions in some hash table implementation?

I don't really care for the browser application, if you made a TLS connection in the first place obviously you better have the randomness and might as well make random() use that, but someone explicitly using a CSPRNG in a native application is a huge code smell on the level of implementing your own crypto.

magicalhippo · 5 years ago
As much as it's overkill for most people, I'm a fan of safe defaults so I say let random() be slow and good. It's better to find out your code is slow due to a slow random() than to find out it's broken because you didn't know and thought random() was really random.

If you need a fast source of randomness, for some Monte Carlo algorithm for example, then you know this and can pick a deliberate pseudo-random generator that fits your needs.

I worked on a Monte Carlo path tracer. Early on we swapped out the random number generator from the standard random(). Initially not for speed, but due to the poor distribution.

After optimizing other areas it became a bottleneck and we swapped it out again for a faster one.

duckerude · 5 years ago
It is. The question was how often that's the case. If 50% of the uses of random() are bad, then getting those fixed may be worth the cost of annoying the authors of the legitimate 50%.

It turned out to be much less useful than that. So they got rid of it.

analog31 · 5 years ago
Indeed, I use something like it from a vendor supplied C math library for a noise generator on an embedded app, where I really just care about its crude statistical behavior.

But short of saying "banned," any review of security critical code should include an explanation of where the random numbers are coming from and why they're trusted. Or in general for any code review: Why do you believe your numbers?

nullc · 5 years ago
> Eg. for randomised algorithms you need a fast source of randomness.

Though normal random() implementations are LCGs, which have poor distributions when you look only at the least significant bits or project them into multiple dimensions.

As a result they may make some randomized algorithms perform poorly!

tiborsaas · 5 years ago
Meanwhile in graphics programming land:

    float rand(vec2 co){
        return fract(sin(dot(co.xy,vec2(12.9898,78.233))) * 43758.5453);
    }

LolWolf · 5 years ago
Honestly, the one thing that got me into graphics (from physics and math) was just the incredible amount of: "you can literally do anything so long as you make it pretty in the end."

I took that as a life philosophy and it's been pretty great so far.

bobbylarrybobby · 5 years ago
I assume you’ve seen John Carmack’s square root hack? What a beaut
makeworld · 5 years ago
Could someone explain this? How is it used, why does it take a vector as input, where do the magic numbers come from, etc.
hyperman1 · 5 years ago
random() typically works by storing/modifying some state, so every call with the same argument results in different numbers.

In shaders, the same code is executed in parallel for potentially every pixel. Storing state would mean pixels could only be calculated serially, slowing things down.

Hence you need a random-ish function that depends only on its input. Very low RNG quality is not a blocker, as long as things look good.

So in shaders you see a lot of random generators which simply take the pixel coordinate or something else that distinguishes 2 pixels and do some nonsense operations on them.
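
Outside a shader, the same stateless trick might look something like this (a purely illustrative Java port of the GLSL one-liner above; the constants are the usual arbitrary ones):

    final class ShaderStyleNoise {
        // "Random" as a pure function of the coordinate: no state, so every pixel
        // can be computed in parallel, and the same input always yields the same value.
        static float coordNoise(float x, float y) {
            double v = Math.sin(x * 12.9898 + y * 78.233) * 43758.5453;
            return (float) (v - Math.floor(v)); // fract(): keep only the fractional part
        }
    }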

tiborsaas · 5 years ago
It's written in GLSL which is a C like language for shaders designed to be executed on the GPU.

The 2D vector used for input is to represent pixel coordinates mapped from 0 to 1 or -1 to 1 on both axes.

The magic numbers are nothing special, they are just large numbers to make the result unpredictable for the given input.

The top-level function is fract, which takes the fractional part of a number. So if the result of the inner computation is twisted enough, it will be hard to trace it back to the original values. There are lots of variations on these one-liners; most of them do a great job of producing noise.

jl2718 · 5 years ago
Is this actually faster than seeding the PRNG with co.xy? Or xxhash({co.xy,time})?
admax88q · 5 years ago
Just make Math.random() cryptographically secure, now all your apps are fixed, and no existing code broken. I can't imagine anything relying on Math.random() being "less" random than a CSRNG.

Why must CSRNGs always have alternative obtuse APIs. We're still stuck on C style srand() + rand().

Cryptography is so ubiquitous now that failure to provide cryptographically secure random numbers should be viewed as a hardware flaw.

throw0101a · 5 years ago
Some folks purposely want random-ish results. When OpenBSD was changing the behaviour of its legacy POSIX random functions it was observed:

    This API is used in two patterns:
    1. Under the assumption it provides good random numbers.
       This is the primary usage case by most developers.
       This is their expectation.
    2. A 'seed' can be re-provided at a later time, allowing
       replay of a previous "random sequence", oh wait, I mean
       a deterministic sequence...
They went through the code, especially the third-party packages/ports, to identify uses:

> Differentiating pattern 1 from pattern 2 involved looking at the seed being given to the subsystem. If the software tried to supply a "good seed", and had no framework for re-submitting a seed for reuse, then it was clear it wanted good random numbers. Those ports could be eliminated from consideration, since they indicated they wanted good random numbers.

> This left only 41 ports for consideration. Generally, these are doing reseeding for reproduceable effects during benchmarking. Further analysis may show some of these ports do not need determinism, but if there is any doubt they can be mindlessly modified as described below.

* https://lwn.net/Articles/625562/

admax88q · 5 years ago
This is exactly what I mean about being stuck in the C mindset. You're looking at the problem through the lens of what this giant pile of ancient C software does.

Why should newer languages take the same approach to APIs that these old code bases did? It's not like we're porting all those programs to JS. Those C APIs were written long before hardware could provide fast good randomness, heck even before cryptography was standard practice instead of a special use case.

Not to mention in JS you can't even seed the random number generator. If you want predictable "random" numbers, you should have to jump through additional hoops. By default random numbers should be cryptographically secure.

EDIT: It's also worth mentioning that from your reported dataset, 41 of 8800 programs analyzed used srand to get a repeatable set of "random" numbers. That's 0.47%. I'm happy to break less than half a percent of software if it helps prevent the far more ubiquitous failures of software using insecure random numbers.

nixpulvis · 5 years ago
Unless I'm wrong, the only valid reason to not use a better random number generator is for performance / simplicity, which then demands benchmarks and evaluation.
antonyh · 5 years ago
Seeded random is a glorious thing in the right circumstances. As an example, I've used it for 'random' testing sequences (jumbling up a list of inputs) but in a way I can later re-run EXACTLY the same test.

It's also useful for other data generation tasks where the output can basically be saved as a seed, making it lightweight and easy to store - it could be written on a scrap of paper in seconds.
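
A minimal sketch of that pattern (Java, helper name mine): the seed is the only thing you store, and replaying it reproduces exactly the same "random" order.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;

    final class ReproducibleShuffle {
        // Deterministic shuffle: same seed in, same "random" test order out.
        static List<String> shuffledInputs(List<String> inputs, long seed) {
            List<String> copy = new ArrayList<>(inputs);
            Collections.shuffle(copy, new Random(seed));
            return copy;
        }
    }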

Maybe it's a bad name though - it should be called seededRandom() or semiRandom() or deterministicRandom(). Or perhaps it should be truly random if no seed is set. Hard to know. Maybe the true random only needs to be the seed to a deterministic random, reset on a frequent basis in some cases.

Then there's the category of casual random that doesn't matter, like random colours just for the sake of it. It doesn't need to be a secure safe random.

And... assuming that any random function is truly random is a mistake anyway. Base it on hardware, and it may fail. Base it on software, and where's the source of entropy? Add to that the possibility of bugs/defects in the implementation, and it's possible that it might not be as random as it needs to be. It's better to assume ALL RNGs are PRNGs, with the caveat that some are decidedly better than others.

So no I wouldn't support a ban on it, nor would I support removing it from any language/runtime where it might be useful.

dbcurtis · 5 years ago
Came here to say this. I have spent a lot of time in hardware validation. Pseudo-random (explicitly NOT random) sequences are hugely useful.

I once had a lights out server room of 60 servers whose entire purpose was to take skeletonized tests and a seed for a pseudo-random function and generate a test instance. That test instance went to one of a dozen test jigs. What was recorded was: pass/fail, the git sha of the template, and the seed. Any failing test could be reproduced at any time from just the git sha and the seed. True random would have killed that whole methodology.

antonyh · 5 years ago
That sounds awesome, and the use of a repeatable random vital. Another example of random in a non-cryptography context where it's unpredictable under normal operation, but completely predictable when needed. If you wanted you could run the same test on all the test jigs with different seeds, safe in the knowledge that you could re-run all of them exactly again and again if required. Or you could add problematic seeds to a list for repeated retest with future versions. So much power and freedom!
argvargc · 5 years ago
notRandom() might be usefully descriptive in this case.
not2b · 5 years ago
There are many uses of random() that do not require cryptographic security: simulation, simulated annealing, sound synthesis, digital signal processing and the like. It would be a nuisance if developers of those kinds of software have to fight warnings because developers of completely different applications can't get it right.
not2b · 5 years ago
Further, such users usually want to be able to repeat a test case: start from the same seed, get the same sequence. They don't want true randomness, they want a repeatable sequence with good statistical properties.
bob1029 · 5 years ago
It's all about the discipline of the team in the end... You can ban things all day, but it just takes 2 developers deciding they don't give a shit to code, review & merge that use-fast-random-for-session-token PR. There is more than 1 way to get something that is "random", so basic string matching for methods you don't like is certainly not a guarantee.

In our organization the policy is very simple. We have a static method, available throughout, called CryptographyService.GenerateCsprngBytes(count = 64). All developers are aware that any security-sensitive requirements around entropy must use this method. It wraps the OS-level offering and encourages a minimum reasonable level of entropy with a default count.
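
The method name above suggests .NET; in Java terms the same shape would be roughly this (a sketch of the idea, not the actual implementation):

    import java.security.SecureRandom;

    // One well-known entry point wrapping the OS-level generator,
    // with a sane default amount of entropy.
    final class CryptographyService {
        private static final SecureRandom OS_RNG = new SecureRandom();

        static byte[] generateCsprngBytes() { return generateCsprngBytes(64); }

        static byte[] generateCsprngBytes(int count) {
            byte[] out = new byte[count];
            OS_RNG.nextBytes(out);
            return out;
        }
    }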

I don't see any reason to make it more complicated than this. Communication with your team is more important than writing check-in rules to prevent bad things from happening.

As for other uses of Math.Random, et al., we don't have any official policy. Because we have clearly communicated the awareness that security-sensitive applications should always use the secure method, we don't need to add a bunch of additional band-aids on top. Enrich the team before the process.

pdpi · 5 years ago
> Communication with your team is more important than writing check-in rules to prevent bad things from happening.

There's some subtlety here. This is sort of a security vs safety issue.

Some people are just reckless, and that's a human problem that is best dealt with through a stern talking to (or, ultimately, termination) rather than technical measures. You'd require an oppressive amount of check-in rules in place to be even remotely effective at stopping this behaviour, and those would just make life miserable for everybody else.

Some people are new to the team and/or just plain inexperienced, and it takes time for them to absorb all the standard practices, so in the meantime they can innocently cause trouble. Even veterans will make mistakes. Low-friction guard rails can help keep those people from getting into too much trouble without being too onerous.