Size. A payload of one million characters should probably
be rejected without further analysis. As well as checking
the total size, it is good to check the sizes of the parts.
Another thing to check, one that is often overlooked, is Quantity.
Every loop should have a limit. There should be no unbounded object allocation. Usually, it's not the big things that get you but the sheer number of small things.
For example, a 20 MB email with 4 million empty attachments:
https://snyk.io/blog/how-to-crash-an-email-server-with-a-sin...
Further examples that affected ClamAV and SpamAssassin:
https://blog.clamav.net/2019/11/clamav-01021-and-01015-patch...
http://mail-archives.apache.org/mod_mbox/spamassassin-announ...
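To make this concrete, here is a minimal Java sketch of the kind of up-front size and quantity guard being discussed; the class name and the specific limits are invented for illustration, not recommendations:

    import java.util.List;

    // Reject oversized or over-numerous input before doing any real parsing work.
    final class MessageGuard {
        private static final int MAX_TOTAL_BYTES = 25 * 1024 * 1024; // whole message
        private static final int MAX_PART_BYTES  = 10 * 1024 * 1024; // any single part
        private static final int MAX_PARTS       = 10_000;           // quantity, not just size

        static void check(byte[] rawMessage, List<byte[]> parts) {
            if (rawMessage.length > MAX_TOTAL_BYTES) {
                throw new IllegalArgumentException("message too large: " + rawMessage.length + " bytes");
            }
            if (parts.size() > MAX_PARTS) {
                throw new IllegalArgumentException("too many parts: " + parts.size());
            }
            for (byte[] part : parts) {
                if (part.length > MAX_PART_BYTES) {
                    throw new IllegalArgumentException("part too large: " + part.length + " bytes");
                }
            }
        }
    }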
The only problem with this approach that I have encountered is that it requires defining fairly arbitrary limits that are either so small that some user will inevitably run into them, or so large that you're not getting as much protection as you think.
NB: I don't know what the answer to this is, but pretty much any time a system I have been involved with contains a "reasonable" limit, people want more than that pretty quickly!
The problem you describe is not actually specific to this approach; the same would apply to any Size check, or to any limit in general.
In fact, I think the problem is less relevant to a Quantity check than to a Size check: for Quantity, an order of magnitude of headroom above normal usage goes a long way.
For example, do you know of people who receive 10,000 attachments per email? This limit would be far and away above any reasonable usage and yet provide decent protection at the same time.
> The only problem with this approach that I have encountered is that it requires defining fairly arbitrary limits that are either so small that some user will inevitably run into them, or so large that you're not getting as much protection as you think.
Some sort of lazy or other on-demand evaluation can help a lot here. You set the limits large, and then only evaluate the part of it that's actively being accessed.
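A rough Java sketch of that idea, with placeholder types and an invented cap: the limit stays generous, but the cost of decoding a part is only paid when a caller actually iterates that far:

    import java.util.Base64;
    import java.util.Iterator;
    import java.util.List;

    // Attachments are decoded on demand rather than eagerly, and never beyond the cap.
    final class LazyAttachments implements Iterable<byte[]> {
        private final List<String> rawParts; // undecoded bodies (placeholder representation)
        private final int hardLimit;         // e.g. 10_000

        LazyAttachments(List<String> rawParts, int hardLimit) {
            this.rawParts = rawParts;
            this.hardLimit = hardLimit;
        }

        @Override
        public Iterator<byte[]> iterator() {
            Iterator<String> raw = rawParts.iterator();
            return new Iterator<byte[]>() {
                private int served = 0;

                @Override
                public boolean hasNext() {
                    return served < hardLimit && raw.hasNext();
                }

                @Override
                public byte[] next() {
                    served++;
                    // Decoding work happens only for parts that are actually accessed.
                    return Base64.getMimeDecoder().decode(raw.next());
                }
            };
        }
    }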
> The only problem with this approach that I have encountered is that it requires defining fairly arbitrary limits that are either so small that some user will inevitably run into them, or so large that you're not getting as much protection as you think.
Would an answer be to have limits that are tunable? I.e., something like "Attachment limit exceeded. Enter new temporary attachment limit for re-scanning: _____"?
I've never done this (I also tend to pick magic numbers out of the air), but presumably the idea is to look at the statistics of the data distribution and, Pareto-style, pick an upper bound that covers most cases?
Is this data available? One simple example: how long must a name in a database field be allowed to be?
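As a sketch of that approach in Java (the numbers and the percentile are invented for illustration): take a high percentile of the observed distribution and add an order of magnitude of headroom, rather than plucking a limit out of the air:

    import java.util.Arrays;

    final class LimitFromData {
        // Suggest a limit from observed usage: a high percentile times a headroom factor.
        static int suggestLimit(int[] observedCounts, double percentile, int headroomFactor) {
            int[] sorted = observedCounts.clone();
            Arrays.sort(sorted);
            int index = (int) Math.ceil(percentile * sorted.length) - 1;
            return sorted[Math.max(index, 0)] * headroomFactor;
        }

        public static void main(String[] args) {
            int[] attachmentsPerEmail = {0, 1, 1, 2, 3, 0, 5, 12, 1, 0, 2, 7};
            // 99th percentile of observed traffic, times 10x headroom
            System.out.println(suggestLimit(attachmentsPerEmail, 0.99, 10)); // 120
        }
    }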
> Every loop should have a limit. There should be no unbounded object allocation.
I've recently created some mini-languages (DSLs) for end users to control various software. I've found that using a synchronous transformational language [1], which has no unbounded loops (or anything) and is therefore guaranteed to complete, and which runs in a stop-the-world fashion (i.e. each invocation acts as if atomic, whether it actually is or whether implementation tricks like transaction rollback + retries are used), makes things easier both for development and for end users, especially ones who aren't programmers. I'm a huge fan of this approach for correct and secure software.
[1] https://en.m.wikipedia.org/wiki/Synchronous_programming_lang...
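This is not the commenter's actual DSL, but a tiny Java sketch of the bounded-iteration flavour being described: the only loop construct on offer takes a hard-capped count, so every program written against it terminates:

    import java.util.function.UnaryOperator;

    final class BoundedDsl {
        static final int MAX_REPEAT = 1_000; // invented cap

        // The DSL's only iteration primitive: apply `step` a bounded number of times.
        static <T> T repeat(int times, T initial, UnaryOperator<T> step) {
            if (times < 0 || times > MAX_REPEAT) {
                throw new IllegalArgumentException("repeat count out of range: " + times);
            }
            T value = initial;
            for (int i = 0; i < times; i++) {
                value = step.apply(value);
            }
            return value;
        }

        public static void main(String[] args) {
            // Equivalent of a DSL program "repeat 5: x -> x * 2", starting from 1
            System.out.println(repeat(5, 1, x -> x * 2)); // 32
        }
    }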
I put limits on things in the D compiler. One thing is certain, I'll get a bug report because the limit was exceeded. Who'd have thought programs would be built with a hundred thousand symbols in them? Or that 65,000 lines in a source file isn't enough?
One of the things I learned working on software with millions of users is that ANYTHING that does not have a limit will eventually be abused by someone - often not even maliciously.
Got a feature that lets users add multiple additional options? Someone will use it to add 10,000 options, and now pages which display them in a <select> widget will slow to a crawl.
I heard a good point about why some numeric thing in your project domain (say, InvoiceNumber) should not be a primitive int: Making it a primitive int implies that ALL the operations available on a primitive int make sense. But it can never make sense to, for example, divide one InvoiceNumber by another InvoiceNumber!
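A minimal Java sketch of that point (the type and its validation rule are illustrative): wrapping the int means only domain-meaningful operations exist, so nonsense like dividing two invoice numbers simply doesn't compile:

    // A domain primitive: an InvoiceNumber is not an int.
    public record InvoiceNumber(int value) {
        public InvoiceNumber {
            if (value <= 0) {
                throw new IllegalArgumentException("invoice number must be positive: " + value);
            }
        }

        // Expose only operations that make sense in the domain.
        public boolean isSameInvoice(InvoiceNumber other) {
            return this.equals(other);
        }
    }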
Scott Wlaschin goes into detail about this pattern in F# both on his website [1] and in his book, "Domain Modelling Made Functional". The difference between F# and C# in this regard is quite spectacular.
[1]: https://fsharpforfunandprofit.com/posts/conciseness-type-def...
Creating perfect types for every single input is also an anti-pattern. Most of our needs fall somewhere between "works pretty well" and "formally verified". Riding herd on the type system to prove ever more invariants about your system is usually a waste of time.
It's reasonable to put a little more effort into it along API lines. But there's a reason that the compiler doesn't make it easy to define an integer type that can hold values between -7 and 923091.
I remember taking time to make Radian and Degree classes just to make the code clean. Because it was a simple project, it was 10% of the lines but totally worth it.
Interesting, but there are quite a few bits that are Java-specific. I'm thinking in particular about exposing mutable objects: they mention that even if you expose an immutable reference, the object can still be mutated, same for collections. This is not the case in C++ or Rust, for instance, let alone in Haskell or similar FP languages.
Java-SDK-specific. There are several immutable data type libraries for Java if you should need them. The main downside of immutable data is that it's much slower than mutable data. A performance-oriented solution is borrowing, as in Rust. (After a decade of using Scala, that performance cost is one of its main downsides.)
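For the Java case specifically, the usual defence is to hand out copies or unmodifiable views instead of the internal object; a small sketch (Java 10+ for List.copyOf):

    import java.util.ArrayList;
    import java.util.List;

    final class Order {
        private final List<String> items = new ArrayList<>();

        void addItem(String item) {
            items.add(item);
        }

        // Returning `items` directly would let callers mutate our internal state,
        // even through a final field; List.copyOf returns an unmodifiable snapshot.
        List<String> items() {
            return List.copyOf(items);
        }
    }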
> Repave servers and applications every few hours. This means redeploying the same software – if an attacker has compromised a server, the deploy will wipe out the attacker’s foothold there
Yes, but keep in mind that the same attacker will be able to run the same attack successfully again, as long as they have an attack vector.
> Repair vulnerable software as soon as possible (within a few hours) after a patch is available.
Very good advice, and for that you need somebody to update vulnerable libraries and OS packages - like what Linux distributions do - unless you want to maintain hundreds of packages by yourself.
> the same attacker will be able to run the same attack successfully again, as long as they have an attack vector
I'm reminded of 'fileless malware'. A virus can reside exclusively in volatile memory. Worst case here would be to have a set of servers continually reinfecting each other even as you continually repave. I imagine the solution to that would be to repave the whole set and then switch over, like double-buffering. (Of course, not using files isn't exactly a strength here, but it seems apropos.)
https://en.wikipedia.org/wiki/Fileless_malware
>> Repave servers and applications every few hours.
> Yes, but keep in mind that the same attacker will be able to run the same attack successfully again
Exactly. Detecting that deployed files were tampered with can be tricky (one has to take updates into account, and some attacking code may be able to detect this analysis and nurture it with the original version of the files).
As an aside, s/nurture/neutralize/ eh? I'm seeing more and more malapropisms in online text. Has some common auto-correct thing gotten aggressive in guessing, badly, at misspelled words? (Auto-incorrect.)
Ada does a great job in this regard. Using some examples in the article:
>> For example, say that you want to represent the number of books ordered. Instead of using an integer for this, define a class called Quantity. It contains an integer, but also ensures that the value is always between 1 and 240
The Ada code to implement this is:
type Quantity is new Integer range 1 .. 240;
>> instead of just using a string, define a class called UserName. It contains a string holding the user name, but also enforces all the domain rules for a valid user name. This can include minimum and maximum lengths, allowed characters etc.
The Ada code to implement this is:
with Ada.Strings.Bounded;
package UserName is new Ada.Strings.Bounded.Generic_Bounded_Length (Max => UserName_Max_Length);
Dynamic predicates or even a string subtype could be used to further refine the UserName definition depending on exactly what restrictions are needed.
While it's not perfect, Ada does make it pretty easy to specify constraints on data types and will complain loudly when the constraints are violated.
OCaml, F#, Haskell, Elm, ... (anything with lightweight notation to define sum types [sometimes with only 1 variant], and a module system to limit who can construct them)
TypeScript is surprisingly good at this. It's not popular as a backend language, but shows that your choices aren't limited to pure FP languages and low-level systems programming languages.
TypeScript is explicitly bad at this; it is structurally typed, not nominally typed, meaning it's effectively useless at enforcing domain guards.
See the same program in Flow [1] (nominally typed) and TypeScript [2] (structurally typed).
In the case of Flow, the type can only be constructed via the class - thus enforcing the guards - whereas in TypeScript I can accidentally (or deliberately) bypass all the guards with a class that has an equivalent structure.
[1] https://flow.org/try/#0MYGwhgzhAEAKD2ECWAXJA3ApgSQHYswHNMAna...
[2] https://typescript-play.js.org/#code/MYGwhgzhAEAKD2ECWAXJA3A...
Not saying it's the least friction, but Golang can accomplish this pretty easily with interfaces, structs and receivers. In Haskell, you can use phantom types to accomplish this in a really elegant way, as is illustrated here: https://wiki.haskell.org/Phantom_type.
And a related question: How far is too far and/or impractical?
I was recently working on a backend application and modeling the database entities. The project uses UUIDs as primary keys. Should each entity have its own primary key type? `Location` gets a `LocationId`, `User` gets a `UserId`, etc., where they're really all just wrappers around a UUID?
Honestly, I thought about doing that a bunch of times at the start of the project, but I was pretty sure I'd get some harsh, sideways glances from the rest of the team.
I've always thought that a language like Idris based on Dependent Types would by far be the best language for this.
The problem is a non-trivial one even for 'simple' things like a person's name. Having a rule that takes into account languages, special characters, spaces, etc. is hard.
> I've always thought that a language like Idris based on Dependent Types would by far be the best language for this.
As someone who loves the idea of dependent types, it seems to me that this is the best theoretical solution, but maybe not the best practical solution. If solving a simple-seeming domain problem involves modelling, not just first-order types, but the whole theory of dependent types, then I think people are going to start looking for escape hatches rather than upgrading their mental models.
Static typing isn't really the only thing here. Strong typing would also be good.
E.g. C has a very weak type system, which is static. There's a lot of implicit conversion going on. Also the expressiveness of the type system is very limited (in C++ also).
OCaml, F#, Haskell and other functional candidates are strongly and statically typed, with very expressive type systems.
Idris with its dependent types would be ideal and goes even further than the above.
In embedded, Ada and Rust most likely offer strong enough static type systems.
How is this helpful? To automate it means there is another system that could be attacked, and it's a valuable one as it manages all the secrets, no? What's the story?
My personal experience, over 20 years, is that if something like credential rotation isn't automated, it simply doesn't happen. If it does happen, it's a major hassle, probably doesn't get done correctly (causes downtime, something gets missed, probably isn't documented, etc.). Also, there's a huge organizational inertia against doing it. So, for example, when an employee leaves, if you don't have this automated, it likely doesn't happen because "why would we do all this work, it's not like they were a bad person and they aren't stupid".
If you automate this and run it on an automated schedule (< 30 days), then it is pretty likely that it won't cause downtime unexpectedly, that you'll have monitoring in place to make sure it actually gets done, and that, even if you forget to trigger it for a specific reason (e.g., the aforesaid person leaving the organization), it will still happen within a reasonable period of time.
In terms of securing such a system... you need to make sure that you separate the system into appropriate pieces with limited access. So, for example, you want a job that is run with an account that only has access to rotate the credentials. It can't use them for anything, just rotate them. Services that consume those credentials should not be able to update them, just use them. You can then ensure that the process that rotates credentials executes in a highly locked-down part of your infrastructure.
Indeed, automating this process also encourages you to create processes with limited access, rather than relying on administrators who have so many responsibilities that you probably just throw them into the equivalent of a wide-open sudoers file and call it a day.
It sounds complicated, but if you have decent abstractions, this kind of stuff is actually pretty easy to accomplish.
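As a sketch of the separation of duties described above, in Java with hypothetical interfaces rather than any particular secrets product: the rotation job and the consuming services are handed different, minimal capabilities:

    // Two narrow views of the same secret store.
    interface SecretRotator {
        // May write a new version of a credential, but never read or use it afterwards.
        void rotate(String secretName);
    }

    interface SecretReader {
        // May read the current version, but never change it.
        String currentValue(String secretName);
    }

    final class RotationJob {
        private final SecretRotator rotator;

        RotationJob(SecretRotator rotator) {
            this.rotator = rotator;
        }

        // Run from a locked-down scheduler account on a short cycle (e.g. < 30 days).
        void run() {
            rotator.rotate("third-party-api-key");
        }
    }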
I'd be interested in seeing any end-to-end examples of how people are doing this in practice.
For example, suppose you're maintaining a SaaS application and you have a private key to access some third party API that certain parts of your back end code need. How do you automate this process, so you change your private key on a regular schedule and update all affected hosts so your application code picks up the new one?
Ideally this needs to avoid introducing risks like a single point of failure, a new attack surface, or the possibility of losing access to the API altogether if something goes wrong. Assuming the old key is immediately invalidated when you request a new one via some API, you also need a real time way of looking up the current active key from any of your application hosts when they need it, again without creating single points of failure, etc.
No doubt this could be done with enough work, but it doesn't feel like a trivial problem.
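One possible shape for the application side, sketched in Java with a placeholder fetch function instead of a specific secrets API: each host caches the current key and refreshes it when it ages out or gets rejected, so a rotation doesn't require redeploying anything:

    import java.time.Duration;
    import java.time.Instant;
    import java.util.function.Supplier;

    final class ApiKeyProvider {
        private final Supplier<String> fetchKey; // however you read the secret store
        private final Duration maxAge;
        private String cachedKey;
        private Instant fetchedAt = Instant.EPOCH;

        ApiKeyProvider(Supplier<String> fetchKey, Duration maxAge) {
            this.fetchKey = fetchKey;
            this.maxAge = maxAge;
        }

        synchronized String apiKey() {
            if (cachedKey == null || fetchedAt.plus(maxAge).isBefore(Instant.now())) {
                cachedKey = fetchKey.get();
                fetchedAt = Instant.now();
            }
            return cachedKey;
        }

        // Call this when the third-party API rejects the key (e.g. HTTP 401),
        // so the next request picks up the freshly rotated value.
        synchronized void invalidate() {
            cachedKey = null;
        }
    }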
If the skeleton of your system starts with these processes in place, then you can evolve the architecture while maintaining these invariants. If something is an invariant rule, then it needs to exist at the start of the system's life. If you patch the system later, it won't have proper coherence.
Two ways in which this is helpful. The list is not exhaustive:
1. The system that gives a service to the public is publicly accessible by default. The system that rotates its credentials doesn't need to be; it can sit behind your firewall, listening only on one port for SSH, with firewall rules allowing access only from a bastion host.
2. The credential rotator also connects to your server and drops credentials into it; you don't call from your server to the credential rotator (see point 1 above). This limits the attack surface.
You're right that everything is potentially vulnerable, and so is the credential rotator system. However, by rotating secrets this way, you shift the locus of security to a smaller place that you can defend much better.
I actually think it's talking about a case where the rotator and the rotated are both in the same place security-wise, like a scheduled task to change a login salt, both on the same VM/Docker container, etc.
The same goes for two apps connecting with a shared secret: making both of them change it won't expose any more components, but will add an additional layer of security (like Google Authenticator).
It's a security stance. Without rotating the secrets automatically, it becomes more likely that somebody will share a secret.
During a breach, if each service gets its own secrets, it becomes easier to trace the entry points and which secret got compromised. Once the system is closed again, the attacker automatically loses access to everything after 4h.
The sad truth is that we very rarely find out about a hack when it happens. Depending on the study, companies generally find out about it 6mo - 3yr after a hack starts, and usually it’s accidental that the company finds out. Rotating secrets buys you time by frustrating your attacker a little more and potentially giving you a little more signal to find among the noise in your alerts.
Credential rotation processes are yet another layer of “defense in depth” — the more layers you have, the more secure you can be.
As you rightly point out, it adds complexity. Having a credential generator+rotator means you are more resilient to the fallibility of humans choosing bad credentials or being too busy/lazy to do the task.
> Or that 65,000 lines in a source file isn't enough?
I would hope that was from auto-generated code somehow..?!!
> ANYTHING that does not have a limit will eventually be abused by someone
There should be a law named for this.
What about Limit's Law?
> which has no unbounded loops (or anything) and is therefore guaranteed to complete
See also total functional programming, which applies the same logic to runtime.
I love when people write articles like this. Thank you Henrik!
> it can never make sense to, for example, divide one InvoiceNumber by another InvoiceNumber!
I think people do it because it is quite some work to implement proper value types in common languages like JavaScript, Java and C#. Languages like F# or Kotlin give them to you almost for free.
The rule of thumb is exactly as you observed: if values have different semantics, they should have different types.
There's a whole well-reviewed book on exactly this: https://pragprog.com/book/swdddf/domain-modeling-made-functi...
> Should each entity have its own primary key type?
It makes sure that you never pass a `LocationId` where a `UserId` is expected; the type system literally will not allow it.
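A minimal Java sketch of what that looks like with records (the service and its method are invented): the wrappers are cheap at the call site, but they turn the mix-up into a compile error:

    import java.util.UUID;

    record UserId(UUID value) {}
    record LocationId(UUID value) {}

    final class UserService {
        void deactivate(UserId id) { /* ... */ }
    }

    class Demo {
        public static void main(String[] args) {
            UserId userId = new UserId(UUID.randomUUID());
            LocationId locationId = new LocationId(UUID.randomUUID());
            new UserService().deactivate(userId);
            // new UserService().deactivate(locationId); // does not compile
        }
    }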