Is there such a thing as "private, interactive databases" for SaaS?

You need access to their data to process it, so any layer of indirection (like a database they control) adds complexity without meaningful benefit. For clients with strict data control requirements, self-hosting the whole system is the standard solution (with a very high licensing fee).

Something to keep in mind is that some clients are not operating in good faith: their goal isn't to work together to find a solution but to present roadblocks. The reasoning can be complicated: perhaps there are internal politics around which solution to use, or perhaps your solution is receiving pushback because it's not the preferred solution of one stakeholder. You'll probably never know the true motivations, so it's important not to get caught up in engineering a solution to a problem that doesn't really exist.

You've mentioned that the data you need access to is code: GitHub is a perfect comparable. GitHub's cloud service is used by the majority of companies with code; in fact, I'd guess even your clients are using GitHub's hosted services. If the problem is that your company doesn't have the reputation necessary to give these clients confidence that you can securely manage their code, that may just be a sign that right now these clients aren't the right fit for you, and you should work with less antsy clients until you have built up the credibility.

ukoki · a year ago

> their goal isn't to work together to find a solution but to present roadblocks. The reasoning can be complicated..

Or as simple as “the less I appear to value this solution, the lower the supplier will estimate my maximum price for it”

alliewithane · a year ago

That is very valid. My problem is that a large portion of my possible clients seemed to be happy with the idea of the solution I provided. I was looking into it on the technical side because I had somewhat validated it with my current client base.

Self-hosting seems like the most reliable option for the time being (or executing functions on the encrypted data without decrypting it). However, is it standard practice to use Kubernetes to give them a preconfigured database that they can deploy on my own cloud? I wouldn't access the code except temporarily, through a little script that ships alongside the database in the pod that they "self host" and talks to my cloud. Would that be considered standard practice?

aimazon · a year ago

No, that wouldn't be considered standard practice. Fundamentally, if you are able to control the code that executes then you can exfiltrate the data regardless of how it is stored. The reason self-hosting is a secure way to execute code against data is because it removes the code from your control: with self-hosting, you would give your code over to the client and then they would run it in their environment.

Providing your customers with their own database in your environment is a method for segregating their data and ensuring that there's no unintentional co-mingling of their data with other customers (which is a common problem in a multi-tenant environment) but it does not protect the customer data from being accessed by you: if code you are executing can access the data, then you can access the data.

Reading between the lines ("a large portion of my possible clients seemed to be happy with the idea of the solution I provided") it sounds like my initial understanding of the situation was incorrect: I thought that you had been asked to build this specific architecture by your clients but it sounds like it's the opposite: you've had an idea, come up with an architecture and then validated that idea with potential clients by describing the architecture? Is that correct?

If that's the actual situation, I think this is a much simpler problem to solve. Architecture is architecture, it isn't a part of the solution, it's a means to an end. There are a very small number of clients who may have strict security/compliance requirements that do necessitate this sort of complexity (which is where self-hosting comes in) but for the majority of clients, how the product works is immaterial, they care only about the results.

Realising that you've made a terrible mistake when building a system using the architecture you designed 6 months ago is a rite of passage. It is the process: every vision you have today for how your system will work is probably going to be wrong 6 months from now. That's completely normal: you will learn more about how your system should work in 1 month of building than you would in 6 months of planning.

Try to take a step back from thinking about architecture. One of the biggest dangers when working on an early stage technology product is committing yourself to a technical direction that then dictates the product direction. If, for example, you decide today to build a system in which clients self-host the database that your code accesses, and then you decide you want to build a feature that requires 10x as many queries to the database, oops, you can't build that, because it would require your clients to upgrade their self-hosted database resources, and getting them to do that will be all but impossible.

If you want to share more about your idea, I can outline some ideas about how I might approach building it in a cheap way that allows for validating the idea. There are exceptions but nowadays, given the maturity of the software development space, most ideas can be built and launched to validate with real customers in 1 month. If your vision for how you'll build something requires 3, 6 or 12 months to get customers using it, it's probably overcomplicated.

Two options I’ve seen:

Customer Managed Keys - You have everything encrypted in your database via a key the customer has. You request (likely automated) that key every time you process the data. They can revoke at any point, and have an audit log of every access.

Self Hosting - Let the customer host your solution themselves or automate spinning up a cloud environment for them that they have full control over.

Both are kind of a pain to implement, but that lets you charge more for these enterprise features.
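
To make the customer managed keys flow concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `CustomerKeyService`, `process_record` and the requester names are hypothetical, and the cipher is a toy SHA-256 keystream standing in for a real audited cipher. It only shows the shape of the idea: the vendor asks for the key per job, the customer logs every request, and revocation cuts off access.

```python
import hashlib
import secrets
from datetime import datetime, timezone


class CustomerKeyService:
    """Hypothetical stand-in for a key service the customer runs and controls."""

    def __init__(self):
        self._key = secrets.token_bytes(32)
        self._revoked = False
        self.audit_log = []  # entries are (timestamp, requester, granted)

    def request_key(self, requester):
        granted = not self._revoked
        # Every request is logged, whether or not it is granted.
        self.audit_log.append((datetime.now(timezone.utc), requester, granted))
        if not granted:
            raise PermissionError("customer has revoked key access")
        return self._key

    def revoke(self):
        self._revoked = True


def _keystream(key, nonce, length):
    # Toy keystream built from SHA-256 in counter mode. NOT real cryptography;
    # a production system would use an audited authenticated cipher instead.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def encrypt(key, plaintext):
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def decrypt(key, blob):
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def process_record(key_service, blob):
    # The vendor fetches the key for each job and does not keep it around.
    key = key_service.request_key(requester="vendor-worker")
    return decrypt(key, blob).upper()  # placeholder for the real processing
```

Once the customer calls `revoke()`, `process_record` raises `PermissionError` and the stored blobs stay unreadable to the vendor, while `audit_log` still records the refused attempt.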

alliewithane · a year ago

I see, I heard about "fully homomorphic encryption" which is faster to implement and allows you to run code on encrypted data but the time complexity is O((10^6) * n) which is insane.

bobbiechen · a year ago

Confidential Computing also provides data-in-use protection and has a significantly more realistic overhead, often <10% in real-world workloads I've seen. However, in this case you might want to combine it with customer managed keys (BYOK) or self-hosting anyways - otherwise the customer has no opportunity to perform remote attestation and prove you're really running in Confidential Computing.

The visualization about halfway down https://www.anjuna.io/solution/secure-ai (my employer) is an example of the self-hosted flavor of this. Happy to discuss deeper, my contact info is in my bio.
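
A rough sketch of why the customer-held key matters for attestation, in Python. This is a simplification under stated assumptions: `release_key`, `APPROVED_BUILD` and the measurement format are hypothetical, and real remote attestation relies on a hardware-signed document whose signature must also be verified, which this skips entirely. It only shows the gating step: the customer releases the key when the reported measurement matches the build they approved.

```python
import hashlib
import hmac

# Hypothetical: the vendor build the customer has reviewed and approved.
APPROVED_BUILD = b"vendor-app v1.0 (reviewed build)"
EXPECTED_MEASUREMENT = hashlib.sha256(APPROVED_BUILD).hexdigest()


def release_key(reported_measurement, customer_key):
    """Customer-side check: hand over the key only to the approved build."""
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(reported_measurement, EXPECTED_MEASUREMENT):
        raise PermissionError("measurement does not match the approved build")
    return customer_key
```

Without a step like this on the customer's side, the customer has only the vendor's word that the workload is running inside the protected environment.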


roetlich · a year ago

> O((10^6) * n)

Isn't that O(n)? Is there a typo or am I missing something?
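
For reference, under the usual definition of big-O the constant factor drops out, which can be written as:

```latex
10^{6} \cdot n \le c \cdot n \quad \text{for all } n \ge 1, \text{ with } c = 10^{6}
\quad\Longrightarrow\quad O(10^{6} \cdot n) = O(n)
```

So asymptotically it is linear; presumably the parent's point is about the constant itself, i.e. a slowdown on the order of a million times compared with working on plaintext.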

curious_curios · a year ago

rozenmd · a year ago

Do they hate that it's unencrypted in the DB, or that the DB's storage itself is unencrypted?

(for my business, anyway) I've found this wording to be enough for bigger customers:

Data is stored on AWS RDS, encrypted at rest by an industry standard AES-256 encryption algorithm (more on that here: https://aws.amazon.com/rds/features/security/)

My main problem is that I need to do operations on the data while it's in the DB. This means that I cannot leave it encrypted end-to-end there.

atmosx · a year ago

When RDS is encrypted at rest, it means that the data stored in the database is encrypted while it resides on disk. This means the data is protected against unauthorised access to raw storage.

The data accessed by the app is not encrypted, so you can still work on the data as you would usually do. It's mostly a compliance thing. Not sure what level of security it _actually_ brings to the data itself, but most companies are okay with "encryption at rest".

cr125rider · a year ago

Sure you can. You just can’t do zero knowledge encryption.

JambalayaJimbo · a year ago

Confidential Computing is a way in which cloud providers let their customers encrypt data “in-use” - that might be what you’re looking for.

Sounds like it's exactly what I need. Thank you!

tonygiorgio · a year ago

Yeah exactly this. Especially if you need to programmatically process that data too. You can even let the customers provide their own managed key too (such as AWS externally managed KMS) in combination with something like AWS nitro enclaves.

I’ve enjoyed building on nitro myself and most things should run in it just fine, just need to build the networking vsock proxy into the nitro image for anything that needs networking (such as DB, where you store the encrypted at rest data).

chiph · a year ago

Are you using one database per customer or a shared database (with an additional key on the tables)?

Because enterprise clients are going to want their own database, which has its own licensing and operating costs that you should be building into your price. And since they will have their own database, it can be encrypted with a key that is unique to them.

For small business customers, a shared database is the only way to stay profitable.
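
To illustrate the two layouts, here is a small sketch using Python's built-in `sqlite3`. The table name, tenant names and helper functions are made up for the example, and in-memory SQLite stands in for whatever database you actually run.

```python
import sqlite3

# Option 1: shared database, every row tagged with a tenant key.
shared = sqlite3.connect(":memory:")
shared.execute("CREATE TABLE files (tenant_id TEXT NOT NULL, path TEXT, body TEXT)")
shared.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [("acme", "main.py", "print(1)"), ("globex", "app.py", "print(2)")],
)


def shared_files(tenant_id):
    # Every query must remember the tenant filter; forgetting it leaks rows.
    rows = shared.execute(
        "SELECT path FROM files WHERE tenant_id = ? ORDER BY path", (tenant_id,)
    )
    return [path for (path,) in rows]


# Option 2: one database per customer, so there is no filter to forget.
tenant_dbs = {}


def tenant_db(tenant_id):
    # Each connect(":memory:") call creates a separate, independent database.
    if tenant_id not in tenant_dbs:
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE files (path TEXT, body TEXT)")
        tenant_dbs[tenant_id] = db
    return tenant_dbs[tenant_id]


tenant_db("acme").execute("INSERT INTO files VALUES (?, ?)", ("main.py", "print(1)"))
tenant_db("globex").execute("INSERT INTO files VALUES (?, ?)", ("app.py", "print(2)"))


def isolated_files(tenant_id):
    rows = tenant_db(tenant_id).execute("SELECT path FROM files ORDER BY path")
    return [path for (path,) in rows]
```

Either layout only separates customers from each other. As noted upthread, it does nothing to stop the vendor's own code from reading the data.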

VTimofeenko · a year ago

Disclaimer: I work for Snowflake.

This idea (customer owns the data, code is deployed next to the data, data never leaves customer perimeter) is the exact use case for the native application framework:

https://docs.snowflake.com/en/developer-guide/native-apps/na...

williamtrask · a year ago

I lead an open source nonprofit which deploys things like this. Feel free to shoot me a DM on Twitter. Handle is @iamtrask

cocoa19 · a year ago

Why do they hate the idea?

It’s not clear what the core problem is. Are they contractually or by law obligated to comply with security/privacy requirements? Are they afraid you’ll misuse their data (steal their business, etc.)?

If you can be explicit about what “hate” means, you can find a solution, or decide this is not a potential customer.

They are not comfortable with the fact that I can look at their code base whenever I want.