Readit News
ChrisArchitect · 2 years ago
The paper:

Doubly Efficient Private Information Retrieval and Fully Homomorphic RAM Computation from Ring LWE

https://eprint.iacr.org/2022/1703

llwj · 2 years ago
How efficient is it now? The last time I checked, FHE required minutes of computation and gigabytes of memory to store tiny amounts of data, and since it is not IND-CCA secure, I could not find any use cases.
sangel · 2 years ago
Very inefficient. Like wildly so. Specifically, if you have a very small database and you preprocess it with their techniques, the resulting database is petabytes in size. But the results are very beautiful.

There are no obvious ways to improve on this work, so it is not a matter of engineering. We really do need another breakthrough result to get us closer to practicality.
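For intuition on why the preprocessing blowup is the price being paid: in classical PIR, the server must touch every record on every query, otherwise it learns something about which index the client skipped. The "doubly efficient" result removes that linear scan at query time, at the cost of the huge preprocessed database. A toy two-server XOR construction (an illustration of the PIR functionality only, NOT the paper's single-server Ring-LWE scheme) shows where the linear scan comes from:

```python
import secrets

# Toy two-server PIR over a bit database. Each server sees a uniformly
# random bit vector, so neither learns the queried index i -- but each
# must XOR over the ENTIRE database. That linear scan is what "doubly
# efficient" PIR preprocesses away.

def query(db_size, i):
    """Secret-share the unit vector e_i into two random XOR shares."""
    a = [secrets.randbelow(2) for _ in range(db_size)]
    b = a.copy()
    b[i] ^= 1                      # a XOR b == e_i
    return a, b

def answer(db, share):
    """Server side: XOR of db bits selected by the share (full scan)."""
    acc = 0
    for bit, sel in zip(db, share):
        acc ^= bit & sel
    return acc

def reconstruct(ans_a, ans_b):
    return ans_a ^ ans_b           # <db,a> XOR <db,b> == db[i]

db = [1, 0, 1, 1, 0, 0, 1, 0]
a, b = query(len(db), 4)
print(reconstruct(answer(db, a), answer(db, b)))   # db[4], i.e. 0
```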

jengels_ · 2 years ago
Pretty efficient! E.g. a recent paper describes a system that does fully private search over the Common Crawl (360 million web pages) with an end-to-end latency of 2.7 seconds: https://dl.acm.org/doi/10.1145/3600006.3613134
vlovich123 · 2 years ago
Yeah, but that’s 1 request using 100% capacity of a 45-machine cluster. Relatively efficient but cost prohibitive.
rhindi · 2 years ago
FHE in general is efficient enough for many applications now. You can see some benchmarks here: https://docs.zama.ai/tfhe-rs/getting-started/benchmarks
PeterisP · 2 years ago
The benchmarks listed there are approximately 100 times slower than the original Intel 8088 microprocessor, released in 1979, on which the original IBM PC was based.

That microprocessor was efficient enough for many applications of general-purpose computing, but FHE still needs Moore's law to deliver a 100-fold increase in compute power just to reach that level.

This level of performance seems comparable to (and IMHO slower than) the very first electronic stored-program computer, the 1948 Manchester Baby.

jquery · 2 years ago
2 seconds to perform a single division on a 16-bit int? Am I reading that chart correctly?
godelski · 2 years ago
Not an answer, but a question that I hope someone can answer. Is the lack of speed because of a lack of optimization or compute? Is this something that could be fixed by an accelerator?

It's often hard to compare these things accurately. Many current methods already have hardware-specific acceleration, and researchers aren't always good at optimization (why should they be?). So is the problem with FHE more a lack of specialists putting in time to optimize it, or is it so computationally intensive that we need hardware to get better before it becomes practical? Look at llama.cpp and how much faster it is, or Stable Diffusion. Sure, both are kinda slow (diffusion's no GAN and SD's not straight diffusion) but very usable for many applications.

meindnoch · 2 years ago
>Is the lack of speed because of a lack of optimization or compute?

No.

>Is this something that could be fixed by an accelerator?

No.

_ks3e · 2 years ago
Based on my recollection of a conversation with the authors after their STOC talk: the RAM scheme is not efficient enough to be competitive with circuit-based FHE schemes; for problems with low RAM usage, existing circuit-based methods are more efficient, and problems with higher RAM usage are infeasible to run on any FHE method right now.

They were 50/50 on whether or not this technique could be made feasible in the same way that, say, the CKKS scheme is.

catilinam · 2 years ago
Isn’t FHE by definition not IND-CCA? Weak objection imo
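Right: malleability is inherent. Any scheme exposing a public Eval lets an IND-CCA2 adversary maul the challenge ciphertext (so the decryption oracle must answer it) and then undo the mauling on the plaintext. A toy additively homomorphic "cipher" (purely illustrative, not a real or secure FHE scheme) makes the generic attack concrete:

```python
import secrets

N = 2**32

# Toy additively homomorphic "encryption": a one-time pad mod N.
# NOT secure and NOT FHE -- but ANY scheme with a public homomorphic
# Eval admits the same generic IND-CCA2 attack shown below.

def keygen():
    return secrets.randbelow(N)

def enc(k, m):
    return (m + k) % N

def dec(k, c):
    return (c - k) % N

def eval_add_const(c, const):
    return (c + const) % N       # homomorphic add, needs no key

# IND-CCA2 game: the adversary mauls the challenge ciphertext (the
# oracle only refuses to decrypt the challenge itself), then undoes
# the mauling on the plaintext to recover the hidden bit b.
k = keygen()
m0, m1 = 7, 99
b = secrets.randbelow(2)
challenge = enc(k, m1 if b else m0)
mauled = eval_add_const(challenge, 1)    # mauled != challenge
recovered = (dec(k, mauled) - 1) % N     # decryption-oracle query
guess = 1 if recovered == m1 else 0
print(guess == b)                        # True: adversary always wins
```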
josh2600 · 2 years ago
I remain unconvinced of the practical applications of solutions like this.

These seem like academic toys.

Pre-processing a dynamic database is a non-starter. No database that is used for any real world use case is static at scale, especially not internet traffic.

I’m not expecting FHE that’s fast in my lifetime, but I’ve been wrong before.

Context: I led a team which built the first open-source, production-grade implementation of hardware-accelerated PIR, which is currently in use at scale.

enkid · 2 years ago
I don't think the purpose of this sort of research is to be immediately applicable; it's more to show a direction that could be useful in the future. Shor's algorithm has not been used practically, but it's hard to imagine modern cryptography without "post-quantum" being an important topic.
kevincox · 2 years ago
I think you are right that this paper isn't practically useful. But that is much like saying that taking one chip off a block of granite isn't art. It isn't, but the first chip enables further work until it is practical.
CreRecombinase · 2 years ago
That’s certainly not true in scientific computing. For us, dynamic is very much the exception rather than the rule.
karulont · 2 years ago
> hardware-accelerated PIR which is currently in use at scale.

Can you tell us more about that?

desdenova · 2 years ago
Good to see some people are actually solving this problem, unlike all those startups using it as a marketing buzzword.
captn3m0 · 2 years ago
The first time I came across PIR was via the 2014 blog post from Signal on how Private Contact Discovery was a hard problem and how it required PIR to be solved: https://signal.org/blog/contact-discovery/

Maybe this will help Signal get to a viable solution in a few years.

lucb1e · 2 years ago
Note that Signal decided not to use that:

> Unfortunately, for a reasonably large user base, this strategy doesn’t work because the bloom filters themselves are too large to transmit to mobile clients. For 10 million TextSecure users, a bloom filter like this would be ~40MB, requested from the server 116 times a second if every client refreshes it once a day.

They decided to run computations inside a 'secure' hardware environment instead (SGX specifically), so that they themselves can't get access to the computation, but it also doesn't need to be run client-side. I assume you meant the former, but the approach they actually use is fundamentally different from homomorphic encryption / PIR.
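The quoted figures hold up as back-of-the-envelope arithmetic (the variable names below are mine, not Signal's):

```python
# Sanity-check the figures quoted from Signal's post: 10 million users,
# a ~40 MB bloom filter, and ~116 requests/second for daily refreshes.
users = 10_000_000
filter_bits = 40 * 8 * 1_000_000        # 40 MB expressed in bits

bits_per_entry = filter_bits / users    # 32 bits per registered user
# At the optimal number of hash functions, a bloom filter's
# false-positive rate is roughly 0.6185 ** bits_per_entry,
# i.e. about 2e-7 here -- hence the generous 32 bits/entry.
fp_rate = 0.6185 ** bits_per_entry

requests_per_sec = users / 86_400       # every client refreshing daily
print(bits_per_entry, round(requests_per_sec))   # 32.0 116
```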

hanniabu · 2 years ago
I assume you're referring to the blockchain industry, but they've advanced cryptography a lot, specifically with zkproofs.
desdenova · 2 years ago
No, I'm referring to the FHE startups that are promising a lot of magic that isn't really feasible, while failing to solve what is feasible.
lucb1e · 2 years ago
I know of the concept of zero-knowledge proofs, but didn't know that the blockchain industry advanced cryptography a lot in that area. What are the practical applications of those new things? Or which new things are there to begin with? The Wikipedia article on zero-knowledge proofs doesn't seem to say.
nixpulvis · 2 years ago
> But using their private lookup method as scaffolding, the authors constructed a new scheme which runs computations that are more like the programs we use every day, pulling information covertly without sweeping the whole internet.

Doesn’t Google already accomplish this? Or is the key here that Google doesn’t do the sweeping in a covert way? So search boils down to two problems: building the index and using the index.

Or am I confused?

wolverine876 · 2 years ago
I understand it's inefficient, but could it be used by a well-resourced organization where confidentiality is extremely high-value and the database is small, such as intelligence, military plans, Apple's product plans, etc.?
narinxas · 2 years ago
there are much cheaper ways, chief among them pencil and paper
godelski · 2 years ago
How do people with pen and paper operate on 10k pieces of data? Which is honestly a rather small number. There's a reason we use computers, and a reason statistics accelerated after their invention.
paulpauper · 2 years ago
> You run Google’s algorithm yourself and secretly pull data from the internet when necessary.

yeah, isn't this the same as offline viewing? why do you need a new algorithm for that? we've known about this forever. obviously, if you download the database and access it locally, that is more secure than being online, but way slower. Regarding the library example, this too has easy solutions: check out many books at once, have a stranger check out the book, etc.

This article seems to do a poor job explaining what exactly this solves, only that it's revolutionary. The metaphors are not helpful.

Out_of_Characte · 2 years ago
How I interpreted this is that you download the Google search algorithm, then run it against their database over the internet. That would take ages or millennia, depending on how much data Google has.

You wouldn't need to download the entire database, and there would be a lot of uncertainty over which search result you actually used.

fyokdrigd · 2 years ago
so: not private, not efficient, and not a solution for homomorphic data.

basically a polynomial-factor hash of the index keys... you still need the entire db plus the larger hashed keys. doesn't help at all with implementing any of the outlandish claims.

I guess the merit is in proving the polynomial-factor construction they build on is secure. but that is not a game changer, as there are better alternatives, and nobody has solved the claimed problems with those either.