http://www.cs.columbia.edu/~orestis/vbf.c
It has very high throughput for L2-resident filters, as long as most queries return "false" and you can use a bulk API. It was, IIRC, about 4x faster than a hand-made "horizontal" SIMD Bloom filter, and 20x faster than cuckoo filters.
By "horizontal" SIMD, I am using the language of a follow-up paper by the same team at Columbia, "Rethinking SIMD Vectorization for In-Memory Databases", http://www.cs.columbia.edu/~orestis/sigmod15.pdf, http://www.cs.columbia.edu/~orestis/sigmod15source.zip. In that paper, they call "vertical" SIMD for hash-based containers "process[ing] a different input key per vector lane". "Horizontal" SIMD is putting the same key in each lane.
I suspect the results in this paper could be improved with more modern gather techniques on newer x86-64 processors.
By that, do you mean the AVX2 'gather' type instructions? If not, I'd be interested to know what those techniques are.
As for AVX2 gathers, I had to look this up recently and it sounds like they're about as fast as manually unpacking the vector and performing scalar loads. That is to say, they're decidedly not fast. On the other hand, it sounds like (as of Skylake) they're bottlenecked on accesses to the L1 cache, so they're about as fast as they reasonably could be.
Source: https://stackoverflow.com/questions/21774454/how-are-the-gat...
Not sure about performance on Zen, but I would imagine it's similar?
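For reference, the two formulations being compared there look roughly like this (illustrative sketch, not benchmarked code):

    #include <immintrin.h>

    /* Hardware gather: one instruction, eight L1 accesses underneath. */
    __m256i gather_hw(const int *base, __m256i idx)
    {
        return _mm256_i32gather_epi32(base, idx, 4);
    }

    /* Manual equivalent: unpack the index vector, do eight scalar loads. */
    __m256i gather_manual(const int *base, __m256i idx)
    {
        int i[8], out[8], k;
        _mm256_storeu_si256((__m256i *)i, idx);
        for (k = 0; k < 8; k++)
            out[k] = base[i[k]];             /* eight scalar L1 hits */
        return _mm256_loadu_si256((const __m256i *)out);
    }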
Also, a while ago I realized that you can tweak the Remez algorithm to minimize relative error (rather than absolute error) for strictly-positive functions - it's not dissimilar to how this blog post does it for Chebyshev polynomials, in fact. I should really write a blog post about it, but it's definitely doable.
So combining those two, you should be able to get a good "relative minimax" approximation for pi, which might be better than the Chebyshev approximation depending on your goals. Of course, you still need to worry about numerical error, and it looks like a lot of the ideas in the original post on how to deal with that would carry over exactly the same.
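For what it's worth, the tweak itself is small (my notation, not the blog post's). The standard Remez exchange solves, at trial points x_0 < ... < x_{n+1},

    p(x_i) + (-1)^i E = f(x_i),        i = 0, ..., n+1

which equioscillates the absolute error f - p. For strictly positive f, minimizing the relative error (f - p)/f instead just weights the alternating term by f:

    p(x_i) + (-1)^i E f(x_i) = f(x_i)

Both systems are linear in the coefficients of p and in E, so the rest of the exchange iteration goes through unchanged.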
That doesn't guarantee that the orbits are perfectly periodic, I suppose, but it does suggest that the orbits are stable with respect to rounding errors up to those you get from using doubles.
Depending on the scale, shaving a few kB here and there can amount to significant savings in the long run.
Say you have a document structured like [boring data] [secret data] [boring data]. I don't know if any existing compressor lets you do this, but the gzip file format (really the 'deflate' format used inside it) allows you to encode this (schematically) as follows:
[compressed boring data] || [uncompressed secret data] || [compressed boring data]
where each || is (i) a chunk boundary (the Huffman compression stage is done per chunk, so this avoids leaks at that level), and (ii) a point where the encoder forgets its history, i.e. you simply ban the encoder from referencing across the || symbols.
If you wanted, you could even allow references between different "boring" chunks (since the decoder state never needs resetting), just as long as you make sure not to reference any of the secret data chunks.
Edit to add: Also, if the "boring" parts are static, you can pre-compress just those chunks and splice them together, potentially saving you from having to fully recompress an "almost static" document just because it has some dynamic content.
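For concreteness, here is a minimal sketch of that layout using zlib (error handling elided; the caller is assumed to have called deflateInit() and pointed next_out/avail_out at a big enough buffer). Z_FULL_FLUSH both ends the current deflate block and empties the window, so no back-reference can cross it; switching to level 0 emits the secret as stored (uncompressed) blocks:

    #include <zlib.h>

    void compress_with_secret(z_stream *s,
                              unsigned char *boring1, unsigned n1,
                              unsigned char *secret,  unsigned ns,
                              unsigned char *boring2, unsigned n2)
    {
        /* boring prefix: compressed, then boundary + history reset */
        s->next_in = boring1; s->avail_in = n1;
        deflate(s, Z_FULL_FLUSH);

        /* secret: stored blocks only, another boundary after it */
        deflateParams(s, Z_NO_COMPRESSION, Z_DEFAULT_STRATEGY);
        s->next_in = secret; s->avail_in = ns;
        deflate(s, Z_FULL_FLUSH);

        /* boring suffix: back to normal compression */
        deflateParams(s, Z_BEST_COMPRESSION, Z_DEFAULT_STRATEGY);
        s->next_in = boring2; s->avail_in = n2;
        deflate(s, Z_FINISH);
    }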
"the curvature of each lens changes,"
"responding to input from an infrared distance meter build into the bridge of the frames"
So the glasses have to know your personal prescription. I know very little about optic, but one thing I've been wondering for a long time is why can't there be a camera pointed to your eyes that adjust the lens until the image on your eye is in focus?
That device was a fairly big thing linked to a desktop, though - no idea if it could be miniaturised or not. In addition, they followed it up with more "traditional" tests, which in my case disagreed with the results from that device. So maybe it's not quite there yet?
The one downside is that you have to pre-process the string, which takes O(n) time and between 5n and 9n space depending on exactly how you do it. But after that, you can do as many searches as you want practically "for free".
There's a paper outlining the various algorithms available here: (PDF) http://www.cosc.canterbury.ac.nz/research/reports/HonsReps/2...
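One classic structure with roughly that time/space profile is a suffix array; here's a toy sketch (naive qsort-based construction for brevity - a real implementation would use a linear-time algorithm such as SA-IS to hit the O(n) preprocessing bound):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static const char *text;                 /* the preprocessed string */

    static int cmp_suffix(const void *a, const void *b)
    {
        return strcmp(text + *(const int *)a, text + *(const int *)b);
    }

    /* Each query is then O(m log n) for a pattern of length m. */
    static int find(const int *sa, int n, const char *pat)
    {
        int lo = 0, hi = n, mid;
        size_t m = strlen(pat);
        while (lo < hi) {
            mid = lo + (hi - lo) / 2;
            if (strncmp(text + sa[mid], pat, m) < 0) lo = mid + 1;
            else hi = mid;
        }
        return (lo < n && strncmp(text + sa[lo], pat, m) == 0) ? sa[lo] : -1;
    }

    int main(void)
    {
        int sa[64], i, n;
        text = "mississippi";
        n = (int)strlen(text);
        for (i = 0; i < n; i++) sa[i] = i;
        qsort(sa, n, sizeof sa[0], cmp_suffix);   /* the preprocessing */
        printf("%d\n", find(sa, n, "ssi"));       /* -> 5 */
        return 0;
    }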
https://docs.docker.com/engine/userguide/storagedriver/image...
* On-disk, the layered approach always saves space, as expected
* In memory, it depends on which storage backend you use: apparently btrfs can't share page cache entries between containers, while aufs/overlayfs/zfs can. I'm not sure whether that's a limitation of btrfs itself or of Docker's btrfs backend.
From looking at the relevant sources (though I could be wrong, or could have looked in the wrong places), it appears that both exec() and dlopen() end up mmap()-ing the executable/libraries into the calling process's address space, which should mean they reuse the same page cache entries.
So, if I understand correctly, as long as you pick a filesystem which shares page cache entries between containers, then you do indeed only end up with one copy of (the read-only sections of) executables/libraries in memory, no matter how many containers are running them at once. That's good to know!
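An easy way to see those file-backed mappings for yourself (assumes glibc's libm.so.6 naming; compile with -ldl):

    #include <dlfcn.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char line[512];
        FILE *maps;
        void *h = dlopen("libm.so.6", RTLD_NOW);
        if (!h) { fprintf(stderr, "%s\n", dlerror()); return 1; }

        maps = fopen("/proc/self/maps", "r");
        if (!maps) return 1;
        while (fgets(line, sizeof line, maps))
            if (strstr(line, "libm"))    /* file-backed, r-xp => shareable */
                fputs(line, stdout);
        fclose(maps);
        dlclose(h);
        return 0;
    }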
For that reason I've been trying out password-less login for a while now (it works via email), and so far non-tech folks haven't complained either.
It is pretty much as though you always used the "forgot password" mechanism to log in.
Wrote about it here - http://sriku.org/blog/2017/04/29/forget-password/
Plus by merging all of the log-in paths (registration, 'forgot password', and normal login), you have one thing to design and secure rather than three. That seems like a huge advantage from a security perspective.
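As a sketch of what such an emailed login link can carry - an expiring HMAC token over the address (names are illustrative, OpenSSL assumed; a real system would also enforce single use and manage the secret properly):

    #include <openssl/evp.h>
    #include <openssl/hmac.h>
    #include <openssl/crypto.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    static const unsigned char SECRET[] = "server-side secret"; /* placeholder */

    static unsigned make_token(const char *email, long long expiry,
                               unsigned char out[EVP_MAX_MD_SIZE])
    {
        char msg[256];
        unsigned len = 0;
        snprintf(msg, sizeof msg, "%s|%lld", email, expiry);
        HMAC(EVP_sha256(), SECRET, sizeof SECRET,
             (const unsigned char *)msg, strlen(msg), out, &len);
        return len;
    }

    static int check_token(const char *email, long long expiry,
                           const unsigned char *tok, unsigned toklen)
    {
        unsigned char expect[EVP_MAX_MD_SIZE];
        unsigned len = make_token(email, expiry, expect);
        return expiry > (long long)time(NULL) && len == toklen
            && CRYPTO_memcmp(expect, tok, len) == 0;
    }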