As you said, common use cases include deduplication, image search, and especially semantic search over text.
AMA!
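For readers new to the topic, here is a minimal sketch of how semantic search over embeddings works: rank documents by cosine similarity between their vectors and the query vector. The three-dimensional vectors below are toy values I made up for illustration; a real system would get them from a sentence encoder.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "embeddings"; a real system would produce these with an encoder model.
docs = {
    "intro to image search": [0.9, 0.1, 0.0],
    "semantic text search":  [0.1, 0.9, 0.2],
    "duplicate detection":   [0.0, 0.2, 0.9],
}
query = [0.2, 0.8, 0.1]  # embedding of the user's query

# Rank documents by similarity to the query, most similar first.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])
```

Deduplication works the same way, except you compare document vectors against each other and flag pairs above a similarity threshold.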
What are your favorite ways to do sentence and paragraph embeddings, and is there a framework you like that can be tuned on custom data? Do you find fine-tuning your embedding model helpful?
Could you expand on this? What code did you see, where, and doing what? Were you part of this team?
> There's nothing particularly new about mathematicians using computers to investigate and make conjectures, even if you add ML into it.
I think you're being needlessly reductive here. The article itself explicitly states that "using computers" is not what's new.
That said, the authors did not fabricate their research (as far as I can tell). They just did not know English well, so it was easier to copy passages that were already phrased well than to learn to write English well. As the saying goes, never attribute to malice what can be explained by ignorance or laziness. That does not excuse it, but it does make it more understandable.
I agree with the article that this is probably just the tip of the iceberg. There are likely many more lesser evils being committed with similar tools that are just much harder to spot. I would not have noticed my particular example if I had not been a reviewer for the paper. It makes me wonder how big the problem really is.
Some more relevant links for the curious:
Github: https://github.com/wsmoses/Enzyme
Paper: https://proceedings.neurips.cc/paper/2020/file/9332c513ef44b...
Long story short, Enzyme makes a couple of interesting contributions:
1) Low-level Automatic Differentiation (AD) IS possible and can be high performance
2) By working at the LLVM level, we get cross-language and cross-platform AD
3) Working at the LLVM level can actually give larger speedups (since AD can be performed after optimization)
4) We made a plugin for PyTorch/TF that uses Enzyme to import foreign code into those frameworks with ease!
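For readers unfamiliar with what AD actually computes, here is a minimal forward-mode sketch using dual numbers. This is only an illustration of the concept; Enzyme itself works very differently, transforming LLVM IR directly after optimization rather than overloading operators.

```python
class Dual:
    # A dual number carries a value and its derivative together.
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def f(x):
    return x * x * x + 2 * x  # f(x) = x^3 + 2x, so f'(x) = 3x^2 + 2

x = Dual(3.0, 1.0)  # seed the derivative: dx/dx = 1
y = f(x)
print(y.val, y.dot)  # f(3) = 33, f'(3) = 29
```

The point of Enzyme is to get this kind of derivative propagation without touching the source language at all, which is why it composes across languages that compile to LLVM.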
Thank you for sharing and releasing usable code! Do you know if this would work for GPU-based applications? TensorFlow models that are trained on a GPU, for example?
For example, the complex eigenvalues in the eigendecomposition of a real-valued square matrix encode permutations among dimensions in a particular eigenspace via the roots of unity, which amount to rotations. So they're a bookkeeping device that allows the eigendecomposition of an automorphism of a real vector space to work in all cases.
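A concrete instance of this, as a small sketch: take the real matrix that cyclically permutes three coordinates. Its eigenvalues are the cube roots of unity, and applying the permutation to each eigenvector just multiplies it by one of those roots, i.e. rotates it.

```python
import cmath

n = 3  # a 3-cycle permutation of the coordinates

def permute(v):
    # Apply the cyclic shift P: (Pv)[k] = v[(k-1) mod n],
    # i.e. the real matrix that sends e_k to e_{k+1}.
    return [v[(k - 1) % n] for k in range(n)]

for j in range(n):
    lam = cmath.exp(-2j * cmath.pi * j / n)  # an n-th root of unity
    # Candidate eigenvector with entries w^k for w = e^{2*pi*i*j/n}.
    v = [cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)]
    Pv = permute(v)
    # Verify P v = lam * v for this complex eigenvalue.
    assert all(abs(Pv[k] - lam * v[k]) < 1e-12 for k in range(n))
```

Only the eigenvalue 1 is real; the other two are complex conjugates, which is exactly the bookkeeping needed to diagonalize a real permutation over the complex numbers.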
Whether it's OnePlus, Xiaomi, or Huawei, they can't really do anything about it.
They have to track users to avoid being banned.