I still do not understand exactly where D1L comes from in LGan(D1L, T(A1(u))). Is D1L simply A1(u)?
I also find that the mixed notation between Figures 2 and 3 makes them tricky to follow.
Would have loved more discussion of the results in the tables.
And more inversion results on datasets beyond Enron, since that is one of the end goals, even if it means reusing another method.
Thank you for the paper, very interesting!
They have the highest prices of any cloud. What happened to “your margin is my opportunity”?
And, as far as I know, customers are unable to allocate a VM with fewer than eight A100, H100, or H200 GPUs. (Please tell me how if I’m wrong.)
So, customers are incentivized to use other cloud products for GPUs in the short term.
They seem to be heavily invested in their own chips in the medium term.
My thinking is that this machine appeals mostly to people who already have an espresso machine. It's not particularly technologically advanced: it's a single boiler, an E61 group and a vibratory pump. If you're buying this machine, you're probably replacing a machine at a similar technology level, and that's not really a sustainable choice.
A well-maintained espresso machine has a lifespan in the range of decades. Much of the recent innovation in espresso machines is in controllers, sensors and actuators, along with better pumps. These are all things that can easily be retrofitted to an older espresso machine.
There has been innovation in other areas not easily retrofittable (saturated groups, dual boilers instead of heat-exchangers, to name a few), but this machine doesn't really feature any of those.
I strongly believe that for this particular demographic, it's a much better (more sustainable, cheaper and all-around more fun) idea to retrofit new and advanced parts onto the espresso machine they presumably already have than to buy a whole new machine. We don't need old espresso machines in landfills.
On the off chance that a prospective buyer doesn't already have a similar espresso machine, this isn't too bad a choice, and the price is decent. On the other hand, there are a lot of used machines on the market looking for a new owner that can be upgraded.
https://github.com/mamba-org/mamba
Beyond that, I'll care about an alternative to transformers when it shows superior performance in an open-source 7B-34B model compared to competing transformer models. So far, that hasn't happened.
Is there any reason why it wouldn't scale to 7B or more? Have they tried it?
I've found evidence that the OpenAI 1536D embeddings are unnecessarily big for 99% of use cases (and now there's a 3072D model?!), so the ability to reduce dimensionality directly from the API is appreciated for the reasons given in this post. Just chopping off dimensions to reach an arbitrary dimensionality is not a typical dimensionality-reduction technique, so it likely requires a special training/alignment approach, which is novel.
EDIT: Tested the API: it does support reducing to an arbitrary number of dimensions beyond the ones noted in the post (even 2D for data viz, though that may not be as useful since the embeddings are normalized).
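For anyone who wants to reproduce that check, here's a minimal sketch using the OpenAI Python SDK's `dimensions` parameter with `text-embedding-3-large`. The truncate-and-renormalize comparison at the end is my own assumption about how the reduction behaves, not something stated in the post.

```python
# Minimal sketch: compare an API-reduced embedding with a manual truncation.
# Assumes the openai Python SDK (v1+), OPENAI_API_KEY in the environment,
# and the text-embedding-3-large model.
import numpy as np
from openai import OpenAI

client = OpenAI()
text = "an example sentence to embed"

# Full 3072-D embedding and an API-reduced 256-D embedding of the same text.
full = np.array(
    client.embeddings.create(model="text-embedding-3-large", input=text)
    .data[0].embedding
)
reduced = np.array(
    client.embeddings.create(model="text-embedding-3-large", input=text, dimensions=256)
    .data[0].embedding
)

# Manual alternative (assumption): keep the first 256 components, re-normalize to unit length.
manual = full[:256] / np.linalg.norm(full[:256])

print(full.shape, reduced.shape)       # (3072,) (256,)
print(float(np.dot(manual, reduced)))  # close to 1.0 if reduction ~ truncate + renormalize
```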
The embeddings aren't "chopped off": the first components of the embedding change as the dimensionality is reduced, but not by much.