danicgross commented on Scaling Transformer to 1M tokens and beyond with RMT   arxiv.org/abs/2304.11062... · Posted by u/panabee
cs702 · 2 years ago
Here's a list of tools for scaling up transformer context that have github repos:

* FlashAttention: In my experience, the current best solution for n² attention, but it's very hard to scale up beyond the low tens of thousands of tokens. Memory use is O(n) but compute is O(n²). Code: https://github.com/HazyResearch/flash-attention

* Heinsen Routing: In my experience, the current best solution for n×m attention, i.e., mapping n tokens to m tokens. It's like a souped-up version of attention. I've used it to pull up more than a million tokens as context. Memory use and compute are O(nm). It works, but in my (limited) experience, it doesn't work out-of-the-box as well as FlashAttention for n² attention. Code: https://github.com/glassroom/heinsen_routing

* RWKV: A sort-of-recurrent model which claims to have performance comparable to n² attention in transformers. In my (limited) experience, it doesn't. Others seem to agree: https://twitter.com/arankomatsuzaki/status/16390003799784038... . Code: https://github.com/BlinkDL/RWKV-LM

* RMT (this method): I'm skeptical that the recurrent connections will work as well as n² attention or n×m routing in practice, but I'm going to give it a try. Code: https://github.com/booydar/t5-experiments/tree/scaling-repor...

In addition, the group that developed FlashAttention is working on state-space models (SSMs) that look promising to me. The idea is to approximate n² attention dynamically using only O(n log n) compute. There's no code available, but here's a blog post about it: https://hazyresearch.stanford.edu/blog/2023-03-27-long-learn... [CORRECTION: Code is available. See comment by lucidrains below. I'm hopeful this will go to the top of my list.]

If anyone here has other suggestions for working with long sequences (hundreds of thousands to millions of tokens), I'd love to learn about them.
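To put the complexity figures above in perspective, here is a rough sketch that compares how the stated growth rates scale with sequence length. It counts only the leading term (n², n×m, n log n) and ignores constants, hardware effects, and everything specific to the actual libraries. The function names and the choice of m = 1,024 are illustrative assumptions, not taken from any of the repos.

```python
import math

def full_attention_cost(n: int) -> int:
    """Leading-order cost of n x n attention: one score per token pair."""
    return n * n

def routing_cost(n: int, m: int) -> int:
    """Leading-order cost of mapping n input tokens to m output tokens."""
    return n * m

def n_log_n_cost(n: int) -> float:
    """Leading-order cost of an O(n log n) method (log base 2, by assumption)."""
    return n * math.log2(n)

if __name__ == "__main__":
    m = 1_024  # hypothetical number of output tokens for n x m routing
    for n in (32_000, 1_000_000):
        print(
            f"n={n:>9,}  n^2={full_attention_cost(n):.2e}  "
            f"n*m={routing_cost(n, m):.2e}  n*log2(n)={n_log_n_cost(n):.2e}"
        )
```

Going from 32,000 to 1,000,000 tokens multiplies the n² term by roughly a thousand, while the n×m term grows only linearly in n for a fixed m.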

danicgross · 2 years ago
cs702, fantastic comment. I am sorta poking around this area too. I'd be curious what benchmark you're using to evaluate performance amongst these repos? If you're up for it, shoot me an email -- my email is in my profile.
danicgross commented on NSO Group iMessage Zero-Click Exploit Captured in the Wild   citizenlab.ca/2021/09/for... · Posted by u/jbegley
danicgross · 4 years ago
Would turning off iMessage protect from this? Or would the iPhone still process the GIF through SMS somehow...?
danicgross commented on Is Word Error Rate a Good Metric for Speech Recognition Models?   assemblyai.com/blog/word-... · Posted by u/dylanbfox
CornCobs · 4 years ago
I'm working on a similar domain, music transcription. The challenge is to estimate note values (how many beats is a note supposed to be as represented in the score?) and I'm not sure what would be a good way to measure transcription accuracy. The naive note error rate cannot capture whether my model successfully detects certain musical structure, syncopation, dotted rhythms, etc.
danicgross · 4 years ago
Related, are there better representations for music than standard notation (or MIDI)?

I'm wondering what the higher convolution levels could look like, if this was a CNN analyzing an image. Something between the complete Ableton/Logic export and a MIDI file. Being able to capture the "feel" of a song (or a section within a song) strikes me as an important milestone towards designing really good generative music.

danicgross commented on Launch HN: Bedrock AI (YC S21) – Using ML to identify red flags in SEC filings    · Posted by u/kbennatti
danicgross · 4 years ago
Interesting.

How do you think about backtesting? There are a few short-only shops that specialize in finding frauds. If you get their historical 13-Fs, how would you score against them in terms of precision/recall?

And I guess more broadly, how does alpha with your system compare to a portfolio that holds all short positions by big long/short funds (ex thematic shorts)? Meaning, those guys have full-time humans that focus on this... can you beat them? Very interesting if so.
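One way to make the precision/recall question concrete is the minimal sketch below. It assumes you already have two sets of tickers: the ones the system flagged, and a reference set reconstructed from the specialist funds' historical positions. The tickers and the function name are made up for illustration; how the reference set is actually built is left open.

```python
def precision_recall(flagged: set[str], reference: set[str]) -> tuple[float, float]:
    """Score flagged tickers against a reference set of known shorts/frauds.

    precision = share of flagged names that appear in the reference set
    recall    = share of the reference set that was flagged
    """
    hits = flagged & reference
    precision = len(hits) / len(flagged) if flagged else 0.0
    recall = len(hits) / len(reference) if reference else 0.0
    return precision, recall

if __name__ == "__main__":
    # Hypothetical tickers, for illustration only.
    flagged = {"AAA", "BBB", "CCC", "DDD"}
    reference = {"BBB", "CCC", "EEE"}
    p, r = precision_recall(flagged, reference)
    print(f"precision={p:.2f} recall={r:.2f}")
```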

danicgross commented on Show HN: Anki alternative with integrated notes and import/export   get21stnight.com/... · Posted by u/klevertree
sabco · 4 years ago
So basically the same features as Anki except it's paid?
danicgross · 4 years ago
Anki is paid too. ($25 for the mobile app.)
danicgross commented on Show HN: Hacker News-ish stock news, from 40+ sources   steez.news... · Posted by u/mjmasia
Thorncorona · 4 years ago
FWIW I don't think that you can really do this without moderation. Financial news is one area where curation shines particularly strongly, considering how strongly monetary incentives and information gain collide.

Benzinga / Motley Fool / Seeking Alpha / Business Wire / Forbes aren't places to find worthwhile information.

danicgross · 4 years ago
Very true. I would gladly pay for a feed of financial news that optimizes for market movement instead of "clicks": news articles from sources that, in the past, have predicted market activity.

I.e. "this blog mentioned NYSE:TEVA, and the next day the stock moved materially, therefore site_ranking++". (You'd probably have some TF/IDF saliency metric too, so that a site that mentions all stocks is penalized.)
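A minimal sketch of that scoring idea, under loud assumptions: each mention is a (site, ticker, next-day return) tuple, a move counts as "material" when its absolute value crosses a made-up 5% threshold, and the breadth penalty is an IDF-style log(universe size / tickers the site mentions) standing in for the TF/IDF saliency metric. None of these choices come from a real system.

```python
import math
from collections import defaultdict

def rank_sources(mentions, universe_size, threshold=0.05):
    """Rank sites by material next-day moves, penalizing broad coverage.

    mentions: iterable of (site, ticker, next_day_return) tuples.
    universe_size: total number of tickers that could be mentioned
                   (assumed >= the number any one site mentions).
    threshold: hypothetical cutoff for a "material" move (absolute return).
    """
    hits = defaultdict(int)           # material-move mentions per site
    tickers_seen = defaultdict(set)   # distinct tickers each site mentions
    for site, ticker, ret in mentions:
        tickers_seen[site].add(ticker)
        if abs(ret) >= threshold:
            hits[site] += 1           # the "site_ranking++" step

    scores = {}
    for site, seen in tickers_seen.items():
        # A site that mentions every ticker gets log(1) = 0 and scores zero.
        breadth_penalty = math.log(universe_size / len(seen))
        scores[site] = hits[site] * breadth_penalty
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    # Site names and returns are invented for illustration.
    sample = [
        ("focused_blog", "TEVA", 0.08),
        ("spray_site", "TEVA", 0.08),
        ("spray_site", "AAA", 0.00),
        ("spray_site", "BBB", 0.00),
    ]
    for site, score in rank_sources(sample, universe_size=3):
        print(site, round(score, 3))
```

Both sites mention the same material move here, but the one that covers the whole universe is scored down to zero.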

danicgross commented on Amazon Sidewalk   amazon.com/Amazon-Sidewal... · Posted by u/encryptluks2
pupdogg · 4 years ago
If anyone is interested, I wouldn't mind collaborating to create a privacy oriented SmartTV or PC for devs or simply an industrial grade dumb TV. I have experience designing and building 70" industrial displays that utilize Samsung LCD panels. These "industrial" panels are normally twice the price of consumer grade ones but are designed to operate 24/7 with a MTBF of 100k hours (approx. 11 years) and are usually twice the brightness of normal ones (a very noticeable difference). I also have direct relations with Samsung for sourcing. Displays I've designed and built: https://bit.ly/3vV9jVm
danicgross · 4 years ago
That’s interesting. Can you make something that’s higher end than the most high end consumer TV?

(If Tesla had to be “a better car that’s also electric”, I think this would need to be “a better TV that’s also private”.)

danicgross commented on HN front page ranked using only votes from early users (2009)   news.ycombinator.com/clas... · Posted by u/ibraheemdev
danicgross · 4 years ago
You could expand the cutoff date by modeling lookalike audiences. A 2018 account that votes on similar things to a 2008 account might be admitted. This affords the moderator an easy “exploration” spigot they can tune up or down.
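A rough sketch of the lookalike idea, assuming each account is reduced to the set of story IDs it upvoted and that similarity is measured with Jaccard overlap. Both choices are assumptions for illustration, as are the account names. The threshold plays the role of the "exploration" spigot: lowering it admits more recent accounts.

```python
def jaccard(a: set[int], b: set[int]) -> float:
    """Overlap between two vote sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def admitted_accounts(early: dict[str, set[int]],
                      later: dict[str, set[int]],
                      threshold: float) -> list[str]:
    """Return later accounts whose votes resemble at least one early account."""
    admitted = []
    for name, votes in later.items():
        best = max((jaccard(votes, ev) for ev in early.values()), default=0.0)
        if best >= threshold:
            admitted.append(name)
    return sorted(admitted)

if __name__ == "__main__":
    # Hypothetical accounts and story IDs.
    early = {"user_2008": {1, 2, 3, 4}}
    later = {"user_2018_a": {1, 2, 3, 5}, "user_2018_b": {7, 8, 9}}
    for t in (0.7, 0.5, 0.0):
        print(t, admitted_accounts(early, later, threshold=t))
```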

u/danicgross

Karma: 2470
Cake day: October 11, 2009
About
My email is daniel at dcgross.com.