I'm wondering what the higher convolution levels could look like, if this were a CNN analyzing an image. Something between the complete Ableton/Logic export and a MIDI file. Being able to capture the "feel" of a song (or a section within a song) strikes me as an important milestone towards designing really good generative music.
How do you think about backtesting? There are a few short-only shops that specialize in finding frauds. If you get their historical 13-Fs, how would you score against them in terms of precision/recall?
And I guess more broadly, how does your system's alpha compare to a portfolio that holds all the short positions reported by big long/short funds (excluding thematic shorts)? Meaning, those guys have full-time humans who focus on this... can you beat them? Very interesting if so.
Benzinga / Motley Fool / Seeking Alpha / Business Wire / Forbes aren't places to find worthwhile information.
I.e. "this blog mentioned NYSE:TEVA, and the next day the stock moved materially, therefore site_ranking++". (You'd probably have some TF/IDF saliency metric too, so that a site that mentions all stocks is penalized.)
(If Tesla had to be “a better car that’s also electric”, I think this would need to be “a better TV that’s also private”.)
* FlashAttention: In my experience, the current best solution for n² attention, but it's very hard to scale up beyond the low tens of thousands of tokens. Memory use is O(n) but compute is O(n²) (see the sketch at the end of this comment). Code: https://github.com/HazyResearch/flash-attention
* Heinsen Routing: In my experience, the current best solution for n×m attention, i.e., mapping n tokens to m tokens. It's like a souped-up version of attention. I've used it to pull up more than a million tokens as context. Memory use and compute are O(nm). It works, but in my (limited) experience, it doesn't work out-of-the-box as well as FlashAttention for n² attention. Code: https://github.com/glassroom/heinsen_routing
* RWKV: A sort-of-recurrent model which claims to have performance comparable to n² attention in transformers. In my (limited) experience, it doesn't. Others seem to agree: https://twitter.com/arankomatsuzaki/status/16390003799784038... . Code: https://github.com/BlinkDL/RWKV-LM
* RMT (this method): I'm skeptical that the recurrent connections will work as well as n² attention or n×m routing in practice, but I'm going to give it a try. Code: https://github.com/booydar/t5-experiments/tree/scaling-repor...
In addition, the group that developed FlashAttention is working on state-space models (SSMs) that look promising to me. The idea is to approximate n² attention dynamically using only O(n log n) compute. There's no code available, but here's a blog post about it: https://hazyresearch.stanford.edu/blog/2023-03-27-long-learn... [CORRECTION: Code is available. See comment by lucidrains below. I'm hopeful this will go to the top of my list.]
If anyone here has other suggestions for working with long sequences (hundreds of thousands to millions of tokens), I'd love to learn about them.
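To make the FlashAttention memory point above concrete: the sketch below isn't the flash-attention repo's own API, just PyTorch >= 2.0's built-in scaled_dot_product_attention, which can dispatch to a FlashAttention-style fused kernel when one is available. Sizes are arbitrary.

    # Sketch only: PyTorch's built-in SDPA as a stand-in for a FlashAttention-style kernel.
    import torch
    import torch.nn.functional as F

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    batch, heads, n, d = 1, 8, 4096, 64

    q = torch.randn(batch, heads, n, d, device=device, dtype=dtype)
    k = torch.randn(batch, heads, n, d, device=device, dtype=dtype)
    v = torch.randn(batch, heads, n, d, device=device, dtype=dtype)

    # Naive attention materializes an n x n score matrix: memory and compute are both O(n^2).
    # Push n toward the tens of thousands and this is the part that blows up.
    scores = (q @ k.transpose(-2, -1)) / d ** 0.5    # shape (batch, heads, n, n)
    naive_out = torch.softmax(scores, dim=-1) @ v

    # Fused attention computes the same thing blockwise, never materializing the
    # n x n matrix: memory stays O(n), compute is still O(n^2).
    fused_out = F.scaled_dot_product_attention(q, k, v)

    print(torch.allclose(naive_out, fused_out, atol=1e-2))

(The n×m cross-attention case is the same call with k and v of length m, which is where the O(nm) figure comes from.)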