Readit News
tigershark commented on Titans: Learning to Memorize at Test Time   arxiv.org/abs/2501.00663... · Posted by u/bicepjai
suninsight · 7 months ago
Key questions:

1. The key data point seems to be Figure 6a, which compares performance on BABILong and claims Titans reaches ~62%, versus GPT-4o-mini at ~42%, at 100k sequence length.

However, GPT-4o and Claude are missing from this comparison — maybe because they perform better?

2. There is no worked example of the Neural Memory Module in action. That is the first thing I would ask of this paper.

tigershark · 7 months ago
The biggest model they used has only 760M parameters, and it outperforms models an order of magnitude larger.
tigershark commented on Where can you go in Europe by train in 8h?   chronotrains.com/en... · Posted by u/vortex_ape
tlubinski · 8 months ago
They just launched a new high-speed train from Berlin to Paris with a travel time of 8 hours: https://apnews.com/article/germany-france-berlin-paris-highs...
tigershark · 8 months ago
Yeah, that's less than the trip from Milan to my city, which is still within Italy…
tigershark commented on Apple Photos phones home on iOS 18 and macOS 15   lapcatsoftware.com/articl... · Posted by u/latexr
dmix · 8 months ago
How much size would it take to store a model of every known location in the world and common things?

For ex: I sent a friend a photo of my puppy in the bathtub and her Airpods (via iphone) announced "(name) sent you a photo of a dog in a bathtub". She thought it was really cool and so do I personally. That's a useful feature. IDK how much that requires going off-device though.

tigershark · 8 months ago
I'm not an expert, but I would say extremely little.

For comparison, Hunyuan Video encodes a shit-ton of videos, plus a rudimentary understanding of real-world physics, at very high quality in only 13B parameters. LLAMA 3.3 encodes a good chunk of all the knowledge available to humanity in only 70B parameters. And that's only considering open-source models; the closed-source ones may be even more efficient.

tigershark commented on GPT-5 is behind schedule   wsj.com/tech/ai/openai-gp... · Posted by u/owenthejumper
energy123 · 8 months ago
A new o1 was released on December 17th. Which one are you talking about?
tigershark · 8 months ago
Exactly. The previous version of o1 actually did worse on the coding benchmarks, so I would expect it to be worse in real-life scenarios too. The new version released a few days ago, on the other hand, is better on the benchmarks, so it would seem strange for someone to use it and say it's worse than Claude.
tigershark commented on OpenAI O3 breakthrough high score on ARC-AGI-PUB   arcprize.org/blog/oai-o3-... · Posted by u/maurycy
csomar · 8 months ago
I don't care about benchmarks. O1 ranks higher than Claude on "benchmarks" but performs worse in particular real-life coding situations. I'll judge the model myself by how useful/correct it is for my tasks rather than by hypothetical benchmarks.
tigershark · 8 months ago
As I said, o3 demonstrated Fields Medal-level research capacity on the FrontierMath tests. But I'm sure your use cases are much more difficult than that, obviously.
tigershark commented on OpenAI O3 breakthrough high score on ARC-AGI-PUB   arcprize.org/blog/oai-o3-... · Posted by u/maurycy
csomar · 8 months ago
Just give it a year for this bubble/hype to blow over. We have plateaued since GPT-4, and now most of the industry is hype-driven to get investor money. There is value in AI, but it's far from taking your job. Also, everyone seems to be investing in dumb compute instead of looking for the new theoretical paradigm that will unlock the next jump.
tigershark · 8 months ago
Where is the plateau? ChatGPT-4 was ~0% on ARC-AGI. 4o was 5%. This model literally solved it, scoring higher than the average human's 85%. And let's not forget the unbelievable 25% on FrontierMath, where even the most brilliant mathematicians in the world cannot solve many of the problems on their own. We are talking about cutting-edge math research problems that are out of reach for practically everyone. You will get a rude awakening if you call this unbelievable advancement a "plateau".
tigershark commented on Veo 2: Our video generation model   deepmind.google/technolog... · Posted by u/mvoodarla
dyauspitr · 8 months ago
The quality of SD is nowhere near the clear leaders.
tigershark · 8 months ago
You must be stuck at SDXL to post something as absolutely and verifiably false as the sentence above.
tigershark commented on Ask HN: SWEs how do you future-proof your career in light of LLMs?    · Posted by u/throwaway_43793
simianparrot · 8 months ago
Nothing, because I'm a senior and LLMs never produce code that passes my sniff test; they remain a waste of time.

I have a job at a place I love, and I get more people in my direct and extended network contacting me about work than ever before in my 20-year career.

And finally, I keep myself sharp by always challenging myself creatively. I'm not afraid to delve into areas that might look "solved" to others in order to understand them. For example, I have a CPU-only custom 2D pixel blitter engine I wrote to make 2D games in styles practically impossible with modern GPU-based texture rendering engines, and I recently added 3D to it from scratch as well.

All the while re-evaluating my own assumptions and those of others.

If there’s ever a day where there’s an AI that can do these things, then I’ll gladly retire. But I think that’s generations away at best.

Honestly, this fear that there will soon be no need for human programmers comes either from people who don't understand how LLMs work, or from people who do understand and have a business interest in convincing others that the technology is more than it is. I say that with confidence.

tigershark · 8 months ago
Yeah… generations. I really hope this doesn't end up like the New York Times article saying that human flight was at best hundreds of years away, published a few weeks before the Wright brothers' flight…
tigershark commented on Starship Flight 5: Launch and booster catch [video]   twitter.com/SpaceX/status... · Posted by u/alecco
WalterBright · 10 months ago
You'll have to indict more than half of the country, then.
tigershark · 10 months ago
Absolutely false. Trump got 74M votes in the last election, in 2020, when the US population was 330 million. Today the US population is ~340M. I seriously doubt that more than 170M people support a convicted felon for president. I bet he will get fewer than 170M votes in less than a month. Do you want to take the other side of that bet to back up your provably wrong assertion?
tigershark commented on Cannabis use linked to epigenetic changes, study reveals   sciencealert.com/cannabis... · Posted by u/XzetaU8
toxicdevil · a year ago
Partially unrelated, but I always post this whenever cannabis is discussed, as a PSA.

Proponents often say that this drug is harmless, but in some people its use can trigger psychiatric illnesses, especially schizophrenia and related disorders. In others it can actually exacerbate anxiety (somewhat counterintuitively, just as some antidepressants can cause suicidal thoughts). Some people are genetically more predisposed to these effects.

This is a personal topic for me because cannabis (ab)use triggered psychotic episodes in two of my close family members. They had to be hospitalized multiple times (a psych ward is no joke) and put on antipsychotics (which are also very hard on you and drain the life out of you). Their actions during the psychotic/manic phases disrupted their family and work lives. Both were unwilling to cease cannabis use, citing its public acceptability and reasons like "it's legal", "you literally can't overdose on it", "a(n) (internet) doctor prescribed it to me for anxiety so I can use it", "everyone uses it and is fine", "xyz (popular celebrity) uses it". After multiple stints in the psych ward and the threat of government-mandated treatment they were finally able to drop cannabis use; it then took them many months to return to normal functioning.

tigershark · a year ago
Good call. Here, cannabis's damaging effects are usually minimised and swept under the rug. And you will always, always find someone telling you "what about alcohol".

u/tigershark

Karma: 2140 · Cake day: March 19, 2016