If you want to put this to the test, try formulating a React component with autocomplete as a "math problem". Good luck.
(I studied maths, in case anyone is wondering where my beliefs come from; I actually used to think in maths while programming for a long time.)
It seems quite mad that we even need to debate this. Wind is free power, and we have at least 2,000 years of engineering experience to draw on for how to use it.
Any propulsion unit needs to be effectively attached to a ship. Screws are attached longitudinally, low down, and push. Sails are a bit more tricksy. A triangular sail mounted along the long axis will generally work best because it can handle more wind angles, but a square sail mounted across the long axis will provide more power on a "reach" to a "run" (the wind is mostly from behind, so pushing).
The cutting edge of sailing ships that carried stuff was the tea clipper. Think "Cutty Sark", which is now a visitor attraction in Greenwich, London. Note the staysails, the triangular sails at the front. Then note the three masts. Each mast has several main sails that are huge rectangles for "reaches", plus additional extensions. There are even more triangular infill sails above the main sails.
It's quite hard to explain how wind and sails work, but you need to understand that a sailing ship can sail "into the wind". Those triangles are better at it than those rectangles, but those rectangles can get more power by being bigger. Even better, you can use the front triangular sails (the staysails) to moderate the wind and feed the other sails with less turbulent air.
Wind is free power and it is so well understood. How on earth is this news?
I am aware I could Google it all or ask an LLM, but I'm still interested in a good human explanation.
- apply learned knowledge from its parameters to every part of the input representation ("tokenized", i.e., chunkified text).
- apply mixing of the input representation with other parts of itself. This is called "attention" for historical reasons. The original attention mixes (roughly) every token (say there are N of them) with every other token, so we pay a compute cost that scales with N squared.
The attention cost therefore grows quickly in both compute and memory as the input or conversation becomes long (for instance, when it includes whole documents).
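To make the quadratic part concrete, here is a minimal NumPy sketch (toy sizes, single head, no batching, purely illustrative): the score matrix that mixes every token with every other token has shape N x N, so doubling the input length quadruples the work.

```python
import numpy as np

# Toy single-head attention, just to show where the N^2 cost lives.
# N = number of tokens, d = representation size (made-up numbers).
N, d = 1024, 64
rng = np.random.default_rng(0)
Q = rng.standard_normal((N, d))  # queries, one row per token
K = rng.standard_normal((N, d))  # keys
V = rng.standard_normal((N, d))  # values

scores = Q @ K.T / np.sqrt(d)    # shape (N, N): every token vs. every other token
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
mixed = weights @ V              # shape (N, d): mixed token representations

print(scores.shape)  # (1024, 1024) -- this matrix is the quadratic cost
```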
Reducing the quadratic part to something cheaper is a very active field of research, but so far it has been rather difficult because, as you can readily see, it means giving up on mixing every part of the input with every other part.
Most of the time, mixing token representations that are close to each other matters more than mixing those that are far apart, but not always. That's why there are many attempts now to do away with most of the quadratic attention layers while keeping some.
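As a hedged illustration of the "mostly local" idea, here is a sketch of sliding-window attention, where each token only mixes with tokens at most `window` positions away, so the cost is roughly N times the window size instead of N squared. The window size and the loop-based layout are illustrative choices, not any particular model's recipe.

```python
import numpy as np

def local_attention(Q, K, V, window=64):
    """Each token attends only to tokens within `window` positions of itself,
    so the cost is roughly N * window instead of N * N (illustrative sketch)."""
    N, d = Q.shape
    out = np.empty_like(V)
    for i in range(N):
        lo, hi = max(0, i - window), min(N, i + window + 1)
        scores = Q[i] @ K[lo:hi].T / np.sqrt(d)   # at most 2*window + 1 scores
        w = np.exp(scores - scores.max())
        w /= w.sum()
        out[i] = w @ V[lo:hi]
    return out

rng = np.random.default_rng(0)
N, d = 1024, 64
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
print(local_attention(Q, K, V).shape)  # (1024, 64), with no N x N matrix anywhere
```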
What to do during mixing once you give up all-to-all attention is the big research question, because many approaches seem to behave well only under some conditions, and we haven't yet established anything as good and versatile as all-to-all attention.
If you forgo all-to-all, you also open up many options (e.g. all-to-something followed by something-to-all as a pattern, where the "something" serves as a sort of memory or state that summarizes all inputs at once; you can imagine that summarizing all inputs this way is a lossy abstraction, though).
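Here is a hedged sketch of that all-to-something / something-to-all pattern: M "memory" slots (with M much smaller than N) first attend over all tokens to build a summary, then every token attends back over those M slots. Each step costs roughly N*M instead of N^2, and the M-slot summary is exactly the lossy abstraction mentioned above. The slot count and the two-step layout are illustrative assumptions, not a specific published architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bottleneck_mixing(tokens, memory):
    """All-to-something followed by something-to-all (illustrative sketch):
    M memory slots summarize the N tokens, then the tokens read the summary back.
    Each step costs ~N*M instead of N*N, at the price of a lossy summary."""
    d = tokens.shape[1]
    # all -> something: the M slots attend over all N tokens
    summary = softmax(memory @ tokens.T / np.sqrt(d)) @ tokens     # (M, d)
    # something -> all: every token attends over the M slots
    return softmax(tokens @ summary.T / np.sqrt(d)) @ summary      # (N, d)

rng = np.random.default_rng(0)
N, M, d = 1024, 16, 64                 # M << N
tokens = rng.standard_normal((N, d))
memory = rng.standard_normal((M, d))   # stand-in for learned memory/state slots
print(bottleneck_mixing(tokens, memory).shape)  # (1024, 64)
```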
I bet there's a good chance of getting some wacky extremophiles, though!
Hot spring baths usually top out around 42-43°C.
Though it is notable that, contrary to many predictions (on HN and Twitter) that Meta would stop publishing papers and become like other AI labs (e.g. OpenAI), they've continued their rapid pace of releasing papers AND open-source models.
MSL is not just those few high-profile hires.