I have to agree with you. For a long time I thought Intel was the best, but unfortunately I can't say that anymore.
I must admit I still don't understand how it happened, but NVIDIA is now the best.
When doing performance optimization on CPUs, I was impressed with Intel's suite of tools (like VTune). NVIDIA has some unbelievable tools, like Nsight Systems (nsys) and, of course, its container registry (NGC), which I think surpasses even Intel's software support.
I will be archiving the full report with more results soon.
I've been following TriLLM. They've achieved great results, and I'm really impressed with the llama.cpp contributors already getting the models integrated.
Suppose that Transformers die tomorrow and Mamba becomes all the rage. The released Mamba code already has CUDA kernels for inference and training. Any of the CSPs or other NVIDIA GPU users can switch their entire software stack to train and run inference on Mamba models. Meanwhile, we'd be completely dead in the water, along with similar companies that made the same bet, like Etched.
On a side note, I looked deeply into every company in the space and was thoroughly unimpressed with how little they cared about the software stack needed to make their hardware work seamlessly. So even if I did go work at some other hardware company, I doubt many customers would actually use the hardware.
Surely you’d need more ternary weights, though, to achieve the same performance outcome?
A bit like how a Q4 quant is smaller than a Q8 but also tangibly worse, so the “compression” isn’t really like-for-like.
Either way, excited about more ternary progress.
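For intuition, here's a minimal NumPy sketch of threshold-based ternary quantization. The threshold factor and per-tensor scale are illustrative choices on my part, not any particular model's recipe; the point is to see the reconstruction error that {-1, 0, +1} codes leave behind, which is exactly why a ternary model may need extra parameters to match a higher-bit quant:

```python
import numpy as np

def ternary_quantize(w, thresh_factor=0.75):
    """Quantize a float weight matrix to {-1, 0, +1} plus a per-tensor scale.

    Hypothetical threshold scheme: weights with magnitude below a cutoff
    are zeroed; the rest become +/-1 times a single scale factor.
    """
    delta = thresh_factor * np.mean(np.abs(w))          # zeroing threshold
    q = np.where(np.abs(w) > delta, np.sign(w), 0.0)    # ternary codes
    mask = q != 0
    # Scale chosen as the mean magnitude of the surviving weights.
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return q, alpha

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, alpha = ternary_quantize(w)

# Each ternary weight needs ~1.58 bits (log2(3)) vs. 4 bits for a Q4 quant,
# so per-parameter storage shrinks a lot -- the open question above is how
# many *extra* parameters the ternary model needs to win that back.
err = np.linalg.norm(w - alpha * q) / np.linalg.norm(w)
print(f"relative reconstruction error: {err:.3f}")
```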
1. They are everywhere and aren't going anywhere.
2. The network infrastructure to ingest and analyze video footage from thousands of cameras is very demanding.
3. Low power and low latency scream ASIC to me.
https://intapi.sciendo.com/pdf/10.2478/ijanmc-2022-0036#:~:t...
Edit: I don't think the company exists in its current form anymore
The end plan is to have a single chip and load all the weights onto the chip at initialization. Because we integrate as a single line of Torch-compatible (hence HF-compatible) code, no other part of the codebase should need to change.
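To illustrate what a "single line of code" Torch integration could look like, here's a hypothetical sketch that swaps every nn.Linear in a model for a ternary-weight emulation. TernaryLinear, ternarize, and the quantization recipe are all invented names and choices for illustration, not the company's actual API:

```python
import torch
import torch.nn as nn

class TernaryLinear(nn.Module):
    """Drop-in replacement for nn.Linear with ternary weights.

    The real accelerator would hold the quantized weights on-chip;
    here we just emulate the math on CPU.
    """
    def __init__(self, linear: nn.Linear):
        super().__init__()
        w = linear.weight.data
        self.alpha = w.abs().mean()  # per-tensor scale
        # Zero small weights, keep the sign of the rest: codes in {-1, 0, +1}.
        q = torch.sign(torch.where(w.abs() > 0.75 * self.alpha,
                                   w, torch.zeros_like(w)))
        self.register_buffer("q", q)
        self.bias = linear.bias

    def forward(self, x):
        return nn.functional.linear(x, self.alpha * self.q, self.bias)

def ternarize(model: nn.Module) -> nn.Module:
    """The 'single line' integration point: recursively swap every nn.Linear."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, TernaryLinear(child))
        else:
            ternarize(child)
    return model

model = ternarize(nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)))
out = model(torch.randn(2, 16))
print(out.shape)  # torch.Size([2, 4])
```

Because the swapped module keeps nn.Linear's forward signature, the rest of a Torch or HF pipeline runs unmodified, which is the appeal of this style of integration.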