I get around 140 ms per token running a 13B-parameter model on a ThinkPad laptop with a 6-core Intel i7-9750H. Because it's CPU inference, the initial prompt processing takes longer than on a GPU, so total latency is still higher than I'd like. I'm working on some caching solutions that should make this bearable for things like chat.
IIRC, GPT-3 cannot do chain-of-thought.
GPT-3.5 (what's being used here) is a little better at zero-shot in-context learning because it's been instruction fine-tuned, so it only needs the general format in the context.
The wider narrative is that current language models hallucinate and lie, and there is no coherent plan to avoid this. Google? Microsoft? That's a much less important question than whether anyone is going to push this party-trick-level technology onto a largely unsuspecting public.
Search is one application, and it might be crap right now, but for Microsoft it only needs to provide incremental value; for Google it's life or death. Microsoft is also better positioned in both the enterprise (Azure, Office 365, Teams) and developer (GitHub, VS Code) markets.
(Best + Worst + 4 * Average) / 6
One nice property is that the weighting adjusts the estimate for longer-tailed risks.
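As a sketch, the formula above (which resembles PERT three-point estimation, with Average standing in for the most-likely value) is just a weighted mean:

```c
// Weighted three-point estimate: (best + worst + 4 * average) / 6.
// Resembles the PERT formula; "average" stands in for the most-likely value.
double estimate(double best, double worst, double average) {
    return (best + worst + 4.0 * average) / 6.0;
}
```

Because the middle value gets two thirds of the weight, a single pessimistic outlier pulls the result up without dominating it.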
After what felt like endless googling, it's what I decided to spend some time with. I haven't had time to do much yet, so I can't say how it performs for me, but the idea and execution really resonate with me.
Not fully featured yet, but what I'd like to eventually do is set it up similarly to the mermaidjs editor [3], which encodes the entire diagram in the URL. That makes it really easy to link to from markdown documents, and it has the nice benefit that the diagram is immutable for a given URL, so you don't need a backend to store anything.
[1]: https://www.npmjs.com/package/pikchr-js
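The URL-encoding idea can be sketched minimally. The mermaidjs editor actually deflate-compresses the source before encoding it; this hypothetical sketch shows just the URL-safe base64 step, which is enough to put short diagram source into a link fragment:

```c
#include <stddef.h>

// URL-safe base64 alphabet ('-' and '_' instead of '+' and '/').
static const char tbl[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Encode src[0..len) into dst (caller-provided, at least 4*(len/3)+5 bytes),
// without padding. Returns dst. Hypothetical helper, not part of pikchr-js.
char *b64url_encode(const unsigned char *src, size_t len, char *dst) {
    size_t i, j = 0;
    for (i = 0; i + 2 < len; i += 3) {
        unsigned v = (src[i] << 16) | (src[i + 1] << 8) | src[i + 2];
        dst[j++] = tbl[(v >> 18) & 63];
        dst[j++] = tbl[(v >> 12) & 63];
        dst[j++] = tbl[(v >> 6) & 63];
        dst[j++] = tbl[v & 63];
    }
    if (len - i == 1) {                       // one trailing byte
        unsigned v = src[i] << 16;
        dst[j++] = tbl[(v >> 18) & 63];
        dst[j++] = tbl[(v >> 12) & 63];
    } else if (len - i == 2) {                // two trailing bytes
        unsigned v = (src[i] << 16) | (src[i + 1] << 8);
        dst[j++] = tbl[(v >> 18) & 63];
        dst[j++] = tbl[(v >> 12) & 63];
        dst[j++] = tbl[(v >> 6) & 63];
    }
    dst[j] = '\0';
    return dst;
}
```

The encoded string can then be appended as `#encoded-source` to the editor URL, giving the immutable, backend-free links described above.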
The proper remedy is to simply wrap the string to parse with fmemopen(3), which makes the temporary FILE object explicit and persistent for the whole parse, and needs just one strlen call.
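A minimal sketch of that approach: fmemopen(3) exposes the in-memory string as a FILE*, so existing FILE-based parsing code can consume it directly. The fscanf-based integer summer here is just a stand-in for the real parser:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

// Wrap a string in a FILE* via fmemopen(3) and parse it with stdio.
// Returns the sum of whitespace-separated integers, or -1 on error.
int sum_ints_in_string(const char *s) {
    FILE *f = fmemopen((void *)s, strlen(s), "r"); /* the one strlen call */
    if (!f)
        return -1;
    int n, total = 0;
    while (fscanf(f, "%d", &n) == 1)
        total += n;
    fclose(f);
    return total;
}
```

The FILE object lives until fclose, so the whole parse runs against one explicit stream rather than repeated temporary buffers.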
- Human Reading Speed (English): ~250 words per minute
- Human Speaking Speed (English): ~150 words per minute
These should be treated like the Doherty Threshold [1] for generative content.
[1] https://lawsofux.com/doherty-threshold/
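Converting those rates into per-word latency budgets is simple arithmetic (a rough sketch; mapping words to model tokens depends on the tokenizer, so any tokens-per-word factor would be an assumption):

```c
// Per-word latency budget implied by a words-per-minute rate.
// ~250 wpm reading -> 240 ms/word; ~150 wpm speaking -> 400 ms/word.
double ms_per_word(double wpm) {
    return 60000.0 / wpm;
}
```

A generator that streams words slower than the reader's budget makes the user wait, which is exactly the threshold-crossing the Doherty framing warns about.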