atlacatl_sv (u/atlacatl_sv)

atlacatl_sv commented on XBai o4, where o=open, and o4 represents our fourth gen open-source LLM tech github.com/MetaStone-AI/X... · Posted by u/atlacatl_sv

atlacatl_sv · 7 months ago

XBai o4 excels in complex reasoning capabilities and has now completely surpassed OpenAI-o3-mini in Medium mode.

atlacatl_sv commented on Best explanation of Q, K, V? (Attention) · Posted by u/profsummergig

atlacatl_sv · a year ago

Please have a look at this video, hope it helps: https://youtu.be/KJtZARuO3JY

atlacatl_sv commented on DeepSeek-R1 github.com/deepseek-ai/De... · Posted by u/meetpateltech

ein0p · a year ago

Downloaded the 14B, 32B, and 70B variants to my Ollama instance. All three are very impressive, subjectively much more capable than QwQ. 70B especially, unsurprisingly. Gave it some coding problems, even 14B did a pretty good job. I wish I could collapse the "thinking" section in Open-WebUI, and also the title for the chat is currently generated wrong - the same model is used by default as for generation, so the title begins with "<thinking>". Be that as it may, I think these will be the first "locally usable" reasoning models for me. URL for the checkpoints: https://ollama.com/library/deepseek-r1

atlacatl_sv · a year ago

Thanks for sharing your experience with the 14B, 32B, and 70B variants! I'm curious, what hardware setup are you using to run these models on your Ollama instance?

atlacatl_sv commented on Are there HN posts with more than 2k upvotes? · Posted by u/dkpk

atlacatl_sv · a year ago

Yes, please have a look at this:

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

Also this list might be useful too:

https://news.ycombinator.com/lists

atlacatl_sv commented on Glowing Orb on Camera in the New Jersey Sky youtube.com/watch?v=eMitm... · Posted by u/atlacatl_sv

atlacatl_sv · a year ago

Here is another video, from five years ago, about a plasma technology developed by the Pentagon that looks very similar to the glowing orb. On another note, I'm wondering why Hacker News seems to ignore this topic. I don't see any drones or orbs on the front page.

https://youtu.be/UYr3zPP5rCw

atlacatl_sv commented on Mamba Explained: The State Space Model Taking On Transformers kolaayonrinde.com/blog/20... · Posted by u/koayon

thecolorgreen · 2 years ago

Why doesn't Equation 1b use the h' defined in Equation 1a?

atlacatl_sv · 2 years ago

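I believe h'(t) in Equation 1a describes how the state changes, so it determines the next state. y(t) in Equation 1b is the output used to predict the next word, and it is read from the current hidden state h(t), so it uses h(t) rather than h'(t).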
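
To make the distinction concrete, here is a minimal sketch of a scalar, already-discretized state space model, assuming the usual form h'(t) = A h(t) + B x(t) for the state and y(t) = C h(t) for the output. The names ssm_scan, a_bar, b_bar, and c are made up for illustration and are not taken from the post or from the Mamba code.

```python
def ssm_scan(xs, a_bar, b_bar, c, h0=0.0):
    """Run a scalar discretized state space model over a sequence.

    a_bar, b_bar: assumed discretized versions of A and B (hypothetical names).
    c: output projection, playing the role of C.
    """
    h = h0
    ys = []
    for x in xs:
        # State update: the discrete analogue of Equation 1a.
        h = a_bar * h + b_bar * x
        # Readout: the output depends only on the current state (Equation 1b).
        ys.append(c * h)
    return ys


if __name__ == "__main__":
    # An impulse input decays through the state at rate a_bar.
    print(ssm_scan([1.0, 0.0, 0.0], a_bar=0.5, b_bar=1.0, c=2.0))
```

The update line is the only place where the state changes; the readout line never touches the update rule, which is why the output equation is written in terms of h(t).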