Readit News
fabmilo commented on AI for Scientific Search   arxiv.org/abs/2507.01903... · Posted by u/omarsar
fabmilo · 2 months ago
I like Zotero. I started vibe coding some integration for my workflow; the project is a bit clunky to build and iterate on during development, especially with Gemini & Claude. But I think that is the direction to take, instead of reinventing something from scratch.
fabmilo commented on Show HN: Defuddle, an HTML-to-Markdown alternative to Readability   github.com/kepano/defuddl... · Posted by u/kepano
tmpfs · 3 months ago
Interesting, as I was researching this recently and was certainly not impressed with the quality of the Readability implementations in various languages. Although Readability.js was clearly the best, it being JavaScript didn't suit my project.

In the end I found the Python trifatura library to extract the best-quality content with accurate metadata.

You might want to compare your implementation to trifatura to see if there is room for improvement.

fabmilo · 3 months ago
reference to the library: https://trafilatura.readthedocs.io/en/latest/

for the curious: Trafilatura means "extrusion" in Italian.

| This method creates a porous surface that distinguishes pasta trafilata for its extraordinary way of holding the sauce.

Search maccheroni trafilati vs maccheroni lisci :)

(btw I think you meant trafilatura not trifatura)
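
For reference, a minimal sketch of the trafilatura flow discussed above, based on its documented API (the URL is a placeholder; "markdown" output requires a recent version of the library):

    import trafilatura

    downloaded = trafilatura.fetch_url("https://example.com/some-article")
    if downloaded:
        # Main content extraction; output_format="txt" is the default.
        text = trafilatura.extract(downloaded, output_format="markdown",
                                   include_comments=False, with_metadata=True)
        # Structured metadata (title, author, date, ...) as a separate call.
        meta = trafilatura.extract_metadata(downloaded)
        print(meta.title, meta.date)
        print(text)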

fabmilo commented on A Research Preview of Codex   openai.com/index/introduc... · Posted by u/meetpateltech
tough · 3 months ago
They also have a dual implementation in Rust and TypeScript; there's codex-rs in that monorepo.
fabmilo · 3 months ago
More excited about the Rust impl than the TypeScript one.
fabmilo commented on Intellect-2 Release: The First 32B Model Trained Through Globally Distributed RL   primeintellect.ai/blog/in... · Posted by u/Philpax
refulgentis · 3 months ago
I guess I'm bearish?

It's not that they trained a new model, but they took an existing model and RL'd it a bit?

The scores are very close to QwQ-32B, and at the end:

"Overall, as QwQ-32B was already extensively trained with RL, it was difficult to obtain huge amounts of generalized improvement on benchmarks beyond our improvements on the training dataset. To see stronger improvements, it is likely that better base models such as the now available Qwen3, or higher quality datasets and RL environments are needed."

fabmilo · 3 months ago
The interesting delta here is that this proves we can distribute the training and still get a functioning model. The potential scale is way bigger than what a single datacenter allows.
fabmilo commented on Multi-Token Attention   arxiv.org/abs/2504.00927... · Posted by u/fzliu
bigdict · 5 months ago
Sure, you can get better model performance by throwing more compute at the problem in different places. Does it improve perf on an isoflop basis?
fabmilo · 5 months ago
I read the paper and the results don't really convince me that that's the case. But the problem still remains of being able to use information from different parts of the model without squishing it down to a single value with the softmax.
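
For context, a minimal sketch of standard scaled dot-product attention (plain PyTorch, not the paper's multi-token variant), just to point at the softmax step in question:

    import torch
    import torch.nn.functional as F

    def attention(q, k, v):
        # q, k, v: (batch, heads, seq_len, head_dim)
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        # Each query's scores over the keys are normalized into one
        # distribution here; every (query, key) pair is reduced to a
        # single scalar weight before any information is mixed.
        weights = F.softmax(scores, dim=-1)
        return weights @ v

    q = k = v = torch.randn(1, 2, 4, 8)
    out = attention(q, k, v)  # shape (1, 2, 4, 8)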
fabmilo commented on Multi-Token Attention   arxiv.org/abs/2504.00927... · Posted by u/fzliu
fabmilo · 5 months ago
We have to move past tokenization for the next leap in capabilities. All this work done on tokens, especially in the RL optimization context, is just local-optimization alchemy.
fabmilo commented on LIMO: Less Is More for Reasoning   arxiv.org/abs/2502.03387... · Posted by u/trott
fabmilo · 6 months ago
I will believe in reasoning architectures when the model knows how to store parametric information in an external memory outside of the training loop.
fabmilo commented on DeepSeek-R1   github.com/deepseek-ai/De... · Posted by u/meetpateltech
ozgune · 7 months ago
The R1 GitHub repo is way more exciting than I had thought.

They aren't only open sourcing R1 as an advanced reasoning model. They are also introducing a pipeline to "teach" existing models how to reason and align with human preferences. [2] On top of that, they fine-tuned Llama and Qwen models that use this pipeline; and they are also open sourcing the fine-tuned models. [3]

This is *three separate announcements* bundled as one. There's a lot to digest here. Are there any AI practitioners who could share more about these announcements?

[2] We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models.

[3] Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series to the community.

fabmilo · 7 months ago
I was genuinely excited when I read this, but the GitHub repo does not have any code.
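
For the curious, the distilled dense models in [3] amount to supervised fine-tuning on reasoning traces generated by the large model. A generic sketch with Hugging Face transformers follows; the model name, dataset file, and hyperparameters are placeholders, not DeepSeek's actual setup:

    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    # Placeholder student model and a JSONL file of teacher-generated
    # reasoning traces, e.g. {"text": "<prompt + chain of thought + answer>"}.
    model_name = "Qwen/Qwen2.5-7B"
    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    dataset = load_dataset("json", data_files="r1_traces.jsonl", split="train")

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=4096)

    tokenized = dataset.map(tokenize, batched=True,
                            remove_columns=dataset.column_names)

    # Causal-LM collator copies input_ids into labels for next-token prediction.
    collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="distilled-student",
                               per_device_train_batch_size=1,
                               num_train_epochs=1,
                               learning_rate=1e-5),
        train_dataset=tokenized,
        data_collator=collator,
    )
    trainer.train()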
fabmilo commented on rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking   arxiv.org/abs/2501.04519... · Posted by u/roboboffin
fabmilo · 7 months ago
I was just about to submit this link and it redirected me to this page. I am shocked that it received only four comments. If you are working in the LLM/agent space (you are, right?) and you don't understand the significance of this paper, you are set up for failure.
fabmilo commented on Happy New Year 2025    · Posted by u/martynvandijke
fabmilo · 8 months ago
Happy New Year to everyone! Hacker News is more than my home page. This community is awesome!

u/fabmilo

Karma: 77 · Cake day: September 20, 2014
About
Reasoning / LLMs / PyTorch / Deep Learning / AI / Distributed Systems