bradhilton (u/bradhilton)

bradhilton commented on “Captain Gains” on Capitol Hill nber.org/papers/w34524... · Posted by u/mhb

Great, so I can copycat their portfolio and do 47 pts better.

bradhilton · 3 months ago

The problem is that they have a lot of time to report their purchases. If they were required to report before they purchased the problem would probably resolve itself.

bradhilton commented on “Captain Gains” on Capitol Hill nber.org/papers/w34524... · Posted by u/mhb

Mistletoe · 3 months ago

Even that is open to massive bias. Who would vote to curtail the 7 tech giants when their own portfolio is basically the SP500 those companies dominate?

bradhilton · 3 months ago

The SP500 is probably the most popular investment in America, perhaps aside from housing. Wouldn't hurt to have lawmaker's fortunes broadly aligned versus narrowly aligned with specific corporations.

bradhilton commented on Show HN: ART – a new open-source RL framework for training agents github.com/OpenPipe/ART... · Posted by u/kcorbitt

gitroom · 10 months ago

Perfect, I've always wanted an easier way to mess with RL frameworks. Gonna mess around with this asap.

bradhilton · 10 months ago

Awesome! If you run into any problems or have questions feel free to open an issue or drop by the discord [1] server.

[1] https://discord.gg/zbBHRUpwf4

bradhilton commented on Show HN: ART – a new open-source RL framework for training agents github.com/OpenPipe/ART... · Posted by u/kcorbitt

tcdent · 10 months ago

I really like this concept.

Do you have documentation for the API response from the `/_train_model` endpoint?

bradhilton · 10 months ago

Hi, we don't have reliable documentation for the HTTP API endpoints yet, mostly as they are still subject to change.

However, to briefly provide some context, `/_train_model` returns a stream of line delimited JSON objects for each gradient step as the model trains on the provided trajectories so the client can monitor progress. The final version of this endpoint may provide the option for both streaming & non-streaming responses, and/or potentially return a "training job" that can be polled instead.

bradhilton commented on Show HN: ART – a new open-source RL framework for training agents github.com/OpenPipe/ART... · Posted by u/kcorbitt

bradhilton · 10 months ago

Contributor here, we developed the Agent Reinforcement Trainer (ART) library to make it easy to train LLMs for anything.

No callbacks or straitjacket flows. Instead we serve an OpenAI API-compatible endpoint that you can use as a drop-in replacement for any proprietary APIs you may be hitting.

After collecting responses from the inference API, you can tune the model with your own custom rewards and repeat the process as long as you like, until performance converges. We believe this level of flexibility will make it easier for you to train state-of-the-art models for your own use cases, much like Kyle's new email agent[1].

Also happy to answer any questions you have about the framework.

[1] https://openpipe.ai/blog/art-e-mail-agent

bradhilton commented on ART·E: how we built an email research agent that beats o3 openpipe.ai/blog/art-e-ma... · Posted by u/kcorbitt

jaredmpeterson · 10 months ago

Impressive work on leveraging ART and GRPO to enhance the capabilities of LLM-based email agents for deep-research tasks. I’m curious about the scalability of this approach: how does the system handle a growing volume of diverse email queries, and what measures are in place to maintain response quality over time? Additionally, are there plans to integrate this agent with existing email platforms for broader accessibility?

bradhilton · 10 months ago

I could see training your own email agent being beneficial for products like this:

https://x.com/advaitpaliwal/status/1913290027897131084

bradhilton commented on The Llama 4 herd ai.meta.com/blog/llama-4-... · Posted by u/georgehill

simonklee · a year ago

Is this the first model that has a 10M context length?

bradhilton · a year ago

I know Google DeepMind ran experiments with 10M a while ago, but I think this will be the first legit, released 10M context window model.

bradhilton commented on Using GRPO to Beat o1, o3-mini and R1 at “Temporal Clue” openpipe.ai/blog/using-gr... · Posted by u/kcorbitt

fc417fc802 · a year ago

> only if you do a gradient step with data sampled from the exact same weights is it an online step.

Bit pedantic, but amusing thought; wouldn't that imply that asynchronous actor critic is an offline training methodology?

bradhilton · a year ago

Yes, pedantically, it is! But as I said, everything's on a spectrum. Online-ish data can still work just fine.

bradhilton commented on Using GRPO to Beat o1, o3-mini and R1 at “Temporal Clue” openpipe.ai/blog/using-gr... · Posted by u/kcorbitt

Liwink · a year ago

Can you please share the training cost?

bradhilton · a year ago

We used about 58 hours on 4xH100s and about 19 hours on 8xH100s to get the very best result with the 32B model. We trained for about another 16 hours before finishing the run, but we could have stopped earlier after it was apparent the model was regressing. Actual dollar costs are provider dependent.

bradhilton commented on Using GRPO to Beat o1, o3-mini and R1 at “Temporal Clue” openpipe.ai/blog/using-gr... · Posted by u/kcorbitt

ekidd · a year ago

Once the problem gets narrow enough, do you risk training a model that reinvents a straightforward classic algorithm at far higher cost?

bradhilton · a year ago

Well, in this case there is a much more straightforward method with the same CP-SAT solver used to create the puzzles. This is more of a fun experiment to see if we can train LLMs to solve these kinds of logical deduction problems.

u/bradhilton

KarmaCake day166July 19, 2018

About

Twitter: @bradthilton Email: brad.hilton.nw@gmail.com

View Original