Readit News
_praf commented on AccountingBench: Evaluating LLMs on real long-horizon business tasks   accounting.penrose.com/... · Posted by u/rickcarlino
yunyu · a month ago
Hey all, member of the benchmark team here! The goal for this project was to see how well LLMs could do bookkeeping without an overly opinionated scaffold. We gave them access to processed transaction records and code execution tools, but it was up to them to choose exactly how to use those.

Claude and Grok 4 did reasonably well (within CPA baselines) for the first few months, but tended to degrade as more data came in. Interestingly, the failures aren’t exclusively a context length problem, as we reset the context monthly (with past decisions, accruals/deferrals, and comments available via tool calls) and the types of errors appear to be more reward hacking vs pure hallucinations.
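
As a rough illustration of the monthly reset described above (not the benchmark's actual harness), here is a minimal Python sketch. Every name in it (`Ledger`, `run_month`, `reuse_or_default`, the `hint` field) is hypothetical, and the model is replaced by a plain function. It only shows the shape of the setup: the working context starts empty each month, and earlier decisions are reachable only through a lookup that stands in for a tool call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Ledger:
    """Hypothetical persistent store that survives the monthly context reset."""
    decisions: list[dict] = field(default_factory=list)

    def record(self, month: str, vendor: str, category: str) -> None:
        self.decisions.append({"month": month, "vendor": vendor, "category": category})

    def lookup(self, vendor: str) -> list[dict]:
        # Stand-in for a tool call that returns past decisions for a vendor.
        return [d for d in self.decisions if d["vendor"] == vendor]


def run_month(
    month: str,
    transactions: list[dict],
    ledger: Ledger,
    categorize: Callable[[dict, list[dict]], str],
) -> list[str]:
    # The context is rebuilt from scratch every month; nothing carries over
    # except what is fetched from the ledger.
    context: list[str] = []
    for tx in transactions:
        prior = ledger.lookup(tx["vendor"])
        category = categorize(tx, prior)  # assumption: the model call goes here
        ledger.record(month, tx["vendor"], category)
        context.append(f"{month}: {tx['vendor']} -> {category}")
    return context


def reuse_or_default(tx: dict, prior: list[dict]) -> str:
    # Toy policy standing in for the model: repeat the latest past decision,
    # otherwise fall back to a hint on the transaction.
    if prior:
        return prior[-1]["category"]
    return tx.get("hint", "uncategorized")


ledger = Ledger()
jan = run_month("2024-01", [{"vendor": "AWS", "amount": 120.0, "hint": "hosting"}], ledger, reuse_or_default)
feb = run_month("2024-02", [{"vendor": "AWS", "amount": 130.0}], ledger, reuse_or_default)
print(feb)
```

In this toy run, the February context holds only February's entry, while the January categorization is still recoverable through `ledger.lookup`.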

Accounting is very interesting in an RL-first world as it is pretty easy to develop intermediate rewards for training models. We are pretty sure that we can juice the performance more with a far more rigid scaffold, but that’s less relevant from a capabilities research perspective. We’re pushing down this research direction and will see how it goes.

Let us know if you have any questions!

_praf · a month ago
Love this as a real world benchmark!

How much prompt iteration did you do? I've noticed when building real world agentic apps that small prompt tweaks can make a huge difference in behavior (re: the reward hacking vs hallucinating). Would love to learn more about the approach here.

_praf commented on Ask HN: Who is hiring? (January 2025)    · Posted by u/whoishiring
_praf · 8 months ago
Column (https://column.com/) | Software Eng (Infrastructure), Software Eng (Backend), Software Eng (Product) | San Francisco, CA (ONSITE) | Full Time

Column is the first nationally chartered bank built from the ground up for developers. We provide an API first, modern banking experience for our customers, replacing the bloated middleware and legacy software that currently powers most financial companies.

Started by the co-founder of Plaid, Column has a team of 10 experienced engineers and is currently processing hundreds of billions in payments annually, supporting some of the largest and most sophisticated fintech companies. We are looking for ambitious infrastructure, product, and backend engineers that want to build the best-in-class banking tech from first principles. 2025 is lining up to be a huge year for us with lots of new features and high-scale customers, so it's an exciting time to join the team!

Apply here: https://column.com/careers

Feel free to email me with any questions: praful@

_praf commented on Ask HN: Who is hiring? (November 2024)    · Posted by u/whoishiring
_praf · 10 months ago
Column (https://column.com/) | Software Eng (Infrastructure), Software Eng (Backend), Software Eng (Product) | San Francisco, CA (ONSITE) | Full Time

Column is the first nationally chartered bank built from the ground up for developers. We provide an API first, modern banking experience for our customers, replacing the bloated middleware and legacy software that currently powers most financial companies.

Started by the co-founder of Plaid, Column has a team of <10 experienced engineers and is currently processing hundreds of billions in payments annually, supporting some of the largest and most sophisticated fintech companies. We are looking for ambitious infrastructure, product, and backend engineers that want to build the best-in-class banking tech from first principles. It's a fun time to join - we are scaling volume crazy fast while shipping tons of new features. To keep the team small we have a very high bar for talent, but if this sounds exciting would encourage you to apply!

Apply here: https://column.com/careers

Feel free to email me with any questions: praful@

_praf commented on Ask HN: Who is hiring? (October 2024)    · Posted by u/whoishiring
angoragoats · a year ago
Please add REMOTE or ONSITE to the top line of your post, as requested in the main post of the thread.
_praf · a year ago
Done!
_praf commented on Ask HN: Who is hiring? (October 2024)    · Posted by u/whoishiring
fijiaarone · a year ago
There appears to be a bug on the Column.com contact form after signing up. Number of employees is pre-populated with a range (e.g. 2-5) based on initial question, but trying to submit the contact form gives a NaN error with "Please match the requested format" but is not editable.

So I can't contact you to create a real account.

_praf · a year ago
Thanks for flagging - will take a look!
_praf commented on Ask HN: Who is hiring? (October 2024)    · Posted by u/whoishiring
_praf · a year ago
Column (https://column.com/) | Software Eng (Infrastructure), Software Eng (Backend), Software Eng (Product) | San Francisco, CA (ONSITE) | Full Time

Column is the first nationally chartered bank built from the ground up for developers. We provide an API first, modern banking experience for our customers, replacing the bloated middleware and legacy software that currently powers most financial companies.

Started by the co-founder of Plaid, Column has a team of <10 experienced engineers and is currently processing hundreds of billions in payments annually, supporting some of the largest and most sophisticated fintech companies. We are looking for ambitious infrastructure, product, and backend engineers that want to build the best-in-class banking tech from first principles. We all work on high impact, independently driven projects - writing code for regulated financial infrastructure at scale.

Apply here: https://column.com/careers

Feel free to email me with any questions: praful@

Deleted Comment

_praf commented on Ask HN: Who is hiring? (July 2024)    · Posted by u/whoishiring
_praf · a year ago
Column (https://column.com/) | Software Eng (Backend), Software Eng (Infrastructure) | San Francisco, CA | Full Time

Column is the first nationally chartered bank built from the ground up for developers. We provide an API first, modern banking experience for our customers, replacing the bloated middleware and legacy software that currently powers most financial companies.

Started by the co-founder of Plaid, Column has a team of <10 engineers and is currently processing hundreds of billions in payments annually, supporting some of the largest and most sophisticated fintech companies. We are looking for experienced backend and infra engineers to join our team. We all work on high impact, independently driven projects - writing code for regulated financial infrastructure at scale.

Apply here: https://column.com/careers or read more about our hiring philosophy here: https://column.com/blog/hiring-at-column

Feel free to email me with any questions: praful@

u/_praf

Karma: 148
Cake day: April 18, 2022
About
Engineering @ Column