Readit News
getoffmyyawn commented on Understanding the Limitations of Mathematical Reasoning in LLMs   arxiv.org/abs/2410.05229... · Posted by u/hnhn34
yk · a year ago
I actually test LLMs in a similar way. For example, there is a well-known logic puzzle where a farmer tries to cross a river with a cabbage, a goat, and a wolf. LLMs have been able to solve that since at least GPT-2; however, if we replace the wolf with a cow, gpt-o correctly infers the rules of the puzzle but can't solve it.
getoffmyyawn · a year ago
I've found that the River Crossing puzzle is a great way to show how LLMs break down.

For example, I tested Gemini with several versions of the puzzle that are easy to solve because they drop restrictions such as the farmer's boat only being able to carry one passenger/item at a time.

Ask this version, "A farmer has a spouse, chicken, cabbage, and baby with them. The farmer needs to get them all across the river in their boat. What is the best way to do it?"

In my tests the LLMs nearly always assume that the boat has a carry-restriction and they come up with wild solutions involving multiple trips.
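
To make the difference concrete, here is a rough sketch of a breadth-first search over puzzle states. The function name `min_crossings`, its parameters, and the way "forbidden" pairs are encoded are all my own assumptions for illustration, not anything from the paper or from a particular model's output.

```python
from collections import deque
from itertools import combinations


def min_crossings(items, forbidden, capacity):
    """Breadth-first search for the fewest boat crossings.

    items: things the farmer must move across.
    forbidden: pairs that cannot be left together without the farmer.
    capacity: how many items the boat holds besides the farmer.
    Returns the minimum number of crossings, or None if unsolvable.
    """
    items = frozenset(items)

    def safe(bank):
        # A bank is safe if no forbidden pair is fully present on it.
        return not any(a in bank and b in bank for a, b in forbidden)

    start = (items, "left")  # (items on the left bank, farmer's side)
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        (left, side), trips = frontier.popleft()
        if not left and side == "right":
            return trips
        here = left if side == "left" else items - left
        for k in range(capacity + 1):
            for cargo in combinations(sorted(here), k):
                cargo = frozenset(cargo)
                new_left = left - cargo if side == "left" else left | cargo
                new_side = "right" if side == "left" else "left"
                # Only the bank the farmer just left is unattended.
                unattended = new_left if new_side == "right" else items - new_left
                if not safe(unattended):
                    continue
                state = (new_left, new_side)
                if state not in seen:
                    seen.add(state)
                    frontier.append((state, trips + 1))
    return None
```

With the classic rules, `min_crossings(["wolf", "goat", "cabbage"], [("wolf", "goat"), ("goat", "cabbage")], 1)` returns 7. With no stated restrictions, and assuming the boat holds everyone, `min_crossings(["spouse", "chicken", "cabbage", "baby"], [], 4)` returns 1: a single trip.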

getoffmyyawn commented on “Wherever you get your podcasts” is a radical statement   anildash.com//2024/02/06/... · Posted by u/Tomte
lxgr · 2 years ago
That's really what it is to me too, and I'd consider myself pretty technical: A podcast is what I can listen to in the gym or on my bike without missing anything important on the video.

The only difference from an audiobook is that podcasts are usually free and often (but not always) in serial format, but these boundaries are blurring more and more.

That said, I do definitely prefer open RSS distribution to something like "Youtube for Audio", and I'm glad there isn't any such thing (yet), but I wouldn't call a podcast exclusive to Spotify, Apple etc. "not a podcast", just an annoyingly-distributed podcast.

getoffmyyawn · 2 years ago
> A podcast is what I can listen to in the gym or on my bike without missing anything important on the video.

If I were listening to a talk show on a pocket FM radio while in the gym, would you call that a podcast? What about an audiobook on a CD player? How about a 30-minute audio recording my wife made to encourage me that I saved on my phone? If not, why not?

getoffmyyawn commented on “Wherever you get your podcasts” is a radical statement   anildash.com//2024/02/06/... · Posted by u/Tomte
JohnFen · 2 years ago
If it doesn't use RSS (or similar) and I can't download it and play it through any player I choose, it's not really a podcast to me.
getoffmyyawn · 2 years ago
I agree 100%. The defining characteristic of a podcast is how it is distributed. Otherwise it's just an audio program. However, we are losing the word already. The least technical people I know think "podcast" means any kind of audio program with talking.

As in, "Hey I just started a podcast on youtube!" but literally it's just a yt channel.

getoffmyyawn commented on OKRs Are Bullshit   blog.appliedcomputing.io/... · Posted by u/hiyer
getoffmyyawn · 2 years ago
In my experience, the vast majority of companies that use OKRs do it incorrectly, misunderstand the point, and cause more harm than good. As a member of the management team, I was able to get my current company to leave them behind by giving a well-sourced presentation on what OKRs are for and when/how to use them. After one quarter the lights went on and nobody wanted to use them anymore.

Here are the biggest mistakes I commonly see.

1. Use OKRs for quarterly or yearly targets. No, OKRs are for managing big changes to things. Never for Business as Usual activities.

2. Require every team to write OKRs for every quarter / interval. See number 1.

3. Focus on the Key Results instead of the Objective. If the Key Results can be achieved without achieving the Objective, that's a broken OKR.

Overall, I think OKRs are a tool that has its uses but its misuse causes far more problems than its absence.

Update: typos

getoffmyyawn commented on Executing Cron Scripts Reliably at Scale   slack.engineering/executi... · Posted by u/kiyanwang
djboz · 2 years ago
Really cool! For the Gearman workers, did you load jobs dynamically? Or, would you have to re-deploy for new jobs/updated jobs?
getoffmyyawn · 2 years ago
As I recall, all the jobs are checked into a repo that is deployed to all the runners, which each start gearman workers for their assigned role.
getoffmyyawn commented on DEF CON 32 Was Canceled. We Un-Canceled it   forum.defcon.org/node/248... · Posted by u/Spodera
tptacek · 2 years ago
It was famously at Alexis Park for years, which might as well be the moon if you're at Caesar's.
getoffmyyawn · 2 years ago
I started going to def con before the AP, and I now feel that the AP years were def con's golden years. At least for what I like best about it.

It's nice that it has continued to grow and reach more people but it has also changed a lot from what it used to be to what it is now.

getoffmyyawn commented on Executing Cron Scripts Reliably at Scale   slack.engineering/executi... · Posted by u/kiyanwang
ryze20245 · 2 years ago
I feel like a more effective way to use cron is just to dispatch jobs into a queue that will perform the actual processing. And not to do the processing within the cron scripts themselves. That way the load on the cron is light and the heavy lifting is done by your queue/worker system.
getoffmyyawn · 2 years ago
This works great in my experience.

A lifetime ago I scaled up cron jobs for a client with Gearman. I used cron to trigger jobs on the Gearman server and a pool of runners to do all the work. This proved to be so reliable they still use the system today, over 10 years later.
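
A minimal sketch of that pattern, for anyone who hasn't seen it. This is not the Gearman API: an in-process `queue.Queue` stands in for the job server, and `dispatch`, `worker`, and `run_demo` are hypothetical names I made up so the example stays self-contained.

```python
import queue
import threading

# Stand-in for the job server (Gearman in the setup described above).
# An in-process queue is an assumption made only to keep this runnable.
job_queue = queue.Queue()
results = []
results_lock = threading.Lock()


def dispatch(job_names):
    """What the cron entry would run: enqueue jobs and return immediately."""
    for name in job_names:
        job_queue.put(name)


def worker():
    """A runner: pull jobs until it sees the None shutdown sentinel."""
    while True:
        name = job_queue.get()
        if name is None:
            job_queue.task_done()
            break
        # The heavy lifting happens here, not in the cron script.
        with results_lock:
            results.append(f"done:{name}")
        job_queue.task_done()


def run_demo(job_names, num_workers=3):
    """Start a pool of runners, dispatch the jobs, and wait for completion."""
    results.clear()
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    dispatch(job_names)
    job_queue.join()         # block until every dispatched job is processed
    for _ in threads:
        job_queue.put(None)  # one shutdown sentinel per worker
    for t in threads:
        t.join()
    return sorted(results)
```

The point is that `dispatch` does nothing but enqueue, so the cron side stays cheap and you scale by adding runners. In a real deployment the queue would live on a separate server and the runners would be separate processes or hosts.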

getoffmyyawn commented on Executing Cron Scripts Reliably at Scale   slack.engineering/executi... · Posted by u/kiyanwang
chronid · 2 years ago
I think there's a bit of "they could" but also something that is considered very little in many contexts unless you have experienced the contrary: integration is costly and integrating properly sometimes is more work than doing something from "scratch", so you don't do it and then you have a mess that hurts you in the long run.
getoffmyyawn · 2 years ago
I'm sure it's indeed something like that. I think it also comes down to, at least partly, having a culture that is more about building components than systems. I suspect it could also be the "buzz" factor. The press release about building a new system always seems more exciting than one about solving a familiar problem with boring old existing software.
getoffmyyawn commented on Executing Cron Scripts Reliably at Scale   slack.engineering/executi... · Posted by u/kiyanwang
getoffmyyawn · 2 years ago
There are so many mature, battle-tested open source solutions for distributed and/or centrally managed job queues that it really makes me wonder how they justified building something from scratch.
getoffmyyawn commented on Waveterm   waveterm.dev/... · Posted by u/indigodaddy
getoffmyyawn · 2 years ago
I don't know who this tool is for but it is certainly not designed for me. Unrelated but I recently learned the magic of

    vim scp://1.2.3.4/file_to_edit

u/getoffmyyawn

Karma: 394 · Cake day: January 18, 2023