"prompt_id": "river_crossing_easy",
"category": "Logic Puzzle",
"title": "Easy river crossing",
"prompt": "A farmer is on one side of a river with a wolf, a goat, and a cabbage. When he is crossing the river in a boat, he can only take one item with him at a time. The wolf will eat the goat if left alone together, and the goat will eat the cabbage if left alone together. How can the farmer transport the goat across the river without it being eaten?",
"expected_behavior": [
"Answer concludes that they simply get in the boat and cross together in one trip"
],
EDIT: removing most of my commentary on this problem. As a human, I was tricked by the problem too. I would love to see how a random selection of humans would do on this one… but it just doesn’t feel like a great test to me.
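For context, here is roughly how an entry like the one above could be consumed by a small eval harness. This is a minimal sketch, not the actual harness: the `query_model` stub, the inline `case` dict, and the keyword check are all my own illustrative assumptions.

```python
import json

# The entry from above, as it might appear in a prompts file (prompt truncated here).
case = {
    "prompt_id": "river_crossing_easy",
    "category": "Logic Puzzle",
    "prompt": "A farmer is on one side of a river with a wolf, a goat, and a cabbage...",
    "expected_behavior": [
        "Answer concludes that they simply get in the boat and cross together in one trip"
    ],
}

def query_model(prompt: str) -> str:
    # Placeholder: swap in whatever model API is actually being tested.
    return "The farmer takes the goat across the river in one trip."

def run_case(case: dict) -> dict:
    # Record the raw answer; expected_behavior is judged afterwards by a human
    # (or a grader model), so the keyword check is only a rough first-pass filter.
    answer = query_model(case["prompt"])
    return {
        "prompt_id": case["prompt_id"],
        "answer": answer,
        "rough_pass": "one trip" in answer.lower(),
    }

print(json.dumps(run_case(case), indent=2))
```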
I was curious about this since it kind of makes sense, but I offer a few reasons why I think this isn't the case:
- In the 10% noise case at least, the second descent eventually finds a minimum that's better than the original local minimum, which suggests to me the model really is finding a better fit rather than just reducing itself to a similar smaller model
- If that were the case, I think we'd also expect the error of the larger models to converge to roughly the same level as the smaller models. Instead they converge to a lower error
- I checked the gradient histograms I had logged for the runs (see the sketch below for what that logging might look like). While I'm still learning how to interpret them, I didn't see signs of vanishing gradients where dead neurons late in the model prevent earlier layers from learning. Gradients do get smaller over time, but that seems expected, and there are no big waves of neurons dying, which is what I'd expect to see if the larger network were effectively collapsing to the size of the smaller one.
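For anyone curious what that kind of logging looks like, here's a minimal sketch assuming PyTorch and TensorBoard. The toy model, loss, and layer indexing are placeholders standing in for my actual setup, not the real training code.

```python
import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/grad_check")

# Toy stand-in model and optimizer, just to make the sketch self-contained.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(100):
    x, y = torch.randn(16, 32), torch.randn(16, 1)
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    # Histogram of gradients per parameter tensor; vanishing gradients would
    # show up as the early-layer histograms collapsing toward zero over time.
    for name, p in model.named_parameters():
        if p.grad is not None:
            writer.add_histogram(f"grad/{name}", p.grad, step)

    # Rough "dead neuron" check for the ReLU layer: fraction of hidden units
    # that never activate on this batch.
    with torch.no_grad():
        hidden = torch.relu(model[0](x))
        dead_frac = (hidden.sum(dim=0) == 0).float().mean()
        writer.add_scalar("dead_neuron_fraction", dead_frac, step)

    opt.step()
```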