Readit News
warsheep commented on GPT-5: Overdue, overhyped and underwhelming. And that's not the worst of it   garymarcus.substack.com/p... · Posted by u/kgwgk
vivzkestrel · 20 days ago
Stupid question but I had to ask. Let's say that instead of training anything, you start off with a completely different idea. You take a node and fluctuate its weights from -0.1 to 0.1, then you add another node that does the same thing, then 100 more nodes, then 1000 more, then another layer of 1000. Then you take the first math problem and tweak the weights to get the right answer, take the second math problem and tweak the weights to get the right answer, and do this a billion times, maybe a trillion. Will you eventually end up making a GPT?
warsheep · 20 days ago
What you're describing is a simplified version of gradient descent (tweaking the weights) and online learning (working on one sample at a time).

This version will not get you far: you will just end up with a model that solves the last math problem you gave it, and maybe some others, but it will probably forget the first ones.

There are similar procedures that improve on this, but they've been tried and are currently still worse than classical SGD with large batches.
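A rough sketch of the distinction, on a toy linear-regression "math problem" (everything here is illustrative, not from the thread): `batch_size=1` is the one-problem-at-a-time scheme the question describes, while a larger batch averages the gradient over many problems before each tweak.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "math problems": learn y = 2*x1 - 3*x2 from examples.
X = rng.normal(size=(1000, 2))
true_w = np.array([2.0, -3.0])
y = X @ true_w

def sgd(w, X, y, lr, batch_size):
    """One pass over the data. batch_size=1 is the 'tweak the weights
    for one problem at a time' scheme; larger batches average the
    gradient over many problems before each tweak."""
    for i in range(0, len(X), batch_size):
        xb, yb = X[i:i + batch_size], y[i:i + batch_size]
        grad = 2 * xb.T @ (xb @ w - yb) / len(xb)  # d/dw of mean squared error
        w = w - lr * grad
    return w

w_online = sgd(np.zeros(2), X, y, lr=0.01, batch_size=1)
w_batch = sgd(np.zeros(2), X, y, lr=0.2, batch_size=100)
```

On a convex toy like this both variants converge; the forgetting shows up on non-convex networks, where later problems can overwrite what earlier ones taught.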

warsheep commented on Making TRAMP faster   coredumped.dev/2025/06/18... · Posted by u/celeritascelery
shwouchk · 2 months ago
tramp is great. all the other mentioned solutions are nowhere near as seamless for “just do what i want, without distractions”.

vscode? “trust me bro, i will run a networked daemon on your server”. enjoy wondering which plugins to reinstall on your remote. enjoy installing proprietary shareware+telemetry plugins just to use git. try opening a local file and a remote file side by side in the same window. wifi connection broke for a sec? oops, you have to refresh the whole browser window.

want to edit a single file on a host you rarely connect to? enjoy spending 10 minutes setting up autosync solutions.

with any of the above - oops, you actually need sudo for that file in /etc? yeah, drop to shell and edit in vim.

there are other options to do stuff and for very specific predefined workflows they may win, but the versatility of tramp is still unmatched, especially if you do use emacs.

the only times ive had issues is when i have a weird shell setup on the remote - for that there is /sshx: instead of /ssh:

warsheep · 2 months ago
Not sure how you can compare vscode with TRAMP. A lot of dev work nowadays is done in containers where you install specific versions of dev tools, compilers, etc. Vscode is one click away from seamlessly working inside such a container with its dev tools. TRAMP doesn't provide anything like that, right?
warsheep commented on Hezbollah hand-held radios explode, killing three, one day after pager blasts   reuters.com/world/hezboll... · Posted by u/JumpCrisscross
_wire_ · a year ago
If the effort of booby-trapping supplies of communications devices was certain enough that enemy combatants would be specifically targeted, why didn't they just stop the shipments? You need possession to plant the trap.

At the superficial level of the news reports about this event, booby-trapping incidental items to rend life and limb across a wide field and a diffuse population doesn't sound like legitimate combat under civilized conduct in warfare; it sounds like textbook terrorism in the lexicon of the U.S.

In my view the crucial question for a society is how does action represent our values and sense of responsibility. In such regards, Israel is far gone off its reservation.

warsheep · a year ago
Let me get this straight.

* A west-aligned country is at war with a terrorist organization which is part of the Russian-Iranian-North Korean axis of evil.

* A war which the terrorist organization started unprovoked

* The ally country conducts the most precise strike against militant combatants in history (also completely legal by my understanding of international rules of war)

* Your suggestion is that they should've confiscated their walkie talkies instead

warsheep commented on Ask HN: Does Anyone else working in a crypto company feel this is all a scam?    · Posted by u/two_poles_here
mwerd · 4 years ago
How about AAVE and their Arc product specifically? https://www.fireblocks.com/blog/permissioned-defi-goes-live-...

You've got a fully compliant (in terms of anti money laundering), whitelisted counterparties only, decentralized lending and borrowing platform that completely eliminates the friction of a typical corporate treasury banking experience.

You deposit dollars, you earn yield in dollars. If you want exposure to eth or BTC, you can exchange dollars for either or borrow at transparent rates. The transaction settles in seconds and the fees are fractions of what it costs to wire funds.

My organization uses Wells Fargo for similar services (sans BTC and ETH) and pays considerably more for the pleasure of receiving less in return. Aave achieves it with virtually none of the back-office or legacy COBOL-based software that these dinosaur, heavily entrenched, ethically challenged financial institutions require. It's like comparing pre-acquisition WhatsApp to AT&T in efficiency.

There's innovation and value creation happening in crypto, whether you choose to see it or not.

The hostility to crypto from a bunch of SV engineers who have scammed society out of billions (trillions?) of dollars pitching ineffective digital marketing is not without irony. Not to mention the societal and political fallout from the uncontrolled spread of misinformation, aided and abetted by the likes of Facebook, Twitter, and other SV darlings. Pushing ads to fuel consumerism and coming on here to complain about emissions from proof of work Blockchains. It's rich.

It's not easy to build software that actually solves large scale problems. Most of the companies/apps in crypto will fail, just like internet startups. What succeeds will likely disrupt the financial system.

warsheep · 4 years ago
Isn't this a circular argument? The question was what "useful" things do these crypto solutions/companies provide, and your example is a product which allows regulated entities to invest in crypto.

"Why is X valuable? Because X allows you to invest in... X."

Also, in this specific example (Arc), is the solution even considered DeFi? There's a centralized list of whitelisted entities and you can only participate if you're a customer of these entities. So they're like banks.

warsheep commented on A common mistake when using NumPy's RNG with PyTorch   tanelp.github.io/posts/a-... · Posted by u/sunils34
shoyer · 4 years ago
This post is yet another example of why you should never use APIs for random number generation that rely upon and mutate hidden global state, like the functions in numpy.random. Instead, use APIs that explicitly deal with RNG state, e.g., by calling methods on an explicitly created numpy.random.Generator object. JAX takes this one step further: there are no mutable RNG objects at all, and the user has to explicitly manipulate RNG state with pure functions.

It’s a little annoying to have to set and pass RNG state explicitly, but on the plus side you never hit these sorts of issues. Your code will also be completely reproducible, without any chance of spooky “action at a distance.” Once you’ve been burned by this a few times, you’ll never go back.

You might think that explicitly seeding the global RNG would solve reproducibility issues, but it really doesn’t. If you call into any code you didn’t write, it might also be using the same global RNG.
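A minimal illustration of the two styles contrasted above (the seed values are just examples):

```python
import numpy as np

# Global-state style: any code calling np.random.* shares and advances
# one hidden state, so results depend on who called what, and when.
np.random.seed(42)
a = np.random.rand(3)

# Explicit-state style: the Generator object owns its state; nothing
# else can advance it behind your back.
rng = np.random.default_rng(42)
b = rng.random(3)

# Two generators seeded identically give identical, isolated streams,
# no matter what other code does with the global np.random in between.
rng2 = np.random.default_rng(42)
np.random.rand(100)  # someone else hammers the global state...
assert np.array_equal(b, rng2.random(3))  # ...and b's stream is unaffected
```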

warsheep · 4 years ago
The solution you suggest is irrelevant to the issue mentioned in the article. Even if you use np.random.RandomState, or any other "explicit RNG state", that state will still be copied in the fork() call.

The post just stresses that one should be careful when mixing random state and multiprocessing: either reseed after forking, or use a multiprocess/multithread-aware RNG API.
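The fork behavior is easy to reproduce (Unix-only sketch: it forces the fork start method, which is exactly where the bug lives):

```python
import multiprocessing as mp
import numpy as np

def draws(reseed, n=4):
    """Fork n children from a freshly seeded parent and collect one
    np.random draw from each (Unix-only: uses the fork start method)."""
    ctx = mp.get_context("fork")
    np.random.seed(0)  # parent state that every forked child inherits
    q = ctx.Queue()

    def child():
        if reseed:
            np.random.seed()  # pull fresh OS entropy after the fork
        q.put(int(np.random.randint(0, 2**31)))

    procs = [ctx.Process(target=child) for _ in range(n)]
    for p in procs:
        p.start()
    vals = [q.get() for _ in range(n)]
    for p in procs:
        p.join()
    return vals

print(len(set(draws(reseed=False))))  # every child drew the same number: 1
print(len(set(draws(reseed=True))))   # reseeded children diverge: 4
```

`np.random.seed()` with no argument pulls fresh OS entropy, which is the "reseed after forking" fix; deriving per-worker seeds from a parent `np.random.SeedSequence` works too.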

warsheep commented on Web Scraping 101 with Python   scrapingbee.com/blog/web-... · Posted by u/daolf
rinze · 5 years ago
Not my project, but if I had to do it I'd try something like the following:

* Set an autoscaling group with your instance template, max instances 1, min instances 0, desired instances 0 (nothing is running).

* Set up a Lambda function that sets the autoscaling group desired instances to 1.

* Link that function to an API Gateway call, give it an auth key, etc.

* From any machine you have, set up your cron with a random sleep and a curl call to the API.

And that should do the trick, I think.
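The last step might look something like this (the endpoint URL and key are placeholders, not real values):

```shell
#!/bin/bash
# Sketch of step 4: what the crontab entry would run each night.
# ENDPOINT_URL and API_KEY are placeholders, not real values.
ENDPOINT_URL="https://example.invalid/prod/start"
API_KEY="replace-me"

DELAY=$((RANDOM % 300))  # random jitter so the request time isn't fixed
echo "sleeping ${DELAY}s before triggering the scraper"
# In the real cron job, uncomment these:
# sleep "$DELAY"
# curl -s -X POST -H "x-api-key: $API_KEY" "$ENDPOINT_URL"
```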

warsheep · 5 years ago
Why use autoscaling and not just launch the instance directly from Lambda? The run time is short, so there's no danger of two instances running in parallel.
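A sketch of the direct-launch alternative suggested above, with the call parameters split into a plain function (the AMI id is a placeholder; setting `InstanceInitiatedShutdownBehavior` to `terminate` lets the instance kill itself when the job finishes):

```python
def run_instances_params(ami_id, instance_type="t3.micro"):
    """Kwargs for boto3's ec2.run_instances: exactly one instance, and
    have it terminate (not just stop) when it shuts itself down."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "InstanceInitiatedShutdownBehavior": "terminate",
    }

def handler(event, context):
    # Lambda entry point; boto3 ships with the Lambda Python runtime.
    import boto3
    ec2 = boto3.client("ec2")
    return ec2.run_instances(**run_instances_params("ami-12345678"))
```

The scraper's last act would be `shutdown -h now`, so nothing stays running and no autoscaling group is involved.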
warsheep commented on Deep image prior 'learns' on just one image   dmitryulyanov.github.io/d... · Posted by u/singularity2001
fpgaminer · 8 years ago
I'm finding it hard to put into words what I find wrong with this paper, but ... here goes nothing.

So, the novel thing here is that an encoder-decoder network applied to an image can learn enough from a source image to be useful. In some ways that's obvious, but the effectiveness of it on reconstruction tasks is certainly surprising.

I have two problems, though. One is that I would take the reconstruction results with a grain of salt. The examples are clearly lab queens, where the occluded regions are not particularly interesting/challenging.

Two is the conclusion the authors reach. Somehow the authors go from the novel discovery I describe above, to saying that somehow the architecture of the network is a prior.

Well ... I mean, yeah a network's architecture _is_ a prior. But it's not actually significant.

See, in the dark ages of machine learning we had only fully connected networks. They sucked. They'd always overfit and underperform, or were impossible to train. Then we finally got convolutional networks, and suddenly a whole slew of machine learning problems became easier, and that hurtled us into the current renaissance.

But, you see, convolutional networks weren't the _only_ reason for the dawn of this new age. Rather it was three major things: 1) Convolutional layers, 2) more data, 3) more computing power.

Some time after the "discovery" of convolutional layers we found out that, hey, our old fully connected networks actually _do_ work. If you give them enough data and enough computational power, you can get them to perform as well as state of the art convolution networks. The great thing about fully connected networks is that they assume nothing. That means A) you can theoretically get better results and B) you don't have to spend time designing an architecture.

So we already know that architecture isn't ultimately important. You can have a giant, fully connected network, and it _will_ work, if you feed it enough data and have the computational power necessary to train such a beast.

Convolutional layers are just simplifications which make training easier. They are priors in the sense that we know a fully connected layer in image applications would just devolve into a convolutional layer anyway, so we might as well start with a convolution layer. That "design" is the prior. But it's not mandatory; the network would still function without that "prior".
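The claim that a convolutional layer is a constrained fully connected layer can be made concrete: a 1-D convolution is equivalent to a dense matrix whose rows are shifted copies of one shared kernel (an illustrative sketch, not from the thread):

```python
import numpy as np

def conv1d_matrix(kernel, n):
    """The dense (fully connected) matrix equivalent to a 'valid' 1-D
    convolution: each row is a shifted copy of the shared kernel, and
    everything off that band is pinned to zero."""
    k = len(kernel)
    W = np.zeros((n - k + 1, n))
    for i in range(n - k + 1):
        W[i, i:i + k] = kernel
    return W

x = np.array([0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0])  # squares
kernel = np.array([1.0, -2.0, 1.0])  # second-difference filter

direct = np.convolve(x, kernel[::-1], mode="valid")  # sliding-window view
as_fc = conv1d_matrix(kernel, len(x)) @ x            # same map as a dense layer
assert np.allclose(direct, as_fc)  # both equal [2, 2, 2, 2, 2, 2]
```

An unconstrained dense matrix could in principle learn the same map, which is the sense in which the convolutional architecture is "only" a prior.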

So ... I'm not sure how the authors are taking their research and using it to come to the conclusion that their results are because of some magical property imbued into the network by the "priors" of the architecture.

They apparently tried other architectures and got poor results, and so they use that to claim that architecture is the only reason their technique works.

That's like if you started with ResNet for a classification problem, tried other architectures, saw that they performed worse, and then published a paper saying that Residual Networks somehow embody the fundamental forces of natural images in their architecture, and that's why they work. When the truth is that ResNets aren't special, they are just easier to train.

Another example from the annals of machine learning history: time and time again when there is a breakthrough in architectures, it's usually followed a few years later by a simplification of the architecture. For example we started with networks like VGG which are these big, hand crafted architectures. Slowly over time architectures have become _less_ exotic, instead opting to simply define a basic building block repeated N times.

The reason for this is that in the intervening years we gather more training data, better training techniques, and more computational power. So we can instead use a more homogeneous architecture which makes _fewer_ assumptions (fewer priors), and at the end of the day we get _better_ results.

I'll repeat that. We put _fewer_ priors into our networks and we get better results.

So on the one hand we have _all_ of machine learning history telling us that priors in architectures are _bad_. On the other hand we have this paper which makes some really weird logical leap from "we tried a few architectures, they were worse, so architecture is _key_ to machine learning and it's important because we need good priors built into the architecture."

Anyone remember hand crafted feature vectors? I do. Those were priors. Guess what happened when we got rid of them and used generic networks feeding directly from the raw data? Oh right, all of modern machine learning...

warsheep · 8 years ago
> Convolutional layers are just simplifications which make training easier. They are priors in the sense that we know a fully connected layer in image applications would just devolve into a convolutional layer anyway, so we might as well start with a convolution layer. That "design" is the prior. But it's not mandatory; the network would still function without that "prior".

As far as I know this is incorrect. Can you point to a paper that shows this? If by "easier to train" you mean that the models do not overfit training data, then that's the whole point of using correct priors / hypothesis classes.

I'm not sure what bugs you in this paper, but the point is that they decouple the prior architecture from the training/optimization mechanism, and that seems interesting.

warsheep commented on Sheaf Theory: The Mathematics of Data Fusion (audio starts at 10:44) [video]   youtube.com/watch?v=b1Wu8... · Posted by u/espeed
mrkgnao · 9 years ago
The "sheaf axioms" define what it means for a "rule" associating "data" to open subsets to be a sheaf, and I was just trying to illustrate them with the example of "F". (Perhaps calling them "properties" or "laws" -- or even an interface or typeclass! -- might help?)

In general, proving that something that looks like a sheaf really is one may be nontrivial. :)

In the special case that I outlined above, it certainly is easy to show that F satisfies those axioms, as you point out. And it is a sheaf (the sheaf of continuous real-valued functions on R) precisely because it does.

warsheep · 9 years ago
Sorry, I meant the claim about f and g. Assuming you meant that F(I) should be the continuous functions, you can construct an h from f and g that is continuous on I and J, no? So it's not an axiom. Just making sure I understand correctly...
warsheep commented on Sheaf Theory: The Mathematics of Data Fusion (audio starts at 10:44) [video]   youtube.com/watch?v=b1Wu8... · Posted by u/espeed
mrkgnao · 9 years ago
So, I can probably try to provide the beginnings of an ELI5 from a math point of view, since the heading really caught my eye: a sheaf is a way of assigning some data to subsets of a space that satisfies some commonsense axioms.

For instance, suppose you have a sheaf F that assigns to every open interval I on the real line the set[1] F(I) of real-valued functions defined on it. If I write F(a,b) for F((a,b)):

* f(x) = x is in all the F(a,b), since it's defined everywhere on the real line

* f(x) = 1/(x-3) isn't in F(2.9, 3.1) or even F(1,5), but it is in F(4,5)

And so on. There's an axiom that says that if you have

* f in F(I)

* g in F(J)

such that f = g everywhere on the intersection of I and J, then there is in fact some function h defined all over I ∪ J (i.e. h ∈ F(I ∪ J)) which you can restrict to I and J to get f and g respectively. So "compatible functions can be stitched together", where "compatible" means "agree on overlaps".
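One compact way to write that gluing axiom symbolically (restricted to two open sets, which is the case described above):

```latex
f \in F(I),\quad g \in F(J),\quad
f\big|_{I \cap J} = g\big|_{I \cap J}
\;\Longrightarrow\;
\exists\, h \in F(I \cup J)\ \text{with}\ h\big|_{I} = f,\ h\big|_{J} = g.
```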

Sheaves give you a language to coherently[0] talk about "partially-defined" functions such as 1/(x-3) above, to stitch them together, and go from the "local" to the "global" picture and back comfortably. This last point is actually a hallmark of mathematics in the last one or two centuries: for instance, consider some equation which you want to find integer solutions for. If you want to show it has no solutions, you can reduce both sides modulo some number and show that there are no solutions mod n, which means it is impossible to find any solutions to begin with. It is a much deeper fact (the Hasse principle) that if you can find solutions mod all n (and a real one), you can always solve the original equation! (I'm fudging a bit here: see [2] for details.)

(Quick plug: I have a short post that talks about these things here[3], as well as another on the "p-adic" numbers that appear in the Hasse principle.)

Sheaves are general enough ("data" can mean[4] almost anything!) that Paul Cohen used them to prove the independence of the axiom of choice[5] from the Zermelo-Fraenkel axioms (which is hard set theory) even though they were created for geometry, broadly speaking: in particular, they were one of the tools with which Grothendieck and his collaborators powerfully recast algebraic geometry in the 20th century, giving birth to "scheme theory"[6], which is e.g. vital in modern number theory. (I should probably mention the standard example of Wiles' proof of the Taniyama-Shimura conjecture that settled FLT.)

Cool stuff.

[0]: excuse the pun

[1]: ring, really

[2]: https://en.m.wikipedia.org/wiki/Hasse_principle

[3]: https://mrkgnao.github.io/schemes-i/

[4]: Well, a sheaf can be defined as a certain kind of contravariant functor into a category, which one can think of as the "type" of our data.

[5]: https://en.wikipedia.org/wiki/Forcing_(mathematics)

[6]: https://en.wikipedia.org/wiki/Scheme_(mathematics)

warsheep · 9 years ago
I don't understand why the claim in your second paragraph is an axiom. Doesn't it follow trivially from your definition?

u/warsheep

Karma: 139 · Cake day: August 3, 2013