What caused this specific behavior is the dilemma of backward compatibility when it comes to BGP security. We're a long way off from all routes being covered by RPKI (just 56% of v4 routes according to https://rpki-monitor.antd.nist.gov/ROV ), so invalid routes tend to be treated as less preferred, not rejected, by BGP speakers that support RPKI.
Ugh. I don't understand this. Passive PMTUD especially should just be rolled out everywhere. On Linux it still defaults to disabled! https://sourcegraph.com/search?q=context%3Aglobal+repo%3A%5E...
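If I'm reading the kernel docs right, the knob in question is net.ipv4.tcp_mtu_probing (RFC 4821 packetization-layer PMTUD), which still defaults to 0. A sketch of the sysctl change (the file path here is just an example):

```
# /etc/sysctl.d/90-pmtud.conf (example path)
# 0 = off (the default), 1 = probe only after a black hole is detected, 2 = always probe
net.ipv4.tcp_mtu_probing = 1
```

Mode 1 is the conservative choice: TCP only starts probing smaller segment sizes once it suspects ICMP "too big" messages are being eaten somewhere in the path.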
A rarer case, but really frustrating to debug, was when we had an L2 switch in the path with a lower MTU than the routers it was joining together. Without an IP-level stack there is no generation of ICMP messages, so that thing just ate larger packets. The even stranger case was a Linux box doing forwarding that had segment offload left on. It was taking in several 1500-byte TCP packets from one side, smashing them into ~9000-byte monsters, and then trying to send those over a VPN-ish network interface that absolutely couldn't handle that. Even if the network in the middle had bothered to generate the ICMP too-big message, the source would have been thoroughly confused, because it never sent anything over 1500.
The war room meetings were full of managers and QA engineers reporting on how many times they reproduced the bug. Their repro was related to triggering a super slow memory leak in the main user UI. I had the utmost respect for the senior QA engineer who actually listened to us when we said we could repro the issue way faster, and didn't need the twice daily reports on manual repro attempts. He took the meetings from his desk, 20 feet away, visible through the glass wall of the room we were all crammed into. I unfortunately didn't have the seniority to do the same.
Since I can't resist telling a good bug story:
The symptom we were seeing was that when the system was low on memory, a process (usually the main user UI, but not always) would get either a SIGILL at a memory location containing a valid CPU instruction, or a floating point divide-by-zero exception at a code location that didn't contain a floating point instruction. I built a memory pressure tool that would frequently read how much memory was free and would mmap (and dirty) or munmap pages as necessary to hold the system just short of the OOM-kill threshold. I could repro what the slow memory leak was doing to the system in seconds, rather than waiting an hour for the memory leak to do it.
I wanted to learn more about what was going on between code being loaded into memory and then being executed, which led me to look into the page fault path. I added some tracing that would dump out info about recent page faults after a SIGILL was sent out. It turns out all of the code that was having these mysterious errors was always _very_ recently loaded into memory. I realized that when Linux is low on memory, one of the ways it can get some memory back is to throw out unmodified memory-mapped file pages, like the executable pages of libraries and binaries. In the extreme case, the system makes almost no forward progress and spends almost all of its time loading code, briefly executing it, and then throwing it out for another process's code.
I realized there was a useful-looking code path in the page fault logic we would never seem to hit. This code path would check if the page was marked as having been modified (and, if I recall correctly, also if it was mapped as executable). If it passed the check, this code would instruct the CPU to flush the data cache in the address range back to the shared L2 cache, and then clear the instruction cache for the range. (The ARM processor we were using didn't have any synchronization between the L1 instruction and L1 data cache, so writing out executable content requires extra synchronization, both for the kernel loading code off disk and for JIT compilers.) With a little more digging around, I found the kernel's implementation of scatter-gather copy would set that bit. However, our SoC vendor, in their infinite wisdom, made a copy of that function that was exactly the same, except that it didn't set the bit in the page table. Of course they used it in their SDIO driver.
BBN ran them back in the 90s because (IIRC) they pre-dated route reflectors and were impossible to cleanly migrate off of. Other than that, yeah, nobody uses these things. RRs or (rarely) full mesh FTW.
This post is the equivalent of "creating a bleeding foot by using only a knife and your foot".
But that's just HN, most people comment on the headline without even clicking the link.
https://gist.github.com/atoonk/b749305012ae5b86bacba9b01160d...
AWS adds an extra 5.5M IPv4 addresses (https://github.com/seligman/aws-ip-ranges) - https://news.ycombinator.com/item?id=28177807
I'm also curious whether the price will come down over time as addresses are yielded back. I guess it depends on whether their goal is to recoup all the money they spent on addresses, or just to avoid running out.
On the other hand, I've enjoyed vibe coding Rust more, because I'm interested in Rust and felt like my understanding improved along the way as I saw what code was produced.
A lot of coding "talent" isn't skill with the language, it's learning all the particularities of the dependencies: the details of the Smithay package in Rust, the complex set of GTK modules, or the Wayland protocol implementation.
On a good day, AI can help navigate all that "book knowledge" faster.
With Rust, what I see is generally what I get. I'm not worried about heisenbug gotchas lurking in innocent-looking changes. If someone is going to be vibe coding, and truly doesn't care what language the product ends up in, they might as well do it in a language that has rigid guardrails.