rhet0rica commented on Robots.txt is a suicide note (2011)   wiki.archiveteam.org/inde... · Posted by u/rafram
snowwrestler · 14 days ago
Copying my comment from a previous discussion of ignoring robots.txt, below. I actually don’t care if someone ignores my robots.txt, as long as their crawler is well run. But the smug attitude is annoying when so many crawlers are not.

————

We have a faceted search that creates billions of unique URLs by combinations of the facets. As such, we block all crawlers from it in robots.txt, which saves us AND them from a bunch of pointless indexing load. But a stealth bot has been crawling all these URLs for weeks. Thus wasting a shitload of our resources AND a shitload of their resources too. Whoever it is, they thought they were being so clever by ignoring our robots.txt. Instead they have been wasting money for weeks. Our block was there for a reason.
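For illustration, here is a minimal sketch of how a well-behaved crawler could honor that kind of block, using Python's standard `urllib.robotparser`. The `/search` path, the `ExampleBot` name, and the example.com URLs are assumptions made for the example, not details of the site described above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "/search" stands in for the faceted search path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that the crawler has already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
faceted_ok = allowed(parser, "ExampleBot", "https://example.com/search?color=red&size=m")
about_ok = allowed(parser, "ExampleBot", "https://example.com/about")
```

A crawler that runs this check before each fetch skips every facet combination under the disallowed prefix, which is the saving on both sides that the comment describes.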

rhet0rica · 14 days ago
I have two related stories.

Googlebot has been playing a multiple-choice flash card game on my site for months—the page picks a random question and gives you five options to choose from. Each URL contains all of the state of the last click: the option you chose, the correct answer, and the five buttons. Naturally, Google wants to crawl all the buttons, meaning the search tree has a branch factor of five and search space of about 5000^7 possible pages. Adding a robots.txt entry failed to fix this—now the page checks the user agent and tells Googlebot specifically to fuck off with a 403. Weeks later, I'm still seeing occasional hits. Worst of all it's pretty heavy-duty—the flash cards are for learning words, and the page generator sometimes sprinkles in items that look similar to the correct answer (i.e., they have a low edit distance.)
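To show why picking look-alike options is comparatively expensive, here is a rough sketch of choosing distractors by Levenshtein edit distance. The function names and the sample vocabulary are hypothetical; the actual page generator is not shown in the thread and may work differently.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic row-by-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similar_distractors(answer: str, vocabulary: list[str], count: int = 4) -> list[str]:
    """Pick the `count` words closest to the answer, ties broken alphabetically."""
    candidates = [w for w in vocabulary if w != answer]
    return sorted(candidates, key=lambda w: (edit_distance(answer, w), w))[:count]


picked = similar_distractors("cat", ["cat", "cot", "dog", "cart", "bat"], count=2)
```

Each comparison costs time proportional to the product of the two word lengths, and a naive version like this one compares the answer against the whole vocabulary on every page load, so each crawler hit triggers a full scan.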

On the other hand there was a... thing crawling a search page on a separate site, but doing so in the most ass-brained way possible. Different IP addresses, all with fake user agents from real clients fetching search results for a database retrieval form with default options. (You really expect me to believe that someone on Symbian is fetching only page 6000 of all blog posts for the lowest user ID in the database?) The worst part about this one is that the URLs frequently had mangled query strings, like someone had tried to use substring functions to swap out the page number and gotten it wrong 30 times, resulting in Markov-like gibberish. The only way to get this foul customer to go away was to automatically ban any IP that used the search form incorrectly. So far I have banned 111,153 unique addresses.
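The ban-on-misuse rule could look roughly like the sketch below. It is only an illustration: the parameter names `page` and `user`, the in-memory ban set, and the status codes are all assumptions, and the real form surely has different fields.

```python
from urllib.parse import parse_qs

ALLOWED_PARAMS = {"page", "user"}  # hypothetical form fields
banned_ips = set()                 # a real site would persist this


def query_is_well_formed(query: str) -> bool:
    """Accept only known parameters, each given once with a numeric value."""
    try:
        params = parse_qs(query, keep_blank_values=True, strict_parsing=True)
    except ValueError:
        return False  # a field without "=" means the string is mangled
    if set(params) - ALLOWED_PARAMS:
        return False  # unknown or garbled parameter name
    return all(len(v) == 1 and v[0].isascii() and v[0].isdigit()
               for v in params.values())


def handle_search(ip: str, query: str) -> int:
    """Return an HTTP status; ban the address on the first malformed query."""
    if ip in banned_ips:
        return 403
    if not query_is_well_formed(query):
        banned_ips.add(ip)
        return 403
    return 200


ok = handle_search("203.0.113.5", "page=6000&user=1")
bad = handle_search("203.0.113.9", "pag=60e=6000user")
later = handle_search("203.0.113.9", "page=1&user=1")
```

Because a real browser submitting the form never produces a mangled query string, the rule should cost legitimate users nothing, while each bot address gets exactly one bad request before it is shut out.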

robots.txt wasn't adequate to stop this madness, but I can't say I miss Ahrefs or DotBot trying to gather valuable SEO information about my constructed languages.

rhet0rica commented on Lina Khan points to Figma IPO as vindication of M&A scrutiny   techcrunch.com/2025/08/02... · Posted by u/bingden
aianus · a month ago
If I suggest putting your net worth on black at roulette and it lands on black, does that make my advice right?

Khan forced the employees and investors to continue working and gambling on a company they might not have wanted to continue working for or gambling on. It doesn't really matter that the gamble succeeded in this case.

rhet0rica · a month ago
I'm pretty sure no employee wants to work for Adobe.
rhet0rica commented on Figma will IPO on July 31   figma.com/blog/ipo-pricin... · Posted by u/nevir
scarface_74 · a month ago
A majority or at least large minority of Adobe users were/are on Macs.

The Mac version has lived through 68K MacOS pre and post System 7, PPC Mac pre and post OS X, x86 Macs pre and post Carbon support and now ARM Macs. After each transition, there was a limited amount of time that you could use the same version and an even smaller amount of time that you would have wanted to.

But the same argument that applies to Figma applies here. It’s a professional tool that should help you generate income far greater than the cost.

rhet0rica · a month ago
True, but depressing. Definitely something to add to the FLOSS casus belli...
rhet0rica commented on Figma will IPO on July 31   figma.com/blog/ipo-pricin... · Posted by u/nevir
scarface_74 · a month ago
I looked up Adobe credits. Aren’t they just used to buy licensed assets like pictures and videos, but not for the core app?
rhet0rica · a month ago
You're right; unfortunately I can't edit my comment to remove Adobe from it. Though they are plenty guilty of 'adding value' in the worst possible ways.
rhet0rica commented on The HTML Hobbyist (2022)   htmlhobbyist.com/... · Posted by u/janandonly
ravenstine · a month ago
To each their own. At that point, I'd rather just write to myself without publishing so that I can be 110% candid, which I already do by journaling.
rhet0rica · a month ago
It sounds like the presumption that you would do this for money is the problem here—you don't have to "beg for scraps" if it's just a hobby done for fun.

...which is probably the most succinct way of describing where our dear Old Net has gone: swallowed up by the razor-thin margins of the professional creative economy.

rhet0rica commented on Figma will IPO on July 31   figma.com/blog/ipo-pricin... · Posted by u/nevir
scarface_74 · a month ago
Charging more money for features is not enshittification. Making the product worse like adding advertisements would be.

A full professional seat is $16 for individuals, $55 for organizations and $90 for enterprises. Any of those prices is a nothing burger for a professional tool.

rhet0rica · a month ago
There are plenty of textbook cases of enshittification that are covered by price increases—just look at Adobe and AutoCAD selling credits that are used just to launch the program. As long as it fits with the "claw back value from your customers and partners to feed your investors" pattern, ∂shit > 0.
rhet0rica commented on M8.7 earthquake in Western Pacific, tsunami warning issued   earthquake.usgs.gov/earth... · Posted by u/jandrewrogers
mordechai9000 · a month ago
Has anyone heard how bad it was in Petropavlovsk? USGS estimates "severe" shaking with the possibility of moderate to heavy damage and a chance of fatalities.

They have had quite a swarm of quakes there over the last couple of weeks, including one that was M7+ around the 20th.

rhet0rica · a month ago
Severo-Kurilsk, an island town destroyed by a similar tsunami in 1952, lost its port again: https://en.wikipedia.org/wiki/Severo-Kurilsk. The rest of the settlement was rebuilt on higher ground, leaving only the port vulnerable.

The settlement is notable for having belonged to the Japanese in the late 19th and early 20th centuries, who once relocated islanders there. Russian Wikipedia says they were Ainu.

rhet0rica commented on It's time for modern CSS to kill the SPA   jonoalderson.com/conjectu... · Posted by u/tambourine_man
nektro · a month ago
might wanna consider redesigning your site, it looks like AI spam
rhet0rica · a month ago
Yeah, it's really missing the charm you'd expect from an AI-free spam site. Maybe more flashing text?
rhet0rica commented on “The Bitter Lesson” is wrong. Well sort of   assaf-pinhasi.medium.com/... · Posted by u/GavCo
wasabi991011 · a month ago
Yes, but the Bitter Lesson is about AI, not ML.
rhet0rica · a month ago
Exactly. The expert system era was the first victim of the Bitter Lesson, as it was blown away when backpropagation was figured out at the end of the eighties.

An author familiar with the history of AI would have mentioned this instead of glossing it over as "not a learning model"—dismissing a problem-solving technique because it doesn't use regression serves no constructive purpose.

u/rhet0rica

Karma: 246 · Cake day: January 12, 2025
About
LONG LIVE THE NEW FLESH

SSL is for cowards

http://rhetori.ca
