Readit News
remexre commented on Why are anime catgirls blocking my access to the Linux kernel?   lock.cmpxchg8b.com/anubis... · Posted by u/taviso
tptacek · 4 days ago
Can you flesh that out more? In the case of AI scrapers it seems especially clear: the model companies just want tokens, and are paying a (one-time) cost of C for N tokens.

Again, with Hashcash, this isn't how it works: most outbound spam messages are worthless. The point of the system is to exploit the negative exponent on the attacker's value function.

remexre · 4 days ago
That's the point: the scraper breaks every time a new version of Anubis is deployed, and stays broken until new anti-Anubis features are implemented. If the scrapers were well engineered by a team that cared about the individual sites they're scraping, they probably wouldn't be so pathological towards forges.

The human-labor cost of working around Anubis is unlikely to be paid unless it affects enough data to be worth dedicating time to, and the data they're trying to scrape can typically be obtained "respectfully" in those cases -- instead of hitting the git blame route on every file of every commit of every repo, just clone the repos and run it locally, etc.

remexre commented on Branch prediction: Why CPUs can't wait?   namvdo.ai/cpu-branch-pred... · Posted by u/signa11
zenolijo · 5 days ago
I do wonder how branch prediction actually works in the CPU; predicting which branch to take also seems like it should be expensive, but I guess something clever is going on.

I've also found G_LIKELY and G_UNLIKELY in glib to be useful when writing some types of performance-critical code. Would be a fun experiment to compare the assembly when using it and not using it.

remexre · 5 days ago
https://danluu.com/branch-prediction/ is a good illustrated overview of a few algorithms.
remexre commented on It seems like the AI crawlers learned how to solve the Anubis challenges   social.anoxinon.de/@Codeb... · Posted by u/moelf
logicprog · 9 days ago
> That, or, they could just respect robots.txt

IMO, if digital information is posted publicly online, it's fair game to be crawled unless that crawl is unreasonably expensive or takes the site down for others, because these are non-rivalrous resources that are literally already public.

> we could put enforcement penalties for not respecting the web service's request to not be crawled... We need laws.

How would that be enforceable? A central government agency watching network traffic? A means of appealing to a bureaucracy like the FCC? Setting it up so you can sue companies that do it? All of those seem like bad options to me.

remexre · 8 days ago
> unless that crawl is unreasonably expensive or takes it down for others

This _is_ the problem Anubis is intended to solve: on forges like Codeberg or Forgejo, many routes perform expensive Git operations (e.g. git blame), and scrapers do not respect the robots.txt asking them not to hit those routes.

remexre commented on Streaming services are driving viewers back to piracy   theguardian.com/film/2025... · Posted by u/nemoniac
JambalayaJimbo · 10 days ago
The black market is only more competitive because it doesn’t bear the costs of actually creating the content.
remexre · 10 days ago
Alfred Hitchcock's movies aren't missing from Netflix because Netflix couldn't afford to pay for their production.
remexre commented on Lambdas, Nested Functions, and Blocks (2021)   thephd.dev/lambdas-nested... · Posted by u/zaikunzhang
remexre · 10 days ago
Are there any implementations of this lambdas proposal on top of any production-quality compilers? Or are there some updates on how this is doing in committee?
remexre commented on What's the strongest AI model you can train on a laptop in five minutes?   seangoedecke.com/model-on... · Posted by u/ingve
remexre · 10 days ago
Am I missing where the GitHub link is for this, or did the author not release sources? It'd be fun to reproduce this on a different machine, and play around with other architectures and optimizers that weren't mentioned in the article...
remexre commented on I made a real-time C/C++/Rust build visualizer   danielchasehooper.com/pos... · Posted by u/dhooper
boris · 10 days ago
> It also has 6 seconds of inactivity before starting any useful work. For comparison, ninja takes 0.4 seconds to start compiling the 2,468,083 line llvm project. Ninja is not a 100% fair comparison to other tools, because it benefits from some “baked in” build logic by the tool that created the ninja file, but I think it’s a reasonable “speed of light” performance benchmark for build systems.

This is an important observation that is often overlooked. What's more, changes to the information on which this "baked in" build logic is based are not tracked very precisely.

How close can we get to this “speed of light” without such “baking in”? I ran a little benchmark (not 100% accurate for various reasons but good enough as a general indication) which builds the same project (Xerces-C++) both with ninja as configured by CMake and with build2, which doesn’t require a separate step and does configuration management as part of the build (and with precise change tracking). Ninja builds this project from scratch in 3.23s while build2 builds it in 3.54s. If we omit some of the steps done by CMake (like generating config.h) by not cleaning the corresponding files, then the time goes down to 3.28s. For reference, the CMake step takes 4.83s. So a fully from-scratch CMake+ninja build actually takes 8s, which is what you would normally pay if you were using this project as a dependency.

remexre · 10 days ago
> What's more, changes to the information on which this "baked in" build logic is based are not tracked very precisely.

kbuild handles this on top of Make by having each target depend on a dummy file that gets updated when e.g. the CFLAGS change. It also treats Make a lot more like Ninja (e.g. avoiding putting the entire build graph into every Make process) -- I'd be interested to see how it compares.
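A hedged sketch of that trick in plain Make (the file and variable names here are made up, not kbuild's actual ones, and kbuild's real machinery records per-object command lines): every object depends on a stamp file whose contents mirror $(CFLAGS), and the stamp is only rewritten when the flags actually change, so its mtime only moves when a rebuild is really needed.

```make
CFLAGS ?= -O2 -Wall

# Every object depends on the stamp, so a CFLAGS change rebuilds all.
%.o: %.c .cflags.stamp
	$(CC) $(CFLAGS) -c -o $@ $<

# FORCE runs the recipe on every make invocation, but the stamp is
# only rewritten (bumping its mtime) when the flags differ.
.cflags.stamp: FORCE
	@echo '$(CFLAGS)' | cmp -s - $@ || echo '$(CFLAGS)' > $@

.PHONY: FORCE
FORCE:
```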

remexre commented on What's the strongest AI model you can train on a laptop in five minutes?   seangoedecke.com/model-on... · Posted by u/ingve
gambiting · 10 days ago
Why not? And I'm not being flippant, but like....isn't that the whole point of small models?
remexre · 10 days ago
For one thing, the model is trained on a language modelling task, not a question-answering task?
remexre commented on Claude Sonnet 4 now supports 1M tokens of context   anthropic.com/news/1m-con... · Posted by u/adocomplete
brulard · 12 days ago
I think you misunderstand how context in current LLMs works. To get the best results you have to be very careful to provide what is needed for immediate task progression, and postpone context that's needed later in the process. If you give all the context at once, you will likely get quite degraded output quality. That's like giving a junior developer his first task: you likely won't teach him every corner of your app; you would give him the context he needs. It is similar with these models. Those that offered 1M or 2M of context (Gemini etc.) were getting less and less useful after roughly 200k tokens in the context.

Maybe models will get better at picking up relevant information from a large context, but AFAIK that is not the case today.

remexre · 12 days ago
That's a really anthropomorphizing description; a more mechanical one might be:

The attention mechanism that transformers use to find information in the context is, in its simplest form, O(n^2); for each token position, the model considers whether relevant information has been produced at the position of every other token.

To preserve performance when really long contexts are used, current-generation LLMs use various ways to consider fewer positions in the context; for example, they might only consider the 4096 "most likely" places to matter (de-emphasizing large numbers of "subtle hints" that something isn't correct), or they might have some way of combining multiple tokens worth of information into a single value (losing some fine detail).

remexre commented on Linux 6.16: faster file systems, improved confidential memory, more Rust support   zdnet.com/article/linux-6... · Posted by u/CrankyBear
remexre · a month ago
> For starters, Linux now supports Intel Advanced Performance Extensions (APX). [...] This improvement means you'll see increased performance from next-generation Intel CPUs, such as the Lunar Lake processors and the Granite Rapids Xeon processors.

This isn't actually right, is it? APX hasn't been released, to my knowledge.

u/remexre

Karma: 1477 · Cake day: October 12, 2016
About
PhD student at https://melt.cs.umn.edu/

Researcher at https://www.sift.net/

Email is concat ["nathan@", username, ".com"]
