I've also found G_LIKELY and G_UNLIKELY in glib to be useful when writing some types of performance-critical code. It would be a fun experiment to compare the generated assembly with and without them.
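For anyone curious, here's a minimal sketch of typical usage (parse_record and its logic are made up, not from any real codebase); on GCC/Clang the macros expand to __builtin_expect(), which biases the compiler's block layout toward the expected path:

    #include <glib.h>

    /* Hypothetical validation routine: the error branch is hinted as
       rare so the compiler keeps the common case on the straight-line
       (fall-through) path. */
    int
    parse_record (const char *buf)
    {
      if (G_UNLIKELY (buf == NULL))             /* rare error path */
        return -1;

      if (G_LIKELY (g_ascii_isalpha (buf[0])))  /* hot common case */
        {
          /* ... fast-path parsing work ... */
          return 0;
        }

      return 1;
    }

Compiling with and without the hints (e.g. gcc -O2 -S) and diffing the output should show the unlikely branches moved off the hot path on most targets.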
IMO, if digital information is posted publicly online, it's fair game to be crawled unless that crawl is unreasonably expensive or takes the service down for others, because these are non-rivalrous resources that are literally already public.
> we could put enforcement penalties for not respecting the web service's request to not be crawled... We need laws.
How would that be enforceable? A central government agency watching network traffic? A means of appealing to a bureaucracy like the FCC? Setting it up so you can sue companies that do it? All of those seem like bad options to me.
This _is_ the problem Anubis is intended to solve -- forges like Codeberg or Forgejo, where many routes perform expensive Git operations (e.g. git blame), and scrapers do not respect the robots.txt asking them not to hit those routes.
This is an important observation that is often overlooked. What’s more, changes to the information on which this “baked in” build logic is based are not tracked very precisely.
How close can we get to this “speed of light” without such “baking in”? I ran a little benchmark (not 100% accurate for various reasons, but good enough as a general indication) that builds the same project (Xerces-C++) both with ninja as configured by CMake and with build2, which doesn’t require a separate configuration step and does configuration management as part of the build (with precise change tracking). Ninja builds this project from scratch in 3.23s while build2 builds it in 3.54s. If we omit some of the steps done by CMake (like generating config.h) by not cleaning the corresponding files, the time goes down to 3.28s. For reference, the CMake step itself takes 4.83s, so a fully from-scratch CMake+ninja build actually takes about 8s, which is what you would normally pay if you were using this project as a dependency.
kbuild handles this on top of Make by having each target depend on a dummy file that gets updated when e.g. the CFLAGS change. It also treats Make a lot more like Ninja (e.g. avoiding putting the entire build graph into every Make process) -- I'd be interested to see how it compares.
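Roughly, the dummy-file trick looks like this (a simplified sketch in plain GNU Make, not kbuild's actual implementation, which uses the more elaborate if_changed machinery; cflags.stamp is a name I made up):

    # Rewrite the stamp only when CFLAGS actually changes. FORCE makes the
    # recipe run on every invocation, but the stamp's timestamp only moves
    # when the contents differ, so objects rebuild only on a real change.
    cflags.stamp: FORCE
            @echo '$(CFLAGS)' | cmp -s - $@ || echo '$(CFLAGS)' > $@

    %.o: %.c cflags.stamp
            $(CC) $(CFLAGS) -c -o $@ $<

    FORCE: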
Maybe models will get better at picking up relevant information from a large context, but AFAIK that is not the case today.
The attention mechanism that transformers use to find information in the context is, in its simplest form, O(n^2); for each token position, the model considers whether relevant information has been produced at the position of every other token.
To preserve performance when really long contexts are used, current-generation LLMs use various tricks to consider fewer positions: for example, they might attend only to the 4096 positions judged "most likely" to matter (de-emphasizing a large number of "subtle hints" that something isn't correct), or they might combine multiple tokens' worth of information into a single value (losing some fine detail).
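To make the O(n^2) concrete, here's a naive C sketch (attention_scores is my name; real implementations are batched and vectorized): computing the raw attention scores requires an n-by-n matrix of query-key dot products, so doubling the context length quadruples the work.

    #include <math.h>
    #include <stddef.h>

    /* q, k: n x d row-major matrices; scores: n x n output.
       Every query position is scored against every key position. */
    void attention_scores(const float *q, const float *k, float *scores,
                          size_t n, size_t d)
    {
        for (size_t i = 0; i < n; i++)          /* each token position...  */
            for (size_t j = 0; j < n; j++) {    /* ...vs. every other: n*n */
                float dot = 0.0f;
                for (size_t t = 0; t < d; t++)
                    dot += q[i * d + t] * k[j * d + t];
                scores[i * n + j] = dot / sqrtf((float)d);
            }
    }

The sparse and compressed variants mentioned above effectively shrink the inner j loop, which is where both the speedup and the information loss come from.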
This isn't actually right, is it? APX hasn't been released, to my knowledge.
Again, with Hashcash, this isn't how it works: most outbound spam messages are worthless. The point of the system is to exploit the negative exponent on the attacker's value function: if each message is worth, say, a hundredth of a cent in expected revenue, even a tiny per-message proof-of-work cost swamps the return, while a legitimate sender who sends a few dozen messages a day barely notices it.
The human-labor cost of working around Anubis is unlikely to be paid unless it gates enough data to be worth dedicating time to, and in those cases the data can typically be obtained "respectfully" -- instead of hitting the git blame route for every file of every commit of every repo, just clone the repos and run git blame locally, etc.