"On 2 May 2023, six months later, the regulation began to apply, and potential gatekeepers had two months to report to the Commission to be designated as gatekeepers. This process could take up to 45 days, and after being designated as gatekeepers they would have six months to come into compliance, by 6 March 2024 at the latest.[8][32] Since 7 March 2024, gatekeepers must comply with the DMA.[33]"
It's also stupid and massively wastes computation and traffic on both ends.
Then people might think "but then I'd have to pay money to offer a free API", but they're already paying for that, in a more expensive way, via the web interface anyway.
Legitimate scrapers, maybe. Everyone else does it to circumvent API limitations by posing as real traffic. APIs imply API keys, which can be traced and banned.
My mental model of LLMs as next-word predictors with a long context window suggests they don't. Are there any papers on this?
Prefixing the message with its length means the message is length-limited (by the capacity of the length field). That seems to be standard practice here.
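To make the trade-off concrete, here's a minimal sketch of length-prefixed framing (the field width, endianness, and function names are my own assumptions, not anything from the thread). A 4-byte big-endian length header caps each message at 2^32 - 1 bytes:

```python
import io
import struct

MAX_LEN = 2**32 - 1  # largest payload a 4-byte length field can describe

def frame(payload: bytes) -> bytes:
    """Prepend a 4-byte big-endian length header to the payload."""
    if len(payload) > MAX_LEN:
        raise ValueError("message exceeds length-field capacity")
    return struct.pack(">I", len(payload)) + payload

def unframe(stream: io.BufferedIOBase) -> bytes:
    """Read one length-prefixed message back out of a stream."""
    (length,) = struct.unpack(">I", stream.read(4))
    return stream.read(length)
```

The receiver knows exactly how many bytes to read before parsing, which is why this framing is so common; the cost is the hard upper bound on message size that the comment points out.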
The paper reports accuracy, i.e. (true positives + true negatives) / total examples. And it's actually 100% accurate, i.e. there are no false positives _or_ false negatives.
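For clarity, the accuracy formula works out like this (the 90/90 split below is a hypothetical illustration, not the paper's actual class balance):

```python
# Hypothetical confusion-matrix counts over 180 examples with a
# perfect classifier: no false positives, no false negatives.
tp, tn, fp, fn = 90, 90, 0, 0

# accuracy = (true positives + true negatives) / total examples
accuracy = (tp + tn) / (tp + tn + fp + fn)
```

With fp = fn = 0, the numerator equals the denominator, which is the only way to reach exactly 100%.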
But the big caveats are:
1. this was tested on only 180 examples, which is a very small dataset to draw conclusions from, and
2. this is obviously an adversarial space, so any classifier will be obsolete after the next training run.
I'm bearish on any attempt to distinguish real content from AI-generated content, in any medium: text, images, or anything else. This is an adversarial game, and the AIs can incorporate your fancy detection algorithm to fool you better. In the end these projects only end up making the AI models more realistic.
It's hard to imagine that a company like Microsoft, with so many billions in the bank, is so hard up for cash that it's effectively putting billboards up for sale on its primary product. I'm sure Jobs would have had some colorful things to say, if not outright gloated at the vindication.