Turns out Anthropic literally purchased and scanned millions of print books to train its language model Claude.
The judge finds that the scanning of these purchased books is “fair use” even without licensing, and goes on to say:
“Everyone reads texts, too, then writes new texts. They may need to pay for getting their hands on a text in the first instance. But to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways would be unthinkable. For centuries, we have read and re-read books. We have admired, memorized, and internalized their sweeping themes, their substantive points, and their stylistic solutions to recurring writing problems.”
This passage is quite interesting, as it draws an analogy between how humans read, comprehend, and write text and how a language model operates.