Dear Doc:
I keep reading that the New York Times, book publishers, authors, musicians, and others are bringing lawsuits against companies that develop artificial intelligence software, claiming that the AI companies are infringing copyright by using published materials to “train” AI. What’s going on?
Signed,
Genuinely Intelligent
Dear GI:
The authors and publishers object not only to the use of their creative work to train AI, but also to the possibility that, once trained, the AI systems may be capable of producing materials that substitute in some way for these same materials. It is now a simple matter to prompt an AI chatbot to write a poem in the style of a Shakespearean sonnet, or a lyric worthy of Taylor Swift.
AI can create a painting in the style of Van Gogh, write music that sounds like Bach (or Berry), and even create entire movies in distinctive genre-specific styles, using cinematography, lighting, and many other characteristics that are recognizably taken from artists’ work that was used in training. None of this is surprising, because the underlying technology of machine learning simply encodes the statistical patterns present in the training data and recreates those patterns in response to new prompts.
The Doc likens this process to giving a human student a pile of books to read, all in a particular genre, and then asking the human to write a new book in the same style. Reading is learning, and the recognized signatures of the genre are just patterns that repeat more often in this pile of books than in piles of other styles. It is possible that the student’s new book comes too close to one or more of the originals, but our copyright system protects particular expression, not mere style.
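For readers curious what “encoding statistical patterns and recreating them” can mean in the simplest case, here is a toy sketch. This is not how modern AI systems actually work (they use large neural networks, not word-pair counts), but the illustrative idea is the same: the program reads a sample text, records which word tends to follow which, and then generates new text by replaying those learned patterns. The corpus and function names here are the Doc’s hypothetical example, not anything from the lawsuits.

```python
import random
from collections import defaultdict

def train_bigram_model(text):
    """Record which word follows which: a crude statistical model of style."""
    words = text.split()
    model = defaultdict(list)
    for a, b in zip(words, words[1:]):
        model[a].append(b)
    return model

def generate(model, start, length=8, seed=0):
    """Recreate the learned patterns by sampling a likely next word."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = model.get(out[-1])
        if not followers:          # no learned continuation; stop
            break
        out.append(rng.choice(followers))
    return " ".join(out)

# A (very short) training pile, Shakespearean in flavor:
corpus = ("shall i compare thee to a summers day "
          "thou art more lovely and more temperate")
model = train_bigram_model(corpus)
print(generate(model, "shall"))  # prints "shall i compare thee to a summers day"
```

With a corpus this tiny, the “new” text is nearly a verbatim copy, which neatly illustrates the plaintiffs’ worry; with a vast and varied training pile, the output instead reflects the general style of the genre rather than any one work.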
In a pair of recent cases, Bartz v. Anthropic and Kadrey v. Meta, judges in federal courts in California considered whether training AI is “fair use” — that is, does feeding content into an AI engine and allowing it to create a statistical model of the content (which does require making a temporary digital copy of it) “transform” the content into something new and different, and thus qualify as permitted use? In both cases, the courts concluded that if the AI company acquires the training materials legitimately, then using them to train AI is not copyright infringement. It is “fair use.”
The courts also addressed the situation in which the AI companies used pirated libraries of content. There, the courts did not need to reach the fair use question. The piracy itself was an infringement, and the companies quickly offered a multi-billion-dollar settlement to the authors and publishers.
For all the settlements and trials, the Doc can assure his readers that the issue of fair use in AI training is not settled, and that we will see lawsuits and appeals for years to come. Still, the training of AI large language models on massive quantities of text (in many languages), graphics, audio of all types, and video will accelerate, and the quality of the output will only get better. It seems unstoppable, and, as always, the legal system will be late to the dance, offering only to move dollars around, rather than providing proactive policy guidance that may help to avoid costly and lengthy disputes.
Are you training an AI? Is your copyrighted material being used to train an AI? Have you ever used an AI? The attorneys at LW&H are on top of all this artificial intelligence, and they do it by being actually intelligent. Give them a call.
Until next month,
The “Doc”
(Not written using any AI at all!)
–Lawrence A. Husick, Esq.