
Clearing Up Misconceptions about Copyright

April 24, 2025
Paul J. Lucas


Introduction

Copyright is often misunderstood, especially when it comes to software in general and training large language models (LLMs) specifically. I keep seeing comments from those who misunderstand copyright, so I’m going to explain it. I’ll also include actual comments from others who misunderstand copyright and refute them.

Disclaimer: I am not a lawyer (IANAL), but I at least know enough about copyright. If you disagree with any of what follows, then, unless you are actually a lawyer specializing in copyright law, your opinion is no more valid than mine. If you are actually a lawyer specializing in copyright law, I’d appreciate any corrections.

What is Copyright?

Quite simply, a copyright is literally the right to copy a work — it’s right there in the name. Whoever holds a copyright on a work has the exclusive right to make copies of it. Anyone else who makes (unauthorized) copies is committing copyright infringement. (A copyright holder can license another entity, such as a publisher, to make authorized copies on the holder’s behalf.) A copyright holder can also license anyone to make copies provided they adhere to certain conditions.

Public Domain

In contrast, when a work is in the public domain, then anybody is free to make copies of the work, modify it, redistribute it, and pretty much do whatever they want with it.

Myth 1: “It’s on the Internet, so it’s in the Public Domain”

False. Being on the Internet in no way puts a work into the Public Domain. This remains true even if the copyright holder makes the work available for free. This is how all open-source software works: the code is free to obtain, but it remains copyrighted and is made available under a license.

There are many copyrighted books whose PDFs have been uploaded to the Internet. Those are unauthorized copies. The original uploader and all downloaders have committed copyright infringement.

Myth 2: “I can do whatever I want due to Fair Use”

Copyright law includes a “fair use” exception. The intent is to allow limited uses such as a book reviewer quoting paragraphs, a movie critic showing clips, or a reporter covering breaking news, among other things. In general, whether a use is fair use isn’t clear-cut and depends on four factors: the purpose and character of the use, the nature of the copyrighted work, the amount of the work used, and the effect of the use on the market for the original.

“Copyright law provides an automatic fair use exception for reading a book, looking at a website, playing a movie, etc.”

None of those activities have anything to do with fair use. If you buy a book, you’ve obtained an authorized copy. What you do with the book after that is irrelevant. You can read it (or not), put it on a shelf, put it under the short leg of your table, lend it to a friend, sell it, etc. Again, copyright is about the copy, not what you do with the copy.

Assuming the content on a website is an authorized copy to begin with, whether you look at the website or not is irrelevant. Assuming you have an authorized copy of a movie to begin with, the copyright holder has granted you a license to watch it in your own home. Typically, you are not licensed to show the movie publicly. If you do show it publicly, then you are violating the terms of the license, but not committing copyright infringement. Again, copyright is about the copy. You own the physical copy, but do not hold the copyright; you are licensed to watch it.

So false. You can’t just do whatever you want and claim fair use.

Myth 3: “But I’m not making any money from the copies”

While making money from unauthorized copies certainly makes you more likely to be found guilty of copyright infringement, not making money doesn’t automatically make you innocent. Again, fair use has four factors to consider, only one of which is whether you are making any money.

Training Large Language Models (LLMs)

In recent years, copyright has gotten more attention due to “AI” companies using copyrighted works to train their LLMs. The companies are crawling the Internet and feeding copies of whatever they find into their LLMs, ignoring copyright. There are three parts to this:

  1. Whether the copy of the work on which the LLM was trained was an authorized copy. In the case of books they find on the Internet, those are most likely unauthorized copies and so they’ve committed copyright infringement.

    “Even if they learn from a pirated source, the knowledge is legit and free to use ....”

    False. The knowledge is irrelevant. Just making the copy means they’ve committed copyright infringement — again, copyright is about the copy. The fact that they’re using the copy to train their LLM is irrelevant (just like what you do with a book after you buy it is irrelevant).

    In the case of websites that anyone can read, the copyright holders are asserting that training on their content is an unlicensed use (just like you showing a movie publicly is an unlicensed use). Remember that a copyright and a license are two different things.

  2. Whether the LLMs can be induced to reproduce verbatim copies of the original work, i.e., “regurgitate.” This creates another copy of the original work and so the company is committing copyright infringement. This is part of the New York Times’ lawsuit against OpenAI.

  3. Whether the LLMs can be induced to create works “in the style of” another artist. Generally, “style” is not copyrightable: only specific works are. However, if an LLM can be induced to create a work “sufficiently similar” to an existing work, then it may be considered a derivative work that is also protected by the copyright of the original artist. Alleged copyright violations involving style or derivative works have to be considered on a case-by-case basis.

    Individual characters, if they possess distinctive traits, e.g., Sherlock Holmes or Spider-Man, can be copyrighted independent of works in which they appear. So even if an LLM creates a new work in a novel style, it may still have committed copyright infringement if the work depicts a copyrighted character. (This is no different from any human creating fan fiction or art. It’s up to the copyright holder whether they want to enforce their copyright.)

Epilogue

I hope that clears some things up about copyright.

Top comments (2)

david duymelinck

The big problem with AI is that the law is lacking. Even big companies can't go after every violation because of the case-by-case process.

I think in the future big AI companies are going to provide easy training tools, so they are not responsible for the content that can be generated.

Nevo David

Yep, I always get confused by copyright rules. Felt like people online just wing it sometimes lol
