Discover more from Tech Talk CTO
Resolving the AI Copyright Debate through Open Source Licensing: Classifying AI as Derivative Works
A proposed solution for all parties ...
If we don't sort out the legal issues surrounding AI-generated content properly, it could have serious implications for the future of thousands of jobs.
As such, I believe, It's better to address these issues now rather than later, as the potential damage grows with the increased use of AI across all industries.
Throughout this article, you will see the term "works" frequently used to refer to any content to which copyright can be assigned, including images, artworks, and written text. To avoid confusion, the user of the AI model will be referred to as the "human operator", this is separate from the “creator” of the AI model itself.
For the purposes of this article, we will primarily be discussing the legal issues surrounding AI art generators, as they are currently at the forefront of this debate. However, the proposed solution discussed here is applicable to all AI models, including text-based ones.
A crash course on the AI copyright debate, and a disclaimer
To oversimplify an extremely complicated topic that could fill multiple legal essays on its own, the current legal status of AI-generated content is undecided, with lawyers and members of the public advocating for either side.
One of the high-profile cases (which is not finalized) is how the "US Copyright Office" has been going back and forth on the copyright status of an AI-generated comic (link here)
At its core, the AI copyright debate revolves around three major questions:
Is it permissible (possibly under the concept of fair use) to use copyrighted materials to train an AI?
Can AI-generated works be copyrighted?
If so, can an AI model be co-assigned copyright of a generated work (e.g. an artwork) along with its "human operator"?
In general, these are the major arguments for AI.
Proposition 1: AI learns and creates (words, music, art, etc) like a human
Proposition 2: AI is just a tool
With the following 3 major opposition
Opposition 1: Only "real" humans are eligible for copyright
Opposition 2: Fair use is not automatic and is evaluated on a case-by-case basis
Opposition 3: Lack of consent, from data sources used in the dataset
There are many additional arguments for and against AI that have been made by both sides, but they are not included here because they are either not as significant as the key points or they are based on ethical considerations rather than existing legal codes.
For example, one notable argument that was left out: is the "for progress" proposition. On how AI should be allowed to learn from copyright materials to encourage progress in society, and that resisting such progress is harmful to society. This is an argument based on ethical and societal considerations and not an existing legal code.
As I am skimming the details of the debate, if you want a more in-depth read, you can consider theverge article - The scary truth about AI copyright is nobody knows what will happen next
Finally, to state the obvious, I am not your lawyer, and the following is not legal advice.
Also, copyright laws differ drastically from country to country (Japan, for example, does not fully recognize fair use).
Now with all of that, out of the way ...
A Proposal to Classify AI-Generated Content and AI Models as Derivative Works with Attribution*
*with various limitations depending on the process used in each step
More specifically, the following sequence is proposed:
Dataset as a collection of work, from its individual data sources*
AI Model as derivative work, from the dataset, together with the AI model creator
If an AI model is used to train another AI model (instead of a dataset), it is classified as a derivative of the base model, its training data, and its trainer model.
AI Output as derivative work, from the AI model, together with the "human operator" who provided a "prompt/instruction"*
This proposal allows us to divide the copyright problem into distinct steps and rely on existing legal frameworks and case laws, particularly the legal framework revolving around "Collaborative authorship" of open-source code. It also helps to answer the three questions distinctively:
In this framework, it would also allow answering the 3 questions distinctively.
Is it permissible, under fair use, to use copyrighted materials to train an AI?
No, by default, unless it allows derivative work*
Can generated works, be copyrighted?
If so, can an AI model be co-assigned copyright of a generated work (ie. artworks), together with its human author (user of the AI)?
Kind of. It would be attributed to*
This proposal avoids being stuck in the debate of whether an AI performs the creation of work "like a living being."
It also helps to create a clear path for all parties involved in a fair way:
It gives them control over their own unique brand name, and art style, allowing them to commercialize it where possible
It does not stop AI from being used to generate artwork in known older art styles
For AI Model Creators
It provides a legal framework for them to create and commercialize AI models safely
For Human Operators
It allows them to generate AI artwork
It allows them to potentially commercialize such artworks for their own personal or commercial uses.
Understanding permissive license (usually with attribution clause)
One major factor that is often overlooked in the debate is how copyright is not binary, and how permissive licenses lie between full restrictive copyright and no copyright.
This proposal intends to take into account how permissive licenses are treated in the process and intends to use such licenses as a means to navigate the legal debate.
Because permissive licenses have been a hallmark in the open-source copyright movement for software coding (e.g. MIT license), this will help lay the groundwork for the proposed approach to existing case laws. (The list of open-source licenses can be found here).
Outside of the software industry, the Creative Commons is its most prominent variant, with over 2 billion works tracked in circulation.
Broadly speaking, permissive licenses are used within three major categories:
1. Content that cannot be used: Share Alike, and No-Derivatives
No Derivatives is as obvious as it gets, No-Derivatives
Share Alike, or CopyLeft licenses, require the existing license to be copied over into the license of the derivative work. While technically this is possible within smaller focused datasets, for practical purposes this should be avoided due to the logical challenge when merging multiple "CopyLeft" licenses.
BY-NC-SA, for software code, an example would be the
It is worth reminding the tech bros that no consent (to make derivatives) means no consent.
2. Content that can be used: Typically with attribution
Derivative and distribution are allowed with attribution
BY, for software, an example would be the
Derivative and distribution are allowed even without attribution
Works in Public Domain, or licensed under Creative Commons Zero
3. Content that can be used: For non-commercial use cases
Distribution and derivative with attribution, limited to non-commercial use.
Placing an early prediction, there will be a derivative of the Creative Commons license which specifically prohibits the usage with a dataset or AI training.
BY-NADD(No AI nor dataset derivatives)
Important Note: Limitations MUST be carried forward
Permissive licenses are typically written in a way that enforces derivative works, and any collections including their work, or its distribution, to have the same limitation (attribution, non-commercial requirements).
Measures must be taken to ensure the same limitations apply to the dataset, AI model, and their generated content.
Failure to do so would break the current proposal, due to potential loopholes that may allow commercial companies to fund the creation of public datasets under "fair use" only to eventually use them commercially.
LAION.AI - was commissioned to build and open-source a "Non-Commercial" dataset
Stability.ai - built an AI model, based on the LAION.AI dataset, without the "Non-Commercial" restriction, and released it to the public
Dreambooth.ai, and countless other commercial AI art generators - used the model to build their paid commercial AI art generator.
The end result is, despite starting out in a way that "does not create a competing product," due to its open-source nature, it ends up as a tool, that can be used in commercial settings to replace specific living artists identified by their name, who had their images scrapped within the dataset without consent.
How does it work in practice? - when making AI models
By classifying, downstream datasets, and AI models, as derivative works of their upstream content, copyright can be respected across a chain.
This is similar to how open-source projects work in the software industry, allowing existing laws to work on each layer.
Dataset layer: Will be based on existing web-scrapping laws
AI Model layer: Because the final AI Model can be treated as "Code" (it literally can be exported as code), existing open-source laws can handle it. A clear legal path will clear the way for commercial adoption
AI Output layer: If we treat the AI model as code, with licensing restrictions that force attribution to its output, it falls into place with several existing code licensing models.
In all cases, attribution needs to be maintained up the chain.
For example, all AI models must attribute their datasets, which must in turn provide attribution for their sources.
Once this is established, there will be three major types of AI models:
Commercial use-friendly AI models, which can be used in commercial settings
Creative Commons `BY` or something similar
Non-Commercial use AI models, which inherited the various "No commercial use" restrictions from the original data
Creative Commons `BY-NC` or something similar
Privately licensed AI models, which will typically be based on a "Commercial use friendly AI model" and finetuned for specific licensed use cases
For right holders (ie. artists), who do not want any of their works to be used to train an AI model, or to be used by others as derivative work. They decide and stay in control, via the licensing terms.
For rights holders, who do allow derivative works, they get to decide if it's restricted to non-commercial uses (non-competing).
How does it work in practice? - when using AI models
Using the AI model is more nuanced. Depending on if the AI is adding IP value, or not, there are two major categories that need to be outlined.
AI as an Intellectual Property generator
The general rule of thumb is that the output would be the derivative of all three participants: the AI model, the human operator (via a prompt typically), and the input material (if any, for example, editing or redrawing an existing image).
This is because the final output would literally be a derivative of all three inputs.
If any of the three enforces a copyright clause (ie. attribution, no commercial use), so will the derivative output. If the AI model follows the Creative Commons model, and the "Human Operator" has full rights to the input, the "Human Operator" would have full rights to the output, with the attribution clause limitation (and the no-commercial clause for no-commercial use AI).
If it's just the "Human Operator" and the AI, the above structure still applies.
AI as a reduction / transformation tool
This is currently limited to the AI text model (as far as I know), where the AI is used in a way that does not add IP value. For example in "Short Summarization, Labeling, or Factual Extraction".
In this case, if the output has no additional content added and is simply the reduction of the input, or transformation (eg. translation), per the user instructions, and was done so in a way that is clearly mapped to the instruction - the AI has arguably not "added any IP or context".
As such, in this case, the AI model should be treated as if it was a tool/extension of the human operator. And the output copyright should be as if the "Human Operator" performed the said instructions manually. (Need legal feedback on this segment)
Will this stop AI from taking artist jobs?
Quite frankly, No, the bar has been raised for being an artist.
If the artist's primary job was commissioned to render artwork in a commonly known style, that is out of copyright (eg. Vincent van Gogh style) or is very common in the public domain.
However, if the artist were to make or remix a newer style, that is unique and distinctive to them.
This prevents their new style or likeness, from being copied. And allows them to commercialize and be rewarded for their style on their terms. With time, like all artworks - it enters the public domain.
Ultimately, allowing them to make a name and career for themselves, for their unique style.
Into this, strange new AI world
Alright, what's next?
As I am also neither a professional artist, lawyer, nor politician. All of this may just end up being a programmer screaming into the crowd.
However, if this approach makes sense to you, especially if you are an artist, do feedback, forward it to as many legal friends, and start organizing a movement towards this.
For any lawyers who wish to refine this to a legal paper, please do so, reach out to me, and provide credit.
The only paper I can find exploring this angle is "AI Derivatives: the Application to the Derivative Work Right to Literary and Artistic Productions of AI Machines"
At the end of the day, I do wish that we all steer in a direction where, both AI and artists, can work together hand in hand. Either with what's proposed, or any other alternative proposal that respects both sides.
~ Until next time 🖖 live long and prosper 🚀
Eugene Cheah @ tech-talk-cto.com
For more on how AI works by dreaming,
All Images used, with their proper attribution