By Anthony J. Biller, Partner
The Transparency and Responsibility for Artificial Intelligence Networks Act (TRAIN Act), introduced by Senator Welch on November 21, 2024, aims to create an administrative subpoena process for copyright owners to determine if their works were used in training AI models. The bill would allow copyright owners to request subpoenas from U.S. district courts, compelling AI model developers or deployers to disclose information about copyrighted works used in training their models.
This proposed legislation, while ostensibly aimed at protecting copyright holders, would likely prove detrimental for several reasons.
1. Undermining Fair Use: The use of copyrighted material for training AI models should be considered “fair use” under the Copyright Act. Fair use allows limited use of copyrighted material without permission from the copyright holder for purposes such as research, education, and innovation. AI training arguably falls under these categories, as it involves:
– Transformative use: AI training doesn’t reproduce works in their original form but uses them to create new, transformative outputs. The use is, quite literally, the education of an advanced, non-human computing intelligence.
– Non-competitive purpose: Training data doesn’t compete with or replace the original works in the market.
– Potential for public benefit: AI models can lead to significant advancements in various fields, benefiting society at large.
Because such training is a “fair use,” there is no public policy justification for directing U.S. district courts to issue subpoenas targeting lawful activity.
The justification of fair use for model training should not be conflated with fair use for model output, however. If an AI model produces or publishes content substantially similar to copyrighted content used in training, the model owners would and should be liable for copyright infringement, unless there is a legitimate defense of fair use independent of the fair use of educating the AI model.
2. Stifling Innovation: The bill would create an enormous administrative burden for U.S. AI developers, potentially slowing down research and development. This could lead to:
– Increased costs and time for AI development;
– Reluctance to use diverse training data, potentially reducing AI model quality and fairness; and
– A chilling effect on smaller companies and startups unable to bear the legal and administrative costs.
The ultimate purpose of copyright protection is to help encourage innovation and creativity by recognizing the rights of the content producer. The bill would allow copyright protection to be used in ways contrary to that purpose.
3. Global Competitive Disadvantage: The TRAIN Act would likely put the United States at a significant disadvantage in the global AI race:
– Other countries without such restrictions could develop AI more rapidly and efficiently.
– U.S. companies might relocate AI development to other countries with more favorable regulations.
– Foreign AI companies could gain a competitive edge, potentially dominating the global market.
4. Practical Challenges: The TRAIN Act also presents several practical challenges, both administratively and legally:
– Determining the exact copyrighted works used in training large AI models is often technically infeasible.
– The volume of potential subpoenas could overwhelm both the court system and AI companies.
– The rebuttable presumption of copying for non-compliance could lead to unfair legal outcomes.
5. Potential for Abuse: Copyright holders might use this process to harass AI companies or extract settlements, even in cases where fair use applies, which presumptively would be nearly all cases.
It warrants mentioning that effectively every person is a copyright “holder.” One does not need a federal copyright registration to “hold” a copyright. Every person who has ever posted a paragraph of content or an image they took to social media is a copyright owner who might “believe” their content was used as part of training an AI model.
All that a copyright owner would need to invoke the subpoena power is a “subjective good faith belief” that the developer or deployer “used some or all” of copyrighted works to train an AI model. This “empty head but honest heart” minimal standard is, in effect, no standard at all for safeguarding the invocation of federal subpoena power.
If the AI developer or deployer does not timely respond, the copyright holder would be entitled to a presumption of copying and the full remedial provisions of the rules of civil procedure, which include being found in contempt of court and sanctioned.
In short, the TRAIN Act creates an invitation for abuse.
6. Privacy and Trade Secret Concerns: Compelling companies to disclose training data could require them to reveal valuable trade secrets.
Datasets used for training can be a key differentiator for AI companies. Disclosing this information could reveal valuable insights about a company’s AI strategy, potentially eroding its competitive advantage. It would also compromise the developers’ ability to claim their training models as trade secrets.
*****
While the TRAIN Act aims to address legitimate concerns about copyright in the AI era, its approach is likely to cause more harm than good. It fails to recognize the transformative nature of AI training under fair use doctrine, potentially hampering U.S. innovation and competitiveness in this crucial field. Additionally, it presents several practical challenges and creates a legal environment ripe for abuse. A more balanced approach that considers both copyright protection and the unique challenges and opportunities of AI development would be more beneficial for all stakeholders and the United States’ position in the global AI landscape.