Knowing Is (Only) Half the Battle: The TRAIN Act and the Pursuit of Ethically Sourced AI

Footnotes for this article are available at the end of this page.

Key Takeaways

  • The Transparency and Responsibility for Artificial Intelligence Networks (“TRAIN”) Act would give copyright owners a new tool to determine whether AI developers used their works for model training, but the bill does not resolve whether AI training requires consent, compensation, or licensing.
  • Copyright litigation remains focused on fair use, market harm, and unlicensed copying. Courts continue to reach different conclusions about when AI training and AI-generated outputs may infringe protected works.
  • Market forces may shape AI licensing faster than legislation. Growing demand for transparent, licensed, and auditable training datasets could make “ethically sourced AI” a competitive advantage for developers and enterprise users.

Creators are desperate for transparency. For decades, unclaimed royalties have disappeared into black boxes, licensing deals have been negotiated behind closed doors, and creative works have been scraped at scale to train competing technologies. Copyright law, at its core, promises authors and artists the right to understand and influence how their works are used. Transparency is a necessary step toward that goal, but without accountability, it risks becoming little more than a status update on the erosion of their rights.

The Transparency and Responsibility for Artificial Intelligence Networks (“TRAIN”) Act tries to meet this moment. Introduced in January 2026 and currently sitting in committee, the bipartisan bill would bolt a new administrative subpoena mechanism onto the Copyright Act. Copyright owners could compel artificial intelligence (“AI”) companies to confirm or deny whether their works were used as training data without authorization. On paper, that is a meaningful disclosure tool, but in practice, evidence that a work has been used to train AI is not really the “gotcha” moment bill proponents have made it out to be.

To start, the bill assumes that AI companies will comply without providing much in the way of incentives. There are some “teeth,” but they are quite dull. If a subpoena is ignored, that noncompliance would create a rebuttable presumption in future litigation that the developer did in fact copy the work. For aggrieved artists, the promise of an evidentiary advantage rings hollow. They have their evidence, but evidence of what, exactly?

Michael Huppe of SoundExchange touted the TRAIN Act as “an important and necessary tool as [creators] fight to ensure their works are not exploited without the proper consent, credit, or compensation.” The problem is that there is no clear guidance to suggest that artists are even entitled to consent, credit, or compensation when it comes to AI training. At least, not yet.

Training Is Not Copying; It Is Only Evidence of Access

Disclosure in response to a subpoena may establish that a work was ingested, but ingestion is not yet synonymous with infringement. Copyright law was built to protect human authors from human infringers, and humans are constantly “training.” For example, a musician may “scrape” their parents’ record collection, study in a conservatory, or spend years absorbing influences through playlists and live shows. None of that learning triggers infringement, because no fixed copies are being made and stored in a way the law recognizes as unauthorized reproduction. Our brains encode information sequentially and imperfectly, filtered through memory, emotion, and context. The law judges us on our outputs, not our inputs.

AI companies lean heavily on that analogy. They argue that ingesting large quantities of text, images, music, and video is simply a way of internalizing patterns rather than copying in the legal sense. Some courts have been receptive. In Bartz v. Anthropic, Judge William Alsup treated large‑scale model training as “quintessentially” or “exceedingly” transformative, likening it to a reader who studies many books and later writes something new.

The training process, if treated as fair use, becomes largely insulated so long as the system’s outputs stay on the right side of the line. This is where the human/AI analogy starts to break down. AI systems are not just passively consuming works over a lifetime; they are engineered to ingest vast, meticulously assembled datasets — recordings, lyrics, stems, artwork, and text — often through scraping, bulk licensing, or both. They can then generate outputs with digital fidelity at scale and speed. In practice, we have already seen models used to generate soundalike tracks, synthetic vocals, and derivative works that dilute royalty pools and compete directly with the markets copyright is supposed to protect.

Courts are beginning to recognize that difference. Some opinions continue to focus on whether training itself is transformative. Others look more squarely at wholesale copying, the existence of licensing markets, and whether the AI system functions as a market substitute. Thomson Reuters Enterprise Centre GmbH v. ROSS Intelligence Inc. is a leading example on the restrictive side. Judge Stephanos Bibas rejected the defendant’s fair use defense and granted summary judgment to Thomson Reuters, finding that ROSS’s use of Westlaw content to build a competing legal research tool was not sufficiently transformative and caused market harm. The message is that when AI systems are built on unlicensed copying to replace an existing product or service, fair use is a harder sell.

The Global Context: One Catalog, Many Rules

The TRAIN Act also sits within a fractured global landscape. The U.S. currently approaches AI training questions primarily through case‑by‑case fair use analysis, without a dedicated statutory exception. Other jurisdictions lean more heavily on text‑and‑data‑mining rules that either permit or condition certain forms of training.

In the EU, Article 4 of the DSM Directive allows text and data mining for lawfully accessible content, but it gives rights holders the ability to reserve their rights in an “appropriate manner,” effectively creating an opt‑out regime for some uses. That structure has fueled ongoing debate: should generative AI training remain subject to opt‑outs, or move toward explicit consent or remuneration models? Japan is often cited as comparatively permissive in its text‑and‑data‑mining exceptions, yet policymakers there have begun to openly acknowledge that generative AI raises questions earlier frameworks did not anticipate.

For global music and entertainment companies, the implication is that AI‑related rights management will not be uniform. It also means that the same catalog may be treated as opt‑out in one territory, license‑required in another, and “maybe fair use” in a third.

Follow the Money: How Markets Could Outrun the TRAIN Act

If doctrine is unsettled and legislation like the TRAIN Act is underpowered, that does not mean AI companies are immune to pressure. Companies are exquisitely sensitive to what customers, investors, and markets reward, and there are some signs that the market prefers what is touted as “ethical AI.”

Surveys increasingly show that users are uneasy about how AI systems are trained and how data is used. Public-opinion research cited by AI Impacts found that 79% of U.S. respondents support a law requiring companies to be transparent about the data used to train AI.[1] A 2024 News/Media Alliance survey likewise reported that 72% of voters support placing limits on AI and express concern about the unauthorized use of copyrighted works, suggesting that users care not just about outputs, but about how those outputs are built. Enterprise buyers, regulators, and sophisticated customers are already moving in the same direction, favoring AI tools built on traceable, well-documented datasets whose provenance and licenses can be audited against copyright risk and emerging regulatory frameworks. That demand is reflected in the rapid growth of dataset licensing as a business: one market forecast projects that the global market for AI training datasets will grow from $2.68 billion in 2024 to $11.16 billion by 2030, while more specialized licensing segments for research and marketing are projected to grow at compound annual rates of 27.9% and 27.3%, respectively.[2] In that environment, licensing agreements offer not only legal cover but also a marketing narrative. Like farm-to-table diners or shoppers looking for free-range chicken eggs, consumers are demanding ethically sourced AI.

Viewed through that lens, the TRAIN Act’s disclosure mechanism, while weak as an enforcement tool, could still matter as an information tool. The more creators, consumers, and enterprise customers can see into training sets, the easier it becomes to differentiate between AI products built on licensed, consent‑based practices and those that are not. If public opinion continues to shift toward rewarding the former, transparency and licensing could become profit centers rather than compliance burdens. In other words, even if the TRAIN Act cannot, by itself, force AI developers to obtain licenses, it can help create a market in which voluntarily licensed and verifiable training data is the more profitable choice. Until lawmakers are willing to say what creators are actually owed, the path to ethically sourced AI may run less through subpoenas and more through what people demand — and are willing to pay for — when they decide which systems to adopt.

 

[1] See Overwhelming Majority of Voters Believe Tech Companies Should be Liable for Harm Caused by AI Models, Favor Reducing AI Proliferation and Law Requiring Political Ad Disclose Use of AI.

[2] See AI Training Dataset Services Market Size (2024-2030).