The Cause

Is it copyright infringement, or simply a defect in the system, when a generative artificial intelligence (AI) system produces output closely resembling its training data? That question sits at the heart of the lawsuit The New York Times recently filed against OpenAI, the maker of ChatGPT.

According to The New York Times, OpenAI trained its AI models on more content from the newspaper's website than from any other proprietary source except Wikipedia and a data set of US patent records. OpenAI, however, argues that the Times' complaint is "without merit" and that using copyrighted material for training constitutes "fair use."

What Is at Stake

Analysts believe the lawsuit could end in an out-of-court settlement, an award of damages, a dismissal, or some other outcome. Beyond any monetary remedies or injunctions (which may be provisional, stayed pending appeal, or take effect once an appeal fails), the consequences for society could be enormous.

First, the American legal system could set a significant precedent if the court rules in OpenAI's favor, holding that training AI systems on copyrighted content qualifies as fair use. Mike Cook, a senior lecturer at King's College, explained the stakes with a hypothetical scenario.

If you use AI to answer emails or summarize tasks for you, he said, ChatGPT is a means to an end rather than an end in itself. But if the only way to achieve that end is to exempt particular corporate entities from rules that apply to everyone else, that should alarm us.

The New York Times contends that such an exemption would threaten its economic model. The newspaper claims that if OpenAI continues training on protected content without restriction, the long-term consequences for it and other journalistic organizations whose work could be used to train AI systems could be disastrous.

The same applies to other industries where protected content is the source of profit, such as film, television, music, literature, and other print media. Conversely, OpenAI stated in documents submitted to the communications and digital committee of the United Kingdom's House of Lords that "training today's top AI models would be impossible without using copyrighted materials."

The AI company also noted that while it might make for an intriguing experiment, limiting training data to public-domain resources created more than a century ago would not produce AI systems that meet current needs.

Copyright Concerns And “Black Box” Systems

While OpenAI has taken steps to keep copyrighted content out of the output of ChatGPT and its other products, there are no technical guarantees that it won't appear. ChatGPT and other AI models are called "black box" systems because even their engineers cannot precisely explain how a given output was generated.

Because of this "black box" quality and the way leading language models like ChatGPT are trained, once training is complete there is no practical way to remove data belonging to The New York Times or any other copyright holder from the model. Given present technology and procedures, if ChatGPT were barred from using copyrighted content, OpenAI would have to start over from scratch.

Ultimately, that might be too costly and inefficient to be worthwhile. Accordingly, OpenAI has pledged to keep working to eliminate the regurgitation "bug" while expanding its partnerships with journalism and media organizations.

Possible Scenarios

The worst-case scenario for the artificial intelligence firm would be the loss of revenue potential from models trained on copyrighted materials. Such a ruling could also render generative products like ChatGPT unlawful for commercial release.

Conversely, the most unfavorable outcome for copyright holders would be a court decision permitting unrestricted use of copyrighted material to train AI systems. Such a ruling could allow AI businesses to freely distribute barely altered copyrighted material, while shifting legal liability to end users whenever their modifications fall short of the requirements the law sets to avoid copyright infringement.

By George Ward

George Ward is a crypto journalist and market analyst at Herald Sheets, known for his engaging articles on the latest digital currency trends. With a background in finance and journalism, he presents complex topics accessibly. George holds a degree in Business and Finance from the University of Cambridge.