Nearly a dozen copyright infringement cases were filed against AI companies in 2023, but one lawsuit in particular could have wide-ranging implications.
Since OpenAI's late 2022 release of ChatGPT-3.5, a chatbot that gives coherent answers to any random question, its underlying technology, generative AI, has revolutionized the world of technology. While the technology still makes mistakes in its responses, it has improved dramatically and, as the New York Times claims in its lawsuit, its large language models were trained on content from the bulk of the Internet.
“The most heavily weighted dataset in GPT-3, Common Crawl, is a ‘version of the Internet,’” and the New York Times is “the most representative proprietary source (and third overall after Wikipedia and the US Patent Documents Database),” the Times said in its initial complaint, which was filed in federal court in the Southern District of New York.
The complaint also asserts that the defendants' GenAI tools can “generate output that recites the Times's content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by dozens of examples.”
“The New York Times case has a host of merits,” said Robert Brauneis, a professor at George Washington University Law School and co-director of the school's intellectual property program. “The attorneys have relied on some previous lawsuits.” He said one strategic move on the part of the Times was to add Microsoft as a defendant. “The early lawsuits against OpenAI do not involve Microsoft,” Brauneis said.
Another element that makes this a stronger test case for copyright issues in the new world of artificial intelligence is the broad scope of copyrighted material that the Times claims has been infringed.
OpenAI has not yet filed a response to the lawsuit, but in a blog post earlier this month, the company said it disagrees with the allegations in the New York Times lawsuit, and that the media giant is not telling the whole story. OpenAI has said that training its large language models using “publicly available Internet material” is considered fair use, but it also provides an opt-out process for publishers, which it said the Times adopted in August 2023.
John Hasnas, a professor and director of the Institute for the Study of Markets and Ethics at Georgetown University, said providing an opt-out feature in the training “is unlikely to have any impact because there will be a lot of copies of Times articles posted online by others that the AI will still encounter.”
Another notable element of the Times case compared to some other lawsuits is the output from ChatGPT mentioned in the lawsuit.
“The New York Times has done a very comprehensive job of identifying not only the large volume of its content used to train AI systems… but also some good examples of the output of these AI systems that are very similar to the content of The New York Times,” said Jason Bloom, a partner and head of the intellectual property practice at the law firm Haynes and Boone.
The Times is also prepared to spend on lawyers for its lawsuit. Unlike some class actions brought against OpenAI by authors and artists, the New York Times case doesn't need to rely on the hope of getting an award in legal fees or a portion of the damages in the end, said Brauneis, the George Washington University professor. “This makes it a little bit cleaner,” he added.
Naturally, as a member of the press, this columnist, and likely other journalists and content creators, is hoping the media giant pulls off a major victory.
However, the issue is not that clear. For example, there is a somewhat similar case against Alphabet Inc.'s Google. That case, brought by the Authors Guild in 2005, revolved around Google's scanning of books for use in the Google Books service, and it did not succeed in the courts. An appeals court ruled that Google's unauthorized scanning of copyrighted books was transformative, with limited public display of the text, a key indicator of fair use under the statute, which allows limited use of copyrighted material without permission for purposes such as news reporting, criticism, education and research. In 2016, the US Supreme Court allowed the appeals court ruling in the Google Books case to stand.
In another case, brought by three artists on behalf of a class against Stability AI and two other companies, Midjourney Inc. and DeviantArt Inc., several of the claims were thrown out. The presiding judge significantly limited the suit and granted most of the motions to dismiss, including a finding that two of the plaintiffs had not even registered copyrights for the works they were seeking to defend.
“These are really interesting lawsuits,” said Hasnas, the Georgetown University professor. “In a new case like this, it's all just guesswork.” He said an important factor in the Times case will be how the technology actually works and whether the models are truly absorbing and learning or are publishing almost verbatim versions of the stories, which he believes would be copyright infringement. “But I still don't see how the Times can make its case unless OpenAI reproduces the content of the Times articles,” he added.
OpenAI recently told Bloomberg that it is in talks with other media organizations to license their content. Those talks include CNN, Fox and Time, and come after its talks with The New York Times collapsed. The Times said in its lawsuit that it had been in discussions with OpenAI for several months to reach a negotiated agreement that would allow its content to be used in new digital products.
In Davos this week, OpenAI CEO Sam Altman told Bloomberg that OpenAI did not want to train its systems on New York Times data. But that is unlikely to quell ongoing legal efforts.
Brauneis, of George Washington University, said Altman's comments should be taken with caution. “There are some conditions under which OpenAI would be very happy to license [the New York Times content], even with the current legal uncertainty about whether a license is necessary,” Brauneis said.
Bloom, of Haynes and Boone, added: “Although OpenAI's agreement not to use NYT data going forward may resolve some issues, it will not necessarily resolve NYT claims about past use and whatever impact that may have had on current and future operations of the model.”