New York Times sues OpenAI, Microsoft for using articles to train AI

The New York Times sued OpenAI and Microsoft on Wednesday over the tech companies’ use of its copyrighted articles to train their artificial intelligence technology, joining a growing wave of opposition to the tech industry’s use of creative work without paying for it or getting permission.

OpenAI and Microsoft used “millions” of Times articles to help build their tech, which is now extremely lucrative and directly competes with the Times’s own services, the newspaper’s lawyers wrote in a complaint filed in federal court in Manhattan.

“For months, The Times has attempted to reach a negotiated agreement,” the Times’s lawyers said in the lawsuit. “These negotiations have not led to a resolution.”

Spokespeople for OpenAI and Microsoft did not immediately return requests for comment.

The “large language models” (LLMs) behind AI tools such as ChatGPT work by ingesting huge amounts of text scraped from the internet, learning the connections between words and concepts, and then developing the ability to predict what word to say next in a sentence, allowing them to mimic human speech and writing. OpenAI, Microsoft and Google have refused to reveal what goes into their newest models, but previous LLMs have been shown to include large amounts of content from news organizations and catalogues of books.
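The next-word prediction described above can be illustrated with a toy sketch. This is not how ChatGPT or any production LLM is built: real models use neural networks trained on vastly more text, while this example only counts which word most often follows another in a tiny made-up corpus. The function names and the sample text are hypothetical, chosen purely for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it in the text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical miniature "training set"; real systems ingest billions of words.
corpus = (
    "the court heard the case and the court ruled on the case "
    "before the court adjourned"
)
model = train_bigram_model(corpus)

print(predict_next(model, "the"))        # "court" follows "the" 3 times, "case" 2 times
print(predict_next(model, "adjourned"))  # None: nothing follows it in the corpus
```

Even this crude counter shows why training data matters: the model can only echo patterns present in the text it was given.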

The tech companies have steadfastly said that the use of information scraped from the internet to train their AI algorithms falls under “fair use” — a concept in copyright law that allows people to use the work of others if it is substantially changed.

The Times’s lawsuit, however, includes multiple examples of OpenAI’s GPT-4 AI model outputting New York Times articles word for word. A growing group of artists, authors, musicians, filmmakers and other creative professionals is also pushing back, saying that rich tech companies are using their content to build tools that in some ways are already undermining their work.

Legal experts have said that plaintiffs will have stronger cases of copyright infringement if they can show that AI tools are directly reproducing copyrighted works, rather than paraphrasing the information from them.

Some of these plaintiffs, including blockbuster writers such as George R.R. Martin, Jodi Picoult, Jonathan Franzen and George Saunders, have sued OpenAI. Since August, at least 583 news organizations, including the Times, The Washington Post and Reuters, have installed blockers on their websites to stop tech companies from scraping their articles. But it’s likely that their online catalogues, going back decades, have already been used to create AI tools.
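The article does not say how these blockers work. One common mechanism, assumed here, is a robots.txt file that asks specific crawlers to stay away; OpenAI's web crawler identifies itself as GPTBot. The sketch below uses Python's standard-library robots.txt parser to show how such a rule would be read. The rule text and the example.com URL are hypothetical, and robots.txt is only a request that well-behaved crawlers choose to honor.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content a publisher might serve.
robots_txt = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

article_url = "https://example.com/2023/12/27/story.html"  # placeholder URL

# The named crawler is asked not to fetch anything on the site.
gptbot_allowed = parser.can_fetch("GPTBot", article_url)

# A crawler with no matching rule is allowed by default.
other_allowed = parser.can_fetch("SomeOtherBot", article_url)

print(gptbot_allowed)
print(other_allowed)
```

A rule like this only affects future crawling. As the article notes, it does nothing about archives that may already have been collected.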

Meanwhile, OpenAI has been negotiating deals with news organizations over the past year to pay them for content. In July, it signed a deal with the Associated Press for access to its archive of news articles. But in October, a spokesperson for OpenAI said the company’s practices do not violate copyright laws, and that the new deals it was working on would only be to access content that it couldn’t get online or to show links or full sections of articles in ChatGPT.
