New research lends weight to allegations that OpenAI trained at least some of its AI models on copyrighted content.
OpenAI is facing suits brought by authors, programmers, and other rights holders who accuse the company of using their works, including books and code, without permission. OpenAI has long claimed a fair use defense, but the plaintiffs in these cases argue that US copyright law contains no carve-out for training data.
The study, co-authored by researchers at the University of Washington, the University of Copenhagen, and Stanford University, proposes a new method for identifying training data "memorized" by models behind an API, such as OpenAI's.
Models are prediction engines. Trained on large amounts of data, they learn patterns, which is how they are able to generate essays, images, and more. Most of their outputs aren't verbatim copies of the training data, but because of the way models "learn," some inevitably are. Image models have been found to regurgitate screenshots from movies they were trained on, and language models have been observed effectively plagiarizing news articles.
The study's method relies on what the co-authors call "high-surprisal" words: words that are statistically uncommon in the context of a larger body of work. For example, in a sentence like "Jack and I sat perfectly still with the radar humming," the word "radar" would be considered high-surprisal, because it is statistically less likely than words such as "engine" or "radio" to appear before "humming."
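The surprisal of a word is conventionally measured as the negative log-probability of that word given its context. A minimal sketch, using a made-up table of next-word counts (illustrative numbers only, not real corpus statistics or the study's actual data):

```python
import math

# Hypothetical counts of words observed before "humming" in some corpus.
# Illustrative values only.
context_counts = {"engine": 50, "radio": 30, "refrigerator": 15, "radar": 5}

def surprisal(word: str, counts: dict) -> float:
    """Surprisal in bits: -log2 P(word | context)."""
    total = sum(counts.values())
    return -math.log2(counts[word] / total)

for word in context_counts:
    print(f"{word}: {surprisal(word, context_counts):.2f} bits")
```

Under these toy counts, "radar" (about 4.32 bits) is far more surprising than "engine" (1 bit), which is what makes it a useful probe word: a model that fills it in correctly is unlikely to be guessing from general language statistics alone.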
The co-authors probed several OpenAI models, including GPT-4 and GPT-3.5, for signs of memorization by removing high-surprisal words from snippets of fiction books and New York Times pieces and having the models try to "guess" the masked words. If the models guessed correctly, it's likely they memorized the snippet during training, the co-authors concluded.
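The mask-and-guess probe described above can be sketched as a small loop. This is a simplified illustration, not the study's actual code: `guess_fn` is a hypothetical stand-in for a real model API call, and here a dummy function is supplied so the example runs offline.

```python
def probe_memorization(passage: str, high_surprisal_words: list, guess_fn) -> float:
    """Mask each high-surprisal word in turn and ask the model to fill the blank.

    Returns the fraction of words guessed exactly right; a high rate
    suggests the passage was memorized during training.
    """
    correct = 0
    for word in high_surprisal_words:
        masked = passage.replace(word, "[MASK]", 1)
        if guess_fn(masked).strip().lower() == word.lower():
            correct += 1
    return correct / len(high_surprisal_words)

# Dummy stand-in for a model API; always answers "radar".
demo_guess = lambda masked: "radar"

rate = probe_memorization(
    "Jack and I sat perfectly still with the radar humming",
    ["radar"],
    demo_guess,
)
print(rate)  # 1.0
```

The key design point is that only high-surprisal words are masked: a model could plausibly guess a low-surprisal word like "engine" from context alone, so correct guesses of statistically unlikely words are much stronger evidence of memorization.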

According to the results of the tests, GPT-4 showed signs of having memorized portions of popular fiction books, including books in a dataset of copyrighted ebook samples called BookMIA. The results also suggested that the model memorized portions of New York Times articles, albeit at a comparatively lower rate.
Abhilasha Ravichander, a doctoral student at the University of Washington and a co-author of the study, told TechCrunch that the findings shed light on the "contentious data" models may have been trained on.
"In order to have large language models that are trustworthy, we need to have models that we can probe and audit and examine scientifically," Ravichander said. "Our work aims to provide a tool to probe large language models, but there is a real need for greater data transparency in the whole ecosystem."
OpenAI has long advocated for looser restrictions on developing models using copyrighted data. While the company has struck certain content licensing deals and offers opt-out mechanisms that allow copyright owners to flag content they'd prefer it not use for training purposes, it has lobbied several governments to codify "fair use" rules around AI training approaches.