The researchers report that, of 1,800 candidate snippets generated by GPT-2, more than 600 turned out to be memorized from the training data. The examples covered a range of content including news headlines, log messages, JavaScript code, personally identifiable information, and more. Many appeared only infrequently in the training dataset, but the model learned them anyway, perhaps because the originating documents contained multiple instances of the examples.
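
As an illustration only, the sketch below samples text from the publicly released GPT-2 checkpoint with the Hugging Face transformers library and flags candidates that overlap verbatim with a reference corpus. The prompt, the sampling settings, and the naive substring check are assumptions made for illustration; the paper's actual sampling and filtering pipeline is not described in this excerpt.

```python
# Hypothetical sketch: sample from GPT-2 and flag possible memorized snippets
# by checking for verbatim overlap with a small reference corpus.
# The prompt, sampling parameters, and substring heuristic are illustrative
# assumptions, not the researchers' actual pipeline.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Placeholder for documents known or suspected to be in the training data.
reference_corpus = [
    "Example document text that stands in for web-scraped training data.",
]

inputs = tokenizer("The", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,      # sample rather than greedy-decode
    top_k=40,
    max_length=64,
    num_return_sequences=5,
    pad_token_id=tokenizer.eos_token_id,
)

WINDOW = 50  # flag any 50-character span that appears verbatim in a reference doc
for seq in outputs:
    text = tokenizer.decode(seq, skip_special_tokens=True)
    memorized = any(
        text[i:i + WINDOW] in doc
        for doc in reference_corpus
        for i in range(max(1, len(text) - WINDOW))
    )
    print(memorized, text[:80])
```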

The coauthors also found that larger language models memorize training data more readily than smaller models. For example, in one experiment, they report that GPT-2 XL, which contains 1.5 billion parameters (the variables internal to the model that influence its predictions), memorizes 10 times more information than the 124-million-parameter GPT-2.
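
One rough way to probe this size effect, sketched below, is to score the same candidate snippet under two GPT-2 sizes and compare per-token losses: a much lower loss under the larger model is a hint (not proof) that the snippet was memorized. The candidate string and the loss-comparison heuristic are assumptions for illustration, not the experiment's actual protocol.

```python
# Hypothetical sketch: compare how strongly two GPT-2 sizes "prefer" a candidate
# snippet by measuring the average per-token cross-entropy loss it receives.
# Lower loss means the model assigns the text higher likelihood.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

def per_token_loss(model_name: str, text: str) -> float:
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    model.eval()
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the average
        # cross-entropy loss over the sequence.
        loss = model(ids, labels=ids).loss
    return loss.item()

candidate = "A snippet suspected to appear verbatim in the training data."
for name in ("gpt2", "gpt2-xl"):  # 124M parameters vs. 1.5B parameters
    print(f"{name}: loss = {per_token_loss(name, candidate):.3f}")
```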

While it’s beyond the scope of the work, this second finding has implications for even larger models like the 175-billion-parameter GPT-3, which is publicly accessible via an API. Microsoft’s Turing Natural Language Generation model, which powers a number of services on Azure, contains 17 billion parameters, and Facebook uses a translation model with over 12 billion parameters.
