Наталя Хандусенко AI Eng 16 September 2025, 13:23

Google unveils VaultGemma, its first LLM that preserves data privacy

Companies trying to build larger AI models are running short of high-quality training data. As they scour the web for new material, they are more likely to sweep up potentially sensitive user data. A Google Research team is exploring new methods to reduce the likelihood that large language models (LLMs) will “remember” this content.

The output of LLMs is non-deterministic, so you cannot know exactly what they will generate. Although answers vary, models sometimes repeat information they saw during training. If that data included personal information, this could violate users’ privacy. When copyrighted material ends up in the training data, its appearance in answers can cause problems for developers. Differential privacy reduces this risk by adding “noise” to the training process, writes Ars Technica.
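To make the idea of adding noise during training concrete, here is a minimal sketch of one gradient step in the style of DP-SGD, a common way to train with differential privacy. The article does not describe Google's exact procedure, so the function name `dp_sgd_step` and the parameters `clip_norm` and `noise_multiplier` are illustrative assumptions, not VaultGemma's implementation.

```python
import math
import random


def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng=None):
    """Return one noisy averaged gradient (illustrative, DP-SGD style).

    per_example_grads: list of gradient vectors, one per training example.
    clip_norm: maximum L2 norm any single example may contribute (assumed name).
    noise_multiplier: noise std relative to clip_norm (assumed name).
    """
    rng = rng or random.Random()
    dim = len(per_example_grads[0])
    total = [0.0] * dim

    for grad in per_example_grads:
        # Bound each example's influence by clipping its gradient norm.
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            total[i] += g * scale

    # Add Gaussian noise scaled to the clipping bound, then average.
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


if __name__ == "__main__":
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.1,
                      rng=random.Random(0)))
```

Clipping limits how much any single training example can move the model, and the noise masks what remains of that example's contribution. That combination is what makes it harder for the model to reproduce an individual record.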

Adding differential privacy to a model has drawbacks: it reduces accuracy and raises computational requirements. Until now, no one had tried to work out how much this changes the scaling laws of AI models. The Google Research team started from the assumption that a model’s performance would depend mainly on the ratio between the amount of noise added and the amount of original training data.

By experimenting with different model sizes and ratios of “noise” to data volume, the team established basic scaling laws for differential privacy, which depend on three factors: the compute budget, the privacy budget, and the data budget. In short, more noise means lower-quality results unless more compute or data is used to offset it. The paper describes these laws, which should help developers find the right balance when making models more private.
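A back-of-the-envelope illustration of why more data per step can offset more noise: assuming Gaussian noise with standard deviation `noise_multiplier * clip_norm` is added to a sum of clipped per-example gradients before averaging, the noise left in the averaged gradient shrinks as the batch grows. This is a sketch under that assumption, with hypothetical names, and not the scaling-law formula from the Google paper.

```python
def noise_std_per_coordinate(noise_multiplier, clip_norm, batch_size):
    """Std of the Gaussian noise in each coordinate of the averaged gradient.

    Assumes noise with std noise_multiplier * clip_norm is added once to the
    summed, clipped gradients, and the sum is then divided by batch_size.
    """
    return noise_multiplier * clip_norm / batch_size


if __name__ == "__main__":
    # Same noise multiplier, growing batch: the effective noise drops.
    for batch_size in (256, 1024, 4096):
        print(batch_size, noise_std_per_coordinate(1.0, 1.0, batch_size))
```

Under this assumption, doubling the batch size halves the effective noise at a fixed noise multiplier, which matches the article's point that extra data or compute can compensate for stronger privacy.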

This work on differential privacy has led to a new open model from Google called VaultGemma. The model uses differential privacy to reduce the likelihood that it will “remember” training data, which could change the company’s approach to privacy in its future AI agents. For now, however, the model is experimental.

VaultGemma is based on the Gemma 2 base model, which is one generation behind Google’s latest open model. The team used the results of its experiments to train VaultGemma with the most favorable differential privacy settings. The model is not very large, with only 1 billion parameters. However, according to Google Research, VaultGemma performs comparably to models of the same size that have no privacy features.

The team hopes its research will help other companies use resources efficiently when building private AI models. This is unlikely to affect the largest models, for which performance matters most. However, the results suggest that differential privacy works better for smaller LLMs built for specific functions.

VaultGemma can be downloaded from Hugging Face and Kaggle. The model has open weights but is not open source. Google allows you to modify and distribute Gemma models, but only if you do not use them for illegal purposes and always include a copy of the Gemma license with your modified versions.

Text: Наталя Хандусенко Tags: llm, google, ai
