Impact: Attackers can leak process memory, putting an estimated 300,000 Ollama servers worldwide at risk.
What is Ollama?
Ollama is an open-source solution for running LLMs on local machines and is highly popular among organizations as a self-hosted AI inference engine. It is mainly used to download, manage, and interact with models like Llama, Mistral, and others — all running locally on your own hardware.
Ollama has become a de facto standard for running open-source models locally: it has 170,000 stars on GitHub, over 100 million downloads on Docker Hub, and is widely adopted across enterprises.
Vulnerability details:
The bug, dubbed ‘Bleeding Llama’, affects the GGUF model loader, which accepts an attacker-supplied GGUF file declaring a tensor offset and size larger than the file’s actual length.
When processing the file, the parser reads past the allocated heap buffer, accessing memory that may contain sensitive information.
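The flaw class described above can be illustrated with a minimal sketch (hypothetical code, not Ollama’s actual loader, which is written in Go): a naive parser trusts the declared tensor size, while a defensive one validates it against the file length before reading.

```python
def read_tensor_unsafe(blob: bytes, offset: int, size: int) -> bytes:
    # Trusts the attacker-declared offset and size. In a C-style loader,
    # an out-of-bounds read like this returns adjacent heap memory;
    # Python slicing merely truncates, but the missing check is the same.
    return blob[offset:offset + size]


def read_tensor_safe(blob: bytes, offset: int, size: int) -> bytes:
    # Defensive variant: reject any declared range that falls outside
    # the file before touching the data.
    if offset < 0 or size < 0 or offset + size > len(blob):
        raise ValueError("declared tensor extends past end of file")
    return blob[offset:offset + size]
```

The fix pattern is the same in any language: never dereference attacker-controlled offsets and lengths without bounds-checking them against the actual buffer.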
Threat actors can exploit this vulnerability without any credentials: using only three API calls, they can extract the entire heap memory of the Ollama process.
This mechanism is what makes the vulnerability particularly dangerous: it doesn’t crash or interrupt the system, it just quietly turns it into a data leak.
| Vulnerability Name | CVE ID | Product Affected | Severity | Affected Component |
| --- | --- | --- | --- | --- |
| Bleeding Llama | CVE-2026-7482 | Ollama servers | Critical (CVSS 9.3) | GGUF model loader |
GGUF is a file format used to store large language models in a way that makes them efficient to load and run locally.
A GGUF file contains tensors — which are basically multi-dimensional arrays of numbers that represent the model’s learned parameters (weights). Think of tensors as the “brain” of the model — they store all the knowledge the model has learned during training.
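As a rough sketch of the idea (a simplified illustration, not the real GGUF binary layout): each tensor entry in the file carries metadata telling the loader where its weight bytes live and how many to read — exactly the fields ‘Bleeding Llama’ abuses. The tensor name and shape below are made-up examples.

```python
from dataclasses import dataclass


@dataclass
class TensorInfo:
    name: str               # e.g. "blk.0.attn_q.weight" (illustrative name)
    shape: tuple            # dimensions of the multi-dimensional array
    offset: int             # where the raw weight bytes start in the file
    n_bytes: int            # how many bytes of weight data the loader reads


# A benign file declares sizes that fit inside it; a malicious file
# simply declares an n_bytes far larger than the file itself, steering
# an unchecked loader into adjacent heap memory.
benign = TensorInfo("blk.0.attn_q.weight", (4096, 4096), 1024, 4096 * 4096 * 2)
malicious = TensorInfo("blk.0.attn_q.weight", (4096, 4096), 1024, 2**40)
```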
general.file_type — this tells you (shocking) the file type of the GGUF, which determines how the numbers inside the tensors are stored.
What does the leaked data contain?
The leaked data contains user prompts, system prompts from other models, and even environment variables from the machine running the Ollama server – all highly sensitive information, now exposed with just three API calls. An attacker can learn basically anything about the organization from your AI inference — API keys, proprietary code, customer contracts, and much more.
“The attacker leverages Ollama’s built-in model push feature to exfiltrate the resulting file – complete with stolen heap data – to an attacker-controlled server. The entire attack requires only three unauthenticated API calls,” as per Cyera.
Remediation:
As per Cyera, the risk is immense and every organization needs to mitigate immediately: update exposed Ollama instances to the latest patched release, and avoid exposing the Ollama API to untrusted networks.
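As a first triage step, teams can compare their deployed Ollama version against the patched release. A minimal sketch — note the fixed version is a parameter, not hard-coded, because the advisory does not name one; supply the release Ollama publishes:

```python
def is_patched(installed: str, fixed: str) -> bool:
    """Return True if `installed` is at or above the `fixed` release.

    Simple dotted-version comparison; `fixed` must come from the vendor
    advisory. Placeholder logic with no pre-release/suffix handling.
    """
    def parse(v: str) -> tuple:
        return tuple(int(p) for p in v.strip().lstrip("v").split("."))
    return parse(installed) >= parse(fixed)
```

The installed version can be read from `ollama --version` on the host, or from the server’s `/api/version` endpoint.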
Sources:
- https://www.cyera.com/research/bleeding-llama-critical-unauthenticated-memory-leak-in-ollama
- https://www.echo.ai/blog/cve-2026-7482-ollama-vulnerability