Impact: Attackers can leak process memory, putting an estimated 300,000 Ollama servers worldwide at risk.
What is Ollama?
Ollama is an open-source solution for running LLMs on local machines and is highly popular among organizations as a self-hosted AI inference engine. It is mainly used to download, manage, and interact with models like Llama, Mistral, and others — all running locally on your own hardware.
Ollama has become a de facto standard for running open-source models locally: it has 170,000 stars on GitHub, over 100 million downloads on Docker Hub, and is widely adopted across enterprises.
Vulnerability details:
The bug, dubbed ‘Bleeding Llama’, affects the GGUF model loader, which accepts an attacker-supplied GGUF file declaring a tensor offset and size larger than the file’s actual length.
When processing the file, the parser reads past the allocated heap buffer, accessing memory that may contain sensitive information.
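The flaw class described above can be illustrated with a minimal sketch (hypothetical code, not Ollama’s actual loader, which is written in Go): a naive parser trusts the declared tensor size, while a defensive one validates it against the file length before reading.

```python
def read_tensor_unsafe(blob: bytes, offset: int, size: int) -> bytes:
    # Trusts the attacker-declared offset and size. In a C-style loader,
    # an out-of-bounds read like this returns adjacent heap memory;
    # Python slicing merely truncates, but the missing check is the same.
    return blob[offset:offset + size]


def read_tensor_safe(blob: bytes, offset: int, size: int) -> bytes:
    # Defensive variant: reject any declared range that falls outside
    # the file before touching the data.
    if offset < 0 or size < 0 or offset + size > len(blob):
        raise ValueError("declared tensor extends past end of file")
    return blob[offset:offset + size]
```

The fix pattern is the same in any language: never dereference attacker-controlled offsets and lengths without bounds-checking them against the actual buffer.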
Threat actors can exploit this vulnerability without any credentials: using only three API calls, they can extract the entire heap memory of the Ollama process.
This mechanism is what makes the vulnerability particularly dangerous: it doesn’t crash or interrupt the system, it just quietly turns it into a data leak.
| Vulnerability Name | CVE ID | Product Affected | Severity | Affected Component |
| --- | --- | --- | --- | --- |
| Bleeding Llama | CVE-2026-7482 | Ollama servers | Critical (CVSS 9.3) | GGUF model loader |
GGUF is a file format used to store large language models in a way that makes them efficient to load and run locally.
A GGUF file contains tensors — which are basically multi-dimensional arrays of numbers that represent the model’s learned parameters (weights). Think of tensors as the “brain” of the model — they store all the knowledge the model has learned during training.
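As a rough sketch of the idea (a simplified illustration, not the real GGUF binary layout): each tensor entry in the file carries metadata telling the loader where its weight bytes live and how many to read — exactly the fields ‘Bleeding Llama’ abuses. The tensor name and shape below are made-up examples.

```python
from dataclasses import dataclass


@dataclass
class TensorInfo:
    name: str               # e.g. "blk.0.attn_q.weight" (illustrative name)
    shape: tuple            # dimensions of the multi-dimensional array
    offset: int             # where the raw weight bytes start in the file
    n_bytes: int            # how many bytes of weight data the loader reads


# A benign file declares sizes that fit inside it; a malicious file
# simply declares an n_bytes far larger than the file itself, steering
# an unchecked loader into adjacent heap memory.
benign = TensorInfo("blk.0.attn_q.weight", (4096, 4096), 1024, 4096 * 4096 * 2)
malicious = TensorInfo("blk.0.attn_q.weight", (4096, 4096), 1024, 2**40)
```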
general.file_type — this tells you (shocking) the file type of the GGUF, which determines how the numbers inside the tensors are stored.
What does the leaked data contain?
The leaked data contains user prompts, system prompts from other models, and even environment variables from the machine running the Ollama server – all highly sensitive information, now exposed with just three API calls. An attacker can learn basically anything about the organization from your AI inference — API keys, proprietary code, customer contracts, and much more.
“The attacker leverages Ollama’s built-in model push feature to exfiltrate the resulting file – complete with stolen heap data – to an attacker-controlled server. The entire attack requires only three unauthenticated API calls,” as per Cyera.
Remediation:
As per Cyera, the risk is immense and every organization needs to mitigate immediately: update exposed Ollama instances to the latest patched release, and avoid exposing the Ollama API to untrusted networks.
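As a first triage step, teams can compare their deployed Ollama version against the patched release. A minimal sketch — note the fixed version is a parameter, not hard-coded, because the advisory does not name one; supply the release Ollama publishes:

```python
def is_patched(installed: str, fixed: str) -> bool:
    """Return True if `installed` is at or above the `fixed` release.

    Simple dotted-version comparison; `fixed` must come from the vendor
    advisory. Placeholder logic with no pre-release/suffix handling.
    """
    def parse(v: str) -> tuple:
        return tuple(int(p) for p in v.strip().lstrip("v").split("."))
    return parse(installed) >= parse(fixed)
```

The installed version can be read from `ollama --version` on the host, or from the server’s `/api/version` endpoint.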
Sources:
- https://www.cyera.com/research/bleeding-llama-critical-unauthenticated-memory-leak-in-ollama
- https://www.echo.ai/blog/cve-2026-7482-ollama-vulnerability