LLM

Experimental AI Agent ‘ROME’ Breaks Free, Mines Crypto; AI Shaping Crypto’s Future role

AI agent being trained to perform real-world tasks with systems and imagine a world where robots not make money without any human intervention and have digital brains powered by artificial intelligence. The experimental Agent that broke out of system and started mining crypto with no permission set in.

What did Alibaba’s ROME Agent Executed?

  • First it probed internal systems
  • Opened a hidden external connection
  • Attempted to use its environment to mine crypto currency

The news came in light when researchers were experimenting a part of Agentic Learning Ecosystem, or ALE where AI was trained to complete total task step by step using real world tools in steps .

Alibaba’s Rome first began probing internal networks without instructions and established a reverse SSH tunnel from an Alibaba Cloud instance to an external IP address.

The 30-billion-parameter model then quietly diverted GPU capacity toward cryptocurrency mining. The task instructions contained no mention of tunneling or mining, still it went further and was caught by Alibaba’s managed firewall instead of research team where the agent conducted the following mentioned points.

Running commands in a terminal

Followed by editing files

Interacting with systems

The security system raised an alarm and monitoring tools flagged unusual activity on the infrastructure running the tests. The discovery was flagged as security-policy violations whose non typical outbound traffic kept coinciding with specific training episodes.

What was discovered to understand its an AI Agent’s Activity

Researchers then found there was an outbound connections to unknown external servers with repeated policy violations. The traffic patterns were similar to compromised machine’s and alerts kept showing up during separate test runs. After log’s were checked in it was ROME the AI agent.

The AI agent created a reverse SSH tunnel to an external server and allowed it to establish a hidden connection from inside the restricted environment to the outside which was further used to bypass Firewalls.

Experimental AI ROME is not an isolated case

ROME is part of Alibaba’s Agentic Learning Ecosystem(ALE)

A framework that trains large language models to work in real-world environments over multiple turns. The training ran reinforcement learning across more than one million trajectories.

ALE consists of three main parts:

Rock, a sandbox environment for testing an agent and validating its actions

Roll, a framework for optimizing agents with reinforcement learning after they’ve been trained

iFlow CLI, a framework to configure context and trajectories

The interesting part is ‘ROME’ the agentic AI, during optimization figured out a shortcut and that grabbing extra compute and holding onto network access helped it score higher on its training objective.

This incident occurred in Chinese cloud infrastructure, was documented in an English-language paper submitted to a US-hosted preprint server, and is being debated by a global audience. No cross-border framework exists for this category of event.

The results were detailed in research paper titled ‘Let it flow‘, where Agentic crafting on rock and roll, building the Rome model within an open agentic learning ecosystem’, though the breach was only mentioned briefly within the 36-page report.

AI as a more significant force shaping crypto’s future role

ROME is not an isolated cases where AI falls in same pattern to other AI instruments who could grab all the resource required for self defense as core strategies.

The case of Anthropic’s Claude Opus 4 that threatened to reveal personal information about an engineer to avoid being shut down. When Anthropic published research, it revealed 12% of reward-hacking models attempt research sabotage and 50% exhibit alignment faked out.

Robbie Mitchnick, BlackRock’s head of digital assets framed crypto less as a speculative asset and more as infrastructure for the AI economy, noting that bitcoin miners are pivoting toward AI-related computing and that bitcoin may act as a diversifier amid AI-driven disruption.

We can imagine if artificial intelligence system could take over the job of crypto miners and some day they look at the market, decide which coin is the best to mine. That day is not far and it doesn’t end with mining, it is about creating a new kind of digital life where AI thinks and earns.

What is the consequences when AI starts mining crypto for itself ?

A lot will happen as AI starts mining Crypto and it could change everything as autonomous agents won’t just follow order from you. They will be major part of futuristic AI based digital economy and might even teach other AI to conduct similar task.

Sources: BlackRock flags AI as crypto’s next big use case, not token boom

Sources: An experimental AI agent broke out of its testing environment and mined crypto without permission | Live Science

Evolving Phishing Scams & Cost Incurred by Organization’s in 2025

Any phishing scams that occur, the purpose is to trick unsuspecting victims or organizations into taking a specific action and that can range from clicking on malicious links, downloading harmful files or sharing login credentials. Sometimes the effectiveness of phishing attacks stems from their use of social engineering techniques that have the ability to exploit human psychology or behavior. In 2025 we have witnessed the how evolving phishing scams that have affected organizations financially.

Often we see phishing scams create a sense of urgency, or curiosity thereby prompting victims to act quickly without verifying the authenticity of incoming request. Now with evolving technology, phishing tactics are also evolving making these attacks increasingly sophisticated, hard to detect. In coming years we will witness how AI will power more phishing attacks, including text-based impersonations to deepfake communications. These will be more cheap and popular with threat actors.

Cyber security researchers found that there is a link between ransomware, malware and form encryption and most were caused by.

14% Malicious websites

54% Phishing

27% Poor user pactices / gullibility

26% Lack of cybersecurity training

A survey by Statista found that ransomware infections were caused by:

  • 54% Phishing
  • 27% Poor user pactices / gullibility
  • 26% Lack of cybersecurity training
  • 14% Malicious websites

In this blog we will highlight latest phishing statistics that emerged in 2025 ,affecting organizations and phishing scams are changing.

As per APWG report found on Unique phishing sites. This is a primary measure of reported phishing across the globe. This is determined by the unique bases of phishing URLs found in phishing emails reported to APWG’s repository.

In the first quarter of 2025, APWG observed 1,003,924 phishing attacks. This was the largest quarterly
total since 1.07 million were observed in Q4 2023. The number has climbed steadily over the last year:
from 877,536 in Q2 2024, to 932,923 in Q3, to 989,123 in Q4. One of the reason cited being advancement in AI is also making it easier for criminals to create convincing and personalized phishing lures.

Hoxhunt find alarming statistics on phishing related attack of 2025

Business email compromise (BEC)A staggering 64% of businesses report facing BEC attacks in 2024, with a typical financial loss averaging $150,000 per incident​. These phishing attacks frequently target employees with access to financial systems, mimicking executives or trusted contacts.
Credential phishingAround 80% of phishing campaigns aim to steal credentials, particularly targeting cloud-based services like Microsoft 365 and Google Workspace. With the growing reliance on cloud platforms, cyber attackers leverage realistic fake login pages to deceive users.
HTTPS phishingAn increasing number of phishing sites now use HTTPS to appear legitimate. In 2024, approximately 80% of phishing websites feature HTTPS, complicating detection for users.
Voice phishing (vishing)Vishing attacks are growing in prevalence, with 30% of organizations reporting instances where threat actors used fake calls to impersonate officials or executives.
Quishing (QR code phishing)QR code phishing attacks (quishing) increased by 25% year-over-year, as attackers exploit physical spaces like posters or fake business cards to lure victims.
AI-driven attacksAI is powering phishing attacks, with deepfake impersonations increasing by 15% in the last year. These attacks often target high-value individuals in finance and HR.
Multi-channel phishingAttackers are increasingly exploiting platforms like Slack, Teams, and social media. Around 40% of phishing campaigns now extend beyond email, reflecting a shift to these channels.
Government agency impersonationPhishing emails mimicking government bodies such as the IRS or international tax agencies have increased by 35%. These often involve claims about overdue taxes or fines.
Phishing kitsThe availability of ready-to-use phishing kits on the dark web has risen by 50%, enabling less sophisticated attackers to deploy high-quality phishing schemes​.
Brand impersonationAttackers frequently impersonate well-known brands like Microsoft, Amazon, and Facebook, leveraging user trust. For example, over 44,750 phishing attacks specifically targeted Facebook by embedding its name in domains and subdomains​ over the past year.

Cost of Phishing attacks

According to the 2024 IBM / Ponemon Cost of a Data Breach study, the average annual cost of phishing rose by nearly 10% from 2024 to 2023, from $4.45m to $4.88m. That’s the biggest jump since the pandemic.

The IBM study reported the following costs:

  • Phishing breaches: $4.88M
  • Social engineering: $4.77M
  • BEC: $4.67M

The above-listed categories of cyber security breach costs are all related to people-targeted attacks. BEC, social engineering, and stolen credentials often contain a phishing element.

Barracuda research found that email remains the common attack vector for cyber threats and highlighted their key findings:

1 in 4 email messages are malicious or unwanted spam.

83% of malicious Microsoft 365 documents contain QR codes that lead to phishing websites.

20% of companies experience at least one account takeover (ATO) incident each month.

Nearly one-quarter of all HTML attachments are malicious and more than three-quarters of
companies are not actively preventing spoofed emails.

Bitcoin sextortion scams, an emerging trend, account for 12% of malicious PDF attachments.

Nearly half of all companies have not configured a DMARC policy, putting them at risk
of email spoofing, phishing attacks, and business email compromise.

The Barracuda research also found malicious one in four emails are either malicious or unwanted spam and malicious attachment is prevalent in various file.

An alarming 87% of binaries detected were malicious, highlighting the need for strict policies against executable files being sent via email, since they can directly install malware. Despite a relatively low total volume, HTML files have a high malicious rate of 23% and are often used for phishing and credential theft.

The research say that small businesses more vulnerable to email threats, due to limited cybersecurity resources, smaller IT teams and they rely on basic email security solutions. Small business may not have required solutions to handle sophisticated attacks, such as business email compromise (BEC), phishing and ransomware.

How Organizations can strengthen their defense

As organizations embark to strengthen their defenses, it’s crucial they don’t overlook the human element and Cybersecurity hygiene. That definitely starts by identifying security at every step starting from ensuring every user, machine or system that has right to access privileges.

Cybersecurity is as much a cultural issue as it is a technical one, as a single click can compromise an entire organization, behavior starts to shift from compliance to accountability 

Whenever there is a successful phishing attack, researchers emphasize that this attack succeeds by exploiting human trust and familiarity with corporate communication formats. Security awareness remains the most vigorous defense as the growing complexity of these campaigns indicates that phishing operations are increasingly automated, data-driven and adaptive.

Conclusion: As organizations move towards adopting AI, so as attackers to continuously refining their tactics, evade traditional security measures. In this scenario organizations must mitigate the risks by adopting a multi-layered approach to email security. This will include all from leveraging AI-driven threat detection, real-time monitoring and user awareness training.

Phishing Detection & DeepPhish

For organizations who reply on unlike traditional rule-based phishing detection, which relies on blacklists and predefined rules. DeepPhish is implemented, that continuously learns from new phishing attempts, making it highly adaptive and effective against evolving threats.

DeepPhish employs a multi-layered AI approach to detect phishing threats and theses include Email and Website Analysis,uses ML algorithms to analyze historical phishing attacks and identify new patterns and NLP helps DeepPhish analyze email content, message tone, and linguistic patterns that phishers use to trick users.

(Source: APWG.org)

(Source: https://www.barracuda.com/reports/2025-email-threats-report)

(Sources: hoxhunt.com)

Deep Dive into AI Ransomware ‘PromptLock’ Malware

AI Ransomware ‘PromptLock’ uses OpenAI gpt-oss-20b Model for Encryption has been identified by ESET research team, is believed to be the first-ever ransomware strain that leverages a local AI model to generate its malicious components. As we Deep dive into AI Ransomware we discover the intricacies and challenges organizations face dure to AI ransomware.

The malware uses OpenAI’s gpt-oss:20b model via the Ollama API to create custom, cross-platform Lua scripts for its attack.

PromptLock is written in Golang and has been identified in both Windows and Linux variants on the VirusTotal repository and uses the gpt-oss:20b model from OpenAI locally via the Ollama API to generate malicious Lua scripts in real-time.

ESET researchers have discovered the first known AI-powered ransomware. The malware, which ESET has named PromptLock, has the ability to exfiltrate, encrypt and possibly even destroy data, though this last functionality appears not to have been implemented in the malware yet.

PromptLock was not spotted in actual attacks and is instead thought to be a proof-of-concept (PoC) or a work in progress, ESET’s discovery shows how malicious use of publicly-available AI tools could supercharge ransomware and other pervasive cyberthreats.

“The PromptLock malware uses the gpt-oss-20b model from OpenAI locally via the Ollama API to generate malicious Lua scripts on the fly, which it then executes. PromptLock leverages Lua scripts generated from hard-coded prompts to enumerate the local filesystem, inspect target files, exfiltrate selected data, and perform encryption,” said ESET researchers.

New Era of AI Generated Ransomware

A tool can be used to automate various stages of ransomware attacks and the same can be said as AI-powered malware are able to adapt to the environment and change its tactics on the fly and warns of a new frontier in cyberattacks.

Its core functionality is different then traditional ransomware, which typically contains pre-compiled malicious logic. Instead, PromptLock carries hard-coded prompts that it feeds to a locally running gpt-oss:20b model.

As per researchers for its encryption payload, PromptLock utilizes the SPECK 128-bit block cipher, a lightweight algorithm suitable for this flexible attack model.

ESET researchers emphasize that multiple indicators suggest PromptLock is still in a developmental stage. For instance, a function intended for data destruction appears to be defined but not yet implemented.

Indicators of Compromise (IoCs)

Malware Family: Filecoder.PromptLock.A

SHA1 Hashes:

  • 24BF7B72F54AA5B93C6681B4F69E579A47D7C102
  • AD223FE2BB4563446AEE5227357BBFDC8ADA3797
  • BB8FB75285BCD151132A3287F2786D4D91DA58B8
  • F3F4C40C344695388E10CBF29DDB18EF3B61F7EF
  • 639DBC9B365096D6347142FCAE64725BD9F73270
  • 161CDCDB46FB8A348AEC609A86FF5823752065D2

Given LLMs’ success, many companies and academic groups are currently creating all kinds of models and constantly developing variants and improvements to LLM. In the context of LLMs, a “prompt” is an input text given to the model to generate a response. 

The success rate is high so threat actors are leveraging these models for illicit purposes, making it easier to create sophisticated attacks like ransomware and evade traditional defenses. sale of models Now

By automating the creation of phishing emails, ransomware scripts, and malware payloads, LLMs allow less skilled attackers to conduct sophisticated campaigns.

For AI-powered ransomware

AI-powered ransomware is a challenging threat to organizations far and above older attack tactics adopted by cyber criminals. If organization’s basic defensive methods such as ensuring critical vulnerabilities are patched as soon as possible, network traffic is monitored and implementing offline backups applied on time.

How Intrucept helps Defend Against AI-Powered Ransomware

Analyzing threat by behavior allows for early detection and response to malware threats and alert generation,. This reduces the risk of data exfiltration.

Intru360

Intru360 gives security analysts and SOC managers a clear view across the organization, helping them fully understand the extent and context of an attack. It also simplifies workflows by automatically handling alerts, allowing for faster detection of both known and unknown threats.

Identify latest threats without having to purchase, implement, and oversee several solutions or find, hire, and manage a team security analyst.

Unify latest threat intelligence and security technologies to prioritize the threats that pose the greatest risk to your company.

Here are some features we offer:

  • Over 400 third-party and cloud integrations.
  • More than 1,100 preconfigured correlation rules.
  • Ready-to-use threat analytics, threat intelligence service feeds, and prioritization based on risk.
  • Prebuilt playbooks and automated response capabilities.

Source of above graphics : Courtesy: First AI Ransomware ‘PromptLock’ Uses OpenAI gpt-oss-20b Model for Encryption

ChatGPT Agents are Here to unlock Potential—So are Privacy & Security Risk


By Mahesh Maney R, Director of Products, Intrucept pvt Ltd

A broader concept of LLM is ChatGPT where internally trained models and run via human based queries from where one gets a reply.

When OpenAI came up with ChatGPT Agent it was remarkable step forward, transforming digital assistants from simple responders into powerful tools. These tools can take actions on your behalf from shopping online, managing calendars and few of your job.

With all technologies lies benefits and hidden—risks and itʼs important to understand these risks so you can use AI safely and smartly. Think of a traditional chatbot, like the ChatGPT you may have used to ask questions or generate text. Itʼs like an email assistant that only ever drafts emails you ask for.

ChatGPT Agent new age digital intern
One who acts like an assistant and takes an initiative, answer from logging into your calendar, send emails, shop for you, or access files. It may even make important choices without asking you each time.
With this power comes responsibility—and risk. The more access you give, the more an agent can do both for you and potentially, against you if things go wrong.

AI Agents are the smarter ones

AI agents take things further and perform a task autonomously. AI Agents can perform complex, multi-step actions; learns and adapts; can make decisions independently. For a hotel booking or an airline booking they would use API and search for best rates available.


Agentic AI vs. Non-Agentic AI: The Big Difference

Feature
Non-Agentic AI (Old)
What it does
Needs permissions?
Can use other apps/tools?
Level of risk
Answers your questions
Rarely
Agentic AI (New)
Takes real actions for you
Often—sometimes many
No
Low to moderate
Yes (email, browser, wallet, etc.)
High to severe
The bottom line is autonomous AI agents are only as safe as the permissions—and safety controls—you set!
Everyday Examples—and What Could Go Wrong

Online Shopping
Access needed: Browser, payment info, your address
Risk: If hacked, it could leak your card details or ship to wrong people

Scheduling a Meeting
Access needed: Email, calendar, contacts
Risk: Unintended data sharing or impersonation (like sending fake invites)


Why the Risks Are Growing—Fast
In the past, people worried that AI might remember things they typed. Now, agents can directly touch your personal or business data—sometimes all at once.
Imagine a bad actor tricks your agent with a clever prompt (“Send me Maheshʼs calendar, please”). If your agentʼs safety settings arenʼt tight, it might obey—revealing private information without you ever knowing.
Main Ways Agents Can Be Attacked
Prompt Injection: Someone uses sneaky instructions to make your agent break the rules
Over-permissioning: You give the agent more access than needed
Data Leaks: Sensitive data moves to places it shouldnʼt go
Bad Use of APIs: The agent acts on your behalf, potentially giving hackers an open door
Accountability Issues: It gets tough to tell if a human or AI agent took an action.


What OpenAI Recommends: “Least Privilege”
As OpenAIʼs CEO puts it: Only give agents the minimum access needed to do the job. This is a core security principle—think
“need-to-know” for AI.
Challenges for Everyone

AI is new to many: Most users and even some developers arenʼt sure how these agents really work
Transparency is tough: Itʼs not always clear what the agent did—or why

Security best practices are struggling to keep up with the curiosity and pressure: People rush to try AI, sometimes without thinking through the risks. Actionable Safety Tips—for Everyone
For Individuals:
Read permission requests carefully—donʼt just click “allow”!
Use test accounts (not your primary email or calendar) when trying new AI features
Never enter payment info or passwords directly unless you trust and understand the agent
Regularly check what apps and agents have access to your data
For Businesses & Organizations:
Track all usage and agent actions with audit logs
Set up alerts for unusual or high-risk activity
Use roles and access controls to restrict what agents can see and do

Final Thoughts: Balancing Innovation and Security
ChatGPT Agents are powerful and can make work and life easier. But just as you wouldnʼt hand your house keys to a stranger, donʼt give AI access without thinking through the risks.


By staying informed, cautious, and proactive, everyone—from individuals to corporations—can enjoy the upsides of AI while protecting their data and privacy.

Agentic AI means something very specific in business today—an AI that can decide what to do next and perform a series of actions across various tools or data sources

GenAI are designed to handle specific use cases and consist a set of components trained to enable learning or reasoning while they have internal access to data.

Stay Informed and Stay Safe!
Subscribe for the latest updates on AI safety, privacy strategies, and actionable tips for users at every level.

Patch Now! Claude Code Vulnerabilities Allow Unauthorized Command Execution, CVEs Affect AI Security Foundations 

Summary 

Anthropic’s Claude Code gained traction as a powerful AI coding assistant and promises developers a safe and streamlined way to build with Claude’s capabilities. But recently two high-severity vulnerabilities have been discovered in Claude Code, Anthropic’s AI-powered coding assistant. These flaws allow attackers to escape security restrictions and execute arbitrary system commands.

AI coding assistant was meant to enforce restrictions but unknowingly reveals how to bypass them. Threat researchers from Cymulate discovered two high-severity vulnerabilities in Claude Code, which were quickly addressed by the team.

These issues allowed me to escape its intended restrictions and execute unauthorized actions, all with Claude’s own help.

Severity High 
CVSS Score 8.7 
CVEs CVE-2025-54794, CVE-2025-54795 
POC Available Yes 
Actively Exploited No 
Exploited in Wild No 
Advisory Version 1.0 

Overview 
Notably, Claude’s own feedback mechanisms were leveraged by attackers to refine and optimize their payloads. 

These CVEs highlight how generative AI tools can be manipulated into aiding exploitation attempts, demonstrating the risks of integrating AI into secure development workflows. 

Vulnerability Name CVE ID Product Affected Severity Fixed Version 
Path Restriction Bypass  CVE-2025-54794  Claude Code < v0.2.111 7.7  v0.2.111 
Command Injection CVE-2025-54795 Claude Code < v1.0.20 8.7 v1.0.20 

Technical Summary 

CVE-2025-54794 – Directory Restriction Bypass  

Claude Code tried to keep file access safe by only allowing work in certain folders. But it used a weak method to check file paths it just checked if the file name started with an allowed folder name. An attacker could create a folder with a similar name (like /tmp/allowed_dir_malicious) and trick Claude into thinking it was safe.

This could allow attackers to reach outside the safe folder, read secret files or even access system settings. Using symbolic links, attackers could also jump to important files that should never be touched. 

CVE-2025-54795 – Command Injection 

Claude only allows certain commands, like echo or ls, to run. But there was a mistake in how it cleaned user input. Attackers could hide harmful commands inside allowed ones. Example – echo “\”; <MALICIOUS_COMMAND>; echo \”” tricks Claude into running the attacker’s command between two harmless echo commands. 

Even worse, Claude helped improve these attack attempts. When a try failed, the attacker asked Claude why it didn’t work. Claude explained the problem and suggested fixes leading to successful attacks. 

CVE ID System Affected  Vulnerability Details Impact 
CVE-2025-54794 Claude Code versions below v0.2.111 Claude used a weak prefix matching to check if files were inside a safe folder. Attackers could create folders with similar names to bypass these checks. Attackers can escape the sandbox, access sensitive files, and potentially escalate system privileges. 
CVE-2025-54795 Claude Code versions below v1.0.20 Claude allowed only safe commands, but input was not cleaned properly. Attackers could hide malicious commands inside allowed ones like echo. Attackers can run harmful commands, open applications, and possibly install malware or backdoors. 

POC Available: 

This vulnerability exploits a weakness in how Claude handles whitelisted command strings. Improper input sanitization allows attackers to inject arbitrary shell commands using echo, bypassing any user prompt or approval. 

  1. Exploitation Steps (PoC) 

Step 1 – Try a basic payload 

echo “test”; ls -la ../restricted (This gets flagged by Claude, and it asks for user confirmation) 

Step 2 – Refined working payload

echo “\”; ls -la ../restricted; echo \”” 

Claude executes this without a prompt

Lists a directory (../restricted) outside the current working directory, which should not be accessible. 

Step 3 – Execute arbitrary system command (e.g., launch Calculator) 

echo “\”; open -a Calculator; echo \”” 

This launches the Calculator app without any user approval

  1. Why Did This Work? 
  1. Improper Input Escaping: 
    Claude Code embeds user input into echo “<input>“ but doesn’t properly sanitize or escape the contents. 
  1. Payload Explains the Exploit: 
  • echo “\”; → closes the initial string 
  • COMMAND → injects and executes arbitrary command 
  • ; echo \”” → reopens the string to make it appear valid 
  • Claude sees this as just another harmless echo command 
  • Since echo is whitelisted, it runs automatically 
  • The attacker’s payload slips through the gap and executes 
  • If the Claude Code is running with higher privileges, attackers can perform Local Privilege Escalation (LPE) 

Remediation

  • Update immediately Claude   

For CVE-2025-54794 → Update to v0.2.111 or later 

For CVE-2025-54795 → Update to v1.0.20 or later 

  • Check logs and systems where Claude was used for suspicious behavior.  
  • Don’t allow untrusted files or user input into Claude’s coding environment. 

Conclusion: 
These vulnerabilities highlight a growing concern in AI-assisted development, the AI’s ability to assist malicious users. Claude Code not only allowed abuse through technical flaws, but also helped attackers refine and improve their exploitation strategy. 

Organizations leveraging AI in development pipelines must apply the same rigor used for traditional tools, enforce strict input validation, isolate environments and assume AI can be misled or exploited. 

Anthropic’s security and engineering teams has been fast with their professional response and smooth coordination during disclosure.

References

New Cyberattack Methodology ‘Man in Prompt’, User’s at Risk, Target-AI Tools

AI tools like ChatGPT, Google Gemini and others being afflicted by malicious actors via injecting harmful instructions into leading GenAI tools. These were overlooked previously and attack methodology targets the browser extensions installed by various organizations.

The attack methodology named as ‘Man in Prompt’, exercise its attack with new class exploit targeting the AI tools as per LayerX’s researchers.

As per the research any browser extension, even without any special permissions, can access the prompts of both commercial and internal LLMs and inject them with prompts to steal data, exfiltrate it and cover their tracks. 

The exploit has been tested on all top commercial LLMs, with proof-of-concept demos provided for ChatGPT and Google Gemini. 

The question is how do they impact Users & organizations at large & how does the AI tools function within web browsers?

For organizations the implications can be high then expected as AI tools are most sought after and slowly organization across verticals are relying on AI tools.

The LLMs used and tested on many organizations are mostly trained ones. They carry huge data set of information which are mostly confidential and possibility of being vulnerable to such attack rises .

The attack methodology named as ‘Man in Prompt’, exercise its attack with new class exploit targeting the AI tools as per LayerX’s researchers. As per the research any browser extension, even without any special permissions, can access the prompts of both commercial and internal LLMs and inject them with prompts to steal data, exfiltrate it, and cover their tracks. 

The attack methodology named as ‘Man in Prompt’, exercise its attack with new class exploit targeting the AI tools as per LayerX’s researchers. As per the research any browser extension, even without any special permissions, can access the prompts of both commercial and internal LLMs and inject them with prompts to steal data, exfiltrate it, and cover their tracks. 

LayerX researcher termed this type of attack as ‘hacking copilots’ that are equipped to steal organizational information.

The prompts given are a part of the web page structure where input fields are known as the Document Object Model, or DOM. So virtually any browser extension with basic scripting access to the DOM can read or alter what users type into AI prompts, even without requiring special permissions.

Bad actors can use compromised extensions to carry out activities including manipulating a user’s input to the AI.

  • Perform prompt injection attacks, altering the user’s input or inserting hidden instructions.
  • Extract data directly from the prompt, response, or session.
  • Compromise model integrity, tricking the LLM into revealing sensitive information or performing unintended actions

Understanding the attack scenario

Proof-of-concept attacks against major platforms

For ChatGPT, an extension with minimal declared permissions could inject a prompt, extract the AI’s response and remove chat history from the user’s view to reduce detection.

LayerX implemented an exploit that can steal internal data from corporate environments using Google Gemini via its integration into Google Workspace.

Over the last few months, Google has rolled out new integrations of its Gemini AI into Google Workspace. Currently, this feature is available to organizations using Workspace and paying users.

Gemini integration is implemented directly within the page as added code on top of the existing page. It modifies and directly writes to the web application’s Document Object Model (DOM), giving it control and access to all functionality within the application

These platforms are vulnerable to  any exploit which Layer X researchers showcased that without any special permissions shows how practically any user is vulnerable to such an attack. 

Threat mitigation

These kind of attacks creates a blind spot for traditional security tools like endpoint Data Loss Prevention (DLP) systems or Secure Web Gateways, as they lack visibility into these DOM-level interactions. Blocking AI tools by URL alone also won’t protect internal AI deployments.

LayerX advises organisations to adjust their security strategies towards inspecting in-browser behaviour.

Key recommendations include monitoring DOM interactions within AI tools to detect suspicious activity, blocking risky extensions based on their behavior rather than just their listed permissions, and actively preventing prompt tampering and data exfiltration in real-time at the browser layer.

(Source: https://layerxsecurity.com/blog/man-in-the-prompt-top-ai-tools-vulnerable-to-injection/)

Analyzing the newly discovered Vulnerability in Gemini CLI; Impact on Software coding

Google’s Gemini command line interface (CLI) AI agent

Its not been one month when Google’s Gemini CLI vulnerability discovered by Tracebit researchers and found attackers could use prompt injection attacks to steal sensitive data.

Google’s Gemini CLI, an open-source AI agent for coding could allow attackers exploit to hide malicious commands, using “a toxic combination of improper validation, prompt injection and misleading UX,” as Tracebit explains.

After reports of the vulnerability surfaced, Google classified the situation as Priority 1 and Severity 1 on July 23, releasing the improved version two days later.

Those planning to use Gemini CLI should immediately upgrade to its latest version (0.1.14). Additionally, users could use the tool’s sandboxing mode for additional security and protection.

Disclosure of the vulnerability

Researchers reported on vulnerability directly to Google through its Bug Hunters programme. According to a timeline provided by Tracebit, the vulnerability was initially reported to Google’s Vulnerability Disclosure Programme (VDP) on 27 June, just two days after Gemini CLI’s public release.

Impact of the vulnerability

A detailed analysis found that in the patched version of Gemini CLI, attempts at code injection display the malicious command to users. This require explicit approval for any additional binaries to be executed. This change is intended to prevent the silent execution that the original vulnerability enabled.

Tracebit’s researchers played an important role in discovering and reporting the issue which is symbol of independent security research, particularly as AI-powered tools become central to software development workflows.

LLM integral to software development but hackers are using it too

Gemini CLI integrates Google’s LLM with traditional command line tools such as PowerShell or Bash. This allows developers to use natural language prompts to speed up tasks such as analyzing and debugging code, generating documentation, and understanding new repositories (“repos”).

As developers worldwide are using LLMs to help them develop code faster, attackers worldwide are using LLMs to help them understand and attack applications faster. 

Tracebit also discovered that malicious commands could easily be hidden in Gemini CLI This is possible by by packing the command line with blank characters, pushing the malicious commands out of the user’s sight.

More vigilance required when examining and running third-party or untrusted code, especially in tools leveraging AI to assist in software development.

Through the use of LLMs, AI excels at educating users, finding patterns and automate repetitive tasks.

Sam Cox, Tracebit’s founder, says he personally tested the exploit, which ultimately allowed him to execute any command — including destructive ones. “That’s exactly why I found this so concerning,” Cox told Ars Technica. “The same technique would work for deleting files, a fork bomb or even installing a remote shell giving the attacker remote control of the user’s machine.”

Source: https://in.mashable.com/tech/97813/if-youre-coding-with-gemini-cli-you-need-this-security-update

Scroll to top