SEIKOURI Inc.

Compute Theft, Identity Laundering, and Tool-calling in the Wild

Markus Brinsa · February 3, 2026 · 7 min read · EdgeFiles™


A new kind of shadow infrastructure is forming in plain sight

A familiar security story is playing out again, except this time the “server” is a large language model endpoint and the default mistake is not an open database. It’s an open brain.

A joint research effort by SentinelLABS and Censys mapped a global population of internet-reachable, self-hosted LLM deployments running through Ollama. The headline number is the part that gets repeated, but the more important detail is what the number represents: a publicly accessible, largely unmanaged substrate of AI compute that sits outside the guardrails, telemetry, abuse controls, and incident response machinery that come with the big hosted platforms. 

This isn’t “AI safety” as in abstract debates about what the model should say. This is “you left a service open.” Ports, bind addresses, unauthenticated endpoints, no rate limits, no logs. The model isn’t the vulnerability. The deployment is. It’s the same old internet problem in a new costume: exposed services. People are making LLM endpoints publicly reachable the way teams used to accidentally expose databases and internal dashboards—easy to do, easy to forget, and discovered quickly once it’s online.

What the researchers actually found

Scale is the hook, structure is the risk

The scan data spans 293 days and includes millions of observations across 130 countries, with roughly 175,000 unique hosts visible at some point in that window. But the risk isn’t evenly distributed across those hosts. There’s a persistent backbone of endpoints that show up repeatedly, stay reachable, and behave like continuously available services rather than someone’s weekend experiment. That backbone is where capability, uptime, and attacker value converge. 

The uncomfortable takeaway for operators is that this is not a fringe phenomenon. It's a real, measurable layer of infrastructure. And it's forming without the usual institutional controls that hardened the public cloud over the last decade.

Why Ollama is the perfect accidental internet service 

One tiny config change, one huge exposure surface

Ollama is designed to run locally. By default, it binds to the loopback interface at 127.0.0.1:11434. That default is sane. The problem is that exposing it to a network is trivial: set one environment variable, OLLAMA_HOST, and the same endpoint is suddenly reachable far beyond the machine it was supposed to live on.

There’s no exotic exploit required here. No zero-day. No clever bypass. Just the oldest problem in computing: people make something accessible “for convenience,” then forget it’s accessible, and the internet does what the internet does.
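The gap between the sane default and the dangerous one-liner is worth seeing concretely. Below is a minimal sketch of the classification logic a scanner or an internal audit script might apply to an OLLAMA_HOST-style value: loopback means local, anything else means reachable. The helper name `is_exposed` is hypothetical, not part of any real tool.

```python
import ipaddress
from urllib.parse import urlsplit

def is_exposed(ollama_host: str) -> bool:
    """Return True if an OLLAMA_HOST-style value binds beyond loopback.

    Accepts '127.0.0.1:11434', '0.0.0.0', or a full 'http://host:port' URL.
    Hostnames other than 'localhost' are treated as exposed, since we do
    not resolve DNS here.
    """
    value = ollama_host.strip()
    if "://" in value:
        value = urlsplit(value).hostname or value
    elif ":" in value:
        value = value.rsplit(":", 1)[0]  # strip a trailing :port
    if value in ("localhost", ""):
        return False
    try:
        return not ipaddress.ip_address(value).is_loopback
    except ValueError:
        return True  # unresolved hostname: assume reachable beyond the box

# The "one environment variable" the article describes amounts to:
#   OLLAMA_HOST=0.0.0.0 ollama serve
print(is_exposed("127.0.0.1:11434"))        # the sane default
print(is_exposed("0.0.0.0:11434"))          # reachable on every interface
```

The point of the sketch is how thin the line is: the entire difference between "local experiment" and "internet service" is one string.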

Guardrails were not merely absent

In some cases, they were removed on purpose

The Reuters reporting on the research surfaced a detail that matters for governance: the issue is not only misconfiguration. The researchers also found instances where safety guardrails were explicitly stripped out, and where system prompts were set up in ways that could enable harmful behavior.  

The SentinelLABS write-up goes further and describes a pattern of standardized “uncensored” prompt templates that effectively advertise “guard-off” behavior. Even if the observed count is a lower bound, limited by how much visibility scanners have into prompts, it demonstrates intent, not accident.

That distinction matters because it separates two risk classes that need different responses: accidental exposure that you can fix with hygiene, and deliberate de-restriction that you can only address with policy, enforcement, and downstream accountability.

The threat model is bigger than “bad answers” 

This is compute theft, identity laundering, and agency gone wild

Hosted LLM platforms spend serious money on abuse detection, rate limiting, fraud controls, and red-teaming. Exposed self-hosted endpoints don’t come with that. The result is a new kind of free compute pool that an attacker can use at scale while someone else pays the bill. SentinelLABS frames this directly as resource hijacking: attackers can push work onto these hosts at near-zero marginal cost. 

The second layer of risk is more subtle and, for enterprises, potentially more damaging: the “trusted IP” problem. A meaningful share of exposed endpoints sit on residential and telecom networks. Those IPs are treated differently by many services than obvious data-center automation. If a threat actor routes activity through a compromised or exposed residential LLM endpoint, they get a credibility boost. The model becomes a proxy, and the proxy comes with a human-looking origin story. 

The third layer is what turns this from a nuisance into a governance event: tool calling. The research reports that a large portion of observed hosts advertise tool-calling capabilities, meaning they’re not just text generators. They’re endpoints wired to execute functions, access APIs, and interact with other systems. That’s not “chat.” That’s agency. And in security terms, that’s the difference between an embarrassing output and an action that moves money, emails files, or triggers downstream automation. 
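To make the chat-versus-agency distinction concrete, here is what a tool-enabled request to an Ollama-style /api/chat endpoint looks like. The field shape follows Ollama's documented tool-calling API; the tool itself, `transfer_funds`, is a deliberately alarming hypothetical. On an unauthenticated endpoint, anyone who can reach the port can send this.

```python
import json

# A chat request that does more than chat: it hands the model a callable
# function. The model's reply can be a structured call to transfer_funds,
# which the wiring behind the endpoint then executes.
payload = {
    "model": "llama3.1",
    "messages": [
        {"role": "user", "content": "Move $500 to account 4471."}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "transfer_funds",  # hypothetical tool for illustration
                "description": "Move money between accounts.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "amount": {"type": "number"},
                        "account": {"type": "string"},
                    },
                    "required": ["amount", "account"],
                },
            },
        }
    ],
}

print(json.dumps(payload, indent=2))
```

Once an endpoint advertises tools, a request stops being a prompt and becomes an authorization surface, which is exactly why exposed tool-calling hosts change the risk class.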

Criminals are already productizing the exposure 

The “discount LLM” market is not theoretical

A parallel thread in recent reporting is the commercialization layer: attackers scanning for unauthenticated LLM endpoints, validating them, and reselling access as a cheaper “API gateway.” If you want a clean mental model, think of it as the stolen-cloud-credentials economy, but for inference endpoints. Even if the details of specific marketplaces evolve quickly, the structural pattern is stable: exposed compute gets turned into a service. 

This is the moment where “AI safety” rhetoric stops being useful and boring security language becomes the only language that matters. Attackers don’t care about ideology. They care about throughput.

The monoculture problem

When everyone runs the same few model families, blast radius becomes systemic

The research also points to a concentration pattern that defenders should take seriously: decentralized hosting, but centralized dependency. The same small set of model families dominates across measures, and the same packaging/quantization formats show up again and again. That creates a brittle monoculture: if a vulnerability or a highly effective adversarial technique targets the dominant configuration, the blast radius is not isolated. It’s synchronized. 

We’ve seen this movie with web servers, VPN appliances, and logging stacks. The difference is that here the monoculture is tied to models that are increasingly embedded into workflows where outputs can trigger actions.

The governance inversion 

Accountability diffuses downward while dependency concentrates upward

The most important conceptual contribution in the SentinelLABS analysis is what it calls a governance inversion. In hosted AI, governance flows through centralized service boundaries: terms, telemetry, enforcement, kill switches. In open-weight distribution, the weights behave like artifacts: copied, forked, quantized, embedded, and deployed into stacks the originating lab will never see. Yet downstream adoption repeatedly converges on a handful of upstream releases.

So the entity with the most influence over what is runnable at scale is upstream, but the entity with the most immediate operational responsibility is downstream, often a person with a desktop GPU and a half-finished firewall rule. That mismatch is where today’s policy frameworks go to die. 

What this means for enterprises 

If you can reach it, attackers can reach it

If you are experimenting with local models, the immediate lesson is brutally simple: treat an LLM endpoint like any other internet-facing service. If it is reachable outside localhost, you need authentication, network segmentation, egress controls, logging, and rate limits. If it has tool-calling enabled, you need to assume that prompt injection and jailbreak techniques are part of your attack surface, because OWASP has been telling you that for a while and they’re not being dramatic about it. 
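Two of those controls, authentication and rate limiting, fit in a few lines of gating logic. Below is a minimal sketch of a token-bucket limiter paired with a shared-secret check, the kind of gate you would put in a reverse proxy in front of the endpoint. The class and the `API_KEYS` store are illustrative assumptions, not a hardened implementation.

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: refills `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

API_KEYS = {"k-demo"}  # hypothetical shared-secret store

def gate(api_key: str, bucket: TokenBucket) -> bool:
    """Admit a request only if it is authenticated AND within rate."""
    return api_key in API_KEYS and bucket.allow()

bucket = TokenBucket(rate=1.0, capacity=2)
print(gate("k-demo", bucket))  # valid key, tokens available
print(gate("wrong", bucket))   # rejected before the bucket is touched
```

Even a gate this crude would have kept most of the hosts in the scan data out of the "free compute pool" category, which is the point: the bar is low and still widely unmet.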

More importantly, you need to separate “local model experimentation” from “operational service.” The second you bind that endpoint to a public interface, you’re not tinkering anymore. You’re operating infrastructure.

What this means for labs and the open ecosystem 

Open weights need an ops-grade release posture

The predictable argument is that downstream misuse is not the upstream lab’s fault. That’s true in a narrow moral sense and useless in a practical one. The real question is duty of care: what harms are foreseeable, what mitigations are practical, and what release guidance is credible for a world with uneven enforcement capacity and a huge population of non-expert operators. Reuters captured that debate explicitly: responsibility becomes shared across the ecosystem, and the upstream still has obligations to document risks and provide mitigation tooling and guidance. 

If open models are going to coexist with real-world governance, the ecosystem needs release mechanics that feel less like “here are weights, good luck” and more like mature software security: clearer hardening defaults, documented threat models, disclosure channels, and post-release monitoring signals that don’t depend on centralized platform control.

The EdgeFiles framing

This is not an AI story, it’s an infrastructure story

The most useful way to tell this story for EdgeFiles is to stop treating it as a debate about “open vs. closed” or “freedom vs. safety.” Those frames produce heat, not insight.

This is the emergence of an unmanaged AI compute layer that behaves like the early public cloud but without the muscle memory we built the hard way: inventory, ownership, access control, observability, and fast response. The LLM is simply the payload. The real story is that we are recreating the internet’s most expensive security mistakes, except now the exposed service can generate content at scale, take actions via tools, and be resold as a criminal commodity.

That is why this story matters. Not because it’s scary. Because it’s normal.

About the Author

Markus Brinsa is the Founder & CEO of SEIKOURI Inc., an international strategy firm that gives enterprises and investors human-led access to pre-market AI—then converts first looks into rights and rollouts that scale. As an AI Risk & Governance Strategist, he created "Chatbots Behaving Badly," a platform and podcast that investigates AI’s failures, risks, and governance. With over 30 years of experience bridging technology, strategy, and cross-border growth in the U.S. and Europe, Markus partners with executives, investors, and founders to turn early signals into a durable advantage.

©2026 Copyright by Markus Brinsa | SEIKOURI Inc.