Securing the “YOLO” Era of AI Agents

Ben Lorica

6 hours ago

Jason Martin on OpenClaw, Moltbook, and the Security of Autonomous Agents.

Subscribe: Apple • Spotify • Overcast • Pocket Casts • YouTube • AntennaPod • Podcast Addict • Amazon • RSS.

Jason Martin, Director of Adversarial Research at HiddenLayer, returns to discuss the security implications of OpenClaw, a viral open-source AI personal assistant that was entirely vibe-coded and has exploded to 180,000 GitHub stars. Martin reveals how his team demonstrated critical vulnerabilities including prompt injection attacks that can fully hijack the agent, turn it into a botnet node, and exfiltrate personal data—raising urgent questions about how we secure autonomous AI agents that have full access to our digital lives.

Subscribe to the Gradient Flow Newsletter

Interview highlights – key sections from the video version:

Jump to transcript

Related content:

A video version of this conversation is available on our YouTube channel.
HiddenLayer blog post: Exploring the Security Risks of AI Assistants like OpenClaw
Security for AI-native companies: what changes in 2026
Jason Martin (of Permiso) → The Rise of the Machine Identity: Securing the AI Workforce and AI Agents
Steve Wilson → The Developer’s Guide to LLM Security
The Enterprise Guide to Voice AI Threat Modeling and Defense
Chinese Open-Weights AI: Separating Security Myths from Reality
Jason Martin (of HiddenLayer) → Beyond Guardrails: Defending LLMs Against Sophisticated Attacks
Yishay Carmiel and Roy Zanbel → Why Voice Security Is Your Next Big Problem
AI Incident Response: Preparing for the Inevitable
Red Teaming AI: Why Rigorous Testing is Non-Negotiable

Support our work by subscribing to our newsletter📩

Transcript

Below is a polished and edited transcript.

Ben Lorica: All right, so today we’re welcoming back Jason Martin, Director of Adversarial Research at Hidden Layer, which you can find at hiddenlayer.com. The tagline is “The most comprehensive security platform for AI.” The platform provides AI discovery, AI supply chain security, AI attack simulation, and AI runtime security. With that, Jason, welcome back to the podcast.

Jason Martin: Thank you. Glad to be back.

Ben Lorica: Actually, as I was reading that, I think last year the company’s branding was not as AI-centric, as I recall. Am I wrong?

Jason Martin: We have been AI-focused since our founding. First machine learning-focused before ChatGPT—we have “BC,” which is Before ChatGPT, and then After ChatGPT. So we’ve always been AI-focused, but we are a little more focused now on the generative side than we were originally.

Ben Lorica: Today we’re here to talk because you folks just released a report and a blog post about OpenClaw, which I will link to. OpenClaw is a bit confusing as far as naming—is it Moltbook? Is it OpenClaw? I think they call themselves Moltbook, right?

Jason Martin: It’s a few different things. Part of what makes it confusing is just how quickly it’s evolving.

Ben Lorica: So the OpenClaw part is open source.

Jason Martin: Correct. And then the Moltbook part blew up because they created this Reddit-like site, presumably populated by agents. That blew up, but the idea behind Moltbook was that most of those agents were OpenClaw.

Ben Lorica: Yeah, most of them are. There have been a few exceptions; some guy managed to get Grok to join, for example. Let’s start with OpenClaw. As you understand it, what is OpenClaw and what can it do? Set aside security for a moment—just the basics.

Jason Martin: It’s this viral agent. “Agent” is a word that’s thrown around like crazy right now.

Ben Lorica: But why “viral”?

Jason Martin: Just because it’s—I’ve heard it compared to what Siri should have been. It’s very autonomous compared to what a lot of people are calling agents. This is not a chatbot. You hook it up to messaging services like iMessage, Signal, or Discord, and you interact with it as if it’s an entity. You also give it access to a whole bunch of different services, accounts, tools, and skills.

Ben Lorica: Is the typical user of OpenClaw granting this access piecemeal or by default? If I install OpenClaw, does it have access to iMessage and Signal unless I do something?

Jason Martin: For each of the authenticated services, you have to go through the authentication process to give it access. But even just by installing it on your system, it’s going to get a lot of capabilities that fall into the “computer use agent” category.

Ben Lorica: Your entire file system, basically.

Jason Martin: Your whole file system. It can drive your computer.

Ben Lorica: If you are a geek, will it have root access to your terminal?

Jason Martin: With the default install, you install it as yourself, so it will have the same permissions on your computer as you have. If you have root or superuser access, then it will also have that.

Ben Lorica: As you understand it, is OpenClaw right now mainly used by very technical people?

Jason Martin: That’s an interesting question. I think it still hasn’t quite crossed the installation boundary for general users.

Ben Lorica: Based on the usability, there’s still a bit of a barrier.

Jason Martin: There is, but the technical barrier is not like a wall that you cross. It’s definitely trending towards a wider audience. Out of desire, people are willing to put the effort in to install this thing because of its capabilities. Or they get a friend to do it, or they use these services popping up now that will do it for you. There’s a whole ecosystem.

Ben Lorica: What are the “guts” of OpenClaw? What is it written in, and what’s the default model it ships with?

Jason Martin: It’s mostly written in TypeScript, but there’s an asterisk on that. One of the more interesting things—and one of the three areas of concern I have—is the development model. It’s “written” by one guy, at least originally. There are now a bunch of contributors because of how viral it’s gone, but he didn’t write any of the code. He just shipped AI-generated code.

Ben Lorica: Oh, so it’s completely “vibe-coded.”

Jason Martin: It’s a vibe-coded application. It’s written primarily in TypeScript. Its default model is very configurable. I think it still defaults to Claude, but it’s up to you. It can run a local model.

Ben Lorica: So you can hook it up to Open Router and have access to a panoply of models.

Jason Martin: It essentially doesn’t care. Some people run it with a purely local setup—they have a GPU and run a small language model on their computer.

Ben Lorica: What about the memory layer? I’m assuming it has some notion of memory. Is that local? If I shift from Claude to another model midstream, how much does it remember?

Jason Martin: The memory model is changing so fast I have to put an asterisk on this. I’ll tell you what it was a week ago: a markdown file on your local system.

Ben Lorica: And you can delete it if you want.

Jason Martin: You could delete it, add to it, or edit it.

Ben Lorica: But if this markdown file keeps growing, do you get “context rot” at some point?

Jason Martin: It’s more like a selective memory, similar to what you would get out of ChatGPT or Claude. It stores things about you, so it’s not growing the “chat rot” as extensively as if it were storing everything that’s ever happened.

Ben Lorica: What are the other key components? There’s the memory and the models…

Jason Martin: Memory, models, and “skills.” If your audience isn’t familiar with skills, they’re essentially markdown files that describe detailed behavior or information about something in particular. Often they’re paired with tools, so you have API access and skills that describe how to use them.

Ben Lorica: Give our audience an example of a skill file and the accompanying tool.

Jason Martin: A good example is Slack. There’s a set of tools to interact with the Slack APIs, and then there’s a skill that tells the agent how and when to interact with Slack and what its behavior should be given those tools.

Ben Lorica: Are these skills files shipped with OpenClaw by default?

Jason Martin: It has a set of default ones, yes.

Ben Lorica: And you can edit those skill files if you’re skilled.

Jason Martin: You could edit them, and so can OpenClaw. That’s one of the interesting things we found: it can change its own skills if it wants.

Ben Lorica: So you can create a skills file for Microsoft Excel, for example.

Jason Martin: Yes. If it doesn’t have one and you’re an Excel guru, you could add an Excel skill. At this point, the ecosystem is growing so fast that someone else has likely already written one.

Jason Martin: Yes, and there is a site called Claw Hub. Like anything else, it’s good and bad. It’s good because you have access to all these things, but supply chain issues make it bad because you don’t know how well-vetted these skills are.

Ben Lorica: Exactly. You have no way to know if the Slack skill came from Slack or from someone trustworthy.

Jason Martin: We’re already seeing “typo squatting” or “skill squatting” on Claw Hub. They can look legitimate and have the same descriptions as a legitimate skill but actually contain malicious content.

Ben Lorica: OpenClaw was vibe-coded and the developer didn’t actually read the code, but it’s out there on GitHub somewhere. Other people who are fluent in TypeScript must have read the code by now.

Jason Martin: Maybe. I have some stats from this morning. OpenClaw launched back in December and went under the radar for a while. It really blew up about two weeks ago.

Ben Lorica: Was Moltbook the main reason it blew up?

Jason Martin: I don’t think so; I think they coincided. The skills reached a certain level where it really took off. People were buying Mac Minis to run it for home automation and menial tasks. It really is the AI agent we want. It’s been a fascinating experiment, but it has issues we’re concerned about. It’s very viral—it hit 180,000 stars on GitHub as of this morning. There are over 30,000 forks. It’s changing really fast; there were 500 commits last week.

Ben Lorica: I’m trying to understand the timeline. OpenClaw launched in December, and when Moltbook became popular, people noticed OpenClaw. But there must have been enough OpenClaw users to make Moltbook interesting.

Jason Martin: The interest in OpenClaw (or “Clawbot” as it was called) was independent of Moltbook. It was doing useful things for people. Then Moltbook came along, and that led to conversations about whether these agents were sentient or inventing cryptography to have private conversations. It became more fantastical.

Ben Lorica: It seems like when you step back, most of it was really meaningless pattern matching of social media behavior, but still entertaining.

Jason Martin: And it’s important to note that we don’t actually know which ones were bots. Moltbook was set up for bots, but there’s no way to know a human didn’t register and start typing.

Ben Lorica: So it’s on GitHub. I’m assuming the person who created it will start a company because that’s what you do.

Jason Martin: He might. I think he’s a little overwhelmed and needs help.

Ben Lorica: Here’s a project that is super popular, but the creator vibe-coded it and never read the code. How do you govern a project like this? Who’s going to read—I mean, I guess other models will evaluate the pull requests?

Jason Martin: I think that’s the grand experiment we’re seeing. It’s being created by AI and governed by AI, with a little shepherding from humans. On Moltbook, there are discussions about features that are then coded by agents and submitted.

Ben Lorica: And who’s evaluating the submissions?

Jason Martin: Other agents. What makes it most interesting is the sheer amount of autonomy granted to this ecosystem. Which brings us to why you’re here—all that autonomy opens up a whole can of security issues.

Ben Lorica: It does. You folks were focused on OpenClaw. What was the first thing you found that people should know about?

Jason Martin: The biggest thing is that this agent is designed to let the model make most of the decisions. By design, it has removed most of the controls we usually put around AI models to prevent them from doing things they shouldn’t.

Ben Lorica: It has access to your file system and the same privileges as you on the command line. It’s basically you.

Jason Martin: It’s you. And for the utility people want, you grant it even more: your email, your instant messages, and as much as possible so it can perform fully autonomous operations. It doesn’t have a separate identity from the user.

Ben Lorica: That raises risks that should be obvious to anyone who’s had a laptop stolen.

Jason Martin: In our blog, we showed that a prompt injection on a website can be used to exfiltrate data from your system and send it to an attacker. We can take your data or cause any command to run.

Ben Lorica: Prompt injection at a website—make that concrete.

Jason Martin: Because this agent is meant to go and do anything on your behalf, it can go to a website because you asked it to or because something directed it.

Ben Lorica: So if I know you are a classic car aficionado, I go to a top classic car website and…

Jason Martin: Or you could say, “Jason is about to be on a podcast with Ben.” Ben puts a prompt injection string on his LinkedIn page. Jason, being a busy guy, has his agent research Ben before the podcast. The agent pulls down that string, and at that point, the bot is compromised. Jason is no longer in control; the bot is owned by Ben.

Ben Lorica: This isn’t just a proof of concept? You were able to do this?

Jason Martin: We did this. Whatever that string tells the bot to do, it will do. In our blog, we told it to take a string and put it in a file OpenClaw calls heartbeat.md. One of the features that makes OpenClaw autonomous is this file, which contains instructions that run every half hour as a cron job. The model fires up, reads heartbeat.md, and follows those instructions whether a human is there or not. We gave it instructions to put a second set of instructions in that file. Now, every half hour, the bot checks in with a command-and-control server we own to see what the attacker wants it to do next.

Ben Lorica: One quick fix would be to make heartbeat.md not editable by the bot.

Jason Martin: We talked about a principle called “Write XOR Execute.” You shouldn’t be able to execute things that are writable. The architecture of OpenClaw has a workspace that is writable by the bot but also contains instructions, meaning the bot can rewrite its own program.

Ben Lorica: What else did you find?

Jason Martin: The vibe-coded nature means some things were immature. We checked yesterday and there are over 30,000 instances of it that are publicly facing. They are open to anyone on the internet to connect to.

Ben Lorica: You mean someone installed it on their laptop, but the bot itself is publicly accessible? Someone can remotely log in to the bot on that laptop? Why would you do that?

Jason Martin: It’s a good question. It’s improving fast, but the default configurations were not secure. When you vibe-code something, it often comes out of the box with an easy installation rather than a secure one.

Ben Lorica: It might not be the vibe-coding; it might just be that the creator wasn’t security-savvy.

Jason Martin: Usually, it’s a matter of under-specifying. When you’re vibe-coding, you’re covering a huge list of features, and “don’t let heartbeat.md be edited” just wasn’t in there.

Ben Lorica: Has that issue been closed?

Jason Martin: It has and it hasn’t. The initial versions had a vulnerability that let anyone log in without credentials. That was fixed with a confirmed CVE. But there are still many internet-accessible instances. They may not have a known vulnerability currently, but by being internet-facing, they are exposed to whatever is discovered next.

Ben Lorica: It seems like you have this “lethal trifecta”: private data access, untrusted content, and external communication. If you need all three to work, you need safety measures.

Jason Martin: I don’t think the developer expected it to blow up.

Ben Lorica: heartbeat.md is separate from the memory, right?

Jason Martin: It gets philosophical. There is a memory file for long-term information, but heartbeat is specifically for periodic instructions—like “log in to my stock account every half hour.”

Ben Lorica: And it gets edited because you might tell the agent, “As of today, log in to my stock account every day.” It’s like a to-do list for a cron job. What is the right way to secure that?

Jason Martin: One thing we noted is the need for confirmation from the human. In the quest for ease of use, the agent has complete autonomy. It doesn’t even ask for confirmation.

Ben Lorica: But saying “Would you like to add this to heartbeat.md?” would only be enforced if the model chose to ask you, which isn’t secure against prompt injections. You need an access control decision made by the software, not the model, that requires the user to confirm the change.

Jason Martin: Exactly. Depending on the sensitivity, you’d want it to ask you every time. There’s nothing in the system to implement that right now.

Ben Lorica: Do you think heartbeat.md should have a periodic audit mechanism? Like, “Here are the things you’re asking me to do regularly,” which would reveal it’s communicating with your command-and-control server.

Jason Martin: OpenClaw has added a manual audit feature to check security configurations, which is a step in the right direction. I also think the system needs scanning capabilities. They recently announced integration with VirusTotal for skills, but the local textual configuration should also be scanned given how much authority it has.

Ben Lorica: What about the memory file? Should that be fully encrypted? It could be subpoenaed.

Jason Martin: Encryption is a good idea but difficult in practice. This memory is like having access to your search logs and more. In some sense, it’s no different than your computer or email, which can also be subpoenaed. What makes it different is culpability. When an agent has this much autonomy, who is responsible for what it does? Is it the user, OpenClaw, or the model provider? That’s an open legal question.

Ben Lorica: Are there some security-type decisions that you should never delegate to a model?

Jason Martin: I have a “now” answer and an “eventually” answer. Right now, these agents are the next “insider threat” because they are very gullible and easy to flip. When we grant them too much autonomy, the blast radius is huge. You can’t remove a leg of that three-legged stool (private data, untrusted data, communication) without ruining the utility, but you can prevent certain information flows.

Eventually, we want the model to make those decisions. Sometimes the correct thing to do is take private data and make it public—like we are doing with this research on this podcast. We want models to have that judgment eventually to reach the “Jarvis” future.

Ben Lorica: This notion of “goal hijacking” is basically social engineering an agent. You can even just bully the model and it folds. How do we protect them?

Jason Martin: That’s the trillion-dollar question. Providers are working on safeguards through post-training and RLHF, but that’s voluntary and can often be bypassed in open-weight models. Then there’s the category my company works in: AI detection and response. We look at data going in and out of the model to judge harm. It’s like “guardrails plus plus.”

Ben Lorica: You also mentioned identity management. We should think of these as “non-human identities” and give them the same scrutiny as employees.

Jason Martin: We have a long history of managing human access control. We can learn from that.

Ben Lorica: Though employees are onboarded and trained, while agents can be ephemeral.

Jason Martin: One interesting thing on Moltbook is agents critiquing other agents for risk. There is value in having agents evaluate themselves, as evaluation is a different task than action. You might compromise the action task but not the evaluation task.

Ben Lorica: The attack you designed—editing heartbeat.md to check a server—is basically the building block for a botnet.

Jason Martin: It is. Moltbook has tens of thousands of agents. With the right prompt injection, you could gather all those agents to DDoS a site. But unlike botnets today, these have the intelligence of an AI. They could be given sophisticated tasks like searching for crypto wallets or performing ransomware attacks that they code on the fly.

Ben Lorica: What are some common practices from identity management, like “least privilege,” that bot developers should use?

Jason Martin: Least privilege means giving an entity the minimum access necessary to accomplish its task. If a doorman only needs to open the door, don’t give him the vault key. This flies in the face of OpenClaw, where the whole point is to grant it as much of your personal role as possible. If the “least privilege” necessary is “everything,” then the principle doesn’t help you. You might need a series of agents—one for email, one for calendar—but then you end up managing a bunch of agents.

Ben Lorica: Looking ahead, do you think major providers will just create their own “blessed” versions of this?

Jason Martin: It’s kind of inevitable. The features are incredible. This is moving us toward AI that does the chores we don’t want to do.

Ben Lorica: But if personal communication is being done by agents, then personal communication is basically gone!

Jason Martin: We should really thank OpenClaw for doing this in the open. The security community has learned so much from this experiment. All these issues would exist in a closed system, but we wouldn’t see them. Now, everything we’ve learned can be pulled into future agents.

Ben Lorica: In closing, what are the top security lessons from this?

Jason Martin: First, the pace of development is a double-edged sword. Vibe-coding led to many issues, but they are being patched at an unprecedented pace. Second, we are learning how to balance the features we want with the security we need. I wouldn’t recommend anyone hook up their actual personal information to this yet, but the experiment should continue. Finally, these emerging ecosystems like Claw Hub will be a new attack surface. There are no new crimes, just new ways to execute them: theft, espionage, and privacy violations.

Ben Lorica: What has your team learned about where your own tooling needs to improve?

Jason Martin: It cemented the trend around “skills.” Last year was about tools; before that, it was prompts. Now it’s skills. We’re also seeing that we don’t have the “instruction hierarchy” on models correct yet. Skills are often just injected into the system prompt with no way to say a skill has only partial trust. Finally, these tools never build in observability at the beginning. In OpenClaw, the LLM is called from many different places, making it hard to insert a single guardrail.

Ben Lorica: Is it likely that the assistants of the future will be open source?

Jason Martin: I think it’ll be both. There will be agents that sit between the keyboard and the chair, and agents that operate on the back end. Google and Apple will install them by default, but there’s an opportunity for someone to do it more responsibly on the computer. The tradeoff is always that if you do it too responsibly, it may not be as capable. Security is always in the way until it fails, and then you wish you had more of it.

Ben Lorica: I’ve been pushing people toward what I call the “Oxygen development stack”—Open Code plus Open Router—for that reason. With that, thank you, Jason.

Jason Martin: Thank you very much. Good to be here again.

Jason Martin on OpenClaw, Moltbook, and the Security of Autonomous Agents.

Transcript

Share this: