The Developer’s Guide to LLM Security

Steve Wilson on Prompt Injection, Supply Chains, and Excessive Agency.


Subscribe: AppleSpotify OvercastPocket CastsAntennaPodPodcast AddictAmazon •  RSS.

Steve Wilson, Chief AI and Product Officer at Exabeam and lead of the OWASP GenAI Security Project, discusses the practical realities of securing Large Language Models and agentic workflows. The conversation covers critical vulnerabilities from the OWASP Top 10, the emerging supply chain risks associated with “vibe coding,” and the unique challenges of securing autonomous agents using protocols like MCP. Wilson also outlines how to build an AI-specific incident response strategy and how practitioners can leverage AI tools to enhance their security operations centers.  [This episode originally aired on Generative AI in the Real World, a podcast series I’m hosting for O’Reilly.]

Subscribe to the Gradient Flow Newsletter

Interview highlights – key sections from the video version:

Jump to transcript



Related content:


Support our work by subscribing to our newsletter📩


Transcript

Below is a polished and edited transcript.

Ben Lorica: Today we have Steve Wilson, Chief Product Officer at Exabeam and an O’Reilly author. His book is called The Developer’s Playbook for Large Language Model Security. He also works with OWASP. For our listeners who are not familiar with OWASP, it is an industry coalition centered on Open Worldwide Application Security. They do a lot of good work, and most recently, they released a report on Large Language Model Applications and Agents, which we will try to cover in this podcast. Steve, welcome to the podcast.

Steve Wilson: Thanks a lot for having me, Ben.

Ben Lorica: Let’s start by setting the stage for our listeners. Now that AI tools are much more accessible to everyone—unfortunately, including bad actors—what makes LLM and Agentic AI security fundamentally different from traditional software security? Now that we’ve democratized these tools, how do they raise the stakes?

Steve Wilson: There are two parts to it. First, when you start to build software using AI technologies, you have a new set of security issues to worry about. That is really the big topic of the book. A lot of these vulnerabilities sound similar to traditional software vulnerabilities—we use words like “injection”—but something like Prompt Injection in AI is different. When your software is getting to near human-level intelligence, it’s subject to the same issues that humans have: it can be tricked and deceived. We see cybersecurity attacks on software that look more like phishing than traditional hacking. The other interesting part is what the bad guys are doing now that they have access to frontier-class AI, and how they are using it in novel ways. It is getting crazy.

Ben Lorica: In your work with OWASP, you have this Top 10 for LLMs, which has become a great resource for teams. If you take that list, what would you say are the top one or two risks that are causing the most immediate problems, or perhaps the problems that teams underestimate?

Steve Wilson: I’ll give you the top three off the list. The first one is the one I mentioned briefly, which is Prompt Injection. This is the idea that by feeding data to the LLM as part of the instructions, you can trick the LLM into doing something that the developers didn’t intend.

The next one is the supply chain overall. The AI supply chain is much more complicated than the traditional software supply chain. Not only are you dealing with open source libraries from GitHub—which we’ve been learning how to secure—you are dealing with gigabytes of model weights and terabytes of training data. You may not know where those come from or what the provenance is. With new sites like Hugging Face where these things are hosted, we see lots of cases of malicious models uploaded in the hopes that people will download and use them.

The last one that people are focused on is Sensitive Information Disclosure. These bots are not good at using judgment about what they should and should not talk about. When you put them into production and give them access to important information—which can help with productivity or define new product features—you run the risk that they’re going to disclose that information to the wrong people. With all three of these examples, you need to rethink your software architecture and testing techniques to ensure you don’t run into these problems.

Ben Lorica: Let’s take a couple of these. First, supply chain security. Anyone who’s worked in Python knows that whenever you install something—like pip install langchain—it installs 50 other libraries that you have no idea about, ranging from NumPy to libraries for PDF extraction. There are a ton of dependencies. This is occurring at a time where everything is democratized. People who may not have been using Python before, or who may not have been hardcore coders, can now do a little bit on their own. Supply chain security is a problem. What can people do about it? What are some concrete steps?

Steve Wilson: I think there are two flavors. One is: I’m building software that includes the use of a Large Language Model. For example, I want to get Llama from Meta as a component. That could be gigabytes worth of seemingly random floating-point numbers. How do I decide that I have a good, legitimate copy? You have to apply skepticism: What’s the source? Am I getting it direct from the source? Do I know the provenance of the AI models I’m bringing into the system so that I don’t get something with problems?

The other flavor is a very hot topic right now. Traditionally we’ve called it “AI-aided software engineering,” but somebody named it “Vibe Coding” a few months ago. Andrej Karpathy, formerly of OpenAI, discussed this. It is absolutely true that people who’ve never programmed before, or people who hadn’t programmed in 20 years, are suddenly coming back into these environments.

Some of the problems we describe in the book relate to the fact that Large Language Models have a propensity to hallucinate. They get facts wrong and guess. We see this with generated code. We’ve seen examples where they will make up the existence of a software package. In the model’s “head,” it is plausible there should be a package that does a specific task, and it guesses a name. The model will write code that imports that package. We’ve seen hackers looking for commonly hallucinated packages, creating them, making malicious versions, and putting them on GitHub.

The good news is, it’s not a completely novel problem, but it’s one that a lot of people are not used to thinking about. As an industry, while we acknowledge that our ability to generate code has gone up by 10x or 100x in the last 12 months, our ability to quality check and security check this code has not kept pace. For people starting today, you need basic awareness of application security concepts and supply chain management.

Industry-wide, the tools vendors are starting to catch on. We need a different generation of software composition analysis tools designed to work with Vibe Coding and integrate into environments like Cursor, rather than just at the tail end of a big CI/CD pipeline where things only get tested at the end.

Ben Lorica: This is especially true for novices. People who are not familiar with the Python ecosystem might think, “I need something to do X.” In the Python ecosystem, there might be 20 options for doing that. As you alluded to, now that there are 20 options, why not add a 21st if you’re a malicious actor?

Steve Wilson: Exactly. We have good basic guidelines for humans: look at the packages you’re importing. Does it have a lot of stars on GitHub? Does it have a lot of users and downloads? Those are indications that you’re getting something respected. But professional developers use tooling. They get things like Snyk and Semgrep to run against their code to augment that. We need to bring those automated tools into this Vibe Coding process and make them just as easy to use as generating the code.

Ben Lorica: What’s your sense of the maturity of what people are terming “guardrails”? There are two main kinds: input guardrails—where an employee submits proprietary information on the prompt—and output guardrails on the LLM side. What’s your sense of the maturity of the guardrail tooling?

Steve Wilson: The good news is the ecosystem around guardrails started soon after ChatGPT came out. It became apparent, especially with the publication of the OWASP Top 10, that Prompt Injection and Information Disclosure were top risks. That led to the idea that you need to map the trust boundaries around your LLM—data coming in and data coming out—and police that, because the LLMs themselves are not good judges of this.

So, there is a really expansive ecosystem.

Ben Lorica: There are open source projects, right?

Steve Wilson: There are both commercial projects—some from multi-billion dollar cybersecurity vendors—and open source projects. Then there are APIs direct from the frontier model providers. You can go into the OpenAI API set and find basic guardrails. One of the problems is simply having so many choices and figuring out what to do.

regarding maturity, we’re still figuring out the science on how to have good guardrails for input. Everybody wants to wave a magic wand and make prompt injection disappear, but it’s too hard. Counter-intuitively, the smarter the models get, the more problems they have with prompt injection. When we added the ability for models to see images, people started sending prompt injections through images. I’ve seen people send them in emojis or foreign languages. The idea that you could just put a Regex that says “Please look for these bad terms” is way behind us.

There is good science and research going on, but it’s still a challenge. I tell people: put in guardrails on the input, but assume they’re going to fail sometimes. This means you need another set of guardrails on the output. You need to look for certain output types or data that you don’t want to disclose. That gives you another pass at keeping things locked down.

The last point is: don’t give access to certain types of data to your models if you’re not sure it’s safe. There are cases where a product manager comes up with a great use case, but you have to ask, “Am I really able to do that right now? I could demo it because it’s easy to build, but can I put it into production and know it’s safe?”

Ben Lorica: Speaking of access, a lot of times when we talk about AI these days, we’re talking about Foundation Models. But as you are aware, a lot of people are not building Foundation Models; they’re building applications on top of them. They’re doing post-training.

One set of tools people are excited about is the ability of these models to connect to different tools and data sources. I guess MCP—Model Context Protocol—is great in many ways. But this is another vector. How do I know that this MCP server is sufficiently hardened? There’s not a lot of vetting right now.

Steve Wilson: It’s interesting. One of the top 10 vulnerabilities on the first version of the list in 2023 was “Insecure Plugins.” At the time, OpenAI had introduced a proprietary plugin standard that was full of security issues. It died off, and in the 2025 version of the list, we took it out. As soon as we did that, MCP showed up.

Ben Lorica: There are thousands of these servers now.

Steve Wilson: Because the good news is that the standard makes them easy to build. When you look at what people are doing with them, it’s great. But fundamentally, my favorite vulnerability on the OWASP Top 10 list is “Excessive Agency.” Everybody’s building Agents, and that vulnerability has been there since 2023.

It’s the idea of how much responsibility I am giving to the LLM. LLMs are brains. We gave them mouths so they could talk. But when you give them fingers, there’s a whole different level of capability. There’s a chapter in my book where I talk about science fiction examples. In 2001: A Space Odyssey, why could HAL turn off the life support system? Because the product manager said that was a good idea. He never should have been able to do that.

There is a first question: as I build these tools and give them to my AI applications, is that a good idea? Do I know how to lock that down so it’s only used in a safe manner? And does the protocol itself support secure usage?

Two protocols have come out recently. One is MCP. The other, which has gotten incredible press recently, is Google’s A-to-A (how Agents talk to each other). The security community is getting their teeth into that and finding issues. I love that the community digs in and finds these issues, which lets the developers of those protocols rev them quickly.

As someone considering building these things into software, I would want to make sure I understand how these protocols work, how they are attached to tools, and what the potential failures are. There are a lot of risks. But given the capabilities, there will be demand. You want to be experimenting with this actively, but understand it from both ends: what’s possible, and what are the risks.

Ben Lorica: At the risk of dating you, are there lessons from web security—like HTTP and HTTPS—that can map over to this MCP world? Right now, a lot of it is based on trust. You stand up an MCP server, you’re Steve Wilson, I trust that’s a good server. But honestly, people are moving so fast that security is almost an afterthought.

Steve Wilson: It’s pretty easy to date me because I have been around the block. Think back to the early days of the web. The internet was built without considerations for security. IP never really thought about security; it was built for open access and information sharing.

Ben Lorica: That’s where we’re at right now for MCP.

Steve Wilson: Right. That’s the way these things always start. Developers often don’t understand security and don’t want to. Security has always been a bolt-on, even as we’ve entered the AI era.

The excitement I see going forward involves the ability to learn from this. At some level, people who develop software don’t think security is fun. That’s why it gets ignored. In this world where AIs help us develop code, we’re figuring out Reinforcement Learning for coding agents. That is why coding agents are getting dramatically better at building code—rapid acceleration in the last 12 months because we figured out reinforcement learning and reasoning models like o1, o3, and o4.

Ben Lorica: Now we’re starting to see companies build tools that will make reinforcement fine-tuning accessible.

Steve Wilson: The opportunity is to build security agents. Just as we build code generation agents that think it’s fun to generate code, we can build security agents that think it’s fun to do security. They are bred that way, and that’s their whole job. We can put those into the development process and start to really “shift left” in a way we never could with the last generation of tools that didn’t fit well into the process.

If we can make the AIs do it—and I’m seeing quality early work in that direction—that’s a really exciting opportunity. The lesson we can learn is: let’s build it into our stacks to do security and not just depend on everybody learning a better way to do it.

Ben Lorica: I think we’ll have to. I’ll drill down further on AI for security later. But before I go into Agents, a quick question. Is hallucination an annoyance, or should people treat it as a security threat?

Steve Wilson: Hallucination is a big threat and a massive gift. We often debate if AIs will ever create original works or just parrot things back. I’m in the camp that they’re already generating original things. They are not fully predictable systems, so they go off and do things you didn’t ask for. That’s wonderful when they generate art or elegant code.

Ben Lorica: Just to let our audience know, hallucination can occur even if you have RAG (Retrieval Augmented Generation). Some people think hallucination is only with the model itself.

Steve Wilson: Let’s talk about the nature of hallucination. People used to traditional software are puzzled by it. Software usually either does the job well or it doesn’t. But these things are more like humans. They do what we train them to do.

Think back to being in school. If a teacher asks a question and you don’t know the answer, you’ve been trained to try and give the best answer. You try your best, and you might get it wrong. The same thing happens with LLMs. If we ask them something they’re not well trained on, they might get the wrong answer because we’ve trained them to provide answers.

Ben Lorica: I refer to them as PPs: Probabilistic Pleasers.

Steve Wilson: When we give LLMs “closed-book tests,” the guessing problem is exacerbated. RAG—Retrieval Augmented Generation, the idea of giving relevant data to the LLM before it answers—dramatically increases the probability of a good answer, but it does not fully solve the problem.

Understanding that these are not fully predictable systems, that they are not binary right or wrong, and that you might get different answers on different days is important. You need to build systems defensively. You want to arm your system with as much of the right information as you can. When you do RAG really well, you can get high-percentage results. But it’s an art. A lot of people are doing simple RAG where they throw PDFs into a vector database. It’s better than nothing, but that’s not the handcrafted way to ensure the best answers.

Ben Lorica: Let’s talk about Agents, because the OWASP report focuses specifically on them. We’re talking about the introduction of agency: planning, memory, tool use, and autonomous operations. Let’s set aside Multi-Agents because I think they’re a bit of science fiction right now. I’m chairing an Agent conference and I’m bullish on them, but recent papers suggest we should focus on Single Agents for now. Aside from LLMs, what should people be the most concerned about regarding Agents and security?

Steve Wilson: There are a few qualities that make something Agentic. It’s not just implementing an “Agent Interface.” It is about Agentic qualities: being active, proactive, and capable of carrying out actions and using tools.

When you assume tool usage, it brings in a new area of worry. There is “Excessive Agency”: if I give it power tools, does it know how to safely use a chainsaw? Should I back up and give it a butter knife? Then there is the question of whether the tools are attached to the Agents in a safe way, or if there are ways to get in the middle of that flow.

With better reasoning capabilities, models are now able to do multi-step processes. We used to talk about one-shot or two-shot tasks because LLMs would get distracted. Now you can have Agents carrying out long-term tasks, looping back, and correcting themselves.

This impacts vulnerabilities like Poisoning. We used to talk about training data poisoning. In an LLM world, that was usually a problem for the Foundation Model builder. But now you have Memory Poisoning. If I can inject something into the system, not only can I prompt inject it, but that injection could be persistent while the Agent works on a long-running task. That’s a different consideration.

Ben Lorica: One of the things that is glaring in this space is that while most companies have incident response playbooks for traditional software, I don’t think many teams have AI incident playbooks. Teams haven’t sat down and decided what constitutes an AI incident.

Steve Wilson: One piece of research we put out from the OWASP team was a guide for response. We were asked a lot about how to respond to a Deep Fake incident—basically the offensive use of AI. There’s a nice guide on genai.owasp.org.

We also released a document on building an AI Center of Excellence (COE) inside your company, particularly for AI Security. By building that expertise, your security teams and executives start to understand the risks. By having a COE, you can ensure you are building response plans and playbooks. You need to augment the playbooks you have in your SOC (Security Operations Center) to deal with new threats.

Ben Lorica: The reality is that things have been democratized. Teams who are not coders can build interesting prototypes and then aggressively roll them out. Many of these prototypes are not robust enough and fail basic evals. People don’t think through what happens when things go wrong. Step one is defining an incident, and step two is the containment strategy.

Steve Wilson: Sometimes it helps to look at the past. It’s easy to get caught up in “Vibe Coding” and think this is the first time this has happened. It’s not. Think back to tools like Visual Basic. There was a time when that presented a new class of “Citizen Developers.” You didn’t need a CS degree; you just dragged and dropped. We wound up with hundreds of crazy applications built around the enterprise without oversight.

Then we put Visual Basic into Microsoft Office, and every spreadsheet became a programming environment with an attack surface. That’s why you can’t open things with macros anymore.

Ben Lorica: People could record macros so they weren’t even coding.

Steve Wilson: That was the 1990s version of Vibe Coding. We survived it, but it was a bumpy path. We know these new tools are attractive and are enabling a new generation of Citizen Developers.

The difference is that previous RAD (Rapid Application Development) tools tended to live in little boxes with specific runtimes. Now, these tools work in every programming environment and language. They can look like any professional project. It’s much harder to spot the “weird” application. People are building businesses on these or deploying them in the enterprise.

I hate when the security community gets on their high horse and tries to gatekeep or make fun of people using these tools. I see memes about “silly Vibe Coders.” Instead, we have to acknowledge that this is a hundred-fold increase in our ability to create software. As a security community, we need to help them. If we can do that, we’re in for a golden age of software development where anybody can build custom software. But we’ve got work to do.

Ben Lorica: In closing, Steve, I have to come clean. I might be guilty of the reverse—an AI person making fun of security people. Every year I walk around the Expo Hall at RSA and get confused. Everyone uses the same buzzwords like “AI,” but you go to the booth and it’s not really AI. Give us a quick overview of the state of AI being used forsecurity. What is real and usable right now?

Steve Wilson: The first thing to look for is places where people were using AI before ChatGPT. Look at User and Entity Behavior Analytics (UEBA). It’s been around for about 10 years. In a Security Operations Center, you collect billions of log lines. A security team wants to find threats like insider threats or compromised credentials.

Ben Lorica: Usually via an analyst tool where the analyst is still in the loop.

Steve Wilson: Traditionally, the analyst searches through millions of lines using brittle correlation rules. With UEBA, you build machine learning models that create complex distributions of behavior: What does Ben do on a given day? Where does he log in from? If you are suddenly on a different computer in a different country accessing different applications, I want to surface that to a human operator. That technology is becoming robust and mature.

Ben Lorica: But you can have a language interface now.

Steve Wilson: Right. Now I don’t have to search in query language; I can search in English.

Ben Lorica: That’s Splunk. People learned Splunk query language and it was a core tenet of what they did.

Steve Wilson: Now you can just say, “Find me the top 10 IP addresses sending traffic to North Korea from my network,” and you get it.

The next step is mashing this up with Large Language Models. Security Copilots and Agents are emerging. How do you take the output from UEBA and augment the operator in the SOC who has to make a snap decision? It’s a great use case for an Agent to look at that data, do it faster and more completely than a human, and help make a well-founded decision to reduce Mean Time to Response.

However, when walking around RSA, be aware that it’s never been a better time to build a great demo of an AI feature. People are demoing outrageous things that may not have efficacy. I tell everyone in cybersecurity: be deeply skeptical. AI capabilities are real and indispensable—you need them because the bad guys have them—but be skeptical when you see “AI Enabled” frosted on the side of a booth.

Ben Lorica: In closing, give us a pitch for why our listeners should listen to OWASP and what they should look forward to from OWASP in the months ahead regarding AI Security.

Steve Wilson: OWASP is a group that is more than 20 years old. It’s not specifically about AI; it’s about producing secure code and applications. It started with the OWASP Top 10, a simple educational tool for web application security.

About two years ago, we realized ChatGPT presented a new set of security issues. We put together a group to attack that problem and came out with the Top 10 for Large Language Models. I thought I’d find 10 people interested, but 200 people volunteered in the first 48 hours. We’ve produced several versions of these documents that have been downloaded hundreds of thousands of times.

Most recently, we’ve branched out from the Top 10 list to producing guidance on building Agents, Red Teaming, and other topics. We just rechristened the project as the OWASP GenAI Security Project. We’re going to have a big event at the RSA conference soon. OWASP is totally nonprofit and volunteer-driven. It’s a great way to get involved.

Ben Lorica: Given that the space is moving so fast, you will need to produce material on a regular cadence.

Steve Wilson: That is a strength of OWASP. Traditional cybersecurity bodies like MITRE and NIST move slowly, on yearly timeframes. We decided to work in timeframes of weeks. Because we have many people involved and lots of vetting, we produce high-quality guidance quickly. We are on our third revision of the Top 10 list, and many other documents are on their second or third versions. It evolves all the time.

Ben Lorica: With that, thank you, Steve.

Steve Wilson: Thanks a lot, Ben. I appreciate you having me on.