Site icon The Data Exchange

Your First AI Employee Is Already Clocking In

Kay Zhu on AI Employees, Open Claw, and the Future of Office Agents.

Subscribe: AppleSpotify OvercastPocket CastsYouTube •  AntennaPodPodcast AddictAmazon •  RSS.

In this episode, host Ben Lorica sits down with Kay Zhu, Co-founder & CTO of Genspark AI, to explore the rapid evolution of autonomous AI agents and the concept of hiring your first “AI employee.” They dive into the technical and security challenges of running open-source AI agents, explaining how cloud-hosted virtual machines can keep personal data safe while executing complex tasks like managing spreadsheets or making actual phone calls. The conversation also highlights the differing ways technical and non-technical users interact with AI, and looks ahead to a future where multi-agent AI teams collaborate in the modern enterprise.

Subscribe to the Gradient Flow Newsletter

Interview highlights – key sections from the video version:

Jump to transcript


Related content:


Support our work by subscribing to our newsletter📩


Transcript

Below is a polished and edited transcript.

Ben Lorica: Today we have Kay Zhu, co-founder and CTO at GenSpark AI, which you can find at genspark.ai. The tagline is “your all-in-one AI workspace.” More importantly, today we’re going to talk about GenSpark Claw, whose tagline is “your first AI employee.” For our listeners, Kay is a veteran of search and machine learning, having worked many years at Google and Baidu. With that introduction, Kay, welcome to the podcast.

Kay Zhu: Hi, Ben. Thank you for having me. I’m Kay, co-founder and CTO of GenSpark.

Ben Lorica: Let’s start with the basics: GenSpark Claw. Obviously, there’s Open Claw, and as of this week, there’s Nemo Claw from Nvidia. I broadly interpret whenever someone uses “Claw” that it’s some sort of personal autonomous agent. First, describe what GenSpark Claw is.

Kay Zhu: GenSpark Claw is our version of Open Claw. We embrace the open-source community, but we did something a little different to help our GenSpark users. First, we give every Claw a virtual machine living in the cloud. You don’t need to install the Claw on your own machine, where it might accidentally destroy your files. We try to avoid that. Additionally, for the virtual machine in the cloud, we can implement various safety guards to make sure nobody hacks into your box.

Second, GenSpark has a bunch of functionalities and sub-agents, like the AI spreadsheet, AI slides, AI documents, the AI developer, and others. In GenSpark Claw, we open up all of GenSpark’s functionality into a GenSpark CLI that the Claw can use. Basically, the Claw is able to use all the GenSpark functionalities one by one and compose them together.

Third, for the user’s safety, we give every user an API key. I’m not sure whether you have installed Open Claw yourself?

Ben Lorica: No, too scared.

Kay Zhu: I’m also scared, even as a software engineer. If I set up Open Claw myself, I need to grab almost 20 API keys from OpenAI, Anthropic, Brave, Gemini, and others, and put them into an openclaw.json file. This is my treasure. If you leak this JSON file, you are basically screwed; everybody else can use all your tokens and credits. However, if you are a GenSpark Claw user, we give you a GenSpark API key. This key can be used across different language models, API services, and GenSpark sub-agents. If we detect your key is leaked, we automatically swap it for you. It’s an easy-to-use, all-in-one setup that lets you access all GenSpark capabilities much more safely.

Finally, if you install Open Claw yourself and ask it to make a configuration change, it often stops responding because it made a mistake and the gateway died. A lot of my non-technical friends often tell me, “Hey Kay, my Claw just died. Help me.” In GenSpark Claw, we have a babysitter agent. You can just click it, and it will help you fix anything. If your Claw stops responding or gets into a dead loop, the babysitter agent diagnoses and fixes the problem on the spot. It is not a rollback; we try to move everything forward. If you ask your Claw to upgrade to the newest version and it dies, simply rolling it back means it will just fail again the next time it tries to upgrade, creating an infinite loop. Instead, we help you debug exactly what happened and fix it. To summarize, GenSpark Claw offers three main things: security (one user, one virtual machine), embedded GenSpark capabilities, and ease of use (with our babysitter agent).

Ben Lorica: I have many questions, so we’ll do a rapid-fire round. First, regarding these GenSpark capabilities—you mentioned spreadsheets, documents, and things like that. How do these relate to the tools people normally use, like Microsoft Word, Google Docs, or Google Sheets? Are these completely separate tools?

Kay Zhu: Yes, they are separate from Google Docs and Microsoft Office. However, we help you get good output in a compatible format. If you build a presentation on GenSpark and are happy with the final result, you can export it to Google Workspace or as a Microsoft PPTX file. Everything is compatible, and we think we do a good job of delivering the final output to the user.

Ben Lorica: So the advantage there is that I don’t have to give you my Google credentials, right? I’m in control of them. What about other tools people normally use, like calendar and email? That’s one of the main use cases for Open Claw.

Kay Zhu: For GenSpark’s original application, you can already link your Google Calendar or Microsoft Outlook account. This seamlessly works in GenSpark Claw as well, provided you give it permission to access those accounts.

Ben Lorica: Is the permission similar to how people give Calendly access to look at their Google Calendar?

Kay Zhu: Very similar. It’s exactly the same mechanism.

Ben Lorica: Secondly, regarding the open-source aspect—what is the license for Open Claw?

Kay Zhu: I believe it’s MIT, the last time I checked.

Ben Lorica: So you’re basically built on top of Open Claw.

Kay Zhu: Yes.

Ben Lorica: I don’t know if you followed this, but Nvidia announced Nemo Claw this week—we’re recording during GTC. Jensen Huang had an interesting analogy: he said Open Claw is like Linux, and Nvidia is essentially acting like Red Hat, providing a more enterprise-ready version. How would you describe your relationship with Open Claw’s codebase?

Kay Zhu: That’s a very good question. I also attended GTC this week and listened to Jensen’s presentation, and we gave a presentation about GenSpark there as well. Currently, everyone is trying to embrace the open-source community, and that’s what we chose to do. At the very beginning, before Open Claw, our team had a lot of technical discussions about how to build this. Some of our best engineers thought they could build something similar in a weekend. Ultimately, we decided to embrace the open-source community because we saw so many contributions going into the Open Claw codebase. As Jensen mentioned—and we’ve also met with Nvidia—Nvidia is dedicating people to work with Peter and contribute code to Open Claw. That’s why we use it. As for Nemo Claw, I don’t have much to say yet since I just heard about it this week. We are looking into it to see what we might adopt later. It’s definitely like Linux, and we are embracing Linux. Maybe we’ll be like Red Hat after some more work.

Ben Lorica: Based on what you just said, my interpretation is that you’re essentially running Open Claw in the cloud with an instance for every user. Since the codebase is Open Claw, you’re moving as fast as Open Claw, not faster.

Kay Zhu: You are right, we are moving as fast as Open Claw. However, we have our own configurations and initial skills, which are different. We also provide GenSpark-specific capability integrations. Out of the box, GenSpark Claw is much easier to use and accesses broader capabilities. You can still build your Claw step-by-step and search for skills on the web—which carries some risk but is fun to do yourself—but we give you a packaged, ready-to-use GenSpark Claw.

Ben Lorica: One of the key features of Open Claw is that the interface is basically messaging. You interact with your agent using WhatsApp or another standard messaging platform. Do you support this?

Kay Zhu: Yes, we support everything. Yesterday at GTC, I gave an on-stage live demo of GenSpark Claw. I used WhatsApp to ask my Claw to publish a LinkedIn post for me saying, “Hello world, I’m Kay Claw,” in 10 different languages. You can check my LinkedIn; it’s still there. Then, I asked it to order five lattes for my office colleagues, and they actually arrived before my session ended. We support all of these features and have made them easy to use.

Ben Lorica: Can you describe at a high level how you handle security? Open Claw is powerful because it can do many things, like posting to LinkedIn or composing an email. What additional steps did you take to make it secure? I wouldn’t want someone to suddenly be able to post on LinkedIn in my name. What happens to all these credentials and third-party authentication events?

Kay Zhu: We noticed that many recent Claw users do not have a technical background. That’s why we made the clear choice to host your Claw in a separate VM in the cloud. This way, if you explicitly log into your LinkedIn account within your virtual machine, you know you’ve granted GenSpark Claw permission to act as you on LinkedIn—but not on X, Instagram, or Facebook.

Ben Lorica: And obviously you guarantee some level of security. If a non-technical user sets this up at home, they might not know what they’re doing, but your team has done security auditing.

Kay Zhu: Definitely. We ensure that nobody else can hack into your virtual machine in the cloud to steal your credentials and impersonate you.

Ben Lorica: Is there some sort of two-factor authentication involved?

Kay Zhu: Currently, no. We don’t change the standard authentication process for individual applications. LinkedIn has its own methods, Uber Eats has its own, and we respect those native processes.

Ben Lorica: Right, a lot of those tools authenticate using Google, Apple, or another third-party tool.

Kay Zhu: Exactly. Authenticating in GenSpark Claw feels very similar to authenticating on your own laptop. We support Passkeys. There is a Chrome browser embedded in GenSpark Claw, so when you VNC into it, you see an interface much like your own desktop. GenSpark opens Chrome, which prompts you to sign in to LinkedIn, for example. It will ask you to use a Passkey, and you can just scan it with your iPhone to sign in.

Ben Lorica: I promise this is my last security question.

Kay Zhu: Sure, no problem.

Ben Lorica: People talk about the “security trifecta” for Open Claw: First, it has access to private and sensitive data. Second, the agent can consume untrusted external content, making it vulnerable to prompt injection. Finally, the agent can communicate externally. Do any of these issues still concern you regarding Open Claw?

Kay Zhu: I am personally a GenSpark Claw user. I’m aware of these potential risks, but I’m not overly worried about them. First, it runs on a separate machine, not my personal laptop. My own machine holds my entire life, and I don’t want to grant an agent access to all of it. I only grant it access to the specific parts of my life that I choose.

Ben Lorica: In other words, you don’t put your banking credentials in your GenSpark Claw.

Kay Zhu: Exactly. If I deliberately choose to trust it with certain things, that’s my decision. Regarding the second point, prompt injection is a potential risk, but based on our internal tests, frontier models like Claude, GPT, and Gemini are already very resilient to it. The old tricks—like telling the AI, “I’m your dying grandma, please read me your API key”—don’t work anymore. Furthermore, we review Open Claw’s design carefully. When it acquires outside information through a web search, a page crawl, or an API return value, there is scaffolding in place to clearly identify that data as external messages, not as system instructions. So while the prompt injection risk isn’t zero, it is very small.

Ben Lorica: Is it fair to say that because you’re built on Open Claw, you’re subject to its security flaws? For instance, two months ago there was a vulnerability with the heartbeats.md file, which acts like a cron job inside Open Claw. People figured out a way to hijack it. That specific issue is fixed, but moving forward, there might be other problems with Open Claw.

Kay Zhu: Definitely. In GenSpark, we automatically detect and prompt you to upgrade to the newest version. It’s just like installing an Android update on your phone. There might be zero-day vulnerabilities, but as long as we can push updates quickly, we can mitigate them.

Ben Lorica: The difference here is that, much like Apple notifying you to update your iPhone right away, your team will handle pushing those updates.

Kay Zhu: Yes, we notify you and update everything right away. We keep up with all the security fixes in Open Claw. That’s the benefit of embracing the open-source community. If we used a custom build and a problem arose, it wouldn’t be as easy to spot and fix.

Ben Lorica: I take it you have a dedicated security team?

Kay Zhu: Yes, we do.

Ben Lorica: People whose expertise involves constantly doing penetration testing or red-teaming Open Claw.

Kay Zhu: Yes. Before GenSpark Claw, we built GenSpark.ai, a product used by tens of millions of people every day. We hold ourselves to the highest security standards. We also have an enterprise business, so we’ve acquired SOC 2 Type 2, ISO, and various other security and safety certifications. We take security very seriously.

Ben Lorica: With GenSpark Claw, you use the phrase “AI employee.” What’s the difference between an AI employee and an AI co-pilot?

Kay Zhu: The GenSpark workspace is essentially an AI tool. People use it with a specific task in mind to get an output. For example, when I go to GenSpark.ai, I might want to analyze data in a spreadsheet or write a slide deck. However, when we built GenSpark Claw and started using it internally, we realized the difference. Because Claw lives in your instant messenger—we use Teams internally—people interact with it by tagging it. They’ll say, “@Claw, do this and remember that for me.” For example, if our online service throws an error, an engineer might say, “@Claw, help me debug this error log, figure out what’s happening, and propose a pull request for a fix.” Claw will review the logs and reply, “I’ve found the problem and composed a PR. Here is the link to review and approve.” The engineer just reviews the code and merges the change. It acts very much like another software engineer colleague in your group chat.

Another interesting aspect is that when Claw has an account in your group chat, you start treating it like a person. You begin sharing your general preferences rather than just task-specific instructions. With standard tools, you might say, “Build this slide with a black background and a professional look.” But with GenSpark Claw, we might tell it, “Hey, the engineering group is a bit boring today, so be funny in your responses.”

Ben Lorica: What you’re describing is something those of us who use coding agents have come to understand—they really act as teammates. You’re basically taking that concept and applying it to regular office work.

Kay Zhu: Yes.

Ben Lorica: If I go to GenSpark Claw right now, what features are integrated out of the box? You mentioned spreadsheets and PowerPoint slides. What else? Calendar, email?

Kay Zhu: Calendar, search, deep research, fact-checking, and several others. We also have a feature called “Call for me.” It can make an actual telephone call to a real number on your behalf. It’s very useful for booking restaurants and similar tasks. You can access the “Call for me” agent directly through GenSpark Claw.

Ben Lorica: Has anyone tested the “Call for me” agent against customer support lines where you have to wait on hold for 40 minutes?

Kay Zhu: Yes, exactly. It can handle automated answering machines very well.

Ben Lorica: When you call customer support and it says, “Choose from the following menu items,” it can handle that too?

Kay Zhu: Yes, it has the ability to navigate those menus by pressing one, two, or the pound key.

Ben Lorica: Wow.

Kay Zhu: We originally launched the “Call for me” capability about six months ago, and rolled it out globally shortly after. A lot of users are utilizing it. We’ve seen some interesting use cases: one user in the US actually used it to break up with his girlfriend because he found the face-to-face conversation too difficult. The agent called her and said, “I treasure everything we’ve experienced, but please don’t call me again.” The agent handled it gracefully.

Ben Lorica: Did it use voice cloning so it sounded like him?

Kay Zhu: No. It discloses its identity by saying, “I am Kay’s agent, and I am calling on his behalf.” It does not use voice cloning.

Ben Lorica: You mentioned that GenSpark AI has an enterprise offering. Is the vision for GenSpark Claw to be for enterprise users, consumers, or both?

Kay Zhu: We are currently testing GenSpark Claw in enterprise environments very carefully with our early access partners.

Ben Lorica: If my GenSpark Claw makes a mistake, whose fault is it?

Kay Zhu: That’s a very good question. The Claw cannot go to jail, which is the problem. I think adoption will happen gradually. Just like in a traditional organization, you wouldn’t immediately give it full autonomy.

Ben Lorica: You’ll review the output first, just like you would with a direct report. If you ask it to make a slide presentation, you review it before sending it out.

Kay Zhu: Exactly. It’s like hiring an intern. Over time, as you teach the intern your preferences and they mature into a capable worker, you give them more responsibility. I foresee a similar adaptation process for AI employees.

Ben Lorica: If someone signs up for GenSpark Claw and loves it, but they also use a service like OpenRouter where they have credits and access to many models, can they hook up GenSpark Claw to their OpenRouter account? In other words, can they bring their own model?

Kay Zhu: Currently, no, though we may support it in the future. Right now, we want to provide an entirely integrated experience. Many of our users are non-technical; they don’t know what OpenRouter or an API key is. An API key is essentially your wallet, but not everyone realizes that.

Ben Lorica: What happens to my data, prompts, and interactions on GenSpark Claw?

Kay Zhu: All storage and memory are kept inside your isolated virtual machine, so it belongs entirely to you. For API and large language model calls, we maintain the same standards as GenSpark.ai. We partner closely with frontier model and inference providers, and we have zero-data-retention agreements with them. They do not store the prompts we send; they simply process the request and return the result. We hold everything to the highest security standards.

Ben Lorica: Privacy advocates want to ensure that law enforcement can’t easily subpoena and access your data without proper court approval. How do users know their data isn’t going to be freely shared with the government?

Kay Zhu: We haven’t had any incidents regarding that yet.

Ben Lorica: But you’re basically going to operate like Apple and other major tech companies in that regard?

Kay Zhu: Yes. We hold ourselves to the highest privacy standards, while also obeying the law.

Ben Lorica: Exactly, you obey the laws wherever you operate. Another topic is end-to-end encryption in messaging apps. Is that something your users ask about?

Kay Zhu: Currently, no. In my honest opinion, the messenger integration for the current Claw still needs improvement. If you use the WhatsApp integration, it feels like you’re talking to yourself. You get messages back with a “Claw” string prepended to them, which is a bit weird. Telegram is better because its technical infrastructure is built to natively integrate bots.

Ben Lorica: What about Signal? Is that supported?

Kay Zhu: Sorry?

Ben Lorica: Signal is another messaging app.

Kay Zhu: I’m not personally familiar with Signal. For Slack, the integration is much more complicated because you need to establish a dedicated Slack app. For Line, you have to complete many steps to allow your Claw to talk to you; you essentially have to register as a merchant using Line’s automated bot capabilities. Currently, many messengers are not very friendly to AI agents. Hopefully, they will develop a more standardized way for bots to embed. The core issue for messaging apps is identity: deciding whether the Claw acts as you or as a separate entity is a major product decision for them.

Ben Lorica: Because you’re built on Open Claw, you are subject to their progress, feature updates, and potential security problems. How do you move faster if you need to? Will GenSpark Claw eventually run on a fork of Open Claw?

Kay Zhu: I see your point. The out-of-the-box experience for Open Claw is split: half of it relies on the open-source harness, and the other half relies on the environment and functionality that GenSpark provides. We offer a bunch of pre-installed apps. Think of it like buying a brand-new laptop versus buying one that already has Office, Photoshop, and other useful tools installed. Furthermore, we accumulate verified experiences and build them into skills for GenSpark Claw. These are pre-tested, and we guarantee they aren’t malicious. While agents are currently great at high-level tasks like coding and terminal operations, they still fall a bit short when navigating browsers or graphical interfaces. It has improved a lot, but it’s still a bit slow and expensive. However, there are many ways to improve this. If we have another conversation in six months, I think the landscape will have completely changed.

Ben Lorica: You’re in a unique position because you see how non-technical people use agents, whereas I mostly talk to developers using coding agents. What is the big difference between how coders and non-technical people use them?

Kay Zhu: That’s a really good question. As software engineers, we’ve been amazed by tools like Cursor, Codex, and Cloud Code. We built GenSpark because we wanted to bring that same magical experience to non-technical white-collar workers.

Ben Lorica: You alluded earlier that agents still fall short when interacting with graphical user interfaces like browsers. Is that because non-technical users are demanding that capability?

Kay Zhu: Yes, they definitely want that capability. One major difference I’ve observed is that non-technical users lack a software engineering mindset. When I use a coding agent, I ask it to make small, incremental changes so we can iterate fast. I’ll propose a small change that can be tested and merged independently, and then move on to the next one. Using this method, the agent can run for hours and commit dozens of pull requests. Non-technical users, however, tend to ask for everything all at once. They aren’t in the habit of breaking complex problems down into smaller, manageable steps. That iterative approach is a mindset we acquire through software engineering experience.

Ben Lorica: It sounds like we need workshops on spec-driven development for non-technical users. If you want the agent to build a 100-slide PowerPoint presentation, you have to give it more than just two sentences of instruction.

Kay Zhu: Exactly. You have to give it feedback, state your preferences clearly, and articulate your intentions well. We have teammates at GenSpark who don’t have engineering backgrounds but possess very strong logical skills. They understand the boundaries of the current agents and adapt very quickly. I have high hopes for everyday users adopting these AI tools.

Ben Lorica: When non-technical users use GenSpark Claw, they view it as their personal representative. Is it possible for them to say, “Actually, I need a team of representatives,” and start defining different roles? Are your more advanced non-technical users starting to think in terms of multi-agent workflows?

Kay Zhu: Exactly. Currently, each user can only have one Claw, but we are working on allowing users to deploy multiple Claws, configure them differently, and have them interact with each other.

Ben Lorica: Interesting. You mentioned earlier that the agent harnesses are basically swappable, and it’s the underlying models that make the real difference. What would you like to see from the foundation model builders moving forward?

Kay Zhu: A lot of things. Infinite context, for example.

Ben Lorica: [Laughs]

Kay Zhu: We have built a lot of scaffolding around our systems to simulate that capability.

Ben Lorica: But with your background in search and retrieval, you know that simply flooding a context window with infinite information isn’t always the most efficient approach anyway, right?

Kay Zhu: What I mean is that if the model is smart enough, it should be able to attend to the relevant information within that infinite context. Infinite context is essentially an abstraction we want to provide to end users.

Ben Lorica: I see. Are you referring to the “lost in the middle” problem, where giving a model a massive context window sounds great, but it actually loses track of information and only focuses on a few parts?

Kay Zhu: Exactly. Currently, you can’t just dump your entire codebase into the context window and expect the agent to understand it completely. But someday, if model providers or labs can provide a true abstraction of infinite context, you could upload everything without worrying about the underlying mechanics. Whether it’s powered by agentic search, vector databases, or another retrieval harness, the user wouldn’t need to know. The interface would simply allow the agent to process massive amounts of information, grab what’s relevant, and respond quickly.

Ben Lorica: And for it to remember and learn from past errors and experiences. That would be great. Looking out 12 to 24 months—if you succeed and people are widely using GenSpark Claw as their enterprise assistant—do you think the interface will stay the same? Will people still be interacting with their agents via WhatsApp?

Kay Zhu: Two years is too long to predict! I only have confidence in predicting the next six months.

Ben Lorica: But do you think messaging apps will remain the primary interface?

Kay Zhu: Not necessarily. For example, we just launched GenSpark Real-Time. In the app, you can click a button and have a full-duplex voice conversation. You talk to it, it talks back, and you can interrupt it naturally. I use it in my car while driving to the office. It feels completely different—like an assistant sitting in the passenger seat. I can ask it to delegate tasks, and it will reply, “I’ve spun off a sub-agent for that and will check the status later.” Natural, conversational voice interfaces will definitely be very important moving forward.

Ben Lorica: What’s the timeline for users being able to have more than one Claw?

Kay Zhu: Very soon.

Ben Lorica: Will there be a limit on how many you can have?

Kay Zhu: I’m not sure. When we launch multiple Claws, the only limit will probably be your wallet.

Ben Lorica: The idea is that they all operate under the same credentials, but you can start delegating to different agents.

Kay Zhu: Yes, but they will be different instances.

Ben Lorica: Are they in the same virtual machine?

Kay Zhu: No, they will be in different VMs. If you want multiple Claws, we will provision a separate VM for each one.

Ben Lorica: How do they collaborate then?

Kay Zhu: They can collaborate through a shared communication channel.

Ben Lorica: I see. Just like human employees using a group chat. But for them to work together on a file—say, a PowerPoint presentation—if they aren’t in the same VM, do they have to send the file back and forth?

Kay Zhu: Not necessarily. We already have an AI Drive in GenSpark. It works very much like human employees collaborating via a shared cloud drive; all the agents can access the same team drive.

Ben Lorica: I see. Wow, this has been a great conversation. Thank you, Kay.

Kay Zhu: Thank you for having me. It’s been a pleasure talking with you.

Exit mobile version