Beyond the Agent Hype

Ben Lorica and Evangelos Simoudis on AI Agents, Application Reliability, and Startup Visas.

Subscribe: Apple • Spotify • Overcast • Pocket Casts • AntennaPod • Podcast Addict • Amazon • RSS.


Ben Lorica and Evangelos Simoudis discuss some takeaways for AI practitioners from the recent AI Conference. They cover the disconnect between the hype and reality of AI agents in the enterprise, the critical importance of building reliable and trustworthy AI applications, and the potential impact of new H-1B visa policies on tech startups and innovation.

Transcript

Below is a heavily edited excerpt, in Question & Answer format.

Agents: From Conference Hype to Enterprise Reality

What was the key disconnect observed about agents at the recent AI Conference?

There was a significant gap between how agents were being promoted at the conference—dominating both expo booths and presentations—and where enterprises actually are in their adoption. While vendors and speakers were pushing visions of fully autonomous agents, the reality is that most enterprises are still experimenting with basic implementations such as customer-support chatbots and coding assistants like GitHub Copilot. These are not the sophisticated, autonomous systems being advertised, but rather supervised, narrow-scope tools.

What specific challenges are enterprises encountering with current agent pilots?

Organizations piloting more advanced agent systems from vendors like Salesforce (Agentforce) and ServiceNow are facing several critical issues:

  • High operational costs: Agents that orchestrate multiple LLMs through APIs become prohibitively expensive to run at scale
  • Lack of proven case studies: There aren’t enough concrete success stories to justify broad investment
  • Insufficient organizational readiness: Companies lack the training and infrastructure to implement and manage these systems effectively
  • Shadow agents and security risks: Employees are creating unauthorized “rogue agents” throughout the enterprise to automate personal workflows, operating outside IT governance and creating cybersecurity exposures and uncontrolled costs

What foundational prerequisites must be in place before deploying sophisticated agents?

Organizations need several critical foundations before agents can be effective:

For unstructured data: Robust search and retrieval capabilities are essential. If an organization’s internal search doesn’t work well, general-purpose agents relying on that information will perform poorly.

For structured data: A semantic layer is essential—one that provides the necessary context about business metrics, the relationships between data entities, and how to correctly query databases.

Additionally, teams must ensure high data quality, implement proper access controls, and fundamentally rethink workflows to determine which tasks need simple AI-augmented applications versus true autonomous or multi-agent systems. This includes explicitly deciding between single-agent and multi-agent architectures based on actual requirements for observability, safety, and ROI.
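
To make the semantic-layer prerequisite concrete, here is a minimal sketch in Python. Every name in it (the metrics, tables, and `resolve_metric` helper) is hypothetical, and production teams would typically use a dedicated semantic-layer tool, but the principle is the same: business definitions are encoded once, so an agent cannot improvise its own queries.

```python
# Minimal semantic-layer sketch: each business metric is defined once, with
# the vetted SQL needed to compute it, so an agent asks for "net_revenue" by
# name instead of improvising joins. All table and metric names are made up.

SEMANTIC_LAYER = {
    "net_revenue": {
        "description": "Gross bookings minus refunds, in USD, by month.",
        "sql": """
            SELECT date_trunc('month', o.created_at) AS month,
                   SUM(o.amount_usd) - SUM(COALESCE(r.amount_usd, 0)) AS net_revenue
            FROM orders o
            LEFT JOIN refunds r ON r.order_id = o.id
            GROUP BY 1
        """,
    },
    "active_customers": {
        "description": "Distinct customers with at least one order, by month.",
        "sql": """
            SELECT date_trunc('month', created_at) AS month,
                   COUNT(DISTINCT customer_id) AS active_customers
            FROM orders
            GROUP BY 1
        """,
    },
}

def resolve_metric(name: str) -> str:
    """Return the vetted SQL for a metric, failing loudly rather than
    letting the model invent its own query."""
    if name not in SEMANTIC_LAYER:
        raise KeyError(f"Unknown metric {name!r}; known: {sorted(SEMANTIC_LAYER)}")
    return SEMANTIC_LAYER[name]["sql"]
```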

How should practitioners think about the current state of agent technology and its limitations?

Today’s “agents” are essentially sophisticated prompt-based orchestrations rather than truly autonomous systems. Much of the intellectual property lives in the prompt sets and routing logic, which creates significant migration challenges. When teams want to switch to a newer, cheaper, or better model, it’s not a simple swap—they must re-test and potentially re-engineer their entire suite of prompts. Small changes in prompts, system instructions, or tool outputs can cascade into large behavioral shifts. Teams should approach agents as advanced workflow automation tools (similar to next-generation RPA) rather than fully autonomous decision-makers.
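
A minimal sketch of what that re-testing looks like, assuming a hypothetical `call_model` wrapper around whatever client library a team actually uses: before any swap, the entire golden suite is replayed against the candidate model.

```python
# Sketch of a model-migration regression check. `call_model` is a hypothetical
# stand-in for the team's real LLM client; the point is that swapping models
# means replaying the whole golden suite, not making a one-line change.

GOLDEN_SUITE = [
    {"prompt": "Summarize our refund policy in one sentence.",
     "must_contain": ["30 days"]},
    {"prompt": "Classify this ticket: 'My card was charged twice.'",
     "must_contain": ["billing"]},
]

def call_model(model: str, prompt: str) -> str:
    """Placeholder: route to the real LLM client here."""
    raise NotImplementedError

def migration_failures(candidate_model: str) -> list[dict]:
    """Replay every golden case against the candidate model; an empty
    return value is the precondition for promoting it."""
    failures = []
    for case in GOLDEN_SUITE:
        answer = call_model(candidate_model, case["prompt"]).lower()
        missing = [s for s in case["must_contain"] if s.lower() not in answer]
        if missing:
            failures.append({"prompt": case["prompt"], "missing": missing})
    return failures
```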

Building Reliable AI Applications

Why has reliability become the critical gating factor for enterprise AI adoption?

As AI applications move from low-stakes tasks like casual image generation to critical domains such as medicine, finance, and manufacturing, reliability becomes paramount for legal, compliance, and brand-protection reasons. The core reliability challenges fall into two categories:

  1. Correctness issues: Hallucinations where models generate plausible but factually incorrect information, and misuse of sources
  2. Consistency problems: Non-deterministic behavior where the same query can yield different answers, creating massive legal risks and potential lawsuits for enterprises

The non-determinism is particularly problematic—for example, a medical AI prompted with identical patient data could give different recommendations depending on whether the user is framed as a “physician” or an “insurance agent.”
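
One way to make that risk measurable is to replay an identical query many times under each user framing and track how often the answer changes. A rough sketch, again assuming a hypothetical `call_model` wrapper:

```python
from collections import Counter

def call_model(system_role: str, prompt: str) -> str:
    """Placeholder for the team's real LLM client."""
    raise NotImplementedError

def consistency_report(prompt: str, roles: list[str], trials: int = 20) -> dict:
    """Replay the identical prompt under each role framing and report how
    concentrated the answers are (1.0 means perfectly repeatable)."""
    report = {}
    for role in roles:
        # Exact-match counting is deliberately strict; teams often
        # normalize or embed answers before comparing them.
        answers = Counter(call_model(role, prompt) for _ in range(trials))
        top_count = answers.most_common(1)[0][1]
        report[role] = {
            "consistency": top_count / trials,
            "distinct_answers": len(answers),
        }
    return report

# e.g. consistency_report(patient_summary, ["physician", "insurance agent"])
```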

What baseline engineering practices should teams implement immediately for AI reliability?

Teams must adopt fundamental software engineering rigor (a sketch of the observability and circuit-breaker items follows the list):

  • Version control: Treat prompts as code, versioning both the prompts and the datasets used to evaluate them
  • Testing infrastructure: Establish golden test suites and regression checks; implement systematic testing protocols for different scenarios
  • Observability: Log all tool calls, track latency and cost budgets per step
  • Safety controls: Add circuit breakers and rate limits; enforce role-based access control (RBAC) and data-scope policies
  • Performance metrics: Define SLOs for accuracy, coverage, determinism window, cost, and response time
  • Real-time validation: Integrate with authoritative data sources, especially critical in domains like medicine where papers may be retracted
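
Here is a minimal sketch of the observability and circuit-breaker items from the list above; the budget figure, the `search_docs` tool, and the context-dict convention are all illustrative assumptions, not a prescribed design:

```python
import functools
import time

MAX_COST_USD_PER_REQUEST = 0.50  # illustrative budget, not a recommendation

class BudgetExceeded(RuntimeError):
    """Raised by the circuit breaker when a request's cost budget is spent."""

def observed(tool_fn):
    """Log every tool call with its latency, and trip a circuit breaker
    once the request's running cost budget is exhausted."""
    @functools.wraps(tool_fn)
    def wrapper(ctx, *args, **kwargs):
        if ctx["cost_usd"] >= MAX_COST_USD_PER_REQUEST:
            raise BudgetExceeded(f"already spent ${ctx['cost_usd']:.2f}")
        start = time.perf_counter()
        result = tool_fn(ctx, *args, **kwargs)
        ctx["log"].append({
            "tool": tool_fn.__name__,
            "latency_s": round(time.perf_counter() - start, 3),
            "running_cost_usd": ctx["cost_usd"],
        })
        return result
    return wrapper

@observed
def search_docs(ctx, query):
    """Hypothetical tool; a real one would call a search service."""
    ctx["cost_usd"] += 0.01  # charge whatever the provider actually bills
    return f"results for {query!r}"

ctx = {"cost_usd": 0.0, "log": []}
search_docs(ctx, "retraction policy")  # ctx["log"] now holds one structured entry
```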

What advanced techniques are emerging to improve reliability?

Several sophisticated approaches are being developed and deployed in production:

Grounded Generation/RAG: Using Retrieval-Augmented Generation to anchor model outputs in specific documents, while acknowledging RAG is not a silver bullet—it’s only as good as the source material and requires integration with retraction databases in high-risk domains.
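
A minimal sketch of the grounding pattern, with a toy keyword scorer standing in for a production vector store; the retraction set is the extra check that high-risk domains bolt on:

```python
def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Toy keyword scorer; a production system would use a vector store."""
    words = query.lower().split()
    scored = sorted(corpus, key=lambda doc_id: -sum(
        w in corpus[doc_id].lower() for w in words))
    return scored[:k]

def grounded_prompt(query: str, corpus: dict[str, str],
                    retracted: set[str]) -> str:
    """Build a prompt that confines the model to vetted, non-retracted
    sources and demands citations."""
    doc_ids = [d for d in retrieve(query, corpus) if d not in retracted]
    context = "\n\n".join(f"[{d}] {corpus[d]}" for d in doc_ids)
    return (
        "Answer ONLY from the sources below and cite the [id] you used. "
        "If the sources are insufficient, say so.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
```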

Consensus Mechanisms: Querying multiple LLMs and using voting systems or “judge” models to evaluate outputs, accepting the trade-off of higher cost and latency for improved reliability.
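
A sketch of the voting-plus-judge pattern, with `call_model` again a hypothetical stand-in for the real clients:

```python
from collections import Counter

def call_model(model: str, prompt: str) -> str:
    """Placeholder for the real LLM clients."""
    raise NotImplementedError

def consensus_answer(prompt: str, models: list[str], judge: str) -> str:
    """Query several models; accept a clear majority answer, otherwise ask a
    judge model to choose. Costs N+1 calls instead of 1: that is the trade."""
    answers = [call_model(m, prompt) for m in models]
    top, count = Counter(answers).most_common(1)[0]
    if count > len(models) // 2:
        return top
    ballot = "\n".join(f"{i}: {a}" for i, a in enumerate(answers))
    verdict = call_model(
        judge, f"Question: {prompt}\n\nCandidate answers:\n{ballot}\n\n"
               "Reply with only the index of the best answer.")
    # Production code would validate the judge's reply before indexing.
    return answers[int(verdict.strip())]
```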

Prompt Optimization at Scale: Production systems are combining TextGrad (gradient descent-like optimization in text space, from Stanford) with evolutionary algorithms based on Pareto optimization (from Berkeley) to systematically optimize families of prompts. This treats prompt engineering as a formal optimization problem with Elo-like evaluation loops.
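
The TextGrad and Pareto-evolutionary machinery is library-specific and omitted here; the sketch below shows only the Elo-style evaluation loop, with a hypothetical `judge_prefers` comparison standing in for an actual judged run:

```python
import itertools
import random

def judge_prefers(prompt_a: str, prompt_b: str, task: str) -> bool:
    """Placeholder: run both prompts on the task and have a judge model
    (or an automatic metric) decide which output is better."""
    raise NotImplementedError

def elo_tournament(prompts: list[str], task: str, k: float = 32.0) -> dict:
    """Round-robin pairwise comparisons with standard Elo updates; the
    top-rated prompts seed the next optimization round."""
    ratings = {p: 1000.0 for p in prompts}
    pairs = list(itertools.combinations(prompts, 2))
    random.shuffle(pairs)  # match order affects ratings slightly, as in chess
    for a, b in pairs:
        expected_a = 1 / (1 + 10 ** ((ratings[b] - ratings[a]) / 400))
        score_a = 1.0 if judge_prefers(a, b, task) else 0.0
        ratings[a] += k * (score_a - expected_a)
        ratings[b] -= k * (score_a - expected_a)
    return ratings
```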

Model Right-sizing: Using smaller, specialized models when possible to lower variance, cost, and data-leak risk, rather than defaulting to large foundation models for every task.

Deterministic Envelopes: For absolute repeatability requirements, teams can use lower temperature settings, constrained decoding, fixed tool schemas, canonicalized inputs, frozen tool versions, and seed control where available, with rule-based checks or human approval layers when necessary.
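
A minimal sketch of such an envelope, combining canonicalized inputs, frozen decoding parameters, and a fingerprint that lets repeated requests reuse a cached answer; note that `seed` support varies by provider:

```python
import hashlib
import json
import unicodedata

# Frozen decoding parameters; `seed` is honored by some providers but not all,
# and the tool-schema pin is a hypothetical convention, not a standard field.
DETERMINISTIC_PARAMS = {
    "temperature": 0.0,
    "top_p": 1.0,
    "seed": 1234,
    "tool_schema_version": "2025-01-pinned",
}

def canonicalize(payload: dict) -> str:
    """Normalize a request so semantically identical inputs stay
    byte-identical: sorted keys, collapsed whitespace, NFC Unicode."""
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return unicodedata.normalize("NFC", " ".join(text.split()))

def request_fingerprint(payload: dict) -> str:
    """Cache key: the same canonical input plus the same frozen parameters
    means the previous answer can be reused instead of re-sampled."""
    blob = canonicalize({**payload, **DETERMINISTIC_PARAMS})
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```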

Can teams rely on foundation model providers to solve reliability challenges?

No, this is not a viable strategy. While foundation models will improve, application-level reliability is a distinct challenge that requires deliberate engineering. Teams that proactively build reliability and trust as core product features—with measurable guarantees, evaluation harnesses tied to customer workflows, transparent cost controls, and domain-specific safety integrations—will have significant competitive advantages over those waiting for model providers to “solve it later.”

H-1B Visa Policy Impact on Tech Ecosystem

How does the new H-1B visa policy specifically impact startups versus large corporations?

The policy’s new fee (roughly $100K per H-1B petition) creates a disproportionate burden. While tech giants can absorb these costs or open offices in other countries as workarounds, startups face prohibitive expenses when trying to bring critical early team members to the U.S. When a startup is already paying engineers $180K, the additional H-1B costs become a significant financial burden that can affect runway and hiring decisions. This “one-size-fits-all” approach particularly harms smaller, innovative companies that are crucial to the tech ecosystem’s dynamism.

What are the broader economic and workforce implications beyond tech startups?

The impact extends across multiple sectors:

Manufacturing and skilled trades: There are currently 500,000 unfilled skilled manufacturing jobs requiring sophisticated technical backgrounds that H-1B holders could help fill. These smaller and mid-sized companies now face even higher barriers to addressing critical workforce gaps.

Non-tech Fortune 500 companies: Large companies in retail, healthcare, and traditional industries have historically used the H-1B program to recruit top technical talent from universities, graduates who might not otherwise consider “less sexy” brands. For these companies, the program now becomes a more expensive and more difficult recruiting tool.

Academic institutions and research: Universities are already seeing impacts on research programs, with potential acceleration of “brain drain” where top international talent educated in the U.S. chooses to build companies and careers elsewhere.

Risk to innovation pipeline: The policy could slow product velocity, lengthen hiring cycles, and increase costs across the entire innovation ecosystem.