Ten takeaways from our first NYC Enterprise AI Forum

May 20

Written By Cherae Robinson

New York is sold to AI builders everywhere as the home of applied AI. The city hosts the largest concentration of global enterprise headquarters in the world, offering a genuine opportunity to put AI, and agentic AI in particular, to work in large-scale production environments. However, the version of that story I see inside our own portfolio has considerably more texture than the headline. Meetings are happening, early pitches are succeeding, pilots are greenlit, and yet the larger contracts, customer success stories, and scale are slower and harder to win. The gap between what the models can do and what enterprises have actually adopted is wide, and that pain point is landing squarely on founders building solutions that help enterprises harness the power of AI in their organizations.

I’ve been circling this problem for a while. Last year, as the AI breakthrough moment gave way to the more sober “how do enterprises actually use this” moment, Brian Benedict (co-founder ELIZA, Arcee, ex-Hugging Face) and I ran an eight-week GTM intensive with our portfolio. At the end of it, I had a running list of our founders' questions (and frustrations) with selling AI solutions into the enterprise. How do you find a champion inside a huge company? How do you qualify a POC so the relationship doesn’t quietly die? How do you hold your pricing against a sophisticated procurement team? How do you stay differentiated when the frontier models themselves start doing a passable version of the thing you sell? Some of those questions were answered in the cohort. Others opened the door to questions that would best be answered by people at the seat of the opportunity, enterprise practitioners themselves.

On May 11, we partnered with our friends at Anthropic and the Innovation Banking team at HSBC to gather 40 enterprise AI leaders and early-stage startup founders for the NYC Enterprise AI Forum, a closed-door afternoon session and dinner. We had representation from some of the largest and most diverse enterprises in the world, including BlackRock, JPMorgan Chase, Verizon, Fortune Media, Estée Lauder, GoodRx, and AXA, among many others. The format was designed for transparent dialogue and open engagement, both central to our goals for this gathering.

Here are ten takeaways that stuck with me and hopefully are of use to you, whether you are a founder or sales leader building agentic software, an enterprise leader looking to scale efficiencies and opportunity in your organization, or an investor/platform leader looking to shepherd your portfolio into enterprise success:

1. Human-in-the-loop is becoming a senior capacity problem

Conventional wisdom says human review keeps AI honest. What kept landing in the room was a different framing. Human-in-the-loop is best understood as a senior-judgment capacity constraint, and most enterprises are systematically underestimating what it costs. The squeeze is showing up from two directions at once.

From above, when AI completes a junior workflow well enough, you do not gain a junior reviewer; you skip the junior layer and send things straight to a VP. This means that the volume that used to flow through analyst review now lands on the desk of a managing director, and the work does not get smaller along the way. The senior reviewer is more expensive, slower to schedule, and harder to scale.

From below, the number of people who can ship is expanding. One bank leader described her firm spending years building a senior-engineer review process for AI-generated code. As they rounded the corner, the rollout of agentic coding tools dissolved that process almost overnight. Non-technical employees could now ship code and route around the senior review apparatus, flooding the reviewer queue with new tools and products from people who had never been considered in the design of the process to begin with.

Put both pressures together, and you have a small, expensive senior layer absorbing more review volume from above, while a wider base of producers feeds more material in from below. Capacity is shrinking exactly as volume is growing. Senior judgment capacity, not model capability, is a growing bottleneck on enterprise AI adoption. The firms that take this seriously will start measuring senior review time the way they measure compute. The ones that do not will keep treating human-in-the-loop as a checkbox and quietly burn out their VPs.

2. Governance was built at the legal-entity level. Value happens at the person and function level.

One of the most interesting structural problems raised at the forum had almost nothing to do with the technology. AI governance regimes at large regulated firms were built around legal entities and jurisdictions, because that is where the regulators sit. The workflows that create the most value, however, run horizontally across those entities. They are shaped by the function a person is doing, not by where the regulator happened to file them.

This is how this plays out within companies right now: Every legal entity has its own AI review committee with the authority to approve tools for that entity alone. A workflow that needs to operate across ten entities now requires ten parallel approval processes that were never designed to coordinate with one another. A senior leader asks for tool access because everyone wants it, the request lands on ten committees at once, and it resolves at the speed of the slowest one.

The usual instinct in that situation is to add more reviewers, more compliance officers, more approval steps inside each entity. That instinct is the right one when the problem is capacity, but here the problem is structure. Adding more people does nothing to coordinate across them, so you end up with a thicker version of the same wrong map. What these firms actually need is a horizontal approval body that maps onto the function itself, with explicit reconciliation mechanisms back to the entity-level regulators. That is a different governance shape, not a bigger one.

Rebuilding governance this way is now an open project at multiple firms that were in the room, and the tooling category that supports it, meaning cross-entity approval workflows, function-based access maps, and automated reconciliation with jurisdictional rules, is probably one of the more underrated surfaces in enterprise AI right now.

3. Workload specialization is where enterprise AI startups earn the right to play

The biggest framing reset of the day came from a thread that ran through both buyer- and vendor-side conversations. Public benchmarks make general-purpose frontier models look like they will eventually eat everything. Frontier models are genuinely impressive at broad-capability tasks like coding and summarization, but they hit a jagged frontier on workload-specific tasks, the ones where the buyer needs procurement-grade accuracy.

One of the founders presenting runs a document extraction product that operates across banking, insurance, healthcare, and logistics. The product is workload-specialized, built for structured extraction from documents with complex layouts, handwriting, and low-signal inputs, and that workload depth is precisely what makes it deployable across so many industries. The frontier vendors do this specific workload poorly. The specialized vendor does it at human-level accuracy for roughly a tenth of the cost.

The win for agentic companies comes down to accuracy and cost in this current moment:

On accuracy: The last few percentage points of a specific workload determine whether an enterprise can run straight-through processing, that is, true automation. One insurer described running a benchmark against the leading frontier products and choosing the specialized vendor explicitly because the production-grade accuracy band, 97% and above, was where the specialized model pulled ahead. Moving from 92% to 95% cuts the error rate from 8% to 5%, and reaching 97% cuts it to 3%, well under half of where it started. That reduction is the economic difference between a fully automated process and one that still needs human review. A frontier model hitting 92% on a public benchmark looks perfectly fine in market discourse, but it cannot ship in a production environment. The buy decision lives in exactly the gap that public benchmarks tend to skip over.
On cost: Specialized fine-tuned models run on far less compute, sometimes a single GPU, and can be deployed privately at a small fraction of the inference cost of a frontier model running with thinking budgets and large context windows. The same insurer described a roughly tenfold cost advantage on their particular workload. That advantage compounds with the accuracy advantage, and together they make a procurement case that is genuinely hard to beat.
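To make the accuracy arithmetic concrete, here is a minimal sketch. The accuracy levels are the ones discussed above; the batch volume of 100,000 documents and the function names are illustrative assumptions, not figures or tooling from the forum.

```python
def errors_per_batch(accuracy: float, volume: int) -> int:
    """Expected number of items in a batch that still need human correction."""
    return round((1.0 - accuracy) * volume)


def error_reduction(old_accuracy: float, new_accuracy: float) -> float:
    """Fraction of errors removed when accuracy improves from old to new."""
    old_err = 1.0 - old_accuracy
    new_err = 1.0 - new_accuracy
    return (old_err - new_err) / old_err


VOLUME = 100_000  # hypothetical batch size, chosen only for illustration

if __name__ == "__main__":
    for acc in (0.92, 0.95, 0.97):
        print(f"{acc:.0%} accuracy: {errors_per_batch(acc, VOLUME):,} items need review")
    print(f"92% to 95% removes {error_reduction(0.92, 0.95):.1%} of errors")
    print(f"92% to 97% removes {error_reduction(0.92, 0.97):.1%} of errors")
```

The point of the sketch is that a few points of headline accuracy translate into thousands of exceptions at volume, which is what decides whether a human review queue can be removed.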

For founders building for the enterprise, this is the structural opening. The durable enterprise AI businesses of the next five years will be the ones that pick a specific workload, or problem, where general model accuracy and efficiency are not enough to be a procurement-grade choice.

4. International buyers carry procurement gates that U.S. founders rarely plan for

Two non-technical procurement requirements surfaced that most U.S. founders do not plan for, and both came from buyers operating outside the U.S. enterprise default. The first is portability under regulation. The second is per-task carbon. Read together, they point at a category of gates that sit outside the technical conversation and disqualify vendors before product even comes up.

On portability. European regulation, specifically the Digital Operational Resilience Act, requires regulated buyers to demonstrate they can migrate off any single cloud provider within days. The requirement is CSP independence as a baseline operational property, not avoidance of the cloud. One European company described this as a deal-blocker that has nothing to do with price. Model-as-a-service offerings that bundle infrastructure, operations, and software into a single token price cannot be unbundled when token prices double or when the provider quietly swaps a model version. If you are selling into regulated Europe, the phrase “we run on AWS” will not carry you across the finish line.
On carbon. While the US policy environment may not be holding AI companies' feet to the fire on carbon, pun intended, international buyers, especially in Europe, have a different standard. Their ability to remain a green-stock-eligible holding gates their inclusion in major funds, which gates their stock price. That logic flows directly into procurement. The carbon footprint of their AI infrastructure (token consumption, GPU energy, data center sourcing) is balance-sheet material at the company level, not a corporate social responsibility talking point. Vendors who can credibly measure and reduce per-task energy hold a structural advantage with climate-exposed buyers.

Both points share a structure worth pulling out. The buyers who matter most outside the U.S. enterprise default (regulated industries in Europe, climate-exposed sectors anywhere) operate under procurement gates that U.S. founders typically discover late and lose on. These specific gates may not show up in every deal, but they are worth knowing about if you run a company with global ambitions. Every market has its own non-obvious gates, and the founders who research them before the pitch arrive with answers that win the opportunity.

5. Build versus buy is a limited frame. Let the champion claim both.

The most useful enterprise GTM hot take I heard all day came from an operator who has sold on both sides of the table. The right framing is not whether the customer should build or buy. The right framing is how to structure the engagement so the internal champion gets to claim a we-built-it win and a we-bought-it win at the same time.

In practice, this means structuring a lighthouse program with three or four hand-picked beta customers: the small cohort creates scarcity, roadmap input creates the co-development motion of an embedded forward-deployed engineer, and grandfather pricing buys deal longevity. These are political tools, not discount tactics. The deal closes because the champion gets credit for both the technology and the choice, and that credit shows up in their internal narrative long before it shows up in your contract.

6. Counterintuitive procurement: go bigger, not smaller

One operator offered sales advice that runs counter to intuition: a $30K deal is often harder than a $1M deal. Small buyers scrutinize every line item, while large buyers absorb a 3% overrun without flinching. If you’ve fundraised, you’ve probably experienced the same dynamic with small-check investors versus larger institutional firms.

The real bottleneck at large enterprises has nothing to do with deal size. The bottleneck is whether the buyer has a single empowered point person who can coordinate privacy, compliance, and security in real time, rather than letting each team take its own three-week scheduling round. Founders chasing easier wins in the mid-market may be optimizing for the wrong axis. Sometimes, the bigger buyer with one fox inside closes faster than the smaller buyer without one.

7. Shadow AI is escalating faster than shadow SaaS did

People are eager to use AI within the enterprise, and the slow pace of adoption or the blocked-access approach is creating real risks as employees find ways to use these applications. A handful of stories from the room focused on the same security risk scenarios: employees screenshot data into iMessage to get around laptop endpoint blocks, then run prompts on their phones; employees pay out of pocket for tokens on their personal laptops to demonstrate value on a project internally; a homegrown enterprise solution gets bypassed by employees emailing data to their personal machines, using frontier models, and emailing the results back. Different versions of the same moment.

The shape looks like the shadow SaaS movement of the 2010s, but the timeline is months rather than years, and the bypass mechanisms are more aggressive because the productivity delta feels larger to the user. Blocking accelerates this rather than slowing it down. Any large enterprise that has not yet sanctioned a frontier-model pathway already has one running outside the bounds of governance.

8. The forward-deployed engineer model might have a ceiling

Forward-deployed engineers (FDEs) are the talk of customer success, and every AI vendor now wants to embed them inside their largest customers. From the enterprise side, however, there may be a hard limit on how many vendors can do this simultaneously, due to coordination costs, security clearances, and sheer attention. The model worked beautifully when one or two vendors did it, but if it becomes table stakes, it produces its own version of the SaaS sprawl problem it was designed to solve. This is worth watching as a market-structure question over the next year. Vendors who can turn what their FDEs are learning into software will outlast those who are leaning on bodies. We haven’t hit the ceiling yet, but it’s worth factoring into your plans.

9. Aerospace offers a surprisingly good model for governing autonomous agents

If senior judgment capacity is the bottleneck on human-in-the-loop, the natural follow-up is what governance looks like when there is no human in the loop at all. The cleanest answer I heard all day came from outside enterprise software entirely. A founder building voice agents that run in production without human review walked through a three-tier model borrowed from aircraft engine design:

First, architectural isolation for catastrophic failures. The bolt for the fuel line physically cannot fit into the lubrication line. Failure is made impossible at the design level.
Second, real-time sensors with alerts and auto-shutdown for the failure modes you can detect mid-flight.
Third, post-flight maintenance review on a separate schedule, where a human inspects what happened after the fact.

Translated to agentic AI, the model becomes: isolate the systems for events you cannot allow to happen, run code-first interceptors that can halt or alert in real time, and run an LLM-driven audit pass on its own clock. It was the only governance framework shared that’s designed for the case of no human in the loop, and it could be the default architecture as scale continues to grow.
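One way to picture the three tiers in code is the minimal sketch below. Everything in it is hypothetical: the tool names, the refund limit, and the `reviewer` callback (which stands in for an LLM-driven audit pass) are illustrative assumptions, not the founder's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable

# Tier 1: architectural isolation. The agent can only reach tools in this
# registry, so a forbidden action (say, "wire_transfer") is not merely
# denied; it does not exist from the agent's point of view.
ALLOWED_TOOLS: dict[str, Callable[[dict], str]] = {
    "lookup_order": lambda args: f"order {args['order_id']}: shipped",
    "issue_refund": lambda args: f"refunded {args['amount']}",
}


class Halt(Exception):
    """Raised by a tier-2 interceptor to stop the agent in real time."""


@dataclass
class Action:
    tool: str
    args: dict


@dataclass
class AgentRuntime:
    refund_limit: float = 100.0  # hypothetical threshold
    log: list = field(default_factory=list)

    def intercept(self, action: Action) -> None:
        # Tier 2: deterministic, code-first checks that run before every call.
        if action.tool == "issue_refund" and action.args["amount"] > self.refund_limit:
            raise Halt(f"refund of {action.args['amount']} exceeds limit")

    def execute(self, action: Action) -> str:
        if action.tool not in ALLOWED_TOOLS:
            raise KeyError(f"tool {action.tool!r} is not wired into this agent")
        self.intercept(action)
        result = ALLOWED_TOOLS[action.tool](action.args)
        self.log.append((action, result))  # kept for the tier-3 review
        return result


def audit(log: list, reviewer: Callable[[Action, str], bool]) -> list:
    # Tier 3: after-the-fact review on its own schedule. Returns the
    # entries the reviewer did not approve.
    return [(action, result) for action, result in log if not reviewer(action, result)]
```

The design choice worth noticing is that tier 1 is enforced by what is absent rather than by a rule that could be misconfigured, which mirrors the bolt that cannot fit the wrong line.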

10. The user of the future is the agent, not the human

An operator from a major U.S. bank argued that agents need their own identities and authorization scopes, separate from inherited user credentials. Fine-grained access for agents is a distinct governance category, and treating it as a feature of existing IAM will leave most enterprises exposed.
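As a rough illustration of what a separate agent identity could look like, here is a minimal sketch. The class names, scope strings, and the rule that an agent's scopes must be a subset of its owner's are all assumptions made for the example; they do not describe any particular bank's IAM system or a real library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserIdentity:
    user_id: str
    scopes: frozenset


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str
    owner: UserIdentity
    scopes: frozenset  # granted explicitly, never copied wholesale from the owner


def issue_agent_identity(agent_id: str, owner: UserIdentity, requested) -> AgentIdentity:
    """Give the agent its own identity with only the scopes it asked for.

    Assumption for this sketch: an agent may never hold a scope its owner lacks.
    """
    requested = frozenset(requested)
    excess = requested - owner.scopes
    if excess:
        raise PermissionError(f"owner lacks scopes: {sorted(excess)}")
    return AgentIdentity(agent_id, owner, requested)


def is_authorized(agent: AgentIdentity, scope: str) -> bool:
    # The check reads the agent's own grant, not the owner's credentials.
    return scope in agent.scopes
```

In use, an analyst holding `crm:read` and `crm:write` could issue an agent that carries only `crm:read`, so the agent's audit trail and blast radius stay distinct from the human's.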

The deeper observation that surfaced underneath this conversation is bigger than IAM. Non-technical employees have always had latent access to a lot of data that they could never quite use. Coding agents made that latent access actual, and that shift forced one practitioner in the room to rethink what people have access to in the first place.

The future is an entirely new landscape. As the primary user of enterprise systems shifts from a human to an agent, almost every assumption baked into existing infrastructure becomes a question worth re-asking. Who is this system designed for? What does authorization look like when the operator is a process rather than a person? What does an audit trail mean when actions cascade across multiple agents? Most enterprises have spent the last twenty years designing systems for human users. The next ten will be spent rebuilding them for agentic ones. The firms that take this shift seriously now will rebuild systems they thought were finished.

Two threads ran through almost every conversation. The first is that the gap between model capability and enterprise adoption is the widest it has been in a decade. The bottleneck has moved off the model and into the harness around it: interfaces, governance, change management, identity, and accountability. Wherever a founder can sell into the harness, they are selling into the place where enterprises actually need help right now.

The second is that almost every best practice being debated in public has a counterintuitive version once you talk to the people deploying these systems at scale. The room I sat in knew the counterintuitive versions and the nuance that makes a difference where it counts. That is the reason we convene these conversations. I’m interested in continuing this conversation. If you’re a NYC enterprise leader who wants to talk AI adoption or connect to our ecosystem of builders, let's talk.

