
Inference Chips for Agent Workflows

Y Combinator

1m 20s · 182 words · ~1 min read
Auto-Generated

[0:00] Most AI chips are designed for a world where inference means prompt in, response out. Agents don't work that way. They loop: calling tools, branching, backtracking, holding context across dozens of steps. That's a completely different hardware problem. Current GPUs hit 30% to 40% of peak utilization on these workloads because the work is bursty, bouncing between memory-bound model calls, IO-bound tool use, and CPU-bound orchestration. That gap is where purpose-built silicon wins. NVIDIA bought Groq for $20 billion because they saw this coming. Google built TPUv7 for inference specifically, but nobody's designing for the agent loop itself: fast context switching between models, native speculative decoding, memory built for KV caches that persist across an entire execution graph. Groq's real insight wasn't the chip; it was the compiler that made the chip work. We think that will be true for whoever builds this next. If you understand both the chip architecture and how agents actually execute, this is a rare moment where both halves of that experience matter. If you're building inference silicon for agentic AI, we'd love to hear from you.
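The loop the talk describes can be sketched in a few lines. This is a hypothetical illustration, not any real agent framework: `model_call`, `tool_call`, and `run_agent` are stand-in names, and the comments mark where each kind of work (memory-bound, IO-bound, CPU-bound) would land on real hardware.

```python
# Minimal sketch of an agent execution loop (all names hypothetical).
# Each iteration alternates between three kinds of work, which is why a
# prompt-in/response-out accelerator sits idle so much of the time.

def model_call(context):
    # Memory-bound in practice: autoregressive decoding is dominated by
    # KV-cache reads. Stubbed here as a trivial decision function.
    return "tool" if len(context) < 3 else "final_answer"

def tool_call(action):
    # IO-bound in practice: a real agent would hit a search API,
    # database, or shell here.
    return f"result_of_{action}"

def run_agent(task, max_steps=10):
    # The context plays the role of the KV cache the talk mentions:
    # state that must persist across the whole execution graph.
    context = [task]
    for _ in range(max_steps):
        action = model_call(context)      # memory-bound model step
        if action == "final_answer":
            return context
        observation = tool_call(action)   # IO-bound tool step
        context.append(observation)       # CPU-bound orchestration step
    return context

trace = run_agent("summarize the logs")
print(len(trace))
```

Even in this toy version, the hardware never sees one long, uniform stream of work; it sees short bursts of decoding interleaved with waits, which is the utilization gap the talk is pointing at.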
