Nvidia Just Open-Sourced What OpenAI Wants You to Pay Consultants For.

AI News & Strategy Daily | Nate B Jones

26m 29s | 5,057 words | ~26 min read
Transcript source: YouTube auto captions

This transcript was extracted from YouTube's auto-generated caption track. The transcript below is server-rendered so it can be read, searched, cited, and shared without opening the original YouTube player.


[0:00] Right now there's a battle playing out at the heart of agent world, and it's a battle between titans, right? Nvidia's on one side with NemoClaw. OpenAI and Anthropic are on the other side. If you're telling me, "Nate, no, no, no, they're all building agents," I'm the first to agree with you. That's not the point. The point is that Anthropic and OpenAI spent a year in 2025 figuring out that the companies they work with did not have the expertise to actually apply the solutions they were giving them. So they would launch cool stuff like Codex and Claude Code and see it suffer in production when they could not figure out how to get actual teams at actual businesses to adopt them in the ways that they themselves were using them internally. Right? Anthropic ships, I swear, every eight hours, right? And OpenAI ships very, very fast as well, but they weren't seeing those speedups at other companies, and they could not figure out why.

And so now, because of that year of failures, OpenAI and Anthropic are very publicly tying up with big consulting firms. And they're doing that because they know that they need to find ways to work with services firms to get their actual content, their actual code, into the hands of people in a way that's accessible to them. It turns out that AI doesn't teach itself, at least not for most people, and I think that's a bitter lesson that Anthropic and OpenAI have learned.

I don't know that Nvidia agrees, because on the other side of this, Nvidia just launched NemoClaw, and the backstory there is very, very different. NemoClaw came from the OpenClaw moment, right? Jensen walked out onto the stage and he said, this is the future, right? The future is OpenClaw because the future is an agentic operating system, and that's what he saw. And so, regardless of what you think about OpenClaw the piece of software that Peter Steinberger coded, it's OpenClaw the system, OpenClaw the paradigm, OpenClaw the idea that Jensen was talking about.
And he wanted to take that idea and bring it securely to the enterprise because, of course, the big thing with OpenClaw if you're in business is that it's not secure. It's not something you can lock down well. There are lots and lots of issues with giving your agent access to your stuff and the open internet, and so NemoClaw is designed to be a lot more locked down. So what makes NemoClaw tick? NemoClaw is actually an add-on to OpenClaw. It's not that it replaces it entirely; it's that it's designed to run in OpenShell, which is Nvidia's proprietary runtime environment, and that ensures that Nvidia is able to wrap the OpenClaw instance in a way that's secure.

[2:33] So it has policy-based guardrails, which are YAML declarations that the agent has to follow. It has model constraints, which do two jobs. Job one is ensuring that Nvidia can validate the safety, but really job two is ensuring that Nvidia gets to serve the model. Because one of Jensen's larger moves here is to go from just managing the chip layer to moving into the agentic world, because in his business, he needs to go from just selling chips to scaling up to sell more of the value chain. And he's convinced agentic is a big piece of it, and hence NemoClaw. NemoClaw also runs on local-first compute, and yes, as you'd expect, there's an Nvidia play there, because NemoClaw is designed to run safely and efficiently on Nvidia chips that run locally.

NemoClaw is very much a strategic play for Jensen, because what Jensen is trying to do is figure out how to pivot into an ecosystem play where everybody who has all of this energy around OpenClaw will be indirectly contributing value to NemoClaw, which he can then sell to enterprise. Like, that's the dance he's trying to walk here, and by the way, if you're a contributor to OpenClaw and that makes you annoyed, I get it. This is just part of how corporate works. And so the long and the short of it is that Jensen is bolting on enterprise-grade compliance and security solutions as a patch, as a layer over the top of OpenClaw, to make it something with an open framework that runs on Linux that enterprises can pick up and use.

Whether or not you find that believable, I want you to step back and look at how this assumes competence on the part of enterprises. Remember, we started this video and we talked about the story Anthropic and OpenAI have been telling themselves, where they recognized very publicly over the last year or so that their solutions were too complicated to successfully roll out to engineering teams at enterprises. Now here comes Jensen onto the stage and he says, you know what?
You developers are smart. You developers can figure this out. People are already using OpenClaw by the hundreds of thousands. You guys got this; let me just roll out this open source framework and we're good to go. And you know what? One of the things I notice about Jensen's approach is not necessarily the corporate strategy here. It's actually the fact that a lot of what he focuses on are basics that we have known in data back-end engineering for a long time. And this is something that I keep coming back to and thinking about as I go through change management processes with companies. I recognize that in many, many ways, what consultants are making complicated today is actually the age-old practice of good data engineering, which turns out to be super useful in the age of AI. And I can't help but wonder what would happen if OpenAI and Anthropic changed their tune a little bit. Instead of saying "AI, AI, AI, isn't it amazing" and complexifying it for people, what if they actually came in and said, let's talk about what we've always known as developers. Let's talk about how data actually works in the principles of development, and then let's talk about how AI ladders onto that data back end in ways that are really useful. Maybe the process of change would be easier. And I think in a way, Jensen understands that.

Just for fun, let's go all the way back to Rob Pike's five rules of programming. If you don't know who Rob Pike is, you should, because he's one of the creators of Unix and Go. He's an absolutely legendary developer. Rob Pike's five rules are things that get taught in computer science, are things that senior engineers teach to juniors, are sort of written in the stars if you're in the discipline.

Rule number one: you can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second-guess and put in a speed hack until you've proven that's where the bottleneck is.
I cannot tell you how many times I've used that rule when debugging systems. It actually works. It is very hard to tell, until you run a system, where the bottlenecks are going to happen. That is true for agentic systems, people. That rule didn't go out of style, and by the way, yes, I'm going through all five of these because I don't think we talk about them enough, and I don't think we realize amidst all the hype and all the change that some of these ancient engineering practices still hold true.

Rule two: measure. Don't tune for speed until you've measured, and even then, don't do it unless one part of the code overwhelms the rest. In other words, if you aren't measuring and baselining your performance, it's really hard to optimize. Do we see that with agentic systems? We sure do. How many times do people tell me they don't like an individual LLM response, and I have to tell them, maybe you should baseline it. Maybe you should measure before you make big assumptions and changes.

Rule number three is kind of just "don't get fancy," or more precisely, fancy algorithms are slow when your number is small, and your number is usually small in computer science terms. Fancy algorithms have big, big constants. Fancy algorithms usually only work at scale. Until you know that your number is frequently going to be large, don't get fancy. This is true for agentic engineering as well. If you're trying to build agentic systems, simple scales well, and in fact, I would add there's probably a corollary here: simple scales better than complex. This is something that may have shifted with agentic engineering, because we did find for a while, if we were writing algorithms, that there were times at large scales when you had to have a fancy algorithm. Now, I think we're abstracting a lot of that edge case complexity to LLMs, and that requires us to have very stable, simple architectures that scale.
So that's one that I have some interesting nuance around, but fundamentally it's true, right? Don't get overly fancy, especially when the system is small.

Rule number four: fancier algorithms are buggier than simple algorithms. This was the era, by the way, when Rob had to write his algorithms by hand. I know that everyone here doesn't know that anymore because we all just prompt our LLMs, but this was handwritten stuff, right? Use simple algorithms and simple data structures. That's the heart of rule number four, and this is a corollary to rule three, because if rule three talked about simplicity and scale, rule four talks about simplicity and bugs. It is very, very hard to debug complex agentic systems. You're like, is it the prompt? Is it all of this context that I'm pulling in? What's the problem? As much as you can, simplify, because the more that you simplify, the better off you're going to be. The better off you're going to be debugging, the better off you're going to be maintaining the system, et cetera.

Rule number five: data dominates. If you've chosen the right data structures and if you've organized things well, the algorithms will almost always be self-evident. In other words, write dumb code and have smart objects in your data system, right? That's the short version. This cannot be more true in the age of AI. Data engineering is the key to having good, smart agentic systems, and I think we missed that. This is not new at all; this is decades old. Every time we go through hype cycles, and I've been through a bunch of them, right? I've been through the cloud hype cycle, I've been through the mobile hype cycle, now I'm in the AI hype cycle, and we forget. We think it's all new. And we forget little things like the fact that we should keep structures simple, that data dominates, that we should build data structures that enable us to do more complicated things in ways that are sustainable.
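As an illustration of rule five ("dumb code, smart data"), here is a minimal sketch. Everything in it is hypothetical: the tool names, the `ToolPolicy` type, and the `decide` function are invented for this example and do not come from NemoClaw, OpenClaw, or any other product mentioned in the video. The point is only that when the table carries the knowledge, the code that reads it becomes almost trivial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    """One row of policy data for a hypothetical agent tool."""
    allowed: bool
    needs_approval: bool = False


# The "smart" part lives in the data: one table describes every tool.
# Tool names here are made up for illustration.
POLICIES: dict[str, ToolPolicy] = {
    "read_file": ToolPolicy(allowed=True),
    "write_file": ToolPolicy(allowed=True, needs_approval=True),
    "shell": ToolPolicy(allowed=False),
}

# Unknown tools fall back to the most restrictive policy.
DEFAULT_POLICY = ToolPolicy(allowed=False)


def decide(tool: str) -> str:
    """Dumb code: look the tool up and report what the table says."""
    policy = POLICIES.get(tool, DEFAULT_POLICY)
    if not policy.allowed:
        return "deny"
    return "ask" if policy.needs_approval else "allow"
```

Adding a new tool or tightening a rule means editing one row of `POLICIES`, not adding another branch to `decide`.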
This is what Jensen is arguing for when he wants a simple set of primitives to build an open source ecosystem for agents. In a way, I think Nvidia's engineers understand this better than a lot of the other engineers in the AI ecosystem right now, and that may be because they have to be so close to the kernel and so close to the metal all the time. You have to have good principles when you're trying to optimize for GPUs, and when you optimize for GPUs over time, you build an engineering culture that demands excellence and adherence to good best practices, and I see that written all over NemoClaw.

And I think that if we look at the story of how much trouble organizations are having adapting to AI, and if we ask ourselves, is it the message itself that's the problem or is it the way it's presented, I would kind of argue it's been the way it's presented, because I have seen so many consultants peddling complexity as if it was a good thing with AI. Like presenting some kind of complicated agentic mesh and saying this is the way, or presenting a really complicated change management paradigm, or presenting lots and lots and lots of very hard-to-read docs and saying, go dig into this, these are your prompting tools. Simpler scales. We need simpler approaches that enable people to understand what we're saying, and ironically, if we go back to the way we always engineered systems, we're going to find that a lot of those truisms like Rob Pike's rules still work. They're not out of style.

And that brings me to one of my favorite examples in the age of AI, because I want to make this more updated. Yes, there are new things, new changes, but we have to understand how these old structures are informing the new ways we work. I think Factory.ai has a wonderful example here. Their agent readiness framework evaluates codebases against eight different technical pillars.
Style and validation, build systems, testing, documentation, the dev environment, code quality, observability, and security and governance. And what they find is that, consistently speaking, the agent isn't the broken thing; the environment is, which goes back to that data insight. If you can fix your data structures, like linter configs, like documented builds, like dev containers, like an AGENTS.md file, agent behavior then becomes self-evident. It's effectively a corollary to what Pike was talking about years and years and years ago. And so Factory's data shows that getting these fixes right compounds in exactly the way we would expect it to, following good software engineering principles. If you have better environments, you make your agents more productive, which frees time to make your environments better, which in turn feeds the loop, and your agents get more productive over time.

And there's a convergence here around agentic best practices that I want to call out and name explicitly. So I'm talking about Factory's best practices, Nvidia's best practices, but also some of the way Anthropic organizes things, some of the way Microsoft organizes things. There is essentially a whole set of agentic rules of the road being published that are Pike's rules rediscovered by people who know their fundamentals. And I want to name the primitives that are emerging, because I think that we should understand these rules of the road that underlie best practices across a bunch of different companies and recognize their old roots, because I think it will help us to change more effectively.

So with that, I want to walk you through the five hard problems that I've seen in production agent deployment. I'm going to go through each one in detail, because the distribution of difficulty here tells you about where people are spending money, where people are expecting engineers to solve it internally, and really, what best practice looks like.
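To make the "fix the environment, not the agent" idea concrete, here is a small sketch of a readiness check. It is not Factory's framework: the signal names and the candidate file names (other than `AGENTS.md` and a dev container config, which the video mentions in general terms) are assumptions chosen because they are common conventions, and a real audit would look at far more than file presence.

```python
from pathlib import Path

# Hypothetical readiness signals. Each maps to file names that, by common
# convention, suggest the signal is present. These are illustrative guesses,
# not the checks any vendor actually runs.
SIGNALS: dict[str, tuple[str, ...]] = {
    "lint_config": ("ruff.toml", ".flake8", ".eslintrc.json"),
    "documented_build": ("README.md", "Makefile"),
    "dev_container": (".devcontainer/devcontainer.json",),
    "agent_instructions": ("AGENTS.md",),
}


def readiness_report(root: Path) -> dict[str, bool]:
    """Return, per signal, whether any candidate file exists under root."""
    return {
        name: any((root / candidate).exists() for candidate in candidates)
        for name, candidates in SIGNALS.items()
    }


def missing_signals(report: dict[str, bool]) -> list[str]:
    """List the signals that were not found, sorted for stable output."""
    return sorted(name for name, found in report.items() if not found)
```

Calling `missing_signals(readiness_report(Path(".")))` on a repository gives a short to-do list of environment fixes, which is the kind of work the transcript argues should come before blaming the agent.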
The first one is context compression. So long-running agent sessions fill up context windows. They just do. Even million-token context windows or 10-million-token context windows, they all fill up, and every compression strategy is lossy. It always loses something. So Factory tested three different production approaches to see which was best. They had their own method, which they call anchored iterative summarization. Big words. It maintains a structured and persistent summary with explicit sections for session intent, for file modifications, for decisions made, and for next steps. When the compression triggers, the newly truncated span gets summarized and then merged with the existing summary. So the structure essentially forces preservation; you can't break the previous summary, right?

Now, they compared this approach against OpenAI's compact endpoint, which produces very opaque output. You can't see what's in the black box, and it just gives you compressed representations that are optimized to be reconstructed faithfully. That's a fancy way of saying it's compressed very highly, and you can't read the output to verify what was preserved, because OpenAI famously doesn't expose any of that. And then they tested it against Anthropic's built-in compression through the Claude software development kit, which generates very detailed structured summaries, but regenerates the full summary every time rather than doing it incrementally. That difference starts to matter across repeated compression cycles, because when you're regenerating the whole summary, you're playing telephone again.

The results were clear. Factory's approach of incremental summarization scored the highest, but all three struggle with tracking artifacts. So if you're naming and remembering particular files, all three struggle with that a bit. And the mitigation here is pretty simple.
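The incremental approach described above can be sketched in a few lines. This is an illustration of the general idea only, not Factory's implementation: the section names follow the four listed in the transcript, while `AnchoredSummary`, `compress`, and the `summarize_span` callback (which in practice would be an LLM call) are hypothetical names invented here.

```python
from dataclasses import dataclass, field
from typing import Callable

# Section names taken from the description in the transcript.
SECTIONS = ("session_intent", "file_modifications", "decisions", "next_steps")


@dataclass
class AnchoredSummary:
    """Persistent, structured summary that is only ever appended to."""
    sections: dict[str, list[str]] = field(
        default_factory=lambda: {name: [] for name in SECTIONS}
    )

    def merge(self, update: dict[str, list[str]]) -> None:
        """Add new items section by section; existing items are never rewritten."""
        for name, items in update.items():
            if name not in self.sections:
                raise KeyError(f"unknown section: {name}")
            for item in items:
                if item not in self.sections[name]:
                    self.sections[name].append(item)


def compress(
    history: list[str],
    summary: AnchoredSummary,
    keep_last: int,
    summarize_span: Callable[[list[str]], dict[str, list[str]]],
) -> list[str]:
    """Summarize only the span being dropped, merge it, return what is kept.

    summarize_span stands in for a model call; it receives just the
    truncated messages, never the whole existing summary.
    """
    if len(history) <= keep_last:
        return history
    cut = len(history) - keep_last
    summary.merge(summarize_span(history[:cut]))
    return history[cut:]
```

Because earlier entries are never regenerated, repeated compression cycles cannot paraphrase them away, which is the "no telephone game" property the transcript attributes to the incremental method.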
You have to think about your project in terms of milestones and make sure that the milestones can be compressed in ways that allow the agent to continue to work. And if you cannot do that, you have multi-agent frameworks that allow the agent to pick off and address big pieces of work and then die and refresh the context window with a new agent without losing that context, so that you get these long-running tasks. That's how you get these multi-week agent runs and don't stuff the context window. You see how it all comes back to data? These are real 2026 agent problems, but they come back to underlying principles around how we handle data and complexity that aren't new.

Codebase instrumentation, that's another one. Gee, does that come back to Pike and measuring? It sure does. This isn't even an agent problem, right? This is a software hygiene problem. We have always had challenges when we've been doing engineering projects, especially where we've been in a rush, and it's been hard to be disciplined and measured. Making the codebase agent-ready is partly about being able to measure stuff, and we should not forget it. I don't want to belabor this one too long. If you are an engineer and you're like, I need to be able to make a contribution to AI, one of the simplest things you can do is just do the measuring. It's decades old, it's not new, but figure out how to say, this is our current baseline performance, maybe with our LLM chat window, maybe with our agent, whatever it is. If you can measure it effectively because you understand this is the baseline, this is what latency looks like, this is what a good set of responses looks like, and I have a nice golden dataset and I can true that up against what's in production, you have done a tremendous service to your business, and you don't get appreciated enough, probably.
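As a rough illustration of "just do the measuring," the sketch below baselines a model against a golden dataset. The `baseline` function, its exact-match scoring, and the idea of passing the model in as a plain callable are all assumptions made for this example; real evaluations usually need richer scoring than string equality.

```python
import statistics
import time
from typing import Callable


def baseline(
    model: Callable[[str], str],
    golden: list[tuple[str, str]],
) -> dict[str, float]:
    """Run every golden prompt once; record accuracy and latency.

    model is any callable from prompt to answer (an LLM client wrapper,
    an agent entry point, or a stub). golden holds (prompt, expected) pairs.
    """
    if not golden:
        raise ValueError("golden dataset must not be empty")

    latencies: list[float] = []
    correct = 0
    for prompt, expected in golden:
        start = time.perf_counter()
        answer = model(prompt)
        latencies.append(time.perf_counter() - start)
        # Exact match after normalization: a deliberately simple scorer.
        if answer.strip().lower() == expected.strip().lower():
            correct += 1

    return {
        "accuracy": correct / len(golden),
        "median_latency_s": statistics.median(latencies),
        "max_latency_s": max(latencies),
    }
```

Running this before and after any prompt or architecture change gives the comparison point that Pike's second rule asks for.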
But it's really important, and it's not new. It's just that we have to take it seriously, because we are giving these autonomous agents a lot of power, and we're not really measuring them if we're not disciplined.

Problem number three in agentic coding work is around linting. Now, if you don't know what linting is, I'm not talking about the stuff in your couch cushions. Linting is when you are doing static analysis of the code. You're not making changes; you're just checking it for small style issues, for inconsistencies, for potential bugs at runtime, and you're coming up with a report. Linting rules are how we make linting work, and one of the ways that you can detect issues with agentic code is by getting very, very strict with your linting so that you are insistent on extremely clean code. This isn't new, right? This is about enforcing simple structures. The Factory team has this lengthy series of blog posts about all of the obsessive linting rules they have that basically put the code in a straitjacket and say, it must adhere to best practices all the time. Now, individual developers, if they're the ones in charge of linting, may say, ah, I don't know, I'm tired, I don't really want to write all my linting rules. But in a good, healthy engineering organization, you have some common core around linting where you say, okay, this is what good looks like for us. We're going to insist on it, and that's especially important when you have agents involved, because the agents are by definition just trying to get the job done. They are lazy developers; they are happy just to kind of throw it off their plates and not listen. And so if you don't have a strict linter that is going to go through and insist on simplicity, you are going to be in trouble. Again, not a new thing. It's just a common thing that we are now applying in the world of agents. An ancient piece of engineering wisdom, if you will.

Problem number four: how you handle multi-agent coordination.
I've talked about this in other videos. We're converging around a rule where we say planners and executors are the way to do long-running multi-agent coordination, and that makes sense because we're not overcomplicating it. And one of the things that Pike has called us to remember is, hey, you don't need to optimize something prematurely. You don't need to optimize it if you can't measure it. And so when teams try to over-optimize and overcomplicate, and there are engineering teams at many orgs that try to do this, I encourage folks to say, you know what, let's not overcomplicate it. Build the simplest possible version of this agentic development pipeline, and then we can always add more value by complexifying it if we really have to. But we don't need to optimize prematurely if we can't even measure whether it does the job yet. Again, not new.

And if you're wondering, why am I taking time to talk about what isn't new? It's really simple. I think consultants often like to sell this as all new because it drums up business. I would prefer to tell the truth and say, these are ancient data engineering practices. These are old software engineering best practices that we can apply in ways that are new to build these systems, but the practices and principles aren't that new, and I think that helps us with our change management.

The last challenge is the hardest one. It's around specifications and fatigue. What I find in practice is that teams really, really struggle with the skill of defining a spec clearly up front. It's a lot of work. There are some people who claim it can't be done, or that it's so much work we should just code the thing. I've seen real speedups, but it does require you to be very precise and crystal clear in your thinking, and you also have to be very good at writing evals at the end. And you have to be disciplined about not taking shortcuts.
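For the planner and executor split mentioned above, the "simplest possible version" might look like the sketch below. It is a hypothetical outline, not any vendor's pipeline: `Planner`, `Executor`, and `run_pipeline` are names invented for this example, and both roles are passed in as plain callables so that stubs, scripts, or real agents can be swapped in.

```python
from typing import Callable

# A planner turns a goal into an ordered list of steps.
Planner = Callable[[str], list[str]]
# An executor handles exactly one step and returns its result.
Executor = Callable[[str], str]


def run_pipeline(goal: str, plan: Planner, execute: Executor) -> list[tuple[str, str]]:
    """Plan once, then execute each step in order.

    Each executor call receives only its own step, so every step can run
    in a fresh context instead of inheriting the whole session.
    """
    results: list[tuple[str, str]] = []
    for step in plan(goal):
        results.append((step, execute(step)))
    return results
```

Retries, parallelism, or re-planning can be layered on later, but only once measurement shows the simple loop is the bottleneck.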
And so if you are going to give an agent a context window, you have to be disciplined about making sure your context graph is really clean, so the agent can go search and get the context it needs clearly by navigating a hierarchy, rather than you just stuffing it all in the context window and hoping and praying because you're lazy. In other words, we humans have to be less lazy if we want the agents to do good work for us. And I know that is counterintuitive, because you are often sold a world where humans should just sit back, go and get coffee, and then we're done. That's not how it actually works, and that's never how good engineering worked. It shouldn't be new, it shouldn't be a surprise. And I think sometimes we're sold agents as, like, labor savers, and that's just disingenuous. It's just not true.

So why does all this hype exist? I went through five problems. I showed you how they're critical now in the world of agents, and I showed you how they rest on old engineering best practices. I think if we messaged them that way, it would be useful to us. I think it would be easier to understand. I think that Anthropic and OpenAI would have fewer issues communicating to developers. I think it's something that NemoClaw is starting to get right. Part of why, as an industry, we have not done this well is that the chaos is worth a lot of money. Consultants coming in and peddling their wares and saying, this study shows that it's really hard, helps them earn business. And it is hard, right? But it's hard in a way consultants typically don't help you with. It's hard in a roll-up-your-sleeves, get-into-the-code, co-build-with-me, dig-in, help-me-understand-the-principles way, and so many times consultants don't want to get their shoes dirty, right? They want to come in and just do a PowerPoint deck. They want to deliver a great deck and then move on. That's not how it works, right?
If you're going to do real change management, if you're going to help engineers and product managers and designers figure out how their roles are changing because their whole jobs are changing, you can't do it with a PowerPoint deck. It's not going to work that way. You have to go back and anchor in things that we all understand and build on that. As I've shown, you can do that. And then you have to walk forward and say, here's how this applies today. That's why I walk through these problems. That is much more specific than what I have seen in any standard run-of-the-mill consultant deck, which so often stays at a high level and talks fluffily about how great AI is. It doesn't help you get the work done.

And this is what I think we're missing when we look at launches like NemoClaw. Because NemoClaw as a launch is interesting. NemoClaw is a play for Nvidia. Definitely interesting; they're trying to move beyond chips. But NemoClaw is a way of saying to the industry, you got this, you can figure this out. We've got good engineering best practices that we can rely on and use to do real agent work. Now that's interesting. And that's something that I wish we did more of, and I think if we worked more on that piece as a discipline, we would have less need for these tie-ups that we see between consulting firms and big companies like OpenAI and Anthropic. Because I think at the end of the day, in a sense, when you're outsourcing the change management, you are losing control of the narrative. And one thing Anthropic and OpenAI probably don't want to do is lose control of the AI change narrative in their target companies. It is already fraught enough. There are already enough people producing half-true rumors, sometimes completely false rumors, about what AI can and cannot do, and what AI will and will not do. And by the way, it is both. I see lots of false rumors about what AI can do, and I see lots of false rumors about what it can't.
I think it's helpful if we go back and we say, this is just computing. We've known about computing for a long time. We understand how computing works. The fundamentals aren't changing, but we have a new level of abstraction to put over the top, and we should talk about it concretely and explain in a detailed way how our old principles of engineering have actually evolved. And that's what I tried to do in this video. That's what I laid out for you so you could understand that we're not doing new stuff here when we design agentic systems. We're relying on good engineering practices we've already had, and in a way, a lot of what I'm doing on this channel is actually teaching good data engineering practices to a lot of people who didn't come up and do data engineering in school. Because it turns out if you want to build these systems yourself, you have to know just enough about data engineering to build systems that work. It turns out it's not scary. It turns out you can learn these principles. You don't have to go and get a CS degree, and that's really empowering, and that's really cool. And that's really fun for me because, I'll be honest, I didn't get a CS degree either. I taught myself, I was building computers, I had fun.

And I think what's interesting is LLMs are essentially a teachable moment. LLMs are giving so many more people access to compute that we're all coming to this with fresh eyes. Because when we look at change management in orgs, I've talked about engineers, but to be honest with you, it's not just engineers, right? It's product managers, it's sales, it's CS. Shopify was shocked when they first got Cursor because there were so many CS people who wanted Cursor, right? They were coding under the desk. Coding under the desk is a massive 2026 phenomenon that is by definition not engineering related. And if you want the coding under the desk to work, you've got to make sure that we have a little bit of a sense of how best practices work.
And if we understand that, we're going to be able to take tools like NemoClaw and actually put them to work effectively. So hats off to Nvidia for believing in us a little bit, right? For saying, we can roll our own, we can build stuff that works, we can understand how good data engineering best practices, old computer science best practices that age well, are still applicable today, evolve them appropriately, and tackle good agentic engineering challenges. I want more of that, and I hope you do too. Cheers.
