"I shipped code I don't understand and I bet you have too" – Jake Nations, Netflix

AI Engineer

20m 25s · 3,359 words · ~17 min read
YouTube auto captions

This transcript was extracted from YouTube's auto-generated caption track. The transcript below is server-rendered so it can be read, searched, cited, and shared without opening the original YouTube player.

[0:21] Hey everyone, good afternoon. Um, I'm going to start my talk with a bit of a confession. Uh, I've shipped code I didn't quite understand. Generated it, tested it, deployed it, couldn't explain how it worked. And here's the thing though, I'm willing to bet every one of you has too.

[0:40] So, now that we've all admitted that we all ship code that we don't understand anymore, I want to take a bit of a journey to see how this has come to be. First, if we look back in history, we see that history tends to repeat itself. Second, we've fallen into a bit of a trap. We've confused easy with simple. Lastly, there is a fix, but it requires us not to outsource our thinking. So, I spent the last two years at Netflix helping drive adoption of AI tools, and I have to say the acceleration is absolutely real. Backlog items that used to take days now take hours, and large refactors that have been on the books for years are finally being done. Here's the thing though, large production systems always fail in unexpected ways. Like, look what happened with Cloudflare recently. When they do, you better understand the code you're debugging. And the problem is now we're generating code at such speed and such volume, our understanding is having a hard time keeping up. Hell, I know I've done it myself. I've generated a bunch of code, looked at it, thought, I have no idea what this does. But, you know, the tests pass, it works, so I shipped it. The thing here is, this isn't really new. Every generation of software engineers has eventually hit a wall where software complexity has exceeded their ability to manage it. We're not the first to face a software crisis, we're the first to face it at this infinite scale of generation. So let's take a step back to see where this all started. In the late 60s, early 70s, a bunch of smart computer scientists at the time came together and said, hey, we're in a software crisis. We have this huge demand for software, and yet we're not really able to keep up and projects are taking too long and it's just really slow. We're not doing a good job.
So, Dijkstra came up with a really great quote, and he said, I mean, to paraphrase a longer quote, when we had a few weak computers, programming was a mild problem, and now we have gigantic computers, programming has become a gigantic problem. He was explaining that as hardware power grew by a factor of a thousand, society's demand for software grew in proportion. And so it left us, the programmers, to figure out, between the ways and the means, how do we support this much more software? So, this kind of keeps happening in a cycle. In the 70s, we get the C programming language so we can write bigger systems. In the 80s, we have personal computers, now everyone can write software. In the 90s, we get object-oriented programming. Inheritance hierarchies from hell where, you know, thanks Java for that. In the 2000s, we get Agile, where we have sprints and scrum masters telling us what to do. There's no more waterfall. In the 2010s, we get cloud, mobile, DevOps, you know, everything. Software truly ate the world. And today, now we have AI, you know, Copilot, Cursor, Claude, Codex, Gemini, you name it. We can generate code as fast as we can describe it. The pattern continues, but the scale has really changed. It's infinite now. So, uh, Fred Brooks, you might know him from writing The Mythical Man-Month. He also wrote a paper in 1986 called No Silver Bullet. And in this, he argued that there would be no single innovation that would give us an order of magnitude improvement in software productivity. Why? Because he said the hard part was never the mechanics of coding, the syntax, the typing, the boilerplate. It was about understanding the actual problem and designing the solution, and no tool can eliminate that fundamental difficulty. Every tool and technique we've created up until this point makes the mechanics easier. The core challenge though, understanding what to build and how it should work, remains just as hard.
So, if the problem isn't in the mechanics, why do we keep optimizing for it? How do experienced engineers end up with code they don't understand now? The answer, I think, comes down to two words we tend to confuse, simple and easy. We tend to use them interchangeably, but they really mean completely different things. Uh, I was outed at the speaker dinner as being a Clojure guy, so this is kind of clear here. But Rich Hickey, the creator of the Clojure programming language, explained this in his talk from 2011 called Simple Made Easy. He defined simple as meaning one fold, one braid, and no entanglement. Each piece does one thing and doesn't intertwine with others. He defines easy as meaning adjacent, reachable. What's within reach, what can you access without effort? Copy, paste, ship. Simple is about structure, easy is about proximity. The thing is, we can't make something simple by wishing it so. Simplicity requires thought, design, and untangling, but we can always make something easier. You just put it closer: install a package, generate it with AI, you know, copy a solution off Stack Overflow. It's human nature to take the easy path. We're wired for it. You know, as I said, copy something from Stack Overflow, it's right there. A framework that handles everything for you with magic, install and go. But easy doesn't mean simple. Easy means you can add to your system quickly, simple means you can understand the work that you've done. Every time we choose easy, we're choosing speed now, complexity later. And honestly, that trade-off really used to work. The complexity accumulated in our codebases slowly enough that we could refactor, rethink, and rebuild when needed. I think AI has destroyed that balance, because it's the ultimate easy button. It makes the easy path so frictionless that we don't even consider the simple one anymore. Why think about architecture when code appears instantly?
So let me show you how this happens, how a simple task evolves into a mess of complexity through a conversational interface that we've all come to love. You know, this is a contrived example, but say we have our app and we want to add some authentication to it. We say, add auth, so we get a nice clean auth.js file. Iterate a few times, and by message five you're like, okay, cool, we're going to add OAuth now too. Now we've got an auth.js and an oauth.js. We keep iterating, and, you know, we find that sessions are broken and we've got a bunch of conflicts. And by the time we get to turn 20, you're not really having a discussion anymore, you're managing context that's become so complex that even you don't remember all the constraints that you've added to it. Dead code from abandoned approaches, uh, tests that got fixed by just making them work. You know, fragments of three different solutions because you keep saying, "wait, actually." Each new instruction is overriding architectural patterns. We said make the auth work here, it did. We said fix this error, it did. There's no resistance to bad architectural decisions. The code just morphs to satisfy your latest request. Each interaction is choosing easy over simple, and easy always means more complexity. We know better, but when the easy path is just this easy, we take it, and complexity is going to compound until it's too late.

[7:29] AI really takes easy to its logical extreme. Decide what you want, get code instantly. But here's the danger in that. The generated code treats every pattern in your codebase the same. Yeah, when an agent analyzes your codebase, every line becomes a pattern to preserve. The authentication check on line 47, that's a pattern. That weird gRPC code that's acting like GraphQL that I may have added in 2019, that's also a pattern. Technical debt doesn't register as debt, it's just more code. The real problem here is complexity. I know I've been saying that word a bunch in this talk without really defining it, but the best way to think about it is it's the opposite of simplicity. It just means intertwined, and when things are complex, everything touches everything else. You can't change one thing without affecting 10 others. So here's a real example from, uh, some work we're doing at Netflix. I have a system that has an abstraction layer sitting between our old authorization code we wrote, say, five or so years ago, and a new centralized auth system. We didn't have time to rebuild our whole app, so we just kind of put a shim in between. So, now we have AI, this is a great opportunity to refactor our code to use the new system directly. Seems like a simple request, right? Uh, no. The old code was just so tightly coupled to its authorization patterns. Like, we had permission checks woven through business logic, role assumptions baked into data models, and authentication calls scattered across hundreds of files. The agent would start refactoring, get a few files in, hit a dependency it couldn't untangle, and just spiral out of control or give up. Or worse, it would try to preserve some existing logic from the old system and recreate it using the new system, which I think is not great either. The thing is, it couldn't see the seams. It couldn't identify where business logic ended and auth logic began.
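To make the idea of a "seam" concrete, here is a minimal sketch. It is not Netflix code: `User`, `Order`, `Authorizer`, `LegacyRoleAuthorizer`, and both `cancel_order` functions are hypothetical names invented for illustration. The first function weaves a role check into the business operation; the second puts authorization behind a small interface, so replacing the auth system would touch one class instead of every caller.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class User:
    name: str
    role: str


@dataclass
class Order:
    owner: str
    status: str = "open"


# Entangled: the role rule lives inside the business operation,
# so every function written like this changes when auth changes.
def cancel_order_entangled(user: User, order: Order) -> None:
    if user.role != "admin" and user.name != order.owner:
        raise PermissionError("not allowed")
    order.status = "cancelled"


# A seam: business logic depends only on this small interface.
class Authorizer(Protocol):
    def can(self, user: User, action: str, order: Order) -> bool: ...


class LegacyRoleAuthorizer:
    """Hypothetical stand-in for the old role-based checks."""

    def can(self, user: User, action: str, order: Order) -> bool:
        return user.role == "admin" or user.name == order.owner


def cancel_order(user: User, order: Order, auth: Authorizer) -> None:
    # The only auth knowledge here is "ask the authorizer".
    if not auth.can(user, "cancel", order):
        raise PermissionError("not allowed")
    order.status = "cancelled"
```

With the second shape, a migration to a different auth system would mean writing one new class with a `can` method and passing it in, while `cancel_order` stays unchanged.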
Everything was so intertwined that even with perfect information, the AI couldn't find a clean path through. When your accidental complexity gets this tangled, AI's not the best help to actually make it any better. I found it only adds more layers on top. So, how do you actually do it? How do you separate the accidental and essential complexity when you're staring at a huge codebase? The codebase I work on at Netflix has around a million lines of Java, and the main service in it is about 5 million tokens, last time I checked. No context window I have access to, uh, can hold it. So, when I wanted to work with it, I first thought, hey, maybe I could just copy large swaths of this codebase into the context and see if any patterns emerged. See if it would just be able to figure out what's happening. And just like the authorization refactor from previously, the output just got lost in its own complexity. So, with this, I was forced to do something different. I had to select what to include: design docs, architecture diagrams, key interfaces, you name it. And take time writing out the requirements of how components should interact and what patterns to follow. See, I was writing a spec. Uh, 5 million tokens became 2,000 words of specification. And then, to take it even further, take that spec and create an exact set of steps for the code to execute. No vague instructions, just a precise sequence of operations. I found this produced much cleaner and more focused code that I could understand. As I said, I defined it first and planned its execution. This became the approach which I called context compression a while ago, but you can call it context engineering or spec-driven development, whatever you want. The name doesn't matter. The only thing that matters here is that thinking and planning become a majority of the work. So let me walk you through how this works in practice. So we have step one, phase one: research. You know, I go and feed everything to it up front.
Architecture diagrams, documentation, Slack threads. I mean, we've been over this a bunch, but really just bring as much context as you can that's going to be relevant to the changes you're making. And then, use the agent to analyze the codebase and map out the components and dependencies. This shouldn't be a one-shot process. I like to probe, say, like, what about the caching? How does this handle failures? And when its analysis is wrong, I'll correct it. And if it's missing context, I'll provide it. Each iteration refines its analysis. The output here is a single research document. Here's what exists, here's what connects to what, and here's what your change will affect. Hours of exploration are compressed into minutes of reading. I know Dex mentioned that this morning, but the human checkpoint here is critical. This is where you validate the analysis against reality. It's the highest-leverage moment in the entire process. Catch errors here, prevent disasters later. On to phase two. Now that you have some valid research in hand, we create a detailed implementation plan. Real code structure, function signatures, type definitions, data flow. You want this to be something any developer can follow. I kind of liken it to paint by numbers. You should be able to hand it to your most junior engineer and say, go do this, and if they copy it line by line, it should just work. This step is where we make a lot of the important architectural decisions. You know, make sure complex logic is correct. Make sure business requirements are, you know, following good practice. Make sure there are good service boundaries, clean separation, and no unnecessary coupling. We spot the problems before they happen because we've lived through them. AI doesn't have that option. It treats every pattern as a requirement. The real magic in this step is the review speed. We can validate this plan in minutes and know exactly what's going to be built.
And in order to keep up with the speed at which we want to generate code, we need to be able to comprehend what we're doing just as fast. Lastly, we have implementation, and now that we have a clear plan, backed by clear research, this phase should be pretty simple. And that's the point. You know, when AI has a clear specification to follow, the context remains clean and focused. We've prevented the complexity spiral of long conversations, and instead of 50 messages of evolutionary code, we have three focused outputs, each validated before proceeding. No abandoned approaches, no conflicting patterns, no "wait, actually" moments that leave dead code everywhere. To me, the real payoff of this is that you can use a background agent to do a lot of this work. Because you've done all the thinking and hard work ahead of time, it can just start the implementation, you can go work on something else and come back to review. And you can review this quickly because you're just verifying that it's conforming to your plan, not trying to understand if anything got invented. The thing here is, we're not using AI to think for us. We're using it to accelerate the mechanical parts while maintaining our ability to understand it. Research is faster, planning is more thorough, and the implementation is cleaner. The thinking, the synthesis, and the judgment, though, that remains with us. So, remember that authorization refactor that AI couldn't handle? The thing is, we're actually, you know, working on it now and starting to make some good progress on it. And it's not because we found better prompts. We found we couldn't even jump into doing any sort of research, planning, and implementation. We actually had to go make this change ourselves by hand, no AI, just reading the code, understanding the dependencies, and making changes to see what broke. That manual migration was, I'll be honest, it was a pain, but it was crucial.
It revealed all the hidden constraints, which invariants had to hold true, and which services would break if the auth changed. Things no amount of code analysis would have surfaced for us. And then, we fed that pull request of the actual manual migration into our research process and had it use that as the seed for any sort of research going forward. The AI could then see what a clean migration looks like. The thing is, each of these entities is slightly different, so we have to go and interrogate it and say, hey, what are we going to do about this? Some things are encrypted, some things are not. We had to provide that extra context each time through a bunch of iteration. Then, and only then, we could generate a plan that might work in one shot. And "might" is the key word here: we're still validating, still adjusting, and still discovering edge cases. The three-phase approach is not magic. It only works because we did this one migration by hand. We had to earn the understanding before we could encode it into our process. I still think there's no silver bullet. I don't think it's better prompts, better models, or even writing better specs. Just the work of understanding your system deeply enough that you can make changes to it safely. So why go through with all this? Like, why not just iterate with AI until it works? Like, eventually won't models get strong enough that it just works? The thing to me is, "it works" isn't enough. There's a difference between code that passes tests and code that survives in production. Between systems that function today and systems that can be changed by someone else in the future. The real problem here is a knowledge gap. When AI can generate thousands of lines of code in seconds, understanding it could take you hours, maybe days if it's complex. Who knows? Maybe never, if it's really that tangled. And here's something that I don't think many people are even talking about at this point.
Every time we skip thinking to keep up with generation speed, we're not just adding code that we don't understand. We're losing our ability to recognize problems. That instinct that says, hey, this is getting complex, it atrophies when you don't understand your own system. Pattern recognition comes from experience. When I spot a dangerous architecture, it's because I'm the one who's been up at 3:00 in the morning dealing with it. When I push for simpler solutions, it's because I've had to maintain the alternative from someone else. AI generates what you ask it for. It doesn't encode lessons from past failures. The three-phase approach bridges this gap. It compresses understanding into artifacts we can review at the speed of generation. Without it, we're just accumulating complexity faster than we can comprehend it. AI changes everything about how we write code, but honestly, I don't think it changes anything about why software itself fails. Every generation has faced its own software crisis. Dijkstra's generation faced it by creating the discipline of software engineering. And now we face ours with infinite code generation. I don't think the solution is another tool or methodology. It's remembering what we've always known, that software is a human endeavor. The hard part was never typing the code, it was knowing what to type in the first place. The developers who thrive won't just be the ones who generate the most code, they'll be the ones who understand what they're building, who can still see the seams, who can recognize that they're solving the wrong problem. That's still us. That will only be us.
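The three-phase approach described in this section (research, then plan, then implementation, each validated by a person before the next begins) can be sketched as a small gated pipeline. This is an illustration under stated assumptions, not the speaker's tooling: `Artifact`, `run_phases`, and the `produce` and `approve` callables are hypothetical names, and the "agent" is simply a function passed in.

```python
from dataclasses import dataclass
from typing import Callable, List

PHASES = ["research", "plan", "implementation"]


@dataclass
class Artifact:
    phase: str
    content: str
    approved: bool = False


def run_phases(
    produce: Callable[[str, List[Artifact]], str],
    approve: Callable[[Artifact], bool],
) -> List[Artifact]:
    """Run research -> plan -> implementation, stopping at the
    first artifact the human reviewer rejects."""
    done: List[Artifact] = []
    for phase in PHASES:
        # Each phase receives only the approved artifacts before it,
        # not a long conversational history.
        artifact = Artifact(phase, produce(phase, done))
        artifact.approved = approve(artifact)  # the human checkpoint
        done.append(artifact)
        if not artifact.approved:
            break  # fix this artifact before generating anything else
    return done
```

A caller would supply `produce` (for example, a wrapper around whatever agent is in use) and `approve` (a person reading the research document or plan). Rejecting the plan stops the run before any implementation is generated, which is the point of reviewing the smaller artifacts first.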

[20:01] I want to leave on a question, and I don't think the question is whether or not we will use AI. That's a foregone conclusion. The ship has already sailed. To me, the question is going to be whether we will still understand our own systems when AI is writing most of our code. Thank you.
