[0:21]Unfortunately, I think I am going to contribute to the AI Native Dev mix of dark slides and light slides, so apologies for that. As Alan said, I'm John, VP of Engineering for Ubuntu. I've been at Canonical for five years, and in that time I've mostly worked on cloud-native orchestration tools. But for the last year I've been working on this little open source project that some of you might have heard of, called Ubuntu. I'm here not to tell you that Canonical is pivoting to becoming an AI company. I don't think that's the case; it's certainly not on my immediate roadmap. I'm here to tell you that it's been here all along, in a sense. Ubuntu has been here for the last 21, 22 years, and I suspect it will be here for the next 20 if we don't make any huge mistakes in the coming three to four years. And we are in the privileged position where I can say this today: Ubuntu powers the majority of today's AI workloads.
You might ask why that is. Some of it is because the wonderful agents we have today, when you ask them how to do something, are really good at saying: on Ubuntu, sometimes on Ubuntu or Debian, type this command. I think that is a representation of the fact that for 20 years the orange Linux has been the one people reach for. That is a combination of exceptional strategy from the previous people at Canonical and a healthy dollop of luck. When you launch cloud instances today, most of you probably don't care what kind of Linux you get, but in reality it's probably ours. It doesn't matter whether you're on Google Cloud, Amazon, DigitalOcean, Vultr, Hetzner, or whatever you like: if you launch a VM, there's a good chance it's going to be Ubuntu. Whether or not we are in fact at the hallowed year of the Linux desktop, we have very much been the year of the Linux server for many, many years.
But none of those things are particularly new; those have been facts since long before I got involved with Ubuntu. One of the things I think is really fascinating in this particular era is something no one really talks about, because it's not glamorous, it's not glitzy, and it doesn't get great headlines on Reddit and Phoronix, where they like to tell us that our choices are wrong for choosing Rust and such. It's our very, very strong partner business. We're actually a pretty small company in the enterprise Linux game: about 1,300 people, of which about 1,000 are engineers. When I joined five years ago, we were 550 in total. The fact remains that SUSE is, I think, five times our size, and Red Hat is 25 times our size if you ignore the big IBM-shaped thing on the side of it. So we are relatively little, but I think we punch well above our weight when it comes to working with the people you see on screen and more. I couldn't get all of the logos on here; there are lots of fun ones like MediaTek and Rivos and so on. And this makes a huge difference, because it means that when you go down to whichever consumer electronics store you want, or whichever internet business you buy your computer from, and you want to play with AI stuff, you open the lid and it is actually going to work. This is particularly true of Nvidia hardware in recent months, but we are expanding massively with AMD.
We have always done lots of work with Intel, and with things like Qualcomm Dragonwing, which is one of their edge IoT platforms, we were right there on launch day, good to go: support in the kernel, support for all of the accelerators, not just the CPU and GPU but all of the NPUs and TPUs and the assorted other PUs they're starting to introduce.
And this, I think, was a pretty big moment. There's this little circuit company called Nvidia that some of you will have heard of, and in the late part of last year they released an arm64-only AI workstation called the Nvidia DGX Spark, to much applause. What was different about this for us is that Nvidia have shipped Ubuntu for years, in agreement with us, and it's been called DGX OS or Nvidia-something OS, but it's been Ubuntu with a bunch of stuff installed and some different kernel configurations. What changed here is that they just ship Ubuntu and they call it Ubuntu. And not only do they ship Ubuntu and call it Ubuntu: it is the only thing they will support on the DGX Spark. Now, I don't have all of their product roadmap and I don't know what's coming up, but I'd take a pretty good educated guess about their other workstations in the future. This is the Grace Blackwell architecture, also sold by Dell and, I think, Lenovo and possibly HP, and all of them run Ubuntu. The reason they run Ubuntu is that they can boot Ubuntu on it and it just works: all of the drivers are there, the kernel behaves properly, all of the accelerators are there. This was a pretty huge moment for us, but I also think it was quite a big moment for the AI development community, because developing on the operating system that is running in your cloud is, and has always been, fairly obviously a bit of an advantage.
Here's something I'm very excited about. We do an LTS release every two years. These are the ones that realistically everybody runs; something like the high nineties percent of all our users run an LTS. They are 22.04, 24.04, and the one coming out in April, 26.04. For the first time, you will be able to apt install CUDA or apt install ROCm, with no other commands, on a base Ubuntu system, and get exactly the version that works with your version of Ubuntu and your GPU, with no messing about. I have personally suffered through this with a collection of quite high-end AMD machines and various versions of ROCm and the driver and various other things, and I'm told it is equally painful in CUDA land, so I think this is kind of a big deal. It also speaks to that cloud story. In reality, all of the clouds are running these big shiny green GPUs in their data centers too, and not having to invest engineering time in working out how to get the latest version of CUDA, security-maintained for 15 years, which is our kind of promise, is a pretty big thing. It's particularly big for developers getting started in this industry. Lots of you will already know how to wrangle CUDA into shape and make ROCm work with PyTorch; good for you, and if you have any ideas, do please let me know. But I think it's going to get a lot easier because of this. It's just going to be out of the box, essentially.
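To make that concrete, here is roughly what that 26.04 flow is meant to look like. This is a minimal sketch based only on what's described above; the bare package names (cuda, rocm) are assumptions, not confirmed metapackage names:

```bash
# On a base Ubuntu 26.04 system (per the talk; package names are assumptions)
sudo apt update

# NVIDIA machines: pull the CUDA stack matched to this Ubuntu release and GPU
sudo apt install cuda

# AMD machines: likewise for ROCm
sudo apt install rocm

# Quick sanity checks that the userspace stacks can see the hardware
nvidia-smi    # NVIDIA
rocminfo      # AMD
```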
Having said that we're not, in quotes, an AI company, we have released something relatively new which is distinctly an AI product. Not a product in the sense that we are monetizing it and selling it to you; a product in the sense that you can go and get it and use it, it's open source, and we built it. I think this is really interesting for the hobbyist, the tinkerer, the developer at the moment, but I think it has huge potential to go further than that. So, in the world of Linux packaging, and the little microcosm of internet culture that that is, there are these things called snaps. Most people in the world don't care that there is a thing called snaps. It is a packaging format, a packaging format we invented, and it has some interesting properties for many, many applications on the internet and in computing. But those exact same properties make it especially interesting, I think, in the world of AI. Snaps are a confined package format. Unlike a deb, where you apt install, or an RPM package, where you dnf install, a snap is a little bit more like a Docker container: it is a compressed file system full of an application and all of its dependencies, but critically, it runs in a security-confined environment. We use the AppArmor Linux security module, which is a bit like SELinux for those of you that have been in the Linux space. And that means you can install a snap with all kinds of scary stuff in it, run it, and not worry about it doing wild things to your machine. Does this sound familiar in the AI space?
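If you want to see that confinement for yourself, snapd ships the tooling to inspect it. A small sketch, using the stock hello-world snap as a stand-in for something scarier:

```bash
# Install a strictly confined snap
sudo snap install hello-world

# 'confinement: strict' means the app runs inside the AppArmor sandbox
snap info --verbose hello-world | grep confinement

# List exactly which interfaces (deliberate holes in the sandbox) it is granted
snap connections hello-world
```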
[8:30]So we've started distributing these things called inference snaps, and this is a bit of a play towards how we make this stuff easy for people to get working with. At the moment it's all very fun to be at the cutting edge, playing on Hugging Face, looking at the rankings, trying to work out which model is going to fit on your GPU, and do I want llama.cpp or do I not want llama.cpp. But the 90% AI engineer, if not now then in the next six months, is going to say: just give me the model. I've heard of Gemma, I've heard of Nemotron, I've heard of Qwen; how do I play with it without getting a PhD in Hugging Face? So that's what we've done. Inference snaps are high-quality, silicon-optimized AI models for everybody. And this goes back to my previous point: the key here is that they are actually optimized by the silicon company. We are partnering with AMD and Nvidia and Intel and so on. When we first started talking about this, no one was really interested; when we launched the first one, everyone started kicking our door down. So we are slowly rolling out more models, but the gist of it is that on any Ubuntu machine in the world you can type snap install Qwen VL, snap install DeepSeek R1, snap install Gemma 3 or Nemotron 3 Nano, and what you get is a working model that has been optimized for the hardware on your machine by the people who built the silicon, delivered by us and maintained by us. And it doesn't just come with the model itself, because what do you do if you just download the model in its bare form? Not super interesting. It comes in a silicon-optimized form with the right inference engine for that model, according to the manufacturer. And it gives you a super nice onboarding if you just want to play with local AI models, hook them up to Ollama, hook them up to Continue, whatever it is you might want to do.
This is the gory-guts diagram. There's a lot going on here, but it's actually relatively simple. The snap, the big orange box, is the squashfs file system, inside which is the stuff. We have this thing called an engine manager, which is shared between all of our inference snaps. The engine manager has some nice capabilities, like understanding what a machine is, what API level your card supports, that kind of thing. We have engine manifests, which describe the different kinds of machine you might be: you're an AMD GPU, you're a CUDA-capable Nvidia GPU. Then we have the inference engine, which is swappable, could be llama.cpp, could be something else, plus the runtime and the model. All of that is hooked up automatically when you install it, through the snap confinement, so it punches just enough of a hole through the confinement to speak to the kernel and learn what it needs about the machine. But the outcome is this: you can snap install Qwen VL, and then you can qwen-vl chat, and in your terminal, because it wouldn't be a Linux talk without some terminal chat, you can chat with the model. We've got four of these at the moment.
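As a concrete sketch of that flow, assuming the snap is named qwen-vl (the actual store names may differ):

```bash
# Install the silicon-optimized model; the engine manager picks the right
# inference engine and variant for the GPU (or CPU) it detects on this machine
sudo snap install qwen-vl

# Talk to it straight from the terminal
qwen-vl chat
```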
[11:54]But that is the least interesting thing you can do with it, because every single one of them also comes with an OpenAI-API-compatible endpoint, just running on localhost on a port. Each of them comes up on a distinct port, so you can co-install them. Some of them can be running one engine and runtime, some another. If you've got a big hunky machine with a bunch of GPUs in it, you can have one running on your ROCm-capable AMD card and another running on your CUDA-capable Nvidia card; that's not a problem. And basically anything that can speak the OpenAI API spec will be able to speak to this thing.
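For instance, anything that can POST an OpenAI-style chat completion can talk to it. A minimal sketch, where the port (8090 here) and model name are assumptions, since each snap picks its own:

```bash
# Query the local OpenAI-compatible endpoint exposed by an inference snap
curl -s http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen-vl",
        "messages": [{"role": "user", "content": "Hello from localhost"}]
      }'
```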
[12:41]I'll do a little demo of this at the end. All of the examples on here are talking to localhost, but there is nothing to stop you slapping a Caddy proxy in front of this, sticking it on a cloud machine with a big honking H100 in it, and using it that way. So, assuming you have access to cloud machinery that has access to Ubuntu, which I think is everybody, you can make use of this immediately. It's particularly fun to play with on a machine that has some capability; unfortunately, when I do the demo, I'll be talking to a thing that has absolutely no acceleration whatsoever, so it's real slow, but I promise you it's cool on a machine that's a little bit quicker than my laptop.
The other thing that I think matters for us, as the de facto operating system for a bunch of engineers, is sandboxing agents. I am probably preaching to the choir in here when I say that agents have clearly been a big step up in capability. That is the lived experience of lots of people I speak to in my job, and also my own experience. It went from, hey, there's this cool bot I can talk to in my browser, to, okay, I actually have a fleet of robots that can half do some of the boring stuff, and actually some of the more interesting stuff, for me. I imagine lots of you are in a position where it's difficult to imagine going back to doing certain classes of work without an agent; you're sitting there going, why was I ever doing this myself? You're probably also reading the horror stories on Reddit, laughing and going, haha, that user did something silly and deleted their home directory because they're an idiot, and also, in the back of your mind, going: what if it deletes my home directory? I personally haven't had any of those big fails, but I did have a fun one recently where a set of parallel agents set off by Claude, as part of a build process or something, decided to build five copies of Node.js from source, completely exhausting the memory on my cloud server, which resulted in Tailscale being killed and me no longer being able to talk to it. Not catastrophic data loss, but kind of annoying.
[14:55]Lots of the agents are responding to this by telling you that you can run /sandbox and it's going to be fine. It might be fine. Confinement is a tough topic. It is something we have been doing for years with our snap packages, for years with AppArmor, for years with virtual machines and micro-VMs. And chucking a big hunky Node.js code base on someone's machine and wrapping it in Bubblewrap with a thousand exceptions is not that handy. It prevents some failures, and that's why the vast majority of you probably haven't had an epic fail, live the YOLO, dangerously-skip-permissions life, whatever it is, and are mostly okay. But I'm sure you've all experienced it doing something you'd rather it had not done; you're all sensible and responsible, so you recovered from it just fine. No tears.
The good news is we have a bunch of stuff, out of the box on pretty much every Ubuntu machine on the planet, that makes this pretty simple. We actually have a really cool product, one of the only things I have worked on in five years that is not yet open source, which we'll announce in April, and which I think is going to blow the doors off this thing. But it is built on top of all the things I'm going to talk about today. One of which is LXD. This is a product that, until I worked at Canonical, I had no idea about, and about six months after I joined, I couldn't work out why nobody was talking about it. It is in the category of things that should be boring but I think are very cool. LXD is a clustered version of LXC, a decade-old piece of containerization technology in the Linux kernel. It gives you Linux system containers, which are a bit like a Docker container but feel a bit more like a VM: it's a container, but it also runs systemd, so it's heavier weight than a Docker container, yet it still doesn't have its own kernel. And it also does virtual machines, and the API is basically identical: lxc launch, and then you can lxc launch --vm, but all of the other flags and config are exactly the same. So if you'd actually prefer another kernel between you and the agent, for a little bit more separation, you can do that. Personally, I switch between the two depending on what projects I'm working on and the sort of code I'm writing. Sometimes I launch a VM, sometimes I launch a container; the script is identical apart from one flag.
This is broadly how I've been using Claude Code for the last few months. I have a tiny script, only about six lines. It creates an LXD container, the image for which is already cached; it mounts my local working directory into the container as a bind mount; it mounts my .claude directory and a couple of dotfiles; and it starts Ubuntu and Claude Code. It takes about three or four seconds from cache. So I just type cc here, Claude Code here, and I get Claude Code in a box, talking only to my project, and it can basically run wild. It can't make commits, because I have to tap my YubiKey for that, but I can just let it do what it wants, set five of them off. You can set all the usual constraints, like the CPUs and memory it can use, with or without a VM layer. And this is just installed, literally ready to go on every Ubuntu machine out of the box. It's just there. So I'd implore you to go and take a look at it.
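A minimal sketch of what such a wrapper script might look like. This is not the speaker's actual script: the container name, image choice, mount paths, and limits are assumptions, and it assumes Claude Code is available inside the container:

```bash
#!/usr/bin/env bash
# "Claude Code in a box": run an agent inside a disposable LXD system container.
set -euo pipefail

NAME="cc-$(basename "$PWD")"

# Launch an Ubuntu LTS container (swap in `lxc launch --vm` for a full VM)
lxc launch ubuntu:24.04 "$NAME"

# Bind-mount the current project and the agent's config into the container
lxc config device add "$NAME" project disk source="$PWD" path=/home/ubuntu/project
lxc config device add "$NAME" claudecfg disk source="$HOME/.claude" path=/home/ubuntu/.claude

# The usual resource constraints
lxc config set "$NAME" limits.cpu=4 limits.memory=8GiB

# Drop into the box and run the agent against the project (claude assumed preinstalled)
lxc exec "$NAME" -- sudo -u ubuntu --login bash -c "cd ~/project && claude"
```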
It's on all of the cloud instances you're already hosting your stuff on, and it makes a really nice little wrapper around these tools. We also have Multipass. Multipass is the Docker Desktop-like thing in our portfolio. You can install Multipass on your Mac, on your Windows machine, and of course on your Linux machine; you get a nice UI, and it's just the fastest way to get Ubuntu anywhere. You multipass launch and you will be at an Ubuntu shell in two or three seconds, and that is a disposable machine you can just crap all over, or an agent can crap all over, and then you can get rid of it. It's got all of our different versions, you can do blueprints and recipes and things like that, but it has this nice Docker Desktop-style feel, a bit more orange, that you can use for getting disposable Ubuntu instances running under QEMU, essentially.
So that's the development side. We describe Multipass as a bit of a cloud sandbox, which is a nice segue into: what about once you've finished building your shiny thing, your hundreds of thousands of lines of TypeScript or Rust or Go, whatever it is your agent has been diligently coding for you? How do you launch it? How do you deploy it? Again, a question we have been dealing with for 20 or so years of production Linux. We at Canonical have all of these other interesting things which, again, I think are genuinely coming into their own as we head into this era where a whole bunch of semi-autonomous things are going off and we want to use them to the best of our ability. We want them to run as fast and efficiently as possible, but we also want some guarantees around that. We want to know what they can mess with. We want to know that if we set them off in production, they will continue to be patched for 15 years, even if the vendor disappears in the AI boom. Some of the vendors we lovingly talk about right now, I suspect, might not be here in five years; we're at that sort of point, I think, in the AI product launch sphere. The thing we have done, and have always done, and the only reason I can stand here after 22 years of Ubuntu being a big open source project, is the long-term support, the security maintenance and patching work that lets other people do the exciting work on Linux. And that applies here too. We have this nice initiative that we've called, sort of, LTS-everything. Essentially, you can come to us with a Docker container, we come to an agreement on what that's going to cost, and we will literally security-maintain that thing: your application and all of its dependencies, even if that's thousands of Python dependencies or tens of thousands of Node dependencies. We will keep patching that thing for CVEs for 15 years for you, for a cost, all based on Ubuntu and the work we do keeping the archive running and keeping all those servers going. And then we have a suite of automation stuff. Kubeflow, for example: we have a one-command Kubeflow deploy that works on any Kubernetes. It works on EKS, it works on AKS, it works on our Kubernetes; you can literally one-shot a Kubeflow for yourself, play around with it, then destroy it and not lose any sleep. The same for MLflow, the same for OpenSearch if you want a vector database, the same for Postgres.
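For flavor, Canonical's one-command deploys go through Juju charms. A minimal sketch, assuming you have a Kubernetes cluster to register and with the channel version as an assumption:

```bash
# Point Juju at an existing Kubernetes cluster (EKS, AKS, or your own)
juju add-k8s my-k8s --client
juju bootstrap my-k8s
juju add-model kubeflow

# The one-shot deploy: the kubeflow bundle, pinned to a channel (version assumed)
juju deploy kubeflow --trust --channel=1.10/stable

# Watch it come up, then tear it all down when you're done playing
juju status --watch 5s
juju destroy-model kubeflow
```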
So while we don't necessarily directly peddle the newest, shiniest AI thing (we're not building agents, we're not building models, we're not training models), we very competently take care of all the stuff that that stuff relies on, in a very safe, secure, stop-worrying-about-it sort of way.
[23:33]That is pretty much it. Does anyone have any questions?
Thank you so much, John. How on earth has Canonical survived for 22 years giving it all away? Thank you very much, John. Any questions? Yeah, let's dive straight over here.
Hiya. I was wondering, you mentioned how popular Ubuntu is, everyone knows about it. Do you think now, with LLMs being all over the internet, you're in a position where you will slowly just end up consuming all of it? Because every LLM will always say, oh, let's do it the Ubuntu way, and therefore you get more code on the internet also doing it the Ubuntu way.
I'd say with any luck, yes, but I think we need to not rely on luck. Part of our job is to work out how we either do deals with the AI providers to ensure that happens, or keep being enough of a default in the other material the LLMs are trained on that they know about Ubuntu. We're sort of lucky, and not lucky, that that's the situation. But if we're not careful, a SUSE or a Red Hat could do a big drive on, you know, having an llms.txt on all of their endpoints and really going for it, so I think we need to pay attention. I also think it's very easy for somebody who works at Canonical on Ubuntu to get all wound up about which Linux distribution it is; I'm quite aware that most people don't care. They're like, give me a cloud instance and it is what it is. Again, because for so long that has been Ubuntu, a lot of people have muscle memory around apt install or whatever it might be, and we just need to keep that up. In the same way that a lot of marketing teams are flapping over SEO, because AI models work a bit differently, we have to think a little differently about how we make sure our content is in the right places, in the right forms, and is actually relevant. Like I said, we're not, in quotes, an AI company, but that doesn't mean we don't have something to say in an era where everyone is building with AI. So it's about positioning it in a way that is authentic to what we can actually do, what value we can actually add, because it isn't going to be competing with Nvidia to train models and build GPUs.
Is this something that you as a company now actually have to think about: how do we stay relevant?
Yeah, I'd say pretty much every company wants to be having that conversation. In our case, it's about how we make sure LLMs keep telling people to apt install rocm.
Cool. Thanks, thanks for the great talk. I'm going to ask you something controversial. I'm sure you're watching the news, and OpenAI is now planning to do something operating-system related. We had browser use, computer use; soon it's going to be operating system use. What is the water cooler chat in your organization, and maybe you've heard whispers in other operating system worlds? Where is this going, and what impact does it have?
Great question. So, I don't know. I wouldn't say there's a lot of water cooler chat; I'm certainly not sitting at home at the moment going, oh my goodness, OpenAI is going to out-Linux us tomorrow. I follow it with interest.
I thought Claude Code building a C compiler was kind of a fun science experiment; I'm sure you all saw that one, and Claude Code built a browser and various other things. I think what that shows is that it's a very interesting technology and these are interesting experiments, but building an operating system isn't just about building an operating system. The fun part of building an operating system, in Ubuntu's case, was done 20 years ago. The success has come from grinding away patching security vulnerabilities, doing usability studies, shipping the latest and greatest open source, doing things that have set the internet on fire a little bit, like the big Rust transition we're doing, where we're replacing all the core utilities and sudo and the time-syncing daemon with new Rust-based alternatives, and everyone's like, oh my God, what are you doing? But it's those sorts of things that have actually meant people have kept using Ubuntu. I'm sure Claude Code can build an OS; I'm absolutely sure of it. But are Anthropic going to be in the business of actually maintaining an operating system for use on tens, if not hundreds, of millions of cloud instances? I don't know. It doesn't seem like their business model. People are still going to want security maintenance and rock-solid Linux, I think. So at the moment I'm not sweating too much.
Thank you for the talk. I usually run models using Ollama. Have you noticed any significant performance improvement with your local Ubuntu-optimized versions? Because I imagine hardware is the limiting thing here.
Half of it, I think, is just selecting the right model. I'm sure everyone has felt the overwhelm of, I'll go and get something from Hugging Face, and then, okay, now what? So part of it is that we're just going to restrict that down. We're going to say: this is of the DeepSeek R1 variety, there are maybe 10 variants, and it's going to pick the best one for your machine. And the best one for your machine is not only the one that's actually sized for your machine, but the one that has been tuned to some extent by the silicon vendor. Now, does that mean it's going to be perfect for your use case every time? Probably not. Does it mean it's going to be better for 90% of people 90% of the time? Probably. If you are doing something hyper-specific in a really, really optimized way, I'm sure it's worth you investing time in tuning a model and parameterizing it and quantizing it very specifically. But if you're like, I'd like to write some Python and get some help from an AI model that isn't hosted in the cloud, I think we're going to be pretty helpful to you.
One final question. Thank you very much, it's quite an interesting talk, actually. I basically have two questions. One is: you share the code base into the container. (I can't quite hear you, sorry.) You said you share your code base into the container; doesn't that still expose it to the same kinds of damage?
So I think you'll still get those kinds of errors. The point is, you can choose how much access you give that container to your machine. For me, everything is in git, and the only way I can sign commits is by tapping my little YubiKey. So I'm a bit like: go ahead, throw git commits at my git history.
It doesn't matter; even if I tell Claude not to and it does it anyway, it doesn't matter, because it can't push and overwrite what I've got. So, to a limited extent, if the agent really, really wants to delete the working directory of the thing you're working on, it's going to delete the working directory. The question is how far you want that blast radius to possibly extend; to me, it's really about limiting the damage. And also, you talked a lot about context engineering: I've noticed Claude Code does this fun thing where it asks if it can read other places on your file system, and then gets all bogged down in other code bases that have nothing to do with what you're working on. So putting it in a box where it can only look at the thing you wanted it to look at is kind of handy from an efficiency perspective as well, I think. There's a win on both sides there.
Cool. Thank you so much, John. Don't worry. Thank you once again, John.