[0:00]I think the expectation is that demand and the amount of tokens generated for the enterprise will completely jump once you are not bound anymore by humans asking questions or reading them. As soon as you have enough trust to have agents running in the background, you're not really limited by the number of tokens. The term we use is control. The software stack, once deployed, is in the hands of our customers; they own the model changes that we make. And I think it's really important as a customer to consider that your expertise and what makes your company valuable stays yours.
Hi, I'm Matt Turck. Welcome back to the MAD Podcast. Today we have a special episode with Timothée Lacroix, the CTO and co-founder of Mistral, the company that proved that you could build frontier models with a fraction of the compute of the US giants. But recently Mistral has quietly evolved into a much more ambitious full-stack industrial power, building not just the models, but the platform, the deployment stack, and their own massive supercomputing clusters. We covered a lot of ground in this one: the engineering behind Mistral 3, what sovereign AI actually means in practice, and Tim's contrarian view on why trust matters more than autonomy for agents. If you're tired of the AI hype, Tim is refreshingly no-nonsense. Please enjoy this great conversation with Timothée Lacroix.
Hey Timothée, welcome.
Hi.
So, as I was prepping for this, I was struck by how much has been going on at Mistral over the last few months. I think most people probably know Mistral as a provider of open source models. It seems that you guys evolved from an AI lab to more of a full-stack solution focused on enterprise and sovereign customers. So just to set it up, in the last year, you guys raised a 1.7 billion euro Series C led by ASML at an 11.7 billion post-money valuation.
You launched a bunch of models, which we're going to talk about. Is the big vision behind all of this that enterprises and sovereign states are going to need their own AI infrastructure, and Mistral is going to be the provider?
So the big vision has been evolving and, as you stated, we started as a company that built models, because with Arthur and Guillaume this was what we knew how to do at the start. The premise on which we built Mistral AI was immediately solving for enterprise needs, and we started with open-weight models. After this, and working with enterprises, we realized the need for basically the rest of the stack. So we built the serving platform because infrastructure was needed, and then all of the tooling around it was also something that we saw was missing. More than the tooling, it also still requires a lot of work and expertise to get deep into an enterprise's workflows and really help that transformation. And so we built that FDE function, and more recently, with Mistral Compute, we're going a bit lower in the stack as well. So we've done all of this because it was required for enterprise success, while still continuing on our models journey. Keeping all of this stack modular is really important to us, as it gives full control to enterprises and our clients over which parts of the stack they decide to own and control, which is maybe more involved, and which parts they prefer to consume serverless. That is the modularity that we like.
All right. So let's take some of those modular components in order. Let's start with Mistral Compute. That was a big announcement, I guess in June of 2025, including a big partnership with Nvidia to help with this effort. What's the current status? Is that live yet? Are you building it? How does one go about building data centers or leveraging data centers in Europe?
Maybe first to go into the reasons why we decided to start building our own data centers.
We tried a lot of different partners over the years and we realized that our use of AI compute for large-scale training, and especially our need for stability, was not necessarily well understood by a lot of providers. When you run inference on a few GPUs, or when you run small-scale trainings on hundreds of GPUs, the margin for error is a lot larger than when you run trainings on thousands of GPUs at the same time. And so to address this need for stability, we saw a way for us to basically build our own data centers and maintain them with our understanding of what quality looks like. That was why we launched Mistral Compute, and when we decided to do it, we also realized, well, maybe others will benefit from it. So we launched into basically a larger development than what was previously intended, and this was announced in June, as you said. Since then, the building of the facility has progressed quite well. It's in the south of Paris, and we are right now running through the stabilization of the first tranche. It's quite a large data center, so delivery doesn't happen in one day. The first part of this data center is something that we are working on as we speak. We have a few jobs running and we're fine-tuning basically all of the last things to run at speed and with the right stability.
Okay, great. And did I understand correctly? It's going to be for your customers and your own needs around training, but you'll also be providing it as a service to others in Europe and beyond?
Yeah, exactly. So we will use part of that capacity for ourselves as one of our training clusters, but we will also provide a managed Kubernetes and managed Slurm stack on top.
Okay. Any lessons learned so far? I mean, as you said, you guys come from a very deep background in AI and AI research. It's a whole different thing to build a whole data center facility. How have you gone about it, what are some things that surprised you, and any lessons so far?
As with most new experiences as a founder, I relied on the knowledge of others. I was lucky to have a few very seasoned HPC experts, and a lot of cloud software experts as well, to build that solution. For me personally, and it's one of the things I love about my position at Mistral, I get to discover so many new things and so many new problems I hadn't thought possible. Having to learn all of the different parts of building a data center, all of the different trades that you have to coordinate, all of the potential synchronization between all of the different trades.
[7:12]I mean, it's a huge building, and it involves hundreds of people working on it. You have this, and then when you stand up the thing, you have to question what works, you have to filter through the blades that are faulty. It's just an entire new area of work where I get to see experts in their field go through things and try to explain to me what their daily work is. It's always fascinating to see an expert in their field do something that you don't know how to do. I think the logistics of it and the timelines are also quite different from what I'm usually dealing with in software and research. For new capacity to be built, you have to plan around having energy available, and you have to plan for the space to be available and on time. So it's a lot more long-term planning than a few software features.
How do you guys go about power, since you mentioned energy?
In what we've been doing in Europe so far, it hasn't been a huge blocker, although there are constraints. I think the grid in various parts of Europe is not necessarily easily extensible. I know it's an issue in France. A lot of the sites are contended. So we'll see how it all develops. We are lucky in Europe to have very clean and affordable energy, either with green energy in the Nordics or nuclear in France. So it's been relatively okay for us today.
As you describe this, what comes to mind is the gigantic amounts of money that are being invested in the US around data centers. How do you guys go about that from a financing standpoint? And, perhaps even more, taking a step back, if you think about the race between the big AI labs globally, whether that's the OpenAIs and Anthropics of the world and xAI, it seems that all of them are affiliated with a gigantic pocket of money somewhere. Obviously, add Google to the list, and Meta. I'm just curious, where do you guys stand on that?
You have a bunch of partnerships with SAP and Nvidia, but you don't have one of these gigantic companies on your cap table. So how do you think about competing in that general context?
So with those companies, the hyperscalers, there are two parts to the game, and we've played the partnership part quite well with them. We're integrated within Google's Vertex, Amazon Bedrock, and Azure AI Studio, and that is the choice that we've made. In terms of having access to gigantic pockets of money, we've been focused on efficiency from the start, and I think we've done quite well at building models that are competitive given the investments that we've put in. For us, it's important to build the company as efficiently as we can. And I deeply believe that with the capabilities that we have today in the models, there is so much to be unlocked in enterprise that I don't think my main focus today would be going into the gigawatts of power. We still need to build so much with our clients and unlock so much value with the capacity that we have.
All right. So let's go into the enterprise reality of all of this. If I'm an enterprise, if I'm a sovereign, and I want to deploy Mistral open source models, what is it that I do these days with everything that you've built?
The way we work with enterprises, I mean, as you mentioned, we have a few of our models that are open source and Apache-licensed, and all of our clients are welcome to use them as they need. What we have seen in terms of success is that, given the current stack, it still requires a lot of expertise to get to actual value and to things that go to production, basically.
[11:25]The way we interact is that we usually stand up our Mistral AI Studio, which is our platform, and we can deploy all of our stack on the client's choice of deployment method. So it can be on-prem, it can be in their VPC, it can be in several places. The reason we do this is that it lets clients build where their data is, without having to shuffle things around, which, as I've learned as a CTO, is something that you don't ever want to do, because it raises a lot of questions and it's quite a stressful thing to do. So once this is deployed, we then work with the business units to understand where their pain points are. Sometimes it's knowledge management, and I think that's the most well-known use case from outside the enterprise world. But it's also around automating core workflows for the enterprise. It's, you know, some tooling that you wouldn't expect. One thing that we've done is around code modernization, where you turn a bunch of Excel sheets into an actual Python app. And if you have many, many of those sheets, then potentially you want to use AI for this. So once the infrastructure is built, we basically look for what's most valuable to the customer, and we start accruing value inside a stack of AI assets that then accelerates all of the other developments with that customer.
And is part of the idea that you do actual model work at the customer and for the customer, in particular fine-tuning?
Yes, we customize in various ways. We have done continued pre-training, and this is most useful when you want to change the capabilities of a model more deeply. We've done this sometimes to change the mix of languages in a model, to get something that's a lot better at a particular set of languages, for example.
[13:34]Or you could require this if your internal data, which doesn't appear on the public web, is so new to the model that you need a large amount of tokens to get a model that understands it and becomes fluent with it. So we do these kinds of continued pre-training. Fine-tuning we also like, and this is more for efficiency reasons. When you get to smaller models, you have to make tradeoffs. The models won't be as good in their knowledge of the world. And so when you lose a lot of things, you have to focus on what you really care about. This is typically important if you want really fast, really cheap models that will be really good at a specific task. It's also useful if you want models that run on the edge, that get very, very tiny. For all of these, fine-tuning is a tool of choice. Another reason to do fine-tuning can be to adapt to data that's not necessarily massive, but that's also not available on the web. Typically in coding, what happens is that you will have massive code bases, sometimes accrued over decades, that the model will need to be able to work with, typically in terms of having something like Vibe deployed on them.
[14:57]And so being able to come in, not move the code base, and train an actual coding agent for that code base is really powerful as well.
And who does all of this? You have evolved towards an FDE model?
So we have indeed a large FDE section. It's a mix of software and FDE. And we split our FDEs into what we call AI engineers and applied scientists.
[15:26]And so applied scientists will tend to use the tools that we've just talked about: fine-tuning, continued pre-training, and the like. AI engineers, on the other hand, will focus more on adaptation to the enterprise environment and figuring out what workflows to automate and all of this. They work with the customers to make sure that the use cases are indeed providing value and going to production. But it's also a fantastic way for us to understand what matters in an enterprise context and be faster at building the right platform.
And again, those customers are the kind of customer for whom customization and privacy are essential. How do you position against the OpenAIs and Anthropics of the world, which are going very hard at the enterprise? Is that data sovereignty? Is that customization?
The term we use is control. The value that we see is both in our expertise and in the software stack that we provide. The software stack, once deployed, is in the hands of our customers: they can change it, they can add to it, and they own the model changes that we make. And I think it's really important as a customer to consider that your expertise and what makes your company valuable stays yours. It takes effort to build an AI advantage today, and so having this effort built into something that you own is, I think, a choice that makes sense.
[17:00]Great. Where does the edge fit in all of this?
There are a few reasons to go edge. First, there are some regions where it's more convenient to be able to work without internet, and there are also a lot of capabilities that don't necessarily require a huge model.
[17:46]So if you just need something that goes voice to action on any device, today, typically with the Voxtral models that we develop, this is doable.
[18:07]Again, it's an area where the more focused your use case is, the smaller you can make the model through fine-tuning, or through just distillation into an even smaller architecture. I think voice to action is going to be a big use case. I think it will simplify the current stacks a lot for these types of things. There are also some privacy aspects, where you could imagine all of the context consolidation staying on your personal device. For most things, you can deal with a small model that answers a lot of your questions, and then you can potentially gate what goes out to a cloud-based model. I myself take the train a lot. I like having coding assistance, and having Devstral run on my laptop while I code on the train is comfortable, despite the bad Wi-Fi.
[19:46]And presumably there are some defense use cases as well. You guys do quite a bit of defense work, as I understand it, with France, with Germany. I think you mentioned a partnership with Helsing. Is AI on drones and that kind of stuff a reality?
It's something that we work on. Yes, we have a robotics division that works with these partners. Having very well-defined use cases makes us able to really take the model down to lighter sizes. And these are of course use cases where control is super critical and you need to be able to really validate the solution.
All right. Let's switch to the model part of the discussion. In December, you guys released Mistral 3, which was a big release. Still with the MoE architecture, which is at the core of what you guys have been doing. You mentioned efficiency earlier in the conversation. Maybe walk us through the general thinking and approach. In a highly competitive world of AI models, both closed source but also very much open source and all the Chinese labs, what is it that you guys are trying to do and how do you position?
[21:03]Yes, so we've released Mistral Large 3, which is an MoE. MoEs are really nice systems to train because of the lower amount of FLOPs, which lets us push performance a lot more during training. They are not necessarily the best format for on-prem deployments, because as of today, if you want to get the best efficiency out of a mixture-of-experts model, you require a lot of volume. You're usually looking at deployments across dozens of GPUs, and to justify that amount of GPUs, you need to have the right throughput.
[21:44]We are training large MoEs to get the best performance with the most efficiency during training. We are also continuing to train dense models at other scales, because depending on the environment in which our clients want to deploy, this might be the more cost-efficient solution.
[22:29]I think both architectures are still valuable. On edge as well, sometimes you just don't have the RAM capacity to deploy something like a sparse mixture of experts. And so going dense is helpful there as well. But yeah, definitely for training, mixtures of experts and their lower FLOPs are very interesting.
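The trade-off described above can be made concrete with a back-of-the-envelope sketch. It relies on two common rules of thumb, neither of which comes from the conversation: a forward pass costs roughly 2 FLOPs per active parameter per token, and all weights, active or not, must be resident in memory. The model sizes below are hypothetical and are not the figures for any Mistral model.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    """Hypothetical model description used only for this illustration."""
    name: str
    total_params: float   # every weight that has to sit in memory
    active_params: float  # weights actually used to process one token

    def flops_per_token(self) -> float:
        # Rule of thumb (assumption): ~2 FLOPs per active parameter
        # for one forward pass over one token.
        return 2.0 * self.active_params

    def weight_memory_gb(self, bytes_per_param: float = 2.0) -> float:
        # All experts must be loaded, even though only a few are
        # active per token. 2 bytes per parameter assumes 16-bit weights.
        return self.total_params * bytes_per_param / 1e9


# Hypothetical sizes: a dense model and a sparse MoE with the same
# number of active parameters but ten times the total parameters.
dense = ModelShape("dense", total_params=40e9, active_params=40e9)
moe = ModelShape("sparse-moe", total_params=400e9, active_params=40e9)

for m in (dense, moe):
    print(
        f"{m.name}: {m.flops_per_token():.1e} FLOPs/token, "
        f"{m.weight_memory_gb():.0f} GB of weights"
    )
```

Under these assumptions the two models cost the same compute per token, but the MoE needs ten times the memory just to hold its weights. That is consistent with the points made in the answer: the memory footprint pushes MoE serving across many GPUs, which only pays off at high throughput, and it can rule the architecture out on RAM-constrained edge devices.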
[22:56]What is the ultimate goal of the model effort? I mean, clearly you guys are a frontier AI lab, but are you trying to create the best models and solve AGI, or are you trying to be the best open source model compared to the Chinese labs or whatever open source eventually comes out of the US? What is it that you're trying to do?
We're trying to get the best models that we can, and the models that are most useful for the use cases that we cover in enterprise. Typically, with the rise of agentic behavior, one thing that's very important is how you deal with various contexts, how you deal with various documents being added to the input. And so having the capability to do architecture iterations, really trying new things in terms of model training, is critical. So we're pushing the boundaries of what the current models can do with the compute capacity that we have, but we're also trying to focus on the things that are most annoying in our deployments today. One of the considerations that has been solved with a few harness tricks is the context of those agentic systems. It's visible typically in vibe coding, but it's definitely applicable to a lot of other use cases, where through all of the tool calls, you'll have to consolidate and summarize the context to be able to fit everything and have the model focus on the right parts. To me, this is just an artifact of the current architectures. We're trying to fit things into a linear context window, where essentially the questions that we're asking aren't really necessarily linear. And so we rely today on the file system for this, and I think that was the big change and realization through vibe coding: agents are good enough at manipulating file systems that they can use this as a replacement for their context window, basically.
They can select parts of what they want to read, they can select parts of the tool results, and this minimizes the context length requirement. This is the state today. I think we can do much better, and I think there are a lot of improvements to be made on those types of questions.
Do your agents run in sandboxes?
It depends on the types of agents, but the answer would be yes. If it's coding agents, usually we have sandboxes that will let the agent iterate and run. I think the depth of the isolation will depend on the use case. Typically, if the file system is just representing textual context, and you're not expecting the agent to take much action on it, then you don't really need a full sandbox. You just need some representation of that context as a file system, and it can be any sort of abstraction. But if you are, I don't know, typically running asynchronous code development, then yes, you need a sandbox.
Great. What is the current constraint that you guys are facing to make Mistral 4, when it eventually comes out, do much, much better than Mistral 3? Is that a question of Mistral Compute or is it a question of data? And in particular, are you guys doing anything around synthetic data that you can talk about?
Definitely compute, and the current deployment that we have will help, as it's going to give us a lot more Grace Blackwell capacity than we had in the past. That's something that we're very excited about. And when you add compute, you also have to add data. So we've been hard at work making sure that our data mixtures are as high quality as ever and growing in size. But as you mentioned, one of the ways to do this is through synthetic data. In terms of where we use synthetic data the most, I think a lot of the interesting work that's happening is in the post-training part, where we can build environments that look similar to an enterprise.
And then try to synthetically create queries that are hard and that will require multiple hops. All of this work, in addition to the coding work and the reasoning work, is really what makes the final model able to perform in the various environments that we work in. So before, it was about accruing world knowledge, and the web helps a lot with this.
[27:56]Now, it's more and more about acquiring know-how. And for this, it's really about trying to find what our customers are trying to do, trying to replicate it inside of our training environment, and letting the model run, basically.
So fast time to success, larger and larger use cases being built, and really the democratization of building tools with AI in enterprise. I think this is really what I target for our customers. It should be easy, and most people should be able to accelerate themselves through the use of AI. I think we've seen this happen quite impressively for coding, and it should be something that happens a lot more widely.
I was struck throughout this conversation by how pragmatic you are, and how focused on precise goals around enterprise success. What do you make of the whole rush-to-AGI conversation and people being AGI-pilled in San Francisco and other places? Is that something that you see happening, or does that to some extent not matter from your perspective?
I mean, it matters, because the better your systems are, the more impressive things you'll be able to do, and it'll become easier and easier. But the requirements I see for control and governance in enterprise make me think that even if I had some AGI-esque model on my servers right now, if I were to go into a large bank and say, here is a thing, please let it control everything for you, they wouldn't be happy to let it do it. And so I think building the infrastructure properly is quite key to following the progress of these models and really being able to quickly unleash all of their capabilities. So to me, it's two directions that are necessary. You need to improve the capabilities of the model, and it's super exciting to do so.
But the journey of making it trivial and easy for everyone to unleash those models on their enterprise workflows, without really wondering what's going to happen, is equally important. And honestly, it's super fun to develop as well; there are lots of super interesting questions.
Wonderful. Well, Timothée, thank you so much for doing this deep dive on Mistral with us. It's been fascinating. Congratulations on everything that you've built, again, in this very short period of time. And excited for what's coming next. So thank you for spending time with us.
Thanks, it was a pleasure.



