
How to build AI agents with memory

Google for Developers

30m 0s · 5,526 words · ~28 min read
Auto-Generated

[0:03] So, did you know that the most expensive bug in your AI agent might be its lack of memory? When your agent starts a conversation from scratch every single time, you're not just frustrating your users, you're also paying more for your agent to learn the same facts again and again. Today, we're going to fix that. We'll use Google's Agent Development Kit and Vertex AI Memory Bank to show you the complete playbook for building agents that can remember. This makes your agent smarter and less expensive to run over time. In the next 20-ish minutes, we'll walk through the different memory services in ADK, how memory generation, consolidation, and retrieval work, and how to use memory in your agents. And finally, we'll show you how to customize which memories should be persisted and how to set up a TTL. By the end of this video, you'll know all the strategies to build agents that are smarter and less expensive to run. Today's guest is Kimberly Milam, the tech lead for Vertex AI Memory Bank. But before we invite our special guest, let's do a super quick crash course on core memory concepts. I promise it won't be boring. So what is memory in relation to agents? Basically, it's the ability to store and retrieve information from conversations across sessions. Think of it like this: your agent has a short-term memory for the current chat and a long-term memory for everything else. The short-term memory is also called volatile memory. It's quick and easy, but once a conversation is over, it's gone. That's great for when you're just messing around in development, but for the real world, you need persistent memory. This is the agent's long-term knowledge base, the stuff that sticks around. And with Google's Agent Development Kit, you've got options.
For short-term memory, ADK gives you the in-memory memory service. For long-term memory, you can go the managed route with Vertex AI Memory Bank, use other third-party solutions, or create your own custom database. Which brings us to the question: how do you actually store and retrieve memory from your agents? I have a four-step workflow for it. Step one, initialize your memory service. Step two, add your session data to your memory service. The session holds all the information for the current conversation, including the prompts, the agent's responses, and the session state. The memory service then decides which information to persist from the session and consolidates this raw data. Step three, make your memory service accessible to your agent. And step four, use an ADK built-in tool like the preload memory tool to auto-fetch memories, or use the search memory API to retrieve context as needed. All right, enough talk. Let's get our hands dirty with code. Over to you, Kimberly. Awesome. So let's get started with memory in AI agents with ADK. In this tutorial, we're going to cover the ADK in-memory memory service and the Vertex AI Memory Bank service. Memory services are essentially ADK's way of orchestrating calls to underlying memory storage systems, like Agent Engine Memory Bank, to read and write memories. Then we're going to cover generating memories using both the in-memory memory service and Vertex AI Memory Bank. Then we'll cover retrieving memories, and lastly, we'll cover how to customize your Memory Bank for your own business needs. That could mean customizing what information Memory Bank extracts from your conversations, and also setting the TTL for your memories. So let's get started. First, I'm going to set my project and my location, which are required by ADK so that it knows where to send any model requests. Now let's get started and actually use memory in our ADK agent.
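The four-step workflow described above can be sketched with toy stand-ins. This is not ADK code; every class and method name here is illustrative, chosen only to mirror the shape of the workflow:

```python
# Toy sketch of the four-step memory workflow (illustrative names, no ADK).
class ToyMemoryService:
    def __init__(self):
        self.memories = []  # step 1: initialize the memory service

    def add_session_to_memory(self, session_events):
        # Step 2: the service decides what to persist from the session.
        # Here we keep only the user's turns, as a stand-in for real logic.
        self.memories.extend(
            e["text"] for e in session_events if e["author"] == "user"
        )

    def search_memory(self, query):
        # Step 4: retrieve context as needed (toy substring match).
        return [
            m for m in self.memories
            if any(w in m.lower() for w in query.lower().split())
        ]


memory_service = ToyMemoryService()            # step 1
memory_service.add_session_to_memory([         # step 2
    {"author": "user", "text": "My niece turns three next month"},
    {"author": "agent", "text": "A balance bike is a great gift idea."},
])
# Step 3 would make the service accessible to the agent (e.g. via a runner);
# step 4 then retrieves relevant context:
relevant = memory_service.search_memory("gift for my niece")
```

The point of the sketch is the separation of concerns: the agent never decides what to persist; the memory service does, at write time.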
First, we're going to start by defining our ADK agent. For this example, we're just going to use a simple conversational agent, instructing it to answer the user's questions. I'm not defining any tools here, but you could always plug your own tools into the agent so that ADK can orchestrate all the calls to those tools for you. Then we're going to define a session service. In this case, I'm using the in-memory session service, so all of your events only persist as long as you're using your computer. As soon as you restart your computer, or maybe make a request to a different VM, all of those events are lost. So we don't recommend this for production use cases. For production, we recommend using a persistent database like Agent Engine sessions. Then we're using a runner that orchestrates the calls to your agent and to your session service. You'll notice I'm not defining a memory service here. Although we'll do this later on, we first want to orchestrate the calls to the memory service manually so that we can compare the different offerings that ADK has. So now let's define our memory services. All ADK memory services are based on BaseMemoryService, which defines a common interface that ADK can use to generate and retrieve memories. So let's look at GitHub to actually see the source code for BaseMemoryService. You'll see there are two key methods. The first is add session to memory, which takes your session, does some sort of logic to extract information from it, and then persists it in whatever underlying database you're using. Then we have search memory, which searches over that corpus of data to find memories that are relevant to your app name, user ID, and query.
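The interface just described can be pictured roughly like this. This is a simplified stand-in, not the actual ADK source; the real class lives in the google-adk package, and its exact signatures differ:

```python
from abc import ABC, abstractmethod


# Simplified stand-in for the interface described in the walkthrough:
# two abstract methods, one for writing and one for reading memories.
# Method names approximate the ADK source; signatures are illustrative.
class BaseMemoryService(ABC):
    @abstractmethod
    async def add_session_to_memory(self, session):
        """Extract information from a session and persist it."""

    @abstractmethod
    async def search_memory(self, *, app_name, user_id, query):
        """Return memories relevant to the given app, user, and query."""
```

Each concrete service (in-memory or Memory Bank) supplies its own logic behind these two methods, which is why ADK can swap them interchangeably.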

[6:12] And the actual business logic depends on which memory service you're using. The logic for Memory Bank, for instance, may differ from the logic of the in-memory option. So first, let's cover what the in-memory service does. To define it, we don't have to provide anything, because there's really no setup. It just lives on your computer, similar to the in-memory session service. Looking at the underlying source code, all of your data is actually just stored in a dictionary. So it's stored on your computer or your VM, keyed by the app name and the user ID, and it saves all of the raw conversation events. It saves that turn-by-turn conversation without actually extracting any information or using LLMs under the hood to extract information from the content. For search memory, it uses the query to look for overlap in the words that you used. So for instance, if you use the word bicycle both in your search query and in the memories that you're storing in the in-memory memory service, it's likely that those memories will be picked up. But again, these memories are pretty verbose. They're pretty much identical to the turn-by-turn conversation that you store in your session. So now let's move on to defining the Vertex AI Memory Bank service. This is a layer on top of Agent Engine Memory Bank. If we go and look at the source code in GitHub for the Memory Bank service, it again uses those same methods, add session to memory and search memory. For add session to memory, it's actually making HTTP calls to Memory Bank. It's an asynchronous, non-blocking call, which means that memory generation happens in the background. So just because you make a call to Memory Bank to generate memories doesn't mean that memories have been generated yet, or that memories will be generated at all. The processing happens in the background so that you as the client are not waiting for the response.
The information that's sent to Memory Bank is again that turn-by-turn conversation, but even though we're sending the full conversation, Memory Bank will extract information from it and won't persist everything. We're also using the scope to identify the isolation key for our memories. Scope essentially acts as a key for the memories: memories will be consolidated with other memories that have the same scope, and you'll also retrieve memories for the same scope. So when you're interacting directly with Memory Bank outside of ADK, it's very important to use the same scope as ADK does. Otherwise you might not be able to find the information, because information is isolated per scope. The other key method is search memory, which takes that app name and user ID to again build the scope key. It then uses the query, which is the current user turn, to retrieve memories using similarity search. Then it takes the response from Memory Bank and packages it in an object that ADK uses, in this case a search memory response. This means that the information you see in the search memory response might not exactly match what Memory Bank returned. There may be more information in Memory Bank's response than in the ADK response, so if you want more transparency into what's happening under the hood, I recommend using the Agent Engine SDK directly so that you can see everything rather than just what ADK shows you. So let's go back to our notebook so we can actually define our Memory Bank. To get started with Memory Bank, you need to first create an Agent Engine, which is kind of like an umbrella over multiple Vertex agent products, including Memory Bank. And here, I'm using my own GCP project.
But if you don't have a GCP project, you can actually get started with Vertex in Express mode, which only requires an API key. So you don't need a project; you just need a Gmail account to get started. Here, I'm defining a configuration for my Memory Bank, although you can always use the default configuration that Memory Bank provides. I'm using Gemini 2.5 Flash, because I want to make sure that my Memory Bank is good at following instructions; there are a lot of instructions under the hood that tell the LLM how to extract memories that are meaningful from a conversation. So let's actually run this. It just takes a couple of seconds. The longer part of this execution is the definition of the client; the Agent Engine creation takes, I think, just one or two seconds. And that's it. I'm already set up with Memory Bank, so I can use it in my agent now. Before I move on, Sita, do you have any questions about getting started with Memory Bank? Not exactly questions, but to summarize what we've seen so far: we've discussed what BaseMemoryService is in ADK, and how the two different services, the in-memory service and Vertex AI Memory Bank, extend the base memory service with these two APIs to provide their own flavors of memory, right? Yes, exactly. They both interact with ADK in a consistent way, but the actual logic under the hood is completely different between the two services. So that's actually a great segue to digging into the logic to compare what information is persisted by the in-memory memory service versus Memory Bank. Let's start by defining some simple sessions. Sessions, again, are the source content from which we extract data. First, I'm going to do a chit-chat session. It doesn't really have much meaningful information.
I'm just asking my agent what types of questions it answers, and that's information that's probably not meaningful across multiple turns or multiple sessions. So if you just want to persist meaningful information, you probably don't want to persist this. You can see it's just describing the types of questions that your agent and the LLM can answer. It's quite verbose. I really don't want all of these tokens to be persisted into future conversations, because that gets very expensive as the number of tokens adds up, and it just isn't information I need to persist. Then let's define another session. This one has meaningful information in it, although not all of it is meaningful. For instance, I'm telling the agent that I have a three-year-old niece, and that I like the idea of a bike for a present. Although it isn't directly content I might need in the future, there are elements of it that are meaningful to persist. That being said, again, the response I'm getting from the LLM is very verbose, lots of tokens. I probably don't want those tokens in future sessions, because that token usage adds up quickly. But overall, the agent decided that for a three-year-old, a balance bike is a great option. So let's see what information is persisted by the two different memory services. For the in-memory memory service, I'm going to peek at the dictionary that was saved. And you can see it saved all of that very verbose information. All the information is keyed by my app name and user ID, and then it stores that turn-by-turn information.

[14:47] All of it, including both the user turns and the agent's responses. This might be hard to act on; it's hard to use this in future conversations. It also includes a lot of extraneous information, so if I'm doing full-text search over this, there might be a lot of false positives.

[15:11] But let's actually test that out to see what information is returned. I'm going to do the classic query and see what is returned when I say "test". The word test was never used in my turns, so no memories are returned. But then, let's try "what should I get my mom for Mother's Day?" And it returned all of that turn-by-turn conversation, which again is very verbose, uses lots of tokens, and may not necessarily be actionable.
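The lookup behavior just demonstrated, raw turn-by-turn storage keyed by app name and user ID, retrieved by simple word overlap, can be sketched as a toy re-implementation. This is not the actual ADK source; it only mimics the behavior described above:

```python
# Toy re-implementation of the in-memory memory service's behavior as
# described in the walkthrough. The real ADK class differs in detail.
class ToyInMemoryMemoryService:
    def __init__(self):
        # (app_name, user_id) -> list of raw event texts, stored verbatim.
        self._events = {}

    def add_session_to_memory(self, app_name, user_id, events):
        # No LLM extraction: the turn-by-turn conversation is kept as-is.
        self._events.setdefault((app_name, user_id), []).extend(events)

    def search_memory(self, app_name, user_id, query):
        # Keyword overlap: return any stored event sharing a word with the query.
        query_words = set(query.lower().split())
        stored = self._events.get((app_name, user_id), [])
        return [e for e in stored if query_words & set(e.lower().split())]


svc = ToyInMemoryMemoryService()
svc.add_session_to_memory(
    "gift_app", "user1", ["I want to buy a bike for my niece's birthday"]
)
no_hits = svc.search_memory("gift_app", "user1", "test")
hits = svc.search_memory(
    "gift_app", "user1", "what should I get my mom for Mother's Day"
)
```

Note how the Mother's Day query matches even though the stored turn is about a niece's birthday: incidental words like "for" and "my" overlap, which is exactly the false-positive behavior called out above.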

[15:48] Now, before I move on to Memory Bank, Sita, do you have any questions about the in-memory memory service? No, I think this was a great demonstration, Kimberly. What we've seen is that you created two different sessions with completely different questions and different outputs, and both are very verbose. So far we've tested this with the in-memory memory service, which persisted all of this verbose information and did not do any consolidation, so our token usage is high. And when we send a query to the in-memory service, it doesn't do anything clever; it just looks for conversations that are already in memory and returns them, is that right? Exactly. It's really the raw information; it's not condensed in any way. And actually, if I asked the same thing over again in the session and uploaded those new turns into memory, that information would then be duplicated. So not only would I have that original high token usage, I'd have it again in future interactions and in future memories. Yeah, with that said, we still have to give a little bit of props to the in-memory service, because it is actually great for development, when you're building out something new and just testing it out. It's a nice little tool to use, and then when you're maturing your service or moving into prod, you can seriously consider adding a proper memory layer. Yeah, the in-memory memory service is good when you want to prove a concept, to basically ask, hey, is memory useful for this type of conversation? But when you're talking about production and higher-quality memories, you do want to use a managed offering, because there's a lot of complexity under the hood in memory generation and retrieval.
And that's actually a great segue to talking about how Memory Bank works with ADK. With memory generation, there are two key steps that happen under the hood, automated by generate memories. The first is memory extraction, where we extract memories from the conversation. Only information that matches Memory Bank's definition of what is meaningful will be extracted from that conversation. Now, let's say that our definition of meaningful isn't exactly your definition as a developer. For that, we offer customization, which I'll cover later on, where you basically provide the definition of meaningful, and Memory Bank will extract information that hits your topics. Second is consolidation, which takes the new information extracted from your conversation and checks whether there's duplicative, contradictory, or even complementary information in your corpus that should be combined with those new memories. This makes sure that you don't have duplicative information. It also self-curates your memories over time so that they can evolve based on new information or a changing environment. It's kind of like agentic memory: not only is it a record of information, it's curating itself as well. So let's actually move on and use it. One key thing to call out here is that memory generation happens in the background. By default, ADK will return a response immediately. This doesn't mean that memory generation has completed. It also doesn't mean that memories will be extracted; again, memories are only extracted if the information is meaningful. So I'm going to run this, and as soon as I get a response, you'll see that the response is empty. That's because memories are generated in the background, so I still need to wait a few seconds for memories to actually be extracted, consolidated, and added to my corpus.
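The consolidation step described above can be sketched very roughly. Memory Bank uses LLM calls to decide whether new information duplicates, contradicts, or complements existing memories; here a naive word-overlap heuristic stands in, purely to illustrate the "merge instead of append" behavior, not how Memory Bank actually decides:

```python
import re


def _words(text):
    # Lowercased alphanumeric tokens, ignoring punctuation.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def consolidate(memories, new_memory, threshold=0.5):
    """Merge new_memory into memories, updating a near-duplicate in place.

    Toy heuristic: Jaccard word overlap stands in for the LLM judgment
    that Memory Bank performs under the hood.
    """
    new_words = _words(new_memory)
    for i, existing in enumerate(memories):
        old_words = _words(existing)
        overlap = len(new_words & old_words) / max(len(new_words | old_words), 1)
        if overlap >= threshold:
            # Duplicative/complementary: update the existing memory in place.
            memories[i] = new_memory
            return memories
    memories.append(new_memory)  # genuinely new information
    return memories
```

The effect mirrors the coffee-shop example later in the walkthrough: near-duplicate feedback updates one memory rather than piling up copies.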
The reason this is a non-blocking function is that memory generation under the hood requires multiple LLM calls to be chained together. This can be relatively latency-intensive, and you don't want your client to be waiting for memories to be generated, because generally memories aren't useful until the next turn. You don't need them for the current turn, unless you want to show users a dialogue that tells them, hey, we extracted memories from your conversation. That being said, if you do want memory generation to be blocking, you can use the Agent Engine SDK directly and tell it that you want to wait for completion; then it will be a blocking function.
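The fire-and-forget pattern described above can be sketched with plain asyncio. The caller gets control back immediately while generation (which may chain several slow LLM calls) runs in the background; function names here are illustrative stand-ins, not ADK's:

```python
import asyncio


async def generate_memories(session_id, memory_store):
    # Stands in for the chained LLM calls that extract and consolidate.
    await asyncio.sleep(0.05)
    memory_store.append(f"memories from {session_id}")


async def add_session_to_memory(session_id, memory_store):
    # Schedule generation in the background and return immediately:
    # the returned task lets a caller wait for completion if it wants to.
    return asyncio.create_task(generate_memories(session_id, memory_store))


async def demo():
    store = []
    task = await add_session_to_memory("session-1", store)
    immediately_after = list(store)   # nothing generated yet
    await task                        # a "blocking" variant waits like this
    return immediately_after, store
```

Right after the call the store is empty, which matches the empty response shown in the demo; awaiting the task is the equivalent of asking the Agent Engine SDK to wait for completion.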

[20:43] But again, with ADK, the default behavior is that it's non-blocking. So let's rerun this. Now you can actually see the memories that were extracted from that conversation. Unlike with the in-memory memory service, it is not the turn-by-turn conversation. It just extracts the ideas: that I like the idea of getting a bike as a birthday present for my niece, and also that I have a three-year-old niece. That was information that was kind of hidden in my content. The other thing you'll notice is that the chit-chat conversation didn't result in any memories, because there wasn't information that was meaningful for future interactions. So you actually saw that the Memory Bank service had a no-op when I tried to add it to Memory Bank. This actually makes the retrieval step much quicker. In the previous example, you saw that a lot of memories were returned that weren't necessarily relevant to my query. You didn't see that here; you saw just the exact relevant information that was in the context in Memory Bank. And the second tool is the ADK load memory tool. This tool acts like a standard tool: your agent needs to decide whether to use it or not. The agent decides to look up memory if the content is useful for answering the user's current query.

[22:46] So let's actually use our custom callback. First, I'm going to set my agent.

[22:56] Now, it's also worth noting that ADK defines the default scope keys as app name and user ID.

[23:13] Since we're using a different scope than the ADK memory service, we don't have access to the prior memories. Memories are isolated by their scope dictionary.
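The scope isolation just described can be sketched in a few lines: memories are keyed by the full scope dictionary, so a read with a different scope (even a superset of the same keys) sees nothing. This is a toy model of the behavior, not Memory Bank's implementation:

```python
# Toy sketch of scope-based isolation: memories are only visible to reads
# that use the exact same scope dictionary.
class ScopedMemoryStore:
    def __init__(self):
        # frozenset of scope (key, value) pairs -> list of memory strings
        self._memories = {}

    def _key(self, scope):
        return frozenset(scope.items())

    def add(self, scope, memory):
        self._memories.setdefault(self._key(scope), []).append(memory)

    def retrieve(self, scope):
        # A different scope, even one that merely adds a key, sees nothing.
        return self._memories.get(self._key(scope), [])
```

This is why a callback that writes with a `{"user_id": ...}` scope cannot see memories that ADK wrote with its default `{app_name, user_id}` scope, and vice versa.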

[23:29] So we're going to say hi, and it says, hello, how can I help you today? First, let's generate memories for our new scope. We're using the same scope that the callback uses, which is just the user ID key. You can see it created a new memory. I'm going to ask it, what information do you know about me? And it knows that I have four nieces. So it's able to access this new information and retrieve it using a callback, connecting memory generation with memory retrieval. And Sita, before I move on to customizing behavior, do you have any questions about memory retrieval? Yeah, just a quick highlight of what you've explained so far. We've seen two tools for memory retrieval. One is the preload memory tool, which automatically gets called before the agent runs and adds the memories to the system instructions for the agent to use.

[24:32] And the other way, which provides a little more granularity, is using callbacks, where we can choose where we want the memory to be inserted, using something like a before-tool callback or an after-agent callback, and then calling the actual memory service API ourselves, right? Exactly. So if you're okay with the default behavior that ADK provides, you can use one of the tools, but you have to be okay with that behavior, whether that's the scope, how the information is added to the context, or what's sent to Memory Bank. If you want to customize any of that, you can always create a callback to automate it yourself.
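The callback approach described above can be sketched as follows. Everything here is an illustrative stand-in, the request shape, the `search_memory` callable, and the callback signature are assumptions for the sketch, not ADK's real callback API:

```python
# Toy sketch of a before-agent callback that queries memory itself and
# injects the results into the system instruction before the agent runs.
def make_before_agent_callback(search_memory, user_id):
    # search_memory: callable(user_id) -> list of memory strings (assumed).
    def before_agent_callback(request):
        memories = search_memory(user_id)
        if memories:
            # Prepend retrieved memories so the agent sees them this turn.
            request["system_instruction"] = (
                "Known facts about the user:\n- "
                + "\n- ".join(memories)
                + "\n\n"
                + request.get("system_instruction", "")
            )
        return request
    return before_agent_callback
```

Because the callback calls the memory service directly, it controls the scope, the formatting, and where in the context the memories land, which is exactly the flexibility the tools don't give you.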

[25:42] Sounds good. So far, we've just been using the default settings for Memory Bank. But what if I want to customize how Memory Bank decides what information is meaningful? By default, Memory Bank will persist information that's deemed to be personal information, user preferences, key conversation events and task outcomes, and explicit instructions to remember or forget. This is what we call managed topics, where Google defines the instructions, the label, and the few-shot examples of what matches that information. But let's say you only want a subset of these topics; for example, you only care about user preferences, but don't care about key conversation events and task outcomes. For that, you can customize the behavior of Memory Bank. You can see it took less than a second to update my Agent Engine with my new customization, and I'm specifying the scope keys as just user ID. So this configuration only applies to requests that use exactly the user ID scope. Here, you can see that memories that fit the user preferences category will be persisted, so it persisted information from here; you can see a memory was created. But information that does not fit user preferences will not be persisted, so there's actually an empty response here where no memories are returned. But what if your use case for memory doesn't fit into one of our managed topics? You can also define your own custom topics, where rather than relying on Google to provide the label, description, and few-shot examples, you bring your own.

[27:32] For this example, I'm going to use a topic that you might not even think of using memory for: condensing and extracting feedback for your business. Maybe you have a feedback form where people can provide information and recommendations for your business, but there's a lot of unmeaningful or duplicative content, and you don't want to do all that data mining yourself to extract only the meaningful information.

So you can actually use Memory Bank for this use case. You can create your own custom memory topic and define the label and description. Here I'm saying I want to save specific user feedback about their experience at the coffee shop. I'm also providing some few-shot examples: a sample conversation, and the expected memories that should be extracted from it. I'm also including an example of a conversation where the information is not meaningful and shouldn't be persisted. It's basically a no-op: there should be no memories for that conversation. So I'm going to update that same Agent Engine with my new topics. I'm still using the same Agent Engine from the beginning; I'm just updating it and fine-tuning how I want it to extract information. My expectation is that no information is extracted from here, because although I previously wanted to extract user preferences, I now only care about business feedback. You can see nothing's extracted in this case. Now I'm going to give it the feedback, "you should have more milk options". And you can see it created a memory, since this fits into the memory topic that I provided. Then you can actually see the memory. The memory that was persisted is, "I think that the coffee shop should offer more milk options". So it gives me a condensed record of the feedback that my customers are providing. Now let's say I provide something similar, like "you should have almond milk". That's pretty similar to the last query I sent, so my expectation is that this gets combined with the memory I already have, because it's pretty duplicative information. And you can see that rather than creating a new memory, it updated the existing one to, "I think that the coffee shop should offer more milk options, such as almond milk". Sita, do you have any questions on customization before I move on?
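A custom topic like the one described above might be shaped roughly like this. The field names here are illustrative, not Memory Bank's real schema; consult the Vertex AI Memory Bank documentation for the actual configuration format:

```python
# Illustrative shape of a custom memory topic: a label, a description of
# what counts as meaningful, and few-shot examples including a negative
# example that should yield no memories. Field names are assumptions.
custom_topic = {
    "label": "CUSTOMER_FEEDBACK",
    "description": (
        "Save specific user feedback about their experience at the coffee shop."
    ),
    "few_shot_examples": [
        {
            "conversation": "User: You should have more milk options.",
            "expected_memories": [
                "The customer thinks the coffee shop should offer more milk options."
            ],
        },
        {
            # Negative example: chit chat should produce no memories (a no-op).
            "conversation": "User: What kinds of questions can you answer?",
            "expected_memories": [],
        },
    ],
}
```

The negative example matters: it teaches the extractor what to ignore, which is how the chit-chat conversation ends up producing no memory at all.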
Yeah, I think this is a nice little trick, because you don't always fit into the four predefined labels for memories, right? So the custom option is a nice way to capture any kind of information you want your agent to remember. Which brings me to the question: can we set a TTL for these memories? How long should the memories live? Yeah, so you can set a TTL for your Memory Bank. You can always set it on an individual memory, but you can also set it more programmatically, where Memory Bank sets the TTL for all of your generated memories. This is especially important because the memories you generate aren't explicitly created by you or your agent; they're agentically managed under the hood, where Memory Bank is creating and mutating them. So when you're generating them, you don't have full access to the resource to set the TTL yourself. To make this easier, we allow you to define a default TTL when you're setting up your Memory Bank. In this example, it's a 30-day TTL. Then let's just generate some memories as an example. I'm saying I have four nieces, and you can see here that the expiration time is set to September 28th. That's a month from when we're filming, so this memory will no longer be available after September 28th. Now, there are actually a lot of operations that can create memories, and you might want a different retention based on which operation created the memory. Or maybe you want to set the TTL only when the memory is created, and not refresh it when the memory is updated. To enable this more granular control, you can also define a granular TTL, where you define the TTL per operation that created the memory. So for example, I only want to define a TTL when I create a memory; when I update that memory, I don't want to refresh the TTL.
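That per-operation TTL behavior can be sketched in plain Python: creating a memory starts the expiration timer, while updating it (for example, during consolidation) leaves the timer untouched. This is illustrative only; Memory Bank's actual configuration schema differs:

```python
from datetime import datetime, timedelta, timezone

# Granular TTL sketch: only operations listed here (re)set the expiration.
# UPDATE is intentionally absent, so updates never refresh the timer.
TTL_BY_OPERATION = {"CREATE": timedelta(days=365)}


class Memory:
    def __init__(self, fact):
        self.fact = fact
        self.expire_time = None
        self._apply_ttl("CREATE")

    def _apply_ttl(self, operation):
        ttl = TTL_BY_OPERATION.get(operation)
        if ttl is not None:  # only configured operations reset the expiration
            self.expire_time = datetime.now(timezone.utc) + ttl

    def update(self, fact):
        self.fact = fact
        self._apply_ttl("UPDATE")  # no TTL configured for UPDATE: timer unchanged
```

With this configuration, consolidating "I work at Google" into an updated memory changes the fact but keeps the original expiration, matching the demo where the TTL is identical across the two requests.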
I just want to start the timer when I actually create it. So I can say, remember, I work at Google. You'll see here that the TTL for this is one year from now. But let's send the same request, and because of consolidation, we expect the same memory to be updated. You can see here that the TTL hasn't changed between my two requests, because I used the granular TTL, where I defined which specific operations should set the TTL. And that concludes the material I have prepared. I'd say the best place to learn more about Memory Bank is the Vertex AI documentation, which goes deep into how memory generation works and how you can configure Memory Bank. It also provides quickstarts for using both the Agent Engine SDK and ADK. Thank you, Kimberly, for the walkthrough. All right, it's now time for our Q&A section. We've scoured the internet and found some burning questions about Memory Bank and ADK, and we'll have Kimberly answer them. So let's dive in. Kimberly, why should someone use Memory Bank instead of just a regular database? Yeah, so if you're just using a database, the content that you're storing will be pretty similar to what we saw previously with the in-memory memory service, where you're storing that raw turn-by-turn dialogue: a ton of tokens, a ton of information that might not be meaningful to persist for a long period of time. When you're using a memory service like Memory Bank, you're only saving the information that's meaningful for the future, and you're also using consolidation to de-duplicate and curate your information over time. So you really only store what's necessary for the future rather than storing everything. You're also doing that processing at storage time to distill the information that's meaningful, rather than putting the onus on retrieval time to pick out only what's helpful.
And if I add to my agent's instructions to remember a specific piece of information, does that make Memory Bank remember it? Well, there's a difference between what the agent is responsible for versus what Memory Bank is responsible for. The agent is ultimately responsible for orchestrating the calls to your different tools and services. So if you have a tool to extract memories and send them on to Memory Bank, yes, that instruction would be helpful. But ultimately, Memory Bank is responsible for extracting the information. So rather than providing that instruction to your agent, I would provide it as a custom topic, so that Memory Bank knows what type of information to persist. Can I use Memory Bank with models other than Gemini, like Ollama or Mistral or anything else? Memory Bank only supports Gemini models. You saw at the beginning of the Colab that I provided my model name when setting up my Memory Bank, and you can provide any Gemini model name there. Thank you, Kimberly, for this deep-dive walkthrough. And that's about it for today. Awesome, thank you so much for having me. And don't forget to use ADK and Memory Bank. We've learned how crucial memory is for creating personalized and intelligent AI agents. We've explored the differences between volatile memory and persistent memory, and we've seen how memory generation, consolidation, search, and retrieval work with Google's ADK and Vertex AI Memory Bank. Now it's your turn to build your own memory-powered agent. Check out the description below for all of the resources and links, and let us know what you build. Until next time, happy coding.
