
Getting started with Agent Development Kit

Google for Developers

27m 33s · 2,878 words · ~15 min read

Auto-generated transcript

[0:06] If you build multi-agent systems, you know how complex they are. Today, let's simplify that using Google's Agent Development Kit, an open-source framework for building AI agents. We'll not only build and run our agent, but also spin up a client UI to debug it within a few minutes. Agent Development Kit, or ADK for short, lets you build, run, evaluate, and deploy your agent seamlessly to any provider of your choice. And this is powered by code-first flexibility. There are two ways to build agents: config-based and code-based. Code-based inherently gives you more granular, finer control over your agents, because it lets you orchestrate them using programming-language constructs. ADK was built with this in mind, so it naturally follows Python best practices, think classes and functions. The idea was to make building AI agents as simple as ordinary software development. Now let's briefly discuss the architecture before actually diving into the code. Today, we're going to build a YouTube Shorts agent. This agent has three sub-agents internally. The first is the script writer agent, which writes scripts from a given idea. The second is the visualizer agent, which takes the returned script and creates visuals for it, meaning it writes detailed descriptions of what the visuals should look like to match the script. The third is the formatter agent, which takes both the script and the visuals and combines them in a nice little markdown format, just to make it pretty. And notice that the first sub-agent uses a Google Search tool. This tool comes packaged right with the SDK, meaning it's built-in and you don't need to write any extra code.
We give the script writer the Google Search tool so it can look up current trends and shape the script outline accordingly.

[3:17] And now, let's dive into the code. Let's grab the google-adk package from PyPI and jump into our development environment. What we have here is the agent.py file, which contains the definitions for our root agent and the sub-agents. Let's start with the root agent. Here we have a YouTube Shorts agent, which takes a name and a model. We're using Gemini 2.5 Pro here, but you're free to use any model of your choice, and that is one of the key capabilities of ADK itself: it is model-agnostic, deployment-agnostic, and interoperable. You can bring in models from anywhere, you can deploy your agent to any cloud or provider, and it is interoperable, meaning you can bring agents built with other frameworks into ADK and it all works together. Coming back to the definition, you also see a description and an instruction being passed to our root agent. Think of the description as a one-liner of what the agent is, even though it's often longer than one line in practice. The instruction, by contrast, gives the agent detailed, step-by-step directions on how it should accomplish a specific goal. Then we see three sub-agents, like in the architecture diagram before. Now let's briefly go over our sub-agents. The first one, the script writer, takes most of the parameters our parent agent had, with a few key differences: we now pass a tool, the built-in Google Search tool, and we have a parameter called output_key. This is an interesting concept. If you want to pass state, that is, information, from one agent to another, you do it via output_key. Here, output_key is set to generated_script.
What happens is that when this sub-agent runs, the LLM's response is stored under the state key generated_script, and every other sub-agent can then read that key from the session state to retrieve the response. You'll also notice that the instruction here is loaded from a file; in fact, we load all of the instructions for our agents from files. If we switch over to one of those instruction files, it looks like this: it details, step by step, how the agent should accomplish its goal.

[7:38] Now, if you look at the visualizer agent and the formatter agent, their definitions stay more or less the same, except for the instruction. If we look at the visualizer's instruction, it reads the generated_script state, which, if you remember, is what we put into the output_key in our script writer agent.

[8:18] And when we switch over to the formatter agent, you'll see that it takes both the script and the visual concepts from state to create the final markdown. Now, before we run this agent, let's expose it in the package's __init__.py with a single line: `from . import agent`.
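Assuming the standard ADK project layout (the directory name here is illustrative), the package ends up looking roughly like this:

```
youtube_shorts_agent/
├── __init__.py        # contains the single line: from . import agent
├── agent.py           # defines root_agent and the three sub-agents
└── .env               # model credentials (see below)
```

The `from . import agent` line is what lets the ADK tooling discover the agent when you point it at this directory.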

[8:50] You see that we're calling the Gemini 2.5 Pro model here, and to authenticate to it we have to set environment variables, typically an API key; depending on the model you're using, this can vary. Now, I've set my environment variables, so it's time to run our agent. You can run your agent in four different ways. The first is `adk run`, a CLI command that runs your agent directly in the terminal. The second is `adk web`, which spins up a brand-new Angular UI for interacting with your agent; it also has multimodal capabilities, so you can interact with the agent using voice or video. The third is `adk api_server`, which exposes your agent as a REST endpoint. And the fourth is the programmatic way of calling your agent. We'll briefly cover most of these options in this video, starting with `adk run`.
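For the API-key path, the ADK quickstart uses environment variables along these lines (the exact variables depend on your model provider, and the key value is a placeholder):

```
# .env (placeholders -- never commit real keys)
GOOGLE_GENAI_USE_VERTEXAI=FALSE
GOOGLE_API_KEY=your-api-key-here

# The four ways to run the agent (directory name is illustrative):
#   adk run youtube_shorts_agent     # CLI chat in the terminal
#   adk web                          # dev UI (run from the parent directory)
#   adk api_server                   # expose the agent as a REST endpoint
#   ... or invoke it programmatically via a Runner (shown later)
```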

[10:35] Let's call `adk run` with our agent's name. This spins up the agent, and now I'm going to type in a prompt: write me a script on how to build AI agents.

[10:59] And now, as you can see, we get a response from the script writer. Okay, let's see what the `adk web` experience looks like.

I'm going to type `adk web` to spin up a client UI for us.

Let's copy the local URL into a new browser tab. Now we can see our agent listed here. Let's ask it to write a script: write me a short script on how to build AI agents. In the UI, you can see all of the events that happen within the agent, and all of the transfers as well. The first event is handled by our YouTube Shorts agent, the parent, which then transfers to our script writer agent, and this response is from the script writer. But notice that in both of these runs, `adk run` and `adk web`, only the script writer responded, even though we have other sub-agents, the visualizer and the formatter. This is where an inherent property of multi-agent systems surfaces: the parent agent has the LLM's reasoning capability, which it uses to pick and choose which sub-agents should handle a particular user prompt. In this case, our parent agent decided the script writer was more than enough to handle the query, so it didn't pass control to any other agent. But what we can do is use ADK's capability to convert this into a workflow agent, so we can make sure all of our sub-agents run.

[13:51] Agent Development Kit has three different types of agents. The first is the LLM agent, the traditional kind of agent: something that has an LLM and also has tools. In the middle are the workflow agents, for when you say, "Hey, I don't need an LLM to decide which sub-agents to pick; I know which ones I want and how I want to run them." When you want that kind of deterministic control, you should pick one of these. The sequential agent runs all of its sub-agents one by one, in sequence, whereas the parallel agent runs them all together at the same time, in parallel. And the loop agent, yes, you guessed it, runs its sub-agents iteratively until a loop condition is met. The third class is custom agents, where you have the capability to combine all of these elements to create your own agent. Say, for example, you're building a custom agent that requires a sequential agent and a loop agent inside it, with an if-else condition; you can put that all together in a custom agent. And all three kinds of agents inherit from the base agent class.

[15:56] So now, going back to the earlier problem: our parent agent only called the script writer to give a response, even though we had other sub-agents. Let's improve our agent a little by using a loop agent. We're going with this architecture because the loop agent will iteratively run all of our sub-agents, which works for us: we can then iteratively improve the script, the visuals, and the formatting. Now, let's see how to code the loop agent. Here I have the code, which is slightly different from the earlier code we saw. For starters, we replace the LLM agent with the LoopAgent workflow. We've also removed a few of the parameters, the model, the description, and the instruction; since this is a workflow agent, it doesn't need access to a model or reasoning capabilities. And we've introduced one new parameter, max_iterations, which sets how many times the sub-agents run in a loop. There are two ways to stop the loop: you can either set max_iterations on the parent agent or set an exit condition in one of the sub-agents. Now, let's run this and verify that all of our sub-agents are being called.

Let's run `adk web` to quickly spin up our Angular UI again. Here, let's select the YouTube Shorts assistant and ask it the same query: write me a short script on how to build AI agents.

[18:35] Now, from the responses, we can verify whether our parent agent actually called all three sub-agents in an iterative way. Let's see. The model returns responses, and the first event is from the script writer. The second event is from our visualizer. Scroll down a little further and you'll see that the third event is from the formatter. So all of the sub-agents were called, and from the list of events we can also see they were called iteratively, in a loop. Great. Now we've seen how to run a workflow agent, and the final part of this video is learning how to do that programmatically.

[20:01] But before that, we need to know a few key concepts: services, the runner, and the event loop. Services, think of them as the memory, session, and artifact services. When we ran the `adk run` and `adk web` commands, we didn't have to instantiate any memory, session, or artifact service; the SDK took care of all of it. But when we want to invoke the agent programmatically, we need to give it memory and a session to store all of its conversations. Think of a session as the duration or length of the conversation. For example, if you're running this agent in a Colab runtime, the length of the runtime is the length of your session; and if you're running it, like me, in an IDE, then when I stop the agent with Ctrl-C, that's the end of the session. For memory, we have a few different options. The first is the in-memory session service, a managed service where all of the conversations that happen within an agent are stored in memory and disappear when the session ends. Or you can hook your memory up to more persistent storage, like a database. Artifact storage is interesting: when an agent produces an output, say a text file, a PDF, or image files, you can store it in artifact storage and retrieve it as and when necessary from other agents. Moving on to the runner: think of the runner as the heart of the agent, or more of an execution engine. The runner is responsible for taking the input prompt, gathering all of these services, and invoking your parent agent. When the parent agent starts running, it internally executes all of its sub-agents, and each of these actions produces something called an event, which is streamed asynchronously from the runner.
Think of an event as anything atomic that takes place within an agent. For example, passing an input prompt to the runner is an event; when the agent calls a tool, that's an event; and when the tool returns a response, that's another event. You get the point: anything atomic that happens within an agent is an event, which is streamed back to the developer. We can inspect each of these events to see whether it's the final response from the agent and take action accordingly. Okay, with all of these basics, let's jump into the code and make it programmatically runnable. Walking through the code, you'll notice that most of it stays the same, from the top down to where we define our parent agent; only the bottom half of the section has changed. You'll see we've defined an in-memory session service and passed an app name, a user ID, and a session ID. Let's break those down. Think of the application name as a namespace for your agent. The user ID uniquely identifies the user interacting with your agent, which is helpful when multiple users interact with a single agent. And the session ID identifies a particular session for a particular user, a more granular level. With all of this in place, we call the runner with our prompt. Here, my prompt is: "I want to write a short script on how to build AI agents." Let's open up a terminal and invoke it the Python way. While our agent runs, let's briefly go over the code for the events. As you can see, runner.run returns a stream of events, which we loop over to identify the final response and print it. And now you can see that our agent did run: first the script writer gets called, then the visualizer, and then the formatter. And that brings us to the end of our demo.
To quickly recap what we've seen today: we saw what Agent Development Kit is and the different types of agents available to us. We also saw the different ways of running our agent, and we briefly discussed sessions, state, and the runner. Thank you everyone for tuning in today; I hope you found this helpful. To learn more about ADK, do check out our docs, linked in the description. And if you want to try out the sample agent we created today, the description also contains a link to the GitHub repository. Leave us a comment on what you think of this video and what we should build next. Thank you.
