
Prototype to Production with ADK

Google for Developers


[0:02]So you've seen impressive AI agent demos, but what separates a cool proof of concept from an enterprise-grade agent that your business can actually rely on? It's making the agent robust, scalable, and secure. And that's exactly what we're going to build today. In this video, we're going to build, deploy, and monitor a complete agentic application from scratch using Google's Agent Development Kit. And to guide us, we have developer advocate IO Adej. IO, welcome to the show. Great, thanks for having me. We're going to build something I think is really practical: a code reviewer agent. It takes a user's Python code, validates it, checks the style, tracks the progress, and even uses long-term memory to give personalized feedback. And that sounds incredibly useful, but also very complex at the same time. Most simple agents handle one task pretty well, but this sounds like it needs a whole workflow or process, right? Exactly, and that's the problem we're tackling. A real-world agent isn't just a single prompt, it's a workflow. If you try to build this as a giant monolithic agent, it becomes brittle and hard to maintain. In this workshop, we're going to show you the complete playbook. You will learn how to build a functional multi-agent system with custom tools to review Python code, and how to manage context using session state, long-term memory with Vertex AI Memory Bank, and artifacts. You will also learn how to deploy your agent as a scalable service on Vertex AI Agent Engine and, finally, monitor your deployed agent using Cloud Trace. Let's begin. But before we deep dive into the code with IO, let's walk through the architecture we're building. This is the big-picture map that will guide us through the concepts we're about to cover. The most important thing to grasp here is that we're not building a single monolithic agent; instead, we're building a system of specialized agents. For the review phase, we use a sequential agent, and this forces a strict order, like an assembly line. First, the analyzer agent parses the code structure. Second, the style checker agent reviews for PEP 8 compliance. Third, the test runner agent actually executes the code to find bugs. And finally, once all this is done, the synthesizer agent takes all those results and combines them into helpful feedback that you can then see in the chat. But what if the code has bugs? That's where the fix pipeline comes in. This relies on a loop agent. Inside this loop, the fixer agent writes new code, the fix test runner agent verifies it, and then the validator agent checks the results. If the tests fail, the validator triggers a retry, sending the agent right back to the start of the loop again. Once the code passes, or if we run out of attempts, the report agent takes over to generate the final report. And directing traffic between these two pipelines is our orchestrator. So, we have linear workflows and looping workflows running together, but how do they actually share information with each other? Well, they use a shared context, which is basically their shared brain. This brain has three key parts for different kinds of shared context. First, for the current conversation, they have session state. Think of this as their short-term notepad. It's where the code analyzer can leave a note for the style agent. We can control whether that note lasts for just one turn, the whole chat, or even for every chat a specific user has.
Next, for remembering a user from last week or last month, they need memory. This is their long-term memory, and it allows the agent to truly offer personalized feedback over time, like noticing a student's improvement. And finally, what if the user submits a massive code file? We don't want to clog up the system's memory with that. For large files, they use artifacts. It's their filing cabinet for storing and retrieving big chunks of data. Once we've built our agent, along with the tools and the shared context we just discussed, we can run it locally to see how it works. I wish it stopped there, but the next part is to productionize the agent, and today we'll be looking at the bare bones of what you need to productionize an agent. Until now, in development, we would use something called an in-memory session service and an in-memory memory service. Like the name conveys, all the shared context lives in memory. But when we move to production, we need to switch to persistent services like the Vertex AI Session Service and Vertex AI Memory Bank. Now, once we've built an agent with its tools and its brain, the next step is deployment. This is where we make our agent available to everyone else, and it brings up a crucial question, IO: where should we deploy this agent? Great question. Your deployment choice really depends on your use case and your architecture. If your agent is a self-contained microservice and you just need a secure API to call it, deploying directly to Agent Engine is a fantastic choice. Choose Cloud Run when you have variable traffic patterns or want serverless scaling; it's perfect for agents that need to scale to zero during idle periods to minimize costs. And go with GKE when you need full Kubernetes power, like custom networking, GPU nodes for ML inference, or when you're orchestrating complex multi-agent systems that require stateful sets or specialized operators. For our use case today, because we want that managed session state and memory right out of the box, our choice is going to be Vertex AI Agent Engine. Think of this as a fully managed office building for your system. It handles the security, the scaling, and the infrastructure, so you don't have to manage any of it yourself. And finally, no production system is complete without a way to see what's going on inside it. Agent Engine is wired for observability using Cloud Trace. Think of it as a set of security cameras, letting us watch every step of your agent's workflow to find bottlenecks and debug issues. So there it is: a team of specialist agents orchestrated by a manager, or workflow agent, using a three-part shared context, all running within a managed platform with full observability. All right, with that complete map in our heads, let's hand it over to IO and start writing some code.
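
Before we do, here's a minimal sketch of the in-memory to managed-services swap we just described. This is not the codelab's code: the Vertex AI service constructor arguments may differ by ADK version, and the project, location, and agent engine ID values are placeholders.

```python
# Sketch only: swapping in-memory context services for managed ones.
# Constructor arguments and all IDs below are placeholders / assumptions.
from google.adk.agents import Agent
from google.adk.memory import InMemoryMemoryService, VertexAiMemoryBankService
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService, VertexAiSessionService

root_agent = Agent(
    name="code_review_assistant",
    model="gemini-2.5-flash",
    instruction="Review the Python code the user submits.",
)

# Development: shared context lives in process memory and vanishes on restart.
dev_runner = Runner(
    agent=root_agent,
    app_name="code_review_assistant",
    session_service=InMemorySessionService(),
    memory_service=InMemoryMemoryService(),
)

# Production: sessions and long-term memory persist in Vertex AI Agent Engine.
prod_runner = Runner(
    agent=root_agent,
    app_name="code_review_assistant",
    session_service=VertexAiSessionService(
        project="my-project", location="us-central1", agent_engine_id="123"
    ),
    memory_service=VertexAiMemoryBankService(
        project="my-project", location="us-central1", agent_engine_id="123"
    ),
)
```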

[6:31]In this code lab, you will go from being a user of vibe coding tools to a creator by building a production-grade AI code review assistant. We'll build a multi-agent system that analyzes code with deterministic tools, executes real tests, provides feedback, and deploys to Google Cloud with full observability. Let's begin with chapter two, your first agent deployment. We'll start by properly setting up our Google Cloud environment. First, inside the Cloud Shell terminal, we need to set our active project. We'll run gcloud config set project, using the GOOGLE_CLOUD_PROJECT environment variable, which is automatically set for you. With the project set, we can move on to verification. Next, we'll run a quick command to confirm the project is set correctly. The output shows our active project ID, which looks right. Now, let's check our authentication status with gcloud auth list. You should see your account listed with ACTIVE next to it. If your account isn't active, you can simply run gcloud auth application-default login to authenticate. Before we build any agents, we need to enable the essential APIs. We'll enable the AI Platform and Compute Engine APIs, which are necessary for running our basic agent. This may take a minute. Okay, the operation finished successfully. Our core APIs are ready. Now, we'll install the Google Agent Development Kit, or ADK, using pip install. We'll add the --upgrade flag to ensure we get the latest version.

[7:59]To verify the installation, we'll run adk --version. It may take a few seconds to run right after installation.

[8:16]As you can see, we have version 1.15.1 or higher, which is exactly what we need. With ADK installed, we can now create our Hello World agent. The adk create command scaffolds a new project for us. We'll name it my_first_agent.

[8:38]We'll choose Gemini 2.5 Flash as the model, select Vertex AI as the backend, and accept the auto-detected project ID and default region. The ADK has now created an agent directory with three essential files: a .env file for configuration, an __init__.py to mark it as a Python package, and the agent.py file containing our agent's definition. Let's change into the my_first_agent directory and list the files to see what was generated. And there they are, just as expected. Now, let's quickly check the configuration file.

[9:19]Running cat .env shows that the ADK correctly populated our project ID and location from the interactive prompts. If this were incorrect, you could easily edit it with nano or any text editor. But for now, this is perfect. Let's examine the heart of our new creation, agent.py. As you can see, it's incredibly simple. It's just a single agent object with a model, a name, a description, and a simple instruction. This is the Hello World of agents.
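
For reference, the generated agent.py looks roughly like this sketch; the exact description and instruction text the scaffold writes may differ.

```python
# Roughly what `adk create` scaffolds in my_first_agent/agent.py;
# the exact wording of the description and instruction may differ.
from google.adk.agents import Agent

root_agent = Agent(
    model="gemini-2.5-flash",  # the model selected in the interactive prompts
    name="root_agent",
    description="A helpful assistant for user questions.",
    instruction="Answer user questions to the best of your knowledge.",
)
```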

[10:00]To test it, we'll navigate back to the parent directory and use the command adk run my_first_agent. This starts an interactive console session where we can chat directly with our agent. You can see the user prompt is waiting for input. Let's ask it a basic question: hey, what can you do?

[10:22]And there's the response. It's a standard helpful reply based on the simple instruction we gave in agent.py. Now, let's test its limits. We'll ask for the current weather.

[10:40]As expected, it correctly identifies that it doesn't have access to live data. This is a crucial limitation of a basic model without tools. Let's see how it handles code. I'll give it a simple function.

[11:02]The review is reasonable. It identifies the function's purpose and even suggests good practices like adding docstrings and type hints. But it's just talking about the code. It can't parse its structure, run tests, or check style compliance. To do that, we need architecture. Let's exit with Ctrl+C. This brings us to chapter three, preparing your production workspace. That simple agent was a good start, but a real-world system needs a robust foundation. We'll start by cleaning up our basic agent.

and cloning the full production scaffold from the Git repository provided with the code lab. We'll also switch to the code lab branch, which contains the complete project structure with placeholders for us to fill in. As you can see, this is a much more comprehensive structure. We have dedicated directories for sub-agents, tools, and deployment scripts. This separation of concerns is a key production principle. If we look inside tools.py, you can see the placeholders, such as "Module 4, Step 2: Add state storage", where we'll be adding our code. Note, these modules are based on the code lab sections, for those following along there. Now, back in the terminal, we'll create and activate a Python virtual environment. This isolates our project's dependencies from other Python projects on the system.

[12:24]Your prompt should now show (.venv) at the beginning. Next, we install all the necessary production dependencies, including google-adk, pycodestyle, and Vertex AI, by running pip install with the requirements file.

and then we run pip install with the -e flag for editable mode. This flag is important because it allows Python to find and import our code review assistant modules from anywhere in the project, which is essential for our structured application. We'll copy the example environment file to create our own .env file. Then we'll open it to edit.

[13:06]We just need to replace the placeholder GOOGLE_CLOUD_PROJECT value with our actual project ID. The other defaults are fine for now; those values will become relevant when we deploy our code review assistant later on.

[13:23]Let's make sure to update the project ID value.

and then save the file. We can quickly verify the changes by running cat .env. Everything looks correct.

[13:42]Now we need to enable the additional APIs for our production deployments. This includes services for Cloud SQL, Cloud Run, Cloud Build, Artifact Registry, Cloud Storage, and Cloud Trace. This step ensures our Google Cloud project has all the necessary services enabled. We also need a place to store our container images, so we'll create an Artifact Registry repository.

[14:06]Finally, we grant the necessary IAM roles to the Cloud Build service account, which is a critical security step that allows our automated deployment script to manage resources on our behalf. And with that, our production workspace is fully prepared. We have our code, our isolated environment, and our cloud infrastructure ready to go. Let's move on to chapter four, building your first agent. We'll start in tools.py. Here's the scaffold for our analyze_code_structure tool. The first step is to enable state storage, which allows our tool to share its findings with other agents in the pipeline. We'll replace this placeholder with these lines. We're using our state key constants to write the original code, the detailed analysis, and the line count to the shared state. Think of this as writing on a shared whiteboard for the other agents to see. This constants pattern prevents silent bugs from typos. Let's open constants.py and take a closer look.

[15:08]At the top, we have our session-level keys, like code to review and style score. As we scroll down, you'll see keys grouped by purpose, such as those for the review pipeline and the fix pipeline. This organized structure makes it immediately clear what data each part of our application produces and consumes. Next, let's make our tool non-blocking. We'll replace the async placeholder with this run_in_executor pattern. This runs the CPU-intensive ast.parse function in a separate thread, preventing it from freezing our application. And as you can see, this works hand in hand with the async def keyword at the start of our function. The async keyword alone isn't enough; it only gives the function the ability to be paused. The run_in_executor call is what actually does the work in the background during that pause, preventing our entire application from freezing. Now for the core logic. We'll replace this placeholder to call our helper function, extract_code_structure, which will also run in the thread pool. And now we'll paste in the helper functions themselves.
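
Here's a minimal sketch of that non-blocking tool pattern. The state-key constants and the simplified parsing helper below are illustrative stand-ins for what the codelab's constants.py and extraction helper actually define.

```python
# Sketch of the non-blocking tool pattern; constants and the summary helper
# are illustrative stand-ins for the codelab's real definitions.
import ast
import asyncio

from google.adk.tools import ToolContext


class StateKeys:
    """Illustrative constants; the codelab's constants.py defines the real set."""
    CODE_TO_REVIEW = "code_to_review"
    CODE_ANALYSIS = "code_analysis"
    CODE_LINE_COUNT = "code_line_count"


def _parse_and_summarize(code: str) -> dict:
    """CPU-bound work; the codelab's extract_code_structure helper is richer."""
    tree = ast.parse(code)
    return {
        "functions": sum(isinstance(n, ast.FunctionDef) for n in ast.walk(tree)),
        "classes": sum(isinstance(n, ast.ClassDef) for n in ast.walk(tree)),
    }


async def analyze_code_structure(code: str, tool_context: ToolContext) -> dict:
    """Parses the code off the event loop and writes findings to shared state."""
    loop = asyncio.get_running_loop()

    # Run the CPU-intensive parse in a worker thread so the app stays responsive.
    analysis = await loop.run_in_executor(None, _parse_and_summarize, code)

    # Write to the shared "whiteboard" using constant keys so later agents
    # (style checker, test runner, synthesizer) can read the same values.
    tool_context.state[StateKeys.CODE_TO_REVIEW] = code
    tool_context.state[StateKeys.CODE_ANALYSIS] = analysis
    tool_context.state[StateKeys.CODE_LINE_COUNT] = len(code.splitlines())

    return {"status": "success", "analysis": analysis}
```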

[16:15]This extract_code_structure function is where the real work happens. It walks the abstract syntax tree, or AST, to deterministically extract detailed information about every function, class, and import in the code.
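
A simplified sketch of such an AST-walking helper, with illustrative field names; the codelab's version records more detail per node.

```python
# Sketch of a deterministic AST helper; the collected fields are illustrative.
import ast


def extract_code_structure(tree: ast.AST) -> dict:
    """Walks the AST and collects every function, class, and import."""
    functions, classes, imports = [], [], []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append({
                "name": node.name,
                "args": [arg.arg for arg in node.args.args],
                "has_docstring": ast.get_docstring(node) is not None,
                "line": node.lineno,
            })
        elif isinstance(node, ast.ClassDef):
            classes.append({"name": node.name, "line": node.lineno})
        elif isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            imports.append(node.module or "")
    return {"functions": functions, "classes": classes, "imports": imports}


sample = """
import math

class Stack:
    pass

def helper(a, b):
    return a + b

def main():
    return helper(1, 2)
"""
# Prints two functions, one class, and one import for this sample.
print(extract_code_structure(ast.parse(sample)))
```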

[16:33]With our tool complete, we can now wire it up to an agent. Let's open the code_analyzer.py file. We'll replace this placeholder with the complete agent definition.
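
The definition we paste in looks roughly like this sketch. The instruction text is paraphrased, the model string stands in for the codelab's worker-model setting, and the import path for the tool is an assumption about the package layout.

```python
# Sketch of the analyzer agent wiring; instruction text is paraphrased.
from google.adk.agents import Agent

# Assumed import path; in the codelab the tool lives in the project's tools.py.
from code_review_assistant.tools import analyze_code_structure

code_analyzer_agent = Agent(
    name="CodeAnalyzer",
    model="gemini-2.5-flash",  # stand-in for the codelab's worker-model setting
    description="Analyzes the structure of submitted Python code.",
    instruction=(
        "Call analyze_code_structure on the submitted code, then summarize the "
        "functions, classes, and imports it found. Do NOT fix or rewrite the "
        "code; analysis only."
    ),
    tools=[analyze_code_structure],
    output_key="structure_analysis_summary",  # written into session state
)
```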

[16:47]Notice the choice of model and the detailed instruction prompt. We are being very explicit, telling the agent its exact task and, crucially, what not to do, like fixing the code. This prevents the agent from being over-helpful and corrupting our analysis pipeline. We pass our analyze_code_structure function into the tools list, and we set the output_key. This tells the ADK that whatever this agent produces should be stored in the session state under the key structure_analysis_summary, making it available to the next agent in the chain. Now, let's test our work. The project includes a test script for our analyzer. This script loads our .env config, instantiates the agent, and runs it against a sample piece of code.

[17:41]Let's run it.

[17:58]And there's our result. The tool correctly parsed the code, and the agent generated a perfect summary, identifying the two functions and one class. Our first production ready agent is working. To quickly recap what we just accomplished, we built a deterministic tool using Python's AST library, wrapped it in a non-blocking async function, and connected it to a precisely instructed agent that knows how and when to use it. This separation of deterministic work from LLM reasoning is a cornerstone of reliable AI systems. The key takeaways are, use tools for deterministic work, use agents for reasoning and orchestration, and use state with constant keys to pass data reliably between them.

[18:40]Now, in chapter five, we will assemble a full review pipeline. We're back in tools.py to add our second tool, the style checker. This function uses the pycodestyle library to identify PEP 8 violations.

[18:54]Now, notice the very first thing this function does. It retrieves the code from the shared state. This is the crucial connection in our pipeline.

[19:17]After performing its analysis, the style checker then contributes its own findings back to the shared state. As you can see here, it writes the style score, style issues, and style issue count, making them available for the agents that will run later in the pipeline.

[19:38]We'll also add its helper functions. These helpers perform the actual style check, add our own custom naming-convention checks, and calculate a weighted score. This gives us a more nuanced view of code quality than just counting the number of errors. The main check_code_style tool orchestrates these helpers and handles writing the results, the score, the issues, and the issue count, back into the shared state for the next agents to use.
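
At its core, the pycodestyle call can be as small as this sketch; the codelab's helpers layer the custom naming checks and the weighted score on top of it. The scoring formula here is purely illustrative.

```python
# Minimal sketch of a PEP 8 check with pycodestyle; the codelab's version adds
# custom naming-convention checks and a weighted score on top of this.
import os
import tempfile

import pycodestyle


def run_pep8_check(code: str) -> dict:
    """Returns a raw PEP 8 issue count and a simple illustrative score."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(code)
        path = handle.name
    try:
        style = pycodestyle.StyleGuide(quiet=True)
        report = style.check_files([path])
    finally:
        os.remove(path)
    return {
        "total_errors": report.total_errors,
        # Illustrative scoring only: start at 100, subtract 3 per issue.
        "style_score": max(0, 100 - 3 * report.total_errors),
    }


print(run_pep8_check("def add(x,y):\n    return x+y\n"))
```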

[20:11]Now, let's create the agent. In stylechecker.py, we'll first paste in the dynamic instruction provider. This is another production pattern. Instead of a static string, we use a function that injects data from the session state directly into the prompt. Here, the inject_session_state utility will automatically replace the curly-brace placeholders with the actual output from our first agent, giving the style checker valuable context about the code it's about to review.
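
In ADK, an agent's instruction can be a callable instead of a static string, which is the mechanism this pattern builds on. Here is a minimal sketch, assuming the state key we set earlier; the codelab's real provider uses its inject_session_state helper, but the idea is the same.

```python
# Sketch of a dynamic instruction provider: read session state at call time
# and build the prompt from it. The state key and wording are illustrative.
from google.adk.agents.readonly_context import ReadonlyContext


def style_checker_instruction(context: ReadonlyContext) -> str:
    structure_summary = context.state.get("structure_analysis_summary", "")
    return f"""You are a code style reviewer.

Here is the structural analysis from the previous agent:
{structure_summary}

Call the check_code_style tool on the submitted code and summarize the
PEP 8 issues it reports. Do not rewrite the code."""
```

This function is then passed as the agent's instruction in place of a plain string.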

[20:47]Next, we define the agent itself. It's configured with the fast worker model, it's given our new check_code_style tool, and its output is stored in the style_check_summary state key.

[21:07]Moving on to our third agent, the test runner. We'll paste in its instruction provider. This prompt is crucial. It tells the agent to generate 15 to 20 comprehensive tests, execute them, and most importantly, output its findings in a very specific JSON format. This strict JSON output is not for human eyes. It's a structured contract that our final synthesizer agent can reliably parse to understand exactly what bugs were found.

[21:41]Now we define the test runner agent.

[21:46]Notice we're using the more powerful critic_model, because generating meaningful tests requires a higher level of reasoning. We also attach the built-in code executor, which gives the agent the power to actually run the Python tests it generates in a secure sandbox.

[22:05]This is the key that separates our system from a simple chatbot. The built-in code executor provides proof, not just speculation. When our agent reports a type error, it's because it actually ran the code and witnessed the crash firsthand.
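
A sketch of that wiring, with the instruction heavily abridged, the model string standing in for the codelab's critic_model, and the output key name assumed:

```python
# Sketch of the test runner agent; the real instruction is much longer and
# specifies an exact JSON schema for the results.
from google.adk.agents import Agent
from google.adk.code_executors import BuiltInCodeExecutor

test_runner_agent = Agent(
    name="TestRunner",
    model="gemini-2.5-pro",  # stand-in for the codelab's critic_model
    instruction=(
        "Read the code under review from session state, generate 15 to 20 "
        "comprehensive tests, execute them with your code executor, and "
        "report the results as strict JSON with pass/fail counts and details "
        "of every failing test."
    ),
    code_executor=BuiltInCodeExecutor(),  # lets the agent actually run the tests
    output_key="test_execution_summary",  # assumed name for the state key
)
```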

[22:26]Now for our final and most sophisticated agent, the feedback synthesizer. It uses three different tools. The first, search_past_feedback, will query the ADK's memory service for past reviews from this user, enabling personalized, context-aware feedback.
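
A sketch of what such a memory-search tool can look like, assuming the async search_memory hook on ToolContext; the query string and the result handling are illustrative, since the exact response fields depend on the configured memory service.

```python
# Sketch of a memory-search tool; result handling is illustrative because the
# SearchMemoryResponse contents depend on the configured memory service.
from google.adk.tools import ToolContext


async def search_past_feedback(tool_context: ToolContext) -> dict:
    """Looks up previous review feedback for this user in long-term memory."""
    response = await tool_context.search_memory("previous code review feedback")
    memories = getattr(response, "memories", []) or []
    return {
        "found": bool(memories),
        "past_feedback": [str(memory) for memory in memories[:5]],
    }
```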

[23:11]The second tool, update_grading_progress, is a state management workhorse. It updates temporary, session-level, and persistent user-level metrics, like calculating the score improvement since the last submission. This is how the agent tracks a developer's progress over time.

[23:33]The third tool, save_grading_report, gathers all the data from the entire pipeline, the code, the analysis, the style score, the test results, and saves it as a comprehensive JSON artifact. This creates a complete audit trail for every review.

[23:53]Notice the dual-storage strategy. It tries to save to a persistent artifact service first, but falls back to saving in the session state if that fails. This is a great production pattern for resilience. The final piece of the review pipeline is the feedback synthesizer agent itself. It is configured with the critic model. It takes the output from the previous agents, then combines it with past feedback from memory to generate actionable feedback.
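
That save-with-fallback pattern might look roughly like this sketch; the artifact filename, the state keys read into the report, and the fallback key are all placeholders.

```python
# Sketch of the dual-storage pattern: prefer the artifact service, fall back
# to session state if no artifact service is available. Names are placeholders.
import json

from google.adk.tools import ToolContext
from google.genai import types


async def save_grading_report(tool_context: ToolContext) -> dict:
    report = {
        "analysis": tool_context.state.get("code_analysis"),
        "style_score": tool_context.state.get("style_score"),
        "test_results": tool_context.state.get("test_execution_summary"),
    }
    payload = json.dumps(report, indent=2, default=str)

    try:
        # Preferred path: persist as an artifact for a durable audit trail.
        version = await tool_context.save_artifact(
            filename="grading_report.json",
            artifact=types.Part.from_text(text=payload),
        )
        return {"status": "saved_artifact", "version": version}
    except Exception as error:  # e.g. no artifact service configured
        # Fallback: keep the report in session state so nothing is lost.
        tool_context.state["grading_report"] = report
        return {"status": "saved_to_state", "reason": str(error)}
```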

[25:19]Now, we assemble our pipeline.

[25:39]We define a sequential agent named Code Review Pipeline.

[25:55]We then define our root agent, whose only job is to delegate user requests to this pipeline. The root agent's instructions are simple but critical. When it sees code, it must delegate to the code review pipeline and do nothing else. This makes it a reliable router. With our pipeline fully assembled, let's test the entire system.
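
Taken together, the assembly looks roughly like this sketch. The four sub-agent variables are placeholder stand-ins for the agents built in the earlier steps, and the root instruction is paraphrased.

```python
# Sketch of the review pipeline assembly; the sub-agents below are trivial
# stand-ins for the ones defined in the earlier steps.
from google.adk.agents import Agent, SequentialAgent

code_analyzer_agent = Agent(name="CodeAnalyzer", model="gemini-2.5-flash",
                            instruction="Analyze code structure only.")
style_checker_agent = Agent(name="StyleChecker", model="gemini-2.5-flash",
                            instruction="Report PEP 8 issues only.")
test_runner_agent = Agent(name="TestRunner", model="gemini-2.5-flash",
                          instruction="Generate and run tests, report JSON.")
feedback_synthesizer_agent = Agent(name="FeedbackSynthesizer",
                                   model="gemini-2.5-flash",
                                   instruction="Combine all findings.")

# The sequential agent runs its children in strict order, like an assembly line.
code_review_pipeline = SequentialAgent(
    name="CodeReviewPipeline",
    sub_agents=[
        code_analyzer_agent,
        style_checker_agent,
        test_runner_agent,
        feedback_synthesizer_agent,
    ],
)

# The root agent is a reliable router: it only delegates to the pipeline.
root_agent = Agent(
    name="CodeReviewAssistant",
    model="gemini-2.5-flash",
    instruction=(
        "When the user submits code, delegate to CodeReviewPipeline and do "
        "nothing else. Otherwise, briefly ask for code to review."
    ),
    sub_agents=[code_review_pipeline],
)
```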

[26:21]We'll run adk web to start the ADK's development UI.

[26:28]In the browser, we'll open the web preview.

[26:33]Select our agent and paste in a depth-first search algorithm implementation with a deliberate logical flaw, so we can test and validate the performance of our code review assistant.

[26:49]Now, watch the pipeline in action. In the events tab, you can see each agent executing in sequence. First, the code analyzer runs. If we click on the check_code_style event, we can expand it to show the details. We can inspect the raw data for any event. Here, in the request, we can see the full prompt and tools passed to the LLM. As the agents complete, the final polished response from the feedback synthesizer is assembled.

[27:27]Let's take a look. We can see the code analyzer agent results, then the style checker agent results.

[27:38]Then the test runner agent gets activated, and we can see the generated test cases and their corresponding results.

[27:52]Then the search_past_feedback tool,

as well as the save_grading_report and compile_fix_report tools, get called by the feedback synthesizer.

[28:10]Here is the result of our four-agent pipeline. It's a complete report that includes a summary, strengths, a detailed analysis of structure, style, and the test results, correctly identifying the critical bug, and clear, actionable next steps. This is the power of a multi-agent pipeline. It transforms raw technical data from deterministic tools into a helpful and educational experience. Now, let's stop the server. Welcome to chapter six. We've built a pipeline that can find problems. Now, we're going to build one that can fix them. Let's implement the agents, starting with the code fixer.

[28:50]First, we'll add in the instruction provider. Like before, this uses a dynamic template that injects data from the shared state. It pulls in the original code, style score, and test results from the review pipeline. This gives the agent all the necessary context about what went wrong. Now, look closely at the critical instructions. We are being extremely explicit. The agent must output only the corrected raw Python code. No explanations, no conversation, and no markdown code blocks. This is because its output will be consumed by another agent, the test runner agent, which expects a clean, executable file, not a chatty response. Next, we'll define the agent itself. We'll give it the name code fixer and assign it our worker model, Gemini 2.5 Flash, which is powerful enough for this code generation task. We're also providing the built-in code executor, and finally, and most importantly, we set the output_key to code_fixes. This tells the ADK to take the raw Python code generated by this agent and save it to our shared state. The next agent in the loop can then access this corrected code by using that exact key. Next, in fix_test_runner.py, we define the test runner for our fix loop. It takes the newly generated code from the code_fixes state key and runs the exact same battery of tests, comparing the new results with the old ones. Its job is to report back in structured JSON whether the pass rate has improved and which tests are still failing.

[30:46]Now for the validator, which needs three new tools.

[30:54]The first, in tools.py, is validate_fixed_style. It's very similar to our original style checker, but it specifically runs on the fixed code and compares the new score to the original.

[31:37]The third and most important tool is exit_fix_loop. It contains a single critical line: tool_context.actions.escalate = True. When the validator agent calls this tool, it signals to the loop agent that the fix was successful and that it should stop iterating.
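
That single line is the whole exit mechanism; here is a minimal sketch of such a tool (the return payload is illustrative):

```python
# Sketch of the loop-exit tool: setting escalate=True tells the enclosing
# LoopAgent to stop iterating. The return payload is illustrative.
from google.adk.tools import ToolContext


def exit_fix_loop(tool_context: ToolContext) -> dict:
    """Call this only when the fix has been validated as successful."""
    tool_context.actions.escalate = True
    return {"status": "fix_validated", "loop": "exiting"}
```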

[31:57]Now we create the fix validator agent itself. Its instructions are very clear: use its three tools to check the style, compile a report, and then, only if the fix status is successful, call exit_fix_loop. This conditional logic is critical. If the fix is a failure or only a partial success, the agent does nothing. This inaction is deliberate. It allows the parent loop agent to see that the success criteria weren't met and proceed to the next iteration. Okay, our three loop agents, the fixer, the tester, and the validator, are defined. Now, let's go to our main agent.py file to orchestrate them into a new pipeline. First, we'll add the necessary imports, including the LoopAgent from the ADK and the new agents we've just created for the fix pipeline. We replace the placeholder to define our code review pipeline.

[32:55]We then define our root agent, whose only job is to delegate user requests to this pipeline. Next, we'll define the core of our fix capability. We create a LoopAgent named fix attempt loop, which will run our three agents in a cycle. We set max_iterations to three as a safety net.

[33:11]Then we wrap this loop and a final synthesizer inside a sequential agent called the code fix pipeline. This ensures the synthesizer runs only once, after the loop completes.
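
In code, that nesting looks roughly like this sketch; the agent variables below are trivial stand-ins for the ones defined in the previous steps, and the names are paraphrased.

```python
# Sketch of the fix pipeline: a bounded retry loop followed by a one-time
# synthesizer. The sub-agents are stand-ins for the real ones.
from google.adk.agents import Agent, LoopAgent, SequentialAgent

code_fixer_agent = Agent(name="CodeFixer", model="gemini-2.5-flash",
                         instruction="Output only corrected raw Python code.")
fix_test_runner_agent = Agent(name="FixTestRunner", model="gemini-2.5-flash",
                              instruction="Re-run the tests, report JSON.")
fix_validator_agent = Agent(name="FixValidator", model="gemini-2.5-flash",
                            instruction="Validate the fix; escalate if good.")
fix_synthesizer_agent = Agent(name="FixSynthesizer", model="gemini-2.5-flash",
                              instruction="Summarize the whole fix attempt.")

fix_attempt_loop = LoopAgent(
    name="FixAttemptLoop",
    sub_agents=[code_fixer_agent, fix_test_runner_agent, fix_validator_agent],
    max_iterations=3,  # safety net if the validator never signals success
)

code_fix_pipeline = SequentialAgent(
    name="CodeFixPipeline",
    sub_agents=[fix_attempt_loop, fix_synthesizer_agent],  # synthesizer runs once
)
```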

[33:31]Now we need to make our root agent smart enough to use this new pipeline. We'll replace the old definition with this updated one. The key changes are adding code fix pipeline to its list of sub-agents and, importantly, updating its instructions. Now after a review, if it finds significant issues, it will proactively ask the user if they'd like to attempt a fix. This makes it a truly interactive assistant. With our main orchestration complete, let's implement that final synthesizer agent.

[34:02]In fix_synthesizer.py, we'll add its instruction provider. This template pulls the final report, the corrected code, and the fix status from state to create a comprehensive, user-friendly summary of the entire fix process.

[34:25]Now, we define the agent itself. It uses the powerful critic model to ensure the summary is high quality, and it is configured to use a tool called save_fix_report. This is the last piece we need. Let's create that final tool now.

[34:47]Back in tools.py, we'll add the save_fix_report function. It gathers all the data from the fix pipeline and saves it as a JSON artifact, giving us a complete audit trail for the successful fix.

[35:02]We'll also add it to our module's __all__ list to make it importable. With all our components in place, let's test the entire system end to end.

[41:41]We're receiving the full streaming response directly from our deployed instance. Our multi-agent system is live and fully functional. Now for the final critical piece of production readiness, chapter eight, observability. Your agent is deployed, but how is it performing? The --trace-to-cloud flag we used during deployment automatically instruments every request with trace-level observability. To see this, we'll navigate to the Cloud Trace explorer in the Google Cloud console. The list shows the timeline of every request made to our agent. Let's click on one to examine the waterfall view. This Gantt chart shows the complete execution timeline of a single request, breaking down every operation into a span. We can see the code review pipeline and its children. Let's inspect the code analyzer. Here in the attributes, we can see the exact model used and even the token counts for this specific LLM call. If we click on the check_code_style tool execution, we can see the tool's inputs and, most importantly, the JSON output that it returned, which includes the style score of 98. All right, it's time for our Eng Talk section. We've searched the internet and found some community questions on productionizing agents in ADK. Let's dive in. IO, what are some considerations to take into account when I'm integrating my agent with a web app or a client? Well, beyond the polling pattern we discussed, think about authentication. You'll want to secure your endpoint with API keys or OAuth. Consider implementing rate limiting to prevent abuse, and always validate user input before passing it to your agent. Also, design your UI to handle partial responses gracefully. Agents can stream responses, so show users progress as it happens rather than making them wait for everything to complete. What is the end-to-end workflow for deploying an ADK agent? What considerations should we be thinking about in addition to the ones we've discussed today? Well, the full workflow includes developing locally, writing tests for your tools and agents, containerizing your application, setting up CI/CD pipelines, deploying to your chosen platform, like Cloud Run or GKE, and configuring monitoring. Also, set up evaluation metrics to track agent performance over time, and implement proper error handling and fallback responses for when things go wrong.

[44:19]How can I safely run untrusted, LLM-generated code in my agent? Well, never run untrusted code directly. Use sandboxed environments, like containerized execution with gVisor. Set strict resource limits for timeout, memory, and CPU to prevent infinite loops or resource exhaustion. Consider using static analysis tools to pre-screen code before execution, and always run in isolated networks without access to sensitive resources. That was our last question. IO, thank you so much for walking us through this entire journey. It is now clear that building a production-ready agent is so much more than just writing a prompt. That's right. Today we built a complete multi-agent system from scratch using Google's ADK. So thank you for having me. I can't wait to see how our viewers take their side projects and turn them into full-fledged production apps. Amazing. Share your projects, your ideas, and your questions in the comments below; we'd love to see what you create. For other agentic concepts like evaluation, A2A, MCP, et cetera, check out the workshop playlist. We have some amazing guests covering so many hot topics. Until next time, happy building.
