Cloud Adoption Framework for AI Agents – Single vs Multi-Agent Design and Build Pattern (Part-4)

MadeForCloud

11m 32s · 1,578 words · ~8 min read
Transcript source: YouTube auto captions

This transcript was extracted from YouTube's auto-generated caption track. The transcript below is server-rendered so it can be read, searched, cited, and shared without opening the original YouTube player.

[0:00]Hello everyone. We are now in part four of this series on adopting AI agents the right way, based on the Microsoft Cloud Adoption Framework for AI agents. Up to this point, we have spent a lot of time on foundations. We talked about planning, organizational readiness, data architecture, responsible AI, governance, and security. All of that work exists for one simple reason: so that when you actually start building AI agents, you don't make architectural decisions that feel convenient today but become very hard to reverse later. This part is where things become very practical.

In part four, Microsoft focuses on two questions that every organization eventually runs into. The first question is whether you should build a single AI agent or multiple AI agents. The second question is how you build AI agents in a secure, standard, and repeatable way across the organization. These two decisions directly impact scalability, security, maintainability, and long-term trust in your AI system.

So let's start with the first question: single agent versus multiple agents. When most teams begin their AI journey, they naturally think about a single agent: one agent that talks to the user, understands the intent, reasons over the data, calls different tools, and produces an outcome. Microsoft clearly acknowledges that this is often the right place to start. A single-agent architecture is simpler to design, easier to test, cheaper to operate, and faster to deliver. For early experimentation, POCs, or small-scope business use cases, a single agent is usually the best choice.

However, Microsoft is also very clear about the limitations. As responsibilities grow, prompts become more complex, permissions expand, and reasoning paths become harder to control. At that point, the agent starts behaving like a monolithic system. Changes in one area start affecting behavior in others.
Testing becomes very difficult, and security boundaries start getting blurred. From real-world experience, this is something I have seen many times. A single agent starts small and clean. Then one team adds a new tool, another team adds extra prompt rules, and someone adds another data set. Over time, the agent still works, but no one can confidently explain why it made a specific decision. That is usually the moment when organizations start losing trust in the agent, even if the answers look correct on the surface.

This is exactly where Microsoft introduces the idea of multi-agent architecture. Now, multi-agent does not mean making things complex for the sake of complexity. The core idea is separation of concerns. Instead of one agent doing everything, responsibilities are divided across multiple specialized agents. Each agent has a clearly defined role, limited permissions, and a focused scope. The agents are then coordinated through an orchestration layer.

Microsoft highlights several scenarios where multi-agent designs make sense. One scenario is when different steps in a workflow require different levels of trust, access, and validation. A very common enterprise example is financial services. One agent might analyze the customer input and prepare a request. A second agent validates compliance and regulatory rules.

[4:08]And a third agent executes the approved actions. Each agent operates independently, with its own permissions and guardrails. This approach reduces risk, improves auditability, and aligns with regulatory expectations around separation of duties. This structure also makes it much easier to evolve each capability independently, and it allows you to apply stricter controls to the agents that interact with sensitive information.

Microsoft also makes an important point here: multi-agent does not mean that you start with multi-agent. In fact, the recommended approach is often to start with a single agent, learn from the real use case, and then refactor into multiple agents once clear boundaries have emerged. This avoids premature complexity while still supporting long-term scalability.

Another key consideration is orchestration. In a multi-agent system, agents cannot operate in an unstructured way. There must be a clear orchestration model that defines how agents hand off work, how context is shared, and how errors are handled. Microsoft points out that the orchestration should be deterministic wherever possible, even if the individual agents' reasoning may not be. This predictability is critical for enterprise workloads. In practice, this usually means using workflow engines or orchestration services that control the execution order, manage retries, enforce approvals, and maintain state. Without proper orchestration, multiple agents quickly become difficult to debug and even harder to trust.

Now let's move on to the second major focus of this part: building AI agents through a secure and standardized process. Microsoft is very direct about this. One of the biggest risks in a large organization is agent sprawl. Different teams build agents in different ways, using different tools, different models, and different security assumptions.
The result is inconsistent quality, duplicated effort, and serious governance gaps. The Cloud Adoption Framework addresses this by recommending a standard build process for all AI agents.

The first step in that process is defining a clear agent charter. This charter describes what the agent is supposed to do, what it must not do, what data it should access, what tools it can use, and how success is measured. From an architectural perspective, this charter becomes the contract that everything else is built around. In real projects, this charter is extremely valuable. It prevents scope creep, simplifies security reviews, and helps explain the agent's behavior to stakeholders. When something goes wrong, teams can always go back to the charter, check whether the agent stayed within its intended boundaries, and then determine what happened.

The next step is the model strategy. Microsoft strongly advises against defaulting to the largest or most powerful model for every task. Instead, every organization should define a tiered model strategy. Simple tasks should use smaller, faster, and cheaper models. More complex reasoning should escalate to a more capable model only when required. This approach controls cost, improves performance, and reduces unnecessary exposure. I have personally seen organizations save a significant amount of money simply by routing most requests through lightweight models and reserving the premium models for the edge cases. This optimization is only possible when the model strategy is planned up front and applied consistently.

Another critical area is defining the knowledge sources and tool access. Agents become powerful through data and actions, but this is also where the most risk exists. Microsoft recommends treating knowledge access and tool invocation as privileged operations.
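
To make the charter and the "privileged operation" idea a little more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not something taken from the Microsoft guidance: the `AgentCharter` fields, the `authorize_tool_call` helper, and the tool and data source names are all hypothetical. The point it tries to show is that the charter can be a machine-checkable contract, and that every tool call is denied unless the charter allows it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCharter:
    """Hypothetical charter: the contract the agent is built around."""
    purpose: str
    allowed_tools: frozenset[str]
    allowed_data_sources: frozenset[str]
    prohibited_actions: frozenset[str] = frozenset()


class CharterViolation(Exception):
    """Raised when the agent tries to step outside its charter."""


def authorize_tool_call(charter: AgentCharter, tool: str, data_source: str) -> None:
    """Treat every tool invocation as privileged: deny unless explicitly allowed."""
    if tool in charter.prohibited_actions:
        raise CharterViolation(f"{tool!r} is explicitly prohibited")
    if tool not in charter.allowed_tools:
        raise CharterViolation(f"{tool!r} is not in the charter's tool list")
    if data_source not in charter.allowed_data_sources:
        raise CharterViolation(f"{data_source!r} is not an approved data source")


# Example charter; every name below is invented for illustration.
charter = AgentCharter(
    purpose="Answer customer questions about order status",
    allowed_tools=frozenset({"lookup_order"}),
    allowed_data_sources=frozenset({"orders_db"}),
    prohibited_actions=frozenset({"issue_refund"}),
)

authorize_tool_call(charter, "lookup_order", "orders_db")  # allowed, returns None

try:
    authorize_tool_call(charter, "issue_refund", "orders_db")
except CharterViolation as exc:
    print(f"blocked: {exc}")
```

Because the check lives outside the agent's prompt, a security reviewer can read the charter object directly instead of inferring the boundaries from prompt text.
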
Agents should only access approved data sources through secure connections, with filtering and access control applied. For agents that can take actions, such as writing to a database or triggering workflows, Microsoft strongly encourages human-in-the-loop control, especially in the early stages. In practice, this means the agent recommends actions, but a human approves the execution. Over time, as confidence increases, some actions may be automated, but only after proper risk assessment.

Security testing is also a core part of the build process. Traditional application security testing is not enough for AI agents. Microsoft highlights the need to test for prompt injection, data leakage, misuse of tools, and attempts to bypass the guardrails. In more mature environments, this includes dedicated red teaming exercises designed specifically to break the agent.

Memory and state management is another area that often causes issues when overlooked. Agents that maintain conversational or long-term memory can accidentally store sensitive information or retain outdated context. Microsoft recommends defining clear rules around what is stored, how long it is retained, and when it is stored. Memory should be centralized in secure systems that support encryption, auditing, and lifecycle management.

And finally, observability must be built in from day one. Agents should produce detailed telemetry around reasoning paths, tool usage, latency, errors, and outcomes. This telemetry is critical for debugging, performance tuning, compliance, and continuous improvement. Microsoft recommends integrating this observability into existing monitoring and CI/CD pipelines so that quality and security checks happen continuously, not just once at deployment time.

When all of these elements come together, the result is a repeatable and scalable way to build AI agents that teams can trust.
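
The human-in-the-loop control described above, where the agent recommends and a human approves, could be sketched roughly like this. This is only an illustration: `ProposedAction`, `run_with_approval`, the `"low"` risk label, and the `auto_approve_risks` parameter are assumptions made for the example, not part of any Microsoft API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedAction:
    """An action the agent recommends but has not executed."""
    name: str
    payload: dict
    risk: str  # risk labels such as "low" are assumptions for this sketch


def run_with_approval(
    action: ProposedAction,
    execute: Callable[[ProposedAction], str],
    approver: Callable[[ProposedAction], bool],
    auto_approve_risks: frozenset[str] = frozenset(),
) -> str:
    """Execute only if the risk tier is pre-approved or the approver says yes."""
    if action.risk in auto_approve_risks or approver(action):
        return execute(action)
    return f"rejected: {action.name}"


def execute(action: ProposedAction) -> str:
    # Stand-in for a real side effect such as a database write.
    return f"executed: {action.name}"


def deny_all(action: ProposedAction) -> bool:
    # Stand-in for a human reviewer who declines the request.
    return False


update = ProposedAction("update_address", {"customer_id": 42}, risk="low")

# Early stage: nothing is automated, so the reviewer's decision stands.
print(run_with_approval(update, execute, deny_all))

# Later, after a risk assessment: low-risk actions skip the human step.
print(run_with_approval(update, execute, deny_all, frozenset({"low"})))
```

Starting with an empty `auto_approve_risks` set mirrors the advice to keep a human in the loop early on and to automate specific actions only once confidence has been earned.
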
New agents can be created faster because the pattern already exists. Security reviews become smoother because the expectations are clear. Governance teams are confident because the controls are consistent.

To close part four: this is where architecture meets execution. The choice between a single-agent and a multi-agent system determines how the solution scales. The decision to establish a secure and standard build process determines whether your organization can adopt AI agents confidently at enterprise scale. In the next part of the series, we'll move into operating AI agents in production. We'll talk about monitoring, cost management, performance optimization, and long-term lifecycle management. All the foundations we have covered so far exist to make the operational phase predictable, safe, and sustainable. And that's all I wanted to show in this video. Stay tuned for the next part.
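
As a companion to the orchestration discussion in this part, here is a minimal sketch of a deterministic orchestrator for the three-agent financial services example. It rests on several assumptions: the agents are plain functions standing in for model calls, the `10_000` compliance limit is an invented value, and `orchestrate` is a hypothetical helper, not a real workflow engine. What it tries to show is that the execution order, the retries, and the approval gate live in ordinary code, even though each agent's reasoning may not be deterministic.

```python
from typing import Callable

State = dict
Step = Callable[[State], State]


def intake_agent(state: State) -> State:
    # Stand-in for a model call that turns customer input into a request.
    return {**state, "request": {"type": "transfer", "amount": state["amount"]}}


def compliance_agent(state: State) -> State:
    # Stand-in for a rules check; the 10_000 limit is an invented example value.
    return {**state, "approved": state["request"]["amount"] <= 10_000}


def execution_agent(state: State) -> State:
    # Stand-in for the agent that performs the approved action.
    return {**state, "result": "executed"}


def orchestrate(state: State, max_retries: int = 2) -> State:
    """Run the agents in a fixed order, retry failures, and record each step."""
    audit: list[str] = []
    steps: list[tuple[str, Step]] = [
        ("intake", intake_agent),
        ("compliance", compliance_agent),
    ]
    for name, step in steps:
        for attempt in range(1, max_retries + 2):  # first try plus retries
            try:
                state = step(state)
                audit.append(f"{name}: ok (attempt {attempt})")
                break
            except Exception as exc:
                audit.append(f"{name}: failed (attempt {attempt}): {exc}")
        else:
            # Every attempt failed, so stop before anything is executed.
            return {**state, "result": "failed", "audit": audit}
    # The approval gate is enforced by the orchestrator, not by the agents.
    # The execution step is not retried here, since it has side effects.
    if state["approved"]:
        state = execution_agent(state)
        audit.append("execution: ok")
    else:
        state = {**state, "result": "blocked"}
        audit.append("execution: skipped, not approved")
    return {**state, "audit": audit}


print(orchestrate({"amount": 500})["result"])
print(orchestrate({"amount": 50_000})["result"])
```

The `audit` list is a small stand-in for the state and telemetry a real orchestration service would maintain, which is what makes a run explainable after the fact.
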
