
Build AI agents for e-commerce with ADK + Vector Search

Google for Developers


[0:00]Hi everyone. Welcome to another episode of Hands-on with AI Agents. In this episode, we'll be building a RAG agent using Google's ADK and Vector Search.

[0:14]In the first part of the video, we'll see what Vector Search is and what the different kinds of embeddings are. In the second part, we'll dive deep into building the agent as a RAG agent. And to do that, we have Kaz here. Hi, I'm Kaz, a developer advocate from the Cloud AI team. I focus on building demos, blog posts and documents for Vector Search, embeddings and ADK. Lovely, let's get started then. First, let's think about the typical design of RAG, or retrieval-augmented generation, systems for e-commerce chatbots. Here, the user query is: can you find a Google Pixel 9? The LLM receives this query and decides it needs more information to provide an accurate answer and suppress hallucination. So it uses a retrieval backend like Vector Search, gets the results for the query, and then generates the answer for the user. That's the typical RAG system setup. Now, let's look at some challenges for usual RAG systems, specifically multimodal search and keyword search. This example shows the user asking: can you find cups with dancing figures, where the SKU or product number is 123-ABC? The LLM receives this query and thinks: how do I handle image and keyword search? This query is more complex than the previous one because it involves descriptive information like "cups with dancing figures", which implies visual features rather than text semantics, and also a specific identifier or product name like 123-ABC. This represents a common challenge for RAG systems on e-commerce sites that use simple text similarity search: they need to be able to understand and effectively process different types of information in a single query.

[2:20]Another challenge is recommendations. Here, we see a user asking: can you suggest a birthday present for my son? The LLM receives this and thinks: how do I make a recommendation for the user's query? This question demonstrates the complexity of recommendations. It's not just about finding items similar to the text query, but understanding what might be a suitable suggestion for their son. This highlights another area where simple text similarity retrieval doesn't work, and we need something more sophisticated to provide meaningful recommendations. To solve those problems, we'd like to dive deeper into advanced practices for Vector Search that go beyond simple similarity search. Here are some best practices for higher search quality. Specifically, I'd like to discuss multimodal search, hybrid search and task type embeddings. This is how multimodal search works. With multimodal search, you use multimodal models to generate the embeddings. Those multimodal embeddings are shared across modalities like images and text: they share the same embedding space. That means if two items have similar meanings, they end up with a closer embedding distance. This enables text-to-image or image-to-text searches. To do so, you can just use the Vertex AI embeddings API to have the multimodal models generate those multimodal embeddings. Here's another solution we'd like to use, which is hybrid search, combining keyword search and semantic search in a single search index. The problem with simple similarity search is that it is limited to what the embedding model understands. It struggles with product names or newly added product names. This limitation can be a major issue in production RAG systems. Hybrid search addresses this by combining semantic search and keyword search, using so-called sparse embeddings, in a single Vector Search index, allowing a single query to retrieve the best mix of results.
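As a rough illustration of what generating those shared text-and-image embeddings with the Vertex AI multimodal embeddings API can look like, here is a minimal sketch; the project ID, location and image file path are placeholders, and the model name follows the public documentation.

```python
import vertexai
from vertexai.vision_models import Image, MultiModalEmbeddingModel

# Placeholders: replace with your own project ID and region.
vertexai.init(project="your-project-id", location="us-central1")

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")

# Text and images are embedded into the same 1408-dimensional space,
# so a text query can be compared directly against image embeddings.
text_result = model.get_embeddings(contextual_text="cups with dancing figures")
image_result = model.get_embeddings(image=Image.load_from_file("item_photo.png"))

query_vector = text_result.text_embedding   # list of 1408 floats
item_vector = image_result.image_embedding  # list of 1408 floats
# Index item_vector in Vector Search, then query the index with query_vector.
```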

[5:07]Okay, I'd like to share an actual example of multimodal search with this demo. It has 3 million items provided by mercari.com, a popular e-commerce website, and it's built with multimodal embeddings and a Vector Search index. For example, if you type a query like "cups with dancing figures", then from the 3 million items you can instantly find those cups. Please note that these items are found only by looking at the images: this demo doesn't look at any text titles, categories or text descriptions at all. The embedding model understands what's going on in those images and finds the items whose meaning is similar to the text query "cups with dancing figures". Now, let's take a look at another example, which is hybrid search. In this demo, I use a semantic similarity search index combined with sparse embeddings for keyword search. From the 3 million items, you can search for something like "1234". That string doesn't carry any meaning, but you can still run a Vector Search with it using the sparse-embedding keyword search, so you get items that have 1234 as a keyword in the product description or title. You can then combine the results from the keyword search and the semantic search into a single result. So, we have covered two of the advanced topics for Vector Search. Now, I'd like to discuss the third one: task type embeddings. Why would we need this? The problem is that in many cases, simple similarity search doesn't work well for production systems, because in most cases the query and the documents have different semantics. Like this one: "Why is the sky blue?" is the query, and the answer is "Scattering of the air." As sentences, they actually have quite different semantics. Over the last 10 years, many researchers in the information retrieval area have worked to solve this problem using machine learning or deep learning models. The popular solution here is the so-called dual encoder, or two-tower model. As you see in the animation, the two-tower model has two parts for learning different domains, such as the query domain and the database or document domain. The point is that this two-tower model learns the relationship between different queries and documents, so you get the most relevant results rather than just the most similar items found in the document database. But you don't have to hire your own team of data scientists to build a two-tower model and train it yourself. Instead, you can just use the Vertex AI embeddings API to get task type embeddings: embeddings generated by the pre-trained two-tower models we provide. With task type embeddings, you can ask a question like "Why is the sky blue?", and that will generate an embedding that has a closer distance to answers like "scattering of the air". Okay, let's take a look at an actual demonstration of task type embeddings. First, I'd like to use semantic similarity search with a query like "birthday present for my son". If you are using the usual semantic search, you'll get results like this: you are looking at many keychains, because they have very similar text descriptions mentioning a son's birthday, but maybe they are not the ideal items you want to get as the result.
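That switch, requesting question-answering task type embeddings instead of plain similarity embeddings, might look like this sketch using the Vertex AI text embeddings API; the model name and sample strings are illustrative, not taken from the demo code.

```python
from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("text-embedding-004")

# The task type tells the pre-trained two-tower model which "tower" to use:
# queries and documents land in positions optimized for relevance rather than
# surface similarity, so a question ends up close to its answer.
inputs = [
    TextEmbeddingInput("Why is the sky blue?", task_type="QUESTION_ANSWERING"),
    TextEmbeddingInput("Scattering of light by the air.",
                       task_type="RETRIEVAL_DOCUMENT"),
]
embeddings = model.get_embeddings(inputs)
print(len(embeddings[0].values))  # embedding dimensionality, e.g. 768
```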

[9:16]So instead, we switch the embedding model to the question-answering task type embeddings. By using this, with the same query and the same items, the results will be quite different, because the task type embedding model has learned the relationship between the query and the relevant items; in this case, those are present items. Even though the query and the results have totally different semantics, like a Lego Duplo Mickey Mouse set, the model can recommend results like this. Yeah, it's nice to see that the results are not all keychains, right? But now that we've seen all of these different types of embeddings, how do we take this and integrate it with our AI agent? Yes. So, let's dive deeper into how you can take advantage of those advanced Vector Search practices combined with AI agents. Let's think about another challenge for an e-commerce website: smart recommendations. We have already discussed recommendations using task type embeddings. But sometimes a user will ask something like: can you suggest a birthday present for my son? What are the latest trends? The LLM receives the request and wonders: how do I make a smart recommendation, like a concierge? This question requires more than just finding specific items. It demands understanding context, like the latest trends, when recommending items for a birthday present, and making personalized suggestions. This highlights the need for a more sophisticated approach to handling personalized and trend-aware recommendations. The solution here is combining AI agents with those Vector Search practices. The UI agent takes the user query and triggers Google Search to research the latest trends, then generates a bunch of queries for finding interesting items and asks the Search agent to search for items. The Search agent acknowledges this and runs 20 queries in parallel.

[11:31]Then it starts generating more specific related queries like "STEM toys for 10-year-olds", "science kits for 10-year-olds" and "experiment kits for kids". These queries are then passed to Vector Search, which can provide a much broader and more relevant set of results. This multi-agent approach allows a more intelligent, multi-faceted search strategy, going beyond simple keyword matching to provide truly helpful recommendations.
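As a hypothetical sketch of that fan-out step (the backend call here is a stub, not the actual demo code), the Search agent could dispatch all generated queries concurrently and pool the hits:

```python
from concurrent.futures import ThreadPoolExecutor

def search_backend(query: str) -> list[dict]:
    """Stub standing in for a call to the Vector Search backend."""
    return []  # replace with a real retrieval call

def fan_out(queries: list[str]) -> list[dict]:
    # Run all generated queries in parallel rather than one at a time,
    # then flatten the per-query hits into a single candidate list.
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        results = pool.map(search_backend, queries)
    return [item for hits in results for item in hits]

candidates = fan_out(["STEM toys for 10-year-olds",
                      "science kits for 10-year-olds",
                      "experiment kits for kids"])
```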

[12:02]Okay, I'd like to show an actual demonstration of the combination of the AI agent and Vector Search, called Shopper's Concierge. With this demo, you can ask any ambiguous or vague question, right? "Birthday present for 10-year-old son." The AI agent then uses Vector Search and issues a couple of queries to get those results. You can take a look at what's going on under the hood by looking at the console. You can see there are three queries, like toys and games and action figures, generated and run against the 10 million items to find the results you are seeing. But now you can ask the agent for a deep research. Yes, with this deep research mode, the agent now uses Google Search to research what kind of items people are buying for this kind of query, like a birthday present for a 10-year-old son. With that, it defines five different item categories and issues 20 different queries for each of the five categories, getting results like this. Under the hood, you can see on the console that it generates a bunch of queries, 100 in total for a single deep research request. So, rather than having users type their own queries into a search box, you can use the AI agent to make smart recommendations based on the research results from Google Search, and it picks the best results from those 100 queries, like this. Those are all diversified, interesting, inspiring results, as you can see. This goes far beyond the usual recommendation systems you would use with an e-commerce website. And also, because the AI agent has the capability of understanding images, you can make a query using your own images. For example, you can upload a photo of your room. The AI agent understands what's going on here, that it's a home office setup, and finds some suggested items like this.

[15:03]Now, that's a really cool-looking UI, Kaz. Tell us, how do we implement this agent that we just saw? Thank you. Yeah, let's dive deeper into how you can build this yourself. I have used two technologies: one is the Agent Development Kit, or ADK, and the other is the Vector Search product from Google. So, what is ADK? ADK is a new open source framework developed by Google and announced in April this year. It is an open source multi-agent framework. It supports, of course, Gemini as the large language model, but it also supports third-party models. It also supports live audio and image streaming, so the demo you have seen is actually capable of live audio voice communication with the user. Let's examine how the deep research mode is built with ADK and Vector Search. When the UI agent takes the user request, it uses Google Search for grounding; that's how it learns what's currently trending on the Internet for finding items for a birthday present. Then the UI agent asks the Search agent to generate 20 queries per item category. The agent repeats this for the five categories, 100 queries in total for a single search request. All queries use multimodal embeddings, task type embeddings and keyword embeddings, the practices I discussed earlier. These 100 queries are sent to Vector Search in parallel, so you get a much faster result. Then, with the results for the 100 queries, the Search agent performs multimodal item curation. It reviews the item images and descriptions one by one, rather than just sharing all the results with the user directly. The agent selects the items actually relevant to the user's intent and item category, so the 100 to 200 items you got from the search are usually filtered down to under 50 items. Gemini curates the items by analyzing images and user intent. The item images are passed to Gemini with a curation prompt like this one: the image tiles are sent to Gemini, and Gemini actually looks at the item images one by one and selects the most valuable items for the user. Okay, let's take a look at what kind of code you would write to build something like the Shopper's Concierge demo yourself. This is a published notebook sample, so you can look at it in detail if you're interested; I'll skip the details for now and just start with installing the ADK. Getting started with ADK is super simple: you can just pip install google-adk, and that's it. You may want to import the required libraries first. You also have to set environment variables to get started with ADK, specifying the project ID, the location and so on. Before diving into the details of the agent, we'll define a test function for the agent. Usually, an agent runs in a runtime environment, like the Agent Engine product, but for this notebook we define a simple runtime called the test_agent function. I won't discuss in detail what the runtime for agents is, but take a look at the agent runtime documentation for details.
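A minimal sketch of that setup, assuming the Vertex AI route; the project and location values are placeholders, and the exact runner APIs can vary slightly across ADK versions, so treat this as an illustration rather than the notebook's verbatim code.

```python
# pip install google-adk
import asyncio
import os

from google.adk.runners import InMemoryRunner
from google.genai import types

# Environment variables pointing ADK at your Vertex AI project (placeholders).
os.environ["GOOGLE_GENAI_USE_VERTEXAI"] = "True"
os.environ["GOOGLE_CLOUD_PROJECT"] = "your-project-id"
os.environ["GOOGLE_CLOUD_LOCATION"] = "us-central1"

async def test_agent(query: str, agent) -> None:
    """A tiny local runtime for trying an agent, standing in for Agent Engine."""
    runner = InMemoryRunner(agent=agent)
    session = await runner.session_service.create_session(
        app_name=runner.app_name, user_id="test_user")
    message = types.Content(role="user", parts=[types.Part(text=query)])
    async for event in runner.run_async(
            user_id="test_user", session_id=session.id, new_message=message):
        if event.content and event.content.parts and event.content.parts[0].text:
            print(event.content.parts[0].text)

# Usage: asyncio.run(test_agent("What kind of site is this?", shop_agent))
```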

[18:59]So, here's our first definition of the Shop agent. It's just a simple, basic agent with the Gemini 2.0 Flash model, without any external tools or search capability at this point.
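In code, that first version might look like this sketch; the instruction text is an illustrative paraphrase of the notebook, not a verbatim copy.

```python
from google.adk.agents import Agent

shop_agent = Agent(
    name="shop_agent",
    model="gemini-2.0-flash",
    description="A shop agent for an e-commerce site.",
    instruction="""
        You are a shop agent for an e-commerce site with millions of items.
        Answer the user's questions about the site and its items.
    """,
)
```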

[19:16]Let's test it using the test agent runtime. "What kind of site is this?" The agent responds: "I am a shop agent for an e-commerce site with millions of items." This is the foundation of our demonstration. Now we'll add the Vector Search capability, the item search capability, to the agent. To do so, we first define a call_vector_search function to call the Vector Search backend. This is really just an HTTP request function: since we will be using an existing public Vector Search backend, we don't need to build the Vector Search index and endpoint ourselves; we just make an HTTP request to the existing backend. This function takes the URL of the endpoint, the actual query from the user, and how many rows we want returned to the agent.
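A sketch of such a function; the endpoint URL and the request/response shape are assumptions about the public demo backend, not its documented API.

```python
import requests

def call_vector_search(url: str, query: str, rows: int = 10) -> dict:
    """POST a query to the existing Vector Search demo backend."""
    payload = {
        "query": query,  # the search query text
        "rows": rows,    # how many items to return
    }
    response = requests.post(url, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()
```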

[20:20]So, we define that. Now we wrap the call_vector_search function in an ADK tool named find_shopping_items. This is just a wrapper around the previous function.

[20:41]The thing is, by defining proper function names and signatures, including the parameters and the docstring, the agent can look at those function signatures, such as the function name, parameter names and docstrings, to understand the functionality each tool provides.

[21:03]So, you have to be very clear when writing those docstrings. Like this one: "Find shopping items for the e-commerce site." It takes one argument, queries, the list of queries to run, and it returns a dict with a status property and the items, a list of the items found on the e-commerce site. By defining this kind of tool function, the agent can easily understand it and use it on the fly.
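Following that description, a sketch of the tool function might look like this; the backend URL constant and the response fields are assumptions, and the docstring mirrors the shape the talk describes.

```python
VECTOR_SEARCH_URL = "https://example-backend/search"  # placeholder endpoint

def find_shopping_items(queries: list[str]) -> dict:
    """Find shopping items from the e-commerce site with the specified queries.

    Args:
        queries: the list of queries to run.

    Returns:
        A dict with the following properties:
            - status: "success" or "error"
            - items: a list of items found on the e-commerce site
    """
    items = []
    for query in queries:
        result = call_vector_search(url=VECTOR_SEARCH_URL, query=query, rows=10)
        items.extend(result.get("items", []))
    return {"status": "success", "items": items}
```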

[21:40]So, let's try this tool by passing two sample queries: "cups with dancing people" and "cups with dancing animals". It issues the actual queries to the Vector Search backend and gets results like this. Now we're ready to extend our Shop agent with the search capability. Here we have added one paragraph to the instruction: "To find items, use the find_shopping_items tool by passing a list of queries, and answer to the user with the items' name, description and image URL." The other thing we've added is the tools parameter when creating the agent object, passing the find_shopping_items function, so that the agent is now aware of the existence of the find_shopping_items tool and uses it when required. Let's try the agent with "cups with dancing figures". The agent thinks: to solve this problem, I should use the find_shopping_items tool; it calls that tool and passes the result to the user, like this.
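Wired together, the extended agent might look like this sketch, continuing from the earlier snippets (Agent and find_shopping_items defined above; instruction text paraphrased from the talk).

```python
from google.adk.agents import Agent

shop_agent = Agent(
    name="shop_agent",
    model="gemini-2.0-flash",
    instruction="""
        You are a shop agent for an e-commerce site with millions of items.
        To find items, use the find_shopping_items tool by passing a list of
        queries, and answer to the user with the items' names, descriptions
        and image URLs.
    """,
    tools=[find_shopping_items],  # the agent now knows this tool exists
)
```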

[23:05]Now we'll add one more agent, called the Research agent. That is a market researcher agent that uses Google Search and generates queries based on the search results from Google Search. Let's take a look at the actual instruction we pass to the agent: "When you receive a search request from a user, use the Google Search tool to research what kind of items people are purchasing for the user's intent. Then, generate five queries for finding those items on the e-commerce site and return them." So, this is the role of this agent.

[23:45]One big benefit you get with ADK is that to use Google Search as the grounding source, you can just use the built-in Google Search tool, so you don't have to define it yourself. You can write a single line to import google_search, like this, and that's it; you can easily access Google Search as the grounding source. Let's test this with a query like "birthday present for 10-year-old boy". It runs a query with Google Search and generates five queries, like this: STEM kits, sports equipment, building sets, outdoor gear and games. Now, with all those capabilities, we can finalize the Shop agent. We define the instruction like this: first, the agent should do market research using the Research agent, as we defined previously.
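Putting those two pieces together, a sketch of the Research agent; google_search is ADK's built-in grounding tool, and the instruction is a paraphrase of the one quoted above.

```python
from google.adk.agents import Agent
from google.adk.tools import google_search  # built-in Google Search tool

research_agent = Agent(
    name="research_agent",
    model="gemini-2.0-flash",
    description="A market researcher for an e-commerce site.",
    instruction="""
        When you receive a search request from a user, use the Google Search
        tool to research what kind of items people are purchasing for the
        user's intent. Then generate five queries for finding those items on
        the e-commerce site and return them.
    """,
    tools=[google_search],
)
```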

[24:50]So the Research agent will generate the five queries, share those generated queries with the user first, and ask if they want to continue with the search. The second step is to find items using the find_shopping_items tool: by passing those five queries to the find_shopping_items tool, you get the results from the Vector Search backend. As you can see, in this demonstration we have used the Research agent as a tool. This is a design pattern called "agent as a tool": rather than having two agents as a peer multi-agent system, we use the sub-agent as a tool, so the main agent, the Shop agent, keeps control over the whole multi-agent system. So, let's test the final Shop agent with all these capabilities. First, when you ask a query like "can you find a birthday present?", it uses the Research agent to generate queries using Google Search. Those generated queries are things like STEM kits, Lego sets and remote control cars, and it asks you: do you want me to search for items using these queries? You say yes, and then the agent uses the find_shopping_items tool with the Vector Search backend to get your final result, as you see.
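As a sketch, the agent-as-a-tool wiring could look like this, continuing from the earlier snippets; AgentTool is ADK's wrapper for using one agent as another agent's tool, and the instruction text is again a paraphrase.

```python
from google.adk.agents import Agent
from google.adk.tools.agent_tool import AgentTool

shop_agent = Agent(
    name="shop_agent",
    model="gemini-2.0-flash",
    instruction="""
        Follow these two steps:
        1. Market research: pass the user's request to the research_agent tool,
           show the five generated queries to the user, and ask whether they
           want to continue with the search.
        2. Find items: pass those queries to the find_shopping_items tool and
           present the returned items with names, descriptions and image URLs.
    """,
    tools=[
        AgentTool(agent=research_agent),  # sub-agent used as a tool
        find_shopping_items,
    ],
)
```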

[26:32]So, with this demonstration, we have shown a so-called generative recommendation capability. It uses not only the Vector Search backend, but also external tools such as the Google Search tool to expand the queries based on the user's intent, to get more relevant, interesting and fun results from the e-commerce website.

[26:59]That was a great walkthrough, Kaz. Thank you so much. Thank you. Now, I have one question: can you tell us why we're using the agent as a tool instead of a sub-agent here? Why would you opt for this design pattern? Yeah, this is an interesting discussion. It should be determined by what kind of user experience you want to provide. In this case, I really wanted to have a single agent playing the role of the concierge for the user. Usually, a human concierge will be looking things up in books, or maybe using Google Search, to get the result, summarize it and reply to the user. Just like that, I have chosen the agent-as-a-tool design.

[27:48]That way, the UI agent is the single face to the user and understands everything. But it depends on your requirements for the user experience. You can also use the sub-agent or peer-agent design pattern, so that you can pass the request to another agent, and that agent can be the front-end for the user as well. So, it totally depends on what kind of requirements you have. That makes sense. Thank you. So, let's recap what we have done to build the Shopper's Concierge demo. There are three things. First, using ADK for building an AI agent with multimodal, multi-agent capability and real-time multimodal communication. Second, generative recommendations: because each AI agent has intelligence from Gemini, you can let the agent use Google Search for grounding to research the latest trends, generate a bunch of queries, and do multimodal item curation by looking at the item images and descriptions. And finally, those agents use the Vector Search backend, which applies all the practices I discussed earlier: multimodal search, hybrid search, task type embeddings, and I also used the Ranking API. So, those were the ideas and concepts used for the Concierge demo. Now, that was lovely, Kaz. Why don't you give us more information on where to find more resources? Yeah, to get started with these technologies, there are two resources you can look at. The first one is the Vector Search documentation, which you can find on cloud.google.com; there are a bunch of getting-started tutorials, notebooks and blog posts with use cases, so please take a look. The other one is the ADK documentation and samples; again, there's a bunch of getting-started documentation, samples and everything. I have also put the link to my own samples for the Shopper's Concierge demo in the description of the video, so please take a look. Thank you, Kaz. Thank you everyone for watching this episode. We hope you really liked it; let us know in the comments what you think about this video and what we should build next. Thank you. Thank you.
