[0:01] Hello everyone, and welcome to Data A Express. If you are serious about becoming a Databricks Certified Generative AI Engineer Associate,
[0:11] this video will definitely help you get certified. I have packaged 100 must-know, fully solved practice questions covering every domain, from RAG and LangChain to model governance and prompt engineering.
[0:24] I can confidently tell you that this exact set of questions has been used successfully by most of my team members to pass the certification.
[0:32] This is a proven exam dump and a guaranteed guide to success. This one is dedicated to Logan and Gorlin, who asked on my earlier video for more prep videos.
[0:44] Here is the format: each question appears for 10 seconds, followed by 10 seconds for the answer and my technical justification.
[0:52] This is a perfect speed-run format. Let's dive in, tackle these 100 questions, and get you certified.
[0:57]Question-1 Question: A user submitted feedback indicating that a GenAI model's responses were accurate but came across as rude. What should the engineer investigate to resolve this issue? Options: (A) System prompt tone and instruction settings (B) Token limit constraints (C) Embedding dimensionality (D) Retrieval accuracy from vector store Correct Answers: Option A Explanation: The System Prompt is where engineers define the model's persona, behavior, and tone (e.g., 'You are a helpful, polite, and friendly assistant...'). Since the output is accurate but the tone is wrong, adjusting the system prompt instructions is the direct way to fix the conversational style. Question-2 Question: A multinational bank launches a GenAI-powered support assistant that must distinguish between customer service and fraud investigation queries. Which prompt design will most reliably enable the LLM to route queries to the correct process every time? Options: (A) Merge all logic in a single prompt without guidance. (B) Let the LLM guess intent from user phrasing. (C) Explicitly instruct the LLM to classify the intent (customer vs. fraud) before generating an answer. (D) Always default to customer support process. Correct Answers: Option C Explanation: Explicitly instructing the LLM to perform a classification (intent routing) step first is a core component of reliable prompt engineering. This ensures the correct downstream logic (e.g., calling the right tool or chain) is triggered, making the process more deterministic and reliable than relying on implicit guessing. Question-3 Question: A Generative AI Engineer is developing a pipeline on Databricks to identify and redact personally identifiable names from legal contracts before using them in downstream LLM applications. What is the most appropriate underlying NLP task to support this functionality? Options: (A) Summarization (B) Text Classification (C) Sentiment Analysis (D) Named Entity Recognition (NER) Correct Answers: Option D Explanation: Named Entity Recognition (NER) is the specific NLP task dedicated to identifying and classifying proper nouns, such as names (Person), locations (Location), and organizations (Org), within text. This is precisely what is needed to find and redact personally identifiable names (PII) Question-4 Question: A Generative AI Engineer is developing a model-serving endpoint that needs to validate and format user inputs before passing them to the model, and also adjust the model's outputs before returning them to the client. Which technique supports this requirement? Options: (A) Use pre- and post-processing logic with input/output transformation layers (B) Fine-tune the LLM to accept only valid inputs (C) Set a system prompt that controls all input/output formatting (D) Lower the temperature to reduce hallucinations Correct Answers: Option A Explanation: Pre- and post-processing layers are standard engineering patterns used to handle data governance, validation, serialization (e.g., JSON), and formatting outside of the core model inference step. This ensures reliable input/output transformations, which the LLM itself cannot reliably guarantee. Question-5 Question: An enterprise is deploying a hosted LLM on Databricks and wants to ensure only authorized employees from specific business units can access the model. What security configuration should be implemented? 
Options: (A) Assign access permissions using Unity Catalog and identity federation (B) Use Unity Catalog to track model lineage (C) Enable public access with API token authentication (D) Apply model endpoint rate limiting Correct Answers: Option A Explanation: Unity Catalog provides granular security and access control over all data and models (including served LLMs) in Databricks. Identity federation connects this control to the enterprise's existing user groups and security policies, ensuring only authorized business units can access the hosted model. Question-6 Question: A developer needs to prepare multilingual text for vectorization. What preprocessing step is essential? Options: (A) Merge all text into a single chunk (B) Use language-specific tokenizers (C) Remove language tags (D) Convert all text to English Correct Answers: Option B Explanation: A tokenizer is the critical first step in vectorization. A tokenizer trained on one language (e.g., English) will perform poorly on another (e.g., Japanese or Arabic), potentially splitting words incorrectly. Using a model or tokenizer specifically trained for multiple languages is essential for multilingual data. Question-7 Question: A data scientist is ready to productionize their LLM by registering it to Unity Catalog using MLflow. What MLflow function allows this? Options: (A) mlflow.log_model() (B) mlflow.register_model() (C) mlflow.evaluate() (D) mlflow.set_registry_uri("databricks-uc") Correct Answers: Option B Explanation: In the MLflow Model Registry workflow, 'mlflow.log_model()' saves the model artifacts to persistent storage (like S3/ADLS), but 'mlflow.register_model()' formally adds the model to the Registry (Unity Catalog in this context), enabling version control and deployment management. Question-8 Question: You're building a customer service chatbot in Databricks. The assistant must return personalized responses based on the user's past chat history. What should be the system's input and output design? Options: (A) Aggregated chat logs per customer -> Personalized summary at customer level (B) Full chat log from current session -> Single message response (C) Most recent query only -> Short direct answer (D) User name and age -> Summary of entire customer base Correct Answers: Option A Explanation: To provide a personalized response based on past history, the system needs the full context (aggregated chat logs per customer) as input, and it should output a relevant response tailored to that customer's unique situation (a personalized summary at the customer level). Question-9 Question: You're developing a Databricks-hosted GenAI assistant to answer compliance-related queries. For legal reasons, some responses must be strictly limited to "Yes" or "No" - without any further elaboration. What is the most effective way to ensure the model consistently adheres to this constraint? Options: (A) Use a few-shot approach with examples of yes/no answers (B) Reduce temperature to 0.0 and rely on model's default behavior (C) Use zero-shot prompting with a general instruction (D) Explicitly instruct the model in the system prompt to respond only with "Yes" or "No" Correct Answers: Option D Explanation: For mission-critical, high-stakes tasks where a specific format is a strict constraint (legal reasons), the most direct and reliable method is to use the system prompt to enforce the desired behavior rigidly ("Your ONLY response must be 'Yes' or 'No'. Do not add any other words.").
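To make Question-7 concrete, here is a minimal sketch of logging a model and then registering it to Unity Catalog with MLflow. The wrapper class and the three-level model name are hypothetical placeholders, and it assumes a Databricks workspace with Unity Catalog enabled.

```python
import mlflow
from mlflow.pyfunc import PythonModel

class EchoModel(PythonModel):
    """Trivial stand-in model, used only to illustrate the registration flow."""
    def predict(self, context, model_input):
        return model_input

# Point the MLflow client at the Unity Catalog model registry.
mlflow.set_registry_uri("databricks-uc")

with mlflow.start_run():
    # log_model() persists the model artifacts to the run; nothing is registered yet.
    info = mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=EchoModel(),
        input_example=["hello"],  # lets MLflow infer a signature, which UC registration requires
    )

# register_model() creates a governed, versioned entry in Unity Catalog.
version = mlflow.register_model(
    model_uri=info.model_uri,             # e.g. "runs:/<run_id>/model"
    name="main.genai.support_assistant",  # hypothetical catalog.schema.model name
)
```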
Question-10 Question: You're building a vector search system on Databricks to support legal document lookup. Which factor should most influence your choice of embedding model? Options: (A) Trained on similar domain vocabulary (B) Outputs large vector sizes (e.g., 3072-dim) (C) Highest score on HuggingFace leaderboard (D) Lowest inference latency Correct Answers: Option A Explanation: For specialized domains like law, the ability of the embedding model to understand the nuances, jargon, and specific relationships between legal terms is paramount. A model trained on domain-specific vocabulary will produce significantly better semantic matches than a general-purpose model. Question-11 Question: Your RAG-based assistant is returning incomplete or vague answers, despite the knowledge base containing relevant information. Which adjustment is most likely to improve retrieval accuracy? Options: (A) Use a more creative decoding strategy in the LLM (B) Decrease chunk size to under 100 tokens (C) Increase embedding vector dimension and chunk size (D) Switch from semantic to keyword search Correct Answers: Option C Explanation: Increasing embedding dimension allows the vector space to capture more complex semantic details, improving search quality. Increasing chunk size (up to a reasonable limit) provides more surrounding context for the embedding model to encode, leading to better-quality representations and thus more accurate retrieval. Question-12 Question: A Generative AI Engineer is tuning prompts for a finance assistant that occasionally hallucinates. What instruction should they add to the prompt? Options: (A) Always answer confidently, even if unsure (B) Use only the provided context. If unsure, say you don't know. (C) Rephrase each answer to match a friendly tone (D) Summarize using your pre-trained finance knowledge Correct Answers: Option B Explanation: Hallucinations occur when the LLM generates information outside the provided context. The most effective prompt engineering technique to mitigate this is to explicitly instruct the model to be grounded (use only the provided context) and to employ an escape hatch (say "I don't know") if the question cannot be answered. Question-13 Question: A developer wants to optimize latency in their RAG chain that uses multiple tools. What LangChain component should they use to manage sequential execution? Options: (A) ToolExecutor (B) RunnableSequence (C) PromptTemplate (D) AsyncRetriever Correct Answers: Option B Explanation: The LangChain Expression Language (LCEL) uses RunnableSequence to define a series of steps that must be executed in order, passing the output of one step as the input to the next. This is the standard pattern for composing chains and maximizing execution efficiency in a structured way. Question-14 Question: A developer wants to inspect and modify the prompt sent to the LLM just before it executes. Which LangChain concept allows this? Options: (A) AgentExecutor (B) PromptTemplate (C) LLMChain (D) CallbackHandler Correct Answers: Option D Explanation: Callback Handlers provide hooks into various stages of a LangChain application's lifecycle, including immediately before and after the LLM is called (e.g., 'on_llm_start'). This mechanism is used for inspecting, logging, modifying, or auditing prompts just prior to execution. Question-15 Question: You are developing a voice-enabled customer service assistant on Databricks that transcribes user speech into text.
Which model should you choose to convert spoken input into accurate text across accents and environments? Options: (A) BERT (B) dbscr (C) Whisper (D) Vosk Correct Answers: Option C Explanation: Whisper (developed by OpenAI) is a leading and highly robust automatic speech recognition (ASR) model known for its high accuracy in transcribing diverse languages, accents, and noisy environments, making it the industry standard choice for general-purpose transcription tasks Question-16 Question: You're building a GenAI-powered assistant for healthcare professionals that can summarize research papers and ingest the latest publications. What is the most critical factor in selecting the appropriate foundation model? Options: (A) Supports image-to-text conversion (B) Supports multilingual output (C) Trained or fine-tuned on medical domain data (D) High token limit to support long papers Correct Answers: Option C Explanation: For specialized fields like healthcare, the model must understand the unique domain-specific vocabulary, contexts, and acronyms. A model trained on general web data will often fail to provide accurate summaries or analysis of medical research compared to one fine-tuned on relevant data Question-17 Question: A developer is building a Databricks GenAI assistant that can respond to user queries with real-time weather information. The LLM should decide when and how to call a weather API based on the user input. Which LangChain component should the developer use? Options: (A) Tool and AgentExecutor (B) PromptTemplate with chained calls (C) RetrievalQA (D) LLMSingleActionAgent Correct Answers: Option A Explanation: In LangChain, an AgentExecutor is the orchestrator that uses the LLM's reasoning abilities to decide which Tool (like the weather API) to call, what arguments to pass, and when to stop. This combination enables dynamic tool use Question-18 Question: Before integrating a summarization model from Hugging Face into a production pipeline, what should the engineer review? Options: (A) The total number of GitHub stars the model has received (B) The default temperature setting of the summarization pipeline (C) The model's license, intended use cases, and known limitations (D) Whether the model supports retrieval-based search Correct Answers: Option C Explanation: When moving a model (especially an open-source one from a platform like Hugging Face) into a production environment, legal and ethical compliance is paramount. The license dictates commercial use, and limitations/intended use guide safe deployment Question-19 Question: A content strategist is working on a system to automatically generate catchy blog headlines. The requirement is for these headlines to be under 10 words and written in title case. Which prompt format should they use to consistently elicit the desired output? Options: (A) Extract key phrases (B) Provide a headline in under 10 words with title case (C) List important topics (D) Summarize the post Correct Answers: Option B Explanation: Good prompt engineering for structured, creative, and constrained output requires explicit instructions covering all constraints. Option B directly states the format (headline), the length constraint (under 10 words), and the styling constraint (title case) Question-20 Question: A GenAI agent needs to use tools like calculators and web search. How should these be integrated so the model can call them dynamically during execution? 
Options: (A) Register tools via Tool class and orchestrate with AgentExecutor (B) Fine-tune the model on tool usage data (C) Use RetrievalQA chain with embedded logic (D) Pre-compute answers and add them to the system prompt Correct Answers: Option A Explanation: This is the standard pattern in LangChain (and similar agent frameworks) for enabling an LLM to dynamically interact with external capabilities. The Tool class defines the external function, and the AgentExecutor uses the LLM to decide when to call it. Question-21 Question: A customer asks a support chatbot, "Where's my order?" The engineering team wants to provide personalized responses. What should be included in the prompt? Options: (A) Inject customer-specific data (e.g., order ID, shipping status) (B) A generic delivery time range (C) Add a polite system prompt (D) Fine-tuned model based on order history Correct Answers: Option A Explanation: To provide a personalized response to an order query, the model needs the relevant, specific data (like the order ID and its current status). This is achieved by retrieving this data from a database and injecting it directly into the prompt as context. Question-22 Question: A Generative AI Engineer is creating an LLM-based application. The documents for its retriever have been chunked to a maximum of 512 tokens each. The Generative AI Engineer knows that cost and latency are more important than quality for this application. They have several context length levels to choose from. Which will fulfill their need? Options: (A) context length 512: smallest model is 0.13GB and embedding dimension 384 (B) context length 514: smallest model is 0.44GB and embedding dimension 768 (C) context length 2048: smallest model is 11GB and embedding dimension 2560 (D) context length 32768: smallest model is 14GB and embedding dimension 4096 Correct Answers: Option A Explanation: The primary constraint is that cost and latency are prioritized. This generally means selecting the smallest model size (GB) and the smallest context length that can still accommodate the data chunks (which are max 512 tokens). Option A meets the context length requirement (512 tokens) and is the smallest available model by a large margin, ensuring the lowest latency and cost. Question-23 Question: A Generative AI Engineer is assessing the responses from a customer-facing GenAI application that they are developing to assist in selling automotive parts. The application requires the customer to explicitly input account_id and transaction_id to answer order questions about products that the company carries. However, after initial launch, the customer feedback was that the application did well on answering order and billing details, but failed to accurately answer shipping and expected arrival date questions. Which of the following retrievers would improve the application's ability to answer these questions?
Options: (A) Create a vector store that includes the company shipping policies and payment terms for all automotive parts (B) Create a feature store table with transaction_id as primary key that is populated with invoice data and expected delivery date (C) Provide example data for expected arrival dates as a tuning dataset, then periodically fine-tune the model so that it has updated shipping information (D) Amend the chat prompt to input when the order was placed and instruct the model to add 14 days to that as no shipping method is expected to exceed 14 days Correct Answers: Option B Explanation: Shipping and arrival dates are specific, structured, and constantly changing data points tied to a transactional database. This type of information is best retrieved from a structured database or Feature Store using the provided keys (transaction_id), not from an unstructured vector store (A) or static fine-tuning data (C). The Feature Store allows for fast, low-latency lookups of factual data required for the query. Question-24 Question: Which steps are essential when writing chunked text into Delta Lake tables in Unity Catalog? (Select TWO) Options: (A) Writing all chunks as a single file (B) Partitioning data based on categories (C) Avoiding partitions for simplicity (D) Structuring chunks to support efficient querying Correct Answers: Option B & D Explanation: (D) Structuring chunks: The schema of the Delta table must be carefully defined to include fields like chunk text, metadata (source document ID, page number), and often the embedding vector itself, all of which are necessary for the downstream RAG application to perform retrieval and citing. (B) Partitioning data: Partitioning (e.g., by document type or category) is a Delta Lake best practice that speeds up query performance by allowing the engine to skip large amounts of irrelevant data, which is crucial for retrieval during RAG inference. Question-25 Question: A Generative AI Engineer is building a RAG application that will rely on context retrieved from source documents that have been scanned and saved as image files in formats like .jpeg or .png. They want to develop a solution using the least amount of lines of code. Which Python package should be used to extract the text from the source documents? Options: (A) beautifulsoup (B) scrapy (C) pytesseract (D) pyquery Correct Answers: Option C Explanation: Scanned image files (.jpeg, .png) require Optical Character Recognition (OCR) to extract the text. Pytesseract is a widely used Python wrapper for Google's Tesseract OCR engine, making it a common, low-code solution for this specific task. Options A, B, and D are used for web scraping and HTML parsing. Question-26 Question: A Generative AI Engineer is loading 150 million embeddings into a vector database that takes a maximum of 100 million. Which TWO actions can they take to reduce the record count? Options: (A) Increase the document chunk size (B) Decrease the overlap between chunks (C) Decrease the document chunk size (D) Increase the overlap between chunks (E) Use a smaller embedding model Correct Answers: Option A & B Explanation: The number of records in the vector database is equal to the number of chunks. (A) Increase the chunk size: Embedding more content per chunk directly reduces the total number of chunks created from the same corpus. (B) Decrease the overlap: Overlap creates redundant chunks. Reducing or eliminating overlap directly reduces the total number of chunks.
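As a quick illustration of the chunk-count trade-off in Question-26 (revisited later in Questions 43 and 63), here is a minimal sketch using LangChain's RecursiveCharacterTextSplitter; the corpus variable is a placeholder for your own document text.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

corpus = "..."  # placeholder for the concatenated raw document text

# Small chunks with heavy overlap produce many records in the vector database.
small_chunks = RecursiveCharacterTextSplitter(chunk_size=256, chunk_overlap=64).split_text(corpus)

# Larger chunks with no overlap embed the same content in far fewer records.
large_chunks = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=0).split_text(corpus)

print(f"{len(small_chunks)} records vs {len(large_chunks)} records for the same corpus")
```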
Question-27 Question: Which of the following are key considerations when identifying source documents for a RAG application? (Select TWO) Options: (A) Information density (B) Document size (C) Document format (D) Document relevance to the task Correct Answers: Option A & D Explanation: (D) Document relevance: The RAG system will only be as good as the information it retrieves. Documents must be directly relevant to the questions the application is designed to answer (e.g., policy manuals for a policy bot). (A) Information density: Documents with high information density (minimal boilerplate, tables, or repetition) are preferred because they result in high-quality, focused chunks that improve the semantic search signal and reduce indexing noise. Question-28 Question: Which of the following is the most effective way to design a prompt that emits a specific format in the response? Options: (A) Provide clear examples of the desired format (B) Use short and ambiguous instructions (C) Avoid specifying any format in the prompt (D) Use multiple tasks in a single prompt Correct Answers: Option A Explanation: The most robust technique for controlling the LLM's output structure is Few-Shot Prompting, where you provide the model with one or more examples (<input> -> <output in desired format>). This gives the model a concrete pattern to follow, which is more reliable than just abstract instructions. Question-29 Question: Which of the following are essential components when designing a prompt for a Generative AI model? (Select TWO) Options: (A) Clear intent (B) Multiple ambiguous instructions (C) Examples of expected output (D) Irrelevant details to increase model flexibility Correct Answers: Option A & C Explanation: (A) Clear intent: A direct statement of the goal (e.g., "Summarize the following text," or "Classify the sentiment"). (C) Examples of expected output: Concrete examples (Few-Shot) or a description of the desired format (Zero-Shot) to guide the response, which dramatically improves output quality and adherence to constraints. Question-30 Question: A Generative AI Engineer would like to build an application that can update a memo field that is about a paragraph long to just a single sentence gist that shows intent of the memo field, but fits into their application front end. With which Natural Language Processing task category should they evaluate potential LLMs for this application? Options: (A) text2text Generation (B) Sentencizer (C) Text Classification (D) Summarization Correct Answers: Option D Explanation: The task involves reducing a longer text (paragraph memo) into a much shorter, concise version (single sentence gist) while retaining the key meaning (intent). This is the definition of the NLP task category of Summarization. Question-31 Question: Which of the following strategies would best help in designing a prompt that leads to a well-formatted response in an LLM? (Select TWO) Options: (A) Provide vague instructions for flexibility (B) Use structured steps for formatting (C) Use simple language with clear directives (D) Incorporate multiple unrelated tasks in a single prompt Correct Answers: Option B & C Explanation: (C) Clear directives: Simple, unambiguous language is fundamental to reducing ambiguity and ensuring the LLM understands the exact requirement. (B) Structured steps: For complex formatting (e.g., producing a multi-line output or specific JSON), instructing the LLM to follow a step-by-step process helps it organize its output and adhere to the structural constraints more consistently.
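Questions 28, 29, and 31 all come back to the same idea: show the model the exact output shape you want. Below is a minimal few-shot prompt sketch with LangChain; the example reviews and the JSON keys are hypothetical.

```python
from langchain.prompts import FewShotPromptTemplate, PromptTemplate

# Two worked examples give the model a concrete pattern to copy.
examples = [
    {"review": "Battery dies within an hour.", "json": '{"sentiment": "negative", "topic": "battery"}'},
    {"review": "Setup took two minutes, love it!", "json": '{"sentiment": "positive", "topic": "setup"}'},
]

example_prompt = PromptTemplate(
    input_variables=["review", "json"],
    template="Review: {review}\nJSON: {json}",
)

prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="Classify each review. Respond with a JSON object containing exactly the keys 'sentiment' and 'topic'.",
    suffix="Review: {review}\nJSON:",
    input_variables=["review"],
)

print(prompt.format(review="The screen cracked on day one."))
```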
Question-32 Question: A team is evaluating LLMs for a multilingual customer support chatbot. What attribute is most critical for model selection? Options: (A) High context length (B) Fine-tuned on support scenarios (C) Multilingual training data and token vocabulary (D) Low temperature for deterministic output Correct Answers: Option C Explanation: For a multilingual system, the model must be trained on data from multiple languages and possess a large token vocabulary that effectively covers the language mix being supported. Without this foundation, accuracy and fluency in non-primary languages will be severely limited. Question-33 Question: You're using a foundation model on Databricks to extract structured information from customer support emails. The business team needs only three fields: date, sender_email, and subject. What is the best prompt to extract these fields in JSON format? Options: (A) From the following email, extract only the sender_email, subject, and date. Return the result as a JSON object with exactly these keys. (B) Extract the sender, subject, and date from the following email. Return the output as a list. (C) Summarize the email in three bullet points. Include the sender and topic. (D) Read the email carefully. Return the most important details in a readable summary. Correct Answers: Option A Explanation: This is the most effective prompt because it combines three key prompting best practices: Explicit instruction ("extract only"), Constraint setting ("Return the result as a JSON object"), and Schema definition ("with exactly these keys"). Question-34 Question: A developer is creating a multi-modal assistant that can interact using text and images. Which LangChain tools are relevant for this setup? (Choose 2 options) Options: (A) MultiModalPrompt (B) BaseRetriever (C) ChatPromptTemplate (D) RunnableParallel Correct Answers: Option A & D Explanation: MultiModalPrompt: Necessary to handle and format both text and image inputs (or outputs) within the LLM chain. RunnableParallel: This is key for optimizing chains involving multiple data streams (like text and image processing) because it allows different components to be processed concurrently (in parallel), which is often desired for multi-modal systems. Question-35 Question: A developer wants to format multi-turn dialogue history before injecting it into a prompt. What should they use? Options: (A) ChatPromptTemplate (B) TextSplitter (C) PromptValue (D) RunnableSequence Correct Answers: Option A Explanation: The ChatPromptTemplate (or 'MessagesPlaceholder' within it) is specifically designed in LangChain to handle and format the structure of conversation history, converting lists of chat messages (e.g., Human, AI, System) into the format required by the underlying chat model API. Question-36 Question: A data engineer is developing a Retrieval-Augmented Generation (RAG) pipeline on Databricks. The internal knowledge base is stored on SharePoint, and the engineer needs to retrieve and embed document content from this source. Which document loader should the engineer use?
Options: (A) langchain.document_loaders.DirectoryLoader (B) langchain.document_loaders.UnstructuredSharePointLoader (C) langchain.document_loaders.PyPDFLoader (D) langchain.document_loaders.WebBaseLoader Correct Answers: Option B Explanation: LangChain offers dedicated document loaders for common enterprise data sources. The 'UnstructuredSharePointLoader' is specifically designed to connect to and extract text from documents stored within a SharePoint environment Question-37 Question:What's a best practice when cleaning text data for use in foundation model embeddings? Options: (A) Translate all input into English (B) Strip all punctuation and numbers (C) Preserve semantic meaning and structure (D) Convert text to binary vectors Correct Answers: Option C Explanation: Embeddings rely on the context and semantics of the surrounding text to generate high-quality vector representations. Aggressively stripping features like punctuation, numbers, or translating all text (unless the model is only monolingual) can destroy the subtle semantic meaning that the embedding model needs to capture Question-38 Question:You're designing a semantic search pipeline on Databricks using transformer-based embeddings. Which similarity metric should you use for scoring relevance between query and document embeddings? Options: (A) Cosine similarity (B) Edit distance (C) Levenshtein distance (D) Jaccard index Correct Answers: Option A Explanation: Cosine similarity measures the cosine of the angle between two vectors and is the standard metric used in semantic search with transformer-based embeddings. It determines how closely two documents (or a query and a document) are related in the high-dimensional vector space, focusing on orientation rather than magnitude Question-39 Question:A team is preparing documents for indexing in a RAG pipeline. Which preprocessing steps will improve the quality of generated embeddings? (Choose 3 options) Options: (A) Remove HTML tags (B) Remove duplicate whitespace (C) Normalize unicode characters (D) Retain stopwords Correct Answers: Option A, B & C Explanation: These steps clean up "noise" (like extra whitespace, messy encoding, or non-text markup) that does not contribute to semantic meaning but can confuse the embedding model. Stopwords are often retained because their presence is crucial for grammatical and semantic context in modern transformer embeddings Question-40 Question:A data engineer is setting up a Retrieval-Augmented Generation (RAG) pipeline on Databricks. The requirement is to match user queries to relevant source documents, convert the retrieved data into prompts, and then generate an answer using a foundation model. What is the correct sequence of components in this pipeline? Options: (A) Query -> Retriever -> Prompt Template -> LLM (B) Query -> Prompt Template -> Retriever -> LLM (C) Query -> LLM -> Retriever -> Prompt Template (D) LLM -> Prompt Template -> Retriever -> Query Correct Answers: Option A Explanation: This is the canonical RAG chain sequence: the user Query is used by the Retriever to fetch relevant context. The retrieved context is then formatted along with the original query into a Prompt Template. Finally, this combined prompt is sent to the LLM for answer generation Question-41 Question: A developer is outlining the deployment steps for a new RAG application. What is the correct sequence to bring the app from chunked data to a live endpoint? 
Options: (A) Create vector embeddings -> Store in vector index -> Deploy API endpoint -> Build prompt template (B) Chunk text -> Generate embeddings -> Store in vector store -> Build RAG chain -> Deploy to endpoint (C) Store in vector index -> Create embeddings -> Build prompt -> Serve LLM endpoint (D) Generate prompt -> Train embedding model -> Create vector index -> Deploy application Correct Answers: Option B Explanation: This sequence reflects the necessary steps: data must first be chunked, then converted to embeddings, then persisted in a vector store (index). Only then can the RAG chain (which performs retrieval) be built and finally deployed to a serving endpoint. Question-42 Question: You're building a GenAI system in Databricks to assist with root cause analysis using two files: 'call_log' (call transcripts) and 'rca_data.json' (structured issues and fixes). Which files should you use to generate insights effectively? Options: (A) Only use call_log for both input and output (B) Use call_log as input and rca_data.json to guide output generation (C) Use only rca_data.json to train a summarization model (D) Neither file is useful; rely on real-time input from support agents only Correct Answers: Option B Explanation: The 'call_log' (transcripts) provides the unstructured context needed for the problem. The 'rca_data.json' (structured issues/fixes) acts as a high-quality, grounded knowledge base that the LLM can use to generate accurate, useful, and structured insights (like potential fixes) based on historical data. Question-43 Question: A Generative AI Engineer is indexing a large corpus of documents into a vector database hosted on Databricks. The database has a strict limit on the number of records it can hold. The current chunking configuration is generating too many small chunks, pushing the system over its storage capacity. Which adjustment should the engineer make to resolve this issue? Options: (A) Use a compression algorithm to reduce the embedding size (B) Reduce the chunk size to improve embedding quality (C) Lower the vector dimensionality to save on vector database costs (D) Increase chunk size and combine adjacent text segments to reduce total chunks Correct Answers: Option D Explanation: Storage limits based on the number of records directly correspond to the number of chunks. By increasing the chunk size and reducing overlap, you embed more text per record, thus drastically reducing the total number of records (chunks) stored in the vector database while minimizing quality loss. Question-44 Question: A data engineer on a GenAI team has chunked and preprocessed raw text from thousands of corporate documents. The next step is to persist these chunks in a format that supports fast retrieval during inference and complies with enterprise governance policies. What is the best approach to store this data in Databricks? Options: (A) Store the data in a CSV file for compatibility with spreadsheet tools (B) Write the chunks into a Delta Table managed by Unity Catalog (C) Use Pickle files for fast binary loading in Python-based pipelines (D) Save the chunks as plain text files in an S3 bucket Correct Answers: Option B Explanation: Delta Lake provides fast, versioned, and reliable storage for large datasets, which is ideal for RAG chunks. Unity Catalog adds the necessary layer of enterprise governance, access control, and lineage required for production data in Databricks.
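For Question-44, a minimal PySpark sketch of persisting chunks to a Unity Catalog-managed Delta table follows. It assumes a Databricks environment (where an active Spark session already exists); the column names and the catalog.schema.table name are hypothetical.

```python
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()  # in a Databricks notebook this returns the existing session

# Hypothetical chunk records: the text plus the metadata needed for retrieval and citation.
chunks = [
    Row(chunk_id=1, source_doc="policy_handbook.pdf", page=4, chunk_text="Refund requests must be ..."),
    Row(chunk_id=2, source_doc="policy_handbook.pdf", page=5, chunk_text="Shipping times vary by ..."),
]

df = spark.createDataFrame(chunks)

# Write to a Delta table governed by Unity Catalog (three-level namespace).
df.write.format("delta").mode("append").saveAsTable("main.genai.document_chunks")
```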
Question-45 Question: An engineering team is tasked with summarizing thousands of documents overnight using a scheduled pipeline. Which serving approach should they use? Options: (A) Unity Catalog-registered model with MLflow autologging (B) Batch inference using MLflow in a scheduled job (C) Hugging Face hosted model API (D) Real-time model serving endpoint Correct Answers: Option B Explanation: The task involves processing a large volume of data (thousands of documents overnight/scheduled). This defines a batch job, not real-time serving. Using MLflow ensures the specific model and parameters are consistently used and tracked within the scheduled environment. Question-46 Question: A Generative AI Engineer wants to evaluate the performance of two summarization models. Which metrics are appropriate for this task? (Choose 3 options) Options: (A) ROUGE-L (B) BLEU (C) Human preference score (D) Mean Squared Error (MSE) Correct Answers: Option A, B & C Explanation: ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BLEU (Bilingual Evaluation Understudy) are standard quantitative metrics used to measure the lexical overlap (similarity of words/phrases) between the generated summary and a reference summary. Human preference score (or human evaluation) is essential for subjective tasks like summarization, as it captures fluency, coherence, and relevance, which automated metrics often miss. MSE is for regression tasks. Question-47 Question: A developer is building a retrieval system using an LLM with a limited context window of 512 tokens. What chunking approach will optimize accuracy and avoid truncation? Options: (A) Disable chunking and send full documents (B) Use smaller overlapping chunks and sliding windows (C) Set chunk size to 512 tokens and truncate excess data (D) Use large chunks (500+ tokens) Correct Answers: Option B Explanation: A limited context window (512 tokens) means the model can only "see" a very small amount of text at a time. Small chunks ensure you retrieve only highly relevant information, and overlapping chunks (sliding windows) prevent the semantic context from being split across chunk boundaries, optimizing both retrieval and LLM processing. Question-48 Question: You are building a GenAI assistant in Databricks that handles two kinds of queries: product-related (based on unstructured manuals) and inventory-related (from structured databases). What is the ideal architectural approach to handle these distinct data sources? Options: (A) Use RAG for product queries and semantic keyword matching for inventory (B) Use RAG with vector search for product queries, and NLQ-to-SQL for inventory queries (C) Use RAG-based retrieval for both product and inventory queries (D) Use NLQ-to-SQL for both types of queries Correct Answers: Option B Explanation: Unstructured data (manuals) is best handled by RAG (vector search). Structured data (databases/inventory) is best handled by Natural Language Query to SQL (NLQ-to-SQL), which translates the user's question into a query that can retrieve the exact inventory numbers, combining the strengths of both architectural patterns. Question-49 Question: You are building a multi-component GenAI application on Databricks that includes prompt templates, retrievers, tools, and LLM chains.
To streamline the orchestration and integration of these components, your team considers adopting a composite framework like LangChain. When evaluating such a library, which two factors are most important to consider? Options: (A) Accuracy vs speed tradeoff of the underlying model (B) Stability and ability to manage complexity (C) Whether it supports mobile deployment (D) Integration ability with various tools and frameworks Correct Answers: Option B & D Explanation: Orchestration frameworks like LangChain are chosen specifically for managing the interaction between multiple components (complexity) and ensuring they work together reliably in production (stability). Their utility also depends on being able to connect to the various LLM providers, data stores, and APIs used in the application (integration ability). Question-50 Question: An AI developer is building a GenAI-powered tool on Databricks to prioritize incoming emails by urgency. The model must output one of several fixed categories: 'urgent', 'normal', or 'low'. What is the most appropriate description of this model's output behavior? Options: (A) Multi-class classification (B) Named entity recognition (C) Summarization (D) Text generation Correct Answers: Option A Explanation: The task requires assigning an input (email text) to one of several mutually exclusive, predefined categories ('urgent', 'normal', 'low'). This is the definition of a multi-class classification task. Question-51 Question: A product team is designing a GenAI-powered tool on Databricks that transforms lengthy user-generated reviews into concise one-sentence insights to be displayed on product pages. When choosing a model or API function for this use case, which task type should the team select? Options: (A) Embedding generation (B) Summarization (C) Named entity recognition (D) Text classification Correct Answers: Option B Explanation: The core requirement is to reduce a lengthy text (review) into a concise one-sentence insight. This process of condensing information while retaining key meaning is precisely the definition of summarization. Question-52 Question: You are developing a GenAI-powered customer service chatbot in Databricks. The requirement is for the chatbot to analyze customer sentiment in real time (e.g., anger, loyalty, fear) and adapt the tone of its response accordingly. Which design approach should you take? Options: (A) Use a static response template for all user queries to maintain consistency (B) Use embeddings alone to determine tone, without sentiment analysis (C) Append a fixed tone instruction to every prompt regardless of context (D) Analyze sentiment from the incoming message, then dynamically inject tone instructions into the system prompt Correct Answers: Option D Explanation: The response tone must adapt based on the customer's sentiment, meaning the instruction cannot be static. The best approach is a multi-step pipeline where an initial analysis (sentiment classification) informs the final generation step by modifying the system prompt (e.g., "The customer is angry; respond apologetically and calmly"). Question-53 Question: A team is comparing two summarization models. One model shows a significantly higher ROUGE-L score. What can they conclude?
Options: (A) Model A is faster and more efficient than Model B (B) Model A uses larger vector embeddings for document chunking (C) Model A has lower temperature and returns shorter outputs (D) Model A generates summaries with better lexical overlap and structural similarity to the references Correct Answers: Option D Explanation: ROUGE-L (Longest Common Subsequence) is a metric that measures the longest sequence of words shared between the generated summary and the reference summary. A higher score directly indicates better lexical overlap and captured structural similarity compared to the ground truth. Question-54 Question: An engineer notices poor semantic match results in their RAG app. Which preprocessing step is most likely missing? Options: (A) Normalizing data types using Spark (B) Ensuring consistent formatting and whitespace (C) Lowercasing and stemming all text (D) Removing HTML tags and stopwords Correct Answers: Option D Explanation: Poor semantic match often stems from "noise" in the index that throws off the embedding model. HTML tags and unnecessary stopwords (stopwords are sometimes retained, but can be removed to reduce noise in complex documents) add tokens that don't contribute to core meaning, reducing the purity of the semantic vector. Question-55 Question: An engineer needs to implement semantic search on a Databricks Vector Search index to retrieve contextually similar chunks for generation. What command should be used? Options: (A) vector_index.get_nearest_neighbors(query_embedding, limit=5) (B) vector_index.semantic_match(text=query, k=5) (C) vector_index.query(embedding=query_embedding, top_k=5) (D) vector_index.search("keyword-based input", top_k=5) Correct Answers: Option C Explanation: Databricks Vector Search uses the query() method for vector-based (semantic) search. This method requires a pre-computed vector ('embedding=query_embedding') and returns the nearest neighbors, which is the definition of semantic search. Question-56 Question: A developer using LangChain needs to bind a prompt to a specific LLM to enable basic interactions in their application. Which class should they use? Options: (A) PromptTemplate (B) RetrievalQA (C) LLMChain (D) ChatOpenAI Correct Answers: Option C Explanation: In LangChain, an LLMChain is the simplest sequence, consisting of a PromptTemplate (to format input) and an LLM (to generate output). It represents the binding of a simple prompt to a specific large language model. Question-57 Question: A customer service chatbot built using a Retrieval-Augmented Generation (RAG) architecture on Databricks is failing to answer user questions about refunds. After reviewing the vector index and source content, the engineer discovers that the refund policy was never included in the indexed knowledge base. Which action should the engineer take to resolve this issue? Options: (A) Increase the chunk size in the current retrieval pipeline (B) Lower the temperature setting to encourage more factual responses (C) Replace the LLM with a newer version that supports better refund-related reasoning (D) Add the refund policy document to the indexed knowledge base and re-embed it Correct Answers: Option D Explanation: RAG's core purpose is to answer questions based on its knowledge base. If the essential knowledge (the refund policy) is missing from the index, no amount of model tuning (B, C) or chunking changes (A) will help. The fundamental data must be added and indexed first.
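Question-55's options use simplified pseudo-method names; for orientation only, here is a rough sketch of the same semantic lookup with the databricks-vectorsearch Python client. The endpoint name, index name, and columns are hypothetical, and the exact call may differ between client versions.

```python
from databricks.vector_search.client import VectorSearchClient

vsc = VectorSearchClient()
index = vsc.get_index(
    endpoint_name="genai_vs_endpoint",              # hypothetical endpoint name
    index_name="main.genai.document_chunks_index",  # hypothetical catalog.schema.index
)

# Retrieve the top-5 semantically similar chunks for a natural-language query.
results = index.similarity_search(
    query_text="What is the refund policy?",
    columns=["chunk_text", "source_doc"],
    num_results=5,
)
print(results)
```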
Question-58 Question: A team wants to prototype an LLM solution without managing model infrastructure. They decide to use Databricks-hosted models. What service should they leverage? Options: (A) Databricks Unity Catalog (B) MLflow Model Registry (C) Databricks Vector Search (D) Databricks Foundation Model APIs Correct Answers: Option D Explanation: The Foundation Model APIs (or Model Serving Endpoints for Foundation Models) provide access to state-of-the-art LLMs (like DBRX, Llama 2, Mistral, etc.) as fully managed endpoints, removing the burden of setting up and scaling the model inference infrastructure. Question-59 Question: An engineer is preparing prompt/response examples to fine-tune a foundation model for the task of summarizing customer reviews. Which example is most appropriate for this fine-tuning task? Options: (A) Prompt: Summarize this review: 'The delivery was delayed but the product was exactly as described. I'd buy again.' Response: Product matched description, delayed delivery, customer would repurchase. (B) Prompt: Translate this review into Spanish. Response: La entrega fue retrasada pero el producto fue exactamente como se describió... (C) Prompt: What is the sentiment of this review? Response: Positive (D) Prompt: How many stars did the customer give? Response: 4 out of 5 Correct Answers: Option A Explanation: Fine-tuning requires the training examples to match the desired input and output format of the target task. Since the goal is summarization, the input must be the text to summarize, and the output must be the desired summary. Options B, C, and D represent translation, classification, and extraction tasks, respectively. Question-60 Question: A GenAI assistant needs to support both document-based answers and structured SQL queries. Which design pattern fits this requirement? Options: (A) Conversational memory chain (B) Multi-agent workflow (C) RetrievalQA chain (D) Router chain Correct Answers: Option D Explanation: A Router chain (or multi-chain architecture) uses an initial LLM call to classify the user's intent (e.g., "Is this a SQL query or a document query?") and then forwards the query to the correct specialized downstream chain (e.g., the NLQ-to-SQL chain or the RAG chain). Question-61 Question: A team is building a GenAI assistant and must ensure that Personally Identifiable Information (PII) is removed before text is stored in a vector index. What should they do? Options: (A) Set max chunk size to 500 tokens (B) Hash PII using SHA256 (C) Encode all content with base64 (D) Use a rule-based regex engine before chunking Correct Answers: Option D Explanation: PII must be redacted/removed from the text before it is indexed, as embeddings capture the semantic meaning of the words, which includes the PII itself. A rule-based engine (like regex) is a fast, reliable, and deterministic way to identify patterns (like phone numbers, emails) for removal or masking prior to processing. Question-62 Question: An engineer is building a GenAI-powered logistics assistant that provides estimated arrival dates. To match internal systems, dates must be in MM/DD/YYYY format. What's the best prompt technique to enforce this?
Options: (A) Post-process the output to reformat dates using regex (B) Use a zero-shot prompt with general instructions (C) Include explicit format instructions in the prompt template (D) Fine-tune the model on logistics data Correct Answers: Option C Explanation: To reliably enforce a specific, rigid output format, the instructions must be explicitly included in the prompt (usually the system prompt or user prompt). While post-processing (A) is a fallback, getting the model to produce the correct format directly is more efficient and reliable Question-63 Question: A retail company's RAG system is at embedding capacity. What TWO changes will most effectively reduce the number of document embeddings, without major quality loss? (Select 2 options) Options: (A) Increase chunk size so more content is embedded per chunk. (B) Embed every word as a chunk. (C) Only index document titles. (D) Reduce overlap between document chunks Correct Answers: Option A & D Explanation: The total number of embeddings is equal to the total number of chunks. Increasing chunk size (A) means fewer chunks are created per document. Reducing overlap (D) eliminates redundant chunks that cover the same information multiple times. Both reduce the total record count (embedding capacity) while largely preserving semantic context Question-64 Question: A QA team is reviewing a GenAI app and wants to prevent it from responding to unethical or illegal prompts. What should the team implement? Options: (A) Increase chunk overlap (B) Fine-tune the model on ethical responses (C) Reduce temperature to reduce creativity (D) Add guardrails to filter unsafe inputs and outputs Correct Answers: Option D Explanation: Guardrails (or safety layers) are mandatory in production GenAI systems to detect and block malicious or harmful inputs (jailbreaks, dangerous content) and filter unsafe outputs, ensuring compliance with safety policies. This is a dedicated security measure, while other options are for performance or accuracy Question-65 Question: You are designing a GenAI solution in Databricks that processes call transcripts. Your goal is to identify if a customer discussed product availability and return a simple 'Yes' or 'No'. Which approach is most effective? Options: (A) Use multi-class classification with topics like Billing, Support, Inventory (B) Use RAG pipeline to retrieve similar historical calls and mirror results (C) Use a prompt that instructs the LLM to analyze and return only 'Yes' or 'No' (D) Summarize the full transcript in natural language Correct Answers: Option C Explanation: The goal is a simple, binary (Yes/No) output based on text analysis. This is simpler and more effective than complex classification (A) or RAG (B) Question-66 Question: A Generative AI Engineer wants to build an LLM-based solution to help a restaurant improve its online customer experience with bookings by automatically handling common customer inquiries. The goal of the solution is to minimize escalations to human intervention and phone calls while maintaining a personalized interaction. To design the solution, the Generative AI Engineer needs to define the input data to the LLM and the task it should perform. Which input/output pair will support their goal? 
Options: (A) Input: Online chat logs; Output: Group the chat logs by users, followed by summarizing each user's interactions (B) Input: Online chat logs; Output: Buttons that represent choices for booking details (C) Input: Customer reviews; Output: Classify review sentiment (D) Input: Online chat logs; Output: Cancellation options Correct Answers: Option A Explanation: To achieve personalized interaction and minimize escalations, the LLM needs the full context of a customer's history. Summarizing individual user interactions from chat logs provides a concise, personalized profile that the LLM can leverage to maintain context and resolve complex issues without human hand-off. Question-67 Question: A Generative AI Engineer has been asked to design an LLM-based question-answering application for HR documentation. Which of the following approaches is best suited for efficient and accurate retrieval? Options: (A) Use a search index and retrieve the most relevant document, then prompt the LLM. (B) Calculate averaged embeddings for each HR document, compare embeddings during retrieval. (C) Create an interaction matrix of historical employee questions and answers, use collaborative filtering. (D) Split HR documentation into chunks, embed into a vector store, use the employee question to retrieve the most relevant chunk, and use the LLM to generate a response. Correct Answers: Option D Explanation: Chunking and Vector Search: HR documentation tends to be lengthy, so splitting into smaller, manageable chunks is essential. These chunks are embedded into a vector store (a database that stores vector representations of each chunk). Each query is transformed into an embedding, which is compared to the stored vectors for efficient similarity-based retrieval. Question-68 Question: A company needs to generate highly confidential answers where external data transmission is not allowed. Which LLM model is most appropriate? Options: (A) OpenAI GPT-4 (B) Google Bard (C) Llama-2-70b-chat (D) BGE-large-en Correct Answers: Option C Explanation: Llama-2-70b-chat is an open-source LLM that can be deployed in a secure, private environment, ensuring that data is not transmitted to third parties. This aligns with requirements for high confidentiality and compliance. Question-69 Question: What is the best practice when using original data to train a model, especially when it includes sensitive or proprietary information? Options: (A) Use data that is explicitly labeled as having legal risk. (B) Always use data with an open license and ensure licensing is tracked. (C) Use data that is explicitly labeled as confidential. (D) Reach out to the data curators directly after you have started using the trained model to let them know. Correct Answers: Option B Explanation: Using appropriately licensed data with an open license, and adhering to its license terms, is critical to ensure compliance with legal requirements. Question-70 Question: A data scientist wants to provide more information when referencing a Delta table for feature engineering. What is the best practice? Options: (A) Use a short name for the Delta table. (B) Use a fully qualified name for the Delta table. (C) Use a random name for the Delta table. (D) Use a nickname for the Delta table. Correct Answers: Option B Explanation: Using a fully qualified name (catalog.schema.table) ensures clarity and avoids confusion, especially in environments with multiple catalogs and schemas.
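As a tiny illustration of Question-70, the snippet below reads a Delta table by its fully qualified Unity Catalog name. It assumes a Databricks environment with an active Spark session; the table name is hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # in a Databricks notebook this returns the existing session

# Fully qualified three-level name: catalog.schema.table (hypothetical here).
features = spark.table("main.finance.customer_features")

# An unqualified name such as spark.table("customer_features") resolves against the
# current catalog and schema, which can silently point at the wrong table.
features.show(5)
```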
Question-71 Question: What is the best way to partition a vector search index for news articles? Options: (A) Split articles by 10 day blocks and add the block closest to the query. (B) Include metadata grouping for each article based on source and topic. (C) Group the articles by the vector search model rather than topic. (D) Create separate indexes by source and return the closest article. Correct Answers: Option B Explanation: Partitioning increases setup complexity, but metadata filtering is preferred. Metadata filtering is efficient and flexible for direct query-time filtering Question-72 Question: What is the benefit of metadata filtering in vector search? Options: (A) Search results only on embeddings, ignoring structured filtering. (B) Enables filtering on document attributes like date or category for precise retrieval. (C) Pass the query multiple times to the vector index and return the best articles. (D) Search relies solely on topic modeling. Correct Answers: Option B Explanation: Metadata filtering allows for precise filtering on attributes like date or category, improving retrieval accuracy Question-73 Question: What is the best practice for managing model versions and access in Databricks? Options: (A) Store models in notebooks and manually version. (B) Use MLflow to log models and register with Unity Catalog. (C) Train a duplicate training pipeline in every workspace. (D) Export models as pickles and store in cloud storage. Correct Answers: Option B Explanation: MLflow model registry with Unity Catalog provides cross-workspace and cross-catalog access, ensuring proper versioning and governance Question-74 Question: Which metric is commonly used for evaluating vector search accuracy? Options: (A) Cosine similarity of embeddings (B) Bilingual Evaluation Understudy (BLEU) (C) ROUGE-L (D) Levenshtein distance Correct Answers: Option A Explanation: Cosine similarity is a standard metric for evaluating vector search accuracy. Question-75 Question:Which open-source LLM is best suited for speech-to-text applications? Options: (A) MPT-30B (B) Llama-2-70b-chat (C) whisper-large-v3 (1.6B) (D) BGE-large-en Correct Answers: Option C Explanation: whisper-large-v3 is optimized for speech-to-text tasks, making it the most suitable option. Question-76 Question:A Generative AI Engineer just deployed an LLM application at a digital marketing company that assists with answering customer service inquiries. Which metric should they monitor for their customer service LLM application in production? Options: (A) Number of customer inquiries processed per unit of time (B) Energy usage per query (C) Final perplexity scores for the training of the model (D) HuggingFace Leaderboard values for the base LLM Correct Answers: Option A Explanation: This metric directly measures the throughput (or volume) and business utility of the application in production. For a customer service application, the primary goal is often to handle a large volume of inquiries efficiently. (C) and (D) are development/training metrics, and (B) is an infrastructure cost metric, not a core application performance metric Question-77 Question:A Generative AI Engineer is designing an LLM-powered live sports commentary platform. The platform provides real-time updates and LLM-generated analyses for any users who would like to have live summaries, rather than reading a series of potentially outdated news articles. 
Which tool below will give the platform access to real-time data for generating game analyses based on the latest game scores? Options: (A) DatabricksIQ (B) Foundation Model APIs (C) Feature Serving (D) AutoML Correct Answers: Option C Explanation: For real-time applications that need to retrieve the latest factual data (like current game scores) with low latency, Databricks Feature Serving endpoints are the ideal solution. They provide a high-performance, low-latency API layer over structured data (like a Delta table) to feed real-time context into the LLM chain. Question-78 Question: A Generative AI Engineer has a provisioned throughput model serving endpoint as part of a RAG application and would like to monitor the serving endpoint's incoming requests and outgoing responses. The current approach is to include a micro-service in between the endpoint and the user interface to write logs to a remote server. Which Databricks feature should they use instead to perform the same task? Options: (A) Vector Search (B) Lakeview (C) DBSQL (D) Inference Tables Correct Answers: Option D Explanation: Inference Tables (or payload logging) are a feature in Databricks Model Serving that automatically captures the input requests and model outputs (payloads) to a Delta table in near real-time. This provides a simplified, managed, and scalable way to monitor and audit model traffic without needing a custom micro-service. Question-79 Question: A Generative AI Engineer is tasked with improving the RAG quality by addressing its inflammatory outputs. Which action would be most effective in mitigating the problem of offensive text outputs? Options: (A) Increase the frequency of upstream data updates (B) Inform the user of the expected RAG behavior (C) Restrict access to the data sources to a limited number of users (D) Curate upstream data properly that includes manual review before it is fed into the RAG system Correct Answers: Option D Explanation: In a RAG system, the outputs are directly influenced by the retrieved source data. If the source data is inflammatory, the RAG output will reflect it. To fix this at the root cause, the engineer must curate and clean the upstream data (the knowledge base) to remove the offensive content before it is indexed. Question-80 Question: A Generative AI Engineer has developed an LLM application to answer questions about internal company policies. The application must not hallucinate or leak confidential data. Which approach should NOT be used to mitigate hallucination or confidential data leakage? Options: (A) Add guardrails to filter outputs from the LLM before it is shown to the user (B) Fine-tune the model on your data, hoping it will learn what is appropriate and not (C) Limit the data available based on the user's access level (D) Use a strong system prompt to ensure the model aligns with your needs. Correct Answers: Option B Explanation: While fine-tuning (SFT) can improve domain knowledge and reduce some hallucinations, it is not a security guarantee against leakage. Security and leakage prevention require deterministic methods like Guardrails (A), Access Control (C), and Prompt Engineering (D). Relying on an LLM's non-deterministic training process to prevent security breaches is considered a high-risk and inappropriate strategy. Question-81 Question: A Generative AI Engineer at an electronics company just deployed a RAG application for customers to ask questions about products that the company carries.
However, they received feedback that the RAG response often returns information about an irrelevant product. What can the engineer do to improve the relevance of the RAG's response? Options: (A) Assess the quality of the retrieved context (B) Implement caching for frequently asked questions (C) Use a different LLM to improve the generated response (D) Use a different semantic similarity search algorithm Correct Answers: Option A Explanation: If the response is irrelevant, the root cause is almost always that the retriever provided irrelevant context. The first step is to assess the quality of the retrieved chunks (what chunks were returned for that query?) before changing models or algorithms. If the context is bad, the output will be bad, regardless of the LLM or search algorithm. Question-82 Question: A Generative AI Engineer developed an LLM application using the provisioned throughput Foundation Model API. Now that the application is ready to be deployed, they realize their volume of requests is not high enough to justify creating their own provisioned throughput endpoint. They want to choose a strategy that ensures the best cost-effectiveness for their application. What strategy should the Generative AI Engineer use? Options: (A) Switch to using External Models instead (B) Deploy the model using pay-per-token throughput as it comes with cost guarantees (C) Change to a model with a fewer number of parameters in order to reduce hardware constraint issues (D) Throttle the incoming batch of requests manually to avoid rate limiting issues Correct Answers: Option B Explanation: Provisioned throughput is designed for high-volume use cases with a high service level agreement (SLA), and its cost is incurred regardless of usage. When volume is low, switching to a pay-per-token (serverless) endpoint is the most cost-effective solution, as the user only pays for the actual tokens consumed. Question-83 Question: A Generative AI Engineer is designing a chatbot for a gaming company that aims to engage users on its platform while its users play online video games. Which metric would help them increase user engagement and retention for their platform? Options: (A) Randomness (B) Diversity of responses (C) Lack of relevance (D) Repetition of responses Correct Answers: Option B Explanation: For an entertainment or engagement-focused chatbot, diversity (meaning varied, creative, and non-repetitive answers) is crucial to keeping users interested and promoting continued interaction, which directly correlates with engagement and retention. Low diversity or repetition is boring and leads to low retention. Question-84 Question: A Generative AI Engineer is building an LLM to generate article summaries in the form of a type of poem, such as a haiku, given the article content. However, the initial output from the LLM does not match the desired tone or style. Which approach will NOT improve the LLM's response to achieve the desired tone and style? Options: (A) Provide the LLM with a prompt that explicitly instructs it to generate text in the desired tone and style (B) Use a neutralizer to normalize the tone and style of the underlying documents (C) Include few-shot examples in the prompt to the LLM (D) Fine-tune the LLM on a dataset of desired tone and style Correct Answers: Option B Explanation: The goal is to achieve a specific, desired style (haiku/poem). (B) is the opposite of the goal; it would make the LLM less likely to produce the required creative/styled output.
(A), (C), and (D) are all valid methods for enforcing a specific style. Question-85 Question: Which indicator should be considered to evaluate the safety of the LLM outputs when qualitatively assessing LLM responses for a translation use case? Options: (A) The ability to generate responses in code (B) The similarity to the previous language (C) The latency of the response and the length of text generated (D) The accuracy and relevance of the responses Correct Answers: Option D Explanation: In the context of translation, safety is directly tied to the accurate and faithful conveyance of the original message. If a translation is inaccurate or irrelevant (D), it could introduce harmful bias, misinformation, or violate compliance/legal requirements, thus compromising the safety and utility of the application. Question-86 Question: Which steps are essential when writing chunked text into Delta Lake tables in Unity Catalog? (Select TWO) Options: (A) Writing all chunks as a single file (B) Partitioning data based on categories (C) Avoiding partitions for simplicity (D) Structuring chunks to support efficient querying Correct Answers: Option B & D Explanation: (D) Structuring chunks: The schema of the Delta table must be carefully defined to include fields like chunk text, document ID, and metadata. This structured format is essential for the downstream RAG retriever to perform efficient lookups and filtering. (B) Partitioning data: Partitioning (e.g., by document type, category, or date) is a Delta Lake best practice that speeds up query performance by allowing the engine to skip large amounts of irrelevant data, which is crucial for RAG inference. Question-87 Question: What is the most suitable library for building a multi-step LLM-based workflow? Options: (A) Pandas (B) TensorFlow (C) PySpark (D) LangChain Correct Answers: Option D Explanation: LangChain is specifically designed as a framework to orchestrate complex, multi-step workflows that integrate Large Language Models (LLMs) with other components like retrievers, tools, and databases. While the other options are great for data analysis (Pandas, PySpark) or deep learning model building (TensorFlow), only LangChain provides the chaining, agent, and memory components needed for multi-step reasoning tasks. Question-88 Question: When developing an LLM application, it's crucial to ensure that the data used for training the model complies with licensing requirements to avoid legal risks. Which action is NOT appropriate to avoid legal risks? Options: (A) Reach out to the data curators directly before you have started using the trained model to let them know. (B) Use any available data you personally created which is completely original and you can decide what license to use. (C) Only use data explicitly labeled with an open license and ensure the license terms are followed. (D) Reach out to the data curators directly after you have started using the trained model to let them know Correct Answers: Option D Explanation: Legal compliance must be established before the data is used, especially before using it to train a model. Reaching out to data curators only after the model has been trained and deployed (Option D) is too late: it exposes the project to legal liability if the training data was used without permission or in violation of its licensing terms. Question-89 Question: A Generative AI Engineer is creating an LLM system that will retrieve news articles from the year 1918 that are related to a user's query and summarize them.
The engineer has noticed that the summaries are generated well but often also include an explanation of how the summary was generated, which is undesirable. Which change could the Generative AI Engineer perform to mitigate this issue? Options: (A) Split the LLM output by newline characters to truncate away the summarization explanation. (B) Tune the chunk size of news articles or experiment with different embedding models. (C) Revisit their document ingestion logic, ensuring that the news articles are being ingested properly. (D) Provide few-shot examples of the desired output format to the system and/or user prompt Correct Answers: Option D Explanation: The issue is an unwanted conversational addition to the response, which is a common LLM behavior. This is best fixed by prompt engineering. Providing few-shot examples (input/output pairs) or using a clear instruction in the system prompt (e.g., "Respond ONLY with the summary, no explanations") directly constrains the LLM's output format and suppresses the extraneous text. Question-90 Question: A Generative AI Engineer has developed an LLM application to answer questions about internal company policies. The application must not hallucinate or leak confidential data. Which approach should NOT be used to mitigate hallucination or confidential data leakage? Options: (A) Add guardrails to filter outputs from the LLM before it is shown to the user (B) Fine-tune the model on your data, hoping it will learn what is appropriate and not (C) Limit the data available based on the user's access level (D) Use a strong system prompt to ensure the model aligns with your needs. Correct Answers: Option B Explanation: While fine-tuning (SFT) can improve domain knowledge and reduce some hallucinations, it is not a security guarantee against leakage. Security and leakage prevention require deterministic methods like Guardrails (A), Access Control (C), and Prompt Engineering (D). Relying on an LLM's non-deterministic training process to prevent security breaches is considered a high-risk and inappropriate strategy. Question-91 Question: A Generative AI Engineer interfaces with an LLM with prompt/response behavior that has been trained on customer calls inquiring about product availability. The LLM is designed to output "In Stock" if the product is available or only the term "Out of Stock" if not. Which prompt will work to allow the engineer to respond to call classification labels correctly? Options: (A) Respond with "In Stock" if the customer asks for a product. (B) You will be given a customer call transcript where the customer asks about product availability. The outputs are either "In Stock" or "Out of Stock". Format the output in JSON, for example: {"call_id": "123", "label": "In Stock"}. (C) Respond with "Out of Stock" if the customer asks for a product. (D) You will be given a customer call transcript where the customer inquires about product availability. Respond with "In Stock" if the product is available or "Out of Stock" if not Correct Answers: Option B Explanation: This prompt is the most effective because it adheres to best practices for production data pipelines: it provides the most important element, a rigid output format (JSON, with an example). This ensures the output is easily machine-readable for downstream processing. Question-92 Question: A Generative AI Engineer is tasked with developing a RAG application that will help a small internal group of experts...
They want the best possible quality in the answers, and neither latency nor throughput is a huge concern given that the user group is small and they're willing to wait for the best answer. The topics are sensitive in nature and the data is highly confidential and so, due to regulatory requirements, none of the information is allowed to be transmitted to third parties. Which model meets all the Generative AI Engineer's needs in this situation? Options: (A) Dolly 1.5B (B) OpenAI GPT-4 (C) BGE-large (D) Llama2-70B Correct Answers: Option D Explanation: The key constraints are high quality and strict confidentiality (no transmission to third parties). (B) GPT-4 requires sending data to an external, cloud-based third party, which violates the confidentiality requirement. (C) BGE-large is an embedding model, not a generative LLM, so it cannot produce the answers. (A) Dolly 1.5B is too small to deliver the best possible answer quality. (D) Llama2-70B is a large open-source model that can be self-hosted entirely within the secure environment, meeting both the quality requirement and the regulatory requirement for internal data processing. Question-93 Question: A Generative AI Engineer would like an LLM to generate formatted JSON from emails. This will require parsing and extracting the following information: order ID, date, and sender email. Here's a sample email. Which prompt will extract the relevant information in JSON format with the highest level of output accuracy? Options: (A) You will receive customer emails and need to extract date, sender email, and order ID. You should return the date, sender email, and order ID information in JSON format. (B) You will receive customer emails and need to extract date, sender email, and order ID. Return the extracted information in JSON format. Here's an example: {"date": "April 16, 2024", "sender_email": "sarah.lee925@gmail.com", "order_id": "RE987D"} (C) You will receive customer emails and need to extract date, sender email, and order ID. Return the extracted information in a human-readable format. (D) You will receive customer emails and need to extract date, sender email, and order ID. Return the extracted information in JSON format Correct Answers: Option B Explanation: This prompt is the most effective because it combines the two best strategies for structured output. Clear instruction: it specifies the exact fields and the desired format (JSON). Few-shot example: providing a concrete example of the JSON structure (schema) is the most reliable way to enforce output accuracy and consistency across different emails. Question-94 Question: A Generative AI Engineer has been asked to build an LLM-based question-answering application. The application should take into account new documents that are frequently published. The engineer wants to build this application with the least development effort and have it operate at the lowest cost possible. Which combination of chaining components and configuration meets these requirements? Options: (A) For the application a prompt, a retriever, and an LLM are required. The retriever output is inserted into the prompt which is given to the LLM to generate answers. (B) The LLM needs to be frequently fine-tuned with the new documents in order to provide the most up-to-date answers. (C) For the question-answering application, prompt engineering and an LLM are required to generate answers. (D) For the application a prompt, an agent and a fine-tuned LLM are required.
The agent is used by the LLM to retrieve relevant content that is inserted into the prompt which is given to the LLM to generate answers. Correct Answers: Option A Explanation: Option A describes a basic Retrieval-Augmented Generation (RAG) architecture, which is the best fit for these constraints; fine-tuning the LLM (B) or using a more complex agent with a fine-tuned LLM (D) would add unnecessary cost and development effort. Frequently published data: RAG handles frequently updated data by simply updating the vector store, without the need for expensive, time-consuming model retraining. Question-95 Question: A Generative AI Engineer is creating an agent-based LLM system for their favorite monster truck team. The system can answer text-based questions about the monster truck team, look up event dates via an API call, or query tables on the team's latest standings. How could the Generative AI Engineer best design these capabilities into their system? Options: (A) Ingest PDF documents about the monster truck team into a vector store and query it in a RAG architecture. (B) Write a system prompt for the agent listing available tools and bundle it into an agent system that runs a number of calls to solve a query. (C) Instruct the LLM to respond with "RAG", "API", or "TABLE" depending on the query, then use text parsing and conditional statements to resolve the query. (D) Build a system prompt with all possible event dates and table information in the system prompt. Use a RAG architecture to look up generic text questions and otherwise leverage the information in the system prompt Correct Answers: Option B Explanation: The requirement is for a flexible system that can handle different tasks (general text, API call, SQL query) dynamically. An agent system (often orchestrated via an AgentExecutor in frameworks like LangChain) is the optimal design pattern. The agent's LLM uses its prompt to reason about the query and decide which external tool (RAG, API tool, SQL tool) to call next. Question-96 Question: A Generative AI Engineer has been asked to design an LLM-based application that accomplishes the following business objective: answer employee HR questions using HR PDF documentation. Which set of high-level tasks should the Generative AI Engineer's system perform? Options: (A) Calculate averaged embeddings for each HR document, compare embeddings to the user query to find the best document. Pass the best document with the user query into an LLM with a large context window to generate a response to the employee. (B) Use an LLM to summarize HR documentation. Provide summaries of documentation and the user query into an LLM with a large context window to generate a response to the user. (C) Create an interaction matrix of historical employee questions and answers, use ALS to factorize the matrix and create embeddings. Calculate the embeddings of new queries and use them to find the best HR documentation. Use an LLM to generate a response to the employee question based upon the documentation retrieved. (D) Split HR documentation into chunks and embed into a vector store. Use the employee question to retrieve the best matched chunks of documentation, and use the LLM to generate a response to the employee based upon the documentation retrieved. Correct Answers: Option D Explanation: Option D describes the core, standard steps of a Retrieval-Augmented Generation (RAG) pipeline. This is the most effective and precise method for question answering over a static knowledge base (PDFs).
RAG ensures the LLM is grounded in the documentation by retrieving the relevant small chunks of text, minimizing hallucination and improving answer quality. Question-97 Question: A Generative AI Engineer at an electronics company just deployed a RAG application for customers to ask questions about products that the company carries. However, they received feedback that the RAG response often returns information about an irrelevant product. What can the engineer do to improve the relevance of the RAG's response? Options: (A) Assess the quality of the retrieved context (B) Implement caching for frequently asked questions (C) Use a different LLM to improve the generated response (D) Use a different semantic similarity search algorithm Correct Answers: Option A Explanation: In a RAG application, if the final output is irrelevant, the root cause is almost always that the retriever component returned irrelevant source material (chunks) to the LLM. The first and most crucial diagnostic step is to assess the retrieved context. If the context is wrong, changing the LLM (C) or the search algorithm (D) will not fix the data input problem. Question-98 Question: A Generative AI Engineer is developing a chatbot... it must not provide responses to questions about politics. Instead, when presented with political inquiries, the chatbot should respond with a standard message: "Sorry, I cannot answer that... only answer questions around insurance." Which framework type should be implemented to solve this? Options: (A) Safety Guardrail (B) Security Guardrail (C) Contextual Guardrail (D) Compliance Guardrail Correct Answers: Option A Explanation: This scenario involves detecting and mitigating a type of undesirable content (politics, harassment, toxicity, etc.) in the user's input or the LLM's output. A Safety Guardrail (often implemented via input/output filters or a separate classification model) is the dedicated framework used to enforce content policies and prevent responses to disallowed topics. Question-99 Question: A Generative AI Engineer is using the code below to test setting up a vector store: from databricks.vector_search.client import VectorSearchClient vsc = VectorSearchClient() vsc.create_endpoint( name="vector_search_test", endpoint_type="STANDARD" ) Assuming they intend to use Databricks managed embeddings with the default embedding model, what should be the next logical function call? Options: (A) vsc.get_index() (B) vsc.create_delta_sync_index() (C) vsc.create_direct_access_index() (D) vsc.similarity_search() Correct Answers: Option B Explanation: Given that the system is built on Databricks (implying the source data is a Delta table) and is using Databricks managed embeddings, the most common and robust setup is the Delta Sync Index. This method automatically synchronizes the vector index with the source Delta table when the data changes, requiring minimal ongoing management, which is the typical next step after initializing the client ('vsc'). Question-100 Question: A Generative AI Engineer is tasked with deploying an application that takes advantage of a custom MLflow Pyfunc model to return some interim results. How should they configure the endpoint to pass the secrets and credentials?
Options: (A) Use `spark.conf.set()` (B) Pass variables using the Databricks Feature Store API (C) Add credentials using environment variables (D) Pass the secrets in plain text Correct Answers: Option C Explanation: The industry-standard and most secure method for passing secrets to a model serving endpoint (including MLflow Pyfunc models) is via environment variables. In Databricks, these variables are typically configured to securely reference secrets stored in the Databricks Secrets utility; a short illustrative sketch of this pattern follows the closing remarks below. I hope this video was useful to you. With these sample questions, answers, and my justifications, you will be able to prepare well. My best wishes to you all as you take the exam and pass!! If my video helped any of you and you were able to pass, do let me know in the comments!! I will be very happy!!
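As referenced in the Question-100 explanation, here is a minimal sketch of the environment-variable pattern for a custom MLflow Pyfunc model. The variable name UPSTREAM_API_TOKEN, the model class, and the commented endpoint configuration are hypothetical illustrations; the exact configuration keys and the {{secrets/scope/key}} reference syntax should be verified against the current Databricks Model Serving documentation.

```python
import os
import mlflow.pyfunc

class InterimResultsModel(mlflow.pyfunc.PythonModel):
    """Custom Pyfunc model that needs a credential at serving time."""

    def load_context(self, context):
        # The credential is injected by the serving endpoint as an environment
        # variable; it is never hard-coded in the model artifact or sent in plain text.
        self.api_token = os.environ["UPSTREAM_API_TOKEN"]  # hypothetical variable name

    def predict(self, context, model_input):
        # ... call the upstream system with self.api_token, compute the interim
        # results, and return them ...
        return model_input

# Illustrative (not authoritative) shape of the serving endpoint configuration:
# the environment variable references a value stored in the Databricks Secrets utility.
# "served_entities": [{
#     "entity_name": "catalog.schema.interim_model",
#     "environment_vars": {"UPSTREAM_API_TOKEN": "{{secrets/my_scope/my_key}}"}
# }]
```

The key point for the exam is that the secret reaches the model through the endpoint's environment, backed by the secrets utility, rather than through spark.conf, the Feature Store API, or plain text.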



