Generative AI Quiz
Quiz 1 - Building Large Language Models
Questions
Question 1: What does LLM mean?
- A) Linguistic Laugh Machine
- B) Legendary Lemonade Maker
- C) Large Language Model
- D) Levitating Lawn Mower
Question 2: What is tokenization?
- A) The process of encrypting text data for security purposes.
- B) The method of assigning numerical frequency values to words.
- C) The process of breaking down text into individual units called tokens.
- D) The technique of summarizing large texts into key points.
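The correct answer (C) can be sketched in a few lines: a toy tokenizer that splits text into word and punctuation tokens and maps them to integer IDs. The splitting scheme and vocabulary here are illustrative only; real LLMs use subword methods such as BPE.

```python
import re

def tokenize(text):
    """Split text into word and punctuation tokens (a toy scheme)."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_vocab(tokens):
    """Assign every distinct token a numeric ID."""
    return {tok: i for i, tok in enumerate(sorted(set(tokens)))}

tokens = tokenize("LLMs break text into tokens.")
vocab = build_vocab(tokens)
token_ids = [vocab[t] for t in tokens]
```

The model never sees raw text, only sequences of IDs like `token_ids`.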
Question 3: Which metric is commonly used to evaluate the performance of a language model?
- A) Accuracy Score
- B) Perplexity
- C) Recall Rate
- D) F1 Score
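Perplexity (B) is the exponential of the average negative log-probability the model assigns to the observed tokens; lower is better. A minimal sketch with hypothetical per-token probabilities: a model uniformly unsure over 4 choices scores exactly 4 (and one uniform over a vocabulary of size V scores V, which is why perplexity depends on vocabulary size, as Question 6 below notes).

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability of the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical per-token probabilities for a 4-token sequence.
unsure = [0.25, 0.25, 0.25, 0.25]      # uniformly unsure over 4 choices
confident = [0.9, 0.8, 0.9, 0.95]      # close to certain at each step
```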
Question 4: How are language models generally trained?
- A) By manually programming grammar rules into the system.
- B) By feeding the model a vast set of text data to predict the next word.
- C) By translating texts between languages to improve linguistic understanding.
- D) By analyzing and replicating human brain activity patterns.
Question 5: What is supervised fine-tuning (SFT)?
- A) Compressing a language model to reduce computational requirements
- B) Refining a pre-trained LLM with task-specific, human-annotated data
- C) Training a language model from scratch using only labeled data
- D) Adjusting a language model’s outputs based on automated feedback loops
Question 6: What is one of the challenges when using perplexity as an evaluation metric?
- A) It requires labeled datasets, which are hard to obtain
- B) It only measures the speed of the model, not accuracy
- C) It depends on the vocabulary size
- D) It cannot be computed for large language models
Question 7: Which of the following is a typical step in data preprocessing for large language models (LLMs)?
- A) Encrypting all textual data to protect privacy
- B) Injecting noise into data to test model robustness
- C) Removing duplicates and low-quality data from the dataset
- D) Translating all data into a single language
Question 8: What does the Chinchilla scaling law propose?
- A) Increasing layers indefinitely improves performance
- B) Balancing model size and training data for optimal performance
- C) Always train the largest model, regardless of data amount
- D) Using less computational resources enhances generalization
Question 9: What do scaling laws propose?
- A) Doubling data size halves training time
- B) Performance improves predictably with larger models and more data
- C) Smaller models are more efficient than larger ones
- D) Less training data leads to better generalization
Question 10: What is Direct Preference Optimization (DPO)?
- A) Uses a reward model and reinforcement learning
- B) Directly optimizes model outputs based on human feedback
- C) Augments data by reversing text sequences
- D) Compresses models to reduce computation
Question 11: What is a main challenge in evaluating LLMs?
- A) Judging which answer is preferable is not trivial
- B) LLMs agree with themselves only 66% of the time
- C) Humans have a lot of variance
- D) Metrics capture all language aspects perfectly
Solutions
- Solution 1: C
- Solution 2: C
- Solution 3: B
- Solution 4: B
- Solution 5: B
- Solution 6: C
- Solution 7: C
- Solution 8: B
- Solution 9: B
- Solution 10: B
- Solution 11: A
Quiz 2 - Before Transformers
Questions
Question 1: What is an N-gram in language modeling?
- A) A neural network component.
- B) A statistical sequence of N items.
- C) A word embedding technique.
- D) A parsing algorithm.
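Answer B can be made concrete: an N-gram is just a length-N window over the token sequence, and counting windows yields a simple statistical next-word model. A minimal bigram (N = 2) sketch on a toy sentence:

```python
from collections import Counter

def ngrams(tokens, n):
    """Slide a window of length n over the token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()
bigram_counts = Counter(ngrams(tokens, 2))

# Counts of what follows "the" give a crude estimate of P(next | "the").
after_the = {b[1]: c for b, c in bigram_counts.items() if b[0] == "the"}
```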
Question 2: Word embeddings are sparse vectors.
- A) True
- B) False
Question 3: How do recurrent neural networks (RNNs) work?
- A) Process inputs independently
- B) Use internal states to handle sequences
- C) Apply convolution over data
- D) Utilize attention mechanisms
Question 4: What is the limitation of RNNs in language modeling?
- A) Difficulty learning long-term dependencies
- B) Suffering from vanishing gradient problem
- C) Cannot be implemented using GPU for acceleration
- D) Cannot handle variable-length sequences
Question 5: LSTM models improve upon RNNs by
- A) Requiring less computational power than RNN
- B) Solving the vanishing gradient problem
- C) Introducing the gating mechanism for long-term dependencies
- D) Not overfitting due to their architectural improvements over RNN
Question 6: How do LSTMs solve the vanishing gradient problem?
- A) By resetting the hidden state after each time step
- B) By avoiding using recurrent connections
- C) Using gate mechanisms to reduce the flow of information
- D) With the cell state allowing gradient to flow unchanged
Question 7: How do LSTMs learn?
- A) Using self-supervised learning for time series forecasting
- B) Using backpropagation through time
- C) Using supervised learning for next word prediction
- D) Using perplexity as a loss function
Solutions
- Solution 1: B
- Solution 2: B
- Solution 3: B
- Solution 4: A, B
- Solution 5: B, C
- Solution 6: C, D
- Solution 7: B
Quiz 3 - Transformers Architecture
Questions
Question 1: What is the purpose of the self-attention mechanism?
- A) To allow the model to focus on different parts of the input at inference.
- B) To capture positional information of tokens in a sequence.
- C) To reduce the computational complexity compared to RNNs.
- D) To enable parallel processing of sequence elements.
Question 2: What is the role of multi-head attention in Transformer architectures?
- A) To enable the model to focus on different sequence positions simultaneously.
- B) To increase the model’s capacity to learn different types of relationships.
- C) To decrease the overall computational cost of the model.
- D) To provide positional information to the model.
Question 3: In the Transformer architecture, what is the purpose of positional encoding?
- A) To allow the model to understand the order of the sequence elements.
- B) To replace the need for self-attention mechanisms.
- C) To improve the convergence rate during training.
- D) To provide additional input features for better performance.
Question 4: What is the function of the scaling factor \frac{1}{\sqrt{d_k}} in the scaled dot-product attention?
- A) To make computations more efficient.
- B) To ensure the softmax function operates in a highly sensitive region.
- C) To adjust the dot-product magnitude to prevent extremely small gradients.
- D) To reduce computational complexity.
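The effect behind answer C can be demonstrated numerically: dot products of d_k-dimensional unit-variance vectors have variance on the order of d_k, so unscaled attention scores push softmax into its saturated region, where gradients are near zero. A small sketch with random vectors (no real model weights):

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
d_k = 64
q = [random.gauss(0, 1) for _ in range(d_k)]
keys = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(4)]

# Unscaled scores are roughly sqrt(d_k) times too large...
raw = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
# ...so dividing by sqrt(d_k) restores roughly unit variance.
scaled = [s / math.sqrt(d_k) for s in raw]

# Raw scores saturate softmax toward a one-hot distribution (tiny
# gradients); scaled scores keep it in a sensitive region.
attn_raw = softmax(raw)
attn_scaled = softmax(scaled)
```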
Question 5: In the Transformer architecture, what is the primary reason for using residual connections and layer normalization?
- A) To facilitate the flow of gradients during training.
- B) To allow deeper networks without the vanishing gradient problem.
- C) To reduce overfitting on the training data.
- D) To decrease the computational requirements during inference.
Question 6: Which of the following statements about the encoder and decoder in the Transformer architecture are true?
- A) The encoder processes input sequence into continuous representations.
- B) The translation models use only the encoder part of the Transformers.
- C) The decoder does not use any attention mechanism.
- D) Both encoder and decoder use self-attention mechanisms.
Question 7: What is the purpose of masked self-attention in the decoder?
- A) To prevent the model from attending to future positions in the sequence during training.
- B) To allow the model to attend only to relevant parts of the input sequence.
- C) To enforce causality in sequence generation.
- D) To reduce the computational complexity of the attention mechanism.
Question 8: What are the advantages of using the Transformer architecture over traditional RNN-based models?
- A) Parallel processing of sequence elements, leading to faster training.
- B) Capture long-range dependencies more effectively.
- C) Fixed computational cost per time step regardless of the sequence length.
- D) Fewer parameters than RNNs.
Question 9: In the context of Transformers, what is the role of the softmax function in the attention mechanism?
- A) To normalize the attention scores into probabilities.
- B) To allow the model to focus on the input sequence’s most relevant parts.
- C) To reduce the dimensionality of the data.
- D) To introduce non-linearity into the model.
Question 10: Which of the following correctly describe the key, query, and value vectors in the Transformer architecture?
- A) They are linear projections of the input embeddings.
- B) The query vector represents the content of the current position.
- C) They are used to compute attention scores.
- D) The value vector is used to store the actual content to be aggregated.
Question 11: In multi-head attention, why are separate linear transformations applied to queries, keys, and values for each head?
- A) To allow each head to attend to different aspects of the input.
- B) To reduce overfitting by increasing parameter sharing.
- C) To increase model capacity without increasing computational cost too much.
- D) To ensure that all heads produce identical outputs.
Question 12: Suppose the embedding size is d_{\text{model}} = 512 and the number of heads is h = 8. What is the dimension d_k of the Q, K, and V vectors?
- A) 512
- B) 128
- C) 64
- D) 256
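The arithmetic behind this question: multi-head attention splits the model dimension evenly across the heads, so each head's query, key, and value vectors have dimension

```latex
d_k = d_v = \frac{d_{\text{model}}}{h} = \frac{512}{8} = 64
```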
Solutions
- Solution 1: A
- Solution 2: B
- Solution 3: A
- Solution 4: C
- Solution 5: A, B
- Solution 6: A, D
- Solution 7: A, C
- Solution 8: A, B
- Solution 9: A, B
- Solution 10: A, B, C, D
- Solution 11: A
- Solution 12: C
Quiz 4 - Retrieval Augmented Generation
Questions
Question 1: What are the primary components of a Retrieval-Augmented Generation (RAG) system?
- A) A retrieval module to fetch relevant documents.
- B) A generative language model to produce outputs using retrieved documents.
- C) An encoder-decoder architecture for language translation.
- D) A reinforcement learning agent to optimize retrieval strategies.
Question 2: In the context of RAG, what is the main purpose of using vector embeddings?
- A) To represent text documents and queries in a high-dimensional space.
- B) To enable efficient similarity searches between queries and documents.
- C) To reduce the dimensionality of data for computational efficiency.
- D) To train the language model with fewer parameters.
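Answers A and B go together: once queries and documents are embedded as vectors, retrieval reduces to a similarity search. A minimal sketch with hypothetical 3-dimensional embeddings (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """a.b / (|a||b|): 1.0 for identical directions, ~0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

# Hypothetical embeddings for one query and two documents.
query = [0.9, 0.1, 0.0]
docs = {"doc_a": [0.8, 0.2, 0.1],   # topically close to the query
        "doc_b": [0.0, 0.1, 0.9]}   # unrelated

ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]),
                reverse=True)
```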
Question 3: Which of the following are advantages of using RAG over traditional language models?
- A) Ability to provide up-to-date information not present in the training data.
- B) Reduction of hallucinations by grounding responses in retrieved documents.
- C) Increased computational efficiency due to smaller model sizes.
- D) Improved handling of long-range dependencies within sequences.
Question 4: Which of the following is considered the most powerful approach for retrieving documents?
- A) TF-IDF
- B) BM25
- C) Cosine similarity
- D) Hybrid Search
Question 5: Which retrieval method should I use for the query: “What is the name of the capital city of Joe Biden’s country?”
- A) TF-IDF
- B) BM25
- C) Cosine Similarity
- D) Hybrid method
Question 6: How does Self-RAG improve the performance of the RAG system?
- A) By generating multiple possible response segments in parallel.
- B) By using a critic model to select the most accurate segment.
- C) By recursively summarizing retrieved documents.
- D) By adding a retrieval evaluator to assess source quality.
Question 7: Which metrics are used to evaluate the retriever in a RAG system?
- A) Precision at k
- B) Recall at k
- C) Normalized Discounted Cumulative Gain (NDCG)
- D) BLEU / ROUGE
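The first two metrics (A and B) are straightforward to compute from a ranked result list; the document IDs below are hypothetical:

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

retrieved = ["d1", "d7", "d3", "d9", "d2"]   # retriever's ranked output
relevant = {"d3", "d2", "d5"}                # ground-truth relevant set
```

NDCG additionally discounts hits by their rank, and BLEU/ROUGE (D) evaluate generated text, not the retriever.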
Question 8: What technique does RAG use to overcome context size limitations when processing large documents?
- A) It ignores parts of the document that don’t fit in the context.
- B) It segments the document into smaller blocks based on titles or headings.
- C) It compresses documents to reduce their size.
- D) It increases the context size of the language model.
Question 9: What does Corrective RAG (CRAG) add to the traditional RAG system?
- A) A retrieval evaluator to assess the quality of retrieved sources.
- B) An integrated machine translation module.
- C) A critic model to select the best responses.
- D) A hierarchy of summaries to preserve context.
Question 10: What is the primary function of RAPTOR?
- A) Generates multiple responses in parallel and uses a critic model.
- B) Adds a retrieval evaluator to assess the quality of retrieved sources.
- C) Recursively summarizes retrieved documents, creating a hierarchy of summaries.
- D) Processes the entire dataset to create a knowledge graph.
Question 11: What is the primary function of GraphRAG?
- A) Generates multiple possible response segments in parallel
- B) Creates a full-dataset knowledge graph that organizes data hierarchically
- C) Creates a hierarchy of summaries to reduce information overload
- D) Adds a retrieval evaluator to assess the quality of retrieved sources
Question 12: What is the primary function of HyDE?
- A) It compresses large documents into shorter embeddings
- B) It creates a hierarchical structure of documents to improve retrieval
- C) It employs RLHF to achieve better retrieval
- D) It generates hypothetical answers to queries used for retrieving
Question 13: What primary issue does the “Lost in the Middle” paper address?
- A) Models cannot handle sequences longer than 512 tokens
- B) Models have increased computational costs with longer contexts
- C) Models focus on the beginning and end of contexts
- D) Models perform equally well across all parts of the context
Question 14: What are HNSW and FAISS famous for?
- A) Algorithms for compressing language models to reduce size
- B) Methods for training language models with fewer parameters
- C) Approximate nearest neighbor search for efficient retrieval of embeddings
- D) Techniques for augmenting training datasets with synthetic data
Question 15: What is the primary purpose of Reciprocal Rank Fusion (RRF)?
- A) Combine rankings from multiple retrieval models
- B) Fuse document embeddings into a single representation for better context
- C) Dynamically adjust retrieval strategies based on user interactions
- D) Prioritize documents with the highest retrieval latency
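Answer A can be sketched directly: RRF scores each document by summing 1 / (k + rank) across the input rankings, so documents ranked well by several retrievers rise to the top. A minimal sketch with hypothetical rankings (k = 60 is the commonly used constant):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each doc scores sum of 1/(k + rank) over the
    rankings that contain it; return docs sorted by fused score."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs of a lexical (BM25) and a dense retriever.
bm25 = ["d1", "d2", "d3"]
dense = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25, dense])
```

Note that `d1` and `d3`, which appear in both lists, outrank documents found by only one retriever.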
Solutions
- Solution 1: A, B
- Solution 2: A, B
- Solution 3: A, B
- Solution 4: D
- Solution 5: D
- Solution 6: A, B
- Solution 7: A, B, C
- Solution 8: B
- Solution 9: A
- Solution 10: C
- Solution 11: B
- Solution 12: D
- Solution 13: C
- Solution 14: C
- Solution 15: A
Quiz 5 - Beyond LLM, Tools and (Multi)-Agents
Questions
Question 1: Which of the following statements are true about emergent abilities in large language models (LLMs)?
- A) They are present in both large and small language models
- B) They appear unexpectedly when the model size surpasses a critical threshold
- C) They are deliberately engineered into the model during training
- D) An example of an emergent ability is solving complex arithmetic problems
Question 2: What are some advantages / limitations of large language models (LLMs)?
- A) LLMs can generate toxic, biased, or misleading content, posing societal risks
- B) Training larger models reduces risks of overfitting and data memorization
- C) Massive datasets required for training can amplify existing biases
- D) All limitations of LLMs are fully understood and documented
Question 3: How can the capabilities of LLMs be improved?
- A) Reducing the size of the models to prevent overfitting
- B) Developing new model architectures and using higher-quality data
- C) Focusing research on making emergent abilities accessible to smaller models
- D) Limiting training to known data to prevent unknown limitations
Question 4: Which statements correctly describe LLM agents as per the course summary?
- A) An LLM agent uses a large language model as the main controller
- B) They can perform complex tasks using advanced planning techniques and tools
- C) The concept of LLM agents was first introduced in early 2021
- D) LLM agents cannot use external tools like web search and code interpreters
Question 5: Why are agents preferred over single LLM calls for complex tasks?
- A) They can plan sequences of actions and learn from experiences
- B) Agents are limited to predefined responses and cannot adapt to new tasks
- C) Agents can perform complex tasks by utilizing external tools
- D) Parallelisation gives agent-based LLMs faster answers than a single LLM call
Question 6: Which of the following are key components of an LLM agent?
- A) Agent/Brain: The LLM acting as the main controller and decision-maker
- B) Randomization Module: Introducing randomness to the agent’s decisions
- C) Memory: Storing past interactions and experiences
- D) Emotion Engine: Simulating human emotions in the agent’s responses
Question 7: What does “reflection” refer to?
- A) The agent’s ability to mirror the user’s language style
- B) A framework allowing agents to improve performance using linguistic feedback
- C) The process of agents self-evaluating their actions for errors
- D) An agent mechanism transforming environmental feedback into improvements
Question 8: What are some current limitations of LLM agents and multi-agent systems?
- A) Ensuring agents don’t deviate from the initial plan - divergence of planning
- B) Guaranteeing that agents can perform tasks without any form of communication
- C) Developing robust methods for testing and evaluating agent performance
- D) Trust and security in real-world deployments
Solutions
- Solution 1: B, D
- Solution 2: A, C
- Solution 3: B, C
- Solution 4: A, B
- Solution 5: C
- Solution 6: A, C
- Solution 7: B
- Solution 8: A, D