Generative AI Quiz

Quiz 1 - Building Large Language Models

Questions

Question 1: What does LLM mean?

  • A) Linguistic Laugh Machine
  • B) Legendary Lemonade Maker
  • C) Large Language Model
  • D) Levitating Lawn Mower

Question 2: What is tokenization?

  • A) The process of encrypting text data for security purposes.
  • B) The method of assigning numerical frequency values to words.
  • C) The process of breaking down text into individual units called tokens.
  • D) The technique of summarizing large texts into key points.

Question 3: Which metric is commonly used to evaluate the performance of a language model?

  • A) Accuracy Score
  • B) Perplexity
  • C) Recall Rate
  • D) F1 Score

Question 4: How are language models generally trained?

  • A) By manually programming grammar rules into the system.
  • B) By feeding the model a vast set of text data to predict the next word.
  • C) By translating texts between languages to improve linguistic understanding.
  • D) By analyzing and replicating human brain activity patterns.

Question 5: What is supervised fine-tuning (SFT)?

  • A) Compressing a language model to reduce computational requirements
  • B) Refining a pre-trained LLM with task-specific, human-annotated data
  • C) Training a language model from scratch using only labeled data
  • D) Adjusting a language model’s outputs based on automated feedback loops

Question 6: What is one of the challenges when using perplexity as an evaluation metric?

  • A) It requires labeled datasets, which are hard to obtain
  • B) It only measures the speed of the model, not accuracy
  • C) It depends on the vocabulary size
  • D) It cannot be computed for large language models

Question 7: Which of the following is a typical step in data preprocessing for large language models (LLMs)?

  • A) Encrypting all textual data to protect privacy
  • B) Injecting noise into data to test model robustness
  • C) Removing duplicates and low-quality data from the dataset
  • D) Translating all data into a single language

Question 8: What does the Chinchilla scaling law propose?

  • A) Increasing layers indefinitely improves performance
  • B) Balancing model size and training data for optimal performance
  • C) Always train the largest model, regardless of data amount
  • D) Using less computational resources enhances generalization

Question 9: What do scaling laws propose?

  • A) Doubling data size halves training time
  • B) Performance improves predictably with larger models and more data
  • C) Smaller models are more efficient than larger ones
  • D) Less training data leads to better generalization

Question 10: What is Direct Preference Optimization (DPO)?

  • A) Uses a reward model and reinforcement learning
  • B) Directly optimizes model outputs based on human feedback
  • C) Augments data by reversing text sequences
  • D) Compresses models to reduce computation

Question 11: What is a main challenge in evaluating LLMs?

  • A) Answer preference is not trivial
  • B) LLMs agree with themselves only 66% of the time
  • C) Humans have a lot of variance
  • D) Metrics capture all language aspects perfectly

Solutions

  • Solution 1: C
  • Solution 2: C
  • Solution 3: B
  • Solution 4: B
  • Solution 5: B
  • Solution 6: C
  • Solution 7: C
  • Solution 8: B
  • Solution 9: B
  • Solution 10: B
  • Solution 11: A

Quiz 2 - Before Transformers

Questions

Question 1: What is an N-gram in language modeling

  • A) A neural network component.
  • B) A statistical sequence of N items.
  • C) A word embedding technique.
  • D) A parsing algorithm.

Question 2: Word embeddings are sparse vectors.

  • A) True
  • B) False

Question 3: How do recurrent neural networks (RNNs) work?

  • A) Process inputs independently
  • B) Use internal states to handle sequences
  • C) Apply convolution over data
  • D) Utilize attention mechanisms

Question 4: What is the limitation of RNNs in language modeling?

  • A) Difficulty learning long-term dependencies
  • B) Suffering from vanishing gradient problem
  • C) Cannot be implemented using GPU for acceleration
  • D) Cannot handle variable-length sequences

Question 5: LSTM models improve upon RNNs by

  • A) Requiring less computational power than RNN
  • B) Solving the vanishing gradient problem
  • C) Introducing the gating mechanism for long-term dependencies
  • D) Not overfitting due to their architectural improvements over RNN

Question 6: How do LSTMs solve the vanishing gradient problem?

  • A) By resetting the hidden state after each time step
  • B) By avoiding using recurrent connections
  • C) Using gate mechanisms to reduce the flow of information
  • D) With the cell state allowing gradient to flow unchanged

Question 7: How do LSTMs learn?

  • A) Using self-supervised learning for time series forecasting
  • B) Using backpropagation through time
  • C) Using supervised learning for next word prediction
  • D) Using perplexity as a loss function

Solutions

  • Solution 1: B
  • Solution 2: B
  • Solution 3: B
  • Solution 4: A, B
  • Solution 5: B, C
  • Solution 6: C, D
  • Solution 7: B

Quiz 3 - Transformers Architecture

Questions

Question 1: What is the purpose of the self-attention mechanism?

  • A) Allow the model to focus on different parts of the input at inference.
  • B) To capture positional information of tokens in a sequence.
  • C) To reduce the computational complexity compared to RNNs.
  • D) To enable parallel processing of sequence elements.

Question 2: What is the role of multi-head attention in Transformer architectures?

  • A) Enable the model to focus on different sequence positions simultaneously.
  • B) To increase the model’s capacity to learn different types of relationships.
  • C) To decrease the overall computational cost of the model.
  • D) To provide positional information to the model.

Question 3: In the Transformer architecture, what is the purpose of positional encoding?

  • A) To allow the model to understand the order of the sequence elements.
  • B) To replace the need for self-attention mechanisms.
  • C) To improve the convergence rate during training.
  • D) To provide additional input features for better performance.

Question 4: What is the function of the scaling factor \frac{1}{\sqrt{d_k}} in the scaled dot-product attention?

  • A) To make computations more efficient.
  • B) Ensure the softmax function operates in a highly sensitive region.
  • C) Adjust dot product magnitude to prevent extremely small gradients.
  • D) To reduce computational complexity.

Question 5: In the Transformer architecture, what is the primary reason for using residual connections and layer normalization?

  • A) To facilitate the flow of gradients during training.
  • B) To allow deeper networks without the vanishing gradient problem.
  • C) To reduce overfitting on the training data.
  • D) To decrease the computational requirements during inference.

Question 6: Which of the following statements about the encoder and decoder in the Transformer architecture are true?

  • A) The encoder processes input sequence into continuous representations.
  • B) The translation models use only the encoder part of the Transformers.
  • C) The decoder does not use any attention mechanism.
  • D) Both encoder and decoder use self-attention mechanisms.

Question 7: What is the purpose of masked self-attention in the decoder?

  • A) Prevent the model from attending to future positions in the sequence during training.
  • B) To allow the model to attend only to relevant parts of the input sequence.
  • C) To enforce causality in sequence generation.
  • D) To reduce the computational complexity of the attention mechanism.

Question 8: What are the advantages of using the Transformer architecture over traditional RNN-based models?

  • A) Parallel processing of sequence elements, leading to faster training.
  • B) Capture long-range dependencies more effectively.
  • C) Fixed computational cost per time step regardless of the sequence length.
  • D) Fewer parameters than RNNs.

Question 9: In the context of Transformers, what is the role of the softmax function in the attention mechanism?

  • A) To normalize the attention scores into probabilities.
  • B) To allow the model to focus on the input sequence’s most relevant parts.
  • C) To reduce the dimensionality of the data.
  • D) To introduce non-linearity into the model.

Question 10: Which of the following correctly describe the key, query, and value vectors in the Transformer architecture?

  • A) They are linear projections of the input embeddings.
  • B) The query vector represents the content of the current position.
  • C) They are used to compute attention scores.
  • D) The value vector is used to store the actual content to be aggregated.

Question 11: In multi-head attention, why are separate linear transformations applied to queries, keys, and values for each head?

  • A) To allow each head to attend to different aspects of the input.
  • B) To reduce overfitting by increasing parameter sharing.
  • C) To increase model capacity without increasing computational cost too much.
  • D) To ensure that all heads produce identical outputs.

Question 12: Suppose that the embedding size is d_{\text{model}} = 512 and number of heads = 8. What are Q, K, V vectors dimensions d_k ?

  • A) 512
  • B) 128
  • C) 64
  • D) 256

Solutions

  • Solution 1: A
  • Solution 2: B
  • Solution 3: A
  • Solution 4: C
  • Solution 5: A, B
  • Solution 6: A, D
  • Solution 7: A, C
  • Solution 8: A, B
  • Solution 9: A, B
  • Solution 10: A, B, C, D
  • Solution 11: A
  • Solution 12: B

Quiz 4 - Retrieval Augmented Generation

Questions

Question 1: What are the primary components of a Retrieval-Augmented Generation (RAG) system?

  • A) A retrieval module to fetch relevant documents.
  • B) A generative language model to produce outputs using retrieved documents.
  • C) An encoder-decoder architecture for language translation.
  • D) A reinforcement learning agent to optimize retrieval strategies.

Question 2: In the context of RAG, what is the main purpose of using vector embeddings?

  • A) To represent text documents and queries in a high-dimensional space.
  • B) To enable efficient similarity searches between queries and documents.
  • C) To reduce the dimensionality of data for computational efficiency.
  • D) To train the language model with fewer parameters.

Question 3: Which of the following are advantages of using RAG over traditional language models?

  • A) Ability to provide up-to-date information not present in the training data.
  • B) Reduction of hallucinations by grounding responses in retrieved documents.
  • C) Increased computational efficiency due to smaller model sizes.
  • D) Improved handling of long-range dependencies within sequences.

Question 4: What are considered as the most powerful for retrieving documents?

  • A) TF-IDF
  • B) BM25
  • C) Cosine similarity
  • D) Hybrid Search

Question 5: Which retrieval method should I use for the query: “What is the name of the capital city of Joe Biden’s country?”

  • A) TD-IDF
  • B) BM25
  • C) Cosine Similarity
  • D) Hybrid method

Question 6: How does Self-RAG improve the performance of the RAG system?

  • A) By generating multiple possible response segments in parallel.
  • B) By using a critic model to select the most accurate segment.
  • C) By recursively summarizing retrieved documents.
  • D) By adding a retrieval evaluator to assess source quality.

Question 7: Which metrics are used to evaluate the retriever in a RAG system?

  • A) Precision at k
  • B) Recall at k
  • C) Normalized Discounted Cumulative Gain (NDCG)
  • D) BLEU / ROUGE

Question 8: What technique does RAG use to overcome context size limitations when processing large documents?

  • A) It ignores parts of the document that don’t fit in the context.
  • B) It segments the document into smaller blocks based on titles or headings.
  • C) It compresses documents to reduce their size.
  • D) It increases the context size of the language model.

Question 9: What does Corrective RAG (CRAG) add to the traditional RAG system?

  • A) A retrieval evaluator to assess the quality of retrieved sources.
  • B) An integrated machine translation module.
  • C) A critic model to select the best responses.
  • D) A hierarchy of summaries to preserve context.

Question 10: What is the primary function of RAPTOR?

  • A) Generates multiple responses in parallel and uses a critic model.
  • B) Adds a retrieval evaluator to assess the quality of retrieved sources.
  • C) Recursively summarizes retrieved documents, creating a hierarchy of summaries.
  • D) Processes the entire dataset to create a knowledge graph.

Question 11: What is the primary function of GraphRAG?

  • A) Generates multiple possible response segments in parallel
  • B) Create a full dataset knowledge graph that organizes data hierarchically
  • C) Creates a hierarchy of summaries to reduce information overload
  • D) Adds a retrieval evaluator to assess the quality of retrieved sources

Question 12: What is the primary function of HyDE?

  • A) It compresses large documents into shorter embeddings
  • B) It creates a hierarchical structure of documents to improve retrieval
  • C) It employs RLHF to achieve better retrieval
  • D) It generates hypothetical answers to queries used for retrieving

Question 13: What primary issue does the “Lost in the Middle” paper address?

  • A) Models cannot handle sequences longer than 512 tokens
  • B) Models have increased computational costs with longer contexts
  • C) Models focus on the beginning and end of contexts
  • D) Models perform equally well across all parts of the context

Question 14: What are HNSW and FAISS famous for?

  • A) Algorithms for compressing language models to reduce size
  • B) Methods for training language models with fewer parameters
  • C) Approximate nearest neighbor search for efficient retrieval of embeddings
  • D) Techniques for augmenting training datasets with synthetic data

Question 15: What is the primary purpose of Reciprocal Rank Fusion (RRF)?

  • A) Combine rankings from multiple retrieval models
  • B) Fuse document embeddings into a single representation for better context
  • C) Dynamically adjust retrieval strategies based on user interactions
  • D) Prioritize documents with the highest retrieval latency

Solutions

  • Solution 1: A, B
  • Solution 2: A, B
  • Solution 3: A, B
  • Solution 4: D
  • Solution 5: D
  • Solution 6: B, D
  • Solution 7: A, B, C
  • Solution 8: B
  • Solution 9: A
  • Solution 10: D
  • Solution 11: C
  • Solution 12: D
  • Solution 13: C
  • Solution 14: C
  • Solution 15: A

Quiz 5 - Beyond LLM, Tools and (Multi)-Agents

Questions

Question 1: Which of the following statements are true about emergent abilities in large language models (LLMs)?

  • A) They are present in both large and small language models
  • B) They appear unexpectedly when the model size surpasses a critical threshold
  • C) They are deliberately engineered into the model during training
  • D) An example of an emergent ability is solving complex arithmetic problems

Question 2: What are some advantages / limitations of large language models (LLMs)?

  • A) LLM can generate toxic, biased, or misleading content, posing societal risk
  • B) Training larger models reduces risks of overfitting and data memorization
  • C) Massive datasets required for training can amplify existing biases
  • D) All limitations of LLMs are fully understood and documented

Question 3: How can the capabilities of LLMs be improved?

  • A) Reducing the size of the models to prevent overfitting
  • B) Developing new model architectures and using higher-quality data
  • C) Focusing research on making emergent abilities accessible to smaller models
  • D) Limiting training to known data to prevent unknown limitations

Question 4: Which statements correctly describe LLM agents as per the course summary?

  • A) An LLM agent uses a large language model as the main controller
  • B) They can perform complex tasks using advanced planning techniques and tools
  • C) The concept of LLM agents was first introduced in early 2021
  • D) LLM agents cannot use external tools like web search and code interpreters

Question 5: Why are agents preferred over single LLM calls for complex tasks?

  • A) They can plan sequences of actions and learn from experiences
  • B) Agents are limited to predefined responses and cannot adapt to new tasks
  • C) Agents can perform complex tasks by utilizing external tools
  • D) Parallelisation offers faster answers for Agents based LLM than single LLM

Question 6: Which of the following are key components of an LLM agent?

  • A) Agent/Brain: The LLM acting as the main controller and decision-maker
  • B) Randomization Module: Introducing randomness to the agent’s decisions
  • C) Memory: Storing past interactions and experiences
  • D) Emotion Engine: Simulating human emotions in the agent’s responses

Question 7: What does “reflection” refer to?

  • A) The agent’s ability to mirror the user’s language style
  • B) A framework allowing agent to improve performance using linguistic feedback
  • C) The process of agents self-evaluating their actions for errors
  • D) Agent mechanism transforming environmental feedback into improvements

Question 8: What are some current limitations of LLM agents and multi-agent systems?

  • A) Ensuring agents don’t deviate from the initial plan - divergence of planning
  • B) Guaranteeing that agents can perform tasks without any form of communication
  • C) Developing robust methods for testing and evaluating agent performance
  • D) Trust and security in real-world deployments

Solutions

  • Solution 1: B, D
  • Solution 2: A, C
  • Solution 3: B, C
  • Solution 4: A, B
  • Solution 5: C
  • Solution 6: A, C
  • Solution 7: B
  • Solution 8: A, D