Attention Mechanism Prompting

  • the prompt is broken up into tokens (word fragments)
    • which the model processes through multiple attention layers
    • each token initially has a single embedding (vector representation)
      • once processing begins within a context
        • attention mechanisms dynamically alter a token's effective meaning
  • attention heads are independent attention mechanisms
    • that each focus on a different aspect of the input

      think of them as multiple spotlights examining a sentence from different angles

      These attention heads analyze relationships by creating:

      • Q query vectors: what each token is 'looking for' in other tokens
      • K key vectors: what each token 'offers'; relevance is scored by matching queries against keys
      • V value vectors: the actual information that gets passed along
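
      The Q/K/V interaction can be sketched in NumPy. This is a minimal illustration (random weights, one forward pass), not a real transformer layer; all names here are invented for the example:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(X, Wq, Wk, Wv):
    # Project each token embedding into query, key, and value spaces.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Each token's query is matched against every token's key
    # to score how relevant the other tokens are to it.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    # The output is a weighted blend of value vectors: the token's
    # effective meaning now depends on its context.
    return weights @ V

rng = np.random.default_rng(0)
n_tokens, d_model, d_head, n_heads = 5, 16, 4, 4
X = rng.normal(size=(n_tokens, d_model))  # one embedding per token

# Independent heads, each with its own projections, so each can attend
# to a different aspect of the input ("multiple spotlights").
heads = [
    attention_head(X,
                   rng.normal(size=(d_model, d_head)),
                   rng.normal(size=(d_model, d_head)),
                   rng.normal(size=(d_model, d_head)))
    for _ in range(n_heads)
]
out = np.concatenate(heads, axis=-1)
print(out.shape)  # (5, 16)
```

      Note the heads are concatenated back to the model width, which is how multi-head outputs are recombined before the next layer.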
  • Zero-shot prompting

We provide a textual description of the task and a test example. The model either produces an answer via open-ended generation, or ranks a set of proposed answers.
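
A zero-shot prompt is just the task description plus the test input. A minimal sketch (the helper name and template are invented for illustration; the model call itself is omitted):

```python
def zero_shot_prompt(task_description, test_example):
    # No solved examples: the model must rely on the description alone.
    return f"{task_description}\n\nInput: {test_example}\nAnswer:"

prompt = zero_shot_prompt(
    "Classify the sentiment of the review as positive or negative.",
    "The battery died after two hours.",
)
print(prompt)
```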

  • Few-Shot Prompting

We provide a few examples of the task (between 1 and 64) and a test example. The model takes this text as input and generates the answer or ranks different options.
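
The few-shot case prepends solved examples before the test input. A sketch in the same hypothetical style as above:

```python
def few_shot_prompt(task_description, examples, test_example):
    # examples: list of (input, answer) pairs, typically between 1 and 64.
    shots = "\n\n".join(f"Input: {x}\nAnswer: {y}" for x, y in examples)
    return f"{task_description}\n\n{shots}\n\nInput: {test_example}\nAnswer:"

prompt = few_shot_prompt(
    "Classify the sentiment of the review as positive or negative.",
    [("Great product, works perfectly.", "positive"),
     ("Broke on the first day.", "negative")],
    "The battery died after two hours.",
)
print(prompt)
```

The trailing bare "Answer:" leaves the final slot open for the model to fill.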

  • Context Layering
    • start with foundational concepts before adding nuance
      • "First, do research on <concepts> before doing <task>"
  • Directed Attention Weighting
    • explicitly instruct how to distribute attention across sections
      • "Dedicate equal attention to each section"
      • "Allocate 30% to definition, 40% to resolution, and 30% to evaluation"
  • Chain-of-thought
    • a series of intermediate reasoning steps
      • significantly improves the ability of large language models
        • to perform complex reasoning

          Auto-enable "Chain of thought"

          • Both Claude and Gemini appear to auto-enable "Chain of thought"
            • when presented with certain prompts.
            • In particular, Claude prints "let's think through it step-by-step"
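
Chain-of-thought can also be triggered explicitly by appending a reasoning cue to the prompt. A minimal sketch, reusing the step-by-step phrase noted above (the helper name is invented):

```python
def cot_prompt(question):
    # The trailing cue nudges the model to emit intermediate
    # reasoning steps before its final answer.
    return f"{question}\nLet's think through it step-by-step."

print(cot_prompt(
    "A train travels 60 km in 45 minutes. What is its average speed in km/h?"
))
```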