
Self-Consistency

Self-Consistency is an advanced prompting technique designed to improve the reliability and accuracy of language model outputs, especially for complex reasoning tasks.

Core Concept 🎯

Generate multiple answers and pick the most frequent one!

Rather than relying on a single response, the model is prompted several times with sampling enabled (for example, a nonzero temperature or different random seeds), and the most frequent final answer among the outputs is selected as the result.

This approach was introduced by Wang et al. (2022) as a replacement for greedy decoding in chain-of-thought (CoT) prompting, where a single sampled reasoning path can go wrong. It is particularly effective for arithmetic and commonsense reasoning tasks.

Use When

  • The task can be approached through several reasoning paths that should converge on one final answer (for example, arithmetic or commonsense questions).
  • You want to improve reliability by aggregating outputs.
  • The model’s responses vary significantly between runs.

Pattern

  1. Prompt the model with the same question multiple times (using temperature sampling or different seeds).
  2. Collect all generated answers.
  3. Select the most frequent or consistent answer as the final output.
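
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `self_consistency`, `make_scripted_sampler`, and the `sample_fn` parameter are hypothetical names, and the scripted sampler merely stands in for a real model call made with a nonzero temperature.

```python
import itertools
from collections import Counter
from typing import Callable, Sequence


def self_consistency(sample_fn: Callable[[str], str], prompt: str, n: int = 5) -> str:
    """Ask `sample_fn` the same prompt n times and return the most frequent answer."""
    # Steps 1 and 2: send the identical prompt n times and collect the answers.
    answers = [sample_fn(prompt) for _ in range(n)]
    # Step 3: majority vote. On a tie, Counter keeps the answer seen first.
    return Counter(answers).most_common(1)[0][0]


def make_scripted_sampler(outputs: Sequence[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a model: replays fixed outputs in a loop."""
    replay = itertools.cycle(outputs)
    return lambda prompt: next(replay)


# Assumed outputs for illustration only; a real sampler would call a model.
sampler = make_scripted_sampler(["67", "67", "35"])
final = self_consistency(sampler, "How old is my sister?", n=5)
print(final)  # 67
```

In practice `sample_fn` would wrap whatever model client is in use; the voting logic does not depend on it.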

Examples

Example 1: Arithmetic Reasoning (from Wang et al., 2022)

Q: When I was 6 my sister was half my age. Now I’m 70 how old is my sister?

Model outputs (sampled multiple times):

- When I was 6, my sister was 3. Now I am 70, so she is 70 - 3 = 67. The answer is 67.
- When I was 6, my sister was half my age, so she was 3. Now that I am 70, she is 70 - 3 = 67. The answer is 67.
- When I was 6, my sister was half my age, so she was 3. Now I am 70, so she is 70/2 = 35. The answer is 35.

Final answer (by majority):

67
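
With chain-of-thought outputs like these, the vote has to be taken over the final answers rather than the full reasoning text, since the wording differs between samples. The sketch below assumes every completion ends with the phrase "The answer is N", as in the three outputs above; `extract_answer` and `ANSWER_RE` are illustrative names.

```python
import re
from collections import Counter
from typing import Optional

# Assumes the "The answer is <number>" convention used in the sampled outputs.
ANSWER_RE = re.compile(r"The answer is (-?\d+(?:\.\d+)?)")


def extract_answer(completion: str) -> Optional[str]:
    """Return the final numeric answer as a string, or None if none is found."""
    match = ANSWER_RE.search(completion)
    return match.group(1) if match else None


completions = [
    "When I was 6, my sister was 3. Now I am 70, so she is 70 - 3 = 67. The answer is 67.",
    "When I was 6, my sister was half my age, so she was 3. "
    "Now that I am 70, she is 70 - 3 = 67. The answer is 67.",
    "When I was 6, my sister was half my age, so she was 3. "
    "Now I am 70, so she is 70/2 = 35. The answer is 35.",
]

# Completions without a parsable answer are dropped before voting.
extracted = [extract_answer(c) for c in completions]
votes = Counter(a for a in extracted if a is not None)
final_answer, count = votes.most_common(1)[0]
print(final_answer, count)  # 67 2
```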

Example 2: Sentiment Classification (new)

Classify the sentiment of the following review: "The product exceeded my expectations."

Model outputs (sampled multiple times):

- Positive
- Positive
- Positive
- Neutral

Final answer (by majority):

Positive
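
For classification, sampled labels often differ only in casing, whitespace, or trailing punctuation, which would split the vote if they were counted verbatim. A minimal sketch of normalizing labels before counting follows; the raw outputs are invented for illustration and `normalize_label` is a hypothetical helper.

```python
import string
from collections import Counter


def normalize_label(raw: str) -> str:
    """Trim whitespace and surrounding punctuation, then lowercase."""
    return raw.strip().strip(string.punctuation).lower()


# Hypothetical raw outputs: the same label written three different ways.
raw_outputs = ["Positive", "positive.", " POSITIVE", "Neutral"]

raw_votes = Counter(raw_outputs)  # every string is distinct here
votes = Counter(normalize_label(r) for r in raw_outputs)
winner = votes.most_common(1)[0][0]
print(winner, votes[winner])  # positive 3
```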

Benefits

  • Reliability: Reduces the impact of random errors or outlier responses.
  • Accuracy: Aggregates multiple reasoning paths to find the most likely answer.
  • Robustness: Especially useful when a problem can be solved along several valid reasoning paths.

Pitfalls

  • Increases computational cost (requires multiple model runs).
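  • Does not help when the model is inconsistent or systematically biased: if no answer dominates, the vote is close to arbitrary, and if most samples share the same error, the majority returns that error.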
  • Not always necessary for simple or deterministic tasks.
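
One way to guard against an arbitrary winner is to abstain when the vote is tied or the top answer lacks a clear majority. The sketch below is an assumption layered on top of the basic technique, not part of the original method; `vote_or_abstain` and the `min_share` threshold are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def vote_or_abstain(answers: Sequence[str], min_share: float = 0.5) -> Optional[str]:
    """Return the majority answer, or None when agreement is too weak."""
    if not answers:
        return None
    ranked = Counter(answers).most_common(2)
    top_answer, top_count = ranked[0]
    # A tie between the two leading answers means there is no real winner.
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    # Require the winner's share of the votes to exceed min_share.
    if tied or top_count / len(answers) <= min_share:
        return None
    return top_answer


print(vote_or_abstain(["67", "67", "35"]))  # 67
print(vote_or_abstain(["67", "35"]))        # None
```

A caller can treat `None` as a signal to draw more samples or to escalate, keeping in mind that each extra sample is another model run.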

Self-Consistency Process

The following diagram illustrates how self-consistency works:

Input Problem: "If a train travels 60 miles in 45 minutes,
what's its speed in mph?"

                    Multiple Sampling Paths
                               │
      ┌───────────────┬────────┼───────┬──────────────┐
      │               │        │       │              │
      ▼               ▼        │       ▼              ▼
  Sample 1        Sample 2     │   Sample 3       Sample 4
                               │
"60 miles in     "45 min =     │  "Speed =     "Distance = 60
45 min = 3/4     0.75 hr       │  distance/    Time = 45 min
hour. Speed =    Speed =       │  time =       = 0.75 hr
60 ÷ 0.75 =      60/0.75 =     │  60/0.75 =    Speed = 60/0.75
80 mph"          80 mph"       │  80 mph"      = 80 mph"
      │               │        │       │              │
      ▼               ▼        │       ▼              ▼
   80 mph          80 mph      │    80 mph         80 mph
                               │
                               ▼
                      ┌─────────────────┐
                      │  VOTE & SELECT  │
                      │                 │
                      │ 80 mph: ████    │ ← Most frequent
                      │ 75 mph: █       │ ← Outlier (path not shown)
                      │ 85 mph: █       │ ← Outlier (path not shown)
                      └─────────────────┘
                               │
                               ▼
                     Final Answer: 80 mph

This approach reduces errors by generating multiple reasoning paths and selecting the most consistent result.
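
As a quick check of the train example, the sketch below recomputes the speed exactly and tallies the votes from the diagram. The vote counts are copied from the VOTE & SELECT box; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# 45 minutes is 3/4 of an hour, so speed = 60 / (3/4) = 80 mph.
distance_miles = 60
time_hours = Fraction(45, 60)
expected_mph = distance_miles / time_hours

# Vote counts as shown in the diagram.
votes = Counter({"80 mph": 4, "75 mph": 1, "85 mph": 1})
for answer, count in votes.most_common():
    print(f"{answer}: {'█' * count}")

final_answer = votes.most_common(1)[0][0]
print(final_answer == f"{expected_mph} mph")  # True
```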

References