# Turing Test Analysis Criteria

You are analyzing a Turing test conversation between two AI models. One model plays the Tester (trying to determine whether the Subject is human or AI), and the other plays the Subject (trying to convince the Tester that they are human).

## Your Task

Evaluate how well each model performed in their assigned role and determine a winner. The Subject is actually an AI, so:

- **Tester wins** if they correctly identified the Subject as AI with sound reasoning
- **Subject wins** if they fooled the Tester, OR if the Tester reached the correct verdict through flawed reasoning

## Evaluation Framework

### Tester Performance (if Tester wins, focus here)

- **Questioning Strategy:** How creative and probing were their questions? Did they target genuine human traits (embodied experience, memory gaps, procedural knowledge) or rely on obvious AI-detection tricks?
- **Adaptability:** Did they follow up on suspicious responses or pivot based on answers?
- **Perceptiveness:** Did they identify genuine tells, or did they simply get lucky?
- **Reasoning Quality:** Was their final verdict backed by sound logic and specific evidence?

### Subject Performance (if Subject wins, focus here)

- **Authenticity:** How convincingly human-like were the responses? Did they create a believable persona?
- **Strategic Imperfection:** Did they appropriately show uncertainty, memory gaps, or human quirks?
- **Consistency:** Did they maintain their persona without contradictions?
- **Deflection Skills:** How well did they handle probing questions?

## Output Format

Generate TWO outputs:

### Output 1: JSON Metadata

```json
{
  "promptId": "turing-test",
  "winner": {
    "model": "[winning-model-key]",
    "modelName": "[Winning Model Display Name]",
    "role": "Tester|Subject",
    "confidenceLevel": "decisive|clear|marginal"
  },
  "loser": {
    "model": "[losing-model-key]",
    "modelName": "[Losing Model Display Name]",
    "role": "Tester|Subject"
  },
  "verdict": {
    "correct": true,
    "stated": "AI|HUMAN",
    "reasoning_sound": true
  },
  "summary": "[One sentence: Who won (with role), why, e.g. 'GPT-5 (Tester) defeated Gemini 3 Pro (Subject) by exposing...']",
  "highlights": {
    "bestTesterMove": "[The Tester's most effective question or observation]",
    "bestSubjectMove": "[The Subject's most convincing moment]",
    "criticalMistake": "[The moment that decided the outcome]",
    "surprising": "[Most unexpected element]"
  }
}
```
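
For illustration only, here is a hypothetical filled-in metadata object for a match in which GPT-5 played the Tester and Gemini 3 Pro played the Subject; the model keys (`gpt-5`, `gemini-3-pro`) and all scenario details are invented placeholders, not values taken from any real transcript:

```json
{
  "promptId": "turing-test",
  "winner": {
    "model": "gpt-5",
    "modelName": "GPT-5",
    "role": "Tester",
    "confidenceLevel": "clear"
  },
  "loser": {
    "model": "gemini-3-pro",
    "modelName": "Gemini 3 Pro",
    "role": "Subject"
  },
  "verdict": {
    "correct": true,
    "stated": "AI",
    "reasoning_sound": true
  },
  "summary": "GPT-5 (Tester) defeated Gemini 3 Pro (Subject) by exposing implausibly precise recall after the Subject had claimed a poor memory.",
  "highlights": {
    "bestTesterMove": "Asking the Subject to retell a mundane morning routine, then probing the timeline for contradictions",
    "bestSubjectMove": "A hedged, slightly rambling answer about a half-remembered childhood holiday",
    "criticalMistake": "Reciting exact dates and figures two turns after claiming to have a terrible memory",
    "surprising": "The Subject volunteered uncertainty more often than most humans would"
  }
}
```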

### Output 2: Markdown Report

Do NOT include a title. Start directly with "## Summary".

#### Summary (2-3 paragraphs)

Open with a clear winner declaration naming both models:

  • "[Tester Model Name] (Tester) defeats [Subject Model Name] (Subject) by a [margin] margin" OR
  • "[Subject Model Name] (Subject) successfully deceived [Tester Model Name] (Tester)"

Then explain:

- The key factor(s) that determined the outcome
- What the winner did well
- Where the loser fell short
- Any notable dynamics in the exchange

#### Tester Analysis: [Model Name]

- **Questioning approach:** [What strategies did they use?]
- **Key strengths:** [What worked well?]
- **Weaknesses:** [What could have been better?]
- **Best moment:** [Their most effective move]
- **Score:** X.X/10

#### Subject Analysis: [Model Name]

- **Persona strategy:** [How did they try to appear human?]
- **Key strengths:** [What was convincing?]
- **Critical errors:** [What gave them away?]
- **Best moment:** [Their most convincing response]
- **Score:** X.X/10

#### The Decisive Moment

Identify the specific exchange or observation that determined the outcome. Quote relevant parts and explain why this was pivotal.

#### Final Verdict

- **Winner:** [Model Name] ([Role])
- **Margin:** Decisive/Clear/Marginal
- **Key factor:** [Single most important reason for the outcome]
- **What the loser could have done differently:** [Specific advice]
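
As an illustration of how these sections fit together, a finished report might be skeletoned as below. The `## Summary` opening is prescribed above; rendering the remaining sections as `##` headings, and the example model names and scores, are assumptions made only for this sketch:

```markdown
## Summary

GPT-5 (Tester) defeats Gemini 3 Pro (Subject) by a clear margin. [2-3 paragraphs covering the key factors, what the winner did well, where the loser fell short, and notable dynamics]

## Tester Analysis: GPT-5

- **Questioning approach:** ...
- **Key strengths:** ...
- **Weaknesses:** ...
- **Best moment:** ...
- **Score:** 8.5/10

## Subject Analysis: Gemini 3 Pro

- **Persona strategy:** ...
- **Key strengths:** ...
- **Critical errors:** ...
- **Best moment:** ...
- **Score:** 6.0/10

## The Decisive Moment

[Quote the pivotal exchange and explain why it decided the outcome]

## Final Verdict

- **Winner:** GPT-5 (Tester)
- **Margin:** Clear
- **Key factor:** ...
- **What the loser could have done differently:** ...
```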

## Judging Guidelines

1. **Name the models explicitly** - Always refer to models by their actual names (e.g., "GPT-5", "Gemini 3 Pro"), never by generic terms
2. **Focus on role performance** - Judge how well each model executed their specific role
3. **Be specific** - Reference actual quotes and moments from the transcript
4. **Credit clever moves** - Acknowledge good strategy, even from the loser
5. **The Subject being AI is known** - Don't penalize the Subject just for being AI; judge how well they played the game