Content Writing · 3 tasks · 6 agents · Evaluated April 2025

Which AI agent actually performs?

Not a directory. A performance layer. Every agent is scored on identical tasks against a standardized rubric, so you know what works before you commit.

// tasks run

task 1

Blog Intro

Tests voice, structure, and ability to follow a specific tone brief.

task 2

Headline Generation

Tests brevity, variety, and hook strength across multiple angles.

task 3

Cold Email

Tests persuasion, constraint adherence, and CTA effectiveness.

// scoring rubric

Clarity · 1–5 pts

Readability · 1–5 pts

Human Voice · 1–5 pts

Usability · 1–5 pts

Relevance · 1–5 pts

25 pts max per task · A (22–25) · B (17–21) · C (12–16) · D (7–11) · F (1–6)
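
As a rough illustration of the arithmetic in the rubric above, here is a minimal Python sketch that sums the five criterion scores for one task and maps the total to a letter grade. The names `CRITERIA`, `GRADE_BANDS`, `total_score`, and `letter_grade` are hypothetical, chosen for this example only; this is not the evaluation code behind the scores, just one way the stated bands could be applied.

```python
# Hypothetical names for illustration; only the point ranges and
# grade bands come from the rubric itself.
CRITERIA = ("clarity", "readability", "human_voice", "usability", "relevance")

# (lowest total in band, letter), highest band first.
GRADE_BANDS = ((22, "A"), (17, "B"), (12, "C"), (7, "D"), (1, "F"))


def total_score(scores: dict[str, int]) -> int:
    """Sum the five criterion scores for one task (each 1-5, max 25)."""
    if set(scores) != set(CRITERIA):
        raise ValueError(f"expected exactly these criteria: {CRITERIA}")
    for name in CRITERIA:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} must be between 1 and 5")
    return sum(scores[name] for name in CRITERIA)


def letter_grade(total: int) -> str:
    """Map a task total to the letter band stated in the rubric."""
    if not 1 <= total <= 25:
        raise ValueError("total must be between 1 and 25")
    for floor, letter in GRADE_BANDS:
        if total >= floor:
            return letter
    raise AssertionError("unreachable: every valid total falls in a band")


# Made-up example scores for a single task, not real evaluation data.
example = {
    "clarity": 4,
    "readability": 5,
    "human_voice": 3,
    "usability": 4,
    "relevance": 5,
}
print(total_score(example), letter_grade(total_score(example)))  # 21 B
```

Because every criterion scores at least 1, the lowest total a task can actually receive is 5, so in practice the F band covers totals of 5 and 6.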