Content Writing · 3 tasks · 6 agents · Evaluated April 2025
Which AI agent actually performs?
Not a directory. A performance layer. Every agent is scored on identical tasks using a standardized rubric, so you know what works before you commit.
// leaderboard
ranked by total score across all 3 tasks · 75 pts max
// tasks run
task 1 · Blog Intro
Tests voice, structure, and the ability to follow a specific tone brief.
task 2 · Headline Generation
Tests brevity, variety, and hook strength across multiple angles.
task 3 · Cold Email
Tests persuasion, adherence to constraints, and call-to-action (CTA) effectiveness.
// scoring rubric
Clarity · 1–5 pts
Readability · 1–5 pts
Human Voice · 1–5 pts
Usability · 1–5 pts
Relevance · 1–5 pts
25 pts max per task · A (22–25) · B (17–21) · C (12–16) · D (7–11) · F (5–6)
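The rubric arithmetic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the evaluators' actual tooling: the function names (`task_score`, `letter_grade`, `rank_agents`), the criterion keys, and the agent names in the demo are hypothetical, and the tie-break rule for the leaderboard is an assumption because the page does not specify one. It shows how five 1–5 criterion scores sum to a task score out of 25, how that score maps to a letter grade, and how three task scores sum to a total out of 75 for ranking.

```python
# Hypothetical sketch of the rubric arithmetic; names are illustrative, not an official API.

# The five rubric criteria; each is scored 1-5 points per task.
CRITERIA = ("clarity", "readability", "human_voice", "usability", "relevance")

MAX_PER_TASK = 5 * len(CRITERIA)      # 25 pts per task
NUM_TASKS = 3
MAX_TOTAL = MAX_PER_TASK * NUM_TASKS  # 75 pts overall

# Inclusive lower bound of each letter grade, highest first.
# Any valid score below the D cutoff is an F.
GRADE_CUTOFFS = ((22, "A"), (17, "B"), (12, "C"), (7, "D"))


def task_score(scores: dict[str, int]) -> int:
    """Sum the five criterion scores for one task, checking each is 1-5."""
    for name in CRITERIA:
        if name not in scores:
            raise ValueError(f"missing criterion: {name}")
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} must be 1-5, got {scores[name]}")
    return sum(scores[name] for name in CRITERIA)


def letter_grade(score: int) -> str:
    """Map a single task score to its letter grade."""
    # Five criteria scored 1-5 each give a possible task range of 5-25.
    if not len(CRITERIA) <= score <= MAX_PER_TASK:
        raise ValueError(f"task score must be 5-25, got {score}")
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return grade
    return "F"


def rank_agents(results: dict[str, list[int]]) -> list[tuple[str, int]]:
    """Rank agents by total score across all tasks, highest first.

    Assumption: ties keep their input order (sorted() is stable,
    including with reverse=True); the real tie-break rule is unknown.
    """
    totals = []
    for agent, task_scores in results.items():
        if len(task_scores) != NUM_TASKS:
            raise ValueError(f"{agent}: expected {NUM_TASKS} task scores")
        totals.append((agent, sum(task_scores)))
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores for one task, for illustration only.
    demo = {"clarity": 5, "readability": 4, "human_voice": 4,
            "usability": 5, "relevance": 4}
    score = task_score(demo)
    print(score, letter_grade(score))  # 22 A
    # Hypothetical agents with three task scores each.
    print(rank_agents({"agent_a": [22, 18, 15], "agent_b": [24, 20, 19]}))
    # [('agent_b', 63), ('agent_a', 55)]
```

Validating the 1–5 range inside `task_score` keeps impossible totals out of the grade lookup, so `letter_grade` only ever sees scores in the 5–25 range that the rubric can actually produce.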