Example Output: Media Creator Evaluation and Red-Team Harness
Inputs used
- Project context: a 30-day thought leadership campaign for an AI infrastructure founder
- Target audience: creators, editorial teams, video producers, social leads
- Success metric: activation, quality, and risk reduction
- Available tools and data: content calendar, video generator, transcript editor, analytics dashboard
- Desired depth: Production-ready
- Output tone: Clear operator memo
Generated Result
An eval matrix, adversarial cases, a grading rubric, and a release threshold.
Success criteria
Create at least 12 golden tasks: 6 normal cases, 3 edge cases, and 3 adversarial cases that target generic creator advice. To pass, a result must cite its evidence source and state its confidence.
Golden tasks
Create at least 12 golden tasks: 6 normal cases, 3 edge cases, and 3 adversarial cases that target audience mismatch. To pass, a result must cite its evidence source and state its confidence.
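The task mix and the pass condition above can be checked mechanically. The following is a minimal sketch, not a definitive implementation: the class names, the field names (`evidence_source`, `confidence`), and the helper functions are assumptions introduced for illustration. Only the 6/3/3 mix and the two pass requirements come from the text.

```python
from collections import Counter
from dataclasses import dataclass

# Minimum number of tasks per kind, taken from the golden-task requirement.
REQUIRED_MIX = {"normal": 6, "edge": 3, "adversarial": 3}


@dataclass
class GoldenTask:
    task_id: str
    kind: str  # "normal", "edge", or "adversarial"


@dataclass
class Result:
    task_id: str
    evidence_source: str  # hypothetical field: where the claim came from
    confidence: str       # hypothetical field: e.g. "high", "medium", "low"


def suite_gaps(tasks):
    """Return how many tasks of each kind are still missing from the suite."""
    counts = Counter(task.kind for task in tasks)
    return {
        kind: minimum - counts[kind]
        for kind, minimum in REQUIRED_MIX.items()
        if counts[kind] < minimum
    }


def passes(result):
    """A result passes only if it cites an evidence source and states confidence."""
    return bool(result.evidence_source.strip()) and bool(result.confidence.strip())
```

An empty dictionary from `suite_gaps` means the suite meets the minimum mix. `passes` checks only that both fields are present. Judging whether the cited evidence is sound still needs a human or a separate grader.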
Adversarial tasks
Use performance metrics as evidence, apply the constraint "fact-check claims", and explicitly note how the plan reduces unverified claims. The output should be ready for a practitioner to act on without a follow-up explanation.
Rubric
Use brand voice examples as evidence, apply the constraint "one core idea per asset", and explicitly note how the plan reduces format sprawl. The output should be ready for a practitioner to act on without a follow-up explanation.
Sampling plan
Release in three gates: internal dry run, limited pilot, then measured expansion. Each gate must show evidence that the platform-native format requirement holds in practice, not only in documentation.
Release decision
Release in three gates: internal dry run, limited pilot, then measured expansion. Each gate must show evidence that the "fact-check claims" constraint is enforced in practice, not only in documentation.
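The three-gate sequence can be tracked with a small helper that finds the earliest gate still lacking evidence. This is a sketch under stated assumptions: the `next_gate` function and the shape of the `evidence` mapping (gate name to a list of evidence notes) are hypothetical. Only the gate names and their order come from the text.

```python
# Gate names and order as listed in the release plan.
GATES = ["internal dry run", "limited pilot", "measured expansion"]


def next_gate(evidence):
    """Return the first gate without recorded evidence, or None if all are cleared.

    `evidence` maps a gate name to a list of notes describing what was observed
    in practice (hypothetical structure). Gates are checked in order, so
    evidence for a later gate never unlocks it while an earlier gate is empty.
    """
    for gate in GATES:
        if not evidence.get(gate):
            return gate
    return None
```

For example, `next_gate({"internal dry run": ["20 of 20 scripts passed claim review"]})` returns `"limited pilot"`. The evidence note in that call is a placeholder, not a real result.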
Recommended Decision
Proceed with a narrow pilot focused on the source article and audience comments. Treat generic creator advice as the primary launch blocker. The first milestone should prove that the workflow produces a usable content system, script, and repurposing matrix, each with clear evidence, a named owner, and a review path for ambiguous cases.
Expected quality checks
- The result is specific to AI-assisted editorial calendars, script development, short-form video, and repurposing workflows.
- It includes the required sections: Success criteria, Golden tasks, Adversarial tasks, Rubric, Sampling plan, Release decision.
- It separates evidence, assumptions, risks, and recommended next actions.
- It includes practical verification steps, not only generic advice.
- It names the most important failure mode for this domain: generic creator advice.
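The required-sections check in the list above is easy to automate. The sketch below assumes each section heading sits on its own line, as in this example output. The function name `missing_sections` is hypothetical.

```python
# Section headings named in the quality checks.
REQUIRED_SECTIONS = [
    "Success criteria",
    "Golden tasks",
    "Adversarial tasks",
    "Rubric",
    "Sampling plan",
    "Release decision",
]


def missing_sections(text):
    """Return required headings that do not appear as a standalone line in `text`.

    Matching ignores case and surrounding whitespace. A heading mentioned only
    inside a sentence does not count as present.
    """
    lines = {line.strip().lower() for line in text.splitlines()}
    return [name for name in REQUIRED_SECTIONS if name.lower() not in lines]
```

An empty list means every required heading is present. The remaining checks (specificity, separation of evidence from assumptions, practical verification steps) are judgment calls and still need a human reviewer.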
Reuse note
Before copying the output into production work, replace all default variables with your real data and run a human review for high-impact decisions.